hadoop - In Python mrjob, how to set the option for the temporary output directory


I am using mrjob to run a simple word count as a standard Hadoop job:

python word_count.py -r hadoop hdfs:///path-to-my-data 

This prints an error indicating that it cannot create the temporary directory for temporary output:

STDERR: mkdir: Incomplete HDFS URI, no host: hdfs:///user/path-to-tmp-dir ... ... subprocess.CalledProcessError: Command '['/opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop', 'fs', '-mkdir', 'hdfs:///user/ 

I assume it cannot create the directory that mrjob wants by default. Is it possible to pass this option to mrjob through the command line? The only option I have found so far is base_tmp_dir. Its description says "Path to put local temp dirs inside." The "local" part is not what I am looking for, since the temporary output directory is supposed to be in HDFS. Nevertheless, I gave it a try (:

python word_count.py --base-tmp-dir=./tmp/ data.txt  

or

python word_count.py -r hadoop --base-tmp-dir=hdfs:///some-path hdfs:///path-to-data 

But both failed, with mrjob complaining that there is no such option:

word_count.py: error: no such option: --base-tmp-dir 

The word_count.py is the standard one found here. I may be missing some essential knowledge about mrjob, or I may have to go with Hadoop streaming.

mrjob calls the hadoop binary when interacting with HDFS. The hadoop command needs to know where the namenode is located on the network, so that URIs like hdfs:///some-path don't require the full host (something like hdfs://your-namenode:9000/some-path). The command figures out the namenode by reading its configuration XML files.
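To make that lookup concrete, here is a minimal sketch of how the default filesystem could be read from a Hadoop configuration file. It assumes the namenode address lives in core-site.xml under the property fs.defaultFS (or the older fs.default.name). The function name find_default_fs is hypothetical and is not part of mrjob or Hadoop. It only illustrates the kind of lookup the hadoop command does on its own.

```python
import xml.etree.ElementTree as ET

# Assumed property names: fs.defaultFS (Hadoop 2.x) and the older fs.default.name.
DEFAULT_FS_KEYS = ("fs.defaultFS", "fs.default.name")


def find_default_fs(core_site_path):
    """Return the default filesystem URI from a core-site.xml file, or None.

    Hypothetical helper for illustration; hadoop performs this lookup itself.
    """
    root = ET.parse(core_site_path).getroot()
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in DEFAULT_FS_KEYS:
            value = prop.findtext("value", default="").strip()
            return value or None
    return None
```

If this returns None for the file hadoop is actually reading, a URI such as hdfs:///some-path has no host to fall back on, which matches the "no host" error in the question.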

There are lots of conflicting reports on the internet about which environment variable to set, but in my environment, running the latest version of mrjob and Apache Hadoop 2.4.1, I had to set the HADOOP_PREFIX environment variable. You can set it with this command:

export HADOOP_PREFIX=/path/to/your/hadoop

Once it is set, you'll know it is set correctly if you type:

ls $HADOOP_PREFIX/etc/hadoop

and it shows the configuration XML files.
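The same check can be done from Python before launching the job. This is only a sketch: it assumes the Hadoop 2.x layout described above, where the configuration lives under etc/hadoop inside the directory named by HADOOP_PREFIX. The helper name list_hadoop_config_files is hypothetical.

```python
import glob
import os


def list_hadoop_config_files(env=None):
    """Return the sorted names of XML files under $HADOOP_PREFIX/etc/hadoop.

    Returns an empty list if HADOOP_PREFIX is unset or no XML files are found.
    Hypothetical helper; the etc/hadoop layout is an assumption.
    """
    env = os.environ if env is None else env
    prefix = env.get("HADOOP_PREFIX")
    if not prefix:
        return []
    pattern = os.path.join(prefix, "etc", "hadoop", "*.xml")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    files = list_hadoop_config_files()
    if files:
        print("Found configuration files:", ", ".join(files))
    else:
        print("No configuration XML files found; check HADOOP_PREFIX.")
```

An empty result suggests the variable is unset or points at the wrong directory.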

Now run your command again. It should work.

