hadoop - Save JSON to HDFS using Python


I have a Python script that accesses an API and returns JSON. It takes the JSON string, saves it off as a file on the local file system, and then I move that file to HDFS manually. I would like to change the Python script to save directly to HDFS instead of hitting the local file system first. When I try to save the file using the hdfs dfs command, I don't think a copy command is the correct way, because it isn't a file but rather a JSON string that I am trying to save.

Current code:

import urllib2
import json
import os

f = urllib2.urlopen('restful_api_url.json')
json_string = json.loads(f.read().decode('utf-8'))

# Save to the local file system; the file is then moved to HDFS manually.
with open('/home/user/filename.json', 'w') as outfile:
    json.dump(json_string, outfile)

New code:

f = urllib2.urlopen('restful_api_url.json')
json_string = json.loads(f.read().decode('utf-8'))
os.environ['json_string'] = json.dumps(json_string)
# This fails: hdfs dfs -cp expects a source file, not a JSON string.
os.system('hdfs dfs -cp -f $json_string hdfs/user/test')

I think the problem is the same one discussed in the thread Stream data into HDFS directly without copying.

Firstly, this command can redirect stdin to an HDFS file:

hadoop fs -put - /path/to/file/in/hdfs.txt 
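For example, piping a small string from a shell writes it straight into HDFS without touching the local file system (the path here is just illustrative):

echo '{"test": 1}' | hadoop fs -put - /path/to/file/in/hdfs.txt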

Then, in Python, you can do:

os.system('echo "%s" | hadoop fs -put - /path/to/file/in/hdfs.txt' % (json.dumps(json_string)))
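Note that embedding the JSON in an echo command will break if the string contains double quotes or other shell metacharacters. A more robust sketch pipes the data to hadoop fs -put through subprocess, with no shell in between; this assumes the hadoop client is on the PATH, and the HDFS path is illustrative:

import json
import subprocess

# json_string is the parsed object returned by json.loads above.
data = json.dumps(json_string)

# Feed the serialized JSON to `hadoop fs -put -` via stdin, so that
# quoting inside the JSON cannot break the command line.
proc = subprocess.Popen(
    ['hadoop', 'fs', '-put', '-', '/path/to/file/in/hdfs.txt'],
    stdin=subprocess.PIPE,
)
proc.communicate(data)  # on Python 3, pass data.encode('utf-8') instead
if proc.returncode != 0:
    raise RuntimeError('hadoop fs -put exited with %d' % proc.returncode)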
