hadoop - Save JSON to HDFS using Python
I have a Python script that accesses an API and gets back JSON. It takes the JSON string, saves it off as a file on the local file system, and then I move it into HDFS manually. I would like to change the Python script so it saves directly to HDFS instead of hitting the local file system first. I am currently trying to save the file using the hdfs dfs command, but I don't think the copy command is the correct way to do it, because what I have isn't a file but rather a JSON string at the point where I'm trying to save it.
Current code:
import urllib2
import json
import os

f = urllib2.urlopen('restful_api_url.json')
json_string = json.loads(f.read().decode('utf-8'))
with open('/home/user/filename.json', 'w') as outfile:
    json.dump(json_string, outfile)
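(For reference, urllib2 exists only on Python 2; on Python 3 the same fetch would be written with urllib.request. A minimal sketch, assuming the same URL:)

import json
import urllib.request

# Python 3 equivalent of the urllib2 fetch above
with urllib.request.urlopen('restful_api_url.json') as f:
    json_string = json.loads(f.read().decode('utf-8'))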
New code:
f = urllib2.urlopen('restful_api_url.json')
json_string = json.loads(f.read().decode('utf-8'))
os.environ['json_string'] = json.dumps(json_string)
os.system('hdfs dfs -cp -f $json_string hdfs/user/test')
I think your problem is the same as in this thread: Stream data into HDFS directly without copying.
Firstly, this command can redirect stdin to an HDFS file:
hadoop fs -put - /path/to/file/in/hdfs.txt
Then, you can do the same in Python:
os.system('echo "%s" | hadoop fs -put - /path/to/file/in/hdfs.txt' % (json.dumps(json_string)))
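One caveat: passing the JSON through echo and os.system is fragile once the string contains double quotes or other shell metacharacters. A safer variant is to stream the bytes to hadoop fs -put - through subprocess, so no shell quoting is involved. A minimal sketch, assuming the same json_string and a hypothetical target path /user/test/filename.json:

import json
import subprocess

def put_json_to_hdfs(obj, hdfs_path):
    """Serialize obj to JSON and stream it into an HDFS file via `hadoop fs -put -`."""
    proc = subprocess.Popen(
        ['hadoop', 'fs', '-put', '-', hdfs_path],
        stdin=subprocess.PIPE,
    )
    # Write the serialized JSON straight to the command's stdin.
    proc.communicate(json.dumps(obj).encode('utf-8'))
    if proc.returncode != 0:
        raise RuntimeError('hadoop fs -put failed with exit code %d' % proc.returncode)

put_json_to_hdfs(json_string, '/user/test/filename.json')

Note that -put fails if the destination file already exists; on recent Hadoop releases you can pass -f to overwrite it.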