parsing - Split text in a text file on the basis of comma and space (Python)
I need to parse the text of a text file into 2 categories:
- university
- location (example: lahore, peshawar, jamshoro, faisalabad)
But the text file contains the following text:

    "imperial college of business studies, lahore"
    "government college university faisalabad"
    "imperial college of business studies lahore"
    "university of peshawar, peshawar"
    "university of sindh, jamshoro"
    "london school of economics"
    "lahore school of economics, lahore"
I have written code to separate the locations on the basis of the comma. The code below works for the first line of the file and prints 'lahore', but after that it gives the following error: 'list index out of range'.

    file = open(path, 'r')
    content = file.read().split('\n')
    for line in content:
        rep = line.replace('"', '')
        # fails on the second line, which contains no comma
        loc = rep.split(',')[1]
        print("uni: " + rep)
        print("loc: " + str(loc))
Please help, I'm stuck on this. Thanks.
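The 'list index out of range' error comes from indexing `[1]` on the result of `split(',')` when a line has no comma, because the list then has only one element. As a minimal sketch of a split that cannot fail this way, `str.rpartition` can be used instead. The helper name `split_location` is hypothetical and not part of the original code:

```python
def split_location(line):
    """Split 'university, location' on the last comma.

    Returns (uni, loc); loc is None when the line has no comma.
    """
    uni, sep, loc = line.rpartition(',')
    if not sep:
        # No comma found: rpartition puts the whole line in the last slot.
        return line.strip(), None
    return uni.strip(), loc.strip()


print(split_location("university of sindh, jamshoro"))
print(split_location("london school of economics"))
```

This only avoids the exception. Lines without a comma still need some other way to find the location.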
It appears you can only be sure a line has a location if there is a comma, so it would make sense to parse the file in 2 passes. The first pass can build a set holding all known locations. You can start this set off with some known examples or problem cases.
Pass 2 can then also use the comma to match known locations, but if there is no comma, the line is split into a set of words. The intersection of these words with the location set should give you the location. If there is no intersection, the line is flagged as "unknown".
    import re

    # Seed the set with locations that never appear after a comma in the file
    locations = set(["london", "faisalabad"])

    with open(path, 'r') as f_input:
        unknown = 0

        # Pass 1: build the set of locations from lines that contain a comma
        for line in f_input:
            line = line.strip(' ,"\n')
            if ',' in line:
                loc = line.rsplit(",", 1)[1].strip()
                locations.add(loc)

        # Pass 2: try to find a location in each line
        f_input.seek(0)
        for line in f_input:
            line = line.strip(' "\n')
            if ',' in line:
                uni, loc = line.rsplit(",", 1)
                loc = loc.strip()
            else:
                uni = line
                # Words of the line that are also known locations
                loc_matches = set(re.findall(r"\b(\w+)\b", line)).intersection(locations)
                if loc_matches:
                    loc = list(loc_matches)[0]
                else:
                    loc = "<unknown location>"
                    unknown += 1
            uni = uni.strip()
            print("uni:", uni)
            print("loc:", loc)

        print("unknown locations:", unknown)
The output would be:
    uni: imperial college of business studies
    loc: lahore
    uni: government college university faisalabad
    loc: faisalabad
    uni: imperial college of business studies lahore
    loc: lahore
    uni: university of peshawar
    loc: peshawar
    uni: university of sindh
    loc: jamshoro
    uni: london school of economics
    loc: london
    uni: lahore school of economics
    loc: lahore
    unknown locations: 0
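If it helps, the same two-pass idea can be wrapped in a function that works on any list of lines, which makes it easier to try without a file. This is only a sketch: the name `parse_universities` and its `seed_locations` parameter are made up for illustration, and it differs from the two-pass approach described above in one respect. When a line contains several known locations, it picks the one nearest the end of the line rather than an arbitrary element of the set.

```python
import re


def parse_universities(lines, seed_locations=()):
    """Return (results, unknown) where results is a list of (uni, loc) pairs."""
    # Drop surrounding quotes/whitespace and skip lines that end up empty
    cleaned = [s for s in (line.strip(' "\n') for line in lines) if s]

    # Pass 1: collect every location that follows a comma
    locations = set(seed_locations)
    for line in cleaned:
        if ',' in line:
            loc = line.rsplit(',', 1)[1].strip()
            if loc:
                locations.add(loc)

    # Pass 2: use the comma if present, otherwise look for a known location
    results = []
    unknown = 0
    for line in cleaned:
        if ',' in line:
            uni, loc = line.rsplit(',', 1)
            loc = loc.strip()
        else:
            uni = line
            words = re.findall(r"\w+", line)
            # Assumption: the word closest to the end is the most likely location
            loc = next((w for w in reversed(words) if w in locations), None)
            if loc is None:
                loc = "<unknown location>"
                unknown += 1
        results.append((uni.strip(), loc))
    return results, unknown


sample = [
    '"imperial college of business studies, lahore"',
    '"government college university faisalabad"',
    '"imperial college of business studies lahore"',
    '"university of peshawar, peshawar"',
    '"university of sindh, jamshoro"',
    '"london school of economics"',
    '"lahore school of economics, lahore"',
]

results, unknown = parse_universities(sample, ["london", "faisalabad"])
for uni, loc in results:
    print("uni:", uni)
    print("loc:", loc)
print("unknown locations:", unknown)
```

Returning the pairs instead of printing them inside the loop also makes it simple to write the results to another file later.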