python - Not receiving correct pattern from regex on PyPDF2 for a PDF


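I want to extract all instances of a particular word from a PDF, e.g. 'math'. So far I am converting the PDF to text using PyPDF2 and then running a regex on that text to find what I want. I am testing with an example PDF.

When I run my code, instead of returning the matches for the regular expression pattern 'math', it returns the string of the whole page. Please help, thanks.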

# First, change the current working directory to the desktop
import os
os.chdir('/users/hussein/desktop')              # the file is located on the desktop

# Second, PyPDF2
import PyPDF2
pdffileobj = open('test1.pdf', 'rb')            # open the file
pdfreader = PyPDF2.PdfFileReader(pdffileobj)
pageobj = pdfreader.getPage(3)                  # for the test I only need page 3
textversion = pageobj.extractText()
print(textversion)

# Third, the regular expression
import re
match = re.findall(r'math', textversion)
for match in textversion:
    print(match)

Instead of getting the instances of 'math', I receive this:

i n t r o d u c t o n 

etc etc

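The textversion variable holds the whole page text as a string. When you use it in a for loop, it gives you the text one character at a time, as you have seen. The findall function returns a list of matches, so if you loop over that instead, you get each matched word (which in your test will all be the same).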

import re

for match in re.findall(r'math', textversion):
    print(match)

The result returned from findall will look like:

["math", "math", "math"] 

So your output will be:

math
math
math
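To see the difference without needing the PDF, here is a minimal sketch that uses a hard-coded string in place of the extracted page text. The sample_text value and the chars and matches names are made up for illustration; only re.findall comes from the code above.

```python
import re

# Hypothetical sample text standing in for what extractText() returned.
sample_text = "Intro to math: math is fun, and math is useful."

# Looping over a string yields one character per iteration,
# which is why the original loop printed the page letter by letter.
chars = [ch for ch in sample_text]

# re.findall returns a list of every non-overlapping match of the pattern.
matches = re.findall(r'math', sample_text)

# Looping over the list of matches prints each match on its own line.
for match in matches:
    print(match)
```

With this sample string the loop prints math three times, once per line. Note that the pattern is case-sensitive, so 'Math' would not be matched unless you pass flags=re.IGNORECASE.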
