Python - Highlight differences between two XML files in a Tkinter textbox
I have tried many kinds of logic and methods and googled a lot, yet I am not able to come up with a satisfactory answer to my question. I have written the program shown below to highlight specific XML code, but I am facing a problem. Sorry for making the post a bit long; I wanted to explain the problem fully.
Edit: to run the program below you need 2 XML files: sample1 and sample2. Save the files, and in the code below edit the location where you saved them, e.g. c:/users/editthislocation/desktop/sample1.xml
from tkinter import *
import tkinter as tk
from tkinter import ttk

from lxml import etree

root = Tk()


class CustomText(tk.Text):
    """Text widget with pattern-based highlighting (Bryan Oakley's class)."""

    def __init__(self, *args, **kwargs):
        tk.Text.__init__(self, *args, **kwargs)

    def highlight_pattern(self, pattern, tag, start, end, regexp=True):
        """Apply `tag` to every match of `pattern` between `start` and `end`."""
        start = self.index(start)
        end = self.index(end)
        self.mark_set("matchstart", start)
        self.mark_set("matchend", start)
        self.mark_set("searchlimit", end)

        count = tk.IntVar()
        while True:
            index = self.search(pattern, "matchend", "searchlimit",
                                count=count, regexp=regexp)
            if index == "":
                break
            if count.get() == 0:
                break  # zero-length match: stop instead of looping forever
            self.mark_set("matchstart", index)
            self.mark_set("matchend", "%s+%sc" % (index, count.get()))
            self.tag_add(tag, "matchstart", "matchend")

    def remove_pattern(self, pattern, tag, start="1.0", end="end", regexp=True):
        """Remove `tag` from every match of `pattern` between `start` and `end`."""
        start = self.index(start)
        end = self.index(end)
        self.mark_set("matchstart", start)
        self.mark_set("matchend", start)
        self.mark_set("searchlimit", end)

        count = tk.IntVar()
        while True:
            index = self.search(pattern, "matchend", "searchlimit",
                                count=count, regexp=regexp)
            if index == "" or count.get() == 0:
                break
            self.mark_set("matchstart", index)
            self.mark_set("matchend", "%s+%sc" % (index, count.get()))
            # Remove the tag from this match only, not from the whole range
            self.tag_remove(tag, "matchstart", "matchend")


recovering_parser = etree.XMLParser(recover=True)

with open('c:/users/editthislocation/desktop/sample1.xml', 'r') as sample1file:
    contents_sample1 = sample1file.read()
with open('c:/users/editthislocation/desktop/sample2.xml', 'r') as sample2file:
    contents_sample2 = sample2file.read()

frame1 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame1.pack()
Label(frame1, text="Sample 1 below - scroll to see more").pack()

textbox = CustomText(root)
textbox.insert(END, contents_sample1)
textbox.pack(expand=1, fill=BOTH)

frame2 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame2.pack()
Label(frame2, text="Sample 2 below - scroll to see more").pack()

textbox1 = CustomText(root)
textbox1.insert(END, contents_sample2)
textbox1.pack(expand=1, fill=BOTH)

sample1 = etree.parse("c:/users/editthislocation/desktop/sample1.xml", parser=recovering_parser).getroot()
sample2 = etree.parse("c:/users/editthislocation/desktop/sample2.xml", parser=recovering_parser).getroot()

tostringsample1 = etree.tostring(sample1)
sample1string = etree.fromstring(tostringsample1, parser=recovering_parser)
tostringsample2 = etree.tostring(sample2)
sample2string = etree.fromstring(tostringsample2, parser=recovering_parser)

timesample1 = sample1string.findall('{http://www.example.org/ehorizon}time')
timesample2 = sample2string.findall('{http://www.example.org/ehorizon}time')

# Remember the last feature whose colour/type differs between the two files
faultyline = None
for i, j in zip(timesample1, timesample2):
    for k, l in zip(i.findall("{http://www.example.org/ehorizon}feature"),
                    j.findall("{http://www.example.org/ehorizon}feature")):
        if [k.attrib.get('color'), k.attrib.get('type')] != [l.attrib.get('color'), l.attrib.get('type')]:
            faultyline = [k.attrib.get('color'), k.attrib.get('type'), k.text]


def high(event):
    if faultyline is None:
        return  # no difference was found, nothing to highlight
    textbox.tag_configure("yellow", background="yellow")
    limit_1 = '<p1:time ntimestamp="{0}">'.format(5)        # limit the search to between timestamp 5 ...
    limit_2 = '<p1:time ntimestamp="{0}">'.format(5 + 1)    # ... and timestamp 6
    # String to be highlighted
    highlightstring = '<p1:feature color="{0}" type="{1}">{2}</p1:feature>'.format(
        faultyline[0], faultyline[1], faultyline[2])
    textbox.highlight_pattern(limit_1, "yellow",
                              start=textbox.search(limit_1, '1.0', stopindex=END),
                              end=textbox.search(limit_2, '1.0', stopindex=END))
    textbox.highlight_pattern(highlightstring, "yellow",
                              start=textbox.search(limit_1, '1.0', stopindex=END),
                              end=textbox.search(limit_2, '1.0', stopindex=END))


button = 'Press here to highlight the error line'
c = ttk.Label(root, text=button)
c.bind("<Button-1>", high)
c.pack()

root.mainloop()
What I want
If you run the above code, the present output is as given below:
As you can see in the image, I intend to highlight only the code marked with a green tick. Some of you might think of limiting the starting and ending index of the highlight pattern. However, as you can see in my program, I am already using starting and ending indexes to limit the output to ntimestamp="5", via the limit_1 and limit_2 variables.
So, in this type of data, how do I correctly highlight one pattern out of many inside an individual ntimestamp?
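A quick way to see why limit_1 and limit_2 alone cannot isolate one item: the same `<p1:feature ...>` line can occur several times inside one ntimestamp block, and highlight_pattern tags every match in the range. The minimal sketch below mimics that bounded search in plain Python (no Tk, so it can be tested) and returns Tk-style "line.column" ranges. The helper names `tk_index` and `ranges_between` are hypothetical, and the sketch assumes a literal (non-regex) pattern.

```python
# Hypothetical helpers (not from the original post): reproduce a bounded,
# literal search over the textbox contents and return Tk "line.column" indexes.


def tk_index(text, offset):
    """Convert a 0-based character offset in `text` to a Tk "line.column" index."""
    line = text.count("\n", 0, offset) + 1               # Tk lines start at 1
    column = offset - (text.rfind("\n", 0, offset) + 1)  # Tk columns start at 0
    return "{}.{}".format(line, column)


def ranges_between(text, pattern, start_marker, end_marker):
    """Return a (start, end) Tk index pair for every literal match of `pattern`
    after `start_marker` and before the next `end_marker`.

    Raises ValueError if `start_marker` is absent; if `end_marker` is absent,
    the search runs to the end of the text (e.g. the last timestamp block).
    """
    lo = text.index(start_marker) + len(start_marker)
    hi = text.find(end_marker, lo)
    if hi == -1:
        hi = len(text)
    ranges = []
    pos = text.find(pattern, lo, hi)
    while pos != -1:
        end = pos + len(pattern)
        ranges.append((tk_index(text, pos), tk_index(text, end)))
        pos = text.find(pattern, end, hi)
    return ranges
```

In the GUI, taking the text from `textbox.get("1.0", "end-1c")` and calling `textbox.tag_add("yellow", *ranges[2])` would tag only the third match. That still requires knowing which occurrence is the wrong one, which is exactly the hard-coding problem described in the note further down.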
Edit: here I specifically want to highlight the 3rd item in ntimestamp="5", because this item is not present in sample2.xml. You can see this in the 2 XML files, and when the program runs it detects the difference. My only problem is highlighting the correct item, which is the 3rd one in this case.
I am using Bryan Oakley's highlighting class (the CustomText class above).
Most recent edit
In response to what kobejohn asked in the comments below: the target file won't ever be empty, but there is a chance that it has extra or missing elements. My current intention is to highlight only the deep elements that are different or missing, and the timestamps in which they are located. The highlighting of the timestamps already works correctly; highlighting the deep elements, as explained above, is still the issue. Thank you, kobejohn, for clarifying this.
Note:
One method I know of, which some might suggest and which does work, is to extract the index of the green-ticked pattern and run the highlight tag on it. However, this approach is hard-coded, and with the large data I have to deal with, which has lots of variations, it is ineffective. I am searching for a better option.
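One way to avoid text searching altogether, offered here as an untested assumption rather than a verified fix for the original files, is to let the XML parser report the line on which each element starts and compare elements structurally. The sketch uses the standard library's `xml.parsers.expat` (reading `CurrentLineNumber` inside the start-element handler); `element_records` and `lines_to_highlight` are hypothetical names. It assumes well-formed input with one start tag per line (as in the pretty-printed samples) and, like the question's `zip` approach, matches elements by the positions of their ancestors.

```python
import xml.parsers.expat

# Hypothetical helpers (not from the original post). Assumes well-formed XML
# with one start tag per line, as in the pretty-printed sample files.


def element_records(xml_text):
    """Return (key, line, ancestor_lines) for every element in xml_text.

    key = (ancestor path, tag, sorted attributes, stripped own text), where the
    ancestor path holds (tag, sibling index) for each enclosing element, so
    identical <p1:feature> lines in different locations get different keys.
    """
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    stack = []    # currently open elements, innermost last
    records = []

    def start(tag, attrs):
        if stack:
            parent = stack[-1]
            index = parent["children"]
            parent["children"] += 1
            path = parent["path"] + ((parent["tag"], parent["index"]),)
        else:
            index, path = 0, ()
        stack.append({"tag": tag, "attrs": tuple(sorted(attrs.items())),
                      "line": parser.CurrentLineNumber,  # line of this start tag
                      "text": [], "children": 0, "index": index, "path": path})

    def chars(data):
        if stack:
            stack[-1]["text"].append(data)  # text belongs to the innermost element

    def end(tag):
        e = stack.pop()
        key = (e["path"], e["tag"], e["attrs"], "".join(e["text"]).strip())
        # Lines of the enclosing elements, skipping the root (e.g. time, location)
        ancestors = [a["line"] for a in stack[1:]]
        records.append((key, e["line"], ancestors))

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return records


def lines_to_highlight(xml_a, xml_b):
    """Return (changed, context) line numbers for xml_a.

    changed: elements of xml_a with no identical counterpart in xml_b.
    context: lines of their enclosing elements (e.g. the <p1:time> tag).
    """
    keys_b = {key for key, _, _ in element_records(xml_b)}
    changed, context = set(), set()
    for key, line, ancestors in element_records(xml_a):
        if key not in keys_b:
            changed.add(line)
            context.update(ancestors)
    return sorted(changed), sorted(context - changed)
```

Because the line numbers come from the same string that was inserted into the widget, whole lines can then be tagged with `textbox.tag_add("yellow", "%d.0" % n, "%d.0" % (n + 1))`. Calling `lines_to_highlight` again with the arguments swapped marks elements that exist only in the second file. Note that an inserted or removed `location` shifts the sibling indexes of everything after it, so this sketch flags all later elements in that block.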
This solution works by performing a simplified diff between base.xml and test.xml, based on the description you provided. The diff result is a 3rd XML tree that combines the original trees. The output is the diff, with color-coded highlighting for the lines that don't match between the files.
I hope you can use it or adapt it to your needs.
copy-paste script
import copy

from lxml import etree
import tkinter as tk

# Assumption: the root element of both trees is the same.
# Note: missing subtrees will have their parent element highlighted.


def element_content_equal(e1, e2):
    # Starting point here: http://stackoverflow.com/a/24349916/377366
    try:
        if e1.tag != e2.tag:
            return False
        elif e1.text != e2.text:
            return False
        elif e1.tail != e2.tail:
            return False
        elif e1.attrib != e2.attrib:
            return False
    except AttributeError:
        # e.g. None was passed in as an element
        return False
    return True


def element_is_in_sequence(element, sequence):
    for e in sequence:
        if element_content_equal(e, element):
            return True
    return False


def copy_element_without_children(element):
    e_copy = etree.Element(element.tag, attrib=dict(element.attrib), nsmap=element.nsmap)
    e_copy.text = element.text
    e_copy.tail = element.tail
    return e_copy


# Start at the root of both XML trees
parser = etree.XMLParser(recover=True, remove_blank_text=True)
base_root = etree.parse('base.xml', parser=parser).getroot()
test_root = etree.parse('test.xml', parser=parser).getroot()

# Each element from the original XML trees will be placed into a merge tree
merge_root = copy_element_without_children(base_root)

# Additionally, each merge tree element will be tagged with its source
DIFF_ATTRIB = 'diff'
FROM_BASE_ONLY = 'base'
FROM_TEST_ONLY = 'test'

# Process each pair of trees, one set of parents at a time
parent_stack = [(base_root, test_root, merge_root)]
while parent_stack:
    base_parent, test_parent, merge_parent = parent_stack.pop()
    base_children = list(base_parent)
    test_children = list(test_parent)

    # Compare the children and transfer them to the merge tree
    base_children_iter = iter(base_children)
    test_children_iter = iter(test_children)
    base_child = next(base_children_iter, None)
    test_child = next(test_children_iter, None)
    while (base_child is not None) or (test_child is not None):
        # First handle the case of a unique base child
        if (base_child is not None) and (not element_is_in_sequence(base_child, test_children)):
            # base_child is unique: deep copy it with a base tag
            merge_child = copy.deepcopy(base_child)
            merge_child.attrib[DIFF_ATTRIB] = FROM_BASE_ONLY
            merge_parent.append(merge_child)
            # The unique child is fully copied into the merge tree, so it doesn't go on the stack.
            # Move on to the next base child, since the test child hasn't been handled yet.
            base_child = next(base_children_iter, None)
        elif (test_child is not None) and (not element_is_in_sequence(test_child, base_children)):
            # test_child is unique: deep copy it with a test tag
            merge_child = copy.deepcopy(test_child)
            merge_child.attrib[DIFF_ATTRIB] = FROM_TEST_ONLY
            merge_parent.append(merge_child)
            # The unique child is fully copied into the merge tree, so it doesn't go on the stack.
            # Move on to the next test child, since the base child hasn't been handled yet.
            test_child = next(test_children_iter, None)
        elif element_content_equal(base_child, test_child):
            # Both trees share this element: shallow copy either child (no diff tag)
            merge_child = copy_element_without_children(base_child)
            merge_parent.append(merge_child)
            # Put the pair of children on the stack as parents to be tested, since their children may differ
            parent_stack.append((base_child, test_child, merge_child))
            # Move on to the next children in both trees, since this was a shared element
            base_child = next(base_children_iter, None)
            test_child = next(test_children_iter, None)
        else:
            # Something is wrong: every element should be either unique or shared
            raise RuntimeError('element is neither unique nor shared')

# Display the merge tree with highlighting to indicate the source of each line
#   no highlight: common element in both trees
#   green: line exists only in the test tree (i.e. additional)
#   red: line exists only in the base tree (i.e. missing)
root = tk.Tk()
textbox = tk.Text(root)
textbox.pack(expand=1, fill=tk.BOTH)
textbox.tag_config(FROM_BASE_ONLY, background='#ff5555')
textbox.tag_config(FROM_TEST_ONLY, background='#55ff55')

# Find the diff lines to highlight within a merge tree string that still includes the kludge attributes
merge_tree_string = etree.tostring(merge_root, pretty_print=True, encoding='unicode')
diffs_by_line = []
for line, line_text in enumerate(merge_tree_string.split('\n')):
    for diff_type in (FROM_BASE_ONLY, FROM_TEST_ONLY):
        # Match the whole attribute, not just the word, to avoid false hits in content
        if '{}="{}"'.format(DIFF_ATTRIB, diff_type) in line_text:
            diffs_by_line.append((line + 1, diff_type))

# Remove the kludge attributes
for element in merge_root.iter():
    try:
        del element.attrib[DIFF_ATTRIB]
    except KeyError:
        pass
merge_tree_string = etree.tostring(merge_root, pretty_print=True, encoding='unicode')

# Highlight the final lines
textbox.insert(tk.END, merge_tree_string)
for line, diff_type in diffs_by_line:
    textbox.tag_add(diff_type, '{}.0'.format(line), '{}.0'.format(line + 1))

root.mainloop()
Inputs:
Please note that I cleaned up the XML because I was getting inconsistent behavior with the original XML. The original was using backslashes instead of forward slashes, and had false closing slashes on the opening tags.
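Because a recovering parser (`recover=True`) silently repairs broken markup, it can help to check each input for well-formedness first and report where it breaks. A minimal sketch using the standard library's `xml.etree.ElementTree`, whose `ParseError.position` gives a `(line, column)` pair; `check_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper (not from the original answer): report where an XML
# document stops being well-formed instead of letting a parser repair it.


def check_well_formed(xml_text):
    """Return None if xml_text parses, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None
```

A `None` result means the file needs no recovery. Otherwise the returned line is where the parser gave up, which for a stray `/>` on an opening tag is typically the later closing tag that no longer matches, not the opening tag itself.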
base.xml (in the same location as the script)
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/ehorizon">
    <p1:time ntimestamp="5">
        <p1:location hours = "1" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="2" type="a">564</p1:feature>
            <p1:feature color="3" type="b">570</p1:feature>
            <p1:feature color="4" type="c">570</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="7" type="b">570</p1:feature>
            <p1:feature color="8" type="c">580</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
        </p1:location>
    </p1:time>
    <p1:time ntimestamp="6">
        <p1:location hours = "1" path = '1'>
            <p1:feature color="2" type="a">564</p1:feature>
            <p1:feature color="3" type="b">570</p1:feature>
            <p1:feature color="4" type="c">570</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="9" type="b">590</p1:feature>
            <p1:feature color="10" type="c">600</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="7" type="b">570</p1:feature>
            <p1:feature color="8" type="c">580</p1:feature>
        </p1:location>
    </p1:time>
</p1:sample1>
test.xml (in the same location as the script)
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/ehorizon">
    <p1:time ntimestamp="5">
        <p1:location hours = "1" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="2" type="a">564</p1:feature>
            <p1:feature color="3" type="b">570</p1:feature>
            <p1:feature color="4" type="c">570</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="7" type="b">570</p1:feature>
            <p1:feature color="8" type="c">580</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="9" type="b">1111</p1:feature>
            <p1:feature color="10" type="c">2222</p1:feature>
        </p1:location>
    </p1:time>
    <p1:time ntimestamp="6">
        <p1:location hours = "1" path = '1'>
            <p1:feature color="2" type="a">564</p1:feature>
            <p1:feature color="3" type="b">570</p1:feature>
            <p1:feature color="4" type="c">570</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="9" type="b">590</p1:feature>
            <p1:feature color="10" type="c">600</p1:feature>
        </p1:location>
        <p1:location hours = "5" path = '1'>
            <p1:feature color="6" type="a">560</p1:feature>
            <p1:feature color="7" type="b">570</p1:feature>
            <p1:feature color="8" type="c">580</p1:feature>
        </p1:location>
    </p1:time>
</p1:sample1>
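For comparison, a much cruder alternative (a different technique from the tree diff in this answer, not part of it) is a plain line-by-line diff with the standard library's `difflib.SequenceMatcher`. It ignores XML structure entirely, so it is only meaningful when both files are pretty-printed the same way; `line_diff_tags` is a hypothetical name.

```python
import difflib

# Hypothetical helper: a plain line-level diff (difflib), NOT the tree diff used
# in this answer. It ignores XML structure, so both files should be
# pretty-printed the same way for the result to be meaningful.


def line_diff_tags(base_text, test_text):
    """Return (base_only, test_only) lists of 1-based line numbers.

    base_only: lines present only in base_text (e.g. to colour red).
    test_only: lines present only in test_text (e.g. to colour green).
    Leading/trailing whitespace is ignored when comparing lines.
    """
    base_lines = [line.strip() for line in base_text.splitlines()]
    test_lines = [line.strip() for line in test_text.splitlines()]
    matcher = difflib.SequenceMatcher(None, base_lines, test_lines, autojunk=False)
    base_only, test_only = [], []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            base_only.extend(range(i1 + 1, i2 + 1))
        if op in ("replace", "insert"):
            test_only.extend(range(j1 + 1, j2 + 1))
    return base_only, test_only
```

With one textbox per file, configuring a tag with `tag_config` and calling `tag_add(tag, "%d.0" % n, "%d.0" % (n + 1))` for each returned `n` would colour whole lines. Unlike the tree diff, a reordered attribute or a reindented element would show up as a change here.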