python - Highlight differences between two XML files in a Tkinter textbox


I have tried all kinds of logic and methods and googled a lot, yet I am not able to come up with a satisfactory answer to the question I have. I have written the program shown below to highlight specific XML code, but I am facing a problem. Sorry for making the post a bit long; I wanted to explain the problem fully.

Edit: to run the program below you need two XML files, sample1 and sample2. Save the files, and in the code below edit the location where you saved them, e.g. c:/users/editthislocation/desktop/sample1.xml.

```python
from lxml import etree
from tkinter import *
import tkinter as tk
from tkinter import ttk

root = tk.Tk()


class CustomText(tk.Text):
    """Text widget with pattern highlighting (based on Bryan Oakley's recipe)."""

    def __init__(self, *args, **kwargs):
        tk.Text.__init__(self, *args, **kwargs)

    def highlight_pattern(self, pattern, tag, start, end, regexp=True):
        """Apply `tag` to every match of `pattern` between `start` and `end`."""
        start = self.index(start)
        end = self.index(end)
        self.mark_set("matchstart", start)
        self.mark_set("matchend", start)
        self.mark_set("searchlimit", end)

        count = tk.IntVar()
        while True:
            index = self.search(pattern, "matchend", "searchlimit",
                                count=count, regexp=regexp)
            if index == "":
                break
            if count.get() == 0:
                break  # zero-length match: stop instead of looping forever
            self.mark_set("matchstart", index)
            self.mark_set("matchend", "%s+%sc" % (index, count.get()))
            self.tag_add(tag, "matchstart", "matchend")

    def remove_pattern(self, pattern, tag, start="1.0", end="end", regexp=True):
        """Remove `tag` from every match of `pattern` between `start` and `end`."""
        start = self.index(start)
        end = self.index(end)
        self.mark_set("matchstart", start)
        self.mark_set("matchend", start)
        self.mark_set("searchlimit", end)

        count = tk.IntVar()
        while True:
            index = self.search(pattern, "matchend", "searchlimit",
                                count=count, regexp=regexp)
            if index == "":
                break
            if count.get() == 0:
                break
            self.mark_set("matchstart", index)
            self.mark_set("matchend", "%s+%sc" % (index, count.get()))
            # remove the tag from this match only, not from the whole range
            self.tag_remove(tag, "matchstart", "matchend")


recovering_parser = etree.XMLParser(recover=True)

sample1_path = 'c:/users/editthislocation/desktop/sample1.xml'
sample2_path = 'c:/users/editthislocation/desktop/sample2.xml'

with open(sample1_path, 'r') as sample1file:
    contents_sample1 = sample1file.read()

with open(sample2_path, 'r') as sample2file:
    contents_sample2 = sample2file.read()

frame1 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame1.pack()
Label(frame1, text="Sample 1 below - scroll to see more").pack()

textbox = CustomText(root)
textbox.insert(END, contents_sample1)
textbox.pack(expand=1, fill=BOTH)

frame2 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame2.pack()
Label(frame2, text="Sample 2 below - scroll to see more").pack()

textbox1 = CustomText(root)
textbox1.insert(END, contents_sample2)
textbox1.pack(expand=1, fill=BOTH)

sample1 = etree.parse(sample1_path, parser=recovering_parser).getroot()
sample2 = etree.parse(sample2_path, parser=recovering_parser).getroot()

tostringsample1 = etree.tostring(sample1)
sample1string = etree.fromstring(tostringsample1, parser=recovering_parser)

tostringsample2 = etree.tostring(sample2)
sample2string = etree.fromstring(tostringsample2, parser=recovering_parser)

NS = '{http://www.example.org/ehorizon}'
timesample1 = sample1string.findall(NS + 'time')
timesample2 = sample2string.findall(NS + 'time')

faultyline = None
for i, j in zip(timesample1, timesample2):
    # './/' searches all descendants, so features nested in <location> are found too
    for k, l in zip(i.findall('.//' + NS + 'feature'), j.findall('.//' + NS + 'feature')):
        if [k.attrib.get('color'), k.attrib.get('type')] != [l.attrib.get('color'), l.attrib.get('type')]:
            faultyline = [k.attrib.get('color'), k.attrib.get('type'), k.text]


def high(event):
    if faultyline is None:
        return  # no difference found, nothing to highlight
    textbox.tag_configure("yellow", background="yellow")
    # limit the search to the block between timestamp 5 and timestamp 6
    limit_1 = '<p1:time ntimestamp="{0}">'.format(5)
    limit_2 = '<p1:time ntimestamp="{0}">'.format(5 + 1)
    # the string to be highlighted
    highlightstring = '<p1:feature color="{0}" type="{1}">{2}</p1:feature>'.format(
        faultyline[0], faultyline[1], faultyline[2])
    start = textbox.search(limit_1, '1.0', stopindex=END)
    end = textbox.search(limit_2, '1.0', stopindex=END)
    textbox.highlight_pattern(limit_1, "yellow", start=start, end=end)
    textbox.highlight_pattern(highlightstring, "yellow", start=start, end=end)


button = 'Press here to highlight the error line'
c = ttk.Label(root, text=button)
c.bind("<Button-1>", high)
c.pack()

root.mainloop()
```
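The comparison loop in the question overwrites `faultyline` on every mismatch, so only the last difference survives, and pairing with `zip` silently drops items that exist in only one file. A minimal sketch of collecting every positional mismatch instead, using the standard library's `xml.etree.ElementTree` in place of lxml and assuming the time > location > feature nesting of the sample files; `find_feature_differences` and `signature` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

NS = "{http://www.example.org/ehorizon}"


def signature(feature):
    """(color, type, text) of a feature element, or None when it is missing."""
    if feature is None:
        return None
    return (feature.get("color"), feature.get("type"), (feature.text or "").strip())


def find_feature_differences(xml_a, xml_b):
    """Compare two documents position by position and return every mismatch.

    Each result is (ntimestamp, location_no, feature_no, sig_a, sig_b) with
    1-based positions; sig_a or sig_b is None when that side has no feature
    there. Timestamps are paired with zip, so unmatched trailing ones are ignored.
    """
    root_a, root_b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    diffs = []
    for time_a, time_b in zip(root_a.findall(NS + "time"), root_b.findall(NS + "time")):
        stamp = time_a.get("ntimestamp")
        locations = zip_longest(time_a.findall(NS + "location"),
                                time_b.findall(NS + "location"))
        for loc_no, (loc_a, loc_b) in enumerate(locations, start=1):
            feats_a = [] if loc_a is None else loc_a.findall(NS + "feature")
            feats_b = [] if loc_b is None else loc_b.findall(NS + "feature")
            for feat_no, (fa, fb) in enumerate(zip_longest(feats_a, feats_b), start=1):
                sig_a, sig_b = signature(fa), signature(fb)
                if sig_a != sig_b:
                    diffs.append((stamp, loc_no, feat_no, sig_a, sig_b))
    return diffs
```

Run on base.xml and test.xml from the answer, this should report only the third location of `ntimestamp="5"`: `('6', 'a', '560')` paired with `('9', 'b', '1111')`, plus a second feature that exists only in test.xml.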

What I want

If you run the above code, the present output is as given below:

[Screenshot: my output]

As you can see in the image, I intend to highlight the code marked with the green tick. Some of you might think of limiting the starting and ending index of the highlight pattern. However, as you can see in my program, I am already using starting and ending indexes to limit the output to ntimestamp="5", via the limit_1 and limit_2 variables.

So in this type of data, how do I correctly highlight one pattern out of many inside an individual ntimestamp?
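`highlight_pattern` tags every match between its start and end marks, so when the same feature line occurs several times inside one timestamp (in base.xml, `<p1:feature color="6" type="a">560</p1:feature>` appears three times inside `ntimestamp="5"`), narrowing the window is not enough; you also have to say which occurrence you mean. A sketch of doing that on the plain string before handing positions to Tk: it converts the n-th literal match between `start_marker` and `end_marker` into Tk's `"line.column"` index form (lines 1-based, columns 0-based). Both helper names are hypothetical.

```python
def offset_to_tk_index(text, offset):
    """Convert a 0-based character offset into a Tk text index "line.column"."""
    line = text.count("\n", 0, offset) + 1                # Tk lines start at 1
    column = offset - (text.rfind("\n", 0, offset) + 1)   # Tk columns start at 0
    return "%d.%d" % (line, column)


def nth_match_range(text, pattern, n, start_marker, end_marker=None):
    """Return (start, end) Tk indices of the n-th (1-based) literal occurrence of
    `pattern` after `start_marker` and before the next `end_marker`, or None."""
    window_start = text.find(start_marker)
    if window_start == -1:
        return None
    window_end = len(text)
    if end_marker is not None:
        found = text.find(end_marker, window_start + len(start_marker))
        if found != -1:
            window_end = found
    pos = window_start
    for _ in range(n):
        pos = text.find(pattern, pos, window_end)  # match must end before window_end
        if pos == -1:
            return None
        pos += len(pattern)  # continue searching after this match
    return offset_to_tk_index(text, pos - len(pattern)), offset_to_tk_index(text, pos)
```

If `text` is exactly the string inserted into the widget, the returned pair can be passed straight to `textbox.tag_add("yellow", *rng)`.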

Edit: here I want to highlight the 3rd item in ntimestamp="5", because that item is not present in sample2.xml, as you can see in the two XML files, and the program does detect this difference when it runs. The problem is highlighting the correct item, which is the 3rd one in this case.
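Another way to target exactly one item is to skip text search altogether: record the source line of every element while parsing. Since the Text widget shows the file unchanged, line N of the file is line N of the widget. A minimal sketch using the standard library's `xml.parsers.expat`, whose `CurrentLineNumber` is 1-based like Tk's line numbers (lxml users could read `element.sourceline` instead). The function name is hypothetical, and features are keyed by position (timestamp, location, feature), assuming the nesting of the sample files.

```python
import xml.parsers.expat


def feature_lines(xml_text):
    """Map (ntimestamp, location_no, feature_no) -> source line of each feature.

    Positions are 1-based and counted within the enclosing element. Tag names
    are matched on their local part, so the "p1:" prefix does not matter.
    """
    parser = xml.parsers.expat.ParserCreate()
    result = {}
    state = {"stamp": None, "loc": 0, "feat": 0}

    def start(name, attrs):
        local = name.split(":")[-1]
        if local == "time":
            state.update(stamp=attrs.get("ntimestamp"), loc=0)
        elif local == "location":
            state.update(loc=state["loc"] + 1, feat=0)
        elif local == "feature":
            state["feat"] += 1
            key = (state["stamp"], state["loc"], state["feat"])
            # inside a handler, expat reports the line where this start tag begins
            result[key] = parser.CurrentLineNumber

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return result
```

Given the position of a differing feature, its whole line can then be tagged with `textbox.tag_add("yellow", "%d.0" % line, "%d.0" % (line + 1))`.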

I am using Bryan Oakley's code for the highlighting class.

Recent edit

In the context of what kobejohn asked below in the comments: the target file will never be empty, but there is a chance that it has extra or missing elements. Finally, my current intention is to highlight only the deep elements that are different or missing, plus the timestamps in which they are located. The highlighting of timestamps is already done correctly; the issue of highlighting the deep elements, explained above, is still open. Thank you kobejohn for clarifying this.
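Since the goal is to highlight the differing deep elements and also the timestamps that contain them, one option is to record the start line of every `<p1:time>` tag too and, for each differing line, pick the closest timestamp line at or before it. A sketch with `bisect`, assuming both lists of line numbers come from the same file, the timestamp lines are sorted in document order, and timestamps are never nested; the function name is hypothetical.

```python
from bisect import bisect_right


def enclosing_time_lines(diff_lines, time_start_lines):
    """For each differing line, return the start line of the timestamp block that
    contains it (the last <time> start line at or before it), without duplicates."""
    result = []
    for line in diff_lines:
        i = bisect_right(time_start_lines, line)
        if i == 0:
            continue  # the line comes before any timestamp
        owner = time_start_lines[i - 1]
        if owner not in result:
            result.append(owner)
    return result
```

Each returned line can be tagged the same way as the deep elements, so one highlighting routine serves both.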

Note:

One method I know, and which you might suggest, does work: extract the index of the green-ticked pattern and run the highlight tag on it. However, that approach is hard-coded, and with large data containing lots of variations it is ineffective. I am searching for a better option.

This solution works by performing a simplified diff between base.xml and test.xml, based on the description you provided. The diff result is a 3rd XML tree that combines the original trees. The output is that diff, with color-coded highlighting for the lines that don't match between the files.

I hope you can use it or adapt it to your needs.


Copy-paste script:

```python
import copy
import tkinter as tk

from lxml import etree

# assumption: the root element of both trees is the same
# note: missing subtrees will have their parent element highlighted


def element_content_equal(e1, e2):
    """Shallow comparison of tag, text, tail and attributes (children ignored)."""
    # starting point here: http://stackoverflow.com/a/24349916/377366
    try:
        if e1.tag != e2.tag:
            return False
        elif e1.text != e2.text:
            return False
        elif e1.tail != e2.tail:
            return False
        elif dict(e1.attrib) != dict(e2.attrib):
            return False
    except AttributeError:
        # e.g. None passed in as an element
        return False
    return True


def element_is_in_sequence(element, sequence):
    for e in sequence:
        if element_content_equal(e, element):
            return True
    return False


def copy_element_without_children(element):
    e_copy = etree.Element(element.tag, attrib=dict(element.attrib), nsmap=element.nsmap)
    e_copy.text = element.text
    e_copy.tail = element.tail
    return e_copy


# start at the root of both xml trees
parser = etree.XMLParser(recover=True, remove_blank_text=True)
base_root = etree.parse('base.xml', parser=parser).getroot()
test_root = etree.parse('test.xml', parser=parser).getroot()
# each element from the original xml trees will be placed into a merge tree
merge_root = copy_element_without_children(base_root)

# additionally, each merge tree element will be tagged with its source
DIFF_ATTRIB = 'diff'
FROM_BASE_ONLY = 'base'
FROM_TEST_ONLY = 'test'

# process each pair of trees, one set of parents at a time
parent_stack = [(base_root, test_root, merge_root)]
while parent_stack:
    base_parent, test_parent, merge_parent = parent_stack.pop()
    base_children = list(base_parent)
    test_children = list(test_parent)

    # compare the children and transfer them to the merge tree
    base_children_iter = iter(base_children)
    test_children_iter = iter(test_children)
    base_child = next(base_children_iter, None)
    test_child = next(test_children_iter, None)
    while (base_child is not None) or (test_child is not None):
        # first handle the case of a unique base child
        if (base_child is not None) and (not element_is_in_sequence(base_child, test_children)):
            # base_child is unique: deep copy it with the base tag
            merge_child = copy.deepcopy(base_child)
            merge_child.attrib[DIFF_ATTRIB] = FROM_BASE_ONLY
            merge_parent.append(merge_child)
            # the unique child has been copied to the merge tree, so it doesn't go on the stack
            # move to the next base child since the test child hasn't been handled yet
            base_child = next(base_children_iter, None)
        elif (test_child is not None) and (not element_is_in_sequence(test_child, base_children)):
            # test_child is unique: deep copy it with the test tag
            merge_child = copy.deepcopy(test_child)
            merge_child.attrib[DIFF_ATTRIB] = FROM_TEST_ONLY
            merge_parent.append(merge_child)
            # the unique child has been copied to the merge tree, so it doesn't go on the stack
            # move to the next test child since the base child hasn't been handled yet
            test_child = next(test_children_iter, None)
        elif element_content_equal(base_child, test_child):
            # both trees share the same element: shallow copy either child as a shared tag
            merge_child = copy_element_without_children(base_child)
            merge_parent.append(merge_child)
            # put the pair on the stack as parents to be tested, since their children may differ
            parent_stack.append((base_child, test_child, merge_child))
            # move on to the next children in both trees since this was a shared element
            base_child = next(base_children_iter, None)
            test_child = next(test_children_iter, None)
        else:
            raise RuntimeError  # something is wrong: an element should be unique or shared

# display the merge tree with highlighting to indicate the source of each line
#   no highlight: common element in both trees
#   green: line exists only in the test tree (i.e. additional)
#   red: line exists only in the base tree (i.e. missing)
root = tk.Tk()
textbox = tk.Text(root)
textbox.pack(expand=1, fill=tk.BOTH)
textbox.tag_config(FROM_BASE_ONLY, background='#ff5555')
textbox.tag_config(FROM_TEST_ONLY, background='#55ff55')

# find the diff lines to highlight, using a merge tree string that still includes the kludge attributes
merge_tree_string = etree.tostring(merge_root, pretty_print=True, encoding='unicode')
diffs_by_line = []
for line, line_text in enumerate(merge_tree_string.split('\n')):
    for diff_type in (FROM_BASE_ONLY, FROM_TEST_ONLY):
        # match the whole attribute, not just the word, to avoid false hits in text content
        if '{}="{}"'.format(DIFF_ATTRIB, diff_type) in line_text:
            diffs_by_line.append((line + 1, diff_type))

# remove the kludge attributes
for element in merge_root.iter():
    try:
        del element.attrib[DIFF_ATTRIB]
    except KeyError:
        pass
merge_tree_string = etree.tostring(merge_root, pretty_print=True, encoding='unicode')

# highlight the final lines
textbox.insert(tk.END, merge_tree_string)
for line, diff_type in diffs_by_line:
    textbox.tag_add(diff_type, '{}.0'.format(line), '{}.0'.format(line + 1))
root.mainloop()
```
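The display step relies on a small trick: tag each differing element with a temporary `diff` attribute, serialize with one element per line, note which lines carry the marker, then delete the markers and serialize again. Removing an attribute never adds or removes a line, so the recorded line numbers stay valid. A standard-library sketch of just that step, using `xml.etree.ElementTree` and `ET.indent` (Python 3.9+) in place of lxml's `pretty_print`; the function name and the literal `diff="..."` match are assumptions.

```python
import xml.etree.ElementTree as ET

DIFF_ATTRIB = "diff"


def lines_to_highlight(root):
    """Return (clean_xml_string, [(line_number, diff_type), ...]).

    Elements carrying a temporary DIFF_ATTRIB attribute are reported by the
    1-based line they occupy in the pretty-printed output; the attribute is
    then removed so it never reaches the user.
    """
    ET.indent(root)  # one element per line (modifies the tree in place)
    marked = ET.tostring(root, encoding="unicode")
    hits = []
    for line_no, line_text in enumerate(marked.split("\n"), start=1):
        for diff_type in ("base", "test"):
            if '%s="%s"' % (DIFF_ATTRIB, diff_type) in line_text:
                hits.append((line_no, diff_type))
    for element in root.iter():
        element.attrib.pop(DIFF_ATTRIB, None)  # drop the marker if present
    clean = ET.tostring(root, encoding="unicode")  # same line layout as `marked`
    return clean, hits
```

The returned line numbers map directly onto Tk indices such as `"%d.0" % line`.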

Inputs:

Please note that I cleaned up the XML because I was getting inconsistent behavior with the original XML. The original used backslashes instead of forward slashes and had false closing slashes on opening tags.


base.xml (in the same location as the script)

```xml
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/ehorizon">
   <p1:time ntimestamp="5">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
      </p1:location>
   </p1:time>
   <p1:time ntimestamp="6">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="9" type="b">590</p1:feature>
         <p1:feature color="10" type="c">600</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
   </p1:time>
</p1:sample1>
```

test.xml (in the same location as the script)

```xml
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/ehorizon">
   <p1:time ntimestamp="5">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="9" type="b">1111</p1:feature>
         <p1:feature color="10" type="c">2222</p1:feature>
      </p1:location>
   </p1:time>
   <p1:time ntimestamp="6">
      <p1:location hours = "1" path = '1'>
         <p1:feature color="2" type="a">564</p1:feature>
         <p1:feature color="3" type="b">570</p1:feature>
         <p1:feature color="4" type="c">570</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="9" type="b">590</p1:feature>
         <p1:feature color="10" type="c">600</p1:feature>
      </p1:location>
      <p1:location hours = "5" path = '1'>
         <p1:feature color="6" type="a">560</p1:feature>
         <p1:feature color="7" type="b">570</p1:feature>
         <p1:feature color="8" type="c">580</p1:feature>
      </p1:location>
   </p1:time>
</p1:sample1>
```
