How can I format the output of Python's difflib.HtmlDiff to make it readable?

Question:

I am trying to output the difference between two text files using Python 2's difflib library, with HtmlDiff generating an HTML file.

import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'

res = difflib.HtmlDiff().make_table(V1, V2)

with open(OUTPUT, "w") as text_file:  # OUTPUT: path of the HTML file to write
    text_file.write(res)

However, the output HTML looks like this in a browser:

[Screenshot not preserved]

The table compares the two strings character by character, which makes it completely unreadable.

What should I change to make the comparison more human-friendly (e.g., full sentences on each side)?

If the inputs are given as lists of lines, the output respects the line structure, but it does not display the differences:

V1 = ['This has four words']
V2 = ['This has more than four words']

res = difflib.HtmlDiff().make_table(V1, V2)

with open(OUTPUT, "w") as text_file:
    text_file.write(res)

Resulting HTML (as viewed in a browser):

[Screenshot not preserved]
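A quick way to see why the two inputs behave so differently (an illustrative check, not part of the original question): make_table(fromlines, tolines) treats each argument as a sequence of lines, and iterating over a Python str yields single characters, so every character becomes its own table row. Counting the `<tr>` row markers in each generated table makes this visible; the variable names below are illustrative only.

```python
import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'

differ = difflib.HtmlDiff()

# A str is iterated character by character, so each character is one "line".
char_table = differ.make_table(V1, V2)
# One-element lists give a single line (a single table row) per input.
line_table = differ.make_table([V1], [V2])

# Every data row starts with "<tr>"; without fromdesc/todesc no header row is emitted.
char_rows = char_table.count('<tr>')
line_rows = line_table.count('<tr>')
print(char_rows, line_rows)
```

The list version has the right shape; its highlighting only fails to appear because make_table emits no CSS, which the answers below address.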

Asked By: hirschme


Answers:

To get inline markup, you can use difflib.SequenceMatcher, as in the function defined in this answer: https://stackoverflow.com/a/788780/2318649

That gives the following code:

import difflib
import html  # Python 3: used to escape text before wrapping it in tags


def show_diff(seqm):
    # Adapted from https://stackoverflow.com/questions/774316/python-difflib-highlighting-differences-inline
    """Unify operations between two compared strings.

    seqm is a difflib.SequenceMatcher instance whose a & b are strings.
    Returns an HTML fragment with insertions in <ins> and deletions in <del>.
    """
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        # Escape the text so characters such as < and & render literally.
        old = html.escape(seqm.a[a0:a1])
        new = html.escape(seqm.b[b0:b1])
        if opcode == 'equal':
            output.append(old)
        elif opcode == 'insert':
            output.append("<ins>" + new + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + old + "</del>")
        elif opcode == 'replace':
            # Show replaced text as the old part deleted, then the new part inserted.
            output.append("<del>" + old + "</del><ins>" + new + "</ins>")
        else:
            raise RuntimeError(f"unexpected opcode {opcode}")
    return ''.join(output)


V1 = 'This has four words but fewer than eleven'
V2 = 'This has more than four words'

sm = difflib.SequenceMatcher(None, V1, V2)

html_doc = "<html><body>" + show_diff(sm) + "</body></html>"

with open("output.html", "w") as f:
    f.write(html_doc)

which produces:

[Screenshot not preserved]
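The function above still works on characters, so a change inside a word can split it into fragments. A minimal sketch of a word-level variant, assuming whitespace-separated words are the unit you care about: it feeds lists of words to SequenceMatcher and joins them back with spaces. show_word_diff is a hypothetical helper, not part of the linked answer (Python 3).

```python
import difflib
import html


def show_word_diff(a, b):
    """Return an HTML fragment marking word-level changes between strings a and b."""
    a_words, b_words = a.split(), b.split()
    # SequenceMatcher accepts any sequences of hashable items, including lists of words.
    sm = difflib.SequenceMatcher(None, a_words, b_words)
    pieces = []
    for opcode, a0, a1, b0, b1 in sm.get_opcodes():
        old = html.escape(' '.join(a_words[a0:a1]))
        new = html.escape(' '.join(b_words[b0:b1]))
        if opcode == 'equal':
            pieces.append(old)
        elif opcode == 'delete':
            pieces.append('<del>' + old + '</del>')
        elif opcode == 'insert':
            pieces.append('<ins>' + new + '</ins>')
        else:  # 'replace': show the removed words, then the added words
            pieces.append('<del>' + old + '</del> <ins>' + new + '</ins>')
    return ' '.join(pieces)


V1 = 'This has four words but fewer than eleven'
V2 = 'This has more than four words'
fragment = show_word_diff(V1, V2)
print(fragment)
# This has <ins>more than</ins> four words <del>but fewer than eleven</del>
```

Note that rejoining with single spaces normalizes the original whitespace, which is usually acceptable for prose.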

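The problem is that you don’t have the required styles. make_table returns only the table markup: its cells are tagged with CSS classes such as diff_add, diff_chg and diff_sub, but no style rules define them. Try make_file instead of make_table; it produces a complete HTML page that includes that CSS, so the colors show up as you expect.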
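A minimal sketch of the make_file route, using the one-element lists from the question; the output path diff.html and the fromdesc/todesc labels are only examples. Comparing the two outputs shows that only make_file emits a `<style>` block.

```python
import difflib

V1 = ['This has four words']
V2 = ['This has more than four words']

differ = difflib.HtmlDiff()

# make_table(): just the <table> markup, with class names but no style rules.
table_only = differ.make_table(V1, V2)

# make_file(): the same table wrapped in a full HTML page whose <style>
# block gives the diff_add / diff_chg / diff_sub classes their colors.
page = differ.make_file(V1, V2, fromdesc='V1', todesc='V2')

with open('diff.html', 'w') as f:  # example output path
    f.write(page)
```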

Answered By: Randomibis

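This is an old question, but I have been struggling with it myself for a few days. Before fixing anything, I was getting this:

[Screenshot not preserved]

I finally pieced together something that works:

html = difflib.HtmlDiff().make_file(a.split(' '), b.split(' '), fromdesc="original", todesc="modified")

That is, after adding a simple split: a.split(' ') turns each word into its own "line", so HtmlDiff compares the texts word by word instead of character by character, and make_file includes the CSS that colors the changes.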
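Splitting on spaces suits single sentences. For the original use case of comparing two text files, a sketch that passes one list element per line (via readlines()) might look like this. html_diff_files is a hypothetical helper name, and wrapcolumn=70 is an arbitrary choice that wraps long lines so the side-by-side columns stay narrow (Python 3).

```python
import difflib
import os
import tempfile


def html_diff_files(path_a, path_b, wrapcolumn=70):
    """Return a complete HTML page comparing two text files line by line."""
    with open(path_a) as fa, open(path_b) as fb:
        # readlines() gives one list element per line, which is what HtmlDiff expects.
        a_lines = fa.readlines()
        b_lines = fb.readlines()
    # wrapcolumn is an HtmlDiff constructor argument that wraps long lines.
    differ = difflib.HtmlDiff(wrapcolumn=wrapcolumn)
    return differ.make_file(a_lines, b_lines, fromdesc=path_a, todesc=path_b)


# Demo with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'old.txt')
    new_path = os.path.join(tmp, 'new.txt')
    with open(old_path, 'w') as f:
        f.write('This has four words\nSecond line\n')
    with open(new_path, 'w') as f:
        f.write('This has more than four words\nSecond line\n')
    page = html_diff_files(old_path, new_path)
```

For long files, make_file also accepts context=True (with numlines) to show only the changed regions plus a few surrounding lines.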

Answered By: aliciafig