bzr diff --filter= or equivalent?
Doug Lee
dgl at dlee.org
Thu Feb 3 18:34:31 UTC 2011
I see that bzr diff allows --using for an external differ, but I want
a filter applied to files before the internal differ is used. Sample
usage:
bzr diff --filter=docstream figures.xlsx
where docstream is a program that converts a Microsoft Office
Word/Excel file into a more diff-friendly format. I have written a
quick example of such a filter. Bzr would send each version of the
file through docstream and compare the output instead of the original
content.
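Until something like --filter exists, --using can approximate it with a small wrapper script. The sketch below is hypothetical. The script name filterdiff.py, the FILTER_CMD setting and the helper names are illustrative. It assumes bzr passes the old and new file paths as the last two arguments to the --using tool (check `bzr help diff` for how paths are handed over). It runs the filter on each version and prints a unified diff of the two outputs using Python 3's difflib.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: filter two file versions, then diff the results.

Assumes it is invoked as `filterdiff.py ... OLD_PATH NEW_PATH` (e.g. via
`bzr diff --using`) and that FILTER_CMD names a program such as docstream
that takes a filename and writes text to stdout.
"""
import difflib
import subprocess
import sys

FILTER_CMD = ["docstream"]  # assumption: docstream is on PATH


def run_filter(path, cmd=FILTER_CMD):
    """Run the filter program on one file and return its output as lines."""
    out = subprocess.run(cmd + [path], stdout=subprocess.PIPE, check=True).stdout
    return out.decode("utf-8", errors="replace").splitlines(keepends=True)


def filtered_diff(old_lines, new_lines, old_label, new_label):
    """Unified diff of two already-filtered texts, as a list of lines."""
    return list(difflib.unified_diff(old_lines, new_lines, old_label, new_label))


def main(argv):
    # Assumption: the old and new paths are the final two arguments.
    old_path, new_path = argv[-2], argv[-1]
    diff = filtered_diff(run_filter(old_path), run_filter(new_path),
                         old_path, new_path)
    sys.stdout.writelines(diff)
    # Follow the diff(1) convention: exit status 1 when the inputs differ.
    return 1 if diff else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Under those assumptions, usage would be along the lines of `bzr diff --using filterdiff.py figures.xlsx`, with filterdiff.py executable and on PATH.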
Since I don't know whether attachments are allowed here and it's a short
program, I'll just drop its 49-line self right here; pardon any
presumptuousness this demonstrates. :) This currently requires the
name of the source file on the command line, but of course it would be
easy enough to default to stdin or to accept the filename "-".
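The default-to-stdin idea has one wrinkle: zipfile has to seek to read the archive's central directory, and a pipe is not seekable, so standard input must be buffered in memory first. A minimal Python 3 sketch, where open_zip_source is a hypothetical helper name (a Python 2 version would read sys.stdin rather than sys.stdin.buffer):

```python
import io
import sys
import zipfile


def open_zip_source(args, stdin=None):
    """Return (display_name, ZipFile) for the command-line arguments.

    No argument, or the conventional "-", means read from standard input.
    ZipFile needs a seekable file and a pipe is not seekable, so stdin is
    read fully into an in-memory BytesIO first.
    """
    if stdin is None:
        stdin = sys.stdin.buffer
    if not args or args[0] == "-":
        return "<stdin>", zipfile.ZipFile(io.BytesIO(stdin.read()))
    return args[0], zipfile.ZipFile(args[0])
```

In the script, `zname, z = open_zip_source(sys.argv[1:])` would take the place of the argument check and the ZipFile call, so `docstream - < figures.xlsx` and `docstream figures.xlsx` would behave alike.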
No license restrictions.
Is this planned, currently possible somewhere I missed, ... or should
I file it as a bug/feature request?
==========
#! /usr/bin/env python
"""DocStream - Make a (mostly) text stream out of a Microsoft Office 2007+ file.
Usage: docstream <filename>, where <filename> is an Office 2007+ file.
Example: docstream document.docx, or docstream wb.xlsx
The result is sent to stdout.
The result is a stream like
Zip file: wb.xlsx
File: [Content_Types].xml
<... content of that file ...>
File: _rels/.rels
<... content of that file ...>
<... other files ...>
XML files in the stream have a Newline inserted before every "<",
so that the file breaks down into logical segments by line.
The point of all this is to produce more easily/meaningfully diffable data:
Streams from two office documents can be compared with a standard diff utility.
Caveats:
Binary files are not textified before being inserted into the stream.
Example from a .xlsx file: xl/printerSettings/printerSettings1.bin.
They may also have a Newline appended (so the next "File:" line is flush left).
Author: Doug Lee of SSB BART Group
"""
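import sys, zipfile
if len(sys.argv) != 2:
    sys.exit(__doc__)
zname = sys.argv[1]
z = zipfile.ZipFile(zname)
print "Zip file: %s" % (zname)
for fname in z.namelist():
    # Trailing comma: stay on the "File:" line for now.
    print ("File: %s" % (fname)),
    f = z.open(fname)
    txt = f.read()
    f.close()
    # startswith() rather than txt[0], so empty entries don't raise IndexError.
    if txt.startswith("<"):
        # Presumably XML: break before every tag.
        txt = txt.replace("<", "\n<")
    else:
        # Probably a binary (or empty) file.
        # End the "File:" line first.
        print
        # TODO: Binary content just dropped in without being textified first.
    # Make sure the next "File:" line starts flush left.
    if not txt.endswith("\n"):
        txt += "\n"
    print txt,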
==========
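For anyone on Python 3, where ZipFile.read returns bytes rather than str, the same stream can be built as below. This is a sketch, not the posted script: docstream and textify_binary are illustrative names, and hex-dumping binary entries (16 bytes per line, with offsets) is just one possible answer to the TODO, chosen here because it keeps those entries line-oriented for diff.

```python
import zipfile


def textify_binary(data, width=16):
    """Hex-dump bytes as one 'offset: hex' line per `width` bytes."""
    return "".join(
        "%08x: %s\n" % (i, data[i:i + width].hex())
        for i in range(0, len(data), width)
    )


def docstream(zname):
    """Return the diff-friendly stream for one Office 2007+ (zip) file."""
    parts = ["Zip file: %s\n" % zname]
    with zipfile.ZipFile(zname) as z:
        for fname in z.namelist():
            parts.append("File: %s\n" % fname)
            data = z.read(fname)
            if data.startswith(b"<"):
                # Presumably XML: break before every "<" so each tag starts a
                # line; drop the leading newline since "File:" already ended.
                text = data.decode("utf-8", errors="replace")
                text = text.replace("<", "\n<").lstrip("\n")
            else:
                # Binary (or empty) entry: hex-dump it instead of raw bytes.
                text = textify_binary(data)
            # Keep the next "File:" line flush left.
            if text and not text.endswith("\n"):
                text += "\n"
            parts.append(text)
    return "".join(parts)
```

Usage: `print(docstream("figures.xlsx"), end="")`. With the hex dump, a changed byte shows up as one changed line, but an inserted or deleted byte shifts every later line, so large binary parts may still diff noisily.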
--
Doug Lee dgl at dlee.org http://www.dlee.org
SSB BART Group doug.lee at ssbbartgroup.com http://www.ssbbartgroup.com
"The U. S. Constitution doesn't guarantee happiness, only the pursuit
of it. You have to catch up with it yourself." --Benjamin Franklin