ZIP files in working tree

Thu Mar 25 17:06:00 GMT 2010

On Thu, Mar 25, 2010 at 05:47:17PM +0100, Martin von Gagern wrote:
> On 25.03.2010 17:25, Jelmer Vernooij wrote:
> > If you don't require the bzr plugin, you can't do deltas specific to
> > ZIP files because the optimal deltas your plugin creates wouldn't be
> > interpretable by bzr instances that don't have that plugin.
> Yes they would. Without the plugin, bzr would give you a directory.
> Users without the plugin would have to zip the resulting dir in order to
> make it accessible to the application dealing in ZIP files.
Having different results depending on whether the plugin is present seems like
a bad idea where there's no obvious advantage. I haven't seen any indication
that storing Word files in a bzr repo is causing /too/ big deltas to
be stored in the repo at the moment.

Furthermore, statting will become a lot harder when you store a directory 
in the Bazaar file formats but have a zip file on disk. You'd need to unpack
the zip file every time to do proper statting etc. There are significant 
performance consequences for what you're trying to do. Why not just 
deal with that sort of situation when it turns out the zip file is different? 
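
As a rough illustration of that last point (a sketch only, not Bazaar's
actual working tree code): the archive can be left alone as long as an
ordinary stat of the file matches what was recorded, and opened only when
it differs. The function names and the (size, mtime) fingerprint are
assumptions made up for this example.

```python
import os
import zipfile


def stat_fingerprint(path):
    """Cheap fingerprint of the file on disk: (size, mtime in ns)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def members_if_changed(path, cached_fingerprint):
    """Return (current_fingerprint, member_names_or_None).

    The archive is only opened when the stat fingerprint differs from
    the cached one; otherwise member_names is None.
    """
    current = stat_fingerprint(path)
    if current == cached_fingerprint:
        return current, None  # unchanged on disk: no need to look inside
    with zipfile.ZipFile(path) as zf:
        return current, sorted(zf.namelist())
```

With this shape, the common case (nothing changed) costs one stat call,
and the expensive look inside the ZIP only happens for files that differ.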

> > Wouldn't you rather want to see a diff that actually explains the
> > differences in terms of ODT or Word? E.g. "Image XY in section 1.2 has
> > changed" or in qdiff actually see the different images? 
> > 
> > The ZIP file in this situation is just a format, so I don't see how it
> > should be presented as such to the user. 
> > 
> > Again, I think that custom merger should be specific to
> > OpenOffice/Word/etc. Users shouldn't have to care that those formats are
> > really zip files with XML files underneath.
> In a perfect world, every app would know not only how to edit its docs,
> but also how to diff and merge them, both two-way and three-way. But we
> are so far from that perfect world, that I'd rather see a diff between
> XML files than between binary files, because that's all I'd get in most
> cases. And to be honest, I wouldn't want to be an application developer
> in such a perfect world, unless I were allowed to make it less than
> perfect. You're asking a lot.
I'm not sure if what you're asking for will actually be an improvement 
over the current situation though, except for those few users that are familiar
with the internals of the ODT format. When conflicts arise you're expecting 
people to go into the meta.xml file in a .odt file and edit the conflict 
markers?

Microsoft Word at least has a merge function itself; we should be able to 
call out to that (not sure about OpenOffice). To show the diff for a doc file
we could just take the text representation and use diff on that. That 
seems like a generally more useful first step towards better support for these 
formats than focussing on the way they happen to store things on disk.
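
A minimal sketch of that first step for ODT, assuming the document text
lives in content.xml inside the archive and that one line per text:p or
text:h element is a good enough text representation. The helper names are
hypothetical; this is not an existing bzr or plugin API.

```python
import difflib
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OpenDocument text elements (text:p, text:h).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TEXT_TAGS = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")


def odt_text_lines(data: bytes) -> list[str]:
    """Return one plain-text line per paragraph or heading in content.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(elem.itertext())
            for elem in root.iter() if elem.tag in TEXT_TAGS]


def text_diff(old: bytes, new: bytes, name: str = "document.odt") -> list[str]:
    """Unified diff of the text representations of two ODT files."""
    return list(difflib.unified_diff(
        odt_text_lines(old), odt_text_lines(new),
        fromfile=f"a/{name}", tofile=f"b/{name}", lineterm=""))
```

The output is an ordinary unified diff of the document's text, so it can
be shown on a web page or piped to a script without starting the office
application.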

> But even in a perfect world, there are cases where I'd consider a look
> into the ZIP file to be of advantage. Plain text diffs can be displayed
> on web pages without additional requirements (thus my trac-bzr example)
> or piped to simple scripts. Generating them is much faster than starting
> huge office applications and loading bulky documents with lots of
> embedded multimedia, so a "bzr diff -c 1234" will tell me what actually
> changed much faster than an application-assisted diff. I also can't see
> all those applications providing annotate functionality and stuff like
> that in addition to diff and merge.
> 
> I know there is a choice between what you consider more important, the
> document as an opaque entity accessible through an application, or the
> ZIP file as a container format with an internal structure open to
> inspection. Currently, you either have the first and a binary file in
> bzr, or the second and the manual zip/unzip that involves.
> 
> I'd like to make that gap a bit narrower, but stay on the structured
> side as far as repository storage is concerned. A simple ZIP file merger
> would narrow the gap as well, but not as much, and it would stay on the
> opaque side for storage.
I can see how it would be beneficial for "standard" (non-document) ZIP files 
to have a custom merger, a custom status displayer and a custom differ.  
But I don't understand why that requires changing the behaviour of 
the working tree implementation - it's just a matter of representation in 
my mind. 
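
To illustrate the representation point, here is a hedged sketch of a
ZIP-aware status listing that works on the file as it is stored today,
reading only the archive's member index (names, CRCs, sizes).
zip_member_changes is a hypothetical helper, not part of bzr.

```python
import io
import zipfile


def zip_member_changes(old: bytes, new: bytes) -> dict[str, list[str]]:
    """Classify archive members as added, removed or modified.

    Members are compared by (CRC, uncompressed size) from the ZIP
    directory, so no member has to be decompressed.
    """
    def index(data: bytes) -> dict[str, tuple[int, int]]:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return {i.filename: (i.CRC, i.file_size) for i in zf.infolist()}

    a, b = index(old), index(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "modified": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }
```

Both inputs are plain file contents, so something like this could sit
behind a differ or status displayer without the working tree having to
treat the ZIP as a directory.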

Cheers,

Jelmer