[RFC] Compact origin information for knit data files

Fri Nov 24 17:10:08 GMT 2006

I think you have missed that the index is already supposed to be doing
this, and I added a fix for bzr-0.13 so that it actually does it
correctly. Basically, a later entry refers to an earlier revision by
that revision's position in the index. So it should be:

rev@id-234-234234 no-eol,fulltext 0 350  :
rev@id-456-456456 no-eol,line-delta 350 100 1 :

Rather than:
rev@id-456-456456 no-eol,line-delta 350 100 .revid-234-234234 :
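
To make the parent-by-index idea concrete, here is a rough Python sketch
of reading such lines. It is not the actual bzrlib code: the function
name `parse_index`, the dictionary layout, and the `first_index`
parameter are made up for illustration. The example above refers to the
first line as 1, so the numbering base is left as a parameter rather
than guessed.

```python
def parse_index(lines, first_index=0):
    """Parse lines of the form: revision-id options pos size [parents...] :

    A parent is either the number of an earlier line, or '.' followed by
    a full revision id. `first_index` is the number given to the first
    line (an assumption of this sketch, not a documented constant).
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[-1] != ":":
            continue  # skip blank or incomplete lines
        parents = []
        for ref in fields[4:-1]:
            if ref.startswith("."):
                parents.append(ref[1:])  # parent spelled out in full
            else:
                idx = int(ref) - first_index
                if not 0 <= idx < len(entries):
                    raise ValueError("parent %r is not an earlier line" % ref)
                parents.append(entries[idx]["revision_id"])
        entries.append({
            "revision_id": fields[0],
            "options": fields[1].split(","),
            "pos": int(fields[2]),
            "size": int(fields[3]),
            "parents": parents,
        })
    return entries


sample = [
    "rev@id-234-234234 no-eol,fulltext 0 350  :",
    "rev@id-456-456456 no-eol,line-delta 350 100 1 :",
]
entries = parse_index(sample, first_index=1)
print(entries[1]["parents"])
```

The point is only that the short number carries the same information as
the full id, because the earlier line has already been read by the time
the later one is parsed.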

Or are you thinking that something like 'no-eol' and 'line-delta' should
be omitted if it is the same as the previous line?

Also, I think you may be misunderstanding the lines. The first entry on
each line isn't an 'origin', it is the revision-id, and that really
should be different for every single line (though the ids typically
share a very similar prefix).

There are a few things that could be done to make it smaller, like
changing 'no-eol' to be a single character. If we really wanted it
small, then we could switch to a binary representation instead of ASCII...
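
As a rough sketch of what single-character options would buy, the
snippet below rewrites the options field of one index line and counts
the bytes saved. The letters in `SHORT_CODES` and the helper
`shorten_options` are hypothetical; nothing like them exists in the
current format.

```python
# Hypothetical one-character codes, invented for this sketch only.
SHORT_CODES = {"no-eol": "N", "line-delta": "D", "fulltext": "F"}


def shorten_options(line):
    """Replace each known option in the second field with its short code."""
    fields = line.split(" ")
    options = fields[1].split(",")
    fields[1] = ",".join(SHORT_CODES.get(opt, opt) for opt in options)
    return " ".join(fields)


line = "rev@id-456-456456 no-eol,line-delta 350 100 1 :"
short = shorten_options(line)
saved = len(line) - len(short)
print(short)
print(saved)  # 14 bytes saved on this line
```

The saving is a fixed handful of bytes per line, while the revision-id
at the start of each line stays as long as it was.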

But I'm not 100% convinced that the tradeoff is really worth it at this
point. Right now it is pretty easy to look at the data and to write a
shell script that parses it out of the file. That gives it an "I can
get my data" feel, which I really like.

...

> 
> It seems a new repository format version number should be introduced.
> How may a repository be converted into the new format (bzr upgrade?)?
> 
> 
> Thoughts, comments?
> 

This would be a new repository format, which would mean writing a
converter and then using 'bzr upgrade', etc. We have most of the
infrastructure for doing it if you really want to dig into it, but I
think you might be misunderstanding some small bits first.

John
=:->