[RFC] Compact origin information for knit data files

John Arbash Meinel john at arbash-meinel.com
Fri Nov 24 17:10:08 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dmitry Vasiliev wrote:
> 
> Problem/Proposal
> ----------------
> 
> Often adjacent lines of data in annotated knit data files contain the
> same origin information, so it would be useful to compact the information
> in such cases. I propose to skip origin information for all lines except
> the first line in a block of adjacent lines with the same origin. So
> instead of:
> 
>     origin1 linedata1
>     origin1 linedata2
>     origin2 linedata3
>     origin2 linedata4
> 
> the content will be:
> 
>     origin1 linedata1
>      linedata2
>     origin2 linedata3
>      linedata4
> 
> When the knit file parser gets a line without any origin information, the
> origin will be taken from the nearest previous line in the same block of
> adjacent lines that does carry it.
> 
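
To make the quoted proposal concrete, here is a minimal Python sketch of the
round trip it describes. The function names are made up for illustration, and
it assumes an origin is a non-empty token without spaces, separated from the
line data by a single space; it is not taken from bzr's knit code.

```python
def compact_origins(lines):
    """Drop the origin from a line when it equals the previous line's origin.

    `lines` is a list of (origin, text) pairs (hypothetical layout).
    """
    out = []
    previous = None
    for origin, text in lines:
        if origin == previous:
            # Same origin as the line above: keep only the separator.
            out.append(" " + text)
        else:
            out.append(origin + " " + text)
            previous = origin
    return out


def expand_origins(raw_lines):
    """Invert compact_origins: an empty origin means 'same as the line above'."""
    out = []
    current = None
    for raw in raw_lines:
        origin, text = raw.split(" ", 1)
        if origin:
            current = origin
        elif current is None:
            raise ValueError("first line of a block must carry an origin")
        out.append((current, text))
    return out
```

Expanding the output of compact_origins() gives back the original pairs, so
under these assumptions the compact form loses no information.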

I think you have missed that it is already supposed to be doing this,
and I added a fix for bzr-0.13 to actually do it correctly. Basically,
we use the index of a revision to refer to it in a later revision. So it
should be:

rev@id-234-234234 no-eol,fulltext 0 350  :
rev@id-456-456456 no-eol,line-delta 350 100 1 :

Rather than:
rev@id-456-456456 no-eol,line-delta 350 100 .revid-234-234234 :
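
As a rough illustration of the two ways a parent can be written, here is a
Python sketch that resolves both. It assumes each line is
"revision-id options pos size parents... :", that a parent starting with '.'
is a literal revision-id, and that a bare integer points at an earlier line.
The parse_index name, the record layout and the `base` parameter (whether the
first line counts as 0 or 1) are assumptions for the example, not bzr's
actual parser.

```python
def parse_index(lines, base=0):
    """Parse index lines and resolve parent references to revision ids."""
    order = []     # revision ids in the order their lines appear
    records = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[-1] != ":":
            raise ValueError("malformed index line: %r" % line)
        rev_id, options, pos, size = fields[:4]
        parents = []
        for ref in fields[4:-1]:
            if ref.startswith("."):
                # Literal revision-id, written out in full.
                parents.append(ref[1:])
            else:
                # Integer reference to an earlier line in this index.
                position = int(ref) - base
                if not 0 <= position < len(order):
                    raise ValueError("bad parent reference: %r" % ref)
                parents.append(order[position])
        order.append(rev_id)
        records[rev_id] = {
            "options": options.split(","),
            "pos": int(pos),
            "size": int(size),
            "parents": parents,
        }
    return records
```

With base=1, a line ending in "1 :" resolves to the revision-id on the first
line, so the long id only has to be spelled out once.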

Or are you thinking that something like 'no-eol' and 'line-delta' should
be omitted if it is the same as the previous line?

Also, I think you may be misunderstanding the lines. The first entry
isn't the 'origin', it is the revision-id, and that should really be
different for every single line (though it typically has a very similar
prefix).

There are a few things that could be done to make it smaller, like
changing 'no-eol' to be a single character. If we really wanted it
small, then we could switch to a binary representation instead of ASCII...
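
For a sense of what the single-character idea would look like, here is a
small Python sketch. The letter codes are invented for the example; nothing
like them is defined anywhere, and only the option names that appear in this
message are covered.

```python
# Hypothetical one-letter codes for the option names used above.
SHORT_CODES = {"no-eol": "N", "fulltext": "F", "line-delta": "D"}


def shorten_options(options_field):
    """Turn 'no-eol,line-delta' into one letter per option."""
    return "".join(SHORT_CODES[name] for name in options_field.split(","))


def expand_options(short_field):
    """Turn the one-letter form back into the comma-separated names."""
    long_names = {code: name for name, code in SHORT_CODES.items()}
    return ",".join(long_names[code] for code in short_field)
```

Under this made-up mapping 'no-eol,line-delta' would be stored as 'ND', at the
cost of the field no longer explaining itself when you read the file.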

But I'm not 100% convinced that the tradeoff is really worth it at this
point. Right now it is pretty easy to look at the data and to write a
shell script to parse it out of the file, which has an "I can get my
data" feel that I really like.

...

> 
> It seems a new repository format version number should be introduced.
> How could a repository be converted into the new format (bzr upgrade?)?
> 
> 
> Thoughts, comments?
> 

This would be a new repository format, which would mean writing a
converter, and then using 'bzr upgrade', etc. We have most of the
infrastructure for doing it if you really want to dig into it. But I
think you might be misunderstanding some small bits first.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFZydwJdeBCYSNAAMRAhxIAJ4olSNxB/9gltow4NyUD922WXFOAQCfbhqH
ZyvsETUZk4XZtCqDV5R0sNM=
=OYnz
-----END PGP SIGNATURE-----



