[RFC] Compact origin information for knit data files
Dmitry Vasiliev
lists at hlabs.spb.ru
Wed Nov 22 12:43:48 GMT 2006
Problem/Proposal
----------------
Adjacent lines in annotated knit data files often carry the same origin
information, so it would be useful to compact it in such cases. I propose
omitting the origin information from every line except the first in a block
of adjacent lines with the same origin. So instead of:
    origin1 linedata1
    origin1 linedata2
    origin2 linedata3
    origin2 linedata4
the content will be:
    origin1 linedata1
    linedata2
    origin2 linedata3
    linedata4
When the knit file parser reads a line without origin information, it takes
the origin from the nearest preceding line in the same block of adjacent lines
that does carry it.
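The compaction and the parser-side recovery described above could be sketched
roughly as follows. This is only an illustration, not bzr code: the function
names compact_origins and expand_origins are hypothetical, and lines are
modelled as (origin, text) tuples with None standing for "origin omitted", so
the question of how an omitted origin is written on disk is left open.

```python
from typing import Iterable, Iterator, Optional, Tuple

AnnotatedLine = Tuple[str, str]            # (origin, line data)
CompactLine = Tuple[Optional[str], str]    # origin is None when omitted


def compact_origins(lines: Iterable[AnnotatedLine]) -> Iterator[CompactLine]:
    """Keep the origin only on the first line of each run of equal origins."""
    previous = None
    for origin, text in lines:
        if origin == previous:
            yield (None, text)       # same origin as the line before: omit it
        else:
            yield (origin, text)     # a new block starts here
            previous = origin


def expand_origins(lines: Iterable[CompactLine]) -> Iterator[AnnotatedLine]:
    """Restore omitted origins from the preceding line that carries one."""
    current = None
    for origin, text in lines:
        if origin is None:
            if current is None:
                raise ValueError("first line of a block has no origin")
            origin = current
        else:
            current = origin
        yield (origin, text)
```

Expanding the output of compact_origins should give back the original
annotated lines, so the change would be lossless under these assumptions.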
Advantages
----------
I expect not only a smaller revision store but also some speedup (smaller
data files will be processed faster, and there is no need for UTF-8
encoding/decoding on every data line).
Open questions
--------------
Instead of simply omitting the origin information, would it be better to place
a one-character marker at the start of the line? That would allow different
markers for different kinds of lines. For example, a '=' marker could be used
when the origin is the same as the version id of the block of changes, and a
'+' marker when the origin is the same as on the preceding adjacent line.
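To make the marker idea concrete, here is a rough sketch of one possible
encoding. Everything beyond the '=' and '+' markers is an assumption made for
illustration: the names encode_block and decode_block are hypothetical, an
explicit origin is written as "origin text", origins are assumed to contain no
spaces and never to be exactly '=' or '+', and '=' is chosen when both markers
would apply.

```python
from typing import List, Tuple


def encode_block(lines: List[Tuple[str, str]], version_id: str) -> List[str]:
    """Encode (origin, text) pairs using '=' and '+' markers where possible."""
    encoded = []
    previous = None
    for origin, text in lines:
        if origin == version_id:
            head = "="           # origin is the version id of this block
        elif origin == previous:
            head = "+"           # origin repeats the preceding line's origin
        else:
            head = origin        # explicit origin (assumed to have no spaces)
        encoded.append(f"{head} {text}")
        previous = origin
    return encoded


def decode_block(encoded: List[str], version_id: str) -> List[Tuple[str, str]]:
    """Inverse of encode_block under the same assumptions."""
    decoded = []
    previous = None
    for line in encoded:
        head, _, text = line.partition(" ")
        if head == "=":
            origin = version_id
        elif head == "+":
            if previous is None:
                raise ValueError("'+' marker with no preceding line")
            origin = previous
        else:
            origin = head
        decoded.append((origin, text))
        previous = origin
    return decoded
```

An explicit marker on every line would also remove any ambiguity between a
line with an omitted origin and a line that starts with an origin.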
It seems a new repository format version number would have to be introduced.
How could an existing repository be converted to the new format (bzr upgrade?)?
Thoughts, comments?
--
Dmitry Vasiliev (dima at hlabs.spb.ru)
http://hlabs.spb.ru