[RFC] browsing history API

Tue May 23 17:40:26 BST 2006

On Tuesday 23 May 2006 10:01, Martin Pool wrote:
<snip> 
> I think you will want the entire graph of the file leading up to the
> version present in a particular tree.  Although there may be other
> irrelevant versions of that file that haven't been merged, all the ones
> that have been merged in are by definition relevant.  You can do this
> just by looking at the file's graph.
> 
> (Or if not, maybe you can explain more about what you want to get out.)
> 
> I've been wondering if we should save for each revision a list of the
> modified files, with some details including their file name and
> directory, and the previous revision that modified that particular file
> - essentially a structured diff between the two inventories, similar to
> what is held by a TreeDelta.  Some operations need this information,
> including log -v and files_affected.  If we include the previous
> revision then we can skip back through these records without needing to read
> every revision.

Would it be better to change the inventory format instead? We need a way to:
- store the inventory for every revision
- compute the changes between revisions
- find the revisions in which a given file_id changes

The current format is an XML list of entries, stored in a knit or weave
file.

Instead, I think it should look like this:

the inventory is a list of records, one per revision
  every record is a sequence of lines
    every line records the change to one specific entry (file/dir/link ...)
    the order of the lines within a record doesn't matter

For example:

revid:	user at host-12345679-8900
+kind=file file_id="file_id" parent_id="parent" size=60 sha1=234 name="foo"
+kind=file file_id="file_id2" parent_id="parent" size=60 sha1=234 name="foo2"
revid:	user at host-12345679-8902
+kind=file file_id="file_id" parent_id="parent" size=60 sha1=456 name="foo"
-kind=file file_id="file_id2"

In the first revision (user at host-12345679-8900), two files are added (foo
and foo2); in the second (user at host-12345679-8902), the file foo2 is
removed and the file foo is changed.
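
To show how little parsing this needs, here is a minimal sketch in Python. It
is not bzr code: the name parse_records and the exact line syntax (a "revid:"
header followed by "+"/"-" lines of key=value pairs) are assumptions taken
from the example above.

```python
import shlex

SAMPLE = '''\
revid:\tuser at host-12345679-8900
+kind=file file_id="file_id" parent_id="parent" size=60 sha1=234 name="foo"
+kind=file file_id="file_id2" parent_id="parent" size=60 sha1=234 name="foo2"
revid:\tuser at host-12345679-8902
+kind=file file_id="file_id" parent_id="parent" size=60 sha1=456 name="foo"
-kind=file file_id="file_id2"
'''


def parse_records(text):
    """Return a list of (revid, changes); each change is (op, fields)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("revid:"):
            # Start a new record for this revision.
            records.append((line[len("revid:"):].strip(), []))
        elif line[0] in "+-":
            # shlex.split copes with the quoted values, e.g. name="foo".
            fields = dict(tok.split("=", 1) for tok in shlex.split(line[1:]))
            records[-1][1].append((line[0], fields))
        else:
            raise ValueError("unrecognised line: %r" % line)
    return records


records = parse_records(SAMPLE)
```

Each change comes back as an ("+" or "-", dict) pair, so the difference
between two adjacent revisions is simply the list attached to the later one.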

For every revision we store only the differences from the previous one, and
every N revisions we ALSO store a full copy of the inventory.
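
A rough sketch of that scheme, in Python. Everything here is hypothetical
(the DeltaLog class, apply_changes, the snapshot_every parameter), it keeps
the data in memory rather than in a file, and it assumes a purely linear
history where each record is a delta against the record just before it.

```python
def apply_changes(inventory, changes):
    """Return a new inventory (file_id -> fields) with one record applied."""
    result = dict(inventory)
    for op, fields in changes:
        if op == "+":
            result[fields["file_id"]] = fields   # entry added or modified
        else:
            result.pop(fields["file_id"], None)  # entry removed
    return result


class DeltaLog:
    """Deltas for every revision, plus a full inventory every N revisions."""

    def __init__(self, snapshot_every=3):
        self.snapshot_every = snapshot_every
        self.records = []     # list of (revid, changes)
        self.snapshots = {}   # record index -> full inventory
        self._tip = {}        # inventory of the latest revision

    def append(self, revid, changes):
        self._tip = apply_changes(self._tip, changes)
        self.records.append((revid, changes))
        index = len(self.records) - 1
        if index % self.snapshot_every == 0:
            self.snapshots[index] = self._tip

    def inventory_at(self, index):
        # Start from the nearest full copy at or before the wanted revision,
        # then replay at most N - 1 deltas.
        base = index - index % self.snapshot_every
        inventory = self.snapshots[base]
        for _, changes in self.records[base + 1:index + 1]:
            inventory = apply_changes(inventory, changes)
        return inventory


log = DeltaLog(snapshot_every=2)
log.append("r0", [("+", {"file_id": "a", "sha1": "1"}),
                  ("+", {"file_id": "b", "sha1": "2"})])
log.append("r1", [("+", {"file_id": "a", "sha1": "3"}),
                  ("-", {"file_id": "b"})])
log.append("r2", [("+", {"file_id": "c", "sha1": "4"})])
log.append("r3", [("-", {"file_id": "a"})])
```

The choice of N trades space for speed: a small N means more full copies on
disk, a large N means more deltas to replay for each lookup.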

So:

The file is easy to parse (every line is one piece of information); we can
quickly find the differences between revisions without computing any diff; we
can quickly search by file_id (we have to scan the whole inventory file, but
because only the changes are stored, this is a small amount of work). Because
we store a full copy of the inventory every N revisions, we can extract the
inventory for any revision without reading the whole file. The format is also
more compact.
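
For instance, the search by file_id could be a single pass over the records.
This is only a sketch: revisions_touching is a hypothetical name, and the
records are written out by hand as (revid, changes) pairs. A real scan would
also have to skip the periodic full copies, which are left out here.

```python
def revisions_touching(records, file_id):
    """Return the revids whose record adds, changes or removes file_id."""
    return [revid
            for revid, changes in records
            if any(fields.get("file_id") == file_id for _, fields in changes)]


# The two revisions from the example above, already parsed.
records = [
    ("user at host-12345679-8900", [
        ("+", {"kind": "file", "file_id": "file_id", "name": "foo"}),
        ("+", {"kind": "file", "file_id": "file_id2", "name": "foo2"}),
    ]),
    ("user at host-12345679-8902", [
        ("+", {"kind": "file", "file_id": "file_id", "name": "foo"}),
        ("-", {"kind": "file", "file_id": "file_id2"}),
    ]),
]

touched = revisions_touching(records, "file_id2")
```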

The file can be append-only, and we can also add an index.
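
One possible shape for such an index, sketched in Python: a map from revid to
the byte offset at which its record starts. The helper names append_record
and read_record are hypothetical, and an in-memory io.BytesIO stands in for
the real file.

```python
import io


def append_record(stream, index, revid, lines):
    """Append one record at the end of stream and remember where it starts."""
    stream.seek(0, io.SEEK_END)
    index[revid] = stream.tell()
    stream.write(("revid:\t%s\n" % revid).encode("utf-8"))
    for line in lines:
        stream.write((line + "\n").encode("utf-8"))


def read_record(stream, index, revid):
    """Seek straight to a record and return its change lines."""
    stream.seek(index[revid])
    stream.readline()  # skip the "revid:" header line
    lines = []
    for raw in iter(stream.readline, b""):
        if raw.startswith(b"revid:"):
            break  # reached the next record
        lines.append(raw.decode("utf-8").rstrip("\n"))
    return lines


stream = io.BytesIO()
index = {}
append_record(stream, index, "rev-1", ['+kind=file file_id="a" name="foo"'])
append_record(stream, index, "rev-2", ['-kind=file file_id="a"'])
```

Existing bytes are never rewritten, so the offsets stay valid as the file
grows.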

Moreover, this format could also be used for things like tags.

> -- 
> Martin Pool
> 
> 

Goffredo
-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack at inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87  87C0 BB86 505C 6B2A CFF9