Small note for serialized inventories

Alexander Belchenko bialix at ukr.net
Fri Feb 9 07:44:44 GMT 2007


John Arbash Meinel wrote:
> Alexander Belchenko wrote:
> ....
> 
>>> If we do directory-block all-in-one (like dirstate is), then I would
>>> recommend including the full path to the file.
>> Why? Inside a directory block we could switch to an inner diff loop with the same
>> behaviour, but using the path prefix implicitly.
>>
>> Not writing the full path of each file is a big saving in inventory file size,
>> and therefore a big win for read/write operations.
>>
> 
> I don't know if you noticed what I told you in the other email, but one
> way or another you have to know what your path prefix is. And a lot of
> implicit schemes fail because you have to jump in and out of records.
> 
> It would be possible, I guess to use a logical structure that uses a
> separate identifier for "end of directory block", and always has 1
> directory block entry for every directory even if that directory is
> empty. Something like
> 
> #start block ''
> a file record
> b file record
> c dir record
> d file record
> e dir record
> #end block ''
> # start block 'c'
> f record (c/f)
> g record (c/g)
> # end block 'c'
> # start block 'e'
> # end block 'e'
> 
> You wouldn't really need both start and end, and if you assumed a
> certain order of things, you wouldn't need to label the blocks.
> You just need to have a "start block" record for every possible directory.
> 
> However, this file is really hard to partially parse, as information for
> a given record is scattered all over the file.

The scheme above is close to what I have in mind, but I still think that each
block should list only the items of a directory, without its subdirectories.
And for each directory block it is of course better to write the full path.
So I end up with a file like:

directory '' TREE_ROOT
file a
file b
symlink e
directory d d-XXXXXXX-YYYYYYYY
file aa
file bb
directory d/foo
directory d/foo/bar
...

etc.

So the start of a block is easily detected, and the end of a block is simply
the start of the next block.
With an index file that gives me the offset and size of each directory block,
I could read the inventory partially. And this index is easily regenerated
from the whole inventory file.
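
To make the index idea concrete, here is a minimal Python sketch. It assumes the line layout from the example above (a block starts with a line beginning "directory ", followed by the path) and that names contain no whitespace. The names build_index, read_block and SAMPLE are hypothetical, not existing bzr code.

```python
SAMPLE = (
    b"directory '' TREE_ROOT\n"
    b"file a\n"
    b"file b\n"
    b"symlink e\n"
    b"directory d d-XXXXXXX-YYYYYYYY\n"
    b"file aa\n"
    b"file bb\n"
    b"directory d/foo\n"
    b"directory d/foo/bar\n"
)


def build_index(data):
    """Map each directory path to the (offset, size) of its block in data."""
    index = {}
    current = None   # path of the block being scanned
    start = 0        # byte offset where that block starts
    offset = 0       # byte offset of the line being looked at
    for line in data.splitlines(keepends=True):
        if line.startswith(b"directory "):
            # A new header closes the previous block.
            if current is not None:
                index[current] = (start, offset - start)
            # Assumed layout: second field is the path, '' is the tree root.
            current = line.split()[1].decode("utf-8").strip("'")
            start = offset
        offset += len(line)
    if current is not None:
        index[current] = (start, offset - start)
    return index


def read_block(data, index, path):
    """Return (kind, full path) for every entry of one directory block."""
    start, size = index[path]
    lines = data[start:start + size].decode("utf-8").splitlines()
    entries = []
    for line in lines[1:]:   # skip the "directory ..." header line
        kind, name = line.split(" ", 1)
        entries.append((kind, path + "/" + name if path else name))
    return entries


index = build_index(SAMPLE)
entries_in_d = read_block(SAMPLE, index, "d")
```

With a file on disk, read_block would seek to the offset and read only size bytes instead of slicing an in-memory buffer; the rest stays the same.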

This scheme also has one big advantage: we can drop parent_id.
The resulting inventory file for the Mozilla tree is then much smaller
(converting to a nul-separated file and dropping parent_id gives a 47%
smaller file).
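
As a rough illustration of why parent_id becomes redundant: once every directory block carries its full path, the parent can be derived from the path alone. The sketch below assumes a mapping from directory path to file id is available after parsing; derive_parent_ids and the ids used are hypothetical.

```python
import posixpath


def derive_parent_ids(dir_ids):
    """Given {directory path: file id}, return {file id: parent file id}.

    The tree root '' has no parent, so it maps to None.
    """
    parents = {}
    for path, file_id in dir_ids.items():
        if path == "":
            parents[file_id] = None
        else:
            # dirname("d/foo") is "d" and dirname("d") is "", the root.
            parents[file_id] = dir_ids[posixpath.dirname(path)]
    return parents


# Hypothetical ids, for illustration only.
dir_ids = {"": "TREE_ROOT", "d": "d-id", "d/foo": "foo-id"}
parents = derive_parent_ids(dir_ids)
```

Entries inside a block need no lookup at all: their parent is the directory named in the block header.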

> 
> Like if you want to read 1 file's full path. You generally have to read
> everything to get there, because there are no back pointers. And no
> redundant information.

Then I need to know which directory the file resides in, and then take
the full directory path as the prefix.

>>> Just something to keep in mind.
>> I am trying to figure out the best way to deal with moving a file to another directory
>> and/or renaming it. Keeping files by directory block means that the file is deleted in
>> one place and added in another, and only the unique file id lets us detect this
>> as a move/rename.
>>
> 
> You could mark the deleted record as "renamed to" and the added record
> as "renamed from"

In the inventory itself? Hmmm. It makes sense.
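
For comparison, this is roughly how a move/rename is detected from file ids alone, without any "renamed to"/"renamed from" markers. It is only a sketch under the assumption that each inventory can be reduced to a mapping from file id to full path; classify_changes and the ids are hypothetical.

```python
def classify_changes(old, new):
    """Compare two {file id: path} mappings.

    Returns (renamed, added, deleted), where renamed maps a file id to
    its (old path, new path) pair.
    """
    renamed = {
        fid: (old[fid], new[fid])
        for fid in old.keys() & new.keys()
        if old[fid] != new[fid]
    }
    added = {fid: new[fid] for fid in new.keys() - old.keys()}
    deleted = {fid: old[fid] for fid in old.keys() - new.keys()}
    return renamed, added, deleted


# Hypothetical ids: 'aa' moves from directory d to the root.
old = {"a-id": "a", "aa-id": "d/aa", "bb-id": "d/bb"}
new = {"a-id": "a", "aa-id": "aa", "c-id": "c"}
renamed, added, deleted = classify_changes(old, new)
```

Explicit markers in the records would make this comparison unnecessary for a single inventory, at the cost of storing the redundant information.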

> 
> That has some very nice properties for working inventories. For
> long-term serialization it is a limitation that you have a snapshot,
> when sometimes you want a delta.
> 
> Partly I've advocated having a separate "changes" file, which mentions
> all the things that have happened (at least what file-ids/paths have
> been affected). So you can look up the delta quickly, and then jump into
> the storage.
> 
> The last time I proposed it, the redundancy was considered a negative,
> because you now have the possibility of disagreeing. The changes file
> could be considered a cache of extracting to Inventories and doing a
> delta against them. If you record this, you may end up disagreeing if
> you did the delta again.
> 
> However, since our current operations are fairly slow, and expect to be
> able to do that exact action on the contents of inventory.knit. And that
> delta is known to be incomplete for deletions (which is a general
> deficiency about all of our indices as .kndx does not record when a
> delete occurs).
> 
> John
> =:->
> 
> 




