Bazaar, OLPC, and the "no-packs" repository format.

Thu Dec 20 13:19:24 GMT 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> John Arbash Meinel wrote:
> 
>> I think this is a good analysis, and certainly something we could do. I
>> actually think we could do it with existing packs, and just turn off the
>> autopack functionality (or just set it to run much less often).
> 
>> We would still need a form of index that could handle this sort of thing,
>> though. At the moment packs start to perform poorly when you have too many of
>> them, because each lookup bisects every index looking for the key you requested.
> 
> I guess I should have expanded on this.  The reason I was suggesting not
> doing it with packs is because packs require indices.  And indices get
> slow when you have too many.
> 
> I was planning on using the filesystem as the index.  I'm assuming it
> already has O(log n) access to named content, and we can easily
> determine the names we're looking for.  The graph data could be:
> 1. stored with the content
> 2. stored as a separate set of little files
> 3. stored as extended file attributes?
> 

Well, you end up with some crazy filenames, and I'm not sure which characters
their filesystem allows.

Certainly with bzr-svn you can also run into super-long paths. I think Jelmer
changed the translation layer to deal with file-ids that would have been longer
than 128 (256?) characters, which were causing failures with .knit files
(especially once the capital letters and "non-safe" characters had been
expanded in the path).

Normal Bazaar revision ids are ~60 characters, and so are file-ids. (We actually
do some filtering to keep file-ids <=54 chars.)

Anyway, "file_id revision_id" is a possible filename, but it might exceed the
filesystem's name-length limits.
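
As a rough sketch of what such a name mapping could look like (the function
name content_filename, the percent-escaping, the 255-byte limit, and the SHA-1
fallback are all illustrative assumptions, not something Bazaar does today):

```python
import hashlib
from urllib.parse import quote

# Assumption: 255 bytes is a common per-name limit; the real limit depends
# on the filesystem in use.
MAX_NAME_BYTES = 255


def content_filename(file_id, revision_id, limit=MAX_NAME_BYTES):
    """Map a (file_id, revision_id) key to one filesystem-safe name.

    Hypothetical helper: this is a sketch, not Bazaar's actual scheme.
    """
    # Percent-escape everything except letters, digits and "_.-~", so path
    # separators, spaces and non-ASCII bytes cannot appear in either part.
    parts = [quote(part, safe="") for part in (file_id, revision_id)]
    # A space is safe as the separator because quote() escapes spaces.
    name = " ".join(parts)
    if len(name.encode("ascii")) <= limit:
        return name
    # Too long: fall back to a fixed-length digest of the key. The name is
    # still computable from the key, but the key can no longer be read back
    # from the name.
    key = "%s\0%s" % (file_id, revision_id)
    return "sha1-" + hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Escaping makes names longer, so the fallback branch matters most for exactly
the ids that needed escaping. This sketch does not deal with case-insensitive
filesystems.
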
It does seem worth investigating, though I have a feeling it won't perform
quite as well as you might expect. And as you say, we need to figure out where
to put the extra meta-information we would like quick access to. Putting the
graph into each file would work (and is something we want to do anyway), but it
means reading the head of lots of files, rather than processing a single file
(or a small set of files).
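
To make that cost concrete, here is a minimal sketch of the "graph stored with
the content" option. The "parents:" header line and the helper names are
hypothetical, and ids are assumed to contain no whitespace:

```python
import os


def write_text(directory, name, parents, body):
    """Store one text; its first line records the graph parents."""
    header = "parents: " + " ".join(parents) + "\n"
    with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
        f.write(header)
        f.write(body)


def read_parents(path):
    """Read only the head of a file: one open() plus one readline()."""
    with open(path, "r", encoding="utf-8") as f:
        line = f.readline()
    if not line.startswith("parents:"):
        raise ValueError("missing parents header in %r" % (path,))
    return line[len("parents:"):].split()


def load_graph(directory):
    """Build {name: parents}; this opens every stored text once."""
    return {
        name: read_parents(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
    }
```

load_graph has to open every stored text to learn its parents, so its cost
grows with the number of files; a separate index could answer the same
question from a single file.
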

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHamvcJdeBCYSNAAMRAqbuAKCGt8jkzhxhY8YMqgo7Dx7NBATr1wCgxrFl
cQBXdGqqRCAXMlBDZxpUwGQ=
=1veN
-----END PGP SIGNATURE-----