[attn aaron] WorkingTree._write_inventory/_set_inventory considered harmful
Robert Collins
robertc at robertcollins.net
Fri Jul 28 01:17:50 BST 2006
On Thu, 2006-07-27 at 09:43 -0400, Aaron Bentley wrote:
> Robert Collins wrote:
> > Hi Aaron,
> > I note that TreeTransform uses private tree interfaces to update the
> > working inventory.
>
> You mean Tree._write_inventory?
yup.
> > I can and will make this work with dirstate, but it's
> > going to be relatively inefficient I suspect (compared to mutating the
> > data we have as we do operations; for example, after setting a file in
> > the transform, we could fstat it to get the stat value and put that
> > straight into the dirstate).
>
> Until we're in TreeTransform.apply(), all of the operations are subject
> to change. Storing the data in the dirstate before then would mean
> having to keep the TreeTransform and the dirstate data in sync.
I'm not sure that's a burden, but it's definitely a separate discussion,
perhaps worth having later.
> > IIRC the reason you do this is so that you don't have an inventory that
> > violates the invariants of an inventory during the transform?
>
> No, it was for performance. All of the WorkingTree inventory
> operations (e.g. WorkingTree.add) write to disk, and I want to perform
> a bunch of operations without hitting disk every time.
OK. Well, we can definitely fix that.
> > I think it would make sense to extend the working tree interface to
> > allow all the operations TT needs to do to be done via it (efficiently),
> > which would allow us to manage the dirstate data and not force a rescan
> > at the end of the merge.
>
> Pardon? What is a rescan, and doesn't dirstate include all the
> inventory stuff already?
dirstate stores:
- the parent revision ids
- the parent inventory contents
- the working inventory content
- stat & sha information
If we 'set inventory' and change the working inventory content, we need
to re-combine the parent inventories and the working inventory all over
again. We don't know what changes have really been made to the tree if
we're just told that it's suddenly different, so it seems to me we need
to walk the tree again and gather the stat data to match the new
inventory. At the same time we'd zip the parent data into the dirstate.
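To make the cost difference concrete, here is a toy sketch. ToyDirState and compare_costs are hypothetical names invented for illustration; they are not bzrlib's dirstate or any real API. The sketch only counts how many stat calls a wholesale 'set inventory' needs compared with a targeted add, on the assumption that the tree cannot tell which entries changed when the inventory is replaced.

```python
import os
import tempfile


class ToyDirState:
    """Hypothetical stand-in for a dirstate: path -> (file_id, stat summary)."""

    def __init__(self, root):
        self.root = root
        self.entries = {}
        self.stat_calls = 0  # counts how often we touch the filesystem

    def _stat(self, path):
        self.stat_calls += 1
        st = os.lstat(os.path.join(self.root, path))
        return (st.st_size, st.st_mtime)

    def set_inventory(self, inventory):
        # Wholesale replacement: nothing says what changed, so every
        # path in the new inventory is statted again.
        self.entries = {path: (file_id, self._stat(path))
                        for path, file_id in inventory.items()}

    def add(self, path, file_id):
        # Targeted mutation: only the path being added is statted.
        self.entries[path] = (file_id, self._stat(path))


def compare_costs():
    """Return (replace_cost, mutate_cost, same_result) for adding one file."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("a", "b", "c", "d"):
            with open(os.path.join(root, name), "w") as f:
                f.write(name)
        old = {"a": "a-id", "b": "b-id", "c": "c-id"}

        replaced = ToyDirState(root)
        replaced.set_inventory(old)
        before = replaced.stat_calls
        replaced.set_inventory(dict(old, d="d-id"))
        replace_cost = replaced.stat_calls - before

        mutated = ToyDirState(root)
        mutated.set_inventory(old)
        before = mutated.stat_calls
        mutated.add("d", "d-id")
        mutate_cost = mutated.stat_calls - before

        return replace_cost, mutate_cost, replaced.entries == mutated.entries
```

In this toy model the replacement path stats every entry while the mutation path stats only the new one, and both end in the same state. The real dirstate also has to merge parent inventory data, which the sketch leaves out.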
> If it is more efficient to express these updates as changes to the
> inventory, rather than as replacement of the inventory, that's fine
> with me. We'd need an interface that allowed us to perform a bunch of
> operations without hitting disk. The changes would still be performed
> in TreeTransform.apply, then.
>
> The operations needed are add, remove and lookup.
So: 'add' adds a file, dir or symlink that is on disk, with a
fileid. Is the tree allowed to stat the file at that point? (Or can you
provide a stat and sha1 value?)
'remove' removes a file, dir or symlink from the current working
inventory. Should it remove children?
Do you represent moves as a remove of the entry, then an add of an entry
with the same id?
Is lookup 'fileid->path'?
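On the question of whether the caller could provide a stat and sha1 value for 'add': a minimal sketch of gathering both from one open file handle, using only the Python standard library. The helper name stat_and_sha1 is made up for this example and is not part of bzrlib; it assumes a regular file (not a directory or symlink).

```python
import hashlib
import os


def stat_and_sha1(path):
    """Return (os.stat_result, hex sha1) for a regular file via one open handle."""
    with open(path, "rb") as f:
        # fstat on the open descriptor, so the stat and the content hashed
        # below refer to the same file.
        st = os.fstat(f.fileno())
        digest = hashlib.sha1()
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return st, digest.hexdigest()
```

A caller that has just written the file could pass the returned pair along with the path and fileid, so the tree would not need to stat or hash the file again.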
I'll start working on a patch to allow these to be done in one write
lock with no inventory writes shortly.
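A rough sketch of what "one write lock with no inventory writes" could look like. BatchedTree and its methods are hypothetical and do not reflect the actual WorkingTree interface; a move is modelled as remove followed by add with the same id, which is only one of the options asked about above.

```python
class BatchedTree:
    """Hypothetical tree: mutations stay in memory until the write lock is released."""

    def __init__(self):
        self._by_id = {}       # file_id -> path
        self._locked = False
        self._dirty = False
        self.disk_writes = 0   # counts simulated inventory writes

    def lock_write(self):
        self._locked = True

    def unlock(self):
        if self._dirty:
            self.disk_writes += 1  # a single flush for the whole batch
            self._dirty = False
        self._locked = False

    def _require_lock(self):
        if not self._locked:
            raise RuntimeError("write lock required")

    def add(self, path, file_id):
        self._require_lock()
        self._by_id[file_id] = path
        self._dirty = True

    def remove(self, file_id):
        self._require_lock()
        del self._by_id[file_id]
        self._dirty = True

    def lookup(self, file_id):
        # fileid -> path
        return self._by_id[file_id]


tree = BatchedTree()
tree.lock_write()
try:
    tree.add("README", "readme-id")
    # A move expressed as remove + add with the same id (an assumption).
    tree.remove("readme-id")
    tree.add("README.txt", "readme-id")
finally:
    tree.unlock()
```

Three mutations happen under the lock, but the simulated inventory write happens once, at unlock.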
> Or alternatively, we can provide a list of files / ids affected by the
> transform, so you can use that to avoid a full rescan.
That might work, though I think mutation is probably better.
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.