check improvements
Robert Collins
robert.collins at canonical.com
Tue May 12 22:54:34 BST 2009
On Tue, 2009-05-12 at 09:26 -0500, John Arbash Meinel wrote:
> I don't think lp:~lifeless/bzr is a branch, and I'm pretty sure you
> don't have a branch named '.' :)
...'check'. Sorry.
> Anyway, I think the biggest bit is figuring out how to get the per-file
> graph from the inventory without having to deserialize full inventories
> for each revision (which is probably your Revs*Trees).
No, we get that reasonably efficiently for xml; we may need to specialise
for bbc. It's the scan [that I'm about to delete] of the contents of
every tree, to determine duplicate file texts and check shas, that's the
issue. I'm looking at the best interface for adding a new cheap scan to
the repository interface, one that scans every InventoryEntry in the repo
(basically find_file_ids, but deserialising all the way to an IE). This
will allow a single pass over the inventory contents. We have to expand
all the inventories to full texts once, to get their validators; this is
cheap on bbc, as they are simply root nodes and self-validating downwards
from there; not so cheap on xml. I suspect the interface will be
something that yields the inventory text sha, and inventory entries, in
a structured stream. That will allow one pass that reads everything: xml
can get_stream with fulltext set on, and then still process the text delta
through a line-matching regex, while bbc can process the entire tree.
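To make the shape of that stream concrete, here is a minimal sketch, not the
real repository API. The names iter_inventory_stream, scan and Entry are
hypothetical stand-ins invented for illustration, and the inventories are
modelled as plain (revision_id, fulltext, entries) tuples rather than real
bzrlib objects. Each inventory contributes one record carrying its text sha,
followed by one record per entry, so a consumer can do everything in one pass.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for an InventoryEntry; only the fields used here.
Entry = namedtuple("Entry", "file_id revision text_sha1")


def iter_inventory_stream(inventories):
    """Yield one flat stream of records covering every inventory.

    `inventories` is an iterable of (revision_id, fulltext_bytes, entries).
    For each inventory the stream yields one ("inventory", revision_id, sha1)
    record, then one ("entry", revision_id, entry) record per entry.
    """
    for revision_id, fulltext, entries in inventories:
        # The validator is computed from the expanded full text, once.
        yield ("inventory", revision_id, hashlib.sha1(fulltext).hexdigest())
        for entry in entries:
            yield ("entry", revision_id, entry)


def scan(stream):
    """Consume the stream once, collecting inventory shas and text keys."""
    inventory_shas = {}
    text_keys = set()
    for kind, revision_id, payload in stream:
        if kind == "inventory":
            inventory_shas[revision_id] = payload
        else:
            text_keys.add((payload.file_id, payload.revision))
    return inventory_shas, text_keys


demo = [
    ("rev-1", b"inventory one", [Entry("file-a", "rev-1", "sha-a1")]),
    ("rev-2", b"inventory two", [Entry("file-a", "rev-1", "sha-a1"),
                                 Entry("file-b", "rev-2", "sha-b1")]),
]
shas, keys = scan(iter_inventory_stream(demo))
```

In this sketch the producer is the only part that would differ per format:
an xml producer could feed it from full texts, while a bbc producer could
walk the tree, and the consumer would stay the same.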
> One possibility would be to keep the last inventory around, and use the
> delta to update it. Or perhaps a few inventories in an LRUCache, and
> then you peek and find the 'closest' one to the one you want next.
>
>
> ...
>
> Looking at the code in question, we do:
>
> for _, ie in inv.iter_entries():
>     key = (ie.file_id, ie.revision)
>     result.setdefault(key, False)
>     if ie.revision == inv.revision_id:
>         result[key] = True
>
> However, I think we could simply change that loop to:
>
> for _, ie in inv.iter_changes(prev_inv):
>     # the rest is the same
>
> Because we know we only care about things that are different in this
> inventory, because we know all the keys in the previous inventory have
> been handled.
That code isn't the problem ;). But yes, we could use that for bbc. The
issues are that it won't cross-check the inv sha1 itself, and also that we
need to make sure that two ie's in adjacent inventories that claim the
same last-modified but have e.g. different text_sha1 trigger an error.
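As a rough illustration of that second check, here is a sketch under stated
assumptions: Entry, InconsistentEntry and check_entries are hypothetical
names, not bzrlib's, and the check compares against every inventory seen so
far rather than only the adjacent one, which is a superset of the adjacent
case. It remembers the text_sha1 first seen for each (file_id, revision) key
and raises when a later entry claims the same key with a different sha.

```python
from collections import namedtuple

# Hypothetical stand-in for an InventoryEntry; only the fields used here.
Entry = namedtuple("Entry", "file_id revision text_sha1")


class InconsistentEntry(Exception):
    """Two entries share (file_id, last-modified) but disagree on text_sha1."""


def check_entries(inventories):
    """Check entries across inventories given as (revision_id, entries).

    Returns a dict mapping (file_id, revision) to the text_sha1 recorded
    for it, or raises InconsistentEntry on the first disagreement.
    """
    seen = {}
    for revision_id, entries in inventories:
        for ie in entries:
            key = (ie.file_id, ie.revision)
            # setdefault returns the sha already recorded, if there is one.
            recorded = seen.setdefault(key, ie.text_sha1)
            if recorded != ie.text_sha1:
                raise InconsistentEntry(
                    "%r in inventory %s has text_sha1 %s, expected %s"
                    % (key, revision_id, ie.text_sha1, recorded))
    return seen


good = [
    ("rev-1", [Entry("file-a", "rev-1", "aaa")]),
    ("rev-2", [Entry("file-a", "rev-1", "aaa")]),
]
bad = [
    ("rev-1", [Entry("file-a", "rev-1", "aaa")]),
    ("rev-2", [Entry("file-a", "rev-1", "bbb")]),
]
good_result = check_entries(good)
try:
    check_entries(bad)
    bad_raised = False
except InconsistentEntry:
    bad_raised = True
```

A loop that only visits changed entries would skip the unchanged ones, so
the comparison above would have to happen wherever the delta is computed.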
-Rob