[RFC] Knit format 2
John Arbash Meinel
john at arbash-meinel.com
Sat Sep 9 22:02:45 BST 2006
Aaron Bentley wrote:
> John Arbash Meinel wrote:
>
...
>>> This is a good argument for for lazy parsing of inventories, though.
>>>
>>> It might be better if we had some way to know the estimated inventory
>>> size (maybe read the first one and use it). And then tune the number of
>>> trees to read at a time based on that, rather than having a fixed #. I
>>> realize that converting a freebsd ports tree will never be fun, but it
>>> should be possible.
>
> Full-ACK. Without that info, all we can do is use the lowest common
> denominator. And if we drop the number too low, then we trade
> small-tree performance against the ability to use large trees. That
> said, I don't think it would be terrible if, in order to upgrade the
> ports tree, you had to set up a big swap file.
I think we just need to factor this sort of thing into a
'get_revision_trees' helper, which will do that sort of thing, and
return sort of a 'best case' read size. I think you can stick with 500
for now, because we aren't dealing with the freebsd ports tree (yet).
But if you are going to do any work, then having one that inspects the
inventory size would be my preference.
...
>>> + def regenerate_inventory(self, fetcher, revs):
>>> + inventory_weave = fetcher.from_repository.get_inventory_weave()
>>> + for tree in self.iter_rev_trees(revs, fetcher.from_repository):
>>> + parents = inventory_weave.get_parents(tree.get_revision_id())
>>> + if tree.inventory.revision_id is None:
>>> + tree.inventory.revision_id = tree.get_revision_id()
>>> + fetcher.to_repository.add_inventory(tree.get_revision_id(),
>>> + tree.inventory, parents)
>>>
>>> ^-- tree.inventory.revision_id will never be None here, because you take
>>> care of that in 'iter_rev_trees()' I assume you just were refactoring
>>> stuff and forgot to remove it.
>
> Old inventories didn't have revision_id set, and the deserialising code
> doesn't know what the revision_id should be. I'm pretty sure it can be
> None. Why do you think it can't?
Because you have the same if revision_id is None: revision_id = ...
check in iter_rev_trees(). (You aren't reading the revision trees directly)
...
>
> I see two options:
> 1. Depend on the inventory format, not the repository format
> 2. Use polymorphism so that this implementation cannot be used if the
> inventory format is incompatible. We could implement it on
> RepositoryFormats or on InventoryFormats, I think.
Well Repository format defines the revision and inventory formats. We
only include the explicit format number for redundancy. (This was
discussed a while ago when we first introduced a format string to the
inventory and revision texts).
I would be okay with you doing it in lots of different ways. But it
should probably still be a white-list style not blacklist. Because
fileids_affected... is really a violation of api constraints. (For
example, we can't switch to the alternative XML format which splits the
attributes across lines, which saves quite a bit of space in the Knit,
but doesn't have the necessary information all on one line anymore).
> Wouldn't it be better to fix the code that sets ./bzr/repository/format,
> so that it uses a newline?
We can't really change existing formats, because they are a disk
representation, and older bzr clients would puke.
We could upgrade our internals, and then in the next format start using
trailing newlines. But the only way to tell that you can do that, is by
having a flag in the format, and you might as well just have that flag
be a '\n' at the end of the line. :)
I would be happy to see an assert somewhere else in the future, but we
can't do that now.
...
> I need an error like that, to handle basis caching properly. I had to
> rename the inventory cache file because old clients don't gracefully
> handle situations in which the cached inventory is in an unexpected
> format. I don't want to have to do that again.
>
> Aaron
Good enough, I guess.
John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060909/aa20239a/attachment.pgp
More information about the bazaar
mailing list