[MERGE] Packs. Kthxbye.

Robert Collins robertc at robertcollins.net
Fri Oct 19 04:33:54 BST 2007


On Fri, 2007-10-19 at 12:53 +1000, Ian Clatworthy wrote:
> Robert Collins wrote:
> > I think the moment has come to get the pack repository format reviewed.
> 
> Here are some review comments for the code in pack_repo.py before
> RepositoryPackCollection.
> 
> The __init__ method for Pack should take a name and a pack_transport
> parameter. 

It cannot take the former, as NewPack instances do not have name when
they are created. access_tuple() is overridden in NewPack to only use
name when it has been used. pack_transport is reasonable to add as a
parameter.

> These are used by methods at that level (e.g. access tuple)
> so I feel they ought to exist at that level. If you don't want them as
> constructor params, then set them to None in __init__ anyway please. I
> know we have differing views on this but I think base classes need to be
> semantically complete on their own, i.e. if there are abstract
> attributes or methods, they ought to be declared and commented to
> explain their reason for being, e.g.
> 
>   "set this in derived classes to ..." or
>   "override in derived classes to ...".
> 
> BTW, NewPack subclasses Pack yet doesn't call the Pack constructor in
> its own. That smells bad to me.

It calls the constructor, I think you are misreading.

> Along the same lines, index_name(), index_offset() and indices ought to
> migrate from NewPack to Pack.

They are not relevant for ExistingPack, because it never opens its own
indices, and this is deliberate to date but possibly wrong.

>  Those 2 methods can be static. Likewise,
> inventory_index_name(), revision_index_name(), signature_index_name()
> and text_index_name() all take name as a mandatory parameter rather than
> using self.name, so I'd suggest renaming them as *_for_name() and/or
> making them static.

I'm in the process (currently on hiatus) of finding them a permanent
home. They need to be available for creation of ExistingPacks, and they
should take no parameters for NewPack as they entirely state derived.

> In the docstring for ExistingPack, s/exisiting/existing/.

Donel

> In the __init__ method for NewPack, you do a good job of making it very
> clear about the constraint required between upload_transport and
> pack_transport, namely
> 
>   pack_transport = upload_transport.clone("../pack")
> 
> Given that, there appears to be a strong case for not having the
> pack_transport passed in as a parameter. (I think jam raised something
> like this as well IIRC.)

See my reply to him please.

> In this comment:
> 
>   # synchronised with reads, because its not in the transport layer, so
> 
> s/its/it's/
> 
> In the comment "1MB buffer cap" in that method, remove the size and
> simply say "cap the buffer size". If the buffer size is left as zero,
> every write gets flushed. I gather that's by design.


done and done

> In NewPack.finish(), the naming logic scares me. If 2 people working
> from the same revision of a branch/checkout make the same change/fix and
> commit it, will the pack contents be the same and therefore the names?

Unless they have the same username, datestamp and 20 bytes of entropy,
no the names will be different. If they get the same name its either an
MD5 hash collision or the same content.

> I'm yet to check whether this is managed by the fetch logic but choosing
> pack names like this sounds potentially dangerous. OTOH, it *could* be a
> feature? In that case, we need to make it so and explicitly handle it as
> an optimisation throughout the code, yes? "Ah yes, I have that content."
> Not sure about the index implications though.

I've put copious comments on this in the code. Using hashes to mean
'have same content' is not a feature IMNSHO. But race conditions with
renames mean we need a complicated insertion process to do this without
suffering hash collisions anyway. The current state is not sufficiently
good IMO, but not particularly dangerous for now.

> In NewPack.make_index docstring, s/packs/packs's/.

My understanding of possessives is that this is incorrect. The correct
correction is s/packs/pack/.

> In NewPack._write_index docstring, remove the param descriptions for
> index_offset and name_getter. They aren't params anymore/yet.

Done.

> In AggregateIndex docstrings,
> 
>   s/An index from the pack parameter./An index for the pack./

Done

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071019/abdd2ce9/attachment.pgp 


More information about the bazaar mailing list