[MERGE] Packs. Kthxbye.

Fri Oct 19 07:03:30 BST 2007

> > Odd to use 'indices' in some places and 'indexes' in others.  Either is ok,
> > indexes might be more obvious.  I dread having an object with .indices
> > and .indexes :)
>
> I prefer indices.

So it's ok to change indexes->indices globally?

> I agree that the situation with __eq__ etc can be complex and lead to
> confusion with non-trivial objects. However, ExistingPack and GraphIndex
> both act as value objects - they are immutable with only public query
> methods.

I can live with it.

> >
> > +
> > +    def __init__(self, upload_transport, index_transport, pack_transport,
> > +        upload_suffix=''):
> > +        """Create a NewPack instance.
> > +
> > +        :param upload_transport: A writable transport for the pack to be
> > +            incrementally uploaded to.
> > +        :param index_transport: A writable transport for the pack's indices to
> > +            be written to when the pack is finished.
> > +        :param pack_transport: A writable transport for the pack to be renamed
> > +            to when the upload is complete. This *must* be the same as
> > +            upload_transport.clone('../packs').
> >
> > Well, if it must be the same, remove the parameter and clone it yourself?  I
> > guess constructing the new transport would be a bit inefficient as the caller
> > already has them, and this is only called by RepositoryPackCollection.
>
> Thats part of it, the other part I discussed in my reply to John's
> review.

you said:

    pack_transport is used by base class methods, we could clone it
    ourselves, but actually - I'd like to fix the Transport.relpath bug that
    makes it impossible to get a relative path like '../foo' from
    'PREFIX/bar'.relpath('PREFIX/foo')

I'd like that fixed too, but I don't see how it relates to this.

>> > This would be simpler if you just made index_sizes a dictionary keyed
> > by index_type.
>
> Simpler *where*. serialisation and parsing would be made more complex.

I'll try to change it and see if it looks better or not.

>
> > +        self._buffer = [[], 0]
> >
> > can you comment _buffer please, you were doing so well :)
>
> Its a list of byte sequences, and the aggregate size of them, which is
> then string joined when flushing the buffer.
>
>
> > +        # create a callable for adding data
> > +        def _write_data(bytes, flush=False, _buffer=self._buffer,
> > +            _write=self.write_stream.write, _update=self._hash.update):
> > +            _buffer[0].append(bytes)
> > +            _buffer[1] += len(bytes)
> > +            # 1MB buffer cap
> > +            if _buffer[1] > self._cache_limit or flush:
> > +                bytes = ''.join(_buffer[0])
> > +                _write(bytes)
> > +                _update(bytes)
> > +                _buffer[:] = [[], 0]
> > +        # expose this on self, for the occasion when clients want to add data.
> > +        self._write_data = _write_data
> >
> > Is there a reason this cannot just be a regular method?
>
> Yes, you can't bind the parameters like I do, which is done for
> performance.

ok, so looking them up in self's dict every time would be too much?
Then let's add a comment so someone doesn't undo it.

> > +    def index_offset(self, index_type):
> > +        """Get the position in a index_size array for a given index type."""
> > +        return NewPack.indices[index_type][1]
> >
> > That docstring doesn't seem very helpful.
>
> Sorry, I'm not sure what would be clearer there.

I'll try removing the index arrays altogether.

> > Should add_callback be set to None here, and documented?
>
> Yes.
>
> > Rather than
> > generically "add_callback" the name could say more what it does.
> > However I'm not sure a bound method reference is really the best thing to
> > use here: the callers seem like they would be happier getting a reference
> > to the writable index, rather than just the add callback.
>
> Not sure why you think that.

It's just a general preference for passing around objects that support
a protocol, rather
than just a callable.  Also I think it's a bit odd to have a bound method
of one object be attached as an attribute of another object.

> > +        if self._max_pack_count(total_revisions) >= total_packs:
> > +            return False
> ...
> > +        return True
> >
> > I guess this should return False if no pack_operations were planned?
>
> It does :).

Well, it seems like it would be possible (at least if the code changes in the
future) for the actually-planned changes to be null, even if the original check
doesn't trap it.

> > +    def remove_pack_from_memory(self, pack):
> > +        """Remove pack from the packs accessed by this repository.
> > +
> > +        Only affects memory state, until self._save_pack_names() is invoked.
> > +        """
> > +        self._names.pop(pack.name)
> > +        self._packs.pop(pack.name)
> > +        self._remove_pack_indices(pack)
> >
> > Should be private?
> >
> > Just del would be clearer than pop as you don't want the result.
>
> I think you mean 'remove' ? Sure.

No, I did mean

  del self._names[pack.name]

which does work on dicts.

> > (query) And not only that, but it also needs them to be able to do
> > iter_lines_added_or_present_in_versions, which is not supported by
> > the base VersionedFile class.  I guess this is moot because
> > the current code works for all formats and you get a decent error
> > message anyhow.
>
> hmm, iter_lines_added.. is required in the VersionedFile api, by the
> versioned file interface tests.

ok

I'll pull Robert's current branch and make the changes outlined above.

-- 
Martin