[MERGE] Transport support for pack repositories

Martin Pool mbp at sourcefrog.net
Thu Aug 9 01:27:07 BST 2007


To be clearer, and to exclude things that we might or might not add
later, my specific review comments are:

* open_file_stream should take a parameter indicating you want a
writeable stream
* it should return an object rather than a callable.
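
For concreteness, the shape I have in mind is roughly this (all names
and details below are illustrative only, not the actual patch):

    class FileStream(object):
        """What open_file_stream would return, rather than a callable."""

        def __init__(self, transport, relpath, writable):
            self.transport = transport
            self.relpath = relpath
            self.writable = writable
            self._closed = False
            self._buffer = []

        def write(self, bytes):
            if not self.writable:
                raise AssertionError('stream not opened writable')
            self._buffer.append(bytes)

        def close(self):
            self._closed = True

    # as a method on Transport:
    def open_file_stream(self, relpath, writable=False):
        """Return a FileStream on relpath; writable selects write access."""
        return FileStream(self, relpath, writable)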

On 8/9/07, Robert Collins <robertc at robertcollins.net> wrote:
> > I don't quite understand why we need to ensure synchronization before
> > a
> > flush()/close() is done. So I would *prefer* if we could not guarantee
> > synchronization between readv() and stream() unless we need to. (Maybe
> > you were thinking that we would stream data from somewhere, save it
> > locally but not "activate" it until we have reprocessed it, but maybe
> > we
> > could provide a '.flush()' for just that situation).

I had the same thought as John, and talked about it a bit with Robert.

I am pretty reluctant to have interfaces make promises that they
cannot in general keep: matching up by filename breaks down when
there are symlinks, multiple transports touching the same file, and
so on.

If a transport client wants to read and write a particular file and
get consistent results, then the clean way to do it would be for that
client to get a read/write file object and do everything through it.

> There are several things here that I wanted to balance.
>  * We use the knit read-after-write-before-commit facility today. I
> didn't want to have to 'fix' that as part of getting reasonable
> performance out of packs *at all*.
>  * flush() to ensure that the read-after-write reads the correct data is
> in fact bad for performance; what we should rather do is know what data
> has been written, and what hasn't, and satisfy readv/get from local data
> as and when it's possible. flush's semantics are 'let another process see
> this data', we need 'let read on this data see it'. So we could have a
> sync() call with our precise semantics, and I'd be happy enough to
> refactor it to do that, but it looked (and looks) like something we can
> deterministically detect: the transport layer can always tell (for the
> way we do this - same url prefix, as it's the same repository object)
> when synchronisation of some form is needed, and could in fact do that
> by having its own cache with the file stream and satisfying readv/get
> from a combination of external storage and the local write cache.

I think this means that you need an in-memory object representing the
file which can hold a cache and satisfy read and write requests.
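
Concretely, a sketch of such an object (coalescing, partial reads and
error handling all elided; Transport.readv and Transport.append_bytes
are the existing bzrlib calls, the rest of the names are mine):

    class CachingFileStream(object):
        """Sketch: a write stream that can satisfy its own reads.

        Writes accumulate in a local tail buffer; flush() appends them
        to external storage.  readv() is answered from the transport
        for already-flushed ranges and from the buffer for the rest,
        so read-after-write works without forcing a flush.
        """

        def __init__(self, transport, relpath):
            self._transport = transport
            self._relpath = relpath
            self._flushed = 0   # bytes already on external storage
            self._tail = ''     # written but not yet flushed

        def write(self, bytes):
            self._tail += bytes

        def flush(self):
            self._transport.append_bytes(self._relpath, self._tail)
            self._flushed += len(self._tail)
            self._tail = ''

        def readv(self, offsets):
            for offset, length in offsets:
                end = offset + length
                if end <= self._flushed:
                    # entirely external: ask the transport
                    data = self._transport.readv(
                        self._relpath, [(offset, length)]).next()[1]
                elif offset >= self._flushed:
                    # entirely in the local write cache
                    start = offset - self._flushed
                    data = self._tail[start:start + length]
                else:
                    # straddles the boundary: stitch the two together
                    head = self._transport.readv(
                        self._relpath,
                        [(offset, self._flushed - offset)]).next()[1]
                    data = head + self._tail[:end - self._flushed]
                yield offset, data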

> That gets into __del__ or exit hooks; I'm really not a fan of these,
> because of the various vagaries of python and memory management. More
> importantly here, we are using these streams to allow incremental pack
> writing *to an upload directory*. The 'write place' :) to do such a
> check would actually be the rename() call, so that you don't rename an
> open stream.

I think the returned object should have a __del__ method that warns
"stream was not closed".

-- 
Martin


