[MERGE] Transport support for pack repositories
Robert Collins
robertc at robertcollins.net
Thu Aug 9 02:16:10 BST 2007
On Thu, 2007-08-09 at 10:27 +1000, Martin Pool wrote:
> To be more clear and exclude things that we might or might not add
> later, my specific review comments are that
>
> * open_file_stream should take a parameter indicating you want a
> writeable stream
I don't like this because we already have an API for getting a readable
stream, and read-write maps very badly onto the underlying behaviour of
things like FTP.
> * it should return an object rather than a callable.
I did this.
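Roughly, the shape of the returned object is something like this (a sketch
only - the class and attribute names here are illustrative, not the exact
bzrlib API):

```python
class FileStream(object):
    """Sketch: an append-only write stream returned as an object
    rather than a bare callable.  Wraps an open file-like object;
    names are hypothetical, not bzrlib's actual API."""

    def __init__(self, fileobj):
        self._file = fileobj
        self.closed = False

    def write(self, data):
        # Writing after close() is an API-misuse error.
        if self.closed:
            raise ValueError("write to closed stream")
        self._file.write(data)

    def close(self):
        # Flush buffered data and mark the stream finished.
        self._file.flush()
        self.closed = True
```

Being an object rather than a callable means close() has an obvious home,
and further methods can be grown on it later without changing callers.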
> On 8/9/07, Robert Collins <robertc at robertcollins.net> wrote:
> > > I don't quite understand why we need to ensure synchronization before
> > > a
> > > flush()/close() is done. So I would *prefer* if we could not guarantee
> > > synchronization between readv() and stream() unless we need to. (Maybe
> > > you were thinking that we would stream data from somewhere, save it
> > > locally but not "activate" it until we have reprocessed it, but maybe
> > > we
> > > could provide a '.flush()' for just that situation).
>
> I had the same thought as John, and talked about it a bit with Robert.
>
> I am pretty reluctant to have interfaces make promises that they
> cannot in general keep: matching up by filename is not going to work
> in general when there are symlinks, multiple transports touching the
> same file, and so on.
OTOH symlinks will cause the same confusion all the way up the stack. In
general anything that allows aliases - e.g. SFTP access to the same site
an HTTP server is serving - will make file calls confusable. This
interface is not meant to be general purpose; its sole reason to exist is
to let pack creation be efficient.
> If there is a transport client that wants to read and write a
> particular file and get consistent results then it seems that the
> clean way to do it would be for them to get a read/write file object
> and do everything through there.
See above for why I don't think that works at all well. But I can
overhaul our Transport API if you want (readv has to be deleted, and
get() has to stop ever erroring, because it has to be lazy and not open
anything until the client acts on the result - e.g. a read(), or
readv(), on it is the only thing that is allowed to error); I think it's
a massive distraction from our performance effort though.
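To make the lazy-get() point concrete, here is roughly what such an object
would have to look like - a sketch under my assumptions, with hypothetical
names, not a proposal for the real Transport interface:

```python
class LazyFile(object):
    """Sketch: a get() result that never errors at construction time.
    The underlying open is deferred until the first read(), so that is
    the only point where 'no such file' etc. can surface.  Names are
    hypothetical."""

    def __init__(self, open_func):
        # open_func performs the real (possibly failing) open.
        self._open_func = open_func
        self._file = None

    def read(self, size=-1):
        if self._file is None:
            # Errors from the deferred open surface here, not earlier.
            self._file = self._open_func()
        return self._file.read(size)
```

The cost is visible in the sketch: every caller of get() has to be prepared
for errors at read() time instead, which is why I call it a distraction.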
> > There are several things here that I wanted to balance.
> > * We use the knit read-after-write-before-commit facility today. I
> > didn't want to have to 'fix' that as part of getting reasonable
> > performance out of packs *at all*.
> > * flush() to ensure that the read-after-write reads the correct data is
> > in fact bad for performance; what we should rather do is know what data
> > has been written, and what hasn't, and satisfy readv/get from local data
as and when it's possible. flush's semantics are 'let another process see
> > this data', we need 'let read on this data see it'. So we could have a
> > sync() call with our precise semantics, and I'd be happy enough to
> > refactor it to do that, but it looked (and looks) like something we can
> > deterministically detect: the transport layer can always tell (for the
> > way we do this - same url prefix, as its the same repository object)
> > when synchronisation of some form is needed, and could in fact do that
> > by having its own cache with the file stream and satisfying readv/get
> > from a combination of external storage and the local write cache.
>
> I think this means that you need an in-memory object representing the
> file which can hold a cache and satisfy read and write requests.
Which my updated patch has (the object, not a cache at this point).
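For the cache variant described above, the object would look roughly like
this - a sketch of the idea only (the real patch has the object but no
cache yet, and these names are made up for illustration):

```python
import io

class CachedWriteStream(object):
    """Sketch: a write stream that keeps a local copy of everything
    written, so readv() over the not-yet-visible region is satisfied
    from the local write cache rather than a round trip to external
    storage.  Hypothetical names, not the patch's actual API."""

    def __init__(self, remote_write):
        # remote_write is a callable that pushes bytes to the transport.
        self._remote_write = remote_write
        self._cache = io.BytesIO()

    def write(self, data):
        # Record locally, then send on; no flush needed for local reads.
        self._cache.write(data)
        self._remote_write(data)

    def readv(self, offsets):
        # Satisfy (offset, length) requests entirely from the cache.
        buf = self._cache.getvalue()
        for offset, length in offsets:
            yield offset, buf[offset:offset + length]
```

This is the 'satisfy readv/get from local data as and when possible' idea:
synchronisation becomes the transport's problem to detect, not the
caller's problem to request.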
> > That gets into __del__ or exit hooks; I'm really not a fan of these,
because of the various vagaries of Python and memory management. More
> > importantly here, we are using these streams to allow incremental pack
> > writing *to an upload directory*. The 'write place' :) to do such a
> > check would actually be the rename() call, so that you don't rename an
> > open stream.
>
> I think the returned object should have a __del__ method that warns
> "stream was not closed".
Why? What class of errors are we hoping to catch with this? If it's API
misuse, then see my comment above - it's rename() that the problem occurs
on, not object deletion.
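For reference, the safety net being proposed is roughly this (a sketch of
Martin's suggestion as I read it, with hypothetical names):

```python
import warnings

class WarningStream(object):
    """Sketch: warn from __del__ if the stream is garbage-collected
    without close() having been called.  Illustrative only."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        if not self.closed:
            warnings.warn("stream was not closed")
```

My objection stands: when __del__ actually runs is up to the interpreter,
so a check at rename() time catches the real hazard deterministically,
where this only catches it eventually, if at all.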
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.