Python style question

Tue Jun 7 16:40:19 BST 2005

Aaron Bentley wrote:
> John A Meinel wrote:
> | Martin Pool wrote:
> |
> 
> | The problem is that really you want a branch hierarchy. You can argue
> | that you want a standard Branch object and a BranchStorage hierarchy. It
> | doesn't really matter, but to me, that is just one more layer of
> | indirection that you don't really need.
> 
> It reminds me of the PFS layer in Arch, and I think that's one of the
> good things about Arch.  A BranchStore acts as a bridge between the
> high-level Branch object and the filesystem/network/etc.

I agree. I think we can do slightly better by designing a pipeline-able
filesystem layer, one where we expect to issue multiple requests up
front and analyze the results later.

I am certainly in favor of having some sort of factory function that can
register new filesystem handlers (so that plugins can implement new ones).
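
A rough sketch of the kind of registry I have in mind. All of the names
here (register_handler, get_handler, LocalHandler, SftpHandler) are made
up for illustration and are not existing bzr code; it only assumes that
handlers are chosen by URL scheme.

```python
from urllib.parse import urlsplit

# Hypothetical registry: maps a URL scheme to a handler class.
_handlers = {}


def register_handler(scheme, handler_class):
    """Let core code or a plugin claim a URL scheme."""
    _handlers[scheme] = handler_class


def get_handler(url):
    """Factory: build the handler registered for the URL's scheme."""
    # Plain paths have no scheme, so treat them as local files.
    scheme = urlsplit(url).scheme or "file"
    try:
        handler_class = _handlers[scheme]
    except KeyError:
        raise ValueError("no filesystem handler registered for %r" % scheme)
    return handler_class(url)


class LocalHandler:
    """Placeholder handler for local paths."""

    def __init__(self, url):
        self.url = url


class SftpHandler:
    """Placeholder handler a plugin might provide."""

    def __init__(self, url):
        self.url = url


register_handler("file", LocalHandler)
register_handler("sftp", SftpHandler)
```

A plugin would then only need to call register_handler() at import time,
and the rest of the code would go through get_handler(url).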

> 
> Without that, you wind up doing the Template pattern all over the place,
> or else reimplementing the same stuff in each Branch.
> 
> I can imagine it would be pretty easy to do things like locally-cached
> branchstores, or multiple-source branchstores...
> 

Yep. I was just thinking, right now we have these likely storage
mechanisms:

	Current text-store (with each version stored in gz form)
	revfile (with append-only revfile & index files)
	Http (both methods are possible)
	sftp
	smart server
	dumb filesystem storage
	Local revision pool.

I'm not sure how to get them to talk to each other. For instance, the
local rev pool is more like a cache, which should exist for most of the
other methods. And both http and sftp can support either the text-store
or the revfile method.
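
One possible way to layer the cache over the other methods, purely as a
sketch: DictStore and CachingStore are hypothetical names, and the only
assumption is that every store exposes a get(key) method.

```python
class DictStore:
    """Stand-in for a remote store (http, sftp, ...), backed by a dict."""

    def __init__(self, data):
        self._data = dict(data)
        self.fetches = 0  # counts how often the "remote" side is hit

    def get(self, key):
        self.fetches += 1
        return self._data[key]


class CachingStore:
    """Wraps any store with a get() method and remembers what it fetched."""

    def __init__(self, backing, cache=None):
        self._backing = backing
        self._cache = {} if cache is None else cache

    def get(self, key):
        # Only go to the backing store on a cache miss.
        if key not in self._cache:
            self._cache[key] = self._backing.get(key)
        return self._cache[key]
```

With this shape the local pool does not need to know whether the backing
store is http or sftp, or whether it holds text-store or revfile data.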

> | request pipeline protocol. I'm not sure what functions would return a
> | list of entries that we could pipeline, but it would certainly be a
> | speed improvement if you could make a bunch of requests, and then wait
> | for them on the other end.
> 
> You can certainly do something like return a PendingRequest object with
> a finish() method to get the object you're actually interested in.
> 

Certainly. Or even just something that looks like a file, but blocks on
the first actual "read()" request.
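
To make both ideas concrete, here is a minimal sketch using one thread
per request. PendingRequest follows the finish() idea quoted above;
LazyFile and fake_fetch are hypothetical names, and fake_fetch just
stands in for a real network fetch.

```python
import io
import threading


class PendingRequest:
    """Starts work in a background thread; finish() waits for the result."""

    def __init__(self, fetch, *args):
        self._result = None
        self._error = None
        self._thread = threading.Thread(target=self._run, args=(fetch, args))
        self._thread.start()

    def _run(self, fetch, args):
        try:
            self._result = fetch(*args)
        except Exception as e:
            # Remember the failure so finish() can re-raise it.
            self._error = e

    def finish(self):
        self._thread.join()
        if self._error is not None:
            raise self._error
        return self._result


class LazyFile:
    """Looks like a read-only file, but blocks on the first read()."""

    def __init__(self, pending):
        self._pending = pending
        self._buffer = None

    def read(self, size=-1):
        if self._buffer is None:
            self._buffer = io.BytesIO(self._pending.finish())
        return self._buffer.read(size)


def fake_fetch(name):
    # Hypothetical stand-in for a network fetch.
    return ("contents of %s" % name).encode("ascii")


# Issue all the requests first, then collect the results.
files = [LazyFile(PendingRequest(fake_fetch, n)) for n in ("a", "b", "c")]
results = [f.read() for f in files]
```

The caller never sees the PendingRequest; it just gets something it can
read() from, and only waits if the data has not arrived yet.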

> 
> | I think the threads method is a little bit more straightforward to
> | program. As then each object can have it's own downloading thread, and
> | you can just ask the object if it is ready. But I guess a lot of people
> | prefer the select() idiom.
> 
> Yeah, it's more performant, and I've heard bad things about Python's
> threading.  On the other hand, select can be tricky to get right.
> 

I think select() would be much trickier to do correctly. And I *think*
it requires everything to have an OS-level filehandle (a file descriptor
number).
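
For comparison, a small sketch of the select() idiom. Python's
select.select() takes integers or objects with a fileno() method, and on
Windows it only works on sockets, so this sketch uses a socket pair
rather than ordinary files. The variable names are only illustrative.

```python
import select
import socket

# Two connected sockets standing in for a client/server connection.
a, b = socket.socketpair()
try:
    # Nothing has been sent yet, so a zero-timeout poll reports nothing.
    readable, _, _ = select.select([a], [], [], 0)
    nothing_yet = readable == []

    b.sendall(b"ping")

    # Now wait (up to one second) until there is something to read on a.
    readable, _, _ = select.select([a], [], [], 1.0)
    ready = a in readable
    data = a.recv(4) if ready else b""
finally:
    a.close()
    b.close()
```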

The problem with Python threading, as I have come to deal with it, is
that there is a Global Interpreter Lock (GIL). Because Python is
refcounted, you have to make sure 2 threads don't increment/decrement
the same reference count at the same time. Python handles this by only
letting 1 thread run in the VM at any given time. I believe it allocates
time-slices (or VM steps), and can interrupt a running thread.

But if you are in a Python extension (C/C++ code), it cannot interrupt
you. However, in C there is a way to release the GIL, so that you can
indeed have multiple threads running at once.

I have used threads quite extensively, and as long as you are careful,
it seems to work fine.
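
A quick way to see the GIL being released, as a sketch: blocking calls
such as time.sleep() (and, likewise, blocking I/O) let go of the GIL
while they wait, so several waiting threads overlap instead of running
one after another. The function and variable names are made up for the
example.

```python
import threading
import time


def blocking_call(delay, out, i):
    # A blocking call; CPython releases the GIL while it waits.
    time.sleep(delay)
    out[i] = True


out = [False] * 4
threads = [
    threading.Thread(target=blocking_call, args=(0.3, out, i))
    for i in range(4)
]

start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
```

Run one after another, the four calls would take about 1.2 seconds;
because the waits overlap, elapsed should come out close to 0.3.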

I know I had some problems with pyarch/fai under cygwin, but it wasn't
really because of threads as I had thought. Cygwin Python has had
threads for several years now. It probably had more to do with
improperly cleaning up some of the files, as it was getting "no
filehandle (9) exists" kind of errors.

> Aaron
> 

John
=:->