[MERGE] Extend -Dhpss to emit a count of HPSS calls to stderr

Fri Sep 26 00:45:59 BST 2008

Vincent Ladeuil wrote:
> >>>>> "Andrew" == Andrew Bennetts <andrew at canonical.com> writes:
> 
>     Andrew> This patch extends the -Dhpss option to dump a count of HPSS calls to
>     Andrew> stderr for each SmartClientMedium that was used.
> 
>     Andrew> Example output:
> 
>     Andrew> $ ./bzr revno bzr://localhost/branch -Dhpss
>     Andrew> 3731
>     Andrew> HPSS calls: 8 (<bzrlib.smart.medium.SmartTCPClientMedium object at 0x8565dac>)
> 
>     Andrew> I know that it's a bit unusual for a debug flag to
>     Andrew> cause output somewhere other than the log file, but I
>     Andrew> think it's helpful in this case.  I'm open changing
>     Andrew> this behaviour, or to making a new debug flag for
>     Andrew> this separate to the existing -Dhpss, but so far I
>     Andrew> think I like it this way.
> 
> Hmm, it doesn't scale as a debug mean... we always ask for
> .bzr.log in case of bugs.
> 
> '-Dhttp' uses stderr for pycurl because I couldn't find a way to
> use .bzr.log and it's quite a pain to explain to users: "Use
> '-Dhttp' and send .bzr.log. Err, wait, are you using pycurl ?
> Then please send stderr output too".

I mainly want to use this personally, to help me see immediately how
many RPCs a bzr invocation does.  I don't think a total number of calls
is much help for diagnosing bug reports (especially if we get the
.bzr.log file anyway).  It is helpful for me doing interactive testing,
and also as a reminder of how we're doing in my day-to-day use.

Reporting to stderr rather than a log file also makes it a little easier
for something like usertest to capture this statistic.

>     Andrew> Unfortunately I had to resort to using an atexit
>     Andrew> handler to reliably report on each medium that was
>     Andrew> used.  
> 
>     Andrew> Many medium instances are not explicitly
>     Andrew> disconnected, and garbage-collection doesn't reliably
>     Andrew> collect the instance before the end of the process
>     Andrew> (e.g. I saw this happen if there was an error, such
>     Andrew> as a failure to connect to the remote host).
> 
> I encountered the same problem with transport, I don't think we
> can hope it will change any day soon.
> 
> But since you use an atexit handler anyway, you don't need to
> track medium end of lifes. You can just collect and display the
> result in the atexit handler (see below).

I can do just atexit, but I'd rather a more timely report when possible.
It plays much better with the test suite, for instance.

>     Andrew> Although it uses an atexit function as a last resort,
>     Andrew> mostly the code reports on each medium as soon as it
>     Andrew> is garbage collected.  This is implemented using
>     Andrew> weakref callbacks, which avoids the downsides of
>     Andrew> using __del__, and doesn't affect the object's
>     Andrew> lifetime.
> 
> I fully agree :) In fact I resort to the same mean (using
> weakref) for related reasons (circular references) when getting
> rid of multiple inheritance for SmartMedium.
> 
> It looks like we now have two cases for using weakref and I'd
> like others to comment about how comfortable they feel with using
> it.

I don't think there's any great controversy with using weakref.  It
doesn't have much impact on the rest of the code (much less than adding
a __del__ method!).  The main issue I see is that code using it should
be careful to be safe in face of the reference going away (i.e. checking
for “ref() is None”).

>     Andrew> If -Dhpss is not set then the impact of this patch is
>     Andrew> almost zero; one quick check in an infrequently
>     Andrew> called constructor, and one extra module-global
>     Andrew> variable is defined.
> 
>     Andrew> I'm not sure how best to test this.  If we always
>     Andrew> explicitly disconnected our medium instances it'd
>     Andrew> probably be straightforward, but we don't.  We rely
>     Andrew> on garbage collection, and sometimes we rely on the
>     Andrew> process to end, to disconnect our medium instances.
> 
> I'd say test the behavior by explicitly disconnecting and
> consider atexit reliable (or test it in a subprocess ?). In any
> case, test separately.

I'll see what I can do.

>     Andrew> The main drawback to this code is that when -Dhpss is
>     Andrew> used, it will keep adding hook functions for each
>     Andrew> SmartClientMedium that is instantiated.  Those hook
>     Andrew> functions are never unregistered because there's no
>     Andrew> API to do so.  It'd probably be good to fix that, but
>     Andrew> I don't think it's a big deal for debugging code.
> 
> A bit bigger to test it, but I agree otherwise.

I think I will add the facility to uninstall the hook; I don't like the
idea of useless hook functions accumulating and slowing things down.  If
nothing else, that's likely to slow the test suite down when I'm using
-Eallow_debug -Dhpss.

>     Andrew> I have an alternative version of this patch (also
>     Andrew> attached, as a diff relative to this change, I hope
>     Andrew> it doesn't confuse BB), that installs a single hook
>     Andrew> function (rather than one per medium).  The hook
>     Andrew> function then had to find the matching weakref for
>     Andrew> the medium being used, which didn't seem as
>     Andrew> straightforward to me.  Also that approach conflicts
>     Andrew> more with the test suite, which resets hooks after
>     Andrew> each test, but this hook is only installed once per
>     Andrew> process.
> 
> I prefer it anyway as less invasive but that's highly subjective.

I'd rather that the output of “bzr selftest -Eallow_debug -Dhpss” wasn't
then followed by thousands of lines of HPSS calls totals!  That's the
main practical advantage to avoiding atexit (except as a last resort),
and I wouldn't be surprised if there are others.  It just feels cleaner
to tidy up as I go along.

The test suite issue is probably an argument in favour of making it a
new -D flag, though.  When debugging the test suite with -Dhpss I
probably don't care much about the total call summaries.

[...]
>     Andrew> +
>     Andrew> +    def increment(self, params):
>     Andrew> +        """Install this method as a _SmartClient.hooks['call'] hook."""
>     Andrew> +        if params.medium is self.medium_ref():
>     Andrew> +            self.count += 1
> 
> How come params.medium can be different from self.medium_ref() ?

Because one hook function is installed per client medium.

> The toy implementation below gives really different results for
> (in a clone of your branch):
> ./bzr missing lp:bzr
> 
> HPSS calls: 40 (<bzrlib.smart.medium.SmartSSHClientMedium object at 0x8ae21cc>)
> HPSS my calls: 413 (<bzrlib.smart.medium.SmartSSHClientMedium object at 0x8ae21cc>)
> 
> I probably missed something here... which doesn't allow me to
> vote :-/

Because your call_on_medium hook function is installed for each client
medium that is instantiated, it is getting called multiple times per
call.  This will be less of an issue when RemoteTransport stops
inheriting from SmartClientMedium.

If you want to see which is right, count the calls in your ~/.bzr.log by
hand :)

-Andrew.