[PATCH] KnitIndex optimizations

John Arbash Meinel john at arbash-meinel.com
Mon Nov 20 16:15:14 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dmitry Vasiliev wrote:
> 
> Brief overview of the changes in the attached patch:
> 
> - Extracted the knit index file parsing code from __init__() into a
> _load_data() method. For the new code, profiling shows about a 15%
> speedup while parsing all *.kndx files from the Bazaar branch.

Can you reproduce this as part of a benchmark with profiling turned off?
Having worked with and without profiling, I've found that profiling
results are a little bit skewed. They are generally better than nothing,
but a profiler doesn't always weight functions correctly: Python
function calls are relatively more expensive than C function calls, so
lsprof seems to overly penalize a Python function call.

It shouldn't be too hard to create a knit index with several thousand
entries, and then you can benchmark the time it takes to read it a bunch
of times (say 1,000-10,000 entries, opening the file 100 times). We
would like the benchmark to take ~1s, since that makes it reproducible
without spending too much time on a single benchmark.
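The kind of benchmark I have in mind could be sketched roughly like
this (a hypothetical standalone script, not part of bzrlib's benchmark
suite; the record layout below only loosely mimics a .kndx line):

```python
# Hypothetical benchmark sketch: write an index-like file with a few
# thousand lines, then time how long 100 full reads take.
import os
import tempfile
import timeit

def make_index_file(num_entries):
    # Each line loosely mimics a knit index record:
    # "version-id method position size :"  (format is an assumption here)
    fd, path = tempfile.mkstemp(suffix='.kndx')
    with os.fdopen(fd, 'w') as f:
        for i in range(num_entries):
            f.write('version-%d fulltext %d %d :\n' % (i, i * 100, 100))
    return path

def read_index(path):
    # the operation under test: read and split every line
    with open(path) as f:
        return [line.split() for line in f]

path = make_index_file(5000)
try:
    # 100 reads; aim for roughly ~1s total so the result is reproducible
    elapsed = timeit.timeit(lambda: read_index(path), number=100)
    print('100 reads of 5000 entries: %.3fs' % elapsed)
finally:
    os.unlink(path)
```

Running variants of _load_data() through a harness like this, with
lsprof off, would give wall-clock numbers we can compare directly.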


> - Converted loops into list comprehensions where appropriate.
> - Localized attributes for faster access inside loops where appropriate.
> - Added forgotten import.
> - Removed trailing whitespaces.
> 
> Open questions:
> 
> - KnitVersionedFile._check_versions_present() method now just calls
> self._index.check_versions_present(). The possible problem here is that
> RevisionNotPresent exception now gets filename of the index instead of
> the base filename of the knit. I'm not sure whether the exception
> should be caught at the KnitVersionedFile level and re-raised with the
> knit's base filename?

Actually 'RevisionNotPresent' raises based on file_id, not filename, so
the error raised should be the same.

Also, this isn't a user-facing error; it is an error that indicates we
have some sort of corruption or internal error, so it doesn't have to
look pretty to the user.
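To illustrate the point about file_id vs. filename, here is a
hypothetical stand-in (this is not bzrlib's actual class definition,
just a sketch of the relevant behaviour): if the exception is keyed on
the revision and file ids rather than on whichever filename the raising
layer happens to hold, both call sites produce the same error.

```python
# Hypothetical sketch, NOT bzrlib's real RevisionNotPresent definition.
class RevisionNotPresent(Exception):
    def __init__(self, revision_id, file_id):
        self.revision_id = revision_id
        self.file_id = file_id
        Exception.__init__(
            self, 'Revision %s not present in %s' % (revision_id, file_id))
```

Since the message is built from the ids, raising it from the index
object or from the KnitVersionedFile makes no observable difference.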

> - Maybe _load_data() should use the file's iteration protocol instead
> of the readlines() method to parse the index file? A file iterator
> uses a read-ahead buffer, so it should be faster than file.readline()
> and should consume less memory (for big files), while being a little
> bit slower than file.readlines(). For the _load_data() method,
> profiling shows about a 10% speedup for 'for line in fp' versus 15%
> for 'for line in fp.readlines()'.

Remember that the file handle given here isn't always a plain file().
Sometimes it is the return value from urllib.urlopen(), and sometimes
it is an SFTP file.

HTTP probably always does a full request and can stream the data back,
so I would guess 'for line in fp' would be fine.

SFTP might be bad in that situation, since 'fp.readline()' actually
does only an 8K read, which means a round trip for every 8K. Now, I may
be missing an optimization: it seems that SFTPTransport.get() requests
a prefetch of the entire file, so it may already be spooling all of the
data. If that is the case, then 'for line in fp' should be fine.

For the smart transport, we also just spool all of the data before
returning, so it shouldn't be a problem there either. This is one place
where we should start looking into being able to read only *some* of
the data before yielding it. But I digress.
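For reference, the three access patterns being weighed could be
sketched like this (plain functions over any file-like object; the
trade-offs above concern what the underlying transport does per call):

```python
import io

def parse_readlines(fp):
    # reads the whole file into a list of lines up front (highest memory)
    return [line.rstrip('\n') for line in fp.readlines()]

def parse_iter(fp):
    # iterates using the object's read-ahead buffer; lower peak memory,
    # but may mean many small reads on a remote file-like object
    return [line.rstrip('\n') for line in fp]

def parse_read(fp):
    # one bulk read, then split in memory: a single round trip for
    # remote handles that lack a cheap buffered readline()
    return fp.read().splitlines()

data = 'rev-1 fulltext 0 100 :\nrev-2 line-delta 100 80 :\n'
for parse in (parse_readlines, parse_iter, parse_read):
    assert parse(io.StringIO(data)) == data.splitlines()
```

All three produce the same lines; the difference is purely in how many
reads hit the transport and how much memory is held at once.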


v- You should at least do the pb.update() inside the try/finally, so
that if an exception is raised, the pb will still be finished correctly.

Ultimately, though, I have a patch which removes this code, because we
actually spend a lot of time in it and it doesn't really help the
progress bar in the general sense.
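The ordering point is just the usual try/finally shape; a minimal
sketch (ProgressBar and load_index here are stand-ins, not bzrlib's
real ui_factory objects):

```python
# Minimal stand-in for a progress bar that records cleanup.
class ProgressBar:
    def __init__(self):
        self.finished_called = False
    def update(self, msg, cur, total):
        pass
    def finished(self):
        self.finished_called = True

def load_index(pb, fetch):
    try:
        # update() is inside the try, so even if it (or fetch()) raises,
        # the finally clause still finishes the progress bar
        pb.update('read knit index', 0, 1)
        return fetch()
    finally:
        pb.finished()
```

With update() outside the try, an exception raised by it would leave
the nested progress bar unfinished; inside, cleanup always runs.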

> @@ -1072,83 +1068,78 @@
>          # in the knit.
>          self._history = []
>          pb = bzrlib.ui.ui_factory.nested_progress_bar()
> +        pb.update('read knit index', 0, 1)
>          try:
> -            count = 0
> -            total = 1
>              try:
> -                pb.update('read knit index', count, total)
>                  fp = self._transport.get(self._filename)
> +            except NoSuchFile:
> +                if mode != 'w' or not create:
> +                    raise
> +                elif delay_create:
> +                    self._need_to_create = True
> +                else:
> +                    self._transport.put_bytes_non_atomic(
> +                        self._filename, self.HEADER, mode=self._file_mode)
> +            else:
>                  try:

v- I don't believe we follow this reversed-comparison structure in 'if'
blocks. It is unnatural to most people, and Python doesn't support:

if value[0] = '.':

It actually raises a SyntaxError, so the workaround isn't actually needed.

> +                if '.' == value[0]:
> +                    # uncompressed reference
> +                    parents.append(value[1:])
> +                else:

and here

>          result = []
>          for value in compressed_parents:
> -            if value[-1] == '.':
> +            if '.' == value[0]:
>                  # uncompressed reference
>                  result.append(value[1:])
>              else:
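A quick demonstration of why the reversed comparison isn't needed: the
accidental-assignment bug that "Yoda conditions" guard against in C
simply cannot compile in Python.

```python
# Assignment inside an 'if' condition is rejected at compile time.
try:
    compile("if value[0] = '.':\n    pass", '<demo>', 'exec')
    yoda_guard_needed = True
except SyntaxError:
    yoda_guard_needed = False

# So the natural ordering is safe and reads better:
value = '.reference-id'
if value[0] == '.':
    uncompressed = value[1:]
```

Since the typo is caught by the compiler, writing the constant first
buys nothing and just costs readability.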


So I'm generally positive about the changes (+0.8 or so). If you can
write a benchmark that shows these are actual improvements, rather than
just --lsprof improvements, I would be happier.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFYdSSJdeBCYSNAAMRAr8tAKCvLNqH1kFIKTH1Gm6jmM5ahcxT1gCeK44V
r/EuLYu7eqn+m4Yc4AESXt0=
=0g4X
-----END PGP SIGNATURE-----

More information about the bazaar mailing list