[MERGE] sftp readv performance tweaks (2x faster)
Martin Pool
mbp at canonical.com
Fri Jul 21 00:23:42 BST 2006
On 17 Jul 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> Well, I ran scripts all weekend to figure out how fast I could make sftp
> with my changes. And the attached image shows how it went.
Very nice. Given the improvement, it would be good for 0.9 - do you
have any reservations?
> + @staticmethod
> + def _coalesce_offsets(offsets, limit, fudge_factor):
> + """Yield coalesced offsets.
> +
> + With a long list of neighboring requests, combine them
> + into a single large request, while retaining the original
> + offsets.
> + Turns [(15, 10), (25, 10)] => [(15, 20, [(0, 10), (10, 10)])]
> +
> + :param offsets: A list of (start, length) pairs
> + :param limit: Only combine a maximum of this many pairs
> + Some transports penalize multiple reads more than
> + others, and sometimes it is better to return early.
> + 0 means no limit
> + :param fudge_factor: All transports have some level of 'it is
> + better to read some more data and throw it away rather
> + than seek', so collapse if we are 'close enough'
> + :return: a generator of [(start, total_length,
> + [(inner_offset, length)])]
> + (start, total_length) is where to start, and how much to read
> + and (inner_offset, length) is how those chunks should be
> + split up.
These lists are getting to a size/complexity where they might be better
as trivial objects rather than a list - with say .start, .total_length,
.ranges. Unless there's a compelling performance reason, it's better not
to store objects with multiple fields inside a list or tuple. It might
make the tests a bit more readable too.
This couldn't be a generator then, but does it get any value from being
one? Since you sort the input it can't really stream anyway.
Just a thought - if you think it's better this way you can leave it.
> + """
> + last_end = None
> + cur_start = None
> + cur_total = None
> + cur_subranges = []
> +
> + for start, size in offsets:
> + end = start + size
> + if (last_end is not None
> + and start <= last_end + fudge_factor
> + and start >= cur_start
> + and (limit <= 0 or len(cur_subranges) < limit)):
> + cur_total = end - cur_start
> + cur_subranges.append((start-cur_start, size))
> else:
> - if (len (combined_offsets) < 50 and
> - combined_offsets[-1][0] + combined_offsets[-1][1] == offset):
> - # combatible offset:
> - combined_offsets.append([offset, size])
> - else:
> - # incompatible, or over the threshold issue a read and yield
> - pending_offsets.appendleft((offset, size))
> - for result in do_combined_read(combined_offsets):
> - yield result
> - combined_offsets = []
> - # whatever is left is a single coalesced request
> - if len(combined_offsets):
> - for result in do_combined_read(combined_offsets):
> - yield result
> + if cur_start is not None:
> + yield (cur_start, cur_total, cur_subranges)
> + cur_start = start
> + cur_total = size
> + cur_subranges = [(0, size)]
> + last_end = end
> +
> + if cur_start is not None:
> + yield (cur_start, cur_total, cur_subranges)
> +
> + return
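For anyone following along, here is the coalescing logic from the hunk
above pulled out as a standalone sketch - same algorithm, just renamed so
it can be run on its own. Note it assumes the offsets are already sorted
by start position, since the caller sorts before calling:

```python
def coalesce_offsets(offsets, limit=0, fudge_factor=0):
    """Yield (start, total_length, [(inner_offset, length)]) tuples.

    Neighbouring (start, length) requests are combined into one larger
    read when they are within fudge_factor bytes of each other, up to
    limit sub-ranges per combined read (0 means no limit).
    """
    last_end = None
    cur_start = None
    cur_total = None
    cur_subranges = []

    for start, size in offsets:
        end = start + size
        if (last_end is not None
                and start <= last_end + fudge_factor
                and start >= cur_start
                and (limit <= 0 or len(cur_subranges) < limit)):
            # Close enough to the pending read: extend it and record
            # where this chunk sits inside the combined result.
            cur_total = end - cur_start
            cur_subranges.append((start - cur_start, size))
        else:
            # Too far away (or over the limit): flush the pending read
            # and start a new one.
            if cur_start is not None:
                yield (cur_start, cur_total, cur_subranges)
            cur_start = start
            cur_total = size
            cur_subranges = [(0, size)]
        last_end = end

    if cur_start is not None:
        yield (cur_start, cur_total, cur_subranges)
```

The docstring's example works as advertised:
list(coalesce_offsets([(15, 10), (25, 10)])) gives
[(15, 20, [(0, 10), (10, 10)])].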
>
> def get_multi(self, relpaths, pb=None):
> """Get a list of file-like objects, one for each entry in relpaths.
>
> === modified file 'bzrlib/transport/sftp.py'
> --- bzrlib/transport/sftp.py 2006-06-30 15:13:19 +0000
> +++ bzrlib/transport/sftp.py 2006-07-17 18:14:33 +0000
> @@ -426,25 +434,6 @@
> except (IOError, paramiko.SSHException), e:
> self._translate_io_exception(e, path, ': error retrieving')
>
> - def get_partial(self, relpath, start, length=None):
> - """
> - Get just part of a file.
> -
> - :param relpath: Path to the file, relative to base
> - :param start: The starting position to read from
> - :param length: The length to read. A length of None indicates
> - read to the end of the file.
> - :return: A file-like object containing at least the specified bytes.
> - Some implementations may return objects which can be read
> - past this length, but this is not guaranteed.
> - """
> - # TODO: implement get_partial_multi to help with knit support
> - f = self.get(relpath)
> - f.seek(start)
> - if self._do_prefetch and hasattr(f, 'prefetch'):
> - f.prefetch()
> - return f
> -
Was this just no longer used?
--
Martin
More information about the bazaar mailing list