bug? find_branches or find_bzrdir tries to open all files as bzr locations
Alexander Belchenko
bialix at ukr.net
Tue May 17 22:02:29 UTC 2011
17.05.2011 23:53, Aaron Bentley wrote:
> On 11-05-16 04:48 AM, Alexander Belchenko wrote:
>> `bzr qlog` run without any parameters inside a shared repository tries
>> to detect all available branches and show the log for all of them. It
>> often starts very slowly when there are many branches with many files.
>
>> So I think it could perform better on a local filesystem if it skipped
>> plain files. Perhaps symlinks should be inspected as well as
>> directories, I don't know.
>
> I'm going to assume two things:
> 1. when we list files, we don't get their filetype
> 2. statting a file to determine its filetype takes roughly the same
> amount of time as a failed attempt to open ".bzr/branch-format" where
> ".bzr" is a file.
>
> If those assumptions are true, the fastest thing we can do is attempt to
> open .bzr/branch-format. Testing the filetype would introduce an extra
> filesystem call, which would be slower.

If I understand correctly, when bzr tries to find a branch, it tries
to open the location with every registered (supported) branch format.
So if the first attempt (with the first or default registered format)
fails, bzr tries to open the location with the next format, right?
Foreign-VCS plugins also add more branch formats that have to be
checked.

If this is correct, then for every directory, file, or symlink bzr
tries to open a branch several times, and so makes many syscalls each
time. Is that correct?

As far as I can see, there are 6 branch formats registered in the
bzr 2.3 codebase:

BranchFormat.register_format(__format5)
BranchFormat.register_format(BranchReferenceFormat())
BranchFormat.register_format(__format6)
BranchFormat.register_format(__format7)
BranchFormat.register_format(__format8)
BranchFormat.set_default_format(__format7)
_legacy_formats = [BzrBranchFormat4(),
    ]
network_format_registry.register(
    _legacy_formats[0].network_name(), _legacy_formats[0].__class__)

So for every file (which I believe can never be opened as a branch) bzr
spends at least 6 syscalls just to find out that it is not a
branch. As I understand it, a single syscall would be enough to
determine that there is no branch. Is that correct?

So the real improvement here is only for directories, where 1 extra
syscall can be saved.

If you run code that uses find_branches over a tree with a *huge*
number of files, you get 6*hugeN units of useless work, while it could
be just 1*hugeN. A sixfold difference is worth thinking about.
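
To make the arithmetic concrete, here is a rough sketch. It only counts probes; it does not touch bzrlib. The name probe_counts is made up for this mail, and the cost model (one probe per registered format per entry, versus one filetype check per entry plus format probes for directories only) is my assumption from the format list above, not something I measured.

```python
def probe_counts(num_files, num_dirs, num_formats=6):
    """Return (naive, filtered) probe counts under the assumed cost model.

    naive:    every entry (file or directory) is probed once per format.
    filtered: every entry costs one filetype check; only directories
              are then probed once per format.
    """
    entries = num_files + num_dirs
    naive = entries * num_formats
    filtered = entries + num_dirs * num_formats
    return naive, filtered


if __name__ == "__main__":
    naive, filtered = probe_counts(num_files=100000, num_dirs=1000)
    print("naive probes:    %d" % naive)
    print("filtered probes: %d" % filtered)
```

Under this model directories get slightly more expensive (one extra check each), but when files dominate the total drops to roughly one probe per entry.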

When I worked on the heads plugin (which is now part of bzrtools) I
avoided the find_branches() API for local repos/branches and instead
walked the tree with os.walk, trying to open only directories as
branches. That was very fast compared to the general `bzr branches`
execution time.
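
A minimal sketch of that directories-only walk, using only the standard library. This is not the actual heads plugin code. The function name find_branch_dirs is hypothetical, and treating a ".bzr/branch" subdirectory as the marker of a branch is a layout assumption for illustration (old all-in-one formats are laid out differently); real code would try to open the directory as a branch at that point instead.

```python
import os


def find_branch_dirs(root):
    """Yield directories under root that look like bzr branches.

    Only directories are inspected; plain files are never probed.
    Assumption: a branch is marked by a '.bzr/branch' subdirectory.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if '.bzr' in dirnames:
            # Do not descend into the control directory itself.
            dirnames.remove('.bzr')
            if os.path.isdir(os.path.join(dirpath, '.bzr', 'branch')):
                yield dirpath


if __name__ == "__main__":
    for path in find_branch_dirs(os.getcwd()):
        print(path)
```

The walk keeps descending below a branch on purpose, so nested branches are still found; working-tree files are listed by os.walk but never opened.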

So if my analysis above is correct, and every location is tried as a
branch in each of the formats, then I have reason to think that
checking whether a location is a plain file, for *local* filesystem
access, will make find_branches() much faster.

--
All the dude wanted was his rug back