'bzr status' stats each file multiple times

John A Meinel john at arbash-meinel.com
Sat Dec 3 05:35:39 GMT 2005


I have a new tree which has quite a few files:
$ bzr info
branch format: Bazaar-NG branch, format 6

in the working tree:
      1646 unchanged
         0 modified
         0 added
         0 removed
         0 renamed
         0 unknown
       805 ignored
       127 versioned subdirectories

branch history:
        63 revisions
         1 committer
         4 days old
   first revision: Mon 2005-11-28 09:02:58 -0600
  latest revision: Fri 2005-12-02 16:57:37 -0600

revision store:
        64 revisions

I've noticed that "bzr status" on a clean tree seems to take quite a
while. So I ran it through strace, and found that each file was stat'ed
4 times.
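For anyone who wants to repeat the count, here is a rough sketch of how the per-file stat calls could be tallied from a saved strace log. The function name and the sample lines are made up for illustration, and it assumes each syscall sits on one line with no pid prefix (i.e. strace was not run with -f).

```python
import re
from collections import Counter

# Matches path-based stat calls (stat, lstat, stat64, lstat64) and captures
# the quoted path. fstat/fstat64 take a file descriptor, so they are skipped.
STAT_CALL = re.compile(r'^l?stat(?:64)?\("([^"]*)"')


def count_stat_calls(trace_lines):
    """Return a Counter mapping each path to its number of stat calls."""
    counts = Counter()
    for line in trace_lines:
        match = STAT_CALL.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, shaped like strace output.
sample = [
    'lstat64("a/file.txt", {st_mode=S_IFREG|0644, st_size=10, ...}) = 0',
    'lstat64("a/file.txt", {st_mode=S_IFREG|0644, st_size=10, ...}) = 0',
    'open("a/file.txt", O_RDONLY|O_LARGEFILE) = 5',
    'fstat64(5, {st_mode=S_IFREG|0644, st_size=10, ...}) = 0',
]
counts = count_stat_calls(sample)
for path, n in counts.most_common():
    print(n, path)
```

Any path with a count above 1 is a candidate for a redundant stat.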

I'm not sure what is causing this, but since I believe the slow part of
tla was the fact that it had to stat too many files, I'm thinking 4
stats is probably too many.

We probably require at least 2, though it probably would be nice if we
could do it in 1.

Also, it does seem to be issuing an "open()" call for every file. I
thought the point of the hash-cache was that it wouldn't actually have
to issue an open.
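To make the expectation concrete, here is a minimal sketch of a stat-keyed cache. This is not bzr's hash-cache code; the class name, the fingerprint fields, and the hit/miss counters are all assumptions for illustration. The idea is one lstat per file, and an open() only when the stat fingerprint differs from the cached one.

```python
import hashlib
import os
import tempfile


class StatCache:
    """Hypothetical cache: path -> (stat fingerprint, sha1 hex digest)."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _fingerprint(st):
        # Fields chosen for illustration; a real cache may use others.
        return (st.st_size, st.st_mtime_ns, st.st_ino, st.st_mode)

    def get_sha1(self, path):
        st = os.lstat(path)  # the only stat call made for this file
        fingerprint = self._fingerprint(st)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fingerprint:
            self.hits += 1
            return cached[1]  # no open() needed
        self.misses += 1
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        self._entries[path] = (fingerprint, digest)
        return digest


# Usage: the second lookup of an unchanged file should not reopen it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello\n")
    cache = StatCache()
    first = cache.get_sha1(path)
    second = cache.get_sha1(path)
```

With a cache like this, a second run over an unchanged tree would show one lstat per file and no open() calls in the trace.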

I'm also not positive that it isn't doing a sha1sum, since this is what
I'm seeing:
gettimeofday({1133587176, 848039}, NULL) = 0
lstat64(".../docs/CHANGES.310", {st_mode=S_IFREG|0644, st_size=963, ...}) = 0
lstat64(".../dcmsr/docs/dsr2xml.man", {st_mode=S_IFREG|0644, st_size=5876, ...}) = 0
lstat64(".../dcmsr/docs/dsr2xml.man", {st_mode=S_IFREG|0644, st_size=5876, ...}) = 0
open(".../dcmsr/docs/dsr2xml.man", O_RDONLY|O_LARGEFILE) = 5
fstat64(5, {st_mode=S_IFREG|0644, st_size=5876, ...}) = 0
_llseek(5, 0, [0], SEEK_CUR)            = 0
read(5, "/*!\n\n\\if MANPAGES\n\\page dsr2xml "..., 130000) = 5876
read(5, "", 65000)                      = 0
mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb778a000
read(5, "", 130000)                     = 0
munmap(0xb778a000, 135168)              = 0
close(5)                                = 0

Now, I'm not sure what it is doing, but it seems to be reading in rather
large chunks, which is what the sha_file code does.
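For reference, a chunked hashing loop of the general kind I mean looks roughly like the sketch below. It is not the actual sha_file source; the function name and the default chunk size are assumptions. A loop like this ends on an empty read, which would be consistent with the trailing empty read() calls in the trace.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Hash a binary stream by reading fixed-size chunks until EOF."""
    digest = hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of file
            break
        digest.update(chunk)
    return digest.hexdigest()


# Usage on an in-memory stream the same size as the traced file.
data = b"x" * 5876
result = sha1_of_stream(io.BytesIO(data))
```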

I'm also seeing the same file get lstat'd 2 times, then it gets opened
and fstat'ed (probably to get the size) before it is all read in.

Anyway, it is probably something we want to look into. It seems there is
a lot of room for performance improvement.

John
=:->

PS> Yes, this was traced the *second* time I ran 'bzr st', not the
first, so the hash-cache should be up-to-date.
