Excess data size for a single revision

John Arbash Meinel john at arbash-meinel.com
Mon Jan 23 10:18:22 UTC 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 1/23/2012 10:44 AM, Eli Zaretskii wrote:
>> From: Martin Pool <mbp at sourcefrog.net> Date: Mon, 23 Jan 2012
>> 18:12:18 +1100 Cc: bazaar at lists.canonical.com
>> 
>> On 23 January 2012 17:41, Eli Zaretskii <eliz at gnu.org> wrote:
>>> But why would a single merge result in so much meta-data?
>> 
>> I think the shortest path to a nontrivial answer to that is to
>> run the command with -Dhpss and have a look at what the traffic
>> actually comprises.
> 
> OK, but how can I run that command, when the offending revision is 
> already in my repository?  What am I missing?
> 

Some other bits that you could do:

1) Look at .bzr/repository/packs/*
There should be a fairly recent file that contains that pull. See what
size it is. If it is comparable to the 58MB, then we genuinely
downloaded content that we wanted to keep. If it is a lot smaller,
then it is possible that the data is stored inefficiently on the
server side, and we copied it and compressed it on the fly before
writing it locally.
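
As a rough sketch of that check, something like the following lists the
newest pack files together with their sizes in bytes. The function name
newest_packs and its two arguments are only illustrative, and it assumes
a POSIX shell and that the pack files end in .pack:

```shell
# Print "<size-in-bytes> <path>" for the most recently modified pack files.
# Hypothetical helper: $1 is the repository directory, $2 how many to show.
newest_packs() {
    repo=${1:-.bzr/repository}
    count=${2:-3}
    # ls -t sorts by modification time, newest first.
    ls -t "$repo"/packs/*.pack 2>/dev/null | head -n "$count" |
    while read -r pack; do
        # wc -c pads its output on some systems, so strip the spaces.
        printf '%s %s\n' "$(wc -c < "$pack" | tr -d ' ')" "$pack"
    done
}

# Only run it when we are actually at the root of a branch or repository.
if [ -d .bzr/repository/packs ]; then
    newest_packs .bzr/repository 3
fi
```

The pack written by the pull should be at or near the top, and its size
is the number to compare against the 58MB.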


2) Use the name of that file to then inspect the associated index
files (tix = text content, rix = revision, cix = inventory stuff,
iix/six are probably not very interesting). For example, my most
recent file in bzr is: c0ba9a41c20d1b447d3b603361b63bbf.pack

You can use

 head -n5 .bzr/repository/indices/c0ba9a41c20d1b447d3b603361b63bbf.tix

to get the summary information, and you can use this to get the detail:

 bzr dump-btree [--raw] .bzr/repository/indices/c0ba9a41c20d1b447d3b603361b63bbf.tix

Note that it will probably be a bit verbose, but if you look around in
it, you can see how many files are affected, etc. In my case, a 2.7MB
bzr pack file had 1712 entries in the .tix (1700 files were affected),
549 entries in .rix (549 total revisions), and 2,159 entries in .cix
(which has to do with inventory management.)

With the raw data, you can start working out what the size on disk
actually consists of.
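
To look at the summary of every index that belongs to one pack in a
single pass, a loop along these lines may help. It is only a sketch: the
function name index_summaries is made up here, and it assumes the index
files live in an "indices" directory next to "packs" and share the pack's
hash as their base name:

```shell
# Print the first five lines (the text header) of each index for one pack.
# Hypothetical helper: $1 is the repository directory, $2 the pack's hash.
index_summaries() {
    repo=$1
    pack_hash=$2
    for ext in tix rix cix iix six; do
        idx="$repo/indices/$pack_hash.$ext"
        # Skip any index file that does not exist in this repository.
        [ -f "$idx" ] || continue
        echo "== $ext"
        head -n 5 "$idx"
    done
}

# Example call, using the pack name from above; does nothing elsewhere.
if [ -d .bzr/repository/indices ]; then
    index_summaries .bzr/repository c0ba9a41c20d1b447d3b603361b63bbf
fi
```

Any index that looks unexpectedly large is then a good candidate for a
closer look with bzr dump-btree.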

3) If you want to test the fetch again, you can create a new
repository, and branch your old revision into it (so it shouldn't copy
any new data) and then do the fetch again. So something like:

bzr branch -r 106888 . ../../somewhere-not-in-the-shared-repo --no-tree
cd ../../somewhere-not-in-the-shared-repo
bzr pull bzr+ssh://eliz@bzr.savannah.gnu.org/emacs/trunk -Dhpss


John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8dM+4ACgkQJdeBCYSNAAPLUQCgjiWlLOOdZt2HWtffre65nKF9
pdEAoNA41yZ886dYC0RU0b9ObAfFeR9r
=qcMf
-----END PGP SIGNATURE-----


