Another look at bzr network traffic

Maritza Mendez martitzam at gmail.com
Sun Apr 4 04:50:37 BST 2010


Dear Bazaar Developers,

I've been mostly absent from the mailing list for many months.  So I'll make
up for it with a ridiculously long post.  :)

This post is not intended to offend anyone.  I'm not working for "those
other guys."  I understand this post may stir some passions.  If that
happens, I will remain as engaged as my obligations permit.  Please
understand it may take a day or so for me to respond.

I've been busy getting work done.  I'm happy to say, a big chunk of that
work has been done with bzr.  We've gained a modest but I think fair
perspective on how bzr fits our commercial workflow.  The news is good.  And
before I say anything else I offer my sincere thanks to all of you for a
very useful vcs with many advantages over other version control systems.

Now I must touch on a topic that has already been beaten to death technically,
perhaps from a more human vista.  And that is the amount of network
traffic compared to the size of the total change-set.  Previous discussions
have typically begun with an observation that the network traffic is much
larger than the user change-set.  That is a false comparison because, as has
also been noted, one must consider the changes both to the user data and to
the branch history stored in .bzr.  Many technical insights were provided
here, which I comprehend in general even though I lack the precision of a
bzr core developer.

Before we get into any specifics, I want to say that I believe this is more
of a perceived problem than a real problem.  When you look at all the
time-saving advantages of bzr workflow, complaining about network traffic
seems like a cheap shot.  So, how do we debunk it on emotional as well as
technical grounds?  We get data.

Now put yourself in my position.  I have been running pilot projects in the
real world for over a year to evaluate bzr and to share my enthusiasm for
bzr with the people who work for me.  They've posed some tough questions to
me, and with your help on this listserv I think I did pretty well as
an evangelista.  But at the end of the day I'm a businesswoman, and I can't
put millions of dollars of revenue and over 2500 KLOC spread over 15K files
at risk on faith.  So we did pilot projects.  Again, the results were
great.  But besides the safety of our code and flexible workflow, morale
also matters. (If morale did not matter, I doubt bzr would exist.)  And the
*one* thing which bugs us (because we love almost everything else about bzr)
is the network traffic and the corresponding time my developers spend
waiting for version control instead of designing, coding, and testing.
Or... avoiding good version control practices because of the time taken by
network traffic.  There's that pesky emotional response having a measurable
impact on productivity.

I am not permitted to share any more specifics than I already have about the
size and content of our code base.  Trust me, it's nothing unusual.  We can
have a good discussion with an example we can all look at together openly:
bzr on launchpad, using my ssh keys.  Ok, sorry, you'll need your own keys.
:)  I hope that either I will be convinced that the network traffic is
worthwhile or the bzr community will be motivated to find efficiencies or
both.

I am attaching a table in CSV format, based on the output of the following
observations made with bzr.dev itself.  Our own experience on projects
significantly larger than bzr is similar on a percentage basis.  So while we
have not proved that this scales linearly, our experience does not
contradict that assumption.

$ bzr -Dbytes branch lp:bzr -r 5057
Branched 5057 revision(s).
Transferred: 70790KiB (308.7K/s r:70784K w:6K)
$ cd bzr
$ du . -s
73004    .
$ du . -s --exclude .bzr
23172    .
$ du .bzr -s
49832
$ bzr -Dbytes pull
 ...
All changes applied successfully.
Now on revision 5131.
Transferred: 9478KiB (442.0K/s r:9464K w:15K)
$ du . -s
74776    .
$ du . -s --exclude .bzr
23288    .
$ du .bzr -s
51488

Someone will point out that 74 revisions is a long time to go without a
pull.  But I wanted to capture enough traffic (approaching 10MB) to avoid
the statistics of small numbers.

Again let me say that I think it is useless to compare the network traffic
to the change in the size of the user's content.  That would be unfair and
miss the point.  History matters, otherwise we would not be using a vcs at
all.  So instead let us try to develop some better metrics for setting
goals.  The user content increased by 0.5% and this triggered a 3.3%
increase in history but only a 2.4% increase overall.  So on a percentage of
total data basis, one unit of user change drove almost five units of total
change.  That's an interesting metric.  It might suggest a practical bound
on the capability of the current format, but that's not the whole story.
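
For anyone who wants to check the percentages, here is a minimal Python
sketch.  It only replays the arithmetic on the du figures from the session
above.  The variable and function names are made up for illustration, and
it assumes du is reporting its default 1 KiB blocks.

```python
# Sizes as printed by `du -s` before and after the pull
# (assumed to be in 1 KiB blocks, the GNU du default).
before = {"total": 73004, "user": 23172, "history": 49832}
after = {"total": 74776, "user": 23288, "history": 51488}


def pct_growth(old, new):
    """Percentage increase going from old to new."""
    return 100.0 * (new - old) / old


growth = {key: pct_growth(before[key], after[key]) for key in before}
for key, value in growth.items():
    print(f"{key:8s} grew by {value:.1f}%")

# Units of total growth per unit of user growth, on a percentage basis.
leverage = growth["total"] / growth["user"]
print(f"total growth / user growth = {leverage:.1f}")
```

The last line is the "almost five" figure: the percentage growth of the
whole tree divided by the percentage growth of the user content.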

Let's look at the traffic.  Some people look at the traffic as a percentage
of the change.  And there is some merit to that, provided we look at the
*total* change and not just the user change.  After all, history has value!
Now, a ratio of 534% does not sound nearly as bad as 8159%.  (See the CSV
attached.) But 534% does not sound so hot either.  At some point, a smart
developer asks how much more expensive would it be to just get a whole new
branch?  We know the answer, of course.  We see that the traffic was only
13% of the final total size.  So we declare victory, right?  Not so fast,
please.
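
The three ratios in this paragraph can be reproduced with a short Python
sketch.  One assumption to flag loudly: the figures match when the received
bytes (the r:9464K part of the pull summary) are used as the traffic, so
that is what the sketch uses.  The names are illustrative only, and du
blocks are assumed to be 1 KiB.

```python
# Traffic for the pull: the "r:" (received) figure from `bzr -Dbytes pull`.
# Assumption: the quoted percentages are based on received bytes only.
received_kib = 9464

# Size changes measured with `du -s` (assumed 1 KiB blocks).
user_delta_kib = 23288 - 23172    # growth of the working tree
total_delta_kib = 74776 - 73004   # growth of working tree plus .bzr
final_total_kib = 74776           # whole branch after the pull


def as_percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


traffic_vs_user = as_percent(received_kib, user_delta_kib)
traffic_vs_total = as_percent(received_kib, total_delta_kib)
traffic_vs_branch = as_percent(received_kib, final_total_kib)

print(f"traffic / user change:  {traffic_vs_user:.0f}%")
print(f"traffic / total change: {traffic_vs_total:.0f}%")
print(f"traffic / final size:   {traffic_vs_branch:.0f}%")
```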

Developers are brilliant organic optimization engines.  If you poke them,
they will adjust to get poked less.  Now what is the #1 historical complaint
managers and developers have made about version control in general?
"Merging is scary.  Avoid it.  Let someone else do it."  Tools like bzr
strive to remove this excuse by making merges painless.  Ay!  But painless
merges presuppose minimal conflicts which in turn imply that you have been
keeping your branch up to date with the branch with which you need to
merge.  For speedy developers like mine, with features broken down into
well-specified tasks, every developer merges to an integration branch twice
per day on average.  That means there is also a lot of 'bzr pull'
happening.  And so there is a lot of network traffic.  So here we are.  A
developer says... if we use our bzr example... instead of doing 'bzr pull'
every time I notice there has been a merge, I will 'bzr pull' only after
every int( 75 MB (100%) / 9.5 MB (13%) ) = 7 merges by my coworkers.  The
total traffic will be about the same, but I can take the hit all at once
while I am at lunch instead of "wasting" my productive time on version
control.  The unfortunate consequence is that developers put off doing
merges and the pace of progress slows to a crawl.  Predictably, merges get
scary again.  It seems we are back where we started!
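
The developer's back-of-the-envelope calculation looks like this as a small
Python sketch.  The function name is hypothetical, and the 75 MB and 9.5 MB
inputs are the rounded figures from the paragraph above, not new
measurements.

```python
def pulls_per_full_branch(branch_mb, pull_mb):
    """How many pulls of pull_mb fit into the cost of one fresh branch."""
    return int(branch_mb / pull_mb)


# Rounded figures from the bzr.dev example: a full branch is about 75 MB
# and the observed pull cost about 9.5 MB of traffic.
threshold = pulls_per_full_branch(75.0, 9.5)
print(f"pull after every {threshold} merges")
```

The point is not the exact number but the incentive: once a developer can
compute a threshold like this, she has a reason to batch her pulls.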

(Ironically, central vcs has less of this problem than dvcs (not just bzr)
because the server already knows almost everything it needs to know to send
the client just what she needs.)

My solution is to reward activities which add value (like peer review) and
thereby indirectly force developers to stay up to date.  Such are the
deceits of management.  And all of this is just a long way of saying we love
bzr enough to be honest about both what it does well and where it could
improve.  If the wizards among you (not naming names here, but the bzr
community is blessed) have any ideas for making 'bzr pull' more economical,
I think that should be a major focus for Bazaar 3.0.  Bazaar already rules
in just about every other category that matters.  The first killer app of
dvcs circa 2005 was merge tracking.  The second killer app circa 2007 was
renaming, as per Shuttleworth.  The third killer app might be driving the
ratio of traffic to total change down from 5 to 2.


With Sincere Thanks and Best Regards,
Maritza
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Bazaar_Traffic.csv
Type: text/csv
Size: 351 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20100403/21658506/attachment-0001.csv 