Related question for you guys... Given that the memory consumed by (or triggered by) a given file has at least three drivers (the actual size of the file, the number of revisions of the file, and the dynamic range of the sizes of the deltas committed for the file over time), is there any advantage to splitting large versioned binaries into their own repo?

Suppose we split our BigDaddyRepo into two repos: BigBinariesRepo (still pretty big) and MostlyTextRepo (perhaps only 20% as big). Obviously, the small repo gets a huge advantage in size and speed. But suppose that BigBinariesRepo really isn't static and gets commits just as frequently as MostlyTextRepo. (Imagine a database-driven product with frequent versioned updates to the databases.) So there is just as much activity as before. Do you expect the two repos in aggregate to perform better, worse, or the same as one BigDaddyRepo? In other words, is there any advantage to combined storage, and to what extent is that benefit negated by mixing binary and text data in a single repo? Or are the two sets of data+history essentially non-interacting?

I guess what I'm really asking for is a primer on the big-O({N},{n}) scaling of bzr, where {N} is the ordered set of integers giving the number of revisions of each type of data (binary and text) and {n} is the ordered set of average file sizes for the corresponding types. I realize the question will have different answers for different operations. The ones most interesting to me are branch, commit and stat.

I don't expect anyone to produce a treatise on this, but general advice on whether and when it makes sense to break up repos based on size and content type would be super useful. No matter how fast you guys make bzr, there will always be a user pushing the limits, and knowing how to keep ourselves out of trouble could be just as valuable as knowing how to get out of trouble.
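Short of a big-O primer, one way to get numbers for branch, commit and stat is to run the same operation against BigDaddyRepo and against each of the split repos and record the peak resident memory of every run. Below is a minimal sketch of such a measurement, assuming a POSIX system and Python 3.9+. The helper name `peak_rss_kib` is hypothetical; `os.posix_spawnp`, `os.wait4` and `os.waitstatus_to_exitcode` are standard-library calls. It only reports what the operating system says about the child process, and makes no claim about how bzr itself allocates memory.

```python
import os
import sys


def peak_rss_kib(argv):
    """Run argv to completion and return (exit_code, peak_rss_kib).

    The peak resident set size comes from the rusage that os.wait4()
    reports for this one child, so earlier runs do not inflate the
    result of later ones.
    """
    # posix_spawnp searches PATH for argv[0], like a shell would.
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    exit_code = os.waitstatus_to_exitcode(status)
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return exit_code, usage.ru_maxrss // 1024
    return exit_code, usage.ru_maxrss
```

For example, `peak_rss_kib(["bzr", "branch", "BigDaddyRepo/trunk", "trunk-copy"])` could be compared with the equivalent calls against BigBinariesRepo and MostlyTextRepo (all of these paths are placeholders), and likewise for `bzr commit` and `bzr status`. Wrapping each call with `time.perf_counter()` would cover the speed side of the question.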
Thanks
~M

On Fri, Aug 6, 2010 at 1:31 AM, John Szakmeister <john@szakmeister.net> wrote:
> On Thu, Aug 5, 2010 at 10:07 PM, John Arbash Meinel
> <john@arbash-meinel.com> wrote:
> [snip]
>> Well, if you are interested in helping out, you can do the checkout with
>> extensions, and get a memory dump.
>> You'll need 'Meliae', which is my memory debugging Python library:
>> 'bzr branch lp:meliae'.
>
> I'd love to, but I can't give you a memory dump. :-( But...
>
> [snip]
>
>> Now, it is possible that the memory consumed is actually because of
>> individual file content, and not what I'm doing here (which is more
>> about lots of inventory data, aka lots of small files).
>
> I think this is my problem anyways. They checked in some rather large
> files at one point, and it appears to be consuming several times the file
> size during checkout. It also seems to be related to how many revisions
> of the file were made... but it's been a while since I looked at this,
> and once we found another way to do what we needed, we moved on, so I
> didn't spend much more time looking at it. :-(
>
> -John