path tokens

Fri Mar 16 01:27:06 GMT 2007

On Thu, 2007-03-15 at 20:02 +1100, Martin Pool wrote:
> On 3/15/07, Robert Collins <robertc at robertcollins.net> wrote:

> > path tokens should not:
> >  * increase storage size proportional to history or tree size. Note that
> > this isn't the same as saying 'they should have fixed size'.
> 
> all things i'd like to see improved
> 
> .. or at least, these are all things we'd like to improve in how we
> deal with historical trees, and it would be awfully nice if one change
> fixed them all...

Indeed :).

> *file copies*
> 
> I note your explanation but I think we can say more about what
> "support file copies" really means, otherwise we can claim we do it
> already :-)

We can, but I want a different thread for this. This is one of those
thorny things we've not tackled, and I want to be sure we don't get mired
in the detail of the end game (implementation) before we have consensus
on the earlier pieces.

> > No reference to historical data: Accessing lots of historical data is
> > expensive - it means performance degrades as history accumulates.
> > Additionally, in order to support history horizons, which is a proposal
> > that we allow people to set a strict limit on what historical data is
> > available to bzr, we need to be able to identify 'these are the same'
> > across trees without necessarily having access to a common ancestor.
> 
> I agree we should set goals that this works with history horizons, and
> performs well with long histories.  I'm not in favour of setting the
> goal that you should know everything about copies just given two trees
> and no history as that would seem to mean trees grow proportional to
> all the copies.

That's not necessarily a bad thing. I'd like us to have some handle on
the cost of such a tree-growth algorithm before ruling it out as an
approach. I.e. 'on average, copies are done to 1 file in 50, and on
average once every 100 commits; so a tree with 4000 files might have 80
copies; and one with 10K commits 100 copies', coupled with 'a copy adds
120 bytes of data to the tree state, giving a 4000 file tree with 80
copies roughly 10KB (9600 bytes) of overhead to support the file copy
metadata'.
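
To make that kind of estimate concrete, here is a minimal sketch of the
arithmetic. The function name is hypothetical, and the ratios and the
120-byte figure are the illustrative guesses from the paragraph above, not
measurements:

```python
def copy_metadata_overhead(num_files, files_per_copy, bytes_per_copy):
    """Estimate copy count and metadata size for a tree.

    All inputs are assumed averages, not measured values.
    """
    copies = num_files // files_per_copy
    return copies, copies * bytes_per_copy


# 4000-file tree, 1 file in 50 copied, 120 bytes of tree state per copy.
copies, overhead_bytes = copy_metadata_overhead(4000, 50, 120)

# Same idea driven by history length: one copy per 100 commits.
copies_by_commits = 10_000 // 100
```

With these inputs the tree carries 80 copies and 9600 bytes of copy
metadata, and a 10K-commit history yields 100 copies.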

It's clear that having history-free comparisons is a massive win; our
tree-delta logic would be significantly more complex and costly if it
had to access the repository every time; the only reason it doesn't is
because of file-ids allowing complete comparisons given only two trees.
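
As a rough illustration of why file-ids make this possible, here is a
sketch of a two-tree comparison that never touches a repository. It is not
bzr's actual tree-delta code: the function name and the dict layout
(file-id mapped to a path and a content hash) are assumptions made for the
example.

```python
def tree_delta(old_tree, new_tree):
    """Compare two trees given as {file_id: (path, content_hash)} dicts.

    Only the two trees are consulted; no history is needed because the
    file-id tells us which entries are 'the same file'.
    """
    old_ids, new_ids = set(old_tree), set(new_tree)
    delta = {
        "added": sorted(new_tree[i][0] for i in new_ids - old_ids),
        "removed": sorted(old_tree[i][0] for i in old_ids - new_ids),
        "renamed": [],
        "modified": [],
    }
    # Entries present in both trees are matched by file-id, not by path.
    for file_id in sorted(old_ids & new_ids):
        old_path, old_hash = old_tree[file_id]
        new_path, new_hash = new_tree[file_id]
        if old_path != new_path:
            delta["renamed"].append((old_path, new_path))
        if old_hash != new_hash:
            delta["modified"].append(new_path)
    return delta


old = {
    "id-1": ("README", "aaa"),
    "id-2": ("src/a.py", "bbb"),
    "id-3": ("old.txt", "ccc"),
}
new = {
    "id-1": ("README", "aab"),
    "id-2": ("lib/a.py", "bbb"),
    "id-4": ("new.txt", "ddd"),
}
delta = tree_delta(old, new)
```

Here the rename of src/a.py to lib/a.py is detected purely from the shared
file-id, without looking up any ancestor revision.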

It's not clear that having modest growth per copy is enough of a burden
to outweigh the win of history-free comparisons.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.