line identity and regression suites

John Arbash Meinel john at arbash-meinel.com
Wed Apr 26 18:21:28 BST 2006


John Yates wrote:
> In a conversation about improvements to our development processes
> I expressed a wish for an index that mapped locations within the
> codebase to applicable tests in the regression suite.  At present
> I am unaware of any system that provides such a mapping.
> 
> But that got me to wondering whether the notion of line identity
> over time in bzr might be made sufficiently concrete to make such
> a system plausible.  Here is what I imagine.
> 
> Periodically we run the entire regression suite collecting code
> coverage by individual test.  For the current tree under test today
> this exercise would give me a mapping from file and line to a set
> of applicable tests.
> 
> The problem is that if that mapping is recorded as file path and
> line number then it is very fragile.  It will gradually decay as
> simple edits are applied.  Code shuffling and/or tree rearrangement
> will lead to much more rapid decay.  By contrast, if bzr can expose
> a reasonably stable notion of line identity then the mapping could
> be based on inventory-id and line-id.  This mapping would be fairly
> robust in the face of intra-file edits and tree rearrangements.
> 
> Reactions?
> 
> /john
> 

It sounds interesting. Line identity isn't stable when code is moved
within a file or between files, though.

But if you are only adding new lines, deleting old lines, and moving
files around, it should be fairly stable.

You could do this right now by defining a line's identity as 'file-id +
revision-id + line-number', taken at the revision where the line is
first introduced. I think that is actually what Codeville uses to manage
lines, and it is close to what we use in knits.

file-id represents a unique file. file-id+revision-id represents a
unique snapshot of that file, and adding line-number uniquely
identifies a line within that snapshot.
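
A minimal sketch of that idea, not bzr code: every name here (LineId,
assign_line_ids) is hypothetical, line numbers are assumed to be 1-based,
and lines are matched between revisions by content using Python's difflib
rather than anything bzr itself does. A line keeps the id minted in the
revision that first introduced it.

```python
import difflib
from collections import namedtuple

# Hypothetical identity triple; not part of bzr's API.
LineId = namedtuple("LineId", ["file_id", "revision_id", "line_number"])


def assign_line_ids(file_id, revisions):
    """Return one LineId per line of the newest revision.

    `revisions` is a list of (revision_id, lines) pairs, oldest first.
    """
    prev_lines, prev_ids = [], []
    for revision_id, lines in revisions:
        # By default every line is treated as introduced in this revision
        # (1-based line numbers, an assumption of this sketch).
        ids = [LineId(file_id, revision_id, n + 1) for n in range(len(lines))]
        matcher = difflib.SequenceMatcher(None, prev_lines, lines,
                                          autojunk=False)
        # Lines that match the previous revision inherit their old id.
        for old, new, size in matcher.get_matching_blocks():
            ids[new:new + size] = prev_ids[old:old + size]
        prev_lines, prev_ids = lines, ids
    return prev_ids


revisions = [
    ("rev-1", ["a\n", "b\n", "c\n"]),
    ("rev-2", ["a\n", "x\n", "b\n", "c\n"]),  # one line inserted
]
ids = assign_line_ids("file-1", revisions)
```

A coverage map keyed on these ids, rather than on path and line number,
would still point at "b" and "c" after the insertion, since their ids are
unchanged. Moving a block of code within the file would typically give the
moved lines new ids, which is the instability mentioned above.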

If you look closely at the knit format, the 'line-delta' storage
mechanism stores the deltas as:
    start,stop,count
    rev-id a
    rev-id b
    rev-id c
    rev-id d

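Here start is the starting line number in the old text and stop is the
ending line number (so for a pure insert the two numbers are the same).
count appears to be the number of new lines that follow, each prefixed
with the revision-id that introduced it.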
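
As a rough illustration of that layout, here is a small sketch that parses
and applies such hunks. It is not the knit implementation: the function
names are made up, and it assumes 0-based line numbers with stop exclusive
(which is what makes start == stop a pure insert), that count is the number
of 'rev-id content' lines following the header, and that hunks are listed
in increasing order of position in the old text.

```python
def parse_line_delta(delta_lines):
    """Split delta text into (start, stop, new_lines) hunks.

    Each new line becomes a (rev_id, content) pair.
    """
    hunks = []
    it = iter(delta_lines)
    for header in it:
        start, stop, count = (int(n) for n in header.split(","))
        # Consume exactly `count` annotated lines after the header.
        new_lines = [tuple(next(it).split(" ", 1)) for _ in range(count)]
        hunks.append((start, stop, new_lines))
    return hunks


def apply_line_delta(old, hunks):
    """Apply hunks to `old`, a list of (rev_id, content) pairs."""
    result = list(old)
    offset = 0  # how far earlier hunks have shifted later positions
    for start, stop, new_lines in hunks:
        result[start + offset:stop + offset] = new_lines
        offset += len(new_lines) - (stop - start)
    return result


old = [("rev-1", "a"), ("rev-1", "b"), ("rev-1", "c")]
delta = ["1,1,2", "rev-2 x", "rev-2 y", "3,3,1", "rev-2 z"]
new = apply_line_delta(old, parse_line_delta(delta))
```

In this example both hunks are pure inserts, so the three original lines
keep their rev-1 annotation and only the added lines carry rev-2.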

So we don't actually track lines by perfect identity.  We track them as
revision-id + content.
Codeville tracks them as revision-id + line number + version, because
that allows resurrecting old versions of a line, etc.

John
=:->

