[MERGE] journalled inventory serialiser

Robert Collins robertc at robertcollins.net
Mon Jan 14 22:55:51 GMT 2008


On Mon, 2008-01-14 at 15:46 -0500, John Arbash Meinel wrote:
> John Arbash Meinel has voted comment.
> Status is now: Semi-approved
> Comment:
> format. (Robert Collins)
> -------------- This line and the fmllowing will be ignored 
> --------------
> 
> modified:
> 
> ^-- looks like you messed up the "the following" which caused the delta 
> to be added as the comment....
> 
> It confused me for a second when I saw a patch to review which didn't 
> quite match your comments. :)

Ah, oops.

> 
> I thought I would start with a review of the journaled inventory 
> document, since that should be a higher overview, before we get into the 
> implementation details. As this is development/work in progress, I'm 
> probably okay with bringing it in while it is still rough. Though I'm 
> curious about how we convey "*this* development format is ready for you 
> to play with, *this* evolution is not" other than by its presence in 
> bzr.dev.
> I know it says "--development" but you are committing to at least a 
> converter to the format and one to the next format.

Well, I won't send a merge request to hook it in as Development on disk
until I'm confident we won't lose data with it; at that point it is
ready for playing with :).

> 
> === modified file 'doc/developers/inventory.txt'
> --- doc/developers/inventory.txt        2008-01-03 01:40:16 +0000
> +++ doc/developers/inventory.txt        2008-01-07 22:15:40 +0000
> ...
> 
> +In a journalled inventory we represent hte inventory as a tree of
> ^- hte => the

Thanks.

> ...
> 
> +We will serialise the lines text for a new inventory as:
> +'format: bzr journalled inventory v1' NL
> +'parent:' SP BASIS_INVENTORY NL
> +'version:' SP NULL_OR_REVISION NL
> +DELTA_LINES
> 
> ^- It might be good to have a line counter here, rather than just 
> assuming
> that the file is perfectly well formed. I know you have the validator
> (which I assume is stored separately), it can just be nice to have a bit
> more to go on.
> 
> I can understand if you would rather not because of buffering issues.
> Though we should certainly validate before processing on the way in, 
> which
> requires either reading 2 times, or buffering.

No line counter to avoid buffering. There is an updated patch I think,
perhaps I haven't sent it. Anyhow, in the updated version validation is
hooked in and the validator is checked when you have assembled the full
log chain. To do individual journal validation you grab the validator
from the revision (which is for the full-to-date chain) and then you can
validate the next up inventory. And you don't need to read twice or
buffer :).


> Are we going to assume you can fit the full contents of a journal in
> memory? It seems reasonable, but then there are people who tried
> versioning all files of all programs in the debian source repository.

Yes; I'm not aiming for 'partial reads' in this format iteration, just
'partial writes'.

> 
> +DELTA_LINES ::= (DELTA_LINE NL)*
> +DELTA_LINE ::= NEWPATH NULL file-id NULL PARENT_ID NULL LAST_MODIFIED 
> NULL CONTENT
> +SP ::= ' '
> +NULL ::= \x00
> +NEWPATH ::= NONE | PATH
> +NONE ::= 'None'
> +PATH ::= path
> +PARENT_ID ::= FILE_ID | ''
> 
> 
> ^- Do we really want a literal "None" on disk? It seems like that would
> prevent you from versioning a file named "None".

The file "None" is serialised as "/None"

> Why not just have it be an empty string?  I suppose the tree root has 
> path
> ''. Either way, I think we need a clear "not a path", not a simple 
> string
> like 'None'.

The tree root has the path "/". We can change the constant but I don't
really get the point; particularly as I have to update a raft of hashes
to do so in the tests (because it validates ... ).

> 
> Also, in 'DELTA_LINE ::= ...' you use 'file-id', but you have: PARENT_ID
> ::= FILE_ID | ''
> 
> You should either capitalize both, or leave them both lower case.

Thanks.

> ...
> 
> +There are two ways that we could create new journalled inventories. We
> +could take the in memory inventory that we create during commit, and
> +that for the basis, then make a new delta and serialise it to create 
> the
> 
> ^- I don't understand "during commit, and that for the". Do you mean 
> "add
> that" or "use that"?

'use that'

> +revision. Additionally doing this we could serialise the inventory in
> +its regular form to get a legacy validator that has no reference to
> 
> ^- Additionally, while doing this we could ...
> 
> +By using the same hueristic as knits do, we need to read the same 
> amount
> +of data to construct a given inventory. We will need to parse the 
> entire
> +data set of data rather than just the result of applying the text data.
> 
> ^- "We will need to parse the entire set of data" not "data set of 
> data".
> And I think you mean "than applying the text delta".

yes.

> ...
> 
> +Iter-inventories
> +^^^^^^^^^^^^^^^^
> +We can apply deltas forward or backwards to move along a chain of
> +inventories trivially, rather than the current requirement of
> +deserialising them all individually.
> 
> ^- You should have a blank space here. Also, as you have defined them,
> these do not seem to be reversible deltas.

They are not.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080115/eb4b5f34/attachment.pgp 


More information about the bazaar mailing list