===============
Bundle format 4
===============

:Date: 2007-06-21

Motivation
----------
Format 4 is designed to be a compact format that can be generated quickly and
installed into a repository efficiently. It is not intended to be
human-readable; that responsibility has been given to merge directives.

Format Name
-----------
This is the fourth format to see public use. Previous versions were 0.7, 0.8,
and 0.9. Only 0.7's version number was aligned with a Bazaar release.

Dependencies
------------
- Container format 1
- Multiparent diffs
- Bencode

Description
-----------
This format was designed to trade human-readability for speed and compactness.
It does not contain a human-readable "prelude" patch.

Relation to merge directives
----------------------------
A merge directive specifies a merge command to apply and a preview of what
that command would do. Merge directives may contain a format-4 bundle. The
bundle's job is to provide the data needed to perform that merge command.

It is recommended that the bundle be provided in a bzip-compressed,
base64-encoded format, to ensure compactness and resistance to
email-transport damage. A preview/overview patch may be provided by the
merge directive.

Serialization
-------------
Format 4 records revision and inventory records in their repository
serialization format. This minimizes translation and compression costs in the
common case, where the sender and receiver use the same serialization format
for their repository. Steps have been taken to ensure a faithful conversion
when serialization formats are mismatched.

Record naming
-------------
All records have a single name. Records are named according to their
content-kind, revision-id, and file-id.
Content-kind may be one of:

:file: a version of a user file
:inventory: the tree inventory
:revision: the revision metadata for a revision
:signature: the revision signature for a revision
:testament: a testament for a revision

Names are constructed like so: "content-kind:revision-id/file-id". A record
has a file-id if-and-only-if it is a file record.

Record metainfo
---------------
The bundle format subdivides a pack record body into a bundle header and
body. The header contains a Bencoded dict of values. It is separated from
the body by a newline.

:record_kind: The storage strategy of the record. May be "fulltext" (the
  record body contains the full text of the value), "mpdiff" (the record
  body contains a multi-parent diff of the value), or "header" (the record
  body is empty).
:parents: Used in fulltext and mpdiff records. The revisions that should be
  noted as parents of this revision in the repository. For mpdiffs, this is
  also the list of build-parents.
:sha1: Used in mpdiff records. The sha-1 hash of the full-text value.

Layout
------
The first record is an info/header record.

The subsequent records are mpdiff file records. They are ordered first by
file id, then in topological order by revision-id.

The next records are mpdiff inventory records. They are topologically sorted.

The next records are revision and signature fulltexts. They are interleaved
and topologically sorted.

Implementation notes
--------------------
- knit deltas contain almost enough information to extract the original
  SequenceMatcher.get_matching_blocks() call used to produce them. Combining
  that information with the relevant fulltexts allows us to avoid performing
  sequence matching on any fulltexts for which we have deltas.

- MultiParent deltas contain get_matching_blocks output almost verbatim, but
  if there is more than one parent, the information about the leftmost parent
  may be incomplete.
  However, for single-parent multiparent diffs, we can extract the
  SequenceMatcher.get_matching_blocks output, and therefore the
  SequenceMatcher.get_opcodes output used to create knit deltas.

Installing data across serialization mismatches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In practice, there cannot be revision serialization mismatches, because the
serialization of revisions has been consistent in serializations 5-7.

If there is a mismatch in inventory serialization formats, the receiver can

1. extract the inventory objects for the parents
2. serialize them using the bundle serializer
3. apply the mpdiff
4. calculate the fulltext sha1
5. compare the calculated sha1 to the expected sha1
6. deserialize using the bundle serializer
7. serialize using the repository serializer
8. add to the repository

This is much slower, of course. But since the fulltext is verified at step 5,
it should be just as safe as any other conversion.
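
The eight steps above can be sketched in Python. This is an illustrative toy
only, not Bazaar's implementation: ``convert_inventory``, ``LineSerializer``,
``JsonSerializer`` and the ``apply_mpdiff`` callable are all hypothetical
names, and an inventory is modelled as a plain dict mapping file-id to path.

```python
import hashlib
import json


class LineSerializer:
    """Toy stand-in for the bundle's inventory serializer (hypothetical)."""

    def write(self, inventory):
        lines = [f"{file_id}\t{path}\n" for file_id, path in sorted(inventory.items())]
        return "".join(lines).encode("utf-8")

    def read(self, text):
        return dict(line.split("\t", 1) for line in text.decode("utf-8").splitlines())


class JsonSerializer:
    """Toy stand-in for the repository's inventory serializer (hypothetical)."""

    def write(self, inventory):
        return json.dumps(inventory, sort_keys=True).encode("utf-8")

    def read(self, text):
        return json.loads(text.decode("utf-8"))


def convert_inventory(parent_inventories, apply_mpdiff, expected_sha1,
                      bundle_serializer, repo_serializer):
    """Return the inventory text in the repository's format.

    parent_inventories: inventory objects already extracted (step 1).
    apply_mpdiff: assumed callable mapping parent texts to the new fulltext.
    """
    # Step 2: re-serialize the parents in the bundle's format, so the mpdiff
    # is applied against the same bytes the sender diffed against.
    parent_texts = [bundle_serializer.write(inv) for inv in parent_inventories]
    # Step 3: apply the multi-parent diff.
    fulltext = apply_mpdiff(parent_texts)
    # Steps 4-5: verify the reconstructed fulltext before trusting it.
    actual_sha1 = hashlib.sha1(fulltext).hexdigest()
    if actual_sha1 != expected_sha1:
        raise ValueError(f"sha1 mismatch: expected {expected_sha1}, got {actual_sha1}")
    # Steps 6-7: round-trip through the inventory object into the
    # repository's own format. Step 8 (adding to the repository) is left
    # to the caller.
    inventory = bundle_serializer.read(fulltext)
    return repo_serializer.write(inventory)
```

The sha1 check sits before deserialization on purpose: the hash in the record
header describes the bundle-format fulltext, so it can only be compared while
the data is still in that format.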