Google Summer of Code: Encrypted branch/repository format status

John Arbash Meinel john at arbash-meinel.com
Mon Jul 16 21:29:42 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bogdano Arendartchuk wrote:
> Hello,
> 
> I'm working on the encrypted repository and branch format for Bazaar.
> 
> Currently I'm coding a repository format that is intended to write all
> the data to disk slightly scrambled. This is a prototype and nothing is
> encrypted at all; the objective is to get to know the Bazaar code/design
> better and also to plan what can be reused and what should be reimplemented
> in order to fit the application's needs.
> 
> What's still missing to finish this prototype format:
> 
> - one scrambled branch format, apparently easy to achieve;
> - extend BzrDir to have more configuration files (as we will need more
>   configuration files containing encryption parameters [more below])
> - extend VersionedFileStore, actually the TransportStore part, to generate
>   scrambled file names, as we don't want to leak revision-ids
> 
> And one open question at the moment is how much I should rely on _KnitIndex
> and _KnitData methods (I'm extending these classes). I'm worried because of
> the beautiful underscores at the beginning of the names of the methods
> I'm extending.
> 
> For example, the recent change in knit.py to use pyrex-generated code for
> knit indexes broke my code because I was extending _KnitIndex._load_data,
> and this method is now at module level. I patched knit.py (and the tests)
> to still have one _load_data method, which allows me to hook into
> _KnitIndex. The patch is attached; I can resubmit it if it seems reasonable.
> 

Well, as they are private, we won't guarantee that they stay API-stable.
However, they aren't likely to change much. Before this change, I don't think
they had changed for several releases.


> So the real question is "should I fork/branch/etc KnitRepository in order
> to not depend on implementation details of upstream knit and the speed of
> how Bazaar changes?".

If you want to be strict about it, then the only things we try extra hard to
guarantee are non-private symbols. I think you can stay on the "it is private,
but doesn't change often" side.

Certainly I don't think you want to have to track bugfixes to the
KnitRepository object yourself.

> 
> Regarding the encryption itself, the only detail defined at the moment is
> that there will be one file (at the bzrdir level) containing the symmetric
> key(s) used to decrypt all the real branch and repository data. This file
> would be encrypted/decrypted using the user's GnuPG keypair. I really need
> to discuss this topic in depth while finishing the prototype, as I think the
> great experience of the folks on the list will be really helpful :-)


The only thing I would probably change with your patch is to call it
"_load_index_data" rather than "_index_load_data".
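
To make the shape of that hook concrete, here is a rough sketch of what I
have in mind. It is not the actual knit.py code: KnitIndexSketch,
ScrambledIndexSketch, the toy line format and the string-reversal
"scrambling" are all made up for illustration. The only point is that the
method stays a thin wrapper around whichever module-level loader is active,
so a subclass still has something to override.

```python
# Hypothetical stand-ins; none of these are the real bzrlib definitions.

def _load_data_py(index, lines):
    """Pure-Python loader: record each 'revision-id value' line in the index."""
    for line in lines:
        rev_id, value = line.split(' ', 1)
        index._cache[rev_id] = value


# Module-level name that an optional compiled extension could rebind.
_load_data = _load_data_py


class KnitIndexSketch(object):

    def __init__(self, lines):
        self._cache = {}
        self._load_index_data(lines)

    def _load_index_data(self, lines):
        # Thin wrapper: subclasses get a hook, while the real work stays
        # in whichever module-level loader is currently bound.
        _load_data(self, lines)


class ScrambledIndexSketch(KnitIndexSketch):

    def _load_index_data(self, lines):
        # Undo a toy scrambling (string reversal) before normal parsing.
        unscrambled = [line[::-1] for line in lines]
        KnitIndexSketch._load_index_data(self, unscrambled)
```

With this layout the compiled and pure-Python loaders are interchangeable,
and an extending format only touches the one method.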

And we really should add the same tests I did for DirState, such that if we can
import _knit_load_data_c, then knit._load_data is the right function
(modulo any naming changes).

It also brings up the thought of what we should name the extension module,
since the name is now changing. We could do "_knit_helpers" to be closer to
_dirstate_helpers (which is also better if we add more extension functions).

Thoughts?

John
=:->

PS> I think I'm +1 on bringing back _KnitIndex._load_data if it makes it easier
for you.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGm9U2JdeBCYSNAAMRAr+LAKC1/Pr/zI7ZRjm5nZQEtii9wXOSegCgkK/R
Qn+mv+FI3mHHIVR2pEVE1kA=
=ZyTK
-----END PGP SIGNATURE-----


