Introduction and an Idea

Mon Nov 10 10:37:10 UTC 2008

Dylan McCall wrote:
> 
>       * We should present revisions of files. With backups happening
>         passively, a super intelligent system could do them to multiple
>         media.

If we do a conventional backup, we can avoid asking the user to
configure anything by automatically creating a temporary folder for the
user, where Firefox downloads and all sorts of caches (e.g. the ones in
the GIMP, Firefox and Evolution configuration files) also go.

If we want to innovate, that requires careful design! Backing up to
multiple media might be implemented by indexing files by their md5 and
uuid. The history of a file is then a list of md5 sums for a single
uuid. A catalog in the system associates md5 sums with URLs, and contains
the uuid<->list of md5 association. However, we have at least three
problems: 1) metadata should go with the data, 2) how do diffs fit in
this picture, and 3) how to prevent just anyone from seeing the catalog
of an encrypted file.
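
A minimal sketch of the catalog described above, under stated
assumptions: the class name Catalog, its method names and the example
URLs are hypothetical, and everything is kept in memory rather than in a
real store. It only shows the two associations, uuid -> list of md5 sums
and md5 -> URLs.

```python
import hashlib
import uuid
from collections import defaultdict


class Catalog:
    """Hypothetical in-memory catalog; all names here are illustrative."""

    def __init__(self):
        # uuid -> ordered list of md5 sums (the history of one file)
        self.history = defaultdict(list)
        # md5 -> set of URLs where that exact content is stored
        self.locations = defaultdict(set)

    def record_revision(self, file_id, content, url):
        """Register that `content` of file `file_id` is stored at `url`."""
        digest = hashlib.md5(content).hexdigest()
        revisions = self.history[file_id]
        # Only extend the history when the content actually changed.
        if not revisions or revisions[-1] != digest:
            revisions.append(digest)
        self.locations[digest].add(url)
        return digest

    def copies_of(self, file_id):
        """Return (md5, sorted URLs) for every known revision, oldest first."""
        return [(d, sorted(self.locations[d])) for d in self.history[file_id]]


catalog = Catalog()
file_id = str(uuid.uuid4())
first = catalog.record_revision(file_id, b"draft one", "file:///media/usb0/notes.txt")
# Same content on a second medium: no new revision, just one more location.
catalog.record_revision(file_id, b"draft one", "file:///media/cdrom/notes.txt")
second = catalog.record_revision(file_id, b"draft two", "file:///media/usb0/notes.txt")
```

Keying locations by md5 rather than by uuid means that identical content
stored on several media shows up as one revision with several URLs.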

My idea is that volumes (e.g. CDs, encrypted drives and so on) can
already be "mounted" in Linux. Now, the mount operation should also load
the catalog FROM the device, so that it has the same level of protection
as the device itself. This is similar to what happens in Rhythmbox
when you connect an MTP source and the catalog is loaded in a few
seconds. Then, _certain_ volumes could be left in a "semi-mounted" state
where their metadata is stored in the centralised system catalog for
searching. I don't know if I've been clear, but this machinery seems to
me enough to implement backups to any kind of media, full text indexing
and so on in a secure way. That is, you can easily implement a
distributed "timevault-alike" on top of it. If it's not yet clear how, I
can go into more detail :) If somebody likes it, I am available to start
a design and implementation effort. But first we have to understand how
diffs fit in the picture - I would regard a diff as a triple (md5 of the
original file, md5 of the diff, md5 of the resulting file).
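
The diff-as-triple idea can be sketched as follows. This is only an
illustration under assumptions: DiffRecord, make_record and restore_plan
are hypothetical names, the diff payload is treated as opaque bytes, and
the history is assumed to be linear (at most one diff per original).

```python
import hashlib
from typing import NamedTuple


class DiffRecord(NamedTuple):
    """One diff, identified only by md5 sums (hypothetical structure)."""
    original_md5: str
    diff_md5: str
    result_md5: str


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def make_record(original: bytes, diff: bytes, result: bytes) -> DiffRecord:
    return DiffRecord(md5_hex(original), md5_hex(diff), md5_hex(result))


def restore_plan(records, have_md5, want_md5):
    """Return the md5 sums of the diffs to apply, in order, to get from
    the revision we have to the one we want, or None if no chain exists."""
    # Assumes a linear history: each original has at most one diff.
    by_original = {r.original_md5: r for r in records}
    plan = []
    current = have_md5
    seen = set()
    while current != want_md5:
        if current in seen or current not in by_original:
            return None  # loop detected or chain broken
        seen.add(current)
        record = by_original[current]
        plan.append(record.diff_md5)
        current = record.result_md5
    return plan


v1, v2, v3 = b"a\n", b"a\nb\n", b"a\nb\nc\n"
d12, d23 = b"+b", b"+c"  # placeholder diff payloads, not a real diff format
records = [make_record(v1, d12, v2), make_record(v2, d23, v3)]
plan = restore_plan(records, md5_hex(v1), md5_hex(v3))
```

Because a diff is itself addressed by its md5, it could be looked up in
the catalog like any other file.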

For the implementation, I even tried to write a small prototype, but
current database engines lack some pieces. For example, sqlite with added
support for cross-database queries would work (you can currently load
multiple databases - these would be the catalogs of mounted and
semi-mounted media - but you can't make a single query that searches all
of your catalogs). Also, a row-level permission mechanism would have to
be implemented, which complicates things. But that can be a second step.
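
One possible workaround for searching several catalogs, sketched under
assumptions: SQLite's ATTACH DATABASE lets a single connection see
several database files, and a query can span them if every catalog is
named explicitly, for instance in a generated UNION ALL. The schema, the
function names and the vol0/vol1 aliases below are hypothetical, and
row-level permissions are not addressed.

```python
import os
import sqlite3
import tempfile

# Hypothetical per-volume catalog schema.
SCHEMA = "CREATE TABLE files (uuid TEXT, md5 TEXT, url TEXT)"


def make_catalog(path, rows):
    """Create one catalog database, as would live on each volume."""
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    con.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()


def search_all(system_path, volume_paths, md5):
    """Attach every volume catalog and search them all for one md5."""
    con = sqlite3.connect(system_path)
    selects = ["SELECT 'system' AS catalog, url FROM main.files WHERE md5 = ?"]
    for i, path in enumerate(volume_paths):
        alias = f"vol{i}"  # generated here, never taken from user input
        con.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        selects.append(
            f"SELECT '{alias}' AS catalog, url FROM {alias}.files WHERE md5 = ?"
        )
    query = " UNION ALL ".join(selects) + " ORDER BY catalog"
    rows = con.execute(query, [md5] * len(selects)).fetchall()
    con.close()
    return rows


tmp = tempfile.mkdtemp()
system_db = os.path.join(tmp, "system.db")
usb_db = os.path.join(tmp, "usb.db")
cd_db = os.path.join(tmp, "cd.db")
make_catalog(system_db, [("u1", "aaa", "file:///home/user/a.txt")])
make_catalog(usb_db, [("u1", "aaa", "file:///media/usb0/a.txt"),
                      ("u2", "bbb", "file:///media/usb0/b.txt")])
make_catalog(cd_db, [("u1", "aaa", "file:///media/cdrom/a.txt")])
hits = search_all(system_db, [usb_db, cd_db], "aaa")
```

The query has to be regenerated whenever a volume is mounted or removed,
and SQLite limits the number of attached databases (10 by default), so
this stays a sketch rather than a complete answer to the problem.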

Vincenzo