Introduction and an Idea
Vincenzo Ciancia
ciancia at di.unipi.it
Mon Nov 10 10:37:10 UTC 2008
Dylan McCall wrote:
>
> * We should present revisions of files. With backups happening
> passively, a super intelligent system could do them to multiple
> media.
If we do a conventional backup, we can avoid user intervention in
configuration by automatically creating a temporary folder for the user,
where Firefox downloads and all sorts of caches (e.g. the ones in the
GIMP, Firefox and Evolution configuration files) also go.
If we want to innovate, that requires careful design! Backing up to
multiple media might be implemented by indexing files by their md5 and
uuid. The history of a file is then a list of md5 sums for a single
uuid. A catalog in the system associates md5 sums to URLs, and contains
the uuid<->list of md5 association. However we have at least three
problems: 1) metadata should go with data 2) how do diffs fit in this
picture 3) how to prevent anybody from seeing the catalog of an
encrypted file.
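
To make the catalog idea concrete, here is a rough sketch in Python. It
is only an illustration of the two associations described above (uuid to
list of md5 sums, and md5 sum to URLs); the class name, method names and
the example URLs are all made up, and it ignores the three open problems
entirely.

```python
import hashlib
import uuid
from collections import defaultdict


class Catalog:
    """Toy in-memory catalog (hypothetical names, not an existing API)."""

    def __init__(self):
        self.history = defaultdict(list)   # uuid -> [md5, ...], oldest first
        self.locations = defaultdict(set)  # md5 -> {URL, ...}

    def new_file(self):
        """Allocate a uuid for a file that has not been tracked before."""
        return str(uuid.uuid4())

    def record(self, file_id, content, url):
        """Record that `content` is a revision of `file_id` stored at `url`."""
        digest = hashlib.md5(content).hexdigest()
        revisions = self.history[file_id]
        # Only extend the history when the content actually changed.
        if not revisions or revisions[-1] != digest:
            revisions.append(digest)
        self.locations[digest].add(url)
        return digest

    def copies_of(self, file_id):
        """For each revision of a file, list every place it is stored."""
        return [(d, sorted(self.locations[d])) for d in self.history[file_id]]


# Usage: the same revision saved on two media, then a newer revision.
catalog = Catalog()
fid = catalog.new_file()
catalog.record(fid, b"draft 1", "file:///media/usbdisk/a")
catalog.record(fid, b"draft 1", "file:///media/cdrom/a")
catalog.record(fid, b"draft 2", "file:///media/usbdisk/b")
```

Keying the locations by md5 rather than by uuid means identical content
stored on several media is recognised as the same revision.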
My idea is that volumes (e.g. CDs, encrypted drives and so on) can
already be "mounted" in Linux. Now, the mount operation should also load
the catalog FROM the device, so that it has the same level of protection
as the device itself. This is similar to what happens in Rhythmbox
when you connect an MTP source and the catalog is loaded in a few
seconds. Then, _certain_ volumes could be left in a "semi-mounted" state
where metadata is stored in the centralised system catalog for
searching. I don't know if I've been clear but this machinery seems to
me enough to implement backups to any kind of media, full text indexing
and so on in a secure way. That is, you can implement a distributed
"timevault-alike" on top of it easily. If it's not yet clear how, I can
detail more :) If somebody likes it I declare myself available to start
a design and implementation effort. But first we have to understand how
diffs fit in the picture - I would regard a diff as a triple (md5 of
original file, md5 of the diff, md5 of the resulting file).
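
A minimal sketch of the diff-as-a-triple idea, in Python. The diff format
used here (lines of "copy start end" and "data hex") is invented purely
for the example, as are all the function names; any real diff tool could
take its place. The point is only that the three md5 sums are enough to
check that a diff applies to the right file and produces the right one.

```python
import hashlib
from difflib import SequenceMatcher
from typing import NamedTuple


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class DiffRecord(NamedTuple):
    original_md5: str  # md5 of the file the diff applies to
    diff_md5: str      # md5 of the diff itself
    result_md5: str    # md5 of the file obtained by applying the diff


def make_diff(old: bytes, new: bytes) -> bytes:
    """Encode the changes from old to new in a toy, made-up text format."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(f"copy {i1} {i2}")           # reuse bytes of the original
        elif tag != "delete":                        # "replace" or "insert"
            ops.append(f"data {new[j1:j2].hex()}")  # carry the new bytes
    return "\n".join(ops).encode("ascii")


def apply_diff(old: bytes, diff: bytes) -> bytes:
    """Rebuild the new file from the original and a diff from make_diff."""
    out = bytearray()
    for line in diff.decode("ascii").splitlines():
        op, *args = line.split()
        if op == "copy":
            out += old[int(args[0]):int(args[1])]
        else:
            out += bytes.fromhex(args[0])
    return bytes(out)


def record_diff(old: bytes, new: bytes):
    """Return the diff together with its (original, diff, result) triple."""
    diff = make_diff(old, new)
    return diff, DiffRecord(md5(old), md5(diff), md5(new))


def verify(old: bytes, diff: bytes, record: DiffRecord) -> bool:
    """Check all three md5 sums of the triple against the actual data."""
    return (md5(old) == record.original_md5
            and md5(diff) == record.diff_md5
            and md5(apply_diff(old, diff)) == record.result_md5)
```

With this, a file's history could mix full copies and diffs: a revision
whose md5 has no URL in the catalog might still be reachable through a
triple whose third component matches it.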
For the implementation, I even tried to write a small prototype, but
current database engines lack some pieces. For example, sqlite with added
support for cross-database queries would work (you can currently load
multiple databases - that would be the catalogs of mounted and
semi-mounted media, but you can't make a query to search all of your
catalogs). Also, a row-level permission machinery would have to be
implemented, which complicates things. But that can be a second step.
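
For reference, the closest thing stock sqlite offers is ATTACH DATABASE,
which lets one connection see several database files; the search over
all of them still has to be assembled by hand, one SELECT per catalog.
The sketch below uses Python's sqlite3 module and assumes a made-up
`files` table in each per-volume catalog; the function names are
hypothetical too, and it does nothing about permissions.

```python
import os
import sqlite3
import tempfile


def make_catalog(path, rows):
    """Create one per-volume catalog file with a hypothetical `files` table."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE files (uuid TEXT, md5 TEXT, name TEXT)")
    con.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()


def search_all(paths, pattern):
    """Search every catalog by attaching each one and joining with UNION ALL."""
    if not paths:
        return []
    con = sqlite3.connect(":memory:")
    selects, params = [], []
    for i, path in enumerate(paths):
        alias = f"cat{i}"  # generated here, never taken from user input
        con.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        selects.append(
            f"SELECT ? AS volume, uuid, md5, name FROM {alias}.files "
            "WHERE name LIKE ?"
        )
        params += [path, pattern]
    query = " UNION ALL ".join(selects) + " ORDER BY name"
    rows = con.execute(query, params).fetchall()
    con.close()
    return rows


# Usage: two catalogs standing in for a USB disk and a CD.
with tempfile.TemporaryDirectory() as tmp:
    usb = os.path.join(tmp, "usb.db")
    cd = os.path.join(tmp, "cd.db")
    make_catalog(usb, [("u1", "aaa", "thesis.tex")])
    make_catalog(cd, [("u1", "bbb", "thesis-old.tex"), ("u2", "ccc", "photo.jpg")])
    hits = search_all([usb, cd], "thesis%")
```

Note that sqlite caps the number of attached databases per connection
(10 by default), so this approach would not scale to many media as is.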
Vincenzo