gnome "storage" project, smart windows

Matt Price matt.price at utoronto.ca
Tue Dec 13 12:04:32 UTC 2005


wow.

this is pretty rad stuff. 

I agree with you about mono, which I would rather not have to deal
with.  And I also agree it'd be great to have this stuff built into
the os -- I hear reiser4 has certain similar capabilities, maybe?
would be cool if it did.

anyway gotta hop on a plane but thx for this & keep me informed as you
move through this project!  very cool indeed.

matt


On Tue, Dec 13, 2005 at 06:08:32AM -0500, 'Forum Post wrote:
> 
> > "Searching" is the problem. It's a stupid idea for a machine which you
> > control. I cannot tell the folks at wired where to put their pages,
> > but
> > I most certainly can do this on my own machine.
> >
> >but you never screw up? or get confused about which project you put
> >stuff in?
> 
> All the time. I have a few well named folders like "music" and "dvds"
> and "tv" (all in /hollywood, of course) but I also have thousands of
> image and music files that don't really fit anywhere in particular. I
> also have about a dozen desktop backups and every one of these has
> within it many duplicate files but also many unsorted files that were
> in some "download" queue or tempdir at the time I backed them up.
> Storing and sorting these will be the next step of the acid test.
> 
> The point I was making is that I should not have to concern myself with
> where the file goes - so long as I describe it relatively adequately or
> am able to describe it in a "search" why should I have to worry about
> where it goes on my drive? Having to mess with sorting stuff into
> "folders" and constantly worry about what's updated, where this goes
> now that I've added ten thousand whatevers and the old method doesn't
> work anymore... and then still have to rely on a "search engine" to
> interact with that stuff is just nuts - it's twice the work.
> 
> >hmm, well that sounds very interesting and I'd love to try it out.
> >how about e.g. full-text searching of structured documents like
> >open-document files? I like that feature in beagle...
> 
> Of course, that's a fundamental need. You could not locate a doc by
> description if you were not cataloging this stuff. 
> 
> But having to screw with mono on a system when the basis of a
> completely adequate system is part of the existing operating system
> seems to me a great waste of resources on many levels. Rather than
> create a user-centric "search engine" with primitive security and flaky
> behavior, why not instead just build on what's already there and
> stable? No, slocate doesn't index stuff by content - but the greater
> point is this: even slocate works backwards.
> 
> If I save a file I cannot do so by magic. I cannot wish the file to
> exist on the hard drive, nor can I retrieve it without the aid of the
> operating system. Every time a file is stored on the disc it goes
> through linux - nothing goes in or out without linux knowing about it.
> 
> So why does linux then have to go back and "search" for all this stuff?
> Why isn't linux instead cataloging each file into a *quickly searchable*
> database every time it stores that file? And why do I have to know
> where that file goes? 
> 
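> The "catalog on write" idea can be sketched with nothing more exotic
> than inotify and sqlite. The table name, the paths and the
> inotify-tools dependency below are my illustrative choices (and I use
> the sqlite3 shell here), not part of the script further down:

```shell
#!/usr/bin/env bash
# Sketch: catalog each file the moment it lands, instead of "searching"
# for it afterwards. Assumes sqlite3 and (for the watch loop) the
# inotify-tools package; table and path names are hypothetical.
set -eu

db="${TMPDIR:-/tmp}/catalog.db"
rm -f "$db"
sqlite3 "$db" "create table files(path text, size integer, mtime integer);"

catalog_file() {
    # record path, size and mtime so later lookups never touch the disk tree
    local f="$1" size mtime
    size=$(stat -c '%s' "$f")
    mtime=$(stat -c '%Y' "$f")
    sqlite3 "$db" "insert into files values('$f', $size, $mtime);"
}

# Watch loop (not run here, it never terminates): catalog every file
# as soon as it is fully written into the incoming folder.
#   inotifywait -m -e close_write --format '%w%f' ~/incoming |
#   while read -r f; do catalog_file "$f"; done

# Demo: catalog one file, then find it by name rather than by location.
echo "hello" > "${TMPDIR:-/tmp}/demo.txt"
catalog_file "${TMPDIR:-/tmp}/demo.txt"
sqlite3 "$db" "select size from files where path like '%demo.txt';"   # -> 6
```

> The watch loop would be started once per session and left running to
> feed the catalog, so retrieval is a database query, never a crawl.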
> The system has been built as it is and changing this from the ground up
> is impractical. But adapting the system to perform this maintenance is
> not at all impractical. You can even do this with *system* files like
> those in the /etc and /var directories. Because linux has a perfectly
> usable system that allows symbolic linking of resources, "active"
> system files can be stored in a structure that does not fit the
> /usr/var/etc paths but is still accessible in this manner. 
> 
> So, for example, when I edit my /etc/fstab file why does it get
> overwritten? Why doesn't linux just remember the old one and swap in
> the new one? It knows I have edited the file and changed its contents -
> why does this then have to be retroactively indexed?
> 
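> A minimal sketch of that swap-in idea, using a sandbox directory
> instead of the real /etc (the store layout and the function name are
> made up for illustration): every "edit" stores a new version and
> repoints a symlink, so nothing is ever overwritten.

```shell
#!/usr/bin/env bash
# Keep every version of a config file; expose the "active" one through
# a symlink so an edit swaps the link instead of clobbering the file.
set -eu

store="${TMPDIR:-/tmp}/cfgstore"
rm -rf "$store"; mkdir -p "$store/versions"

save_version() {
    # store the new contents under a timestamp, then repoint the link
    local name="$1" src="$2"
    local ts; ts=$(date +%s%N)
    cp "$src" "$store/versions/${name}.${ts}"
    ln -sfn "$store/versions/${name}.${ts}" "$store/${name}"
}

# "Edit" fstab twice; the old contents survive in versions/.
echo "old fstab" > /tmp/edit.$$
save_version fstab /tmp/edit.$$
echo "new fstab" > /tmp/edit.$$
save_version fstab /tmp/edit.$$

cat "$store/fstab"              # the active (newest) contents
ls "$store/versions" | wc -l    # both versions are still there
```

> Anything reading $store/fstab always sees the current version, while
> rolling back is just repointing the symlink at an older timestamp.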
> I cannot rewrite the kernel - my talents simply are not up to the task.
> But there are ways to model this behavior and that's what I'm working
> with. A system like beagle can work just fine, but beagle's main
> weakness is that it has terrible security and it seeks to overcome this
> weakness by being built inside some "sandbox." It's an illusion of
> safety that really just adds so much complexity it becomes brittle.
> 
> As an example, here's the routine that hashes and stores the file and
> then catalogs basic information about the file. This is a very simple
> example that doesn't worry about magic numbers or plugin
> miners - it just takes a given bit of metadata, a file (or collection
> of files) and stores them away. It's nothing but a bash script - well
> developed, robust technology in a completely unoptimized, simple,
> readable and maintainable script. 
> 
> # store the file content-addressed by its md5 and record its metadata
> if [[ -f "$_file" ]]; then
>     _hash=$(md5sum "$_file");
>     _hash="${_hash:0:32}";
>     _FIL="${_file##*/}";
>     fldr="${storagepath}${_hash:0:2}/${_hash:2:2}/${_hash:4:28}";
>     # echo "Storing $_file at hash $fldr" >> ~/wtf.log
> 
>     if [[ -e "$fldr" ]]; then
>         # content already archived: log the duplicate, record the alias
>         echo "$_file $fldr" >> "${storagepath}dupes.list";
>         #   echo -e "folder exists\n" >> ~/wtf.log
>         cmd="insert into _alias values('${_hash}','${_FIL}')";
>         sqlite "${storagepath}meta.db" "$cmd";
>     else
>         mkdir -p "$fldr";    # -p: the two parent hash dirs may not exist yet
>         cp "$_file" "$fldr";
>         _SIZ=$(stat -c "%s" "$_file");
>         _DATE=$(stat -c "%Y" "$_file");
>         cmd="insert into _files
> values('${_hash}','${_FIL}','${_FIL##*.}','${_SIZ}','${_DATE}',1,'archive','${_DATE}')";
>         sqlite "${storagepath}meta.db" "$cmd";
>         #   echo -e "files info stored: $cmd \n" >> ~/wtf.log
>         cmd="insert into _meta
> values('${_hash}','<keywords>${keywords}</keywords><originalpath>${_file}</originalpath>')";
>         sqlite "${storagepath}meta.db" "$cmd";
>         #   echo -e "meta info stored: $cmd \n" >> ~/wtf.log
>         #   rm -f "$_file";
>         chmod 444 "${fldr}/${_FIL}";
>     fi;
> fi;
> 
> In a structure of about 50,000 files the metadata folder is, at
> present, less than 30MB. This can be backed up separately and could
> also be exchanged with others. Since info about the files is stored
> along with their unique hash (and of course md5 can be replaced with
> any other) the system can quickly and easily decide if a given file is
> present, for example in file sharing applications - just look up the
> hash and then see if the files located there match. Because all this
> isn't "built into the filesystem" it can be used on any filesystem and
> can be maintained with existing, mature and human-friendly tools.
> 
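> To make that hash lookup concrete, here is a minimal sketch of the
> membership test. The 2/2/28 folder split matches the script above; the
> file names are throwaways invented for the demo:

```shell
#!/usr/bin/env bash
# Decide whether a file's content is already in the store purely from
# its md5 - no search, just a path existence check.
set -eu

storagepath="${TMPDIR:-/tmp}/store/"
rm -rf "$storagepath"

have_file() {
    local hash
    hash=$(md5sum "$1"); hash="${hash:0:32}"
    [[ -e "${storagepath}${hash:0:2}/${hash:2:2}/${hash:4:28}" ]]
}

# Archive one file's folder, then test membership by content alone.
echo "some payload" > /tmp/sample.$$
hash=$(md5sum /tmp/sample.$$); hash="${hash:0:32}"
mkdir -p "${storagepath}${hash:0:2}/${hash:2:2}/${hash:4:28}"

have_file /tmp/sample.$$ && echo "already stored"
echo "other payload" > /tmp/other.$$
have_file /tmp/other.$$ || echo "not stored"
```

> A file-sharing front end would call exactly this kind of check before
> fetching anything: same content, same hash, same folder.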
> Simple example: when downloading files from usenet I no longer even
> look at a dialog box; the system itself monitors my incoming usenet
> folder and, when it sees a new file appear there it locates the cached
> text, copies the fields I have specified (posted by, group, date,
> subject line, x-ref number), then hashes and stores the file and its
> metadata. Putting together a playlist involves describing the music or
> files I want without having to be concerned about the location of the
> data. 
> 
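> The playlist step reduces to a metadata query. A sketch, with a table
> mirroring the meta.db idea above (the table name, column name and
> sample rows are made up, and I use the sqlite3 shell here):

```shell
#!/usr/bin/env bash
# Build a playlist by describing the music, never by locating it.
set -eu

db="${TMPDIR:-/tmp}/meta-demo.db"
rm -f "$db"
sqlite3 "$db" "create table _meta(hash text, xml text);"
sqlite3 "$db" "insert into _meta values
  ('aa11','<keywords>jazz live</keywords><originalpath>/in/a.flac</originalpath>'),
  ('bb22','<keywords>punk</keywords><originalpath>/in/b.mp3</originalpath>');"

playlist() {
    # describe the music; the store answers with hashes, location unseen
    sqlite3 "$db" "select hash from _meta where xml like '%$1%';"
}

playlist jazz    # -> aa11
```

> The hashes that come back resolve to files in the store, so the
> player never needs to know where anything "lives" on the drive.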
> On the "hunter-gatherer" backend, if a "new and improved" version of a
> file is posted the system can instantly tell simply by comparing the
> posted filesizes to the files it already has. Because the metadata is
> more comprehensive than just filenames (as in slocate) it can be
> smarter about telling the difference between, say, Al Cooper and Alice
> Cooper. But because it never -replaces- a file but only adds more,
> mistakes are easily corrected. This would allow building "smart agents"
> that can pool the array of resources available to a desktop machine (web
> search, p2p, torrent, usenet, irc etc) in "tivo like" fashion. The more
> data it collects the more it knows about your tastes and the better it
> is able to find other relevant data for the owner. And because it uses
> existing security models this can all be built to whatever level of
> paranoia the user happens to feel prudent.
> 
> 

-------------------------------------------
Matt Price	    matt.price at utoronto.ca
History Department, University of Toronto
(416) 978-2094
--------------------------------------------




More information about the ubuntu-users mailing list