[msysGit] Re: [Tortoisehg-discuss] Bazaar strategy regarding shell extension

Mon Apr 21 21:27:21 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stefan Küng wrote:
> John Arbash Meinel wrote:
> 
>> You missed my point entirely.
> 
> I don't think so :)
> 
>> What you gain is a *shared* cache. So that you can allow up to XX
> 
> You mean a shared process. The cached information *can not* be shared
> because it's not the same information!
> 

...

> Ever heard of the term "operating system"? One of the jobs such an
> operating system has is to manage the memory between processes.
> I'm sorry, but sharing a cache for what you describe is *not* what we
> should do. It's the job of the OS to manage the memory, not the job of a
> shared cache process.
> Or have you ever seen something like this? Ever seen graphic editor
> share the memory?
> 
> Stefan
> 

The whole point of this system is that you are explicitly caching *in
memory* what could be determined from the filesystem. You are also (if
you are going to be nice to the OS) probably going to limit the amount
of information you cache in memory, so that if you browse through 50,000
directories, the cache may choose to keep only the last 10,000
directories' worth of information.

Some limit is likely fine, since most people have a limited "active
workspace" at any given moment.

However, the "active workspace" could very easily jump between different
version control systems. For example, I could work on stuff that is
stored in SVN in the morning, and then switch over to stuff that is
versioned in BZR in the afternoon.

With multiple processes, the SVN cache won't know that I've switched
over to using BZR, and thus will retain its cache, even though *I* no
longer need it, because I've moved on to another location.

At the end of the day, I would end up with 2 processes, each with a full
cache, instead of a single full cache holding whatever I had been
working on last.

There certainly are other ways around it.

1) Let the OS decide when you need to hit swap, and page out the cache
to disk. I would argue that if the TSVN caching process needs to be
swapped to disk, it should instead just discard the data. After all, it
is just going to go back to disk to recompute the information, which
probably isn't a whole lot slower than paging in its cache files.

2) Have a timeout on all cached information. This, however, would
require having your process periodically scan through its cache and
decide what to prune. That is probably not hard to write, but the scan
may be intensive (or not, it all depends on the implementation).
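
A minimal sketch of the timeout approach, only to make the periodic scan
concrete. The class name TimeoutCache, its methods, and the max_age
parameter are hypothetical and are not taken from TSVN or any other
tool; the point is just that prune() has to touch every entry.

```python
import time


class TimeoutCache:
    """Hypothetical status cache whose entries expire after max_age seconds."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock          # injectable so the sketch can be tested
        self._entries = {}          # path -> (status, time the entry was stored)

    def put(self, path, status):
        self._entries[path] = (status, self.clock())

    def get(self, path):
        entry = self._entries.get(path)
        return entry[0] if entry else None

    def prune(self):
        """Scan every entry, drop the expired ones, return how many were dropped."""
        cutoff = self.clock() - self.max_age
        stale = [p for p, (_, stored) in self._entries.items() if stored < cutoff]
        for path in stale:
            del self._entries[path]
        return len(stale)
```

Something (a timer thread, or a check on every Nth request) would still
have to call prune() from time to time, which is the extra work
described above.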

On the other hand, a simple LRU cache can prune out the nodes which have
not been accessed in a while when new information is requested. This
also has the advantage that if I leave my machine on overnight, the last
accessed stuff from the night before is still in the cache, even though
it might have exceeded the time threshold. (You would have to set a
relatively short threshold if you wanted to be friendly to other caching
processes.)
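
To illustrate, here is a rough sketch of a single LRU cache shared by
several version control clients. Everything in it is an assumption made
for the example: the name SharedStatusCache, the (vcs, directory) key,
and the compute callback standing in for "go back to disk". It is not
how TSVN, TortoiseHg or TortoiseBzr actually work.

```python
from collections import OrderedDict


class SharedStatusCache:
    """Hypothetical LRU cache shared by several version control clients."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()   # (vcs, directory) -> status

    def get(self, vcs, directory, compute):
        """Return the cached status, calling compute(directory) on a miss."""
        key = (vcs, directory)
        if key in self._entries:
            # Mark as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        status = compute(directory)
        self._entries[key] = status
        # Pruning happens only here, when new information is requested.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)   # evict least recently used
        return status

    def keys(self):
        """Cached keys, least recently used first."""
        return list(self._entries)
```

With one such cache, the SVN entries from the morning are evicted as BZR
entries arrive in the afternoon, and nothing is dropped just because the
machine sat idle overnight.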

Sure, you can slice it 10 different ways. I still think having a cache
that is aware of multiple clients is more efficient overall, as it has
more *domain* knowledge than the OS does.

It may all be moot anyway. If the process stores only, say, 10MB of
information for 100,000 files, then you are unlikely to get into a
situation where memory pressure is an issue. I doubt that is the case,
otherwise you wouldn't have the flag to disable the cache entirely.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIDPipJdeBCYSNAAMRArqaAJsEXLNwj/+69thfRL3TNrO6GO52CACfZoYu
/18itpkJv6bY4l8SqiPAoB0=
=min4
-----END PGP SIGNATURE-----