[RFC] tool to generate a repository for performance analysis

Goffredo Baroncelli kreijack at tiscalinet.it
Fri Jun 6 18:48:17 BST 2008


Hi all,

In order to improve Bazaar's performance, I think that our 
analysis should be performed on a "known" repository layout.

In a lot of threads we have cited the "Emacs" or "mozilla" repos; the 
problem is that these repos are very big, and they are neither easily 
reproducible nor easily transportable.

Moreover, sometimes we are interested in history browsing (so the file sizes 
don't matter and can be 0 in order to keep the repository small). 
At other times we are interested in the repository's storage efficiency, so 
the file sizes or the number of files do matter!

My idea is to develop a tool which is able to create a repository on the 
basis of a set of parameters, like:

- history depth
- number of files 
- files size
- # of files added/deleted/removed per revision
- mainline branch frequency 
- branch branch frequency 
- merge on mainline frequency 
- merge on branch frequency 
- others ?

We can define the parameters above in terms of "average" and "standard 
deviation".

We can use Python's standard random generator with a fixed initial seed, 
so that the generated repositories are reproducible.
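
As a rough sketch of this idea (the function names and the parameter values 
are only illustrative assumptions, nothing like this exists yet), a 
parameter given as "average" and "standard deviation" could be sampled 
from a seeded generator like this:

```python
import random


def sample_count(rng, mean, stddev):
    """Draw a non-negative integer from a normal distribution."""
    # Negative draws are clamped to 0, which slightly skews the distribution.
    return max(0, round(rng.gauss(mean, stddev)))


def plan_revisions(seed, depth, files_mean, files_stddev):
    """Return, for each revision, how many files it should touch."""
    # A private Random instance keeps the sequence independent of any
    # other use of the global random module.
    rng = random.Random(seed)
    return [sample_count(rng, files_mean, files_stddev) for _ in range(depth)]
```

Calling plan_revisions() twice with the same seed gives the same plan, which 
is what makes the generated repository reproducible.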

The output should be a *-fast-export-like stream, so that we can use this 
tool with different DVCSs.
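
To give an idea of what such output could look like, here is a minimal 
sketch that writes a linear history in the fast-import stream format. The 
function names, the committer identity, the timestamps and the file naming 
scheme are all made up for illustration; branches and merges are left out, 
and each revision simply rewrites one randomly chosen file:

```python
import io
import random


def write_commit(out, rev, path, content):
    """Append one commit that sets a single file's content to the stream."""
    message = b"revision %d" % rev
    out.write(b"commit refs/heads/master\n")
    out.write(b"mark :%d\n" % rev)  # marks must be > 0, so rev starts at 1
    # Hypothetical committer; the timestamp just increases with the revision.
    out.write(b"committer Generator <generator@example.com> %d +0000\n"
              % (1200000000 + rev))
    out.write(b"data %d\n%s\n" % (len(message), message))
    # "inline" means the file content follows directly as a data block.
    out.write(b"M 100644 inline %s\n" % path.encode("ascii"))
    out.write(b"data %d\n%s\n" % (len(content), content))
    out.write(b"\n")


def generate_stream(seed, depth, n_files, file_size):
    """Build a reproducible stream of `depth` commits over `n_files` files."""
    rng = random.Random(seed)
    out = io.BytesIO()
    for rev in range(1, depth + 1):
        path = "file%04d.txt" % rng.randrange(n_files)
        # file_size == 0 gives empty files, enough for history browsing tests.
        content = bytes(rng.randrange(256) for _ in range(file_size))
        write_commit(out, rev, path, content)
    return out.getvalue()
```

The bytes returned by generate_stream() could then be written to a file or 
piped into a fast-import front end such as "git fast-import".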

After defining some typical "repository layouts", we can automatically build 
performance [regression] tests.
Moreover, these repository layouts can be used for performance comparisons 
with every DVCS compatible with the fast-export protocol. 

Thoughts and comments are welcome.

BR
Goffredo


-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) 
<kreijack at inwind_DOT_it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87  87C0 BB86 505C 6B2A CFF9


