[MERGE] * New ``versionedfile.KeyMapper`` interface
John Arbash Meinel
john at arbash-meinel.com
Fri May 2 21:00:27 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
| This abstracts out the various escaping/path choosing logic from the
| Store classes. By abstracting it out it can be used cleanly in the
| upcoming VersionedFiles api, without requiring a dependency on
| Repository or Store.
|
| -Rob
|
It seems very api incompatible that you are passing around key = (file_id,)
instead of key = file_id here. Is it correct to assume that this api break is
from one of your other patches (possibly already merged)? I doubt you introduced
it here.
But stuff like:
Mapper.unmap(xxx):
~ return (xxx,)
Certainly change the objects a bit.
What brought it to my attention was actually this change:
=== modified file 'bzrlib/store/versioned/__init__.py'
- --- bzrlib/store/versioned/__init__.py 2008-04-28 03:52:09 +0000
+++ bzrlib/store/versioned/__init__.py 2008-05-02 03:30:07 +0000
@@ -69,7 +69,7 @@
~ if relpath.endswith(suffix):
~ # TODO: use standard remove_suffix function
~ escaped_id = os.path.basename(relpath[:-len(suffix)])
- - file_id = self._unescape(escaped_id)
+ file_id = self._mapper.unmap(escaped_id)[0]
~ if file_id not in ids:
~ ids.add(file_id)
~ yield file_id
^- Which made it obvious that unmap is returning tuples instead of strings,
though internally we wanted a string here.
=== modified file 'bzrlib/fetch.py'
- --- bzrlib/fetch.py 2008-04-30 04:43:26 +0000
+++ bzrlib/fetch.py 2008-05-02 03:30:07 +0000
@@ -205,7 +205,7 @@
~ return self.to_repository.search_missing_revision_ids(
~ self.from_repository, self._last_revision,
~ find_ghosts=self.find_ghosts)
- - except errors.NoSuchRevision:
+ except errors.NoSuchRevision, e:
~ raise InstallFailed([self._last_revision])
~ def _fetch_weave_text(self, file_id, required_versions):
^- this seems spurious.
=== modified file 'bzrlib/repofmt/knitrepo.py'
- --- bzrlib/repofmt/knitrepo.py 2008-04-25 23:14:10 +0000
+++ bzrlib/repofmt/knitrepo.py 2008-05-02 03:30:07 +0000
@@ -374,7 +374,7 @@
~ versionedfile_kwargs={'delta':False,
~ 'factory':knit.KnitPlainFactory(),
~ },
- - escaped=True,
+ escaped=False,
~ )
~ return KnitRevisionStore(versioned_file_store)
^- This seems like it might be incorrect altogether. (Either it was really wrong
before, which I'm pretty sure we would have caught, or it is wrong now.)
~ def _check_fileid(self, fileid):
- - if not isinstance(fileid, basestring):
- - raise TypeError('Fileids should be a string type: %s %r' %
(type(fileid), fileid))
+ if type(fileid) != str:
+ raise TypeError('Fileids should be bytestrings: %s %r' % (
+ type(fileid), fileid))
~ if '\\' in fileid or '/' in fileid:
~ raise ValueError("invalid store id %r" % fileid)
^- this is the old "isinstance()" versus "type(foo)" versus "foo.__class__"
argument. I don't know what we decided, but I'm pretty sure at a minimum it is
supposed to be:
if type(fileid) is not str:
This is a moderate api break that should probably be more clearly documented.
You are changing the store api to only accept bytestrings. It seems okay, since
it was a while ago that we switched file_ids to be utf-8 bytestrings only. But
as a less-than-trivial change it should probably be more visible. (Maybe I only
care because I feel like I spent a bit of effort getting unicode strings to
work...:)
@@ -99,7 +101,7 @@
~ if not self._prefixed:
~ raise
~ try:
- - self._transport.mkdir(self.hash_prefix(fileid)[:-1],
mode=self._dir_mode)
+ self._transport.mkdir(osutils.dirname(path), mode=self._dir_mode)
~ except FileExists:
~ pass
~ result = other._transport.copy_to([path], self._transport,
Are we sure that "path" is always just a path relative to the transport root?
Probably, but I'm not 100% on that. I would also recommend re-wrapping this
line, as it is 89 chars long.
...
- - def test_paths(self):
- - text_store = self.get_store()
- -
- - self.assertEqual('a', text_store._relpath('a'))
- - self.assertEqual('a', text_store._relpath(u'a'))
- - self.assertEqual('%2520', text_store._relpath(' '))
- - self.assertEqual('%40%253a%253c%253e', text_store._relpath('@:<>'))
- - self.assertEqual('%25c3%25a5', text_store._relpath(u'\xe5'))
- -
~ def test_prefixed(self):
~ # Prefix should be determined by unescaped string
~ text_store = self.get_store(prefixed=True)
@@ -58,7 +49,6 @@
~ self.assertEqual('88/%2520', text_store._relpath(' '))
~ self.assertEqual('72/%40%253a%253c%253e',
~ text_store._relpath('@:<>'))
- - self.assertEqual('77/%25c3%25a5', text_store._relpath(u'\xe5'))
~ def test_files(self):
~ text_store = self.get_store(prefixed=True)
=== modified file 'bzrlib/tests/test_store.py'
- --- bzrlib/tests/test_store.py 2008-04-28 03:52:09 +0000
+++ bzrlib/tests/test_store.py 2008-05-02 03:30:07 +0000
@@ -398,9 +398,10 @@
~ def test_escaped_uppercase(self):
~ """Uppercase letters are escaped for safety on Windows"""
- - my_store = store.TransportStore(MemoryTransport(), escaped=True)
+ my_store = store.TransportStore(MemoryTransport(), prefixed=True,
+ escaped=True)
~ # a particularly perverse file-id! :-)
- - self.assertEquals(my_store._escape_file_id('C:<>'), '%43%3a%3c%3e')
+ self.assertEquals(my_store._relpath('C:<>'), 'be/%2543%253a%253c%253e')
^- it seems odd to be removing a series of tests for "_relpath()" and then turn
around and use _relpath() to make your later
...
+
+ def test___iter__escaped(self):
+ self.vfstore = store.versioned.VersionedFileStore(MemoryTransport(),
+ prefixed=True, escaped=True)
+ self.vfstore.get_scope = self.get_scope
+ self._transaction = transactions.WriteTransaction()
+ vf = self.vfstore.get_weave_or_empty(' ', self._transaction)
+ vf.add_lines('a', [], [])
+ del vf
+ self._transaction.finish()
+ self.assertEqual([' '], list(self.vfstore))
^- This test could use a little bit of documenting. It is unclear why a black
space for the file_id is testing the escaped nature of __iter__.
Further, I believe blank spaces are forbidden (elsewhere) in our code. Certainly
our Knit indices would break badly if they were allowed. So I question its use
in this test.
I know I was a bit confused by your definition of "Prefix" for the class
"PrefixMapper". Specifically, it means (I beliieve) that you are only using a
prefix of the key passed in. Not that the files on disk have a prefix directory.
+ def test_prefix_mapper(self):
+ #format5: plain
+ mapper = versionedfile.PrefixMapper()
+ self.assertEqual("file-id", mapper.map(("file-id", "revision-id")))
+ self.assertEqual("new-id", mapper.map(("new-id", "revision-id")))
+ self.assertEqual(('file-id',), mapper.unmap("file-id"))
+ self.assertEqual(('new-id',), mapper.unmap("new-id"))
+
+ def test_hash_prefix_mapper(self):
+ #format6: hash + plain
+ mapper = versionedfile.HashPrefixMapper()
+ self.assertEqual("9b/file-id", mapper.map(("file-id", "revision-id")))
+ self.assertEqual("45/new-id", mapper.map(("new-id", "revision-id")))
+ self.assertEqual(('file-id',), mapper.unmap("9b/file-id"))
+ self.assertEqual(('new-id',), mapper.unmap("45/new-id"))
+
That, at least, is what I understand from those tests.
I'm not sure if it is just an unfortunate terminology clash, or if there is
something that I'm misunderstanding.
I think the doc string here:
+class PrefixMapper(KeyMapper):
+ """A key mapper that extracts the first component of a key."""
Does make it clear that Prefix applies to the key and not to the output. I'm
just not sure if it is the clearest way to do it. (correct, I don't have a
better answer, so if we can't find one with discussion, we'll just stick with
what you have.)
I would also say that your tests for "hash_prefix_mapper()" should include stuff
like "filE-Id" to make sure that it *doesn't* escape paths (otherwise
HashEscapedPrefixMapper passes all of these tests as well). I would also request
using non-ascii characters. As you aren't hitting disk here, any 8-bit strings
should be valid.
...
+class HashEscapedPrefixMapper(HashPrefixMapper):
+ """Combines the escaped first component of a key with a hash."""
+
+ _safe = "abcdefghijklmnopqrstuvwxyz0123456789-_@,."
+
+ def _escape(self, prefix):
+ """Turn a key element into a filesystem safe string.
+
+ This is similar to a plain urllib.quote, except
+ it uses specific safe characters, so that it doesn't
+ have to translate a lot of valid file ids.
+ """
+ # @ does not get escaped. This is because it is a valid
+ # filesystem character we use all the time, and it looks
+ # a lot better than seeing %40 all the time.
+ r = [((c in self._safe) and c or ('%%%02x' % ord(c)))
+ for c in prefix]
+ return ''.join(r)
^- As this is a function which gets called a lot, I would recommend turning
"self._safe" into a local variable.
safe = self._safe
return ''.join([((c in safe) and c or ('%%%%02x' % ord(c))) for c in prefix]
The comment about '@' probably is better to go with the definition of "_safe = "
rather than inside the _escape() function. Since you don't have any context as
to how @ isn't being escaped.
BB:tweak
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkgbctsACgkQJdeBCYSNAAOs0QCfUoNQdMrBBYiVUcTqChNxLCLw
sFUAoKheEZHCmj/K+FmtEPzaBRt4CxoK
=CPt9
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list