[MERGE] * New ``versionedfile.KeyMapper`` interface

John Arbash Meinel john at arbash-meinel.com
Fri May 2 21:00:27 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
| This abstracts out the various escaping/path choosing logic from the
| Store classes. By abstracting it out it can be used cleanly in the
| upcoming VersionedFiles api, without requiring a dependency on
| Repository or Store.
|
| -Rob
|

It seems very api incompatible that you are passing around key = (file_id,)
instead of key = file_id here. Is it correct to assume that this api break is
from one of your other patches (possibly already merged)? I doubt you introduced
it here.

But stuff like:

Mapper.unmap(xxx):
~  return (xxx,)

Certainly change the objects a bit.

What brought it to my attention was actually this change:
=== modified file 'bzrlib/store/versioned/__init__.py'
- --- bzrlib/store/versioned/__init__.py	2008-04-28 03:52:09 +0000
+++ bzrlib/store/versioned/__init__.py	2008-05-02 03:30:07 +0000
@@ -69,7 +69,7 @@
~                 if relpath.endswith(suffix):
~                     # TODO: use standard remove_suffix function
~                     escaped_id = os.path.basename(relpath[:-len(suffix)])
- -                    file_id = self._unescape(escaped_id)
+                    file_id = self._mapper.unmap(escaped_id)[0]
~                     if file_id not in ids:
~                         ids.add(file_id)
~                         yield file_id

^- Which made it obvious that unmap is returning tuples instead of strings,
though internally we wanted a string here.


=== modified file 'bzrlib/fetch.py'
- --- bzrlib/fetch.py	2008-04-30 04:43:26 +0000
+++ bzrlib/fetch.py	2008-05-02 03:30:07 +0000
@@ -205,7 +205,7 @@
~             return self.to_repository.search_missing_revision_ids(
~                 self.from_repository, self._last_revision,
~                 find_ghosts=self.find_ghosts)
- -        except errors.NoSuchRevision:
+        except errors.NoSuchRevision, e:
~             raise InstallFailed([self._last_revision])

~     def _fetch_weave_text(self, file_id, required_versions):

^- this seems spurious.

=== modified file 'bzrlib/repofmt/knitrepo.py'
- --- bzrlib/repofmt/knitrepo.py	2008-04-25 23:14:10 +0000
+++ bzrlib/repofmt/knitrepo.py	2008-05-02 03:30:07 +0000
@@ -374,7 +374,7 @@
~             versionedfile_kwargs={'delta':False,
~                                   'factory':knit.KnitPlainFactory(),
~                                  },
- -            escaped=True,
+            escaped=False,
~             )
~         return KnitRevisionStore(versioned_file_store)

^- This seems like it might be incorrect altogether. (Either it was really wrong
before, which I'm pretty sure we would have caught, or it is wrong now.)

~     def _check_fileid(self, fileid):
- -        if not isinstance(fileid, basestring):
- -            raise TypeError('Fileids should be a string type: %s %r' %
(type(fileid), fileid))
+        if type(fileid) != str:
+            raise TypeError('Fileids should be bytestrings: %s %r' % (
+                type(fileid), fileid))
~         if '\\' in fileid or '/' in fileid:
~             raise ValueError("invalid store id %r" % fileid)

^- this is the old "isinstance()" versus "type(foo)" versus "foo.__class__"
argument. I don't know what we decided, but I'm pretty sure at a minimum it is
supposed to be:

if type(fileid) is not str:

This is a moderate api break that should probably be more clearly documented.
You are changing the store api to only accept bytestrings. It seems okay, since
it was a while ago that we switched file_ids to be utf-8 bytestrings only. But
as a less-than-trivial change it should probably be more visible. (Maybe I only
care because I feel like I spent a bit of effort getting unicode strings to
work...:)


@@ -99,7 +101,7 @@
~             if not self._prefixed:
~                 raise
~             try:
- -                self._transport.mkdir(self.hash_prefix(fileid)[:-1],
mode=self._dir_mode)
+                self._transport.mkdir(osutils.dirname(path), mode=self._dir_mode)
~             except FileExists:
~                 pass
~             result = other._transport.copy_to([path], self._transport,


Are we sure that "path" is always just a path relative to the transport root?
Probably, but I'm not 100% on that. I would also recommend re-wrapping this
line, as it is 89 chars long.

...

- -    def test_paths(self):
- -        text_store = self.get_store()
- -
- -        self.assertEqual('a', text_store._relpath('a'))
- -        self.assertEqual('a', text_store._relpath(u'a'))
- -        self.assertEqual('%2520', text_store._relpath(' '))
- -        self.assertEqual('%40%253a%253c%253e', text_store._relpath('@:<>'))
- -        self.assertEqual('%25c3%25a5', text_store._relpath(u'\xe5'))
- -
~     def test_prefixed(self):
~         # Prefix should be determined by unescaped string
~         text_store = self.get_store(prefixed=True)
@@ -58,7 +49,6 @@
~         self.assertEqual('88/%2520', text_store._relpath(' '))
~         self.assertEqual('72/%40%253a%253c%253e',
~                 text_store._relpath('@:<>'))
- -        self.assertEqual('77/%25c3%25a5', text_store._relpath(u'\xe5'))

~     def test_files(self):
~         text_store = self.get_store(prefixed=True)

=== modified file 'bzrlib/tests/test_store.py'
- --- bzrlib/tests/test_store.py	2008-04-28 03:52:09 +0000
+++ bzrlib/tests/test_store.py	2008-05-02 03:30:07 +0000
@@ -398,9 +398,10 @@

~     def test_escaped_uppercase(self):
~         """Uppercase letters are escaped for safety on Windows"""
- -        my_store = store.TransportStore(MemoryTransport(), escaped=True)
+        my_store = store.TransportStore(MemoryTransport(), prefixed=True,
+            escaped=True)
~         # a particularly perverse file-id!  :-)
- -        self.assertEquals(my_store._escape_file_id('C:<>'), '%43%3a%3c%3e')
+        self.assertEquals(my_store._relpath('C:<>'), 'be/%2543%253a%253c%253e')

^- it seems odd to be removing a series of tests for "_relpath()" and then turn
around and use _relpath() to make your later


...
+
+    def test___iter__escaped(self):
+        self.vfstore = store.versioned.VersionedFileStore(MemoryTransport(),
+            prefixed=True, escaped=True)
+        self.vfstore.get_scope = self.get_scope
+        self._transaction = transactions.WriteTransaction()
+        vf = self.vfstore.get_weave_or_empty(' ', self._transaction)
+        vf.add_lines('a', [], [])
+        del vf
+        self._transaction.finish()
+        self.assertEqual([' '], list(self.vfstore))

^- This test could use a little bit of documenting. It is unclear why a black
space for the file_id is testing the escaped nature of __iter__.

Further, I believe blank spaces are forbidden (elsewhere) in our code. Certainly
our Knit indices would break badly if they were allowed. So I question its use
in this test.



I know I was a bit confused by your definition of "Prefix" for the class
"PrefixMapper". Specifically, it means (I beliieve) that you are only using a
prefix of the key passed in. Not that the files on disk have a prefix directory.

+    def test_prefix_mapper(self):
+        #format5: plain
+        mapper = versionedfile.PrefixMapper()
+        self.assertEqual("file-id", mapper.map(("file-id", "revision-id")))
+        self.assertEqual("new-id", mapper.map(("new-id", "revision-id")))
+        self.assertEqual(('file-id',), mapper.unmap("file-id"))
+        self.assertEqual(('new-id',), mapper.unmap("new-id"))
+
+    def test_hash_prefix_mapper(self):
+        #format6: hash + plain
+        mapper = versionedfile.HashPrefixMapper()
+        self.assertEqual("9b/file-id", mapper.map(("file-id", "revision-id")))
+        self.assertEqual("45/new-id", mapper.map(("new-id", "revision-id")))
+        self.assertEqual(('file-id',), mapper.unmap("9b/file-id"))
+        self.assertEqual(('new-id',), mapper.unmap("45/new-id"))
+

That, at least, is what I understand from those tests.

I'm not sure if it is just an unfortunate terminology clash, or if there is
something that I'm misunderstanding.

I think the doc string here:

+class PrefixMapper(KeyMapper):
+    """A key mapper that extracts the first component of a key."""

Does make it clear that Prefix applies to the key and not to the output. I'm
just not sure if it is the clearest way to do it. (correct, I don't have a
better answer, so if we can't find one with discussion, we'll just stick with
what you have.)


I would also say that your tests for "hash_prefix_mapper()" should include stuff
like "filE-Id" to make sure that it *doesn't* escape paths (otherwise
HashEscapedPrefixMapper passes all of these tests as well). I would also request
using non-ascii characters. As you aren't hitting disk here, any 8-bit strings
should be valid.

...
+class HashEscapedPrefixMapper(HashPrefixMapper):
+    """Combines the escaped first component of a key with a hash."""
+
+    _safe = "abcdefghijklmnopqrstuvwxyz0123456789-_@,."
+
+    def _escape(self, prefix):
+        """Turn a key element into a filesystem safe string.
+
+        This is similar to a plain urllib.quote, except
+        it uses specific safe characters, so that it doesn't
+        have to translate a lot of valid file ids.
+        """
+        # @ does not get escaped. This is because it is a valid
+        # filesystem character we use all the time, and it looks
+        # a lot better than seeing %40 all the time.
+        r = [((c in self._safe) and c or ('%%%02x' % ord(c)))
+             for c in prefix]
+        return ''.join(r)

^- As this is a function which gets called a lot, I would recommend turning
"self._safe" into a local variable.

safe = self._safe
return ''.join([((c in safe) and c or ('%%%%02x' % ord(c))) for c in prefix]

The comment about '@' probably is better to go with the definition of "_safe = "
rather than inside the _escape() function. Since you don't have any context as
to how @ isn't being escaped.


BB:tweak

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgbctsACgkQJdeBCYSNAAOs0QCfUoNQdMrBBYiVUcTqChNxLCLw
sFUAoKheEZHCmj/K+FmtEPzaBRt4CxoK
=CPt9
-----END PGP SIGNATURE-----



More information about the bazaar mailing list