[MERGE] * New ``versionedfile.KeyMapper`` interface

Sun May 4 22:12:46 BST 2008

On Fri, 2008-05-02 at 15:00 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> | This abstracts out the various escaping/path choosing logic from the
> | Store classes. By abstracting it out it can be used cleanly in the
> | upcoming VersionedFiles api, without requiring a dependency on
> | Repository or Store.
> |
> | -Rob
> |
> 
> It seems very api incompatible that you are passing around key = (file_id,)
> instead of key = file_id here. Is it correct to assume that this api break is
> from one of your other patches (possibly already merged)? I doubt you introduced
> it here.

The New VersionedFiles api is tuple key based; so the friction will go
away as that lands.

> What brought it to my attention was actually this change:
> === modified file 'bzrlib/store/versioned/__init__.py'
> - --- bzrlib/store/versioned/__init__.py	2008-04-28 03:52:09 +0000
> +++ bzrlib/store/versioned/__init__.py	2008-05-02 03:30:07 +0000
> @@ -69,7 +69,7 @@
> ~                 if relpath.endswith(suffix):
> ~                     # TODO: use standard remove_suffix function
> ~                     escaped_id = os.path.basename(relpath[:-len(suffix)])
> - -                    file_id = self._unescape(escaped_id)
> +                    file_id = self._mapper.unmap(escaped_id)[0]
> ~                     if file_id not in ids:
> ~                         ids.add(file_id)
> ~                         yield file_id
> 
> ^- Which made it obvious that unmap is returning tuples instead of strings,
> though internally we wanted a string here.

Right, this is because store has not yet been eliminated/migrated.

> === modified file 'bzrlib/fetch.py'
> - --- bzrlib/fetch.py	2008-04-30 04:43:26 +0000
> +++ bzrlib/fetch.py	2008-05-02 03:30:07 +0000
> @@ -205,7 +205,7 @@
> ~             return self.to_repository.search_missing_revision_ids(
> ~                 self.from_repository, self._last_revision,
> ~                 find_ghosts=self.find_ghosts)
> - -        except errors.NoSuchRevision:
> +        except errors.NoSuchRevision, e:
> ~             raise InstallFailed([self._last_revision])
> 
> ~     def _fetch_weave_text(self, file_id, required_versions):
> 
> ^- this seems spurious.

Will revert.

> === modified file 'bzrlib/repofmt/knitrepo.py'
> - --- bzrlib/repofmt/knitrepo.py	2008-04-25 23:14:10 +0000
> +++ bzrlib/repofmt/knitrepo.py	2008-05-02 03:30:07 +0000
> @@ -374,7 +374,7 @@
> ~             versionedfile_kwargs={'delta':False,
> ~                                   'factory':knit.KnitPlainFactory(),
> ~                                  },
> - -            escaped=True,
> +            escaped=False,
> ~             )
> ~         return KnitRevisionStore(versioned_file_store)
> 
> ^- This seems like it might be incorrect altogether. (Either it was really wrong
> before, which I'm pretty sure we would have caught, or it is wrong now.)

It was wrong. This is the part of the control_weaves store
instantiation, which is only used to hold inventories. Thus we don't
escape any names.

> ~     def _check_fileid(self, fileid):
> - -        if not isinstance(fileid, basestring):
> - -            raise TypeError('Fileids should be a string type: %s %r' %
> (type(fileid), fileid))
> +        if type(fileid) != str:
> +            raise TypeError('Fileids should be bytestrings: %s %r' % (
> +                type(fileid), fileid))
> ~         if '\\' in fileid or '/' in fileid:
> ~             raise ValueError("invalid store id %r" % fileid)
> 
> ^- this is the old "isinstance()" versus "type(foo)" versus "foo.__class__"
> argument. I don't know what we decided, but I'm pretty sure at a minimum it is
> supposed to be:
> 
> if type(fileid) is not str:
> 
> This is a moderate api break that should probably be more clearly documented.
> You are changing the store api to only accept bytestrings. It seems okay, since
> it was a while ago that we switched file_ids to be utf-8 bytestrings only. But
> as a less-than-trivial change it should probably be more visible. (Maybe I only
> care because I feel like I spent a bit of effort getting unicode strings to
> work...:)

Actually, I would describe this by saying that the file_ids work was
incomplete - the store works in file_ids, and file_ids have been made
utf8 already; this assertion is not matched to the file id constraints,
so all I'm doing is aligning them.

> @@ -99,7 +101,7 @@
> ~             if not self._prefixed:
> ~                 raise
> ~             try:
> - -                self._transport.mkdir(self.hash_prefix(fileid)[:-1],
> mode=self._dir_mode)
> +                self._transport.mkdir(osutils.dirname(path), mode=self._dir_mode)
> ~             except FileExists:
> ~                 pass
> ~             result = other._transport.copy_to([path], self._transport,
> 
> 
> Are we sure that "path" is always just a path relative to the transport root?
> Probably, but I'm not 100% on that. I would also recommend re-wrapping this
> line, as it is 89 chars long.

Yes, we are. Note that we already use path in the copy_to line.

> - -    def test_paths(self):
> - -        text_store = self.get_store()
> - -
> - -        self.assertEqual('a', text_store._relpath('a'))
> - -        self.assertEqual('a', text_store._relpath(u'a'))
> - -        self.assertEqual('%2520', text_store._relpath(' '))
> - -        self.assertEqual('%40%253a%253c%253e', text_store._relpath('@:<>'))
> - -        self.assertEqual('%25c3%25a5', text_store._relpath(u'\xe5'))
> - -
> ~     def test_prefixed(self):
> ~         # Prefix should be determined by unescaped string
> ~         text_store = self.get_store(prefixed=True)
> @@ -58,7 +49,6 @@
> ~         self.assertEqual('88/%2520', text_store._relpath(' '))
> ~         self.assertEqual('72/%40%253a%253c%253e',
> ~                 text_store._relpath('@:<>'))
> - -        self.assertEqual('77/%25c3%25a5', text_store._relpath(u'\xe5'))
> 
> ~     def test_files(self):
> ~         text_store = self.get_store(prefixed=True)
> 
> === modified file 'bzrlib/tests/test_store.py'
> - --- bzrlib/tests/test_store.py	2008-04-28 03:52:09 +0000
> +++ bzrlib/tests/test_store.py	2008-05-02 03:30:07 +0000
> @@ -398,9 +398,10 @@
> 
> ~     def test_escaped_uppercase(self):
> ~         """Uppercase letters are escaped for safety on Windows"""
> - -        my_store = store.TransportStore(MemoryTransport(), escaped=True)
> +        my_store = store.TransportStore(MemoryTransport(), prefixed=True,
> +            escaped=True)
> ~         # a particularly perverse file-id!  :-)
> - -        self.assertEquals(my_store._escape_file_id('C:<>'), '%43%3a%3c%3e')
> +        self.assertEquals(my_store._relpath('C:<>'), 'be/%2543%253a%253c%253e')
> 
> ^- it seems odd to be removing a series of tests for "_relpath()" and then turn
> around and use _relpath() to make your later

Incomplete sentence here. I'm not sure I needed to remove the plain text
store _relpath tests now that I read it in the diff; however note that
two different classes are being tested here. The first is the JBOF
store, the second is the VersionedFile store.

> 
> ...
> +
> +    def test___iter__escaped(self):
> +        self.vfstore = store.versioned.VersionedFileStore(MemoryTransport(),
> +            prefixed=True, escaped=True)
> +        self.vfstore.get_scope = self.get_scope
> +        self._transaction = transactions.WriteTransaction()
> +        vf = self.vfstore.get_weave_or_empty(' ', self._transaction)
> +        vf.add_lines('a', [], [])
> +        del vf
> +        self._transaction.finish()
> +        self.assertEqual([' '], list(self.vfstore))
> 
> ^- This test could use a little bit of documenting. It is unclear why a black
> space for the file_id is testing the escaped nature of __iter__.
> 
> Further, I believe blank spaces are forbidden (elsewhere) in our code. Certainly
> our Knit indices would break badly if they were allowed. So I question its use
> in this test.

It was one of the tests of file id -> file name mapping elsewhere.
__iter__ needs to unescape the contents of the partitioned storage back
to the original file ids.

> I know I was a bit confused by your definition of "Prefix" for the class
> "PrefixMapper". Specifically, it means (I beliieve) that you are only using a
> prefix of the key passed in. Not that the files on disk have a prefix directory.

Thats true; I realise that its a terminology clash with the Store
constructor parameters, but as they will be being superceded by the new
api I don't think the clash will last long.

> +    def test_prefix_mapper(self):
> +        #format5: plain
> +        mapper = versionedfile.PrefixMapper()
> +        self.assertEqual("file-id", mapper.map(("file-id", "revision-id")))
> +        self.assertEqual("new-id", mapper.map(("new-id", "revision-id")))
> +        self.assertEqual(('file-id',), mapper.unmap("file-id"))
> +        self.assertEqual(('new-id',), mapper.unmap("new-id"))
> +
> +    def test_hash_prefix_mapper(self):
> +        #format6: hash + plain
> +        mapper = versionedfile.HashPrefixMapper()
> +        self.assertEqual("9b/file-id", mapper.map(("file-id", "revision-id")))
> +        self.assertEqual("45/new-id", mapper.map(("new-id", "revision-id")))
> +        self.assertEqual(('file-id',), mapper.unmap("9b/file-id"))
> +        self.assertEqual(('new-id',), mapper.unmap("45/new-id"))
> +
> 
> That, at least, is what I understand from those tests.
> 
> I'm not sure if it is just an unfortunate terminology clash, or if there is
> something that I'm misunderstanding.
> 
> I think the doc string here:
> 
> +class PrefixMapper(KeyMapper):
> +    """A key mapper that extracts the first component of a key."""
> 
> Does make it clear that Prefix applies to the key and not to the output. I'm
> just not sure if it is the clearest way to do it. (correct, I don't have a
> better answer, so if we can't find one with discussion, we'll just stick with
> what you have.)

Yup, thats what it does.

> I would also say that your tests for "hash_prefix_mapper()" should include stuff
> like "filE-Id" to make sure that it *doesn't* escape paths (otherwise
> HashEscapedPrefixMapper passes all of these tests as well). I would also request
> using non-ascii characters. As you aren't hitting disk here, any 8-bit strings
> should be valid.

Good point, will do.

> ...
> +class HashEscapedPrefixMapper(HashPrefixMapper):
> +    """Combines the escaped first component of a key with a hash."""
> +
> +    _safe = "abcdefghijklmnopqrstuvwxyz0123456789-_@,."
> +
> +    def _escape(self, prefix):
> +        """Turn a key element into a filesystem safe string.
> +
> +        This is similar to a plain urllib.quote, except
> +        it uses specific safe characters, so that it doesn't
> +        have to translate a lot of valid file ids.
> +        """
> +        # @ does not get escaped. This is because it is a valid
> +        # filesystem character we use all the time, and it looks
> +        # a lot better than seeing %40 all the time.
> +        r = [((c in self._safe) and c or ('%%%02x' % ord(c)))
> +             for c in prefix]
> +        return ''.join(r)
> 
> ^- As this is a function which gets called a lot, I would recommend turning
> "self._safe" into a local variable.
> 
> safe = self._safe
> return ''.join([((c in safe) and c or ('%%%%02x' % ord(c))) for c in prefix]
> 
> The comment about '@' probably is better to go with the definition of "_safe = "
> rather than inside the _escape() function. Since you don't have any context as
> to how @ isn't being escaped.

Thanks!

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080505/fcd2642a/attachment.pgp