[MERGE] * New ``versionedfile.KeyMapper`` interface
Robert Collins
robertc at robertcollins.net
Sun May 4 22:12:46 BST 2008
On Fri, 2008-05-02 at 15:00 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> | This abstracts out the various escaping/path choosing logic from the
> | Store classes. By abstracting it out it can be used cleanly in the
> | upcoming VersionedFiles api, without requiring a dependency on
> | Repository or Store.
> |
> | -Rob
> |
>
> It seems very api incompatible that you are passing around key = (file_id,)
> instead of key = file_id here. Is it correct to assume that this api break is
> from one of your other patches (possibly already merged)? I doubt you introduced
> it here.
The New VersionedFiles api is tuple key based; so the friction will go
away as that lands.
> What brought it to my attention was actually this change:
> === modified file 'bzrlib/store/versioned/__init__.py'
> - --- bzrlib/store/versioned/__init__.py 2008-04-28 03:52:09 +0000
> +++ bzrlib/store/versioned/__init__.py 2008-05-02 03:30:07 +0000
> @@ -69,7 +69,7 @@
> ~ if relpath.endswith(suffix):
> ~ # TODO: use standard remove_suffix function
> ~ escaped_id = os.path.basename(relpath[:-len(suffix)])
> - - file_id = self._unescape(escaped_id)
> + file_id = self._mapper.unmap(escaped_id)[0]
> ~ if file_id not in ids:
> ~ ids.add(file_id)
> ~ yield file_id
>
> ^- Which made it obvious that unmap is returning tuples instead of strings,
> though internally we wanted a string here.
Right, this is because store has not yet been eliminated/migrated.
> === modified file 'bzrlib/fetch.py'
> - --- bzrlib/fetch.py 2008-04-30 04:43:26 +0000
> +++ bzrlib/fetch.py 2008-05-02 03:30:07 +0000
> @@ -205,7 +205,7 @@
> ~ return self.to_repository.search_missing_revision_ids(
> ~ self.from_repository, self._last_revision,
> ~ find_ghosts=self.find_ghosts)
> - - except errors.NoSuchRevision:
> + except errors.NoSuchRevision, e:
> ~ raise InstallFailed([self._last_revision])
>
> ~ def _fetch_weave_text(self, file_id, required_versions):
>
> ^- this seems spurious.
Will revert.
> === modified file 'bzrlib/repofmt/knitrepo.py'
> - --- bzrlib/repofmt/knitrepo.py 2008-04-25 23:14:10 +0000
> +++ bzrlib/repofmt/knitrepo.py 2008-05-02 03:30:07 +0000
> @@ -374,7 +374,7 @@
> ~ versionedfile_kwargs={'delta':False,
> ~ 'factory':knit.KnitPlainFactory(),
> ~ },
> - - escaped=True,
> + escaped=False,
> ~ )
> ~ return KnitRevisionStore(versioned_file_store)
>
> ^- This seems like it might be incorrect altogether. (Either it was really wrong
> before, which I'm pretty sure we would have caught, or it is wrong now.)
It was wrong. This is the part of the control_weaves store
instantiation, which is only used to hold inventories. Thus we don't
escape any names.
> ~ def _check_fileid(self, fileid):
> - - if not isinstance(fileid, basestring):
> - - raise TypeError('Fileids should be a string type: %s %r' %
> (type(fileid), fileid))
> + if type(fileid) != str:
> + raise TypeError('Fileids should be bytestrings: %s %r' % (
> + type(fileid), fileid))
> ~ if '\\' in fileid or '/' in fileid:
> ~ raise ValueError("invalid store id %r" % fileid)
>
> ^- this is the old "isinstance()" versus "type(foo)" versus "foo.__class__"
> argument. I don't know what we decided, but I'm pretty sure at a minimum it is
> supposed to be:
>
> if type(fileid) is not str:
>
> This is a moderate api break that should probably be more clearly documented.
> You are changing the store api to only accept bytestrings. It seems okay, since
> it was a while ago that we switched file_ids to be utf-8 bytestrings only. But
> as a less-than-trivial change it should probably be more visible. (Maybe I only
> care because I feel like I spent a bit of effort getting unicode strings to
> work...:)
Actually, I would describe this by saying that the file_ids work was
incomplete - the store works in file_ids, and file_ids have been made
utf8 already; this assertion is not matched to the file id constraints,
so all I'm doing is aligning them.
> @@ -99,7 +101,7 @@
> ~ if not self._prefixed:
> ~ raise
> ~ try:
> - - self._transport.mkdir(self.hash_prefix(fileid)[:-1],
> mode=self._dir_mode)
> + self._transport.mkdir(osutils.dirname(path), mode=self._dir_mode)
> ~ except FileExists:
> ~ pass
> ~ result = other._transport.copy_to([path], self._transport,
>
>
> Are we sure that "path" is always just a path relative to the transport root?
> Probably, but I'm not 100% on that. I would also recommend re-wrapping this
> line, as it is 89 chars long.
Yes, we are. Note that we already use path in the copy_to line.
> - - def test_paths(self):
> - - text_store = self.get_store()
> - -
> - - self.assertEqual('a', text_store._relpath('a'))
> - - self.assertEqual('a', text_store._relpath(u'a'))
> - - self.assertEqual('%2520', text_store._relpath(' '))
> - - self.assertEqual('%40%253a%253c%253e', text_store._relpath('@:<>'))
> - - self.assertEqual('%25c3%25a5', text_store._relpath(u'\xe5'))
> - -
> ~ def test_prefixed(self):
> ~ # Prefix should be determined by unescaped string
> ~ text_store = self.get_store(prefixed=True)
> @@ -58,7 +49,6 @@
> ~ self.assertEqual('88/%2520', text_store._relpath(' '))
> ~ self.assertEqual('72/%40%253a%253c%253e',
> ~ text_store._relpath('@:<>'))
> - - self.assertEqual('77/%25c3%25a5', text_store._relpath(u'\xe5'))
>
> ~ def test_files(self):
> ~ text_store = self.get_store(prefixed=True)
>
> === modified file 'bzrlib/tests/test_store.py'
> - --- bzrlib/tests/test_store.py 2008-04-28 03:52:09 +0000
> +++ bzrlib/tests/test_store.py 2008-05-02 03:30:07 +0000
> @@ -398,9 +398,10 @@
>
> ~ def test_escaped_uppercase(self):
> ~ """Uppercase letters are escaped for safety on Windows"""
> - - my_store = store.TransportStore(MemoryTransport(), escaped=True)
> + my_store = store.TransportStore(MemoryTransport(), prefixed=True,
> + escaped=True)
> ~ # a particularly perverse file-id! :-)
> - - self.assertEquals(my_store._escape_file_id('C:<>'), '%43%3a%3c%3e')
> + self.assertEquals(my_store._relpath('C:<>'), 'be/%2543%253a%253c%253e')
>
> ^- it seems odd to be removing a series of tests for "_relpath()" and then turn
> around and use _relpath() to make your later
Incomplete sentence here. I'm not sure I needed to remove the plain text
store _relpath tests now that I read it in the diff; however note that
two different classes are being tested here. The first is the JBOF
store, the second is the VersionedFile store.
>
> ...
> +
> + def test___iter__escaped(self):
> + self.vfstore = store.versioned.VersionedFileStore(MemoryTransport(),
> + prefixed=True, escaped=True)
> + self.vfstore.get_scope = self.get_scope
> + self._transaction = transactions.WriteTransaction()
> + vf = self.vfstore.get_weave_or_empty(' ', self._transaction)
> + vf.add_lines('a', [], [])
> + del vf
> + self._transaction.finish()
> + self.assertEqual([' '], list(self.vfstore))
>
> ^- This test could use a little bit of documenting. It is unclear why a black
> space for the file_id is testing the escaped nature of __iter__.
>
> Further, I believe blank spaces are forbidden (elsewhere) in our code. Certainly
> our Knit indices would break badly if they were allowed. So I question its use
> in this test.
It was one of the tests of file id -> file name mapping elsewhere.
__iter__ needs to unescape the contents of the partitioned storage back
to the original file ids.
> I know I was a bit confused by your definition of "Prefix" for the class
> "PrefixMapper". Specifically, it means (I beliieve) that you are only using a
> prefix of the key passed in. Not that the files on disk have a prefix directory.
Thats true; I realise that its a terminology clash with the Store
constructor parameters, but as they will be being superceded by the new
api I don't think the clash will last long.
> + def test_prefix_mapper(self):
> + #format5: plain
> + mapper = versionedfile.PrefixMapper()
> + self.assertEqual("file-id", mapper.map(("file-id", "revision-id")))
> + self.assertEqual("new-id", mapper.map(("new-id", "revision-id")))
> + self.assertEqual(('file-id',), mapper.unmap("file-id"))
> + self.assertEqual(('new-id',), mapper.unmap("new-id"))
> +
> + def test_hash_prefix_mapper(self):
> + #format6: hash + plain
> + mapper = versionedfile.HashPrefixMapper()
> + self.assertEqual("9b/file-id", mapper.map(("file-id", "revision-id")))
> + self.assertEqual("45/new-id", mapper.map(("new-id", "revision-id")))
> + self.assertEqual(('file-id',), mapper.unmap("9b/file-id"))
> + self.assertEqual(('new-id',), mapper.unmap("45/new-id"))
> +
>
> That, at least, is what I understand from those tests.
>
> I'm not sure if it is just an unfortunate terminology clash, or if there is
> something that I'm misunderstanding.
>
> I think the doc string here:
>
> +class PrefixMapper(KeyMapper):
> + """A key mapper that extracts the first component of a key."""
>
> Does make it clear that Prefix applies to the key and not to the output. I'm
> just not sure if it is the clearest way to do it. (correct, I don't have a
> better answer, so if we can't find one with discussion, we'll just stick with
> what you have.)
Yup, thats what it does.
> I would also say that your tests for "hash_prefix_mapper()" should include stuff
> like "filE-Id" to make sure that it *doesn't* escape paths (otherwise
> HashEscapedPrefixMapper passes all of these tests as well). I would also request
> using non-ascii characters. As you aren't hitting disk here, any 8-bit strings
> should be valid.
Good point, will do.
> ...
> +class HashEscapedPrefixMapper(HashPrefixMapper):
> + """Combines the escaped first component of a key with a hash."""
> +
> + _safe = "abcdefghijklmnopqrstuvwxyz0123456789-_@,."
> +
> + def _escape(self, prefix):
> + """Turn a key element into a filesystem safe string.
> +
> + This is similar to a plain urllib.quote, except
> + it uses specific safe characters, so that it doesn't
> + have to translate a lot of valid file ids.
> + """
> + # @ does not get escaped. This is because it is a valid
> + # filesystem character we use all the time, and it looks
> + # a lot better than seeing %40 all the time.
> + r = [((c in self._safe) and c or ('%%%02x' % ord(c)))
> + for c in prefix]
> + return ''.join(r)
>
> ^- As this is a function which gets called a lot, I would recommend turning
> "self._safe" into a local variable.
>
> safe = self._safe
> return ''.join([((c in safe) and c or ('%%%%02x' % ord(c))) for c in prefix]
>
> The comment about '@' probably is better to go with the definition of "_safe = "
> rather than inside the _escape() function. Since you don't have any context as
> to how @ isn't being escaped.
Thanks!
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080505/fcd2642a/attachment.pgp
More information about the bazaar
mailing list