Rev 3773: (jam) Add a hidden 'dump-btree' command for getting the raw info out in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Fri Oct 10 21:13:52 BST 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3773
revision-id: pqm at pqm.ubuntu.com-20081010201349-ccw3kwu9fe7iaw77
parent: pqm at pqm.ubuntu.com-20081010194144-0hujuzlipigm8pbs
parent: john at arbash-meinel.com-20081010191519-jrqt2sf7jw4u392o
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2008-10-10 21:13:49 +0100
message:
  (jam) Add a hidden 'dump-btree' command for getting the raw info out
  	of a btree index.
added:
  bzrlib/tests/blackbox/test_dump_btree.py test_dump_btree.py-20081008203335-zkpcq230b6vubszz-1
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
  bzrlib/tests/blackbox/__init__.py __init__.py-20051128053524-eba30d8255e08dc3
    ------------------------------------------------------------
    revno: 3770.1.5
    revision-id: john at arbash-meinel.com-20081010191519-jrqt2sf7jw4u392o
    parent: john at arbash-meinel.com-20081010185341-bbrdlq1ydy2ovnv7
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dump_btree
    timestamp: Fri 2008-10-10 14:15:19 -0500
    message:
      Add a trailing period for the option '--raw'
    modified:
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
    ------------------------------------------------------------
    revno: 3770.1.4
    revision-id: john at arbash-meinel.com-20081010185341-bbrdlq1ydy2ovnv7
    parent: john at arbash-meinel.com-20081008215612-y9v94tqxreqoangx
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dump_btree
    timestamp: Fri 2008-10-10 13:53:41 -0500
    message:
      Clarify the help text a bit.
    modified:
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
    ------------------------------------------------------------
    revno: 3770.1.3
    revision-id: john at arbash-meinel.com-20081008215612-y9v94tqxreqoangx
    parent: john at arbash-meinel.com-20081008215137-wu18nhhorncyon50
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dump_btree
    timestamp: Wed 2008-10-08 16:56:12 -0500
    message:
      Simplify the --raw mode.
      
      I didn't realize it, but the only node that is special-cased is the 'root'
      node. To read it, you actually have to parse it directly, because the
      compressed bytes start immediately after the end of the header, with no
      padding before the zlib bytes.
    modified:
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/tests/blackbox/test_dump_btree.py test_dump_btree.py-20081008203335-zkpcq230b6vubszz-1
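The root-node layout described in this commit message can be illustrated with a minimal Python 3 sketch. It assumes the header ends with the `row_lengths=` line, as in the `--raw` test output in this commit, and that the zlib stream starts on the very next byte. `split_root_page`, the sample leaf bytes and the trailing padding are hypothetical. bzrlib does its own parsing in `BTreeGraphIndex._parse_header_from_bytes`, and the details there may differ.

```python
import zlib

# Hypothetical root-page header, modeled on the --raw test output in this
# commit; a real index may carry different values.
HEADER = (b"B+Tree Graph Index 2\n"
          b"node_ref_lists=1\n"
          b"key_elements=2\n"
          b"len=3\n"
          b"row_lengths=1\n")


def split_root_page(page_bytes):
    """Split a root page into (header, decompressed payload).

    Assumes (hypothetically) that the header ends with the 'row_lengths='
    line and that the zlib stream begins on the very next byte.
    """
    start = page_bytes.index(b"row_lengths=")
    header_end = page_bytes.index(b"\n", start) + 1
    # decompressobj() stops at the end of the zlib stream; any trailing
    # bytes are left in .unused_data rather than raising an error.
    payload = zlib.decompressobj().decompress(page_bytes[header_end:])
    return page_bytes[:header_end], payload


leaf = b"type=leaf\ntest\x00key1\x00ref\x00entry\x00value\n"
page = HEADER + zlib.compress(leaf) + b"\x00" * 16  # hypothetical padding
header, payload = split_root_page(page)
```

Because there is no gap between the header and the compressed bytes, the page cannot be decompressed as a whole. The header must be measured first and then sliced off.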
    ------------------------------------------------------------
    revno: 3770.1.2
    revision-id: john at arbash-meinel.com-20081008215137-wu18nhhorncyon50
    parent: john at arbash-meinel.com-20081008204023-z1u32sjby509wl12
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dump_btree
    timestamp: Wed 2008-10-08 16:51:37 -0500
    message:
      Add a --raw output for dump-btree.
      
      This does the minimum it can, so that we can dump out the
      raw bytes in a meaningful manner.
    modified:
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/tests/blackbox/test_dump_btree.py test_dump_btree.py-20081008203335-zkpcq230b6vubszz-1
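The "--raw" mode walks the file in fixed-size pages and decompresses each one, treating only page 0 specially. The following is a minimal Python 3 sketch of that loop. The 4096-byte page size is an assumption standing in for `btree_index._PAGE_SIZE`. The `pad` helper, the payload bytes and the truncated header are hypothetical. The header length is passed in rather than parsed, so that this sketch covers only the paging logic.

```python
import zlib

PAGE_SIZE = 4096  # assumption; bzrlib uses btree_index._PAGE_SIZE


def pad(chunk):
    """Pad a chunk with NUL bytes up to PAGE_SIZE (hypothetical layout)."""
    if len(chunk) > PAGE_SIZE:
        raise ValueError("chunk does not fit in one page")
    return chunk + b"\x00" * (PAGE_SIZE - len(chunk))


def iter_raw_pages(data, header_end):
    """Yield (page_idx, header, decompressed_bytes) for each page.

    Only page 0 carries an uncompressed header, ending at header_end;
    every other page starts directly with its zlib stream.
    """
    for page_idx, page_start in enumerate(range(0, len(data), PAGE_SIZE)):
        page = data[page_start:page_start + PAGE_SIZE]  # slicing clamps at EOF
        header = b""
        if page_idx == 0:
            header, page = page[:header_end], page[header_end:]
        # decompressobj() stops at the end of the zlib stream and leaves
        # any padding in .unused_data instead of raising.
        yield page_idx, header, zlib.decompressobj().decompress(page)


root_header = b"B+Tree Graph Index 2\n"  # hypothetical, truncated header
data = (pad(root_header + zlib.compress(b"page 0 payload"))
        + pad(zlib.compress(b"page 1 payload")))
pages = list(iter_raw_pages(data, len(root_header)))
```

Here `pages` holds one tuple per page. Only the first tuple carries the header bytes.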
    ------------------------------------------------------------
    revno: 3770.1.1
    revision-id: john at arbash-meinel.com-20081008204023-z1u32sjby509wl12
    parent: pqm at pqm.ubuntu.com-20081008020104-e68hyxx45qo19nzx
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dump_btree
    timestamp: Wed 2008-10-08 15:40:23 -0500
    message:
      First draft of a basic dump-btree command.
      
      Does enough for what I need with pack-names files, but I'd like it to be a
      bit more 'raw'.
    added:
      bzrlib/tests/blackbox/test_dump_btree.py test_dump_btree.py-20081008203335-zkpcq230b6vubszz-1
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/tests/blackbox/__init__.py __init__.py-20051128053524-eba30d8255e08dc3
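The default (non-raw) output that this first draft introduced prints one parsed entry per line. Per the comment in `_dump_entries`, `iter_all_entries()` yields `(index, key, value, [references])`, and the command writes everything after the index object. The following minimal Python 3 sketch reproduces only that formatting step. The hand-built `entries` list and the `format_entries` helper are hypothetical stand-ins for a real index.

```python
# Stand-ins for the 4-tuples BTreeGraphIndex.iter_all_entries() yields,
# per the comment in _dump_entries: (index, key, value, references).
index = object()  # hypothetical placeholder for the index object
entries = [
    (index, ('test', 'key1'), 'value', ((('ref', 'entry'),),)),
    (index, ('test', 'key2'), 'value2', ((('ref', 'entry2'),),)),
]


def format_entries(nodes):
    """Render nodes like the default dump-btree mode: drop the leading
    index object and write the remaining tuple, one per line."""
    return ''.join('%s\n' % (node[1:],) for node in nodes)


print(format_entries(entries), end='')
```

With `str` keys, this produces the same text as the first two lines that `test_dump_btree_smoke` expects.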
=== modified file 'NEWS'
--- a/NEWS	2008-10-08 01:28:40 +0000
+++ b/NEWS	2008-10-08 20:40:23 +0000
@@ -7,6 +7,12 @@
 IN DEVELOPMENT
 --------------
 
+  IMPROVEMENTS:
+
+    * ``bzr dump-btree`` is a hidden command introduced to allow dumping
+      the contents of a compressed btree file.  (John Arbash Meinel)
+
+
 bzr 1.8rc1 2008-10-07
 ---------------------
 

=== modified file 'bzrlib/builtins.py'
--- a/bzrlib/builtins.py	2008-10-02 17:28:44 +0000
+++ b/bzrlib/builtins.py	2008-10-10 19:15:19 +0000
@@ -29,6 +29,7 @@
 from bzrlib import (
     bugtracker,
     bundle,
+    btree_index,
     bzrdir,
     delta,
     config,
@@ -255,7 +256,81 @@
                                                  ' revision.')
                 rev_id = rev.as_revision_id(b)
                 self.outf.write(b.repository.get_revision_xml(rev_id).decode('utf-8'))
-    
+
+
+class cmd_dump_btree(Command):
+    """Dump the contents of a btree index file to stdout.
+
+    PATH is a btree index file and can be any URL. This includes things like
+    .bzr/repository/pack-names, or .bzr/repository/indices/a34b3a...ca4a4.iix
+
+    By default, the tuples stored in the index file will be displayed. With
+    --raw, we will decompress the pages, but otherwise display the raw bytes
+    stored in the index.
+    """
+
+    # TODO: Do we want to dump the internal nodes as well?
+    # TODO: It would be nice to be able to dump the un-parsed information,
+    #       rather than only going through iter_all_entries. However, this is
+    #       good enough for a start.
+    hidden = True
+    encoding_type = 'exact'
+    takes_args = ['path']
+    takes_options = [Option('raw', help='Write the uncompressed bytes out,'
+                                        ' rather than the parsed tuples.'),
+                    ]
+
+    def run(self, path, raw=False):
+        dirname, basename = osutils.split(path)
+        t = transport.get_transport(dirname)
+        if raw:
+            self._dump_raw_bytes(t, basename)
+        else:
+            self._dump_entries(t, basename)
+
+    def _get_index_and_bytes(self, trans, basename):
+        """Create a BTreeGraphIndex and raw bytes."""
+        bt = btree_index.BTreeGraphIndex(trans, basename, None)
+        bytes = trans.get_bytes(basename)
+        bt._file = cStringIO.StringIO(bytes)
+        bt._size = len(bytes)
+        return bt, bytes
+
+    def _dump_raw_bytes(self, trans, basename):
+        import zlib
+
+        # We need to parse at least the root node.
+        # This is because the root node (the first page of the file) starts
+        # with an uncompressed header, and its zlib bytes follow immediately.
+        bt, bytes = self._get_index_and_bytes(trans, basename)
+        for page_idx, page_start in enumerate(xrange(0, len(bytes),
+                                                     btree_index._PAGE_SIZE)):
+            page_end = min(page_start + btree_index._PAGE_SIZE, len(bytes))
+            page_bytes = bytes[page_start:page_end]
+            if page_idx == 0:
+                self.outf.write('Root node:\n')
+                header_end, data = bt._parse_header_from_bytes(page_bytes)
+                self.outf.write(page_bytes[:header_end])
+                page_bytes = data
+            self.outf.write('\nPage %d\n' % (page_idx,))
+            decomp_bytes = zlib.decompress(page_bytes)
+            self.outf.write(decomp_bytes)
+            self.outf.write('\n')
+
+    def _dump_entries(self, trans, basename):
+        try:
+            st = trans.stat(basename)
+        except errors.TransportNotPossible:
+            # We can't stat, so we'll fake it because we have to do the 'get()'
+            # anyway.
+            bt, _ = self._get_index_and_bytes(trans, basename)
+        else:
+            bt = btree_index.BTreeGraphIndex(trans, basename, st.st_size)
+        for node in bt.iter_all_entries():
+            # Node is made up of:
+            # (index, key, value, [references])
+            self.outf.write('%s\n' % (node[1:],))
+
 
 class cmd_remove_tree(Command):
     """Remove the working tree from a given branch/checkout.

=== modified file 'bzrlib/tests/blackbox/__init__.py'
--- a/bzrlib/tests/blackbox/__init__.py	2008-06-05 16:27:16 +0000
+++ b/bzrlib/tests/blackbox/__init__.py	2008-10-08 20:40:23 +0000
@@ -62,6 +62,7 @@
                      'bzrlib.tests.blackbox.test_conflicts',
                      'bzrlib.tests.blackbox.test_debug',
                      'bzrlib.tests.blackbox.test_diff',
+                     'bzrlib.tests.blackbox.test_dump_btree',
                      'bzrlib.tests.blackbox.test_exceptions',
                      'bzrlib.tests.blackbox.test_export',
                      'bzrlib.tests.blackbox.test_find_merge_base',

=== added file 'bzrlib/tests/blackbox/test_dump_btree.py'
--- a/bzrlib/tests/blackbox/test_dump_btree.py	1970-01-01 00:00:00 +0000
+++ b/bzrlib/tests/blackbox/test_dump_btree.py	2008-10-08 21:56:12 +0000
@@ -0,0 +1,80 @@
+# Copyright (C) 2008 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+
+"""Tests of the 'bzr dump-btree' command."""
+
+from bzrlib import (
+    btree_index,
+    tests,
+    )
+from bzrlib.tests import (
+    http_server,
+    )
+
+
+class TestDumpBtree(tests.TestCaseWithTransport):
+
+    def create_sample_btree_index(self):
+        builder = btree_index.BTreeBuilder(
+            reference_lists=1, key_elements=2)
+        builder.add_node(('test', 'key1'), 'value', ((('ref', 'entry'),),))
+        builder.add_node(('test', 'key2'), 'value2', ((('ref', 'entry2'),),))
+        builder.add_node(('test2', 'key3'), 'value3', ((('ref', 'entry3'),),))
+        out_f = builder.finish()
+        try:
+            self.build_tree_contents([('test.btree', out_f.read())])
+        finally:
+            out_f.close()
+
+    def test_dump_btree_smoke(self):
+        self.create_sample_btree_index()
+        out, err = self.run_bzr('dump-btree test.btree')
+        self.assertEqualDiff(
+            "(('test', 'key1'), 'value', ((('ref', 'entry'),),))\n"
+            "(('test', 'key2'), 'value2', ((('ref', 'entry2'),),))\n"
+            "(('test2', 'key3'), 'value3', ((('ref', 'entry3'),),))\n",
+            out)
+
+    def test_dump_btree_http_smoke(self):
+        self.transport_readonly_server = http_server.HttpServer
+        self.create_sample_btree_index()
+        url = self.get_readonly_url('test.btree')
+        out, err = self.run_bzr(['dump-btree', url])
+        self.assertEqualDiff(
+            "(('test', 'key1'), 'value', ((('ref', 'entry'),),))\n"
+            "(('test', 'key2'), 'value2', ((('ref', 'entry2'),),))\n"
+            "(('test2', 'key3'), 'value3', ((('ref', 'entry3'),),))\n",
+            out)
+
+    def test_dump_btree_raw_smoke(self):
+        self.create_sample_btree_index()
+        out, err = self.run_bzr('dump-btree test.btree --raw')
+        self.assertEqualDiff(
+            'Root node:\n'
+            'B+Tree Graph Index 2\n'
+            'node_ref_lists=1\n'
+            'key_elements=2\n'
+            'len=3\n'
+            'row_lengths=1\n'
+            '\n'
+            'Page 0\n'
+            'type=leaf\n'
+            'test\0key1\0ref\0entry\0value\n'
+            'test\0key2\0ref\0entry2\0value2\n'
+            'test2\0key3\0ref\0entry3\0value3\n'
+            '\n',
+            out)
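`_dump_entries` needs the file size to build a `BTreeGraphIndex`. When the transport cannot `stat()`, it downloads the file once and gives the index an in-memory copy, as `_get_index_and_bytes` does. This is presumably the path that `test_dump_btree_http_smoke` exercises. The following minimal Python 3 sketch shows that fallback using a toy transport. `DictTransport`, `open_for_index` and the local `TransportNotPossible` class are hypothetical stand-ins, not bzrlib APIs.

```python
import io
from types import SimpleNamespace


class TransportNotPossible(Exception):
    """Stand-in for bzrlib.errors.TransportNotPossible."""


class DictTransport:
    """Hypothetical transport over a dict; can_stat=False mimics HTTP."""

    def __init__(self, files, can_stat=True):
        self._files = files
        self._can_stat = can_stat

    def stat(self, name):
        if not self._can_stat:
            raise TransportNotPossible("stat is not supported")
        return SimpleNamespace(st_size=len(self._files[name]))

    def get_bytes(self, name):
        return self._files[name]


def open_for_index(trans, name):
    """Return (size, preloaded_file_or_None) for an index file.

    If stat() works, only the size is needed and reads can happen lazily.
    Otherwise, download the whole file once and serve reads from memory.
    """
    try:
        st = trans.stat(name)
    except TransportNotPossible:
        data = trans.get_bytes(name)
        return len(data), io.BytesIO(data)
    return st.st_size, None
```

Fetching eagerly costs a full read of the file. As the original comment notes, however, that `get()` has to happen anyway.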




More information about the bazaar-commits mailing list