Rev 4469: (jam) Add VF._add_text and reduce memory overhead during commit (see bug #109114) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Mon Jun 22 18:11:25 BST 2009


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 4469
revision-id: pqm at pqm.ubuntu.com-20090622171120-fuxez9ylfqpxynqn
parent: pqm at pqm.ubuntu.com-20090622161421-cjjkok3a60d6uqho
parent: john at arbash-meinel.com-20090622155255-nl46rjtd2s0yy4sb
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Mon 2009-06-22 18:11:20 +0100
message:
  (jam) Add VF._add_text and reduce memory overhead during commit (see
  	bug #109114)
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
  bzrlib/tests/test_tuned_gzip.py test_tuned_gzip.py-20060418042056-c576dfc708984968
  bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
  bzrlib/tuned_gzip.py           tuned_gzip.py-20060407014720-5aadc518e928e8d2
  bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
    ------------------------------------------------------------
    revno: 4398.8.10
    revision-id: john at arbash-meinel.com-20090622155255-nl46rjtd2s0yy4sb
    parent: john at arbash-meinel.com-20090622154725-eidwkrs93j1qhmsf
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Mon 2009-06-22 10:52:55 -0500
    message:
      Add a NEWS entry documenting the relation to bug #109114
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 4398.8.9
    revision-id: john at arbash-meinel.com-20090622154725-eidwkrs93j1qhmsf
    parent: john at arbash-meinel.com-20090622153706-55n968lsh3v3dht7
    parent: pqm at pqm.ubuntu.com-20090622102620-6mdwon5k3pg1brgl
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Mon 2009-06-22 10:47:25 -0500
    message:
      Merge bzr.dev 4467 in prep for updating NEWS
    removed:
      doc/developers/performance-contributing.txt performancecontribut-20070621063612-ac4zhhagjzkr21qp-1
    added:
      bzrlib/_known_graph_py.py      _known_graph_py.py-20090610185421-vw8vfda2cgnckgb1-1
      bzrlib/_known_graph_pyx.pyx    _known_graph_pyx.pyx-20090610194911-yjk73td9hpjilas0-1
      bzrlib/help_topics/en/diverged-branches.txt divergedbranches.txt-20090608035534-mb4ry8so4hw238n0-1
      bzrlib/tests/per_repository_reference/test_get_rev_id_for_revno.py test_get_rev_id_for_-20090615064050-b6mq6co557towrxh-1
      bzrlib/tests/test__known_graph.py test__known_graph.py-20090610185421-vw8vfda2cgnckgb1-2
      bzrlib/util/bencode.py         bencode.py-20090609141817-jtvhqq6vyryjoeky-1
      doc/developers/bug-handling.txt bughandling.txt-20090615072247-mplym00zjq2n4s61-1
      doc/index.ru.txt               index.ru.txt-20080819091426-kfq61l02dhm9pplk-1
      doc/ru/                        ru-20080818031309-t3nyctvfbvfh4h2u-1
      doc/ru/mini-tutorial/          minitutorial-20080818031309-t3nyctvfbvfh4h2u-2
      doc/ru/mini-tutorial/index.txt index.txt-20080818031309-t3nyctvfbvfh4h2u-4
      doc/ru/quick-reference/        quickreference-20080818031309-t3nyctvfbvfh4h2u-3
      doc/ru/quick-reference/Makefile makefile-20080818031309-t3nyctvfbvfh4h2u-5
      doc/ru/quick-reference/quick-start-summary.pdf quickstartsummary.pd-20080818031309-t3nyctvfbvfh4h2u-6
      doc/ru/quick-reference/quick-start-summary.png quickstartsummary.pn-20080818031309-t3nyctvfbvfh4h2u-7
      doc/ru/quick-reference/quick-start-summary.svg quickstartsummary.sv-20080818031309-t3nyctvfbvfh4h2u-8
      doc/ru/tutorials/              docrututorials-20090427084615-toum0jo7qohd807p-1
      doc/ru/tutorials/centralized_workflow.txt centralized_workflow-20090531190825-ex3ums4bcuaf2r6k-1
      doc/ru/tutorials/tutorial.txt  tutorial.txt-20090602180629-wkp7wr27jl4i2zep-1
      doc/ru/tutorials/using_bazaar_with_launchpad.txt using_bazaar_with_la-20090427084917-b22ppqtdx7q4hapw-1
      doc/ru/user-guide/             docruuserguide-20090601191403-rcoy6nsre0vjiozm-1
      doc/ru/user-guide/branching_a_project.txt branching_a_project.-20090602104644-pjpwfx7xh2k5l0ba-1
      doc/ru/user-guide/core_concepts.txt core_concepts.txt-20090602104644-pjpwfx7xh2k5l0ba-2
      doc/ru/user-guide/images/      images-20090601201124-cruf3mmq5cfxeb1w-1
      doc/ru/user-guide/images/workflows_centralized.png workflows_centralize-20090601201124-cruf3mmq5cfxeb1w-3
      doc/ru/user-guide/images/workflows_centralized.svg workflows_centralize-20090601201124-cruf3mmq5cfxeb1w-4
      doc/ru/user-guide/images/workflows_gatekeeper.png workflows_gatekeeper-20090601201124-cruf3mmq5cfxeb1w-5
      doc/ru/user-guide/images/workflows_gatekeeper.svg workflows_gatekeeper-20090601201124-cruf3mmq5cfxeb1w-6
      doc/ru/user-guide/images/workflows_localcommit.png workflows_localcommi-20090601201124-cruf3mmq5cfxeb1w-7
      doc/ru/user-guide/images/workflows_localcommit.svg workflows_localcommi-20090601201124-cruf3mmq5cfxeb1w-8
      doc/ru/user-guide/images/workflows_peer.png workflows_peer.png-20090601201124-cruf3mmq5cfxeb1w-9
      doc/ru/user-guide/images/workflows_peer.svg workflows_peer.svg-20090601201124-cruf3mmq5cfxeb1w-10
      doc/ru/user-guide/images/workflows_pqm.png workflows_pqm.png-20090601201124-cruf3mmq5cfxeb1w-11
      doc/ru/user-guide/images/workflows_pqm.svg workflows_pqm.svg-20090601201124-cruf3mmq5cfxeb1w-12
      doc/ru/user-guide/images/workflows_shared.png workflows_shared.png-20090601201124-cruf3mmq5cfxeb1w-13
      doc/ru/user-guide/images/workflows_shared.svg workflows_shared.svg-20090601201124-cruf3mmq5cfxeb1w-14
      doc/ru/user-guide/images/workflows_single.png workflows_single.png-20090601201124-cruf3mmq5cfxeb1w-15
      doc/ru/user-guide/images/workflows_single.svg workflows_single.svg-20090601201124-cruf3mmq5cfxeb1w-16
      doc/ru/user-guide/index.txt    index.txt-20090601201124-cruf3mmq5cfxeb1w-2
      doc/ru/user-guide/introducing_bazaar.txt introducing_bazaar.t-20090601221109-6ehwbt2pvzgpftlu-1
      doc/ru/user-guide/specifying_revisions.txt specifying_revisions-20090602104644-pjpwfx7xh2k5l0ba-3
      doc/ru/user-guide/stacked.txt  stacked.txt-20090602104644-pjpwfx7xh2k5l0ba-4
      doc/ru/user-guide/using_checkouts.txt using_checkouts.txt-20090602104644-pjpwfx7xh2k5l0ba-5
      doc/ru/user-guide/zen.txt      zen.txt-20090602104644-pjpwfx7xh2k5l0ba-6
      tools/time_graph.py            time_graph.py-20090608210127-6g0epojxnqjo0f0s-1
    renamed:
      generate_docs.py => tools/generate_docs.py bzrinfogen.py-20051211224525-78e7c14f2c955e55
      tools/doc_generate => bzrlib/doc_generate bzrinfogen-20051211214907-45ff5f0af3a80b32
    modified:
      .bzrignore                     bzrignore-20050311232317-81f7b71efa2db11a
      Makefile                       Makefile-20050805140406-d96e3498bb61c5bb
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzr                            bzr.py-20050313053754-5485f144c7006fa6
      bzrlib/__init__.py             __init__.py-20050309040759-33e65acf91bbcd5d
      bzrlib/_dirstate_helpers_c.pyx dirstate_helpers.pyx-20070503201057-u425eni465q4idwn-3
      bzrlib/branch.py               branch.py-20050309040759-e4baf4e0d046576e
      bzrlib/bugtracker.py           bugtracker.py-20070410073305-vu1vu1qosjurg8kb-1
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/bzrdir.py               bzrdir.py-20060131065624-156dfea39c4387cb
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/chk_serializer.py       chk_serializer.py-20081002064345-2tofdfj2eqq01h4b-1
      bzrlib/commands.py             bzr.py-20050309040720-d10f4714595cf8c3
      bzrlib/commit.py               commit.py-20050511101309-79ec1a0168e0e825
      bzrlib/config.py               config.py-20051011043216-070c74f4e9e338e8
      bzrlib/dirstate.py             dirstate.py-20060728012006-d6mvoihjb3je9peu-1
      bzrlib/doc_generate/__init__.py __init__.py-20051211214907-df9e0e6b493553f1
      bzrlib/doc_generate/autodoc_bash_completion.py big_bash_completion.py-20051211223059-00ecfbfcc8056b78
      bzrlib/doc_generate/autodoc_man.py bzrman.py-20050601153041-0ff7f74de456d15e
      bzrlib/doc_generate/autodoc_rstx.py autodoc_rstx.py-20060420024836-3e0d4a526452193c
      bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
      bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
      bzrlib/filters/__init__.py     __init__.py-20080416080515-mkxl29amuwrf6uir-2
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/help.py                 help.py-20050505025907-4dd7a6d63912f894
      bzrlib/help_topics/__init__.py help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
      bzrlib/help_topics/en/configuration.txt configuration.txt-20060314161707-868350809502af01
      bzrlib/help_topics/en/eol.txt  eol.txt-20090327060429-todzdjmqt3bpv5r8-3
      bzrlib/hooks.py                hooks.py-20070325015548-ix4np2q0kd8452au-1
      bzrlib/index.py                index.py-20070712131115-lolkarso50vjr64s-1
      bzrlib/inventory.py            inventory.py-20050309040759-6648b84ca2005b37
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/lock.py                 lock.py-20050527050856-ec090bb51bc03349
      bzrlib/mail_client.py          mail_client.py-20070809192806-vuxt3t19srtpjpdn-1
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
      bzrlib/pack.py                 container.py-20070607160755-tr8zc26q18rn0jnb-1
      bzrlib/progress.py             progress.py-20050610070202-df9faaab791964c0
      bzrlib/push.py                 push.py-20080606021927-5fe39050e8xne9un-1
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
      bzrlib/repofmt/knitrepo.py     knitrepo.py-20070206081537-pyy4a00xdas0j4pf-1
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/revision.py             revision.py-20050309040759-e77802c08f3999d5
      bzrlib/revisiontree.py         revisiontree.py-20060724012533-bg8xyryhxd0o0i0h-1
      bzrlib/serializer.py           serializer.py-20090402143702-wmkh9cfjhwpju0qi-1
      bzrlib/shellcomplete.py        shellcomplete.py-20050822153127-3be115ff5e70fc39
      bzrlib/smart/bzrdir.py         bzrdir.py-20061122024551-ol0l0o0oofsu9b3t-1
      bzrlib/smart/medium.py         medium.py-20061103051856-rgu2huy59fkz902q-1
      bzrlib/smart/repository.py     repository.py-20061128022038-vr5wy5bubyb8xttk-1
      bzrlib/smart/request.py        request.py-20061108095550-gunadhxmzkdjfeek-1
      bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
      bzrlib/tests/blackbox/test_branch.py test_branch.py-20060524161337-noms9gmcwqqrfi8y-1
      bzrlib/tests/blackbox/test_diff.py test_diff.py-20060110203741-aa99ac93e633d971
      bzrlib/tests/blackbox/test_init.py test_init.py-20060309032856-a292116204d86eb7
      bzrlib/tests/blackbox/test_ls.py test_ls.py-20060712232047-0jraqpecwngee12y-1
      bzrlib/tests/blackbox/test_pull.py test_pull.py-20051201144907-64959364f629947f
      bzrlib/tests/blackbox/test_push.py test_push.py-20060329002750-929af230d5d22663
      bzrlib/tests/blackbox/test_split.py test_split.py-20061008023421-qy0vdpzysh5rriu8-1
      bzrlib/tests/blackbox/test_status.py teststatus.py-20050712014354-508855eb9f29f7dc
      bzrlib/tests/branch_implementations/test_dotted_revno_to_revision_id.py test_dotted_revno_to-20090121014844-6x7d9jtri5sspg1o-1
      bzrlib/tests/branch_implementations/test_push.py test_push.py-20070130153159-fhfap8uoifevg30j-1
      bzrlib/tests/branch_implementations/test_stacking.py test_stacking.py-20080214020755-msjlkb7urobwly0f-1
      bzrlib/tests/bzrdir_implementations/test_bzrdir.py test_bzrdir.py-20060131065642-0ebeca5e30e30866
      bzrlib/tests/per_repository/test_add_inventory_by_delta.py test_add_inventory_d-20081013002626-rut81igtlqb4590z-1
      bzrlib/tests/per_repository/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
      bzrlib/tests/per_repository_reference/__init__.py __init__.py-20080220025549-nnm2s80it1lvcwnc-2
      bzrlib/tests/test_bzrdir.py    test_bzrdir.py-20060131065654-deba40eef51cf220
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
      bzrlib/tests/test_commands.py  test_command.py-20051019190109-3b17be0f52eaa7a8
      bzrlib/tests/test_commit_merge.py test_commit_merge.py-20050920084723-819eeeff77907bc5
      bzrlib/tests/test_eol_filters.py test_eol_filters.py-20090327060429-todzdjmqt3bpv5r8-2
      bzrlib/tests/test_filters.py   test_filters.py-20080417120614-tc3zok0vvvprsc99-1
      bzrlib/tests/test_generate_docs.py test_generate_docs.p-20070102123151-cqctnsrlqwmiljd7-1
      bzrlib/tests/test_graph.py     test_graph_walker.py-20070525030405-enq4r60hhi9xrujc-1
      bzrlib/tests/test_help.py      test_help.py-20070419045354-6q6rq15j9e2n5fna-1
      bzrlib/tests/test_inv.py       testinv.py-20050722220913-1dc326138d1a5892
      bzrlib/tests/test_knit.py      test_knit.py-20051212171302-95d4c00dd5f11f2b
      bzrlib/tests/test_mail_client.py test_mail_client.py-20070809192806-vuxt3t19srtpjpdn-2
      bzrlib/tests/test_options.py   testoptions.py-20051014093702-96457cfc86319a8f
      bzrlib/tests/test_pack.py      test_container.py-20070607160755-tr8zc26q18rn0jnb-2
      bzrlib/tests/test_pack_repository.py test_pack_repository-20080801043947-eaw0e6h2gu75kwmy-1
      bzrlib/tests/test_plugins.py   plugins.py-20050622075746-32002b55e5e943e9
      bzrlib/tests/test_progress.py  test_progress.py-20060308160359-978c397bc79b7fda
      bzrlib/tests/test_remote.py    test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
      bzrlib/tests/test_smart.py     test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
      bzrlib/tests/test_ui.py        test_ui.py-20051130162854-458e667a7414af09
      bzrlib/tests/tree_implementations/test_list_files.py test_list_files.py-20070216005501-cjh6fzprbe9lbs2t-1
      bzrlib/tests/workingtree_implementations/test_content_filters.py test_content_filters-20080424071441-8navsrmrfdxpn90a-1
      bzrlib/tests/workingtree_implementations/test_eol_conversion.py test_eol_conversion.-20090327060429-todzdjmqt3bpv5r8-4
      bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
      bzrlib/transport/sftp.py       sftp.py-20051019050329-ab48ce71b7e32dfe
      bzrlib/ui/text.py              text.py-20051130153916-2e438cffc8afc478
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
      bzrlib/weave.py                knit.py-20050627021749-759c29984154256b
      bzrlib/win32utils.py           win32console.py-20051021033308-123c6c929d04973d
      bzrlib/workingtree.py          workingtree.py-20050511021032-29b6ec0a681e02e3
      bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
      bzrlib/xml4.py                 xml4.py-20050916091259-db5ab55e7e6ca324
      bzrlib/xml8.py                 xml5.py-20050907032657-aac8f960815b66b1
      bzrlib/xml_serializer.py       xml.py-20050309040759-57d51586fdec365d
      doc/developers/cycle.txt       cycle.txt-20081017031739-rw24r0cywm2ok3xu-1
      doc/developers/index.txt       index.txt-20070508041241-qznziunkg0nffhiw-1
      doc/developers/performance-roadmap.txt performanceroadmap.t-20070507174912-mwv3xv517cs4sisd-2
      doc/developers/planned-change-integration.txt plannedchangeintegra-20070619004702-i1b3ccamjtfaoq6w-1
      doc/developers/releasing.txt   releasing.txt-20080502015919-fnrcav8fwy8ccibu-1
      doc/en/developer-guide/HACKING.txt HACKING-20050805200004-2a5dc975d870f78c
      doc/en/quick-reference/Makefile makefile-20070813143223-5i7bgw7w8s7l3ae2-2
      doc/en/quick-reference/quick-start-summary.png quickstartsummary.pn-20071203142852-hsiybkmh37q5owwe-1
      doc/en/tutorials/using_bazaar_with_launchpad.txt using_bazaar_with_lp-20071211073140-7msh8uf9a9h4y9hb-1
      doc/en/user-guide/images/workflows_centralized.png workflows_centralize-20071114035000-q36a9h57ps06uvnl-8
      doc/en/user-guide/images/workflows_gatekeeper.png workflows_gatekeeper-20071114035000-q36a9h57ps06uvnl-9
      doc/en/user-guide/images/workflows_localcommit.png workflows_localcommi-20071114035000-q36a9h57ps06uvnl-10
      doc/en/user-guide/images/workflows_peer.png workflows_peer.png-20071114035000-q36a9h57ps06uvnl-11
      doc/en/user-guide/images/workflows_pqm.png workflows_pqm.png-20071114035000-q36a9h57ps06uvnl-12
      doc/en/user-guide/images/workflows_shared.png workflows_shared.png-20071114035000-q36a9h57ps06uvnl-13
      doc/en/user-guide/images/workflows_single.png workflows_single.png-20071114035000-q36a9h57ps06uvnl-14
      doc/en/user-guide/introducing_bazaar.txt introducing_bazaar.t-20071114035000-q36a9h57ps06uvnl-5
      doc/index.txt                  index.txt-20070813101924-07gd9i9d2jt124bf-1
      setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
      tools/win32/build_release.py   build_release.py-20081105204355-2ghh5cv01v1x4rzz-1
      tools/generate_docs.py         bzrinfogen.py-20051211224525-78e7c14f2c955e55
    ------------------------------------------------------------
    revno: 4398.8.8
    revision-id: john at arbash-meinel.com-20090622153706-55n968lsh3v3dht7
    parent: john at arbash-meinel.com-20090605140819-s7mbsn5e4ifr9xub
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Mon 2009-06-22 10:37:06 -0500
    message:
      Respond to Andrew's review comments.
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
    ------------------------------------------------------------
    revno: 4398.8.7
    revision-id: john at arbash-meinel.com-20090605140819-s7mbsn5e4ifr9xub
    parent: john at arbash-meinel.com-20090604210951-5mxlt1h8p4xdh6pl
    parent: pqm at pqm.ubuntu.com-20090605081039-abvojdsxjbg5i4ff
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Fri 2009-06-05 09:08:19 -0500
    message:
      Merge bzr.dev 4413, bringing in the no-delta-index code.
    removed:
      bzrlib/util/tests/test_bencode.py test_bencode.py-20070713042202-qjw8rppxaz7ky6i6-1
    added:
      bzrlib/_bencode_pyx.h          _bencode_pyx.h-20090604155331-53bg7d0udmrvz44n-1
      bzrlib/_bencode_pyx.pyx        bencode.pyx-20070806220735-j75g4ebfnado2i60-3
      bzrlib/benchmarks/bench_tags.py bench_tags.py-20070812104202-0q5i0mqkt72hubof-1
      bzrlib/bencode.py              bencode.py-20070806220735-j75g4ebfnado2i60-2
      bzrlib/tests/test_bencode.py   test_bencode.py-20070806225234-s51cnnkh6raytxti-1
      bzrlib/tests/test_chk_serializer.py test_chk_serializer.-20090515105921-urte9wnhknlj5dyp-1
    renamed:
      bzrlib/util/bencode.py => bzrlib/util/_bencode_py.py bencode.py-20070220044742-sltr28q21w2wzlxi-1
    modified:
      .bzrignore                     bzrignore-20050311232317-81f7b71efa2db11a
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/_groupcompress_pyx.pyx  _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      bzrlib/benchmarks/__init__.py  __init__.py-20060516064526-eb0d37c78e86065d
      bzrlib/branch.py               branch.py-20050309040759-e4baf4e0d046576e
      bzrlib/bundle/serializer/v4.py v10.py-20070611062757-5ggj7k18s9dej0fr-1
      bzrlib/bzrdir.py               bzrdir.py-20060131065624-156dfea39c4387cb
      bzrlib/cache_utf8.py           cache_utf8.py-20060810004311-x4cph46la06h9azm-1
      bzrlib/chk_serializer.py       chk_serializer.py-20081002064345-2tofdfj2eqq01h4b-1
      bzrlib/diff.py                 diff.py-20050309040759-26944fbbf2ebbf36
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/inventory.py            inventory.py-20050309040759-6648b84ca2005b37
      bzrlib/mail_client.py          mail_client.py-20070809192806-vuxt3t19srtpjpdn-1
      bzrlib/multiparent.py          __init__.py-20070410133617-n1jdhcc1n1mibarp-1
      bzrlib/osutils.py              osutils.py-20050309040759-eeaff12fbf77ac86
      bzrlib/python-compat.h         pythoncompat.h-20080924041409-9kvi0fgtuuqp743j-1
      bzrlib/reconcile.py            reweave_inventory.py-20051108164726-1e5e0934febac06e
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repofmt/groupcompress_repo.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/serializer.py           serializer.py-20090402143702-wmkh9cfjhwpju0qi-1
      bzrlib/shelf.py                prepare_shelf.py-20081005181341-n74qe6gu1e65ad4v-1
      bzrlib/smart/protocol.py       protocol.py-20061108035435-ot0lstk2590yqhzr-1
      bzrlib/smart/repository.py     repository.py-20061128022038-vr5wy5bubyb8xttk-1
      bzrlib/tag.py                  tag.py-20070212110532-91cw79inah2cfozx-1
      bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
      bzrlib/tests/blackbox/test_branch.py test_branch.py-20060524161337-noms9gmcwqqrfi8y-1
      bzrlib/tests/blackbox/test_export.py test_export.py-20051229024010-e6c26658e460fb1c
      bzrlib/tests/branch_implementations/test_check.py test_check.py-20080429151303-1sbfclxhddpz0tnj-1
      bzrlib/tests/branch_implementations/test_reconcile.py test_reconcile.py-20080429161555-qlmccuyeyt6pvho7-1
      bzrlib/tests/branch_implementations/test_sprout.py test_sprout.py-20070521151739-b8t8p7axw1h966ws-1
      bzrlib/tests/inventory_implementations/basics.py basics.py-20070903044446-kdjwbiu1p1zi9phs-1
      bzrlib/tests/per_repository/test_iter_reverse_revision_history.py test_iter_reverse_re-20070217015036-spu7j5ggch7pbpyd-1
      bzrlib/tests/per_repository/test_reconcile.py test_reconcile.py-20060223022332-572ef70a3288e369
      bzrlib/tests/per_repository/test_revision.py testrevprops.py-20051013073044-92bc3c68302ce1bf
      bzrlib/tests/per_repository_reference/test_initialize.py test_initialize.py-20090527083941-4rz2urcthjet5e2i-1
      bzrlib/tests/test__groupcompress.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
      bzrlib/tests/test_mail_client.py test_mail_client.py-20070809192806-vuxt3t19srtpjpdn-2
      bzrlib/tests/test_osutils.py   test_osutils.py-20051201224856-e48ee24c12182989
      bzrlib/tests/test_remote.py    test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
      bzrlib/tests/test_serializer.py test_serializer.py-20090403213933-q6x117y8t9fbeyoz-1
      bzrlib/tests/test_smart.py     test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
      bzrlib/tests/test_source.py    test_source.py-20051207061333-a58dea6abecc030d
      bzrlib/tests/test_transform.py test_transaction.py-20060105172520-b3ffb3946550e6c4
      bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
      setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
      bzrlib/util/_bencode_py.py     bencode.py-20070220044742-sltr28q21w2wzlxi-1
    ------------------------------------------------------------
    revno: 4398.8.6
    revision-id: john at arbash-meinel.com-20090604210951-5mxlt1h8p4xdh6pl
    parent: john at arbash-meinel.com-20090602203242-fc65n34rxr7169dh
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Thu 2009-06-04 16:09:51 -0500
    message:
      Switch the api from VF.add_text to VF._add_text and trim some extra 'features'.
      
      Commit won't be using parent_texts, left_matching_blocks or check_content.
      And something like fast-import will be tuned for GC repositories anyway,
      where parent_texts isn't needed either.
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
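
    A minimal sketch (not bzrlib source) of what the trimmed call looks like from
    the commit side, following the repository.py and versionedfile.py hunks further
    down; the helper names here are hypothetical, the signatures are from the diff:

        def add_file_text_old(texts, key, parent_keys, file_obj, nostore_sha, random_id):
            # Old path: materialize a list of lines, then call add_lines() with
            # knobs commit never used (parent_texts, left_matching_blocks,
            # check_content).
            lines = file_obj.readlines()
            return texts.add_lines(key, parent_keys, lines,
                                    nostore_sha=nostore_sha, random_id=random_id,
                                    check_content=False)[0:2]

        def add_file_text_new(texts, key, parent_keys, file_obj, nostore_sha, random_id):
            # New path: read one string and hand it to the private _add_text(),
            # which only keeps the nostore_sha and random_id arguments.
            text = file_obj.read()
            return texts._add_text(key, parent_keys, text,
                                    nostore_sha=nostore_sha, random_id=random_id)[0:2]
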
    ------------------------------------------------------------
    revno: 4398.8.5
    revision-id: john at arbash-meinel.com-20090602203242-fc65n34rxr7169dh
    parent: john at arbash-meinel.com-20090602200816-to10sqhv812zxn5p
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Tue 2009-06-02 15:32:42 -0500
    message:
      Fix a few more cases where we were adding a list rather than an empty string.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
    ------------------------------------------------------------
    revno: 4398.8.4
    revision-id: john at arbash-meinel.com-20090602200816-to10sqhv812zxn5p
    parent: john at arbash-meinel.com-20090602195917-j9wym7m75ed9tnk8
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Tue 2009-06-02 15:08:16 -0500
    message:
      Implement add_text for GroupCompressVersionedFiles.
      The main change is to use a FulltextContentFactory instead of a ChunkedContentFactory.
      Should ultimately do much better for memory consumption etc. of the initial commit.
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
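
    A hedged sketch of the insertion path this revision describes, using names from
    the groupcompress.py hunk below; `vf` stands in for a GroupCompressVersionedFiles
    instance:

        from bzrlib.versionedfile import FulltextContentFactory

        def gc_add_text(vf, key, parents, text, nostore_sha=None, random_id=False):
            # Wrap the whole string in a single FulltextContentFactory (no
            # chunking) and reuse the ordinary record-stream insertion machinery.
            record = FulltextContentFactory(key, parents, None, text)
            sha1 = list(vf._insert_record_stream([record], random_id=random_id,
                                                 nostore_sha=nostore_sha))[0]
            return sha1, len(text), None
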
    ------------------------------------------------------------
    revno: 4398.8.3
    revision-id: john at arbash-meinel.com-20090602195917-j9wym7m75ed9tnk8
    parent: john at arbash-meinel.com-20090602195624-utljsyz0qgmq63lg
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Tue 2009-06-02 14:59:17 -0500
    message:
      Rewrite some of the internals of KnitVersionedFiles._add()
      
      Avoid creating lots of copies of the same data using a direct implementation
      of Knit.add_text.
      We still always create 1 list of the lines and 1 fulltext string, but at least
      now we should avoid having more than those 2 copies.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
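
    A minimal, standard-library-only sketch of the copy-counting idea behind this
    change (not the knit code itself): both entry points feed one internal routine,
    which derives whichever form it was not given, so at most one line list and one
    fulltext string ever exist at a time:

        from hashlib import sha1

        def _add(key, lines, line_bytes):
            # Internal path: exactly one joined string, and a line list that is
            # only built if the caller did not already have one.
            digest = sha1(line_bytes).hexdigest()
            if lines is None:
                lines = line_bytes.splitlines(True)
            return digest, lines

        def add_lines(key, lines):
            # List-based entry point: join once, pass both forms down.
            return _add(key, lines, ''.join(lines))

        def add_text(key, text):
            # String-based entry point: no join, no extra copy.
            return _add(key, None, text)
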
    ------------------------------------------------------------
    revno: 4398.8.2
    revision-id: john at arbash-meinel.com-20090602195624-utljsyz0qgmq63lg
    parent: john at arbash-meinel.com-20090602185918-86l9eljnn8z2iljk
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Tue 2009-06-02 14:56:24 -0500
    message:
      Add a chunks_to_gzip function.
      This allows the _record_to_data code to build up a list of chunks,
      rather than requiring a single string.
      It should be ~ the same performance when using a single string, since
      we are only adding a for() loop over the chunks and an if check.
      We could possibly just remove the if check and not worry about adding
      some empty strings in there.
    modified:
      bzrlib/tests/test_tuned_gzip.py test_tuned_gzip.py-20060418042056-c576dfc708984968
      bzrlib/tuned_gzip.py           tuned_gzip.py-20060407014720-5aadc518e928e8d2
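
    A quick usage sketch mirroring the new assertToGzip test further down (it
    assumes a bzrlib with this patch applied is importable):

        from bzrlib import tuned_gzip

        chunks = ['some\n', 'strings\n', 'to\n', 'process\n']
        # The chunked form produces the same gzip stream as joining first;
        # bytes_to_gzip() itself is now just chunks_to_gzip([bytes]).
        assert tuned_gzip.chunks_to_gzip(chunks) == \
               tuned_gzip.bytes_to_gzip(''.join(chunks))
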
    ------------------------------------------------------------
    revno: 4398.8.1
    revision-id: john at arbash-meinel.com-20090602185918-86l9eljnn8z2iljk
    parent: pqm at pqm.ubuntu.com-20090602153906-1q6bresxw669b34i
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.16-commit-fulltext
    timestamp: Tue 2009-06-02 13:59:18 -0500
    message:
      Add a VersionedFile.add_text() api.
      
      Similar to VF.add_lines() except it takes a string for the content, rather
      than a list of lines.
      
      For now, it just thunks over to VF.add_lines(), but it will be special
      cased in the future.
    modified:
      bzrlib/groupcompress.py        groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
      bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
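
    A minimal usage sketch of the new entry point as it lands in the diffs below
    (renamed to _add_text in 4398.8.6 above); `vf` stands for any VersionedFiles
    implementation, and the keys are arbitrary:

        # Same stored content, different input shape; the default implementation
        # simply splits the string and thunks over to add_lines().
        vf.add_lines(('r0',), [], ['a\n', 'b\n'])   # content as a list of lines
        vf._add_text(('r1',), [], 'a\nb\n')         # content as a single string
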
=== modified file 'NEWS'
--- a/NEWS	2009-06-22 15:13:45 +0000
+++ b/NEWS	2009-06-22 17:11:20 +0000
@@ -42,6 +42,11 @@
   ``BZR_PROGRESS_BAR`` is set to ``none``.
   (Martin Pool, #339385)
 
+* Reduced memory consumption during ``bzr commit`` of large files. For
+  pre-2a formats, memory use should be down to ~3x the size of a file, and
+  for ``--2a`` formats it should be down to exactly 2x the size. Related to
+  bug #109114. (John Arbash Meinel)
+
 * Unshelve works correctly when multiple zero-length files are present on
   the shelf. (Aaron Bentley, #363444)
 
@@ -66,6 +71,12 @@
   properly fetch the minimum number of texts for non-smart fetching.
   (John Arbash Meinel)
 
+* ``VersionedFiles._add_text`` is a new API that lets us insert text into
+  the repository as a single string, rather than a list of lines. This can
+  reduce memory overhead and improve performance when committing large files.
+  (Currently a private API, used only by commit.) (John Arbash Meinel)
+  
+
 
 Improvements
 ************

=== modified file 'bzrlib/groupcompress.py'
--- a/bzrlib/groupcompress.py	2009-06-11 20:34:56 +0000
+++ b/bzrlib/groupcompress.py	2009-06-22 15:47:25 +0000
@@ -1008,6 +1008,24 @@
                                                nostore_sha=nostore_sha))[0]
         return sha1, length, None
 
+    def _add_text(self, key, parents, text, nostore_sha=None, random_id=False):
+        """See VersionedFiles.add_text()."""
+        self._index._check_write_ok()
+        self._check_add(key, None, random_id, check_content=False)
+        if text.__class__ is not str:
+            raise errors.BzrBadParameterUnicode("text")
+        if parents is None:
+            # The caller might pass None if there is no graph data, but kndx
+            # indexes can't directly store that, so we give them
+            # an empty tuple instead.
+            parents = ()
+        # double handling for now. Make it work until then.
+        length = len(text)
+        record = FulltextContentFactory(key, parents, None, text)
+        sha1 = list(self._insert_record_stream([record], random_id=random_id,
+                                               nostore_sha=nostore_sha))[0]
+        return sha1, length, None
+
     def add_fallback_versioned_files(self, a_versioned_files):
         """Add a source of texts for texts not present in this knit.
 
@@ -1521,8 +1539,6 @@
 
         :return: An iterator over (line, key).
         """
-        if pb is None:
-            pb = progress.DummyProgress()
         keys = set(keys)
         total = len(keys)
         # we don't care about inclusions, the caller cares.
@@ -1532,13 +1548,15 @@
             'unordered', True)):
             # XXX: todo - optimise to use less than full texts.
             key = record.key
-            pb.update('Walking content', key_idx, total)
+            if pb is not None:
+                pb.update('Walking content', key_idx, total)
             if record.storage_kind == 'absent':
                 raise errors.RevisionNotPresent(key, self)
             lines = osutils.split_lines(record.get_bytes_as('fulltext'))
             for line in lines:
                 yield line, key
-        pb.update('Walking content', total, total)
+        if pb is not None:
+            pb.update('Walking content', total, total)
 
     def keys(self):
         """See VersionedFiles.keys."""
@@ -1605,7 +1623,7 @@
                 if refs:
                     for ref in refs:
                         if ref:
-                            raise KnitCorrupt(self,
+                            raise errors.KnitCorrupt(self,
                                 "attempt to add node with parents "
                                 "in parentless index.")
                     refs = ()
@@ -1668,7 +1686,7 @@
         if check_present:
             missing_keys = keys.difference(found_keys)
             if missing_keys:
-                raise RevisionNotPresent(missing_keys.pop(), self)
+                raise errors.RevisionNotPresent(missing_keys.pop(), self)
 
     def get_parent_map(self, keys):
         """Get a map of the parents of keys.

=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2009-06-17 21:00:32 +0000
+++ b/bzrlib/knit.py	2009-06-22 15:47:25 +0000
@@ -53,7 +53,7 @@
 
 
 from cStringIO import StringIO
-from itertools import izip, chain
+from itertools import izip
 import operator
 import os
 import sys
@@ -686,7 +686,7 @@
         content = knit._get_content(key)
         # adjust for the fact that serialised annotations are only key suffixes
         # for this factory.
-        if type(key) == tuple:
+        if type(key) is tuple:
             prefix = key[:-1]
             origins = content.annotate()
             result = []
@@ -909,18 +909,45 @@
             # indexes can't directly store that, so we give them
             # an empty tuple instead.
             parents = ()
+        line_bytes = ''.join(lines)
         return self._add(key, lines, parents,
-            parent_texts, left_matching_blocks, nostore_sha, random_id)
+            parent_texts, left_matching_blocks, nostore_sha, random_id,
+            line_bytes=line_bytes)
+
+    def _add_text(self, key, parents, text, nostore_sha=None, random_id=False):
+        """See VersionedFiles.add_text()."""
+        self._index._check_write_ok()
+        self._check_add(key, None, random_id, check_content=False)
+        if text.__class__ is not str:
+            raise errors.BzrBadParameterUnicode("text")
+        if parents is None:
+            # The caller might pass None if there is no graph data, but kndx
+            # indexes can't directly store that, so we give them
+            # an empty tuple instead.
+            parents = ()
+        return self._add(key, None, parents,
+            None, None, nostore_sha, random_id,
+            line_bytes=text)
 
     def _add(self, key, lines, parents, parent_texts,
-        left_matching_blocks, nostore_sha, random_id):
+        left_matching_blocks, nostore_sha, random_id,
+        line_bytes):
         """Add a set of lines on top of version specified by parents.
 
         Any versions not present will be converted into ghosts.
+
+        :param lines: A list of strings where each one is a single line (has a
+            single newline at the end of the string). This is now optional
+            (callers can pass None). It is left in its location for backwards
+            compatibility, but ''.join(lines) must equal line_bytes.
+        :param line_bytes: A single string containing the content
+
+        We pass both lines and line_bytes because different routes bring the
+        values to this function. And for memory efficiency, we don't want to
+        have to split/join on-demand.
         """
         # first thing, if the content is something we don't need to store, find
         # that out.
-        line_bytes = ''.join(lines)
         digest = sha_string(line_bytes)
         if nostore_sha == digest:
             raise errors.ExistingContent
@@ -947,25 +974,34 @@
 
         text_length = len(line_bytes)
         options = []
-        if lines:
-            if lines[-1][-1] != '\n':
-                # copy the contents of lines.
+        no_eol = False
+        # Note: line_bytes is not modified to add a newline, that is tracked
+        #       via the no_eol flag. 'lines' *is* modified, because that is the
+        #       general form needed by the Content code.
+        if line_bytes and line_bytes[-1] != '\n':
+            options.append('no-eol')
+            no_eol = True
+            # Copy the existing list, or create a new one
+            if lines is None:
+                lines = osutils.split_lines(line_bytes)
+            else:
                 lines = lines[:]
-                options.append('no-eol')
-                lines[-1] = lines[-1] + '\n'
-                line_bytes += '\n'
+            # Replace the last line with one that ends in a final newline
+            lines[-1] = lines[-1] + '\n'
+        if lines is None:
+            lines = osutils.split_lines(line_bytes)
 
         for element in key[:-1]:
-            if type(element) != str:
+            if type(element) is not str:
                 raise TypeError("key contains non-strings: %r" % (key,))
         if key[-1] is None:
             key = key[:-1] + ('sha1:' + digest,)
-        elif type(key[-1]) != str:
+        elif type(key[-1]) is not str:
                 raise TypeError("key contains non-strings: %r" % (key,))
         # Knit hunks are still last-element only
         version_id = key[-1]
         content = self._factory.make(lines, version_id)
-        if 'no-eol' in options:
+        if no_eol:
             # Hint to the content object that its text() call should strip the
             # EOL.
             content._should_strip_eol = True
@@ -986,8 +1022,11 @@
             if self._factory.__class__ is KnitPlainFactory:
                 # Use the already joined bytes saving iteration time in
                 # _record_to_data.
+                dense_lines = [line_bytes]
+                if no_eol:
+                    dense_lines.append('\n')
                 size, bytes = self._record_to_data(key, digest,
-                    lines, [line_bytes])
+                    lines, dense_lines)
             else:
                 # get mixed annotation + content and feed it into the
                 # serialiser.
@@ -1920,21 +1959,16 @@
             function spends less time resizing the final string.
         :return: (len, a StringIO instance with the raw data ready to read.)
         """
-        # Note: using a string copy here increases memory pressure with e.g.
-        # ISO's, but it is about 3 seconds faster on a 1.2Ghz intel machine
-        # when doing the initial commit of a mozilla tree. RBC 20070921
-        bytes = ''.join(chain(
-            ["version %s %d %s\n" % (key[-1],
-                                     len(lines),
-                                     digest)],
-            dense_lines or lines,
-            ["end %s\n" % key[-1]]))
-        if type(bytes) != str:
-            raise AssertionError(
-                'data must be plain bytes was %s' % type(bytes))
+        chunks = ["version %s %d %s\n" % (key[-1], len(lines), digest)]
+        chunks.extend(dense_lines or lines)
+        chunks.append("end %s\n" % key[-1])
+        for chunk in chunks:
+            if type(chunk) is not str:
+                raise AssertionError(
+                    'data must be plain bytes was %s' % type(chunk))
         if lines and lines[-1][-1] != '\n':
             raise ValueError('corrupt lines value %r' % lines)
-        compressed_bytes = tuned_gzip.bytes_to_gzip(bytes)
+        compressed_bytes = tuned_gzip.chunks_to_gzip(chunks)
         return len(compressed_bytes), compressed_bytes
 
     def _split_header(self, line):
@@ -2375,7 +2409,7 @@
                     line = "\n%s %s %s %s %s :" % (
                         key[-1], ','.join(options), pos, size,
                         self._dictionary_compress(parents))
-                    if type(line) != str:
+                    if type(line) is not str:
                         raise AssertionError(
                             'data must be utf8 was %s' % type(line))
                     lines.append(line)
@@ -2570,7 +2604,7 @@
         result = set()
         # Identify all key prefixes.
         # XXX: A bit hacky, needs polish.
-        if type(self._mapper) == ConstantMapper:
+        if type(self._mapper) is ConstantMapper:
             prefixes = [()]
         else:
             relpaths = set()
@@ -2608,7 +2642,7 @@
                     del self._history
                 except NoSuchFile:
                     self._kndx_cache[prefix] = ({}, [])
-                    if type(self._mapper) == ConstantMapper:
+                    if type(self._mapper) is ConstantMapper:
                         # preserve behaviour for revisions.kndx etc.
                         self._init_index(path)
                     del self._cache
@@ -3094,7 +3128,7 @@
             opaque index memo. For _KnitKeyAccess the memo is (key, pos,
             length), where the key is the record key.
         """
-        if type(raw_data) != str:
+        if type(raw_data) is not str:
             raise AssertionError(
                 'data must be plain bytes was %s' % type(raw_data))
         result = []
@@ -3183,7 +3217,7 @@
             length), where the index field is the write_index object supplied
             to the PackAccess object.
         """
-        if type(raw_data) != str:
+        if type(raw_data) is not str:
             raise AssertionError(
                 'data must be plain bytes was %s' % type(raw_data))
         result = []

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2009-06-17 17:57:15 +0000
+++ b/bzrlib/repository.py	2009-06-22 15:47:25 +0000
@@ -494,12 +494,12 @@
             ie.executable = content_summary[2]
             file_obj, stat_value = tree.get_file_with_stat(ie.file_id, path)
             try:
-                lines = file_obj.readlines()
+                text = file_obj.read()
             finally:
                 file_obj.close()
             try:
                 ie.text_sha1, ie.text_size = self._add_text_to_weave(
-                    ie.file_id, lines, heads, nostore_sha)
+                    ie.file_id, text, heads, nostore_sha)
                 # Let the caller know we generated a stat fingerprint.
                 fingerprint = (ie.text_sha1, stat_value)
             except errors.ExistingContent:
@@ -517,8 +517,7 @@
                 # carry over:
                 ie.revision = parent_entry.revision
                 return self._get_delta(ie, basis_inv, path), False, None
-            lines = []
-            self._add_text_to_weave(ie.file_id, lines, heads, None)
+            self._add_text_to_weave(ie.file_id, '', heads, None)
         elif kind == 'symlink':
             current_link_target = content_summary[3]
             if not store:
@@ -532,8 +531,7 @@
                 ie.symlink_target = parent_entry.symlink_target
                 return self._get_delta(ie, basis_inv, path), False, None
             ie.symlink_target = current_link_target
-            lines = []
-            self._add_text_to_weave(ie.file_id, lines, heads, None)
+            self._add_text_to_weave(ie.file_id, '', heads, None)
         elif kind == 'tree-reference':
             if not store:
                 if content_summary[3] != parent_entry.reference_revision:
@@ -544,8 +542,7 @@
                 ie.revision = parent_entry.revision
                 return self._get_delta(ie, basis_inv, path), False, None
             ie.reference_revision = content_summary[3]
-            lines = []
-            self._add_text_to_weave(ie.file_id, lines, heads, None)
+            self._add_text_to_weave(ie.file_id, '', heads, None)
         else:
             raise NotImplementedError('unknown kind')
         ie.revision = self._new_revision_id
@@ -745,7 +742,7 @@
                         entry.executable = True
                     else:
                         entry.executable = False
-                    if (carry_over_possible and 
+                    if (carry_over_possible and
                         parent_entry.executable == entry.executable):
                             # Check the file length, content hash after reading
                             # the file.
@@ -754,12 +751,12 @@
                         nostore_sha = None
                     file_obj, stat_value = tree.get_file_with_stat(file_id, change[1][1])
                     try:
-                        lines = file_obj.readlines()
+                        text = file_obj.read()
                     finally:
                         file_obj.close()
                     try:
                         entry.text_sha1, entry.text_size = self._add_text_to_weave(
-                            file_id, lines, heads, nostore_sha)
+                            file_id, text, heads, nostore_sha)
                         yield file_id, change[1][1], (entry.text_sha1, stat_value)
                     except errors.ExistingContent:
                         # No content change against a carry_over parent
@@ -774,7 +771,7 @@
                         parent_entry.symlink_target == entry.symlink_target):
                         carried_over = True
                     else:
-                        self._add_text_to_weave(change[0], [], heads, None)
+                        self._add_text_to_weave(change[0], '', heads, None)
                 elif kind == 'directory':
                     if carry_over_possible:
                         carried_over = True
@@ -782,7 +779,7 @@
                         # Nothing to set on the entry.
                         # XXX: split into the Root and nonRoot versions.
                         if change[1][1] != '' or self.repository.supports_rich_root():
-                            self._add_text_to_weave(change[0], [], heads, None)
+                            self._add_text_to_weave(change[0], '', heads, None)
                 elif kind == 'tree-reference':
                     if not self.repository._format.supports_tree_reference:
                         # This isn't quite sane as an error, but we shouldn't
@@ -797,7 +794,7 @@
                         parent_entry.reference_revision == reference_revision):
                         carried_over = True
                     else:
-                        self._add_text_to_weave(change[0], [], heads, None)
+                        self._add_text_to_weave(change[0], '', heads, None)
                 else:
                     raise AssertionError('unknown kind %r' % kind)
                 if not carried_over:
@@ -818,17 +815,11 @@
             self._require_root_change(tree)
         self.basis_delta_revision = basis_revision_id
 
-    def _add_text_to_weave(self, file_id, new_lines, parents, nostore_sha):
-        # Note: as we read the content directly from the tree, we know its not
-        # been turned into unicode or badly split - but a broken tree
-        # implementation could give us bad output from readlines() so this is
-        # not a guarantee of safety. What would be better is always checking
-        # the content during test suite execution. RBC 20070912
-        parent_keys = tuple((file_id, parent) for parent in parents)
-        return self.repository.texts.add_lines(
-            (file_id, self._new_revision_id), parent_keys, new_lines,
-            nostore_sha=nostore_sha, random_id=self.random_revid,
-            check_content=False)[0:2]
+    def _add_text_to_weave(self, file_id, new_text, parents, nostore_sha):
+        parent_keys = tuple([(file_id, parent) for parent in parents])
+        return self.repository.texts._add_text(
+            (file_id, self._new_revision_id), parent_keys, new_text,
+            nostore_sha=nostore_sha, random_id=self.random_revid)[0:2]
 
 
 class RootCommitBuilder(CommitBuilder):

=== modified file 'bzrlib/tests/test_tuned_gzip.py'
--- a/bzrlib/tests/test_tuned_gzip.py	2009-03-23 14:59:43 +0000
+++ b/bzrlib/tests/test_tuned_gzip.py	2009-06-02 19:56:24 +0000
@@ -85,3 +85,28 @@
         self.assertEqual('', stream.read())
         # and it should be new member time in the stream.
         self.failUnless(myfile._new_member)
+
+
+class TestToGzip(TestCase):
+
+    def assertToGzip(self, chunks):
+        bytes = ''.join(chunks)
+        gzfromchunks = tuned_gzip.chunks_to_gzip(chunks)
+        gzfrombytes = tuned_gzip.bytes_to_gzip(bytes)
+        self.assertEqual(gzfrombytes, gzfromchunks)
+        decoded = tuned_gzip.GzipFile(fileobj=StringIO(gzfromchunks)).read()
+        self.assertEqual(bytes, decoded)
+
+    def test_single_chunk(self):
+        self.assertToGzip(['a modest chunk\nwith some various\nbits\n'])
+
+    def test_simple_text(self):
+        self.assertToGzip(['some\n', 'strings\n', 'to\n', 'process\n'])
+
+    def test_large_chunks(self):
+        self.assertToGzip(['a large string\n'*1024])
+        self.assertToGzip(['a large string\n']*1024)
+
+    def test_enormous_chunks(self):
+        self.assertToGzip(['a large string\n'*1024*256])
+        self.assertToGzip(['a large string\n']*1024*256)

=== modified file 'bzrlib/tests/test_versionedfile.py'
--- a/bzrlib/tests/test_versionedfile.py	2009-05-01 18:09:24 +0000
+++ b/bzrlib/tests/test_versionedfile.py	2009-06-22 15:37:06 +0000
@@ -1471,6 +1471,53 @@
             self.addCleanup(lambda:self.cleanup(files))
         return files
 
+    def get_simple_key(self, suffix):
+        """Return a key for the object under test."""
+        if self.key_length == 1:
+            return (suffix,)
+        else:
+            return ('FileA',) + (suffix,)
+
+    def test_add_lines(self):
+        f = self.get_versionedfiles()
+        key0 = self.get_simple_key('r0')
+        key1 = self.get_simple_key('r1')
+        key2 = self.get_simple_key('r2')
+        keyf = self.get_simple_key('foo')
+        f.add_lines(key0, [], ['a\n', 'b\n'])
+        if self.graph:
+            f.add_lines(key1, [key0], ['b\n', 'c\n'])
+        else:
+            f.add_lines(key1, [], ['b\n', 'c\n'])
+        keys = f.keys()
+        self.assertTrue(key0 in keys)
+        self.assertTrue(key1 in keys)
+        records = []
+        for record in f.get_record_stream([key0, key1], 'unordered', True):
+            records.append((record.key, record.get_bytes_as('fulltext')))
+        records.sort()
+        self.assertEqual([(key0, 'a\nb\n'), (key1, 'b\nc\n')], records)
+
+    def test__add_text(self):
+        f = self.get_versionedfiles()
+        key0 = self.get_simple_key('r0')
+        key1 = self.get_simple_key('r1')
+        key2 = self.get_simple_key('r2')
+        keyf = self.get_simple_key('foo')
+        f._add_text(key0, [], 'a\nb\n')
+        if self.graph:
+            f._add_text(key1, [key0], 'b\nc\n')
+        else:
+            f._add_text(key1, [], 'b\nc\n')
+        keys = f.keys()
+        self.assertTrue(key0 in keys)
+        self.assertTrue(key1 in keys)
+        records = []
+        for record in f.get_record_stream([key0, key1], 'unordered', True):
+            records.append((record.key, record.get_bytes_as('fulltext')))
+        records.sort()
+        self.assertEqual([(key0, 'a\nb\n'), (key1, 'b\nc\n')], records)
+
     def test_annotate(self):
         files = self.get_versionedfiles()
         self.get_diamond_files(files)
@@ -1520,7 +1567,7 @@
             trailing_eol=trailing_eol, nograph=not self.graph,
             left_only=left_only, nokeys=nokeys)
 
-    def test_add_lines_nostoresha(self):
+    def _add_content_nostoresha(self, add_lines):
         """When nostore_sha is supplied using old content raises."""
         vf = self.get_versionedfiles()
         empty_text = ('a', [])
@@ -1528,7 +1575,12 @@
         sample_text_no_nl = ('c', ["foo\n", "bar"])
         shas = []
         for version, lines in (empty_text, sample_text_nl, sample_text_no_nl):
-            sha, _, _ = vf.add_lines(self.get_simple_key(version), [], lines)
+            if add_lines:
+                sha, _, _ = vf.add_lines(self.get_simple_key(version), [],
+                                         lines)
+            else:
+                sha, _, _ = vf._add_text(self.get_simple_key(version), [],
+                                         ''.join(lines))
             shas.append(sha)
         # we now have a copy of all the lines in the vf.
         for sha, (version, lines) in zip(
@@ -1537,10 +1589,19 @@
             self.assertRaises(errors.ExistingContent,
                 vf.add_lines, new_key, [], lines,
                 nostore_sha=sha)
+            self.assertRaises(errors.ExistingContent,
+                vf._add_text, new_key, [], ''.join(lines),
+                nostore_sha=sha)
             # and no new version should have been added.
             record = vf.get_record_stream([new_key], 'unordered', True).next()
             self.assertEqual('absent', record.storage_kind)
 
+    def test_add_lines_nostoresha(self):
+        self._add_content_nostoresha(add_lines=True)
+
+    def test__add_text_nostoresha(self):
+        self._add_content_nostoresha(add_lines=False)
+
     def test_add_lines_return(self):
         files = self.get_versionedfiles()
         # save code by using the stock data insertion helper.
@@ -1692,13 +1753,6 @@
         self.capture_stream(files, entries, seen.add, parent_map)
         self.assertEqual(set(keys), seen)
 
-    def get_simple_key(self, suffix):
-        """Return a key for the object under test."""
-        if self.key_length == 1:
-            return (suffix,)
-        else:
-            return ('FileA',) + (suffix,)
-
     def get_keys_and_sort_order(self):
         """Get diamond test keys list, and their sort ordering."""
         if self.key_length == 1:

=== modified file 'bzrlib/tuned_gzip.py'
--- a/bzrlib/tuned_gzip.py	2009-03-23 14:59:43 +0000
+++ b/bzrlib/tuned_gzip.py	2009-06-02 19:56:24 +0000
@@ -52,6 +52,18 @@
     width=-zlib.MAX_WBITS, mem=zlib.DEF_MEM_LEVEL,
     crc32=zlib.crc32):
     """Create a gzip file containing bytes and return its content."""
+    return chunks_to_gzip([bytes])
+
+
+def chunks_to_gzip(chunks, factory=zlib.compressobj,
+    level=zlib.Z_DEFAULT_COMPRESSION, method=zlib.DEFLATED,
+    width=-zlib.MAX_WBITS, mem=zlib.DEF_MEM_LEVEL,
+    crc32=zlib.crc32):
+    """Create a gzip file containing chunks and return its content.
+
+    :param chunks: An iterable of strings. Each string can have arbitrary
+        layout.
+    """
     result = [
         '\037\213'  # self.fileobj.write('\037\213')  # magic header
         '\010'      # self.fileobj.write('\010')      # compression method
@@ -69,11 +81,17 @@
     # using a compressobj avoids a small header and trailer that the compress()
     # utility function adds.
     compress = factory(level, method, width, mem, 0)
-    result.append(compress.compress(bytes))
+    crc = 0
+    total_len = 0
+    for chunk in chunks:
+        crc = crc32(chunk, crc)
+        total_len += len(chunk)
+        zbytes = compress.compress(chunk)
+        if zbytes:
+            result.append(zbytes)
     result.append(compress.flush())
-    result.append(struct.pack("<L", LOWU32(crc32(bytes))))
     # size may exceed 2GB, or even 4GB
-    result.append(struct.pack("<L", LOWU32(len(bytes))))
+    result.append(struct.pack("<LL", LOWU32(crc), LOWU32(total_len)))
     return ''.join(result)
 
 

=== modified file 'bzrlib/versionedfile.py'
--- a/bzrlib/versionedfile.py	2009-06-10 03:56:49 +0000
+++ b/bzrlib/versionedfile.py	2009-06-22 15:47:25 +0000
@@ -829,6 +829,36 @@
         """
         raise NotImplementedError(self.add_lines)
 
+    def _add_text(self, key, parents, text, nostore_sha=None, random_id=False):
+        """Add a text to the store.
+
+        This is a private function for use by CommitBuilder.
+
+        :param key: The key tuple of the text to add. If the last element is
+            None, a CHK string will be generated during the addition.
+        :param parents: The parents key tuples of the text to add.
+        :param text: A string containing the text to be committed.
+        :param nostore_sha: Raise ExistingContent and do not add the text to
+            the versioned file if the digest of the text matches this.
+        :param random_id: If True a random id has been selected rather than
+            an id determined by some deterministic process such as a converter
+            from a foreign VCS. When True the backend may choose not to check
+            for uniqueness of the resulting key within the versioned file, so
+            this should only be done when the result is expected to be unique
+            anyway.
+        :param check_content: If True, the lines supplied are verified to be
+            bytestrings that are correctly formed lines.
+        :return: The text sha1, the number of bytes in the text, and an opaque
+                 representation of the inserted version which can be provided
+                 back to future _add_text calls in the parent_texts dictionary.
+        """
+        # The default implementation just thunks over to .add_lines(),
+        # inefficient, but it works.
+        return self.add_lines(key, parents, osutils.split_lines(text),
+                              nostore_sha=nostore_sha,
+                              random_id=random_id,
+                              check_content=True)
+
     def add_mpdiffs(self, records):
         """Add mpdiffs to this VersionedFile.
 



