[MERGE] line-endings support: part 1 of 2: versioned properties

Alexander Belchenko bialix at ukr.net
Mon Apr 14 07:20:40 BST 2008


OK, it has finally happened.

This is the groundwork for my line-endings support. This patch introduces versioned properties in
bzrlib.

There have been many discussions and flames on the bzr mailing list about line endings, and every
time the discussion went nowhere because, as I understand it, we can't implement line-endings
support without some support for file properties. So my primary goal is EOL handling, not versioned
properties per se. Still, I have tried to do the best I can in this area.

After several discussions with Dmitry Vasiliev (on the ru_bzr mailing list) I decided to take the
simplest possible route for versioned properties: a config-like store. I didn't want to implement a
new working/revision tree format or a new repository format, so I simply store the properties in a
.bzrprop file in the working tree and leave everything else to the existing bzrlib code. Therefore
my implementation works with any working tree/revision tree format (WT2, WT3 and WT4) and is
entirely optional for most of the bzr code. It will, however, be heavily used by the actual
line-endings support.

I focused on the simplest possible implementation, so there are some tradeoffs and limitations.
Because performance is very important, I concentrated on making property reads as fast as
possible, and simple code helps a lot here. That said, there is no explicit support for namespaces,
and I think that's right: explicit namespaces would require more code and would only slow things
down. I understand that namespaces would be useful for interoperating with foreign VCSs, e.g. in
the bzr-svn plugin, but I decided not to implement them at all. As I said, my focus was simple and
fast code.

I think it's better for reviewers to read the specification before reading my patch, so I've put
it below. This specification describes my implementation.

Note that integration with Merge3Merger is not finished. Properties are actually merged 3-way, but
the result is only used on the fly during the merge itself; the .bzrprop file itself is still merged
by the default text merger. This should be improved in the future with some help from the core
developers (probably Aaron Bentley is the right person), who could advise me on how to store the
merged properties in the final stage of the merge and how to add property-specific conflicts. IMO
this can be done after (or if) my patch lands.

Also, please note that my English is still far from ideal, so if you see bad wording in my
documents and know how to phrase it better, please tell me.

I understand that my VP design is not the best possible one. As I said, I needed simple code that
I could finish in a short period of time, and I did it. And I'm proud of this code. (It's actually
the third version of the VP API; thanks to Dmitry Vasiliev, who gave me some valuable advice about
my code.)

But if other people decide that this approach is unsatisfactory or really bad, then there is no
point in my sending the second patch, with the actual line-endings support, for review.

The document from doc/developers/versioned-properties.txt follows.


====================
Versioned properties
====================

Status
======

:Date: 2008-04-11

This document describes the simple implementation of versioned
properties in bzr.


.. contents::


Motivation
==========
To provide a *simple* way to implement versioned properties, usable by end
users and by the various pieces of software that work with bzr repositories
or working trees.

Simplicity is the key point here, because versioned properties are not an
end in themselves but a means. They provide the required basis for
implementing many other features, such as line-endings support, better diff
and annotation display of non-ASCII content in GUI tools, or better
integration with web tools like Loggerhead or Trac+bzr.

Terminology
===========
*Versioned Properties* (VP) is a generic container for storing special
metadata associated with versioned entries (files, symlinks, and possibly
directories). VP is tightly coupled with working and revision trees and
with their inventories.

Overview
========
VP provides a generic container for data.

It is intended to store special internal data about the content of files
under version control. Although there is no restriction on value length,
it should be used for relatively short strings of data.

For performance reasons the VP implementation is kept very simple under the
hood and should be optimized for the fastest possible 'get property'
operations.

Internals
---------
VP behaves like a dictionary object with string keys and values.
Every property can therefore be expressed as a pair ``key = value``.
VP does not assign any particular meaning to key/value pairs;
clients may assign a meaning themselves.

VP provides an API to read and write values by key and to delete keys.
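
As a minimal sketch in modern Python (not bzrlib code), a single property
set can be modelled as a plain ``dict`` whose item assignment accepts only
strings. The class name ``PropertySet`` is hypothetical:

```python
class PropertySet(dict):
    """Hypothetical sketch of one property set: str keys -> str values."""

    def __setitem__(self, key, value):
        # Only item assignment is checked here; dict.update() and the
        # dict constructor bypass this method in CPython.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("property keys and values must be strings")
        super().__setitem__(key, value)


props = PropertySet()
props["eol"] = "native"   # write a value by key
eol = props.get("eol")    # read it back
del props["eol"]          # delete the key
```

Keeping each property set a plain dictionary means a 'get property' call is
a single hash lookup, which matches the performance goal stated above.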

Limitations
~~~~~~~~~~~
Under the hood the ConfigObj library is used for reading and parsing the
content of VP storage, for dictionary lookups, and for writing VP back to
disk. So all limitations of the ConfigObj library apply to the current
VP implementation.

Namespaces
~~~~~~~~~~
VP does not support explicit namespaces for keys. All keys are plain strings,
so clients can use their own notation for namespaces, e.g. the
``svn:ignore`` style used by Subversion.

The reasons for not implementing explicit namespaces are the limitations of
the VP file parser (ConfigObj has no explicit support for namespaces) and
performance.

An explicit namespace would require an additional dictionary level, i.e.::

    namespace -> key -> value

and therefore every operation would need an additional dictionary lookup.
This is an important point because line-endings support will use the
'get property' method heavily for most operations on working trees.
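
To illustrate the trade-off, here is a hypothetical Python comparison between
the flat layout VP uses, where a namespace is just part of the key, and a
nested layout with explicit namespaces. None of these helper names come from
bzrlib:

```python
# Flat layout (what VP does): the namespace is just part of the key.
flat_props = {"svn:ignore": "*.pyc", "eol": "native"}

# Hypothetical nested layout with explicit namespaces (what VP avoids);
# "" is used here as the namespace for un-namespaced keys.
nested_props = {"svn": {"ignore": "*.pyc"}, "": {"eol": "native"}}


def get_flat(props, key):
    # One dictionary lookup per 'get property' call.
    return props.get(key)


def get_nested(props, namespace, key):
    # Two dictionary lookups: first the namespace, then the key.
    return props.get(namespace, {}).get(key)


def keys_with_prefix(props, prefix):
    # Clients can still group flat keys by their own "prefix:" convention.
    return sorted(k for k in props if k.startswith(prefix + ":"))
```

Both layouts return the same value; the flat one simply does less work per
lookup and leaves the naming convention to the client.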

VP Storage Format
-----------------
In working and revision trees VP is stored in a versioned text file
named '.bzrprop'. The content of this file follows the format of config
files, i.e. there are several sections, each section name enclosed in
square brackets, and within each section there are several ``key = value``
pairs. E.g.::

     [*.txt]
     eol = native

     [*.bin]
     bin = yes

     [bar-file-id]
     eol = CRLF
     encoding = cp1251
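
ConfigObj is not part of the Python standard library, so the sketch below
uses the standard ``configparser`` module as a stand-in to show how the
example above maps onto a vpdict. This is only an approximation (the two
parsers differ in details such as quoting and list values), and
``parse_bzrprop`` is a hypothetical name:

```python
import configparser

BZRPROP_TEXT = """\
[*.txt]
eol = native

[*.bin]
bin = yes

[bar-file-id]
eol = CRLF
encoding = cp1251
"""


def parse_bzrprop(text):
    """Turn .bzrprop text into a vpdict: {section: {propname: propvalue}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep property names case-sensitive
    parser.read_string(text)  # full-line '#' comments are skipped
    return {name: dict(parser.items(name)) for name in parser.sections()}


vpdict = parse_bzrprop(BZRPROP_TEXT)
```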

Users can add their own comments to the '.bzrprop' file; comments should
start with a '#' sign.

Section names should be either a file-id or a file mask (e.g. ``*.txt``).

A file-id is the standard way bzrlib refers to versioned files internally.
Using file-ids has a big advantage over plain filenames: it makes supporting
file moves and renames trivial.

A file mask provides a simple way to set the same properties for a group of
files. Usually, files with the same extension behave similarly, so this is a
reasonable assumption. It also helps to reduce duplication of properties and
therefore the overall size of VP storage.
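
A lookup that combines both kinds of section names might look like the
following Python sketch. Two rules here are assumptions, not taken from the
implementation: a section counts as a file mask only if it contains glob
characters (``*``, ``?`` or ``[``), and a file-id section overrides any
matching mask. Masks are matched against the file's base name with
``fnmatch.fnmatchcase``; ``lookup_properties`` is a hypothetical helper:

```python
import fnmatch
import posixpath

GLOB_CHARS = "*?["


def is_file_mask(section):
    # Assumption: file-ids never contain glob characters.
    return any(ch in section for ch in GLOB_CHARS)


def lookup_properties(vpdict, file_id, path):
    """Collect the properties that apply to one versioned file."""
    result = {}
    basename = posixpath.basename(path)
    # Masks first, in section order, so later masks override earlier ones.
    for section, props in vpdict.items():
        if is_file_mask(section) and fnmatch.fnmatchcase(basename, section):
            result.update(props)
    # The file-id section is applied last, so per-file settings win.
    result.update(vpdict.get(file_id, {}))
    return result


vpdict = {"*.txt": {"eol": "native"},
          "bar-file-id": {"eol": "CRLF", "encoding": "cp1251"}}
print(lookup_properties(vpdict, "bar-file-id", "doc/bar.txt"))
# -> {'eol': 'CRLF', 'encoding': 'cp1251'}
```

Because the file-id is used rather than the path, ``doc/bar.txt`` keeps its
per-file settings after being renamed, as long as it still matches or no
longer needs the mask.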

Handling special cases
~~~~~~~~~~~~~~~~~~~~~~
Because VP storage is just a plain versioned file, bzr has to pay attention
to some special cases:

1) The content of .bzrprop is wrong or malformed: in this case VP refuses
   to use this .bzrprop and uses a default empty properties dictionary.
2) The .bzrprop file is present in the working tree but is not versioned:
   in this case VP refuses to use this .bzrprop and uses a default empty
   properties dictionary.
3) Selective commit of files that does not include the .bzrprop file: in
   this case the committed revision will contain the previously committed
   .bzrprop, so bzr will use this historical data to initialize the VP
   dictionary for the commit operation.
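
The first two cases amount to falling back to an empty vpdict. A minimal
Python sketch of that fallback, again using ``configparser`` as a stand-in
for ConfigObj and a hypothetical ``load_vpdict`` helper (the third case,
reading .bzrprop from the previously committed revision, is not shown):

```python
import configparser


def load_vpdict(text, is_versioned):
    """Return a vpdict, or an empty one when .bzrprop must be ignored."""
    if text is None or not is_versioned:
        # No .bzrprop at all, or case 2: the file exists but is unversioned.
        return {}
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep property names case-sensitive
    try:
        parser.read_string(text)
    except configparser.Error:
        # Case 1: malformed content, e.g. conflict markers left by a merge.
        return {}
    return {name: dict(parser.items(name)) for name in parser.sections()}
```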

Benefits of config-like storage format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) The config-like format is human-readable and looks familiar to users.

2) VP is stored in an ordinary versioned file, so bzr does not need special
   handling of VP in the checkout/commit phase. Therefore bzr does not
   require a new working tree or repository format, and the current simple
   implementation is backward compatible with existing current and legacy
   working tree formats.

3) Diff and merge operations are done by the existing bzr code for text
   diff and merge. VP merge conflicts must be resolved manually by the user,
   like regular text conflicts.

Disadvantages of config-like storage format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many disadvantages are in fact just limitations of the ConfigObj library.

1) Users can't store binary data directly.

2) For any operation on VP (get/set/delete) the entire content of VP
   storage has to be read and parsed. This in turn leads to the following
   limitations:

   * The size of the entire property set could affect the execution time
     of frequently called commands.

   * Unresolved text conflicts in the .bzrprop file will prevent any use
     of properties until the user resolves them.

   These are mostly limitations of the library used.

3) Merge operations are done by the existing bzr code for 3-way text merge.
   VP merge conflicts must be resolved manually by the user, like regular
   text conflicts. Although the current implementation has its own 3-way
   merge algorithm for properties, it is only used internally for now.


UI changes
==========
No special UI changes are required for existing commands.

bzr needs some new commands to work with VP from the command line.
All new VP-related commands have the ``prop-`` prefix.

New commands
------------
prop-set
~~~~~~~~
Set a new property value.

Set a new VALUE for the versioned property PROPERTYNAME.
Changes are applied to a specific FILENAME, or to the
default property set if PATTERN is specified.

prop-get
~~~~~~~~
Write the value of a versioned property to standard output.

If no property name is specified, dump all values
for the given filename or pattern.

prop-del
~~~~~~~~
Delete an existing property.

Delete the versioned property PROPERTYNAME.
Changes are applied to a specific FILENAME, or to the
default properties if PATTERN is specified.


API changes
===========
vpdict
------
The internal representation of versioned properties is the ``vpdict``
dictionary. It maps file_ids and/or file masks to the corresponding set of
properties. Each set of properties is a dictionary too::

     {fileid: {propname: propvalue},
      filemask: {propname: propvalue}}

TreeBzrProperties interface
---------------------------
Although vpdict can be used directly, there is a special helper wrapper
class ``TreeBzrProperties`` whose aim is to support file masks
transparently.

All ``TreeBzrProperties`` get methods accept either a file_id or a filename
and do file mask matching under the hood when needed.

The ``TreeBzrProperties`` class also has helper methods to save the current
vpdict in the working tree or to reload vpdict from the tree (useful for
testing).
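
The idea of accepting either identifier can be sketched in Python with a
hypothetical ``TreePropertiesSketch`` class. It is not the bzrlib
``TreeBzrProperties``: a plain ``path -> file_id`` dictionary stands in for
the tree inventory, and the mask rules (glob characters mark a mask, the
file-id section wins over masks) are assumptions:

```python
import fnmatch
import posixpath


class TreePropertiesSketch:
    """Hypothetical stand-in for TreeBzrProperties; not the bzrlib class."""

    def __init__(self, vpdict, path_to_id):
        self._vpdict = vpdict
        # Stand-in for the tree inventory, indexed in both directions.
        self._path_to_id = dict(path_to_id)
        self._id_to_path = {fid: p for p, fid in path_to_id.items()}

    def _resolve(self, file_id=None, path=None):
        # Accept either identifier and derive the other one.
        if file_id is None:
            file_id = self._path_to_id[path]
        if path is None:
            path = self._id_to_path[file_id]
        return file_id, path

    def get_properties(self, file_id=None, path=None):
        file_id, path = self._resolve(file_id, path)
        basename = posixpath.basename(path)
        result = {}
        for section, props in self._vpdict.items():
            is_mask = any(ch in section for ch in "*?[")
            if is_mask and fnmatch.fnmatchcase(basename, section):
                result.update(props)
        # Per-file settings override mask settings.
        result.update(self._vpdict.get(file_id, {}))
        return result

    def get_property(self, name, file_id=None, path=None, default=None):
        return self.get_properties(file_id, path).get(name, default)
```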

Changes in WorkingTree and RevisionTree API
-------------------------------------------
To obtain vpdict from WorkingTree or RevisionTree instances there is a new
method, ``get_vpdict``.

WorkingTrees also have a method to save vpdict to disk: ``save_vpdict``.

The main public API for versioned properties is ``properties`` attribute
of WorkingTree or RevisionTree instances. The ``properties`` attribute
provides access to corresponding instance of ``TreeBzrProperties``.

Merge3TreeBzrProperties
-----------------------
The ``Merge3TreeBzrProperties`` class provides a 3-way merger for versioned
properties.

To run a merge, use the ``do_merge`` method.

The merged properties (an instance of the ``BzrProperties`` class) are
available as the attribute ``merged_prop``. The list of file_ids/file masks
with conflicts in their dictionaries is available as the attribute
``conflited_ids``.

``Merge3TreeBzrProperties`` provides a read-only API like that of
``TreeBzrProperties``.
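
The key-by-key rule a 3-way property merger needs can be sketched in Python
as follows. This is a hypothetical illustration, not the algorithm inside
``Merge3TreeBzrProperties``: a missing key is treated as ``None``, a change
on only one side wins, and when both sides change a key differently the
THIS value is kept and the section is reported as conflicted:

```python
def merge3_vpdicts(base, this, other):
    """Merge three vpdicts key by key; return (merged, conflicted_ids)."""
    merged, conflicted = {}, []
    for section in sorted(set(base) | set(this) | set(other)):
        b = base.get(section, {})
        t = this.get(section, {})
        o = other.get(section, {})
        out, clash = {}, False
        for name in set(b) | set(t) | set(o):
            bv, tv, ov = b.get(name), t.get(name), o.get(name)
            if tv == ov:      # both sides agree (including both deleting)
                value = tv
            elif tv == bv:    # only OTHER changed it
                value = ov
            elif ov == bv:    # only THIS changed it
                value = tv
            else:             # both changed it differently: a conflict
                clash = True
                value = tv    # keep THIS side and report the section
            if value is not None:
                out[name] = value
        if out:
            merged[section] = out
        if clash:
            conflicted.append(section)
    return merged, conflicted
```

Deleting a key on one side while the other side leaves it untouched removes
it from the result, just as a line deletion would in a text merge.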

Serialization
-------------
The details of the serialization format are decoupled from the
``TreeBzrProperties`` class. The current version of versioned properties is
based on the ConfigObj library but also provides basic support for
different serialization formats.

The main API is the ``VpSerializer`` class. It provides the classmethod
``from_lines`` to create a new vpdict instance from the lines of a .bzrprop
file. This method also returns an instance of the ``VpSerializerFormat``
class, which should be used to save vpdict back to disk.

The first line of the .bzrprop file specifies the format name; it is a
comment line with a special prefix::

	# DONT EDIT THIS LINE: bzr properties format 1
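
Reading that header could look like the following Python sketch.
``split_format_header`` is a hypothetical helper, not the
``VpSerializer.from_lines`` API; it returns ``None`` as the format when the
header is missing and leaves that decision to the caller:

```python
FORMAT_PREFIX = "# DONT EDIT THIS LINE: bzr properties format "


def split_format_header(lines):
    """Return (format_number or None, remaining_lines) for .bzrprop lines."""
    if lines and lines[0].startswith(FORMAT_PREFIX):
        version = lines[0][len(FORMAT_PREFIX):].strip()
        # int() raises ValueError for a non-numeric format name.
        return int(version), lines[1:]
    # No header: let the caller decide how to treat the file.
    return None, lines
```

Since the header is an ordinary '#' comment, a config parser that skips
comments can still read the remaining lines unchanged.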




Name: bzr.prop.simple-3366.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080414/2d8abd87/attachment-0001.diff 

