Binary file handling discussion

John Arbash Meinel john at arbash-meinel.com
Fri Nov 3 20:08:24 GMT 2006


Jari Aalto wrote:
> Nicholas Allen <allen at ableton.com> writes:
> 
>> The way CVS does it is really bad and it often
>> makes mistakes by assuming that all files are text files unless the
>> user specifies that they are binary (and often users forget this). So
>> CVSs policy is one of data destruction by default and I do not think
>> this would be a good idea for bzr!
> 
> I understood that a VCS is primary for text files and only secondary
> used for binary files
>  
> CVS way of assuming text by default is for the typical situation and
> explicitly marking other types of files as binary is logical. CVS has
> similar list to "bzr ignore", where patterns can be added to the
> server to automatically treat certain files as binary. So, the "-kb"
> tagging of individual files is quite transparent to the casual users:
> 
>         *.jpg
>         *.xls
>         *.doc

CVS uses a bad policy. You really don't want to start out assuming
everything is text and then switch to binary on request, because by the
time the switch is requested you've already corrupted the data,
especially if you do things like keyword expansion. But even
line-ending conversion can quickly corrupt files. As an example, PNG
files have an explicit \r\n as part of their magic number. It was
put there deliberately so that "accidental" but otherwise unnoticed
corruption gets detected.
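
To make the PNG point concrete, here is a small sketch. The 8-byte
signature is the one from the PNG specification; the two conversion
helpers are hypothetical stand-ins for whatever text-mode translation a
VCS might apply, not code from CVS or bzr.

```python
# The first eight bytes of every PNG file, per the PNG specification.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def to_unix_newlines(data: bytes) -> bytes:
    """Naive text-mode conversion, CRLF -> LF (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n")


def to_dos_newlines(data: bytes) -> bytes:
    """Naive text-mode conversion, every newline -> CRLF (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def looks_like_png(data: bytes) -> bool:
    """Check only the signature, as a PNG reader would do first."""
    return data.startswith(PNG_SIGNATURE)


# Stand-in for a real image: the signature followed by arbitrary bytes.
fake_png = PNG_SIGNATURE + b"rest of file"

unix_copy = to_unix_newlines(fake_png)  # the \r\n in the signature is lost
dos_copy = to_dos_newlines(fake_png)    # the trailing \n becomes \r\n

print(looks_like_png(fake_png), looks_like_png(unix_copy), looks_like_png(dos_copy))
```

Either direction of conversion breaks the signature, so a reader rejects
the file right away instead of decoding damaged image data.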

But that is also why the storage aspect could be kept separate from the
diff/merge logic. By default you would still assume that a file can be
diffed and merged, but you would also always store its exact contents.

The suggestion is for .bzrtypes or some other file versioned in the
source tree. We've had some discussions about whether that is the
"right" thing, though.

Being in the source tree makes it easier for users to edit, and makes
merging and versioning come naturally from the rest of the system.

On the downside, if we change how we interpret those files, we have no
good way to maintain compatibility with old or new versions. We could
have a format field in the file, which would at least give us some
flexibility.
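
As a rough sketch of what a format field could buy us: the file layout
below (a leading "format: N" line followed by "pattern type" lines) is
purely an assumption for illustration, as are the names
parse_types_file and SUPPORTED_FORMATS. Nothing like this exists in bzr.

```python
# Format versions this hypothetical reader knows how to interpret.
SUPPORTED_FORMATS = {1}


def parse_types_file(text):
    """Parse a hypothetical .bzrtypes file into (version, rules).

    The first non-comment line must be "format: N"; each following line
    is "<pattern> <type>". An unknown version is refused rather than
    guessed at, so an old client never misreads a newer file.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines or not lines[0].startswith("format:"):
        raise ValueError("missing format line")
    version = int(lines[0].split(":", 1)[1])
    if version not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {version}")
    rules = []
    for ln in lines[1:]:
        # Split on the last run of whitespace so patterns may contain spaces.
        pattern, file_type = ln.rsplit(None, 1)
        rules.append((pattern, file_type))
    return version, rules


print(parse_types_file("format: 1\n*.png binary\n*.txt text\n"))
```

An older client that meets a newer format number can then stop with a
clear error instead of silently applying the wrong rules.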

Do we want to allow people to change meta information like this for old
versions? Is that necessary or desirable? If it is desired, then we need
a different mechanism.

What about merging/conflicts? Do the normal conflict mechanisms make
sense? They make more sense for something like this than they do for
.bzrignore. A .bzrignore file really is a set, not an ordered sequence
of lines, so 2 people adding different entries in approximately the same
spot isn't really a conflict. In this case, though, if order is
important for pattern matching, then users *do* need to resolve a
conflict because they need to give an explicit priority.
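
A small sketch of the difference, under stated assumptions: the first
function merges ignore lists as sets, the second resolves a file type by
first match in an ordered rule list. Both functions and the
(pattern, type) rule shape are hypothetical; neither is how bzr actually
merges .bzrignore or reads a types file.

```python
from fnmatch import fnmatch


def merge_ignore_sets(base, this, other):
    """Three-way merge that treats each ignore file as a set of patterns.

    Additions from both sides are kept and removals from either side are
    honoured, so two people adding different entries never conflict.
    """
    base, this, other = set(base), set(this), set(other)
    added = (this - base) | (other - base)
    removed = (base - this) | (base - other)
    return sorted((base | added) - removed)


def first_match(rules, filename, default="text"):
    """Return the type of the first (pattern, type) rule matching filename."""
    for pattern, file_type in rules:
        if fnmatch(filename, pattern):
            return file_type
    return default


# Set semantics: both additions survive, in no meaningful order.
print(merge_ignore_sets(["*.o"], ["*.o", "*.pyc"], ["*.o", "*~"]))

# Ordered semantics: the same two rules give different answers
# depending on which one comes first.
rules_a = [("docs/*.txt", "text"), ("docs/*", "binary")]
rules_b = [("docs/*", "binary"), ("docs/*.txt", "text")]
print(first_match(rules_a, "docs/a.txt"), first_match(rules_b, "docs/a.txt"))
```

With first-match rules, a merge tool cannot know which of two newly
added lines should win, which is why a person has to decide.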

One thing I really like about the proposal is that it gives an easy way
to set values for lots of files in the tree, and to update them later.
Tracking stuff like this per file, the way SVN does, means that you need
to remember to set the properties on any new file. One of my biggest
beefs is that svn:ignore doesn't have a way to make it recursive, so
adding a new sub-project tends to add all the things that you just asked
it to ignore in the other project.

John
=:->

