[MERGE] Handle references to line data in _patiencediff_c.c properly

Sun Aug 17 21:23:25 BST 2008

Dňa Ne, 2008-08-17 o 14:46 -0500, John Arbash Meinel napísal:
> Lukáš Lalinský wrote:
> > This code is called in bzrlib only with lists or tuples of strings,
> > which works fine, because PySequence_Fast always returns the same object
> > and so the items from PySequence_Fast_GET_ITEM will not be deleted until
> > the original sequence is deleted. But if it's called with a sequence
> > than needs to be converted to list (e.g. PyUnicode), the items will be
> > garbage collected in the contructor and still used later in code (which
> > leads to segfaults).
> > 
> > I really don't how to write a failing test for something like this, so
> > there are no tests in the patch. :(
> > 
> > Lukas
> 
> So... just write a test that we can do PatienceDiff on a Unicode sequence.
> 
> matcher = PatienceSequenceMatcher(u'Ab\xb5\u1234', u'....')
> matcher.get_matching_blocks()

Well, this will crash randomly, at best. I'm personally not convinced
that a random test is better than no test, but I can add it.

> However, I'm also noticing that you switched "free(b)' => 'delete_lines(b,
> bsize)' but that delete_lines isn't freeing the b pointer.

Yep, I've noticed this right after I submitted the fist patch.

> BB:tweak
> 
> If it were up to me, I would just have hung onto the PySequenceFast object for
> the lifetime of the 'lines' variables, but both are valid. (Just a lot fewer
> INCREF/DECREF to keep track of.)

The thing is that this way it reuses the conversion lists. For example, if I do
PatienceSequenceMatcher(None, u'abc', u'def'), both load_lines calls will
use the same fast list. While it's not a big deal, it saves some memory
and should be slightly faster.

Lukas