[MERGE] push handles file-ids containing quotes correctly

Mon Jul 10 18:50:05 BST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Aaron Bentley wrote:
>>>
>>>>> I'm just fixing a broken function, not rewriting it the way I think it
>>>>> should have been written in the first place.
>>>
>>> Why not fix it the right way... Either way you are taking wrong behavior
>>> and making it right.
> 
> Because I assumed (wrongly, it seems) that conservative changes would be
> less controversial than radical changes.  Also, I've always been dubious
> of this approach to finding modified file id's.  The whole thing is
> wrong, and changing this won't make it right.

When a function is pretty broken, I prefer to clean it up correctly. I
understand your conservative approach. But doing it correctly in this
case is just as easy (creating a regex to assert that we aren't missing
anything is the same work as creating a regex to fix it correctly.).

I believe I also wrote this broken code. So I'm more likely to actually
want it fixed to hide my shame. :) (I believe I just copied it out of
xml because at the time we had an xml.py so I couldn't just import it,
and it was a big dependency for a single function).

> 
>>> I don't really think of it as a codepath problem, but more of
>>> unsanitized data. (It isn't like we are passing a int when we expect a
>>> string, you are expected to pass data that you read in from a file).
> 
> There is that.  But this approach already assumes that there are no
> comments or processing instructions or CDATA sections in the file
> version, so assuming the data contains no entities except for the
> predefined ones doesn't seem like much more of a leap.  I'd be more
> concerned about mangled entities like '&amp' or '&&aammpp;;'.

Well, this is just reading the XML <file .../> record, which can only
have attributes, and I didn't think CDATA was allowed in attributes.

But yes, it is tied quite strongly against to the current storage
format, and is likely to break if we change anything. And it would be
nicer to have one that wasn't so strongly tied. We should keep 'deltas'
in mind when developing the next one.

> 
>>>>> Strongly disagree.  Private methods need not be tested, but their
>>>>> clients must be tested.
> 
>>> Well, that is an argument for testing 'file_ids_altered_by_revision_ids'
>>> not for testing 'push'.
> 
> I wasn't arguing for testing push, I was arguing for testing
> Repository.fetch.
> 
> I hope the argument for testing push is self evident.  It was broken,
> we're fixing it, we don't want regressions.

Not really. To quote Robert and the follow up by Martin (in the
hashcache thread):

>> I can best describe my feelings about this with an analogy - if we had a
>> > bug deep in transport, that was exposed by the command line, we would
>> > expect to add a test of transport, and possibly a test of the
>> > commandline that checks the right sequence of calls are made by the
>> > specific command, using an instrumented, or simulated transport
>> > implementation. What we would rarely do is add a new integration test.
> 
> OK, so it would definitely not be good to test for this bug by
> repeatedly committing, even though that's how it was discovered.
> 

So testing push/fetch is at best 'iffy', because they just expected the
underlying code to do something it wasn't doing (unescaping &apos;).
It was exposed because somebody tried to do a push, and it failed, but
the bug isn't in 'push' it is in something that push relies upon.

I'm 0 about a push test. I won't block on it being there. I know my
personal fault is testing too high (look at test_bundle, what was I
thinking ...) So I'm trying to find the right balance still.

> 
> But I'd be happy to test file_ids_altered_by_revision_ids as well.

We actually already have tests that look for <>, and a couple others.
Just add " and ' to the file-id, and I think it should flush out the bug.

> 
>>> I think it is fine to directly test a function so that it conforms to
>>> its interface. We have lots of tests that directly call a private function.
> 
> You said 'should'.  In RFC terms, I'd consider it a MAY, not a SHOULD.
> We also have lots of untested private functions.

*I* would like to see a test for it. But you are correct, private
function testing is a MAY, not a SHOULD.

> 
>>> We just have more flexibility in the future if we want to modify a
>>> private function. It doesn't *have* to conform to the interface we gave
>>> it. But test should fail so that we know we modified it somehow.
> 
> Requiring testing of private functions makes refactoring punishingly
> difficult, so I don't think it should be required.
> 
> Aaron

I think it is important to know when it breaks, on something that is
semi-critical. It is the tradeoff of adding more tests to
file_ids_affected_by, since you aren't asserting anything about the
functions it calls, you have to do everything though testing it.

It is the only caller (right now), so I would be okay with testing it at
that level.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEspNMJdeBCYSNAAMRAg5FAJ9gSt4JNzBnhHz26jzBF464fk81JQCeNnUy
NMglU13tWB57BJNRYBUQUHw=
=3oQs
-----END PGP SIGNATURE-----