import failures

James Westby jw+debian at jameswestby.net
Tue Jan 5 18:03:48 GMT 2010


On Tue, 05 Jan 2010 11:18:59 -0600, John Arbash Meinel <john at arbash-meinel.com> wrote:
> ^- 'unsupported Unicode code range' sounds funny, but it may just be
> that they have latin-1 chars in what should otherwise be a UTF-8 doc. Is
> changelog *defined* as UTF-8? Or is it just '8-bit, put whatever feels
> good to you' in there?

It is UTF-8 now, but wasn't in the past, so some entries will be in
different encodings when we import old packages.
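The failure John quotes is easy to reproduce: a Latin-1 encoded changelog entry is not valid UTF-8. A minimal sketch (the maintainer name is made up for illustration):

```python
# "Mäintainer" encoded as Latin-1 contains the lone byte 0xe4,
# which is an invalid start of a multi-byte sequence in UTF-8.
latin1_bytes = u'M\xe4intainer'.encode('latin-1')

try:
    latin1_bytes.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)  # False: the bytes only decode cleanly as Latin-1
```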

> My guess is that you are handing us 8-bit paths, and inside bzrlib all
> *paths* are supposed to be Unicode. And if you hand us an 8-bit string,
> and we up-cast it to Unicode, then we fail because the upcast is
> generally done via ascii.
> 
> So I would at least take a first look at the 'import_archive' code, and
> make sure it is trying to work in Unicode paths, rather than 8-bit strings.

I'll check that, but I don't think it's doing anything odd here.
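The upcast failure John describes can be sketched explicitly: decoding an 8-bit path with the ascii codec fails on any non-ASCII byte, whereas decoding it with the encoding it was actually written in succeeds. The path name here is hypothetical:

```python
# An 8-bit path with a Latin-1 byte in it (hypothetical example).
path_8bit = b'debian/po/fran\xe7ais.po'

# Promoting it to Unicode via ascii -- the implicit default John
# mentions -- blows up on the 0xe7 byte.
try:
    path_8bit.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding the path was really in works fine.
unicode_path = path_8bit.decode('latin-1')
print(ascii_ok)  # False
```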

> Could this be related to the overloading of whatever machine that also
> happened? Meaning running this stuff is hammering on a machine hard
> enough that it times out occasionally? (Swapping, etc?)

It's possible. It was my understanding that the API is served by the
same appservers as the webapp, and so my requests shouldn't be a large
percentage of the requests they are getting. It's really hard to debug
this without someone on the LP side to look at it: these aren't errors
that we get OOPS numbers in the response for.

> An author field that is non-ascii and not utf-8. There is always the:
> 
> def decode_as_best_you_can(s):
>   try:
>     return s.decode('utf-8')
>   except UnicodeDecodeError:
>     return s.decode('latin-1')

Worth a go. This isn't critical data anyway, so graceful handling would
be good if the data can't be decoded.
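Fleshed out as a runnable sketch, John's fallback looks like the code below. One property worth noting: Latin-1 assigns a character to every byte value, so the fallback branch can never itself raise; for data that was really in some third encoding the result is simply mojibake rather than an error, which is the graceful handling wanted here.

```python
def decode_as_best_you_can(s):
    """Decode bytes as UTF-8, falling back to Latin-1.

    The Latin-1 branch cannot raise UnicodeDecodeError, because
    Latin-1 maps all 256 byte values, so this always returns text.
    """
    try:
        return s.decode('utf-8')
    except UnicodeDecodeError:
        return s.decode('latin-1')


print(decode_as_best_you_can(b'caf\xc3\xa9'))  # UTF-8 input -> café
print(decode_as_best_you_can(b'caf\xe9'))      # Latin-1 input -> café
```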

I squashed some more bugs in this area after I sent that message,
though. I had some confusion over which level was doing the decoding, so
there should be fewer issues like this now.


> Is it possible to get a query of old ones, and just run a bulk-update of
> them?

I have the list of packages, and mwhudson was going to query for the
list of branches based on that, and then request server-side upgrade I
believe.

> Happens if you commit exactly the same data 2 times, or if you try to
> autopack a single file. We've fixed a few of them, but having
> reproducible data here would help.

I'll run a local test of these packages to see if it is associated with
the data.

Thanks,

James



More information about the ubuntu-distributed-devel mailing list