ProcMaps.txt may contain private information such as username

Emmet Hikory persia at ubuntu.com
Fri Jul 27 23:18:16 UTC 2012


Fred . wrote:
> Yes, things like hardware information is useful for a complete bug report.
> But I doubt username and hostname would be necessary for a complete report.

    With this I must disagree.  I have triaged at least two bugs where the
bug was related to specific characters or sequences of characters present
in either the user's username or the associated GECOS information.
Admittedly, these strings were visible in the stacktrace or traceback (one of
the bugs I remember was in a C program and the other in Python) as well as
in ancillary sources (such as ProcMaps.txt), but the fact that they were
sourced from the /etc/passwd file on the user's machine was plainly obvious.

    While I can't dispute that, for some user configurations, the disclosed
information may reveal more than the user would have wished, or may even
fall under consent-for-disclosure requirements in some jurisdictions, this
information may, depending on the bug, be essential to understanding and
repairing it.

    As long as we make our best efforts to keep this information from being
made public, and restrict it to those the project trusts to care for the
software in question, I think we've done the right thing.  As soon as we
start auto-munging data, we run a real risk of generating a false bug
report: one where the bug is real, exists, and is trivial to replicate,
yet cannot be replicated from the data provided, and so likely cannot be
understood by a triager or developer.
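    To make the auto-munging concern concrete, here is a minimal Python
sketch.  The scrubber is hypothetical and is not how Apport actually
processes reports; the username and the ProcMaps-style line are made-up
examples.  A naive filter that replaces the reporter's username with a fixed
placeholder erases exactly the non-ASCII characters that, in bugs like those
described above, were the cause.

```python
# Hypothetical illustration only: a naive scrubber that replaces the
# reporter's username with a fixed placeholder before upload.
PLACEHOLDER = "USERNAME"


def munge_username(text: str, username: str) -> str:
    """Replace every occurrence of username in text with PLACEHOLDER."""
    return text.replace(username, PLACEHOLDER)


def has_non_ascii(text: str) -> bool:
    """Return True if text contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in text)


# Made-up maps line (address perms offset dev inode path) whose path
# contains a non-ASCII username.
username = "иван"
maps_line = ("7f3a2c000000-7f3a2c021000 r-xp 00000000 08:01 131 "
             f"/home/{username}/.local/lib/libfoo.so")

scrubbed = munge_username(maps_line, username)

print(has_non_ascii(maps_line))  # True: the triggering characters are present
print(has_non_ascii(scrubbed))   # False: the triager can no longer see them
print(scrubbed)
```

After scrubbing, the report no longer shows that the username contained
Cyrillic, so a triager would have no reason to suspect the very thing that
triggers the crash.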

    Note that this does require some discretion on the part of bug reporters.
We may be able to do better at informing reporters precisely what
information is being disclosed, and at helping them understand the submitted
report.  While there isn't much space in the Apport dialogues, adding a
link to a wiki page with more detailed user-oriented documentation on how
to determine what information is being disclosed may reduce the chance
that reporters are unhappy with the submitted information.  It may also
increase the chance that, when a bug does depend on private information,
the user will replicate it in a safe environment, so that the submitted
information does not compromise them.

    As an example of the latter, there is a "bug" in the interaction of
gnome-screensaver and IMEs: if a user's username or password contains
characters that require an IME to enter, the user cannot unlock their
screen.  I haven't retested this against lightdm yet, but gdm allowed users
with such credentials to log in, causing some confusion for folk whose
keyboard definitions had no native representation for the characters
concerned (for example, suppose one's password includes Cyrillic: users who
have multilayer keyboard maps can unlock the screensaver, but users who use
an IME to generate Cyrillic cannot).  For reports of this bug we needed to
know an affected username and password pair in order to identify the issue,
so the report was submitted from an otherwise fresh install containing
intentionally useless information, since obfuscation would have removed the
ability to replicate it.  (Triage note: because screensavers may hide secure
information, it is considered unsafe to allow arbitrary hooks to run
arbitrary installed software there, so this example "bug" cannot be fixed:
all users are advised to limit their usernames and passwords to strings that
do not require an IME.)

    As a hypothetical example, imagine an application that needs to process
credit cards (perhaps a library used as part of a framework for a web shop)
and which crashes on a server.  Imagine that the crash occurred because a new
allowable length for credit card numbers was introduced (suppose, for the
sake of the example, that the current rules allow 10, 11, or 12 digits, with
a few different grouping models).  In such a case, we may need a credit card
number that causes the crash in order to replicate the bug.  Obviously, it
would be irresponsible of the server administrators to submit one of their
customers' credit card numbers to our bug tracker.  Similarly, it would be
irresponsible of us to make such a number public if they did so.  I would
hope that such a hypothetical situation would be resolved by the
administrators replicating the bug on a test or development server using a
checksum-consistent credit card number not matching any known real number,
and submitting the bug with that false information.  With the current
system, they could note this procedure in a comment, so we would know it was
safe to disclose the invalid number, and the developers could extend the
software to support the new format.  With a system that automatically
obfuscates data, we would have no means to determine how to replicate the
bug, allowing it to spread unfixed to other services as they adopt Ubuntu
and as their customers receive cards with the new format, perhaps giving
Ubuntu a poor reputation for online commerce.
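    Assuming "checksum-consistent" refers to the Luhn (mod 10) check digit
used by payment card numbers, the following minimal Python sketch shows how
administrators might construct such a test number.  The function names are
illustrative, and the 12-digit body "999900000000" is an arbitrary made-up
value; confirming that the result matches no real issuer's range is outside
the scope of this sketch.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digit string passes the Luhn (mod 10) check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit (the check digit); double every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(body: str) -> str:
    """Compute the digit that makes body + digit pass the Luhn check."""
    if not (body.isascii() and body.isdigit()):
        raise ValueError("body must contain only ASCII digits")
    total = 0
    # Once the check digit is appended, the rightmost digit of body moves to
    # position 1 from the right, so it and every second digit leftward are
    # the ones that get doubled.
    for i, ch in enumerate(reversed(body)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def make_test_number(body: str) -> str:
    """Append a Luhn check digit to a made-up body of digits."""
    return body + luhn_check_digit(body)


# A 13-digit test number, i.e. longer than the hypothetical 10-12 digit rules.
test_number = make_test_number("999900000000")
print(test_number, luhn_valid(test_number))  # 9999000000004 True
```

A number built this way passes the same checksum validation as a real card
number, so it can exercise the new-length code path while being safe to
disclose in a public comment.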

    As this extreme example shows, there are potential cases where nearly any
data point is essential to the solution of the problem.  And in all cases, only
the reporter can know whether information that appears personally identifying
or personally sensitive is accurate (rather than deliberately constructed on
a test system to replicate a bug).  As such, we must rely on the reporter for
appropriate discretion on what is submitted, and maintain a reputation that
allows reporters to rely on us for appropriate discretion on what is made
public.

-- 
Emmet HIKORY


