[Merge] lp:~vorlon/ubuntu/utopic/apport/lp.1354318 into lp:~ubuntu-core-dev/ubuntu/utopic/apport/ubuntu

Fri Aug 8 11:47:32 UTC 2014

Review: Needs Fixing

Thanks Steve for working on this! I have some inline comments.

whoopsie-upload-all is being maintained in trunk, but please don't worry about that too much. While I'm on holiday next week it's fine to get that fixed in the utopic branch, and I'll sync trunk back up when I'm back.

Diff comments:

> === modified file 'data/whoopsie-upload-all'
> --- data/whoopsie-upload-all	2014-07-29 10:24:37 +0000
> +++ data/whoopsie-upload-all	2014-08-08 10:43:52 +0000
> @@ -1,4 +1,4 @@
> -#!/usr/bin/python
> +#!/usr/bin/python3

Please revert. Apport works with both Pythons, and setup.py is designed to update the hashbangs when called with Python 3.

>  
>  # Process all pending crashes and mark them for whoopsie upload, but do not
>  # upload them to any other crash database. Wait until whoopsie is done
> @@ -18,6 +18,7 @@
>  import time
>  import subprocess
>  import argparse
> +import fcntl
>  
>  import apport.fileutils
>  import apport
> @@ -37,50 +38,69 @@
>          print('%s already marked for upload, skipping' % report)
>          return None
>  
> -    r = apport.Report()
> -    try:
> -        with open(report, 'rb') as f:
> -            r.load(f, binary='compressed')
> -    except Exception as e:
> -        sys.stderr.write('ERROR: cannot load %s: %s\n' % (report, str(e)))
> -        return None
> -    if r.get('ProblemType', '') != 'Crash' and 'ExecutablePath' not in r:
> -        print('  skipping, not a crash')
> -        return None
> -    if 'Dependencies' in r:
> -        print('%s already has info collected' % report)
> -    else:
> -        print('Collecting info for %s...' % report)
> -        r.add_os_info()
> -        # add minimal hook information here; whoopsie only considers
> -        # ApportVersion, NonfreeKernelModules, SystemImageInfo
> -        try:
> -            r.add_hooks_info(None)
> +    user_details = None
> +
> +    # make sure we're actually on the hook to write this updated report
> +    # before we start doing expensive collection operations
> +    new_report = report + '.new'
> +    with open(new_report, 'wb') as f:
> +        try:

I don't like writing an entirely new report file here. This will have to compress/write the whole initial report again (mostly the core dump). With the current code the existing file is opened in append mode, and only new keys are being written (r.write(f, only_new=True)).

If there is a reason why this doesn't work somehow: Then this needs to be fixed to set the new file to permissions 000 so that nothing else which inotifies the directory is trying to read the .new file while it's still being written (like the old code).

flocking could just as easily use the existing file, which is a lot easier, too?

> +            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
> +        except IOError as e:

unused variable "e", I suppose the pyflakes test (test/run) should catch this?

> +            print('%s already being processed, skipping' % report)
> +            return None
> +
> +        r = apport.Report()
> +
> +        try:
> +            with open(report, 'rb') as f2:
> +                r.load(f2, binary='compressed')
> +                user_details = os.fstat(f2)

Nitpick: No need to use "f2" here; in these constructions "f" always has a tiny scope (only within the f).

Nitpick 2: could you please s/user_details/report_stat/, so that it's clearer what it is? (It has quite a wide scope)

>          except Exception as e:
> -            sys.stderr.write('WARNING: hook failed for processing %s: %s\n' % (report, str(e)))
> -
> -        try:
> -            r.add_gdb_info()
> -        except EOFError as e:
> -            sys.stderr.write('ERROR: processing %s: %s\n' % (report, str(e)))
> -            return None
> -        try:
> -            r.add_package_info()
> -        except (SystemError, ValueError) as e:
> -            sys.stderr.write('ERROR: cannot add package info on %s: %s\n' %
> -                             (report, str(e)))
> -            return None
> -
> -        # write updated report
> -        with open(report, 'ab') as f:
> -            os.chmod(report, 0)
> -            r.write(f, only_new=True)
> -            os.chmod(report, 0o640)
> +            sys.stderr.write('ERROR: cannot load %s: %s\n' % (report, str(e)))
> +            return None
> +
> +        if r.get('ProblemType', '') != 'Crash' and 'ExecutablePath' not in r:
> +            print('  skipping, not a crash')
> +            os.unlink(new_report)

There's a ton of these unlink calls in every exit path. It's more robust to do that at the very end in finally: (and ignore errors in case it's already gone, because it succeeded and was renamed). But this is probably moot anyway, I'd really like to avoid having to rewrite the entire file (see above).

> +            return None
> +        if 'Dependencies' in r:
> +            print('%s already has info collected' % report)
> +            os.unlink(new_report)
> +        else:
> +            print('Collecting info for %s...' % report)
> +            r.add_os_info()
> +            # add minimal hook information here; whoopsie only considers
> +            # ApportVersion, NonfreeKernelModules, SystemImageInfo
> +            try:
> +                r.add_hooks_info(None)
> +            except Exception as e:
> +                sys.stderr.write('WARNING: hook failed for processing %s: %s\n' % (report, str(e)))
> +
> +            try:
> +                r.add_gdb_info()
> +            except EOFError as e:
> +                sys.stderr.write('ERROR: processing %s: %s\n' % (report, str(e)))
> +                os.unlink(new_report)
> +                return None
> +            try:
> +                r.add_package_info()
> +            except (SystemError, ValueError) as e:
> +                sys.stderr.write('ERROR: cannot add package info on %s: %s\n' %
> +                                 (report, str(e)))
> +                os.unlink(new_report)
> +                return None
> +
> +            r.write(f)
> +            os.fchmod(f, 0o640)
> +            os.fchown(f, user_details.st_uid, user_details.st_gid)
> +            os.rename(new_report, report)
>  
>      # now tell whoopsie to upload the report
>      print('Marking %s for whoopsie upload' % report)
>      apport.fileutils.mark_report_upload(report)
>      assert os.path.exists(upload_stamp)
> +    os.chown(upload_stamp, user_details.st_uid, user_details.st_gid)
>      return upload_stamp
>  
>  
> @@ -117,7 +137,7 @@
>          missing = ''
>          for stamp in stamps:
>              uploaded = stamp + 'ed'
> -            if not os.path.exists(uploaded):
> +            if os.path.exists(stamp) and not os.path.exists(uploaded):
>                  missing += uploaded + ' '
>          if not missing:
>              return True
> @@ -133,8 +153,8 @@
>  #
>  parser = argparse.ArgumentParser(description='Noninteractively upload all '
>                                   'Apport crash reports to errors.ubuntu.com')
> -parser.add_argument('-t', '--timeout', default=1800, type=int,
> -                    help='seconds to wait for whoopsie to upload the reports (default: 1800s)')
> +parser.add_argument('-t', '--timeout', default=0, type=int,
> +                    help='seconds to wait for whoopsie to upload the reports (default: do not wait)')
>  opts = parser.parse_args()

Has this been cleared with CI (Paul Larson/Francis Ginther) that this default behaviour change doesn't break CI? I wrote this script on their request to run it at the end of MPs and image testing, where it's crucial that it waits until everything has been uploaded. So this might need a change in CI first to set an explicit timeout there?

If this turns out to be too unwieldy, it's equivalently possible to modify the upstart job to use --timeout=0.

>  
>  # parse args
> @@ -148,6 +168,9 @@
>  stamps = collect_info()
>  # print('stamps:', stamps)
>  if stamps:
> -    if not wait_uploaded(stamps, opts.timeout):
> -        sys.exit(2)
> -    print('All reports uploaded successfully')
> +    if opts.timeout > 0:
> +        if not wait_uploaded(stamps, opts.timeout):
> +            sys.exit(2)
> +        print('All reports uploaded successfully')
> +    else:
> +        print('All reports processed')
> 
> === modified file 'debian/apport-noui.upstart'
> --- debian/apport-noui.upstart	2014-05-14 19:03:57 +0000
> +++ debian/apport-noui.upstart	2014-08-08 10:43:52 +0000
> @@ -5,15 +5,9 @@
>      file FILE=/var/crash/*.crash EVENT=create
>  )
>  
> -env MATCH=NULL
> -
> +instance $MATCH
>  script
>      [ -e /var/lib/apport/autoreport ] || exit 0
>  
> -    UID="$(stat -c '%u' "$MATCH")"
> -    if [ -n "$UID" ] && [ $UID -ge 500 ]; then
> -        exec sudo -u \#$UID /usr/share/apport/whoopsie-upload-all
> -        exit 0
> -    fi

I see the point of doing this; I think we can get away with this as the subset of keys which we send to errors isn't specific to the user (right now). In general, a lot of the report information depends on the user (changed configuration, locale, $PATH etc.).

>      exec /usr/share/apport/whoopsie-upload-all
>  end script
> 
> === modified file 'debian/changelog'
> --- debian/changelog	2014-08-01 22:06:52 +0000
> +++ debian/changelog	2014-08-08 10:43:52 +0000
> @@ -1,3 +1,27 @@
> +apport (2.14.5-0ubuntu4) UNRELEASED; urgency=medium
> +
> +  * Set the correct python3 shebang for data/whoopsie-upload-all in the
> +    source.  This is already correct in the package, but we should ensure
> +    it's correct for any local runs.
> +  * Refactor apport-noui/whoopsie-upload-all to behave more reliably in the
> +    case of overlapping crash processing (LP: #1354318):
> +    - debian/apport-noui.upstart: refactor to make this an 'instance' job for
> +      each incoming .crash file, and drop the racy handling of non-root .crash
> +      files (as well as the unnecessary 'env MATCH' line).
> +    - data/whoopsie-upload-all: refactor report processing to ensure that
> +      whoopsie-upload-all can be called multiple times in parallel without
> +      causing any .crash file to be processed more than once.
> +    - data/whoopsie-upload-all: handle setting ownership of files in
> +      process_report() instead of relying on this script being called by a
> +      particular user.
> +    - data/whoopsie-upload-all: don't spin in wait_uploaded() watching for
> +      .uploaded files if the corresponding .upload file has been removed out
> +      from under us.
> +    - data/whoopsie-upload-all: by default, return immediately instead of
> +      waiting to see if whoopsie processes all of the crashes.
> +
> + -- Steve Langasek <steve.langasek at ubuntu.com>  Fri, 08 Aug 2014 10:20:52 +0200
> +
>  apport (2.14.5-0ubuntu3) utopic; urgency=medium
>  
>    * apport-noui: make the package installation automatically enable
> 

-- 
https://code.launchpad.net/~vorlon/ubuntu/utopic/apport/lp.1354318/+merge/230083
Your team Ubuntu Core Development Team is subscribed to branch lp:~ubuntu-core-dev/ubuntu/utopic/apport/ubuntu.