Packaging large Java software stacks ?
Emmet Hikory
persia at ubuntu.com
Tue Jan 27 03:36:07 GMT 2009
Thierry Carrez wrote:
> It is difficult to integrate the large recent Java software stacks
> (Glassfish, Geronimo...) in a Linux distribution in general. The key
> reason is that most of those stacks require very precise versions of
> libraries (JARs) to run and to build. They won't work with the latest
> version of libraries as those might change APIs and/or key behavior.
> Java developers are used to picking specific JAR versions and assembling
> the exact stack they need; they don't want to care about sharing their
> dependencies with other packages, or about dependencies being upgraded.
> Tools like Maven help them in this endeavor, and they rely very heavily
> on this external code: dozens of those JARs are usually needed at
> runtime, hundreds of those at build-time.
>
> That makes most attempts to properly package large Java software in
> Ubuntu fail for two reasons.
>
> (1) Need precisely-versioned artifacts
> Those stacks need, as build dependencies and as runtime dependencies,
> very precise versions of JARs. They won't build or run with a different
> one. Using a more recent one might break functionality in a creative
> way. A maven-based build will sometimes require 6 different versions of
> the same JAR. In our packages, we usually offer only one version,
> in corner cases one minor version for each major version. In some cases
> the Java software will run/build with ours, in most cases it won't.
In the case of libraries in other languages (most commonly C), we
regularly port applications to work with the preferred version of the
libraries we ship. When preparing new versions of C libraries, the API
and ABI are checked, with the binary package name changed where they
differ, to better ensure compatibility. Perhaps we could use Java's
introspection (reflection) methods to generate an API report, and version
Java libraries based on changing APIs? In the case where an incompatibility
is not an API change, what sort of differences are encountered? Might
these be considered bugs? Is there a case where two applications depend
on the same API, but would break if used with different versions
providing that API?
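
To make the API report idea above a little more concrete, here is a
minimal sketch using java.lang.reflect. It is only an illustration under
stated assumptions: the classes WidgetV1 and WidgetV2 are hypothetical
stand-ins for two upstream versions of one library class, and the report
covers only public methods declared directly on a class (no fields,
constructors, inherited members, or behavioral changes).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class ApiReport {

    // Hypothetical stand-in for an older upstream version of a library class.
    public static class WidgetV1 {
        public int size() { return 0; }
        public String name() { return "widget"; }
    }

    // Hypothetical newer version: name() gained a parameter, so old callers break.
    public static class WidgetV2 {
        public int size() { return 0; }
        public String name(boolean qualified) { return "widget"; }
    }

    /** Collects a sorted set of the public method signatures declared by a class. */
    public static Set<String> signatures(Class<?> type) {
        Set<String> result = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            // Skip non-public and compiler-generated methods.
            if (!Modifier.isPublic(m.getModifiers()) || m.isSynthetic()) {
                continue;
            }
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getName)
                    .collect(Collectors.joining(","));
            result.add(m.getReturnType().getName() + " " + m.getName() + "(" + params + ")");
        }
        return result;
    }

    /** Signatures present in oldApi but absent from newApi. */
    public static Set<String> removed(Class<?> oldApi, Class<?> newApi) {
        Set<String> missing = new TreeSet<>(signatures(oldApi));
        missing.removeAll(signatures(newApi));
        return missing;
    }

    public static void main(String[] args) {
        Set<String> broken = removed(WidgetV1.class, WidgetV2.class);
        System.out.println(broken.isEmpty() ? "compatible" : "incompatible: " + broken);
    }
}
```

A non-empty result from removed() would be the signal to change the binary
package name, much as is done for C libraries when the ABI changes. It
says nothing about incompatibilities that are not API changes.
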
> (2) Building entirely from source
> Java "compilation" (in fact bytecoding) requires lots of JARs to be
> present, because they are used to check all external method signatures.
> Geronimo and Glassfish builds will require hundreds of them. Packaging
> all those build dependencies from source is a huge amount of work, and
> each of those build dependencies, in turn, will require more. Combined
> with the first problem, this makes packaging those stacks too much work
> for too little gain (they are usually easily installable by
> downloading/unpacking the upstream tarball).
Well, the same could be said for any interpreted language (Python,
Ruby, Perl, etc.), and to a large degree for compiled languages (which
are often sufficiently portable that `./configure && make && make
install` works). One of the points of building from source is to be
able to verify that the provided source matches the provided binary, and
to allow our users to make further modifications to the software in
accordance with the license. In a purely self-serving manner, shipping
the source from which the package was built significantly reduces the
effort required for distribution-local patches, most notably for
security patches.
> Potential solutions to solve (1) include using a parallel delivery
> mechanism for JARs, separated from the usual Ubuntu packages, something
> that could hook into Maven repositories... or get ready to create a
> separate package for each version of each JAR.
The last option is what is chosen for libraries in other languages,
where we ship two or three different upstream versions while waiting for
ports to occur. Where possible, we try to restrict this to fewer than
three, and wherever possible to only one.
> Potential solutions to solve (2) include evolving our build-from-source
> policy to accept that JARs used only as method-checking media during
> Java compilation be considered part of the source...
>
> A solution to work around both problems would be to avoid targeting our
> built-from-source repositories for such Java software and pack them with
> their binary dependencies (or as binary directly)...
This has historically been acceptable for some packages in
multiverse, but without using the built-from-source method, it is hard
for us to comply with licenses that insist that source code be provided
on request as we cannot know that the source provided generates the
binary provided. This is further complicated in terms of defect
management: without being able to assemble for ourselves a source tree
we know to be capable of generating a given binary object, it becomes
very difficult for us to address any user-reported problems. This is
compounded if there are embedded libraries, as a patch to fix a bug in a
library may need to be repeated for several packages embedding that
library. Even if such patches are generated, unless building the
package with our build tools has been tested, we cannot be confident of
our ability to regenerate a suitable package to distribute to users.
Note also that including binary objects in a source package limits our
ability to patch them using our existing toolset. Variances from
upstream distributed software are included in the diff.gz, which is
restricted to Unicode (text) artifacts. The typical model to handle
binary patches is to ship the replacement blob uuencoded, and uudecode
at build-time for installation. When changing objects where the
preferred form of modification is indeed binary (e.g. PNG files), this
is acceptable. When changing objects where the preferred form of
modification is separate source, we would need to find a way to
indicate from which source the replacement blob was generated (which is
more than just patching the provided source, especially in the case of
binary-only embedded libraries).
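
The encode-as-text, decode-at-build-time model described above can be
sketched as follows. Note the assumption: this uses Base64 from the Java
standard library in place of uuencode/uudecode, purely to illustrate the
round trip. The class and method names are hypothetical and not part of
any packaging tool.

```java
import java.util.Arrays;
import java.util.Base64;

public class BlobTextRoundTrip {

    /** Encodes a binary blob as line-wrapped ASCII text that a text-only diff can carry. */
    public static String toText(byte[] blob) {
        return Base64.getMimeEncoder().encodeToString(blob);
    }

    /** Recovers the original bytes, as would happen at build time. */
    public static byte[] fromText(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        // The eight-byte PNG file signature, which contains bytes that are not valid text.
        byte[] blob = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        String text = toText(blob);
        byte[] restored = fromText(text);
        System.out.println(Arrays.equals(blob, restored) ? "round trip ok" : "round trip failed");
    }
}
```

The round trip preserves the bytes, but the text form carries no record
of which source the blob was built from, which is the gap noted above.
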
In summary, I'd recommend shipping such stacks that do not require
redistribution of source as binary objects in multiverse until such time
as they can be built reliably from source within the distribution. For
those stacks licensed with source-distribution licenses (e.g. GPL), if
the entire application cannot be built without binary objects within the
distribution, I'd recommend discussing with upstream the difficulties of
extending the offered license to others (I'm not convinced these can
be shipped, although I'm not an archive-admin, so it's not my decision,
and I'm certainly not providing any legal advice).
My personal experience in working with upstream to get Java software
packaged is that upstream is quite willing to help identify what pieces
are missing in Ubuntu, help identify which patches may have been applied
to any embedded libraries, and test their software against provided
system libraries once available (even where versions differ). While
this still leaves significant work in getting the necessary components
packaged, I believe that these discussions lead to better sharing of
patches between upstreams, and overall a better quality of the Java
stack within Ubuntu.
--
Emmet HIKORY