[Blueprint servercloud-p-hadoop] Ubuntu Server + Hadoop and Bigdata

Mon Nov 7 10:06:24 UTC 2011

Blueprint changed by James Page:

Whiteboard changed:
- Validation of Assumptions from spec:
+ Summary of objectives for Precise:

- Ubuntu will package Apache Hadoop (rather than one of the various variants).
-     Cloudera - CDH
-     Hortonworks - Apache Hadoop - Employ 80% of upstream committers
- OpenJDK Support
-     Ubuntu should help drive support for OpenJDK from upstreams
- Packaging will align to Apache Bigtop (based on the most popular upstream packaging) - YES
- Packaging will focus on the most recent stable release of Hadoop - 0.20.203.0 - YES
- Configuration methods should take into account integration with configuration management tools such as Puppet and Chef - YES
- The majority of Java dependencies can be fulfilled through what is already in the archive (see hadoop-dependency-report.tar.gz)
-     kfs - this can be excluded to disable this feature but does not look like that much work to package.
-     Apache ftpserver would be required to enable smoke testing - again looks OK to package.
- Focus will be on a solid Hadoop core with contrib packages if time permits.
- Most dependencies are already in the archive apart from thrift (probably not an issue).
- Native integrations must be part of the packaging.
- Packages will target universe for this release.
-  
- We need to ensure upstream co-operation from Hortonworks/Cloudera to ensure ongoing collaboration going forward.
-  
- Good support for Hadoop on ARM should be an objective of this work.
-  
- Comments from blueprint:
-  
- I have to wonder if the demand for Hadoop really is large enough to justify the effort we'd be putting into providing it? Are we really at a point already where having terabytes of data you need to analyse is a common use case? - Soren
-     Sounds like there is demand in the distribution.
-     - important for Ubuntu Server, to maintain its position as 'best OS for the Cloud'
-     - the number of users needing to process TBs of data is only increasing; over the life of 12.04 LTS, more and more users will need a map-reduce cluster application; having it in the distro will ensure they pick Ubuntu for that application
-  
- Work Items:
- [m_3] hadoop community input (what about no thrift?): TODO
- [jamespage] Active backport packaging process post 12.04 release: TODO
- [m_3] Attend HadoopWorld. :): TODO
+ 1) Ubuntu will target packaging Apache Hadoop
+ - Help drive support for running under OpenJDK
+ - Packaging will retain flavour of most popular upstream packaging
+ - Focus will be on most recent stable (0.20.203.0) - validate with upstream release schedule
+ - Thrift support will not be included
+ - LZO and snappy compression options will be investigated
+ - Packages will target universe for this release
+ 
+ 2) Juju Charms will by default align to distro packaging for precise
+ 
+ Full session notes from UDS-P:
+ http://pad.ubuntu.com/uds-p-servercloud-p-hadoop
+ 
+ Work items precise-alpha-1:
+ [m_3] Hadoop community input (what about no thrift? etc): TODO
+ [m_3] Attend HadoopWorld: TODO
  [james-page] Check on release schedule for Apache Hadoop between now and Feature Freeze: TODO
  [kirkland] Investigate upstream co-operation from Hortonworks/Cloudera to ensure ongoing collaboration going forward: TODO
+ 
+ Work items precise-beta-1:
  [negronjl] adjust hadoop charms to have a configurable backend hadoop, get one into the charm repository: TODO
  [james-page] Package KFS for Ubuntu: TODO
  [james-page] Package Apache ftp-server for Ubuntu: TODO
  [james-page] Package Hadoop for Ubuntu: TODO
+ 
+ Work Items:
+ [james-page] Active backport packaging post 12.04 release: TODO
+ [james-page] Feed back all work to Debian: TODO

-- 
Ubuntu Server + Hadoop and Bigdata
https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hadoop