[Blueprint servercloud-p-hadoop] Ubuntu Server + Hadoop and Bigdata

Thu Nov 3 15:23:13 UTC 2011

Blueprint changed by James Page:

Whiteboard changed:
- I have to wonder if the demand for Hadoop really is large enough to
- justify the effort we'd be putting into providing it? Are we really at a
- point already where having terabytes of data you need to analyse is a
- common use case? - Soren
+ Validation of Assumptions from spec:
+ 
+ Ubuntu will package Apache Hadoop (rather than one of the various variants).
+     Cloudera - CDH
+     Hortonworks - Apache Hadoop - employs 80% of upstream committers
+ OpenJDK Support
+     Ubuntu should help drive support for OpenJDK from upstreams
+ Packaging will align to Apache Bigtop (based on the most popular upstream packaging) - YES
+ Packaging will focus on the most recent stable release of Hadoop - 0.203.0 - YES
+ Configuration methods should take into account integration with configuration management tools such as Puppet and Chef - YES
+ The majority of Java dependencies can be fulfilled through what is already in the archive (see hadoop-dependency-report.tar.gz)
+     kfs - this can be excluded to disable this feature but does not look like that much work to package.
+     Apache ftpserver would be required to enable smoke testing - again looks OK to package.
+ Focus will be on a solid Hadoop core with contrib packages if time permits.
+ Most dependencies are already in the archive apart from thrift (probably not an issue).
+ Native integrations must be part of the packaging.
+ Packages will target universe for this release.
+  
+ We need upstream co-operation from Hortonworks/Cloudera to ensure ongoing collaboration.
+  
+ Good support for Hadoop on ARM should be an objective of this work.
+  
+ Comments from blueprint:
+  
+ I have to wonder if the demand for Hadoop really is large enough to justify the effort we'd be putting into providing it? Are we really at a point already where having terabytes of data you need to analyse is a common use case? - Soren
+     Sounds like there is demand in the distribution.
+     - important for Ubuntu Server, to maintain its position as 'best OS for the Cloud'
+     - the number of users needing to process TBs of data keeps increasing; over the life of 12.04 LTS, more and more users will need a map-reduce cluster application; having it in the distro will ensure they pick Ubuntu for that application
+  
+ Work Items:
+ [m_3] hadoop community input (what about no thrift?): TODO
+ [james-page] Active backport packaging process post 12.04 release: TODO
+ [m_3] Attend HadoopWorld. :): TODO
+ [james-page] Check on release schedule for Apache Hadoop between now and Feature Freeze: TODO
+ [kirkland] Investigate upstream co-operation from Hortonworks/Cloudera to ensure ongoing collaboration: TODO
+ [negronjl] Adjust hadoop charms to have a configurable backend hadoop, get one into the charm repository: TODO
+ [james-page] Package KFS for Ubuntu: TODO
+ [james-page] Package Apache ftp-server for Ubuntu: TODO
+ [james-page] Package Hadoop for Ubuntu: TODO

-- 
Ubuntu Server + Hadoop and Bigdata
https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hadoop