[Flight Delay Bundle] Fixing the namenode/compute-nodes relation error.

Samuel Cozannet samuel.cozannet at canonical.com
Wed Feb 25 09:31:00 UTC 2015


Actually, the bug had already been filed:
https://bugs.launchpad.net/charms/+source/hdp-hadoop/+bug/1414080

but now that this email describes the solution, it should get resolved quickly.

Thanks,
Sam

Best,
Samuel

--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com>  / Canonical UK LTD <http://canonical.com> / Juju
<https://jujucharms.com>
samuel.cozannet at canonical.com
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23

On Wed, Feb 25, 2015 at 10:24 AM, Samuel Cozannet <
samuel.cozannet at canonical.com> wrote:

> Hi,
>
> @juju mailing list members: sorry for adding you to the thread only now.
> This is a discussion about getting the Flight Delay demo to work. The demo
> is a bundle comprising a few HDP nodes (compute, YARN) and an IPython
> notebook customized to run Scala code.
>
> @Andrew: I figured out what is going wrong. See below.
>
> @All:
> So this is the story of a YARN node (based on charm hdp-hadoop-7) and 4
> compute nodes (same charm).
> If you deploy it with multiple compute nodes at once, the namenode
> relation fails on the yarn-master side:
>
>     unit-yarn-master-0[28041]: 2015-02-25 09:09:40 INFO unit.yarn-master/0.namenode-relation-joined logger.go:40 subprocess.CalledProcessError: Command '['su', 'hdfs', '-c', '/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode']' returned non-zero exit status 1
>     unit-yarn-master-0[28041]: 2015-02-25 09:09:40 ERROR juju.worker.uniter uniter.go:608 hook "namenode-relation-joined" failed: exit status 1
>
> So I connected to yarn-master/0 and tried:
>
>     ubuntu@ip-172-31-42-86:~$ sudo su hdfs
>     hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode
>     namenode running as process 9270. Stop it first.
>
> So I stopped it:
>     hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode
>     stopping namenode
>
> But then when running:
>
>     juju resolved -r yarn-master/0
>
> I still ran into the same issue. The trick is to drop -r (which re-runs
> the failed hook). What happens is that:
> * the hook runs once per compute node, and
> * the hook does not test whether the namenode service is already running;
> it tries to start it anyway instead of restarting it, so any run after the
> namenode is up fails.
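A minimal sketch of the missing check, in Python (the hook appears to be Python, judging by the subprocess.CalledProcessError in the log). It probes the namenode the way hadoop-daemon.sh itself reports "namenode running as process 9270": read the PID file, then send signal 0 to that PID. The PID file location is an assumption here: hadoop-daemon.sh derives it from HADOOP_PID_DIR and the daemon user, so check where your install actually writes it.

```python
import os


def namenode_is_running(pid_file: str) -> bool:
    """Return True if the PID stored in pid_file belongs to a live process.

    pid_file is deployment-specific (derived from HADOOP_PID_DIR and the
    daemon user); any concrete path you pass in is an assumption to verify.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No PID file or unparsable contents: treat the namenode as stopped.
        return False
    if pid <= 0:
        return False
    try:
        # Signal 0 only checks that the process exists; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. hdfs).
        return True
    return True
```

A hook could then skip the start, or stop first, whenever this returns True.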
>
> So the workaround is to alternate between stopping the namenode service and
> resolving the hook error from the Juju client:
>
> On the YARN side:
>     hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode
>     stopping namenode
>
> Then (on the client side):
>     juju resolved yarn-master/0
>
> Then on the YARN side:
>     hdfs@ip-172-31-42-86:/home/ubuntu$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode
>     stopping namenode
>
> Then (on the client side; note: no -r, so no retry!):
>     juju resolved yarn-master/0
>
> Repeat this as many times as you have compute nodes, minus one (on the last
> pass the namenode actually starts), and you'll be OK.
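The stop/resolve cycle above, written down as a dry run in Python. This is only a sketch: it prints the commands instead of running them, because each `juju resolved` should only be issued once the unit is back in an error state (check `juju status` between cycles). Running the stop step through `juju ssh` is an assumption; in the thread it was run from a shell on the unit.

```python
import shlex

# Command from the thread; it must run as the hdfs user on the YARN master.
STOP_NAMENODE = ("/usr/lib/hadoop/sbin/hadoop-daemon.sh "
                 "--config /etc/hadoop/conf stop namenode")


def workaround_commands(unit: str, compute_nodes: int) -> list[list[str]]:
    """Return the stop/resolve cycles described in the thread, as argv lists.

    One cycle per compute node minus one: on the last pass the namenode
    actually starts. Using `juju ssh` for the stop step is an assumption.
    """
    cycles = max(compute_nodes - 1, 0)
    commands = []
    for _ in range(cycles):
        # 1. Stop the namenode that an earlier hook run already started.
        commands.append(["juju", "ssh", unit,
                         "sudo su hdfs -c " + shlex.quote(STOP_NAMENODE)])
        # 2. Mark the failed hook resolved WITHOUT -r, so it is not re-run.
        commands.append(["juju", "resolved", unit])
    return commands


if __name__ == "__main__":
    # Dry run: print the steps; wait for the next hook error between cycles.
    for argv in workaround_commands("yarn-master/0", 4):
        print(shlex.join(argv))
```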
>
> @Andrew: I'll file a bug for that issue on Launchpad. There is no
> "restart" command for the namenode, so the hook needs to do a stop followed
> by a start. Thanks for spotting it.
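A minimal sketch of what that stop-then-start could look like inside the hook, in Python. It reuses the exact `su hdfs -c ...` command from the failing hook; `is_running` is a hypothetical probe (for example a PID-file check) and `run` is injectable so the logic can be exercised without Hadoop. This is an illustration, not the charm's actual code.

```python
import subprocess
from typing import Callable, List

# Same daemon script and config dir as the failing hook in the log.
DAEMON = "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf"


def namenode_cmd(action: str) -> List[str]:
    """argv for 'hadoop-daemon.sh ... <action> namenode', run as hdfs."""
    return ["su", "hdfs", "-c", f"{DAEMON} {action} namenode"]


def restart_namenode(is_running: Callable[[], bool],
                     run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Emulate the missing 'restart': stop first if running, then start.

    is_running is a hypothetical probe (e.g. a PID-file check); run defaults
    to subprocess.check_call, so a failing stop or start still raises.
    """
    if is_running():
        run(namenode_cmd("stop"))
    run(namenode_cmd("start"))
```

Calling this from every namenode-relation-joined run should, in principle, keep the hook idempotent: later runs stop the daemon first instead of hitting "Stop it first."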
>
> Best,
> Samuel
>
> On Mon, Feb 23, 2015 at 5:43 PM, Samuel Cozannet <
> samuel.cozannet at canonical.com> wrote:
>
>> I know we recently had an issue caused by an upstream change in Java;
>> that one we can't fix until the charmers team does.
>>
>> I actually need to deploy it for demos at MWC, so I'll have a look, but
>> only later this week. I'll keep you posted when I do. In the meantime, you
>> could also try the Juju mailing list or the IRC channel on Freenode, as
>> others can answer too. Some of our solution architects have played with
>> it, so they may be able to help.
>>
>> Best,
>> Sam
>>
>> On Mon, Feb 23, 2015 at 5:35 PM, Andrew Brookes <andrew at theasi.co> wrote:
>>
>>> Hi Sam,
>>>
>>> No luck I'm afraid. I wasn't able to get past the race condition. I
>>> waited about an hour before adding the relations.
>>>
>>> Any ideas?
>>>
>>> Thanks,
>>>
>>> Andy.
>>>
>>>
>>> On 20 February 2015 at 14:36, Andrew Brookes <andrew at theasi.co> wrote:
>>>
>>>> Thanks. I'll give it a go. Have a good flight.
>>>>
>>>> On 20 February 2015 at 14:35, Samuel Cozannet <
>>>> samuel.cozannet at canonical.com> wrote:
>>>>
>>>>> You should see them in the notebook GUI if everything went well.
>>>>>
>>>>> However, red relations are bad news: that's the race condition error,
>>>>> and I never found a way to resolve it.
>>>>> As a consequence, YARN doesn't talk to the compute nodes or to Pig, which
>>>>> means the cluster is useless.
>>>>> The only way to recover that I have found so far is to kill all
>>>>> services (pig, yarn, compute) and restart from scratch. First connect
>>>>> YARN and compute. Tail the logs until they stop moving (a green relation
>>>>> doesn't mean that everything has finished). Then connect Pig.
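For the "tail the logs until they stop moving" step, a small Python sketch that waits until a log file has stopped growing for a while. The log path (for example /var/log/juju/unit-yarn-master-0.log on the unit) and the default timings are assumptions; pick values that fit your deployment.

```python
import os
import time


def wait_for_quiet_log(path: str, quiet_seconds: float = 300.0,
                       poll_seconds: float = 10.0,
                       timeout_seconds: float = 3600.0) -> bool:
    """Wait until the file at path has not changed size for quiet_seconds.

    Returns True once the log has gone quiet, False if timeout_seconds
    elapses first. The default timings are arbitrary assumptions.
    """
    deadline = time.monotonic() + timeout_seconds
    last_size = -1
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        now = time.monotonic()
        try:
            size = os.path.getsize(path)
        except FileNotFoundError:
            # Log not created yet: keep waiting and restart the quiet timer.
            last_change = now
            time.sleep(poll_seconds)
            continue
        if size != last_size:
            # The log moved: remember the new size and reset the timer.
            last_size, last_change = size, now
        elif now - last_change >= quiet_seconds:
            return True
        time.sleep(poll_seconds)
    return False
```

It only compares file sizes, so a rotated log that happens to have the same size would fool it; for a demo setup that trade-off is probably acceptable.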
>>>>>
>>>>> I have to run and catch a plane. If you look at the GitHub repo for that
>>>>> bundle, the deploy script I wrote is mostly manual and prevents errors
>>>>> such as this one, but you need to answer its questions one after the
>>>>> other (see
>>>>> https://github.com/SaMnCo/bundle-flight-delay-demo/blob/master/00-deploy).
>>>>>
>>>>> Best,
>>>>> Sam
>>>>>
>>>>> On Fri, Feb 20, 2015 at 3:19 PM, Andrew Brookes <andrew at theasi.co>
>>>>> wrote:
>>>>>
>>>>>> It's deployed. I can connect to Juju and the IPython notebook.
>>>>>>
>>>>>> I see this error though:
>>>>>> [image: Inline images 1]
>>>>>>
>>>>>> Also, I'm not sure where the notebooks are stored.
>>>>>>
>>>>>> On 20 February 2015 at 13:18, Andrew Brookes <andrew at theasi.co>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks. I'll try it now.
>>>>>>>
>>>>>>> On 20 February 2015 at 12:15, Samuel Cozannet <
>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>
>>>>>>>> Hey!
>>>>>>>>
>>>>>>>> The airline demo is a nasty beast :/ because of a race condition
>>>>>>>> between the datanodes and the notebook, which also runs some Hadoop
>>>>>>>> components.
>>>>>>>>
>>>>>>>> Can you try this deployment script:
>>>>>>>> http://bazaar.launchpad.net/~mmenkhof/orange-box-examples/orange-box-examples-new-demo-flight-delay/view/head:/hadoop/flight-delay-demo/01-deploy.sh
>>>>>>>>
>>>>>>>> and let me know if it works?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Samuel
>>>>>>>>
>>>>>>>> On Fri, Feb 20, 2015 at 12:41 PM, Andrew Brookes <andrew at theasi.co>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Sam,
>>>>>>>>>
>>>>>>>>> We're having some real trouble configuring a Hadoop instance for
>>>>>>>>> the airline delay prediction. We've tried the Juju charm and also
>>>>>>>>> deploying manually.
>>>>>>>>>
>>>>>>>>> Apparently, when deploying with the Juju charm, the data nodes did not
>>>>>>>>> seem to be communicating with the NameNode.
>>>>>>>>>
>>>>>>>>> Any help on this would be appreciated.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Andy.
>>>>>>>>>
>>>>>>>>> On 26 January 2015 at 13:11, Samuel Cozannet <
>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey
>>>>>>>>>>
>>>>>>>>>> As I am doing all the testing, I actually have one up & running:
>>>>>>>>>> https://ec2-54-149-158-178.us-west-2.compute.amazonaws.com
>>>>>>>>>>
>>>>>>>>>> password: secret
>>>>>>>>>>
>>>>>>>>>> Open airline/demo/notebook python
>>>>>>>>>>
>>>>>>>>>> I can also activate the spark one if you need...
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Samuel
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 25, 2015 at 8:39 PM, Angie Ma <angie at theasi.co>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for this! That's awesome and very interesting! Will have
>>>>>>>>>>> a look through the datasets. I think we'll design a mini project for
>>>>>>>>>>> the fellowship and maybe combine it with some flight crash data we've
>>>>>>>>>>> got, and extend it into a hackathon as well. Will keep you posted.
>>>>>>>>>>>
>>>>>>>>>>> On 23 January 2015 at 09:38, Samuel Cozannet <
>>>>>>>>>>> samuel.cozannet at canonical.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey!!
>>>>>>>>>>>>
>>>>>>>>>>>> Just to let you know, I have been working on Airline Delay
>>>>>>>>>>>> Prediction
>>>>>>>>>>>> <http://hortonworks.com/blog/data-science-apacheh-hadoop-predicting-airline-delays/> in
>>>>>>>>>>>> Python, and also on the Scala version
>>>>>>>>>>>> <http://hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/>.
>>>>>>>>>>>>
>>>>>>>>>>>> It's now possible to deploy the architecture and notebooks
>>>>>>>>>>>> involved directly as a bundle in Juju:
>>>>>>>>>>>> https://demo.jujucharms.com/~samuel-cozannet/trusty/flight-delay-demo-2/?text=flight#readme
>>>>>>>>>>>>
>>>>>>>>>>>> You can browse the code on:
>>>>>>>>>>>> * https://github.com/SaMnCo/bundle-flight-delay-demo
>>>>>>>>>>>> * https://github.com/SaMnCo/charm-flight-delay-demo
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know if that is a use case you'd like to use for
>>>>>>>>>>>> hackathons; I can see if it's possible to build a smaller, less
>>>>>>>>>>>> expensive version (this one has 5 quad-core/16GB RAM units)...
>>>>>>>>>>>>
>>>>>>>>>>>> Enjoy :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Samuel
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> <http://www.theasi.co>
>>>>>>>>>
>>>>>>>>> Andrew Brookes | CTO, ASI ・ e: andrew at theasi.co
>>>>>>>>> m: +44 (0) 7888 675 230 ・ skype: brookesey
>>>>>>>>> <https://twitter.com/advskills> <https://www.facebook.com/theasi.co>
>>>>>>>>> <http://www.pinterest.com/advskills/>
>>>>>>>>> <https://www.linkedin.com/company/advanced-skills-initiative>
>>>>>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 177871 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/juju/attachments/20150225/5a0e3240/attachment-0001.png>

