%pyspark in Zeppelin: No module named pyspark error

Merlijn Sebrechts merlijn.sebrechts at gmail.com
Sat Jul 16 15:50:19 UTC 2016


The issue with the Bigtop Charms is tracked here:
https://github.com/juju-solutions/layer-apache-bigtop-base/issues/14

@Cory was looking into a possible fix. Any update on this?

2016-07-14 17:35 GMT+02:00 Konstantinos Tsakalozos <kos.tsakalozos at canonical.com>:

> Hi Gregory,
>
> Thank you for your feedback. We look forward to any more information
> you can give us.
>
> In the meantime, you can use revision 80 of the apache-spark charm on
> bigdata-dev to verify that the patch for pyspark does indeed address
> the issue you are seeing. You can deploy revision 80 with:
> "juju deploy cs:~bigdata-dev/apache-spark-80"
>
> Thanks,
> Konstantinos
>
>
> On Thu, Jul 14, 2016 at 5:33 PM, Gregory Van Seghbroeck
> <gregory.vanseghbroeck at intec.ugent.be> wrote:
> > Hi Konstantinos,
> >
> > Thanks a lot!! I'll give it a try after my holidays.
> >
> > I still have to answer your question about the Bigtop charms. Here
> > goes ... my apologies for being vague with versions and such; it's
> > from a while back.
> > What I did was deploy a small HDFS setup using the Bigtop charms. We
> > always set things up in LXC containers on bare metal servers.
> > Management of these bare metal servers is out of our hands; it is
> > provided by our Emulab system.
> > Everything seemed to go fine, except for the relations part. The
> > resource manager needs FQDNs to set things up properly.
> > Unfortunately, resolving those FQDNs fails. It has to do with how the
> > physical system is set up and how the networking is handled between
> > the LXC containers. That networking setup is something one of my
> > colleagues (Merlijn Sebrechts, probably not a stranger to you, or at
> > least not to the community) created for us. My workaround at the time
> > was to manually add all the FQDNs to the /etc/hosts file. Sufficient
> > then, but not workable in the long run. So I asked my colleague if he
> > could simply add this to the charms that provide the networking, but
> > he responded that something like this should actually be handled in
> > the Bigtop charms, since the failing relation is part of those charms.
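> >
> > For illustration, the workaround amounted to appending entries like
> > these to /etc/hosts on every container (addresses and hostnames here
> > are made up; the real ones came from our deployment):
> >
> >     10.0.3.101  resourcemanager-0.example.com  resourcemanager-0
> >     10.0.3.102  namenode-0.example.com         namenode-0
> >     10.0.3.103  slave-0.example.com            slave-0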
> > I probably should have gone directly to the authors of the Bigtop
> > charms, but you were so helpful as to respond to this issue as well.
> >
> > If things are not clear, I'll try to reproduce this issue on our
> > system and will come back to you in a week or so.
> >
> > Kind regards and thanks again for your help.
> > Gregory
> >
> > -----Original Message-----
> > From: Konstantinos Tsakalozos [mailto:kos.tsakalozos at canonical.com]
> > Sent: Thursday, July 14, 2016 3:40 PM
> > To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
> > Cc: bigdata at lists.ubuntu.com; Kevin Monroe <kevin.monroe at canonical.com>
> > Subject: Re: %pyspark in Zeppelin: No module named pyspark error
> >
> > Hi Gregory,
> >
> > I did some more testing today and submitted a patch for review.
> >
> > The line:
> > "spark.driver.extraClassPath
> > /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"
> > will fix only spark-shell.
> > For pyspark, the line to be added to spark-defaults.conf is slightly
> > different:
> > "spark.jars
> > /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"
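> >
> > Putting the two together, /etc/spark/conf/spark-defaults.conf ends up
> > containing something like the following (check the exact hadoop-lzo
> > jar version present on your nodes):
> >
> >     # fixes spark-shell
> >     spark.driver.extraClassPath /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar
> >     # fixes pyspark
> >     spark.jars /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar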
> >
> > We have a patch under review
> > (https://github.com/juju-solutions/layer-apache-spark/pull/25) so
> > that you will not have to do any editing.
> >
> > Thanks,
> > Konstantinos
> >
> >
> >
> > On Wed, Jul 13, 2016 at 8:47 PM, Konstantinos Tsakalozos
> > <kos.tsakalozos at canonical.com> wrote:
> >> Hi Gregory,
> >>
> >> Here is what I have so far.
> >>
> >> When in yarn-client mode, pyspark jobs fail with "pyspark module not
> >> present": http://pastebin.ubuntu.com/19266710/
> >> Most probably this is because the execution end-nodes are not Spark
> >> nodes; they are just Hadoop nodes without pyspark installed.
> >> You will need to run your job on a Spark cluster set up in standalone
> >> execution mode, scaled to match your needs.
> >> Relating Spark to the hadoop-plugin will give you access to HDFS.
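> >>
> >> A minimal sketch of that setup, assuming the charm revisions from
> >> Gregory's earlier list and the default service names (the plugin
> >> must itself be related to a working HDFS deployment):
> >>
> >>     juju deploy cs:trusty/apache-spark-9
> >>     juju deploy cs:trusty/apache-hadoop-plugin-14
> >>     # give Spark access to HDFS through the plugin
> >>     juju add-relation apache-spark apache-hadoop-plugin
> >>     # scale the Spark cluster to match your needs
> >>     juju add-unit apache-spark -n 2
> >>     # inspect the charm's options for the execution-mode setting
> >>     juju get apache-spark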
> >>
> >> In this setup you will need to manually add the following line to
> >> /etc/spark/conf/spark-defaults.conf:
> >> "spark.driver.extraClassPath
> >> /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"
> >> We are working on a patch to remove this extra manual step.
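> >>
> >> Until the patch lands, one way to apply that edit from the Juju
> >> client, assuming the default service name apache-spark and a single
> >> unit (verify the jar version on your nodes):
> >>
> >>     juju ssh apache-spark/0 'echo "spark.driver.extraClassPath /usr/lib/hadoop/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar" | sudo tee -a /etc/spark/conf/spark-defaults.conf'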
> >>
> >> A couple of asks from our side:
> >> - Would it be possible to share with us the job you are running, so
> >> that we can verify we have addressed your use case?
> >> - You mentioned problems with using the spark charm that is based on
> >> Apache Bigtop. Would it be possible to provide us with more info on
> >> what is not working there?
> >>
> >> We would like to thank you for your feedback as it allows us to
> >> improve our work.
> >>
> >> Thanks,
> >> Konstantinos
> >>
> >>
> >> On Tue, Jul 12, 2016 at 9:55 PM, Kevin Monroe
> >> <kevin.monroe at canonical.com>
> >> wrote:
> >>>
> >>> I think I accidentally discarded Kostas' message. Sorry about that!
> >>>
> >>> Gregory, Kostas is working on reproducing your environment. We
> >>> should know more in the next day or so.
> >>>
> >>> ---------- Forwarded message ----------
> >>> From: Konstantinos Tsakalozos <kos.tsakalozos at canonical.com>
> >>> Date: Tue, Jul 12, 2016 at 10:39 AM
> >>> Subject: Re: %pyspark in Zeppelin: No module named pyspark error
> >>> To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
> >>> Cc: Kevin Monroe <kevin.monroe at canonical.com>,
> >>> bigdata at lists.ubuntu.com
> >>>
> >>>
> >>> Hi Gregory,
> >>>
> >>> Thank you for the info you provided. I will need some time to set
> >>> up the deployment you just described and try to reproduce the
> >>> error. I guess any pyspark job should have the same effect.
> >>>
> >>> Thanks,
> >>> Konstantinos
> >>>
> >>> On Tue, Jul 12, 2016 at 11:31 AM, Gregory Van Seghbroeck
> >>> <gregory.vanseghbroeck at intec.ugent.be> wrote:
> >>>>
> >>>> Hi Kevin,
> >>>>
> >>>>
> >>>>
> >>>> Thanks for the response! We really like the Juju and Canonical
> >>>> community.
> >>>>
> >>>>
> >>>>
> >>>> I can tell you the Juju version: it is 1.25.3.
> >>>>
> >>>> The status will be a problem, since I removed most of the services.
> >>>> That being said, I don't think we were using the Bigtop Spark
> >>>> charms yet, so this might be the problem. Here is a list of the
> >>>> services I deployed before:
> >>>>
> >>>> - cs:trusty/apache-hadoop-namenode-2
> >>>> - cs:trusty/apache-hadoop-resourcemanager-3
> >>>> - cs:trusty/apache-hadoop-slave-2
> >>>> - cs:trusty/apache-hadoop-plugin-14
> >>>> - cs:trusty/apache-spark-9
> >>>> - cs:trusty/apache-zeppelin-7
> >>>>
> >>>>
> >>>>
> >>>> The reason we don't use the Bigtop charms yet is that we see
> >>>> problems with the hostnames on the containers. Some of the
> >>>> relations use hostnames, but these cannot be resolved, so I have
> >>>> to add the mappings between IPs and hostnames manually to the
> >>>> /etc/hosts file.
> >>>>
> >>>>
> >>>>
> >>>> The image I pasted in, showing our environment, was a screenshot
> >>>> of the Zeppelin settings. Those parameters looked OK from what I
> >>>> could find online.
> >>>>
> >>>>
> >>>>
> >>>> Kind Regards,
> >>>>
> >>>> Gregory
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> From: Kevin Monroe [mailto:kevin.monroe at canonical.com]
> >>>> Sent: Monday, July 11, 2016 7:20 PM
> >>>> To: Gregory Van Seghbroeck <gregory.vanseghbroeck at intec.ugent.be>
> >>>> Cc: bigdata at lists.ubuntu.com
> >>>> Subject: Re: %pyspark in Zeppelin: No module named pyspark error
> >>>>
> >>>>
> >>>>
> >>>> Hi Gregory,
> >>>>
> >>>>
> >>>>
> >>>> I wasn't able to see your data after "Our environment is set up as
> >>>> follows:"
> >>>>
> >>>>
> >>>>
> >>>> <big black box for me>
> >>>>
> >>>>
> >>>>
> >>>> Will you reply with the output (or a pastebin link) of the
> >>>> following:
> >>>>
> >>>>
> >>>>
> >>>> juju version
> >>>>
> >>>> juju status --format=tabular
> >>>>
> >>>>
> >>>>
> >>>> Kostas has found a potential Zeppelin issue in the Bigtop charms,
> >>>> where the Bigtop Spark offering may be too old. Knowing your Juju
> >>>> and charm versions will help me know whether your issue is related.
> >>>>
> >>>>
> >>>>
> >>>> Thanks!
> >>>>
> >>>> -Kevin
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Jul 11, 2016 at 7:36 AM, Gregory Van Seghbroeck
> >>>> <gregory.vanseghbroeck at intec.ugent.be> wrote:
> >>>>
> >>>> Dear,
> >>>>
> >>>>
> >>>>
> >>>> We have deployed Zeppelin with Juju and connected it to Spark.
> >>>> According to Juju, everything went well, and we can see this is
> >>>> indeed the case: when we execute one of the Zeppelin tutorials we
> >>>> see some nice graphs. However, if we try to use the Python
> >>>> interpreter (%pyspark), we always get an error.
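> >>>>
> >>>> Even a minimal paragraph (illustrative; our real notebooks are
> >>>> larger) triggers it:
> >>>>
> >>>>     %pyspark
> >>>>     print(sc.version)
> >>>>
> >>>> The result is the "No module named pyspark" error from the subject
> >>>> line.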
> >>>>
> >>>>
> >>>> Kind Regards,
> >>>>
> >>>> Gregory
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
>
> --
> Bigdata mailing list
> Bigdata at lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/bigdata
>

