[ZEPPELIN-2160] PySpark: Matplotlib Integration extremely slow - ASF JIRA


Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.7.0
Fix Version/s: None
Component/s: front-end, GUI
Labels: None

Description

Issue:
I tested the Matplotlib integration in PySpark. As a baseline, each of the following 3 examples took 1 to 2 seconds in Jupyter on the same machine.

%pyspark

import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
z.show(plt)

==> 1 sec

%pyspark

import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
# (normed=1 matches the Matplotlib release used here; newer releases use density=True instead)
n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)

plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()

==> 11 sec

%pyspark
from ggplot import *

ggplot(diamonds, aes(x='price', fill='cut')) +\
    geom_density(alpha=0.25) +\
    facet_wrap("clarity")

==> 138 sec
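The times above are presumably end-to-end paragraph times, so they mix Python-side rendering with transfer to and display in the browser front-end. The following is a minimal sketch for separating the two. It assumes Matplotlib's `Figure.savefig` writing to an in-memory buffer, and it times how long a figure takes to serialize to PNG or SVG inside the interpreter and how large the resulting payload is. The helper names `measure` and `figure_renderer` are hypothetical and are not part of Zeppelin or Matplotlib.

```python
import io
import time
from typing import Callable, Tuple


def measure(render: Callable[[], bytes], repeats: int = 3) -> Tuple[float, int]:
    """Run `render` `repeats` times; return (fastest wall-clock seconds, payload size in bytes)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        payload = render()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        size = len(payload)  # size of the serialized figure that would be sent to the browser
    return best, size


def figure_renderer(fig, fmt: str) -> Callable[[], bytes]:
    """Return a zero-argument callable that serializes a Matplotlib figure to `fmt` in memory."""
    def render() -> bytes:
        buf = io.BytesIO()
        fig.savefig(buf, format=fmt)  # e.g. fmt="png" or fmt="svg"
        return buf.getvalue()
    return render
```

For example, at the end of the histogram paragraph, `measure(figure_renderer(plt.gcf(), "svg"))` returns the in-interpreter render time and SVG size. If that render time is small while the paragraph still takes many seconds, the delay is more likely in transfer or front-end display than in Matplotlib itself.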

Environment:
Downloaded http://apache.mirror.digionline.de/zeppelin/zeppelin-0.7.0/zeppelin-0.7.0-bin-netinst.tgz and installed the spark, python, sh, md and angular interpreters
Started via bin/zeppelin.sh


Issue Links

duplicates: ZEPPELIN-1894 Matplotlib is very slow in python interpreter (Resolved)


People

Assignee: Unassigned
Reporter: Bernhard Walter
Votes: 0
Watchers: 2

Dates

Created: 23/Feb/17 16:11
Updated: 23/Feb/17 21:44