[ZEPPELIN-4358] Seaborn renders plots slowly in apache zeppelin notebooks - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.8.1
Fix Version/s: None
Component/s: pySpark
Labels: None

Description

I am trying to generate visualizations in Zeppelin (0.8.1) notebooks using the pyspark interpreter with Python 3.7.3.

Generating the following simple plot with seaborn (0.9.0) takes around 5 minutes, with very high CPU usage the whole time:

%pyspark
import seaborn as sns
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(100,3))

sns.pairplot(data)

This behavior is inconsistent, because the following (much more data-intensive) plot renders instantly:

%pyspark
import seaborn as sns
import numpy as np
import pandas as pd

df = pd.DataFrame(data = np.random.rand(10000,2))

sns.lineplot(x = 0, y = 1, data = df)
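
For anyone trying to reproduce this, it may help to put numbers on the difference between the two snippets above. The following is only a minimal sketch: `time_call` is a hypothetical helper name, not part of Zeppelin or seaborn, and it uses a stand-in workload so that it runs without any plotting library. It measures only the Python call itself; if the slowdown happens while Zeppelin displays the figure after the call returns, this timing may not capture it.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for illustration; not part of any library.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the sketch is self-contained.
# In a %pyspark paragraph this could instead be, for example:
#   time_call(sns.pairplot, data)
#   time_call(sns.lineplot, x=0, y=1, data=df)
result, seconds = time_call(sum, range(1000))
print(f"call took {seconds:.4f} s")
```

Comparing the elapsed time of the `pairplot` call with the total paragraph run time shown by Zeppelin would indicate whether the time is spent inside seaborn or in the display step.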

I noticed that plotting with matplotlib (3.1.0) directly is generally much faster, and almost as snappy as I am used to from Jupyter notebook environments.

I have already read issue ZEPPELIN-1894 (https://jira.apache.org/jira/browse/ZEPPELIN-1894), but I can render the scatterplot mentioned there instantly as well.

I have already asked this question on Stack Overflow, but I think this is a better place for it.

People

Assignee: Unassigned

Reporter: Nawid Sayed

Votes: 0

Watchers: 3

Dates

Created: 27/Sep/19 21:06

Updated: 29/Sep/19 19:07