- Currently, Zeppelin's visualization capabilities are aimed mainly at developers who can write Scala, Spark, and SQL code. I want to make Zeppelin easier for ordinary users who want to use its visualizations to analyze their data but have little coding knowledge.
- I will focus on analyzing CSV files on Spark (or other structured files, such as logs), since these are what ordinary users most often want to analyze and visualize.
- I will create a function in a Zeppelin notebook that lets users choose a CSV file to upload. I will then read all the fields in the file and present them so users can select which fields they want to analyze, what display name each field should have, and which data type each field is (from a few basic, simple types: String, Integer, Double, DateTime, Boolean).
- This function will then upload the file to the server and generate Scala Spark code to process it. Next, I will use the Zeppelin REST API to create a new paragraph (available from v0.6.0) and run it automatically. After that, there will be a table object in the Spark environment, and I will create the basic SQL command `select * from <table_name>` and use Zeppelin's pivot-table function to analyze the data.
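The code-generation step above can be sketched in plain Scala. This is only an illustration: `FieldSpec`, `generateSparkCode`, and `typeNames` are hypothetical helper names, not existing Zeppelin APIs, and the generated snippet assumes the Spark 1.x style (`sqlContext`, `registerTempTable`, the Databricks `spark-csv` package) that Zeppelin v0.6-era notebooks commonly used.

```scala
// Hypothetical sketch of the generator: given the fields the user selected in
// the upload UI, emit the Scala Spark paragraph text that Zeppelin would run.
case class FieldSpec(name: String, displayName: String, dataType: String)

// Map the simple user-facing types onto Spark SQL type names.
val typeNames = Map(
  "String"   -> "StringType",
  "Integer"  -> "IntegerType",
  "Double"   -> "DoubleType",
  "DateTime" -> "TimestampType",
  "Boolean"  -> "BooleanType")

def generateSparkCode(csvPath: String, tableName: String, fields: Seq[FieldSpec]): String = {
  val structFields = fields
    .map(f => s"""StructField("${f.displayName}", ${typeNames.getOrElse(f.dataType, "StringType")}, true)""")
    .mkString(", ")
  // The emitted paragraph builds a schema, reads the uploaded CSV, and
  // registers it as a temp table so a follow-up SQL paragraph can query it.
  s"""import org.apache.spark.sql.types._
     |val schema = StructType(Seq($structFields))
     |val df = sqlContext.read
     |  .format("com.databricks.spark.csv")
     |  .option("header", "true")
     |  .schema(schema)
     |  .load("$csvPath")
     |df.registerTempTable("$tableName")""".stripMargin
}
```

A follow-up paragraph would then simply contain `select * from <table_name>`, which Zeppelin renders with its pivot-table UI.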
- As a result, users will no longer have to write any code to get their data analyzed and visualized. Users who do have coding experience can start from the generated code to build more complex SQL commands.
- In addition, I want to add more options to the Zeppelin REST API, such as:
+ When running a paragraph, you can choose which chart is displayed first, rather than the table that is shown by default.
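As a rough sketch of how this proposal might look to a client, the snippet below builds the REST URLs and a run-request body. Assumptions are marked in comments: the endpoint paths follow the Zeppelin 0.6 REST API, while the `"graph"` field in the run body is the *proposed* extension from this document, not an existing parameter, and `zeppelinUrl` is a placeholder.

```scala
// zeppelinUrl is a placeholder for a real Zeppelin server address.
val zeppelinUrl = "http://localhost:8080"

// Existing Zeppelin 0.6 REST endpoints (per the REST API docs):
// create a paragraph in a note, and run a paragraph asynchronously.
def createParagraphUrl(noteId: String): String =
  s"$zeppelinUrl/api/notebook/$noteId/paragraph"

def runParagraphUrl(noteId: String, paragraphId: String): String =
  s"$zeppelinUrl/api/notebook/job/$noteId/$paragraphId"

// PROPOSED (hypothetical) request body: let the caller pick the initial
// visualization mode (e.g. "pieChart") instead of the default table view.
def runBodyWithChart(mode: String): String =
  s"""{"graph": {"mode": "$mode"}}"""
```

The client would POST the generated Spark code to `createParagraphUrl`, then POST to `runParagraphUrl`, optionally with a body like `runBodyWithChart("pieChart")` once the proposed option exists.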