Details
Improvement
Status: Open
Major
Resolution: Unresolved
1.7.1
None
None
Description
Users are often confused about the right way to use the Kudu-Spark integration. The most common dangerous outcome is that they create multiple Kudu clients, sometimes even one per task. This can easily overwhelm the master, e.g., in a Spark Streaming job with a 2-second batch window that creates a client per task. We should expand our current minimal Spark docs with better examples and bigger, louder, redder warnings against creating extra Kudu clients. Users should be directed to use the KuduContext exclusively; when a client is needed, they should use the client instance held by the KuduContext.
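To make the scale of the problem concrete, here is a minimal Scala simulation that needs neither Spark nor a Kudu cluster. It is only a sketch: `FakeKuduClient`, `SharedClientHolder`, the master address, and the assumed 8 tasks per batch are all hypothetical, and the fake client just counts how many instances get created. It does not model the real `KuduClient`. One minute of a streaming job with a 2-second batch window is 30 batches, so a client per task opens 30 × 8 = 240 clients per minute, while a shared client is created once.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a Kudu client: no I/O, it only counts how many
// instances (each of which would talk to the master) are created.
final class FakeKuduClient(master: String, counter: AtomicInteger) {
  counter.incrementAndGet()
}

// Lazily creates exactly one client and hands the same instance to every
// caller: the "reuse one client" shape the issue asks users to follow.
final class SharedClientHolder(master: String, counter: AtomicInteger) {
  lazy val client: FakeKuduClient = new FakeKuduClient(master, counter)
}

object ClientCountDemo {
  val Master = "kudu-master:7051" // hypothetical address

  // Anti-pattern: every task in every micro-batch builds its own client.
  def clientsPerTask(batches: Int, tasksPerBatch: Int): Int = {
    val counter = new AtomicInteger(0)
    for (_ <- 1 to batches; _ <- 1 to tasksPerBatch) {
      new FakeKuduClient(Master, counter)
    }
    counter.get()
  }

  // Reuse: all tasks share one lazily created client.
  def clientsShared(batches: Int, tasksPerBatch: Int): Int = {
    val counter = new AtomicInteger(0)
    val holder = new SharedClientHolder(Master, counter)
    for (_ <- 1 to batches; _ <- 1 to tasksPerBatch) {
      holder.client // every task receives the same instance
    }
    counter.get()
  }

  def main(args: Array[String]): Unit = {
    val batches = 60 / 2  // one minute with a 2-second batch window
    val tasksPerBatch = 8 // assumed parallelism per batch
    println(s"per-task clients: ${clientsPerTask(batches, tasksPerBatch)}") // 240
    println(s"shared clients:   ${clientsShared(batches, tasksPerBatch)}")  // 1
  }
}
```

In a real Kudu 1.x Spark job, the shared instance should be the client that `KuduContext` already owns (exposed as `kuduContext.syncClient`), rather than a new one built per task with `KuduClient.KuduClientBuilder`.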