The Java version of the SchemaRDD API imposes a high maintenance burden on Spark SQL itself and on downstream libraries (e.g. the MLlib pipeline API needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, make it usable from Java, and then remove the Java-specific version.
Things to remove include the Java versions of:
- data types
Things to consider:
- Scala and Java have different collection libraries.
- Scala and Java (8) have different closure interfaces.
- Scala and Java can have duplicate definitions of common classes, such as BigDecimal.
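
To make the considerations above concrete, here is a minimal Java sketch of what a single API surface usable from both languages might look like. It is not Spark code: UnifiedApiSketch, MapFunction, Row, and map are hypothetical names invented for illustration. The sketch assumes the shared API would expose java.util collections, a single-method function interface (so a Java 8 lambda or an anonymous class can implement it), and java.math.BigDecimal rather than a language-specific decimal type.

```java
import java.io.Serializable;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnifiedApiSketch {

    /**
     * Hypothetical closure interface with a single abstract method.
     * A Java 8 lambda can implement it directly; older Java code (or Scala)
     * can pass an anonymous class instead.
     */
    public interface MapFunction<T, R> extends Serializable {
        R call(T value);
    }

    /** Hypothetical row type that exposes decimals as java.math.BigDecimal. */
    public static final class Row {
        private final Object[] values;

        public Row(Object... values) {
            this.values = values.clone();
        }

        public BigDecimal getDecimal(int i) {
            return (BigDecimal) values[i];
        }
    }

    /** Accepts and returns java.util.List so Java callers need no conversion. */
    public static <T, R> List<R> map(List<T> input, MapFunction<T, R> f) {
        List<R> out = new ArrayList<>(input.size());
        for (T item : input) {
            out.add(f.call(item));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(new BigDecimal("1.50")),
            new Row(new BigDecimal("2.25")));

        // The lambda is converted to MapFunction<Row, BigDecimal>.
        List<BigDecimal> doubled =
            map(rows, r -> r.getDecimal(0).multiply(BigDecimal.valueOf(2)));

        System.out.println(doubled);
    }
}
```

A Scala caller of such an API would still need to convert between Scala and Java collections at the boundary, which is one of the trade-offs the audit would have to weigh.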