Description
Check Java compatibility for this release:
- APIs in spark.ml
- New APIs in spark.mllib (There should be few, if any.)
Checking compatibility means:
- Checking for differences in how Scala and Java handle types. Some items to look out for are (see the Java sketches after this list):
  - Check for generic "Object" types where Java cannot understand complex Scala types.
    - Note: The Java docs do not always match the bytecode. If you find a problem, please verify it using javap.
  - Check Scala objects (especially with nesting!) carefully. These may not be understood in Java, or they may be accessible only via the weirdly named Java types (with "$" or "#") which are generated by the Scala compiler.
  - Check for uses of Scala and Java enumerations, which can show up oddly in the other language's doc. (In spark.ml, we have largely tried to avoid using enumerations, and have instead favored plain strings.)
- Check for differences in generated Scala vs Java docs. E.g., one past issue was that Javadocs did not respect Scala's package private modifier.
- Remember that we should not break APIs from previous releases. If you find a problem, check whether it was introduced in this Spark release (in which case we can fix it) or in a previous one (in which case we can create a Java-friendly version of the API).
- If needed for complex issues, create small Java unit tests which execute each method. (Algorithmic correctness can be checked in Scala.) A minimal example appears below.
If you find issues, please comment here, or for larger items, create separate JIRAs and link them here as "requires".
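To make the type-handling pitfalls above concrete, here is a minimal Java sketch. It deliberately uses the Scala standard library (always on Spark's classpath) rather than any particular spark.ml class, so the class names are real but only stand in for whatever API is under review:

```java
import scala.collection.immutable.List;
import scala.collection.immutable.List$;

public class ScalaFromJavaCheck {
    public static void main(String[] args) {
        // A Scala object compiles to a class whose name ends in "$" with a
        // static MODULE$ field. Calling through MODULE$ is legal but awkward
        // Java; a spark.ml API reachable only this way should be flagged.
        List<Object> empty = List$.MODULE$.empty();
        System.out.println(empty.size()); // prints 0

        // When the Javadoc and observed behavior disagree (e.g., a documented
        // return type shows up as a bare Object), verify against the bytecode:
        //   javap -classpath <scala-library-jar> 'scala.collection.immutable.List$'
    }
}
```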
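And here is the kind of small Java test suggested in the last item, shown as a plain main method for brevity rather than a full JUnit test. It checks only that a real spark.ml method is callable from Java with the documented types; algorithmic correctness stays on the Scala side:

```java
import org.apache.spark.ml.linalg.Vector;
import org.apache.spark.ml.linalg.Vectors;

public class JavaApiCompatCheck {
    public static void main(String[] args) {
        // Vectors.dense(double, double...) is natural to call from Java; if a
        // factory method instead required a Scala Seq or an implicit argument,
        // that would be exactly the kind of issue to report on this JIRA.
        Vector v = Vectors.dense(1.0, 0.5, -1.0);
        System.out.println("size = " + v.size());       // 3
        System.out.println("apply(1) = " + v.apply(1)); // 0.5
    }
}
```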
Recommendations for how to complete this task:
- There are no great tools for this. In the past, this task has been done by:
  - Generating the API docs
  - Building the JAR and outputting the Java class signatures for MLlib (see the reflection sketch at the end of this section)
  - Manually inspecting and searching the docs and class signatures for issues
- If you do have ideas for better tooling, please share them so we can make this task easier in the future!
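As one low-tech possibility for the signature-output step, here is a small reflection-based sketch (a hypothetical helper, not an existing Spark tool) that prints every public method signature of a class, so the output can be diffed between releases or grepped for raw Object types:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class SignatureDump {
    public static void main(String[] args) throws ClassNotFoundException {
        // Pass any class name on the command line; the spark.ml default used
        // here is just an example and needs Spark on the classpath.
        String name = args.length > 0 ? args[0] : "org.apache.spark.ml.linalg.Vectors";
        Class<?> cls = Class.forName(name);
        Arrays.stream(cls.getMethods())
              .map(Method::toGenericString) // keeps generic type information
              .sorted()
              .forEach(System.out::println);
    }
}
```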