Details
- Documentation
- Status: Resolved
- Major
- Resolution: Fixed
- 2.3.0
- None
Description
We have (as far as I know) maintained backwards compatibility for ML persistence, but this is not documented anywhere. I'd like us to document it (for spark.ml, not for spark.mllib).
I'd recommend something like:
In general, MLlib maintains backwards compatibility for ML persistence. I.e., if you save an ML model or Pipeline in one version of Spark, then you should be able to load it back and use it in a future version of Spark. However, there are rare exceptions, described below.
Model persistence: Is a model or Pipeline saved using Apache Spark ML persistence in Spark version X loadable by Spark version Y?
- Major versions: No guarantees, but best-effort.
- Minor and patch versions: Yes; these are backwards compatible.
- Note about the format: There are no guarantees for a stable persistence format, but model loading itself is designed to be backwards compatible.
Model behavior: Does a model or Pipeline in Spark version X behave identically in Spark version Y?
- Major versions: No guarantees, but best-effort.
- Minor and patch versions: Identical behavior, except for bug fixes.
For both model persistence and model behavior, any breaking changes across a minor version or patch version are reported in the Spark version release notes. If a breakage is not reported in release notes, then it should be treated as a bug to be fixed.
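To make the proposed persistence rules concrete, here is a minimal sketch that encodes them as a lookup. The helper names `parse_version` and `persistence_guarantee` are hypothetical and not part of Spark. The sketch assumes plain numeric `major.minor.patch` version strings and treats loading in an older Spark version as outside the policy, since the text above only covers loading in a future version.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch' into ints; missing parts default to 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def persistence_guarantee(saved_with: str, loaded_with: str) -> str:
    """Classify loading a model saved in one Spark version from another.

    Hypothetical helper mirroring the proposed wording:
    - same major version, same or newer loader: "guaranteed"
    - newer major version: "best-effort"
    - older loader than saver: "not covered"
    """
    saved = parse_version(saved_with)
    loader = parse_version(loaded_with)
    if loader < saved:
        # The policy only speaks about loading in a future version.
        return "not covered"
    if loader[0] != saved[0]:
        # Crossing a major version: no guarantees, but best-effort.
        return "best-effort"
    # Minor and patch upgrades are backwards compatible.
    return "guaranteed"


if __name__ == "__main__":
    for saved, loader in [("2.1.0", "2.3.0"), ("2.3.0", "3.0.0"), ("2.3.0", "2.1.0")]:
        print(saved, "->", loader, ":", persistence_guarantee(saved, loader))
```

The same mapping would apply to model behavior, reading "guaranteed" as "identical behavior, except for bug fixes".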
How does this sound?
Note: We unfortunately don't have tests for backwards compatibility. Adding them has technical hurdles, which can be discussed in SPARK-15573. However, we have made efforts to maintain compatibility during PR review and Spark release QA, and most users expect it.
Issue Links
- is related to: SPARK-15573 Backwards-compatible persistence for spark.ml (Resolved)