Details
Task
Status: Resolved
Major
Resolution: Incomplete
2.3.0
None
Description
The Spark History Server (SHS) in 2.3.0 can serialize history data to disk (see SPARK-18085 and its sub-tasks). This means that if either the serialized data or the on-disk format changes, the code must either keep supporting the old formats or discard the old data and re-create it from the event logs.
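As a rough illustration of the "support or discard" choice, here is a minimal Python sketch of a version-stamped store that is thrown away and rebuilt when its format does not match. The file names (`meta.json`, `data.json`), the `CURRENT_STORE_VERSION` constant, and the `rebuild_from_logs` callback are all hypothetical and do not reflect the actual SHS store layout or API.

```python
import json
import shutil
from pathlib import Path

# Hypothetical format version stamp; not the real SHS metadata.
CURRENT_STORE_VERSION = 2


def open_or_rebuild(store_dir, rebuild_from_logs):
    """Return stored data if its format version matches, else rebuild it."""
    store_dir = Path(store_dir)
    meta_file = store_dir / "meta.json"
    data_file = store_dir / "data.json"
    try:
        meta = json.loads(meta_file.read_text())
        if isinstance(meta, dict) and meta.get("version") == CURRENT_STORE_VERSION:
            return json.loads(data_file.read_text())
    except (OSError, ValueError):
        pass  # Missing or corrupt store: treat it as incompatible.

    # Incompatible or unreadable: discard the old data and re-create it from logs.
    shutil.rmtree(store_dir, ignore_errors=True)
    store_dir.mkdir(parents=True)
    data = rebuild_from_logs()
    data_file.write_text(json.dumps(data))
    meta_file.write_text(json.dumps({"version": CURRENT_STORE_VERSION}))
    return data
```

The sketch takes the "discard" branch only; supporting an old format would instead mean adding a reader for each older version stamp.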
We should add integration tests that help us detect whether one of these changes has occurred. They should check data generated by old versions of Spark and fail if that data cannot be read back.
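One possible shape for such a check, sketched in Python under stated assumptions: fixtures live in one directory per old Spark version (for example `2.3.0/`), and `read_store` is a hypothetical callable standing in for whatever code opens the serialized history data. Neither the layout nor the function names come from Spark itself.

```python
from pathlib import Path


def find_unreadable_stores(fixtures_root, read_store):
    """Try to read every per-version fixture; return (version, error) pairs for failures."""
    failures = []
    for version_dir in sorted(Path(fixtures_root).iterdir()):
        if not version_dir.is_dir():
            continue
        try:
            read_store(version_dir)
        except Exception as exc:  # Any read error counts as a compatibility break.
            failures.append((version_dir.name, repr(exc)))
    return failures


def assert_old_stores_readable(fixtures_root, read_store):
    """Fail with the list of versions whose data could not be read back."""
    failures = find_unreadable_stores(fixtures_root, read_store)
    assert not failures, f"unreadable history data from old versions: {failures}"
```

Collecting all failures before asserting means a single run reports every old version that broke, rather than stopping at the first.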
The Hive suites recently added the ability to download old Spark versions and generate data with them, to test that new code can read it. We could use a similar approach here, starting once 2.3.0 is released.