Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
Description
After the sixth operation in `TestMORDataSource.testCount`, which inserts two records and runs `compaction`, `hudiIncDF6.count()` returns 152. The count includes the 150 records that the `compaction` has just rewritten (100 records updated in the second and third operations and 50 records updated in the fifth) plus the 2 records inserted in the sixth operation.
The correct result is 2; the 150 compacted records should not be counted.
The cause is that `compaction` changes the commit time of records whose later updates were stored in log files, so those records appear to have been committed inside the queried instant range.
val hudiIncDF6 = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time)
  .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time)
  .load(basePath)
// compaction updated 150 rows + inserted 2 new row
assertEquals(152, hudiIncDF6.count())
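The effect can be illustrated with a small standalone model. This is only a sketch, not Hudi code: `Rec`, `incrementalCount`, `compactRestamping` and `compactPreserving` are hypothetical names, the instants "003", "005" and "006" are placeholders, the 100 earlier updates are all stamped "003" for simplicity, and the incremental range is assumed to be exclusive at the begin instant and inclusive at the end instant.

```scala
// Simplified, hypothetical model of the issue; none of these names are Hudi APIs.
final case class Rec(key: String, commitTime: String)

object IncrementalCountSketch {
  // Incremental read over (begin, end]: begin exclusive, end inclusive (assumed).
  def incrementalCount(records: Seq[Rec], begin: String, end: String): Int =
    records.count(r => r.commitTime > begin && r.commitTime <= end)

  // Compaction that stamps every merged record with a new instant (the reported behavior).
  def compactRestamping(records: Seq[Rec], instant: String): Seq[Rec] =
    records.map(_.copy(commitTime = instant))

  // Compaction that keeps each merged record's original commit time (the expected behavior).
  def compactPreserving(records: Seq[Rec]): Seq[Rec] = records

  // State before the sixth operation: 150 updated records held in log files.
  val updatedEarlier: Seq[Rec] =
    (1 to 100).map(i => Rec(s"a$i", "003")) ++ (1 to 50).map(i => Rec(s"b$i", "005"))

  // The sixth operation inserts two new records.
  val inserted: Seq[Rec] = Seq(Rec("n1", "006"), Rec("n2", "006"))

  // Query range mirrors the test: begin = fifth commit, end = sixth commit.
  val buggyCount: Int =
    incrementalCount(compactRestamping(updatedEarlier, "006") ++ inserted, "005", "006")
  val expectedCount: Int =
    incrementalCount(compactPreserving(updatedEarlier) ++ inserted, "005", "006")

  def main(args: Array[String]): Unit =
    println(s"restamped: $buggyCount, preserved: $expectedCount")
}
```

In this model, restamping pulls all 150 compacted records into the range, while preserving commit times leaves only the 2 new inserts.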
Issue Links
- relates to: HUDI-44 Compaction must preserve commit timestamps of merged records #376 (Closed)