Apache Hudi / HUDI-3213

compaction should not change the commit time

Details

    Description

      In `TestMORDataSource.testCount`, after the sixth operation (two records inserted, followed by `compaction`), `hudiIncDF6.count()` returns 152. That count is made up of 150 records that have just been compacted (100 records updated in the second and third operations plus 50 records updated in the fifth operation) and the 2 records inserted in the sixth operation.

      The correct result is 2; the 150 compacted records should not be counted.

      The cause is that `compaction` changes the commit time of some records that were updated later and stored in log files, so those records appear to fall inside the commit range of the incremental query.

      val hudiIncDF6 = spark.read.format("org.apache.hudi")
        .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
        .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time)
        .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time)
        .load(basePath)
      // compaction updated 150 rows + inserted 2 new rows
      assertEquals(152, hudiIncDF6.count())
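
      The effect described above can be reproduced with a small model that does not use Hudi or Spark at all. This is only a sketch under stated assumptions: `Rec`, `incrementalCount`, `compactRewritingTime` and `compactPreservingTime` are hypothetical names, commit times are simplified to the operation number (1 to 6), the incremental range is modelled as begin-exclusive and end-inclusive, all 100 earlier updates are assigned commit time 3 because the issue does not say how they are split, and the rewritten commit time is assumed to land inside the queried range.

```scala
object CompactionCommitTimeSketch {
  // Hypothetical stand-in for a record: only its key and the commit time
  // that an incremental query filters on (simplified to an Int).
  final case class Rec(key: String, commitTime: Int)

  // Simplified incremental query: count records with begin < commitTime <= end.
  def incrementalCount(table: Seq[Rec], begin: Int, end: Int): Long =
    table.count(r => r.commitTime > begin && r.commitTime <= end).toLong

  // Models the reported behaviour: every compacted record is stamped
  // with a new commit time (assumption: one inside the queried range).
  def compactRewritingTime(logRecords: Seq[Rec], newTime: Int): Seq[Rec] =
    logRecords.map(_.copy(commitTime = newTime))

  // Models the behaviour the issue asks for: compaction keeps each
  // record's original commit time.
  def compactPreservingTime(logRecords: Seq[Rec]): Seq[Rec] = logRecords

  // 100 records last updated by operation 3 (assumed), 50 by operation 5.
  val logRecords: Seq[Rec] =
    (1 to 100).map(i => Rec(s"a$i", 3)) ++ (1 to 50).map(i => Rec(s"b$i", 5))

  // The two records inserted by the sixth operation.
  val inserts: Seq[Rec] = Seq(Rec("n1", 6), Rec("n2", 6))

  // Incremental query over (5, 6] in both scenarios.
  val reportedCount: Long =
    incrementalCount(compactRewritingTime(logRecords, 6) ++ inserts, 5, 6)
  val expectedCount: Long =
    incrementalCount(compactPreservingTime(logRecords) ++ inserts, 5, 6)

  def main(args: Array[String]): Unit = {
    println(s"compaction rewrites commit time:  $reportedCount")
    println(s"compaction preserves commit time: $expectedCount")
  }
}
```

      In this model the only difference between the two results is whether compaction touches the commit time, which matches the gap between 152 and 2 reported in the issue.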

       

       


            People

              biyan900116@gmail.com Yann Byron
              biyan900116@gmail.com Yann Byron
              sivabalan narayanan
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 2h
                  Remaining: 2h
                  Logged: Not Specified