Apache Cassandra / CASSANDRA-19452

[Analytics] Use constant reference time during bulk read process


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: NA
    • Component/s: Analytics Library
    • Labels: None

    Description

      The bulk reader uses a time provider that returns the current time at the moment of the read, and that value guides both compaction and validation.

      Because the current time differs across Spark executors, rows/cells can be expired inconsistently from one executor to another. A second issue is that the post-compaction validation, which checks that no expired rows/cells remain, can fail: a row/cell that was live when it was compacted may expire before validation runs, and a read can take minutes or even hours.
      Both problems can lead to data being falsely omitted and to job failures.
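      To make the inconsistency concrete, here is a minimal Java sketch, assuming the Cassandra rule that an expiring cell is live only while the current time (in seconds) is earlier than its local deletion time (write time plus TTL). The `TimeProvider` interface and the `isExpired` helper are hypothetical illustrations, not the Analytics library's API; the two lambdas stand in for executors that reach the same cell 10 and 45 minutes into a long-running job.

```java
import java.util.concurrent.TimeUnit;

public final class WallClockExpiry {
    // Hypothetical time provider abstraction; NOT the Analytics library's actual API.
    interface TimeProvider {
        int nowInSeconds();
    }

    // Assumed Cassandra-style rule: an expiring cell is live only while now < localDeletionTime,
    // where localDeletionTime = write time (seconds) + TTL (seconds).
    static boolean isExpired(int localDeletionTime, TimeProvider time) {
        return time.nowInSeconds() >= localDeletionTime;
    }

    public static void main(String[] args) {
        int writeTime = 1_700_000_000;                        // seconds since epoch
        int ttl = (int) TimeUnit.MINUTES.toSeconds(30);       // 30-minute TTL
        int localDeletionTime = writeTime + ttl;

        // Two executors reading the same cell at different moments of a long job,
        // each consulting its own "current time".
        TimeProvider executorA = () -> writeTime + (int) TimeUnit.MINUTES.toSeconds(10);
        TimeProvider executorB = () -> writeTime + (int) TimeUnit.MINUTES.toSeconds(45);

        System.out.println("executor A sees expired: " + isExpired(localDeletionTime, executorA)); // false
        System.out.println("executor B sees expired: " + isExpired(localDeletionTime, executorB)); // true
    }
}
```

      With a clock-based provider, the verdict depends on when each task happens to run, so the same cell can be kept by one executor and dropped by another.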

      The fix is to use a constant reference time that the Spark driver decides once and distributes to all executors. That reference time is then used for both compaction and the subsequent validation.
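      A minimal sketch of the fix, under the same expiration assumption: the driver captures one reference time, and each executor receives a serialized copy of it (simulated here with a Java serialization round trip, which is how Spark ships task closures). `ReferenceTimeProvider`, `isExpired`, and `shipToExecutor` are hypothetical names chosen for illustration; they are not taken from the actual patch.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public final class ReferenceTimeDemo {
    // Hypothetical provider that always returns the same, driver-chosen instant.
    static final class ReferenceTimeProvider implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int referenceEpochSeconds;

        ReferenceTimeProvider(int referenceEpochSeconds) {
            this.referenceEpochSeconds = referenceEpochSeconds;
        }

        int nowInSeconds() {
            return referenceEpochSeconds; // never consults the executor's own clock
        }
    }

    // Same assumed rule as before: live only while now < localDeletionTime.
    // Both compaction and post-compaction validation would call this with the same provider.
    static boolean isExpired(int localDeletionTime, ReferenceTimeProvider time) {
        return time.nowInSeconds() >= localDeletionTime;
    }

    // Simulates shipping an object from the driver to an executor via Java serialization.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T shipToExecutor(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Driver: pick the reference time once, when the job is planned.
        ReferenceTimeProvider driverTime =
                new ReferenceTimeProvider((int) (System.currentTimeMillis() / 1000));
        int localDeletionTime = driverTime.nowInSeconds() + 60; // cell expires 60 s after the reference time

        // Executors: each gets its own deserialized copy, so the verdict does not depend
        // on when, or on which host, the task runs.
        List<Boolean> verdicts = new ArrayList<>();
        for (int executor = 0; executor < 3; executor++) {
            ReferenceTimeProvider executorTime = shipToExecutor(driverTime);
            verdicts.add(isExpired(localDeletionTime, executorTime));
        }
        System.out.println(verdicts); // [false, false, false]
    }
}
```

      Because every copy returns the driver's value, compaction and the later validation evaluate expiration against the same instant, however long the read takes.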

    People

      Assignee: Yifan Cai
      Reporter: Yifan Cai
      Authors: Yifan Cai
      Reviewers: Francisco Guerrero, James Berragan
      Votes: 0
      Watchers: 2


    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 1.5h