  1. Apache Hudi
  2. HUDI-2772

Deltastreamer fails to read checkpoint from previous commit metadata by spark writer on continuous mode where there is no data in source


Details

    • Task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.14.0
    • multi-writer
    • None

    Description

      Even with the config that copies over the deltastreamer checkpoint set correctly, deltastreamer can fail to read the checkpoint from the previous commit's metadata. This does not happen in general. In this case deltastreamer runs in continuous mode and the source (parquet DFS) folder has no data, so deltastreamer repeatedly checks the source folder and reloads the last checkpoint from the timeline metadata. In this setup, when a write from spark-datasource is triggered, deltastreamer immediately fails to read the checkpoint from the completed spark-writer commit. If deltastreamer is then restarted, the exception does not recur and it picks up the checkpoint.

      With a 1-second delay added in continuous mode, things were fine as well.
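      The delay mentioned above can be pictured with a minimal sketch of a polling loop that keeps each round at least a fixed interval long. This is only an illustration of the idea, not deltastreamer's actual loop and not necessarily how the delay was added in this experiment; `run_rounds`, `sync_once`, and the injected `clock` and `sleep` parameters are hypothetical names.

```python
import time
from typing import Callable


def run_rounds(sync_once: Callable[[], None],
               rounds: int,
               min_interval_s: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Call sync_once `rounds` times, padding each round to min_interval_s.

    Hypothetical helper: `clock` and `sleep` are injectable so the loop can
    be exercised without really waiting.
    """
    for _ in range(rounds):
        started = clock()
        sync_once()  # one "check source, load checkpoint, write" round
        remaining = min_interval_s - (clock() - started)
        if remaining > 0:
            # The round finished early (for example, no data in the source),
            # so wait out the rest of the interval before polling again.
            sleep(remaining)
```

      With an empty source each round finishes almost instantly, so without the padding the loop re-reads the timeline in very quick succession, which is the condition described in this report.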

       

      Setup:

      Deltastreamer runs in continuous mode. The source folder had no data, so deltastreamer was checking the source folder and fetching the latest checkpoint from commit metadata in quick succession.

      A concurrent write from spark-datasource was then triggered.

       

      I inspected the last completed commit instant (the one reported by deltastreamer) made by the spark writer, and it looks fine to me.

      grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074129737.deltacommit
          "deltastreamer.checkpoint.key" : "1637066483000" 

      After the exception below, if I restart deltastreamer, it runs fine, which is very strange. I was able to reproduce this 2 times out of 5.

      Here is the checkpoint from the last delta commit made by deltastreamer; it matches the entry in the spark writer's delta commit above:

      grep "checkpoint" /tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074123384.deltacommit
          "deltastreamer.checkpoint.key" : "1637066483000" 
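      The same check can be done structurally instead of with grep. The sketch below assumes the `.deltacommit` file is a JSON document with a top-level `extraMetadata` object, as the commit metadata printed in the exception further down suggests; `checkpoint_from_metadata` and `read_checkpoint` are hypothetical helper names, not Hudi APIs.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Key name taken from the grep output above.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def checkpoint_from_metadata(metadata: Dict[str, Any]) -> Optional[str]:
    """Return the checkpoint stored in extraMetadata, or None if absent."""
    # Assumption: extraMetadata is a flat string-to-string map; treat a
    # missing or null map the same as an empty one.
    extra = metadata.get("extraMetadata") or {}
    return extra.get(CHECKPOINT_KEY)


def read_checkpoint(commit_file: str) -> Optional[str]:
    """Parse a completed commit file as JSON and extract its checkpoint."""
    metadata = json.loads(Path(commit_file).read_text())
    return checkpoint_from_metadata(metadata)
```

      Calling `read_checkpoint` on the two `.deltacommit` paths above and comparing the results gives the same comparison as the two grep commands. A `None` result corresponds to the empty `extraMetadata` seen in the failure.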

       

      I also checked the deltastreamer code: it looks only at completed instants and their completed commit metadata, so I am not sure why this is happening.
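      To make the failure mode concrete, here is a toy model of the timeline state shown in the log below: the newest completed instant carries no extra metadata, while the one before it carries the checkpoint key. The sketch contrasts a lookup that inspects only the last instant with one that walks back to the most recent instant that has the key. It is not the DeltaSync code and not a claim about how this issue was fixed; the function names and the checkpoint value `cp-1` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CHECKPOINT_KEY = "deltastreamer.checkpoint.key"

# (instant_time, extra_metadata) pairs, oldest first.
Timeline = List[Tuple[str, Dict[str, str]]]


def checkpoint_from_last_instant(completed: Timeline) -> Optional[str]:
    """Look only at the newest completed instant."""
    if not completed:
        return None
    _, extra = completed[-1]
    return extra.get(CHECKPOINT_KEY)


def latest_checkpoint(completed: Timeline) -> Optional[Tuple[str, str]]:
    """Walk back from the newest instant to the first one with a checkpoint."""
    for instant_time, extra in reversed(completed):
        value = extra.get(CHECKPOINT_KEY)
        if value:
            return instant_time, value
    return None


# Instant times are taken from the log; the metadata values are made up.
timeline: Timeline = [
    ("20211116105105578", {"schema": "...", CHECKPOINT_KEY: "cp-1"}),
    ("20211116105112814", {}),  # extra metadata logged as []
]
```

      On this model, `checkpoint_from_last_instant(timeline)` finds nothing, which mirrors the "Unable to find previous checkpoint" error, while `latest_checkpoint(timeline)` still reaches the checkpoint on the earlier instant.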

      Stack trace:

      21/11/16 10:51:15 WARN HoodieDeltaStreamer: Next round 
      21/11/16 10:51:15 WARN DeltaSync: Extra metadata :: 20211116105105578, 20211116105105578.deltacommit, = [schema, deltastreamer.checkpoint.key]
      21/11/16 10:51:15 WARN HoodieDeltaStreamer: Next round 
      21/11/16 10:51:15 WARN DeltaSync: Extra metadata :: 20211116105105578, 20211116105105578.deltacommit, = [schema, deltastreamer.checkpoint.key]
      21/11/16 10:51:15 WARN HoodieDeltaStreamer: Next round 
      21/11/16 10:51:15 WARN DeltaSync: Extra metadata :: 20211116105112814, 20211116105112814.deltacommit, = []
      21/11/16 10:51:15 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
      org.apache.hudi.utilities.exception.HoodieDeltaStreamerException: Unable to find previous checkpoint. Please double check if this table was indeed built via delta streamer. Last Commit :Option{val=[20211116105112814__deltacommit__COMPLETED]}, Instants :[[20211116104228269__deltacommit__COMPLETED], [20211116104553080__deltacommit__COMPLETED], [20211116104759622__deltacommit__COMPLETED], [20211116105105578__deltacommit__COMPLETED], [20211116105112814__deltacommit__COMPLETED]], CommitMetadata={
        "partitionToWriteStats" : { },
        "compacted" : false,
        "extraMetadata" : { },
        "operationType" : "UNKNOWN",
        "fileIdAndRelativePaths" : { },
        "totalRecordsDeleted" : 0,
        "totalLogRecordsCompacted" : 0,
        "totalLogFilesCompacted" : 0,
        "totalCompactedRecordsUpdated" : 0,
      
      
      
      

          People

            harsh1231 Harshal Patil
            shivnarayan sivabalan narayanan