Apache Hudi / HUDI-528

Incremental Pull fails when latest commit is empty


Details

    Description

      Creating an incremental view of a dataset throws an exception when the latest commit in the time range is empty. To determine the schema of the dataset, Hudi reads the latest commit file, parses it, and takes the first metadata file path. If the latest commit is empty, however, the field used to determine file paths (partitionToWriteStats) is also empty, and taking its first value fails with the following exception:

      java.util.NoSuchElementException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1447)
        at java.util.HashMap$ValueIterator.next(HashMap.java:1474)
        at org.apache.hudi.IncrementalRelation.<init>(IncrementalRelation.scala:80)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:65)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:46)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
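
      The failure mode in the trace can be reproduced with plain Java collections. The sketch below is an illustration only, not Hudi code: the class name, the helper methods, and the partition and file names are all hypothetical, and a plain map from partition path to file paths stands in for partitionToWriteStats. It shows the unguarded lookup that throws on an empty map, and one possible guard that walks back to the most recent commit that actually wrote a file. This is not necessarily the fix adopted in Hudi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Hudi code base.
public class EmptyCommitSketch {

    // Stand-in for a commit's partitionToWriteStats:
    // partition path -> file paths written to that partition.
    static Map<String, List<String>> commit(String partition, String... files) {
        Map<String, List<String>> stats = new HashMap<>();
        stats.put(partition, Arrays.asList(files));
        return stats;
    }

    // Stand-in for a commit that wrote nothing.
    static Map<String, List<String>> emptyCommit() {
        return new HashMap<>();
    }

    // Unguarded lookup: calling next() on the iterator of an empty
    // HashMap's values throws java.util.NoSuchElementException.
    static String firstFilePathUnguarded(Map<String, List<String>> stats) {
        return stats.values().iterator().next().get(0);
    }

    // Guarded lookup: scan commits from newest to oldest and return the
    // first file path found, or Optional.empty() if no commit wrote a file.
    static Optional<String> latestFilePath(List<Map<String, List<String>>> commitsOldestFirst) {
        for (int i = commitsOldestFirst.size() - 1; i >= 0; i--) {
            for (List<String> files : commitsOldestFirst.get(i).values()) {
                if (!files.isEmpty()) {
                    return Optional.of(files.get(0));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> timeline = new ArrayList<>();
        timeline.add(commit("2020/01/01", "file-a.parquet")); // older commit with data
        timeline.add(emptyCommit());                          // latest commit is empty

        try {
            firstFilePathUnguarded(timeline.get(timeline.size() - 1));
        } catch (NoSuchElementException e) {
            System.out.println("unguarded lookup failed: " + e);
        }

        // Falls back to the older commit instead of throwing.
        System.out.println("guarded lookup: " + latestFilePath(timeline));
    }
}
```

      Returning an Optional leaves the caller to decide what to do when no commit in the range wrote any file, for example returning an empty result rather than failing.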
      
      


            People

              garyli1019 Yanjia Gary Li
              vega Luis Vega
