  1. Apache Hudi
  2. HUDI-7480

initializeFunctionalIndexPartition is called multiple times

Details

    Description

      This is due to an issue in initializeFromFilesystem(), which decides whether an MDT (metadata table) partition needs to be initialized based on the absence of its partition-type. For a functional index, however, partition-type stores only the prefix (func_index_), so the check always fails and we try to re-initialize the same functional index partition again.
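
The mismatch described above can be illustrated with a minimal, self-contained sketch. This is not Hudi's actual code: the object, the helper names and the concrete partition name func_index_idx_datestr are hypothetical, and the prefix-aware variant is only one possible way to make the check succeed, not necessarily the fix that was applied.

```scala
// Hypothetical sketch of the check described above; not Hudi's real implementation.
object FunctionalIndexInitSketch {
  // Prefix stored as the partition-type for functional indexes (from the description).
  val FuncIndexPrefix = "func_index_"

  // Exact-match check: a partition "needs init" when its partition-type path
  // is absent from the set of already initialized MDT partitions.
  def needsInitExact(partitionTypePath: String, initialized: Set[String]): Boolean =
    !initialized.contains(partitionTypePath)

  // Prefix-aware check (assumed alternative): treat the partition-type path as a
  // prefix, so an initialized "func_index_<name>" partition is recognized.
  def needsInitPrefixAware(partitionTypePath: String, initialized: Set[String]): Boolean =
    !initialized.exists(_.startsWith(partitionTypePath))

  def main(args: Array[String]): Unit = {
    // Assumed state after "create index idx_datestr ..." has completed once.
    val initialized = Set("files", "record_index", "func_index_idx_datestr")

    println(needsInitExact("record_index", initialized))        // false
    println(needsInitExact(FuncIndexPrefix, initialized))       // true: re-init is attempted
    println(needsInitPrefixAware(FuncIndexPrefix, initialized)) // false
  }
}
```

With the exact-match check, the bare prefix never appears in the initialized set, so every subsequent write would attempt to initialize the functional index partition again.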
       
      Simple test:

      spark.sql(
        s"""
           |create table $tableName (
           |  id int,
           |  name string,
           |  price double,
           |  ts long
           |) using hudi
           |options (
           |  primaryKey = 'id',
           |  type = '$tableType',
           |  preCombineField = 'ts',
           |  hoodie.metadata.record.index.enable = 'true',
           |  hoodie.datasource.write.recordkey.field = 'id'
           |)
           |partitioned by(ts)
           |location '$basePath'
           """.stripMargin)
      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
      spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
      spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
       
      val createIndexSql = s"create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='yyyy-MM-dd')"
      spark.sql(createIndexSql)
       
      // This insert throws a NullPointerException
      spark.sql(s"insert into $tableName values(4, 'a4', 10, 1004)")

            People

              codope Sagar Sumit
              vinay.bhat Vinaykumar Bhat
              Danny Chen, Ethan Guo