  1. Apache Hudi
  2. HUDI-7579

Functional index (on col stats) creation fails to process all files/partitions


Details

    Description

      Creating a functional index on an existing table fails to process all files and partitions of the table. The col-stats MDT partition ends up with entries for only a subset of the files that belong to the table. An example follows.

       

      The following create-table and insert statements should create a table with 3 partitions (each partition having one file slice):

      spark.sql(
        s"""
           |create table test_table(
           |  id int,
           |  name string,
           |  ts long,
           |  price int
           |) using hudi
           | options (
           |  primaryKey = 'id',
           |  type = 'cow',
           |  preCombineField = 'ts',
           |  hoodie.metadata.record.index.enable = 'true',
           |  hoodie.datasource.write.recordkey.field = 'id'
           | )
           | partitioned by(price)
           | location '$basePath'
        """.stripMargin)
      spark.sql(s"insert into test_table (id, name, ts, price) values(1, 'a1', 1000, 10)")
      spark.sql(s"insert into test_table (id, name, ts, price) values(2, 'a2', 200000, 100)")
      spark.sql(s"insert into test_table (id, name, ts, price) values(3, 'a3', 2000000000, 1000)")

      Now create a functional index (using col stats) on this table. The col-stats partition in the MDT should then have three entries (column-level stats for each of the 3 files). However, it has only a single entry (for one of the files).
       

      var createIndexSql = s"create index idx_datestr on test_table using column_stats(ts) options(func='from_unixtime', format='yyyy-MM-dd')"
      
      spark.sql(createIndexSql)
      spark.sql(s"select key, type, ColumnStatsMetadata from hudi_metadata('test_table') where type = 3").show(false) 
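      One way to quantify the gap is to compare the number of data files in the table with the number of col-stats records returned by the query above. The following is a minimal sketch, not part of the original report: the object and method names are hypothetical, and it assumes that the functional index on a single column yields one type = 3 record per data file and that the Hudi meta column _hoodie_file_name is readable through Spark SQL.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not from the report): compares data-file count
// against the number of col-stats records in the metadata table.
object ColStatsCoverageCheck {

  // Returns (number of distinct data files, number of type = 3 MDT records).
  def check(spark: SparkSession, table: String): (Long, Long) = {
    // _hoodie_file_name is the Hudi meta column holding each row's file name.
    val dataFiles = spark
      .sql(s"select count(distinct _hoodie_file_name) from $table")
      .first()
      .getLong(0)

    // Same filter as the query in the report: type = 3 selects col-stats records.
    val colStatEntries = spark
      .sql(s"select count(*) from hudi_metadata('$table') where type = 3")
      .first()
      .getLong(0)

    (dataFiles, colStatEntries)
  }
}
```

      With a session in scope, `val (files, entries) = ColStatsCoverageCheck.check(spark, "test_table")` would be expected to return equal counts once the index covers every file; a smaller second value indicates files missing from col stats.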

      As seen below, col-stats has only one entry, for a single file, and is missing statistics for the other two files:

      {32490467-702f-4bb4-81e8-91082da9baf0-0_0-28-66_20240409095623406.parquet, ts, {null, null, null, null, null, null, {1970-01-01}, null, null, null, null}, {null, null, null, null, null, null, {1970-01-01}, null, null, null, null}, 1, 0, 434874, 869748, false}

       

       +------------------------------------------------+----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |key                                             |type|ColumnStatsMetadata                                                                                                                                                                                                                                                |
      +------------------------------------------------+----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |oyTjviKHuhI=/vI1OU7mFjI=Ev9dj4Bf3S0TEjEiWebRSQ==|3   |{32490467-702f-4bb4-81e8-91082da9baf0-0_0-28-66_20240409095623406.parquet, ts, {null, null, null, null, null, null, {1970-01-01}, null, null, null, null}, {null, null, null, null, null, null, {1970-01-01}, null, null, null, null}, 1, 0, 434874, 869748, false}|
      +------------------------------------------------+----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
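      For reference, the min/max values that the three expected entries would carry can be worked out by hand from the inserted ts values. The sketch below is illustrative only: the object and method names are hypothetical, and it assumes a UTC session time zone (Spark's from_unixtime formats in the session time zone, so the dates can shift by a day elsewhere).

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper (not from the report): mimics
// from_unixtime(ts, 'yyyy-MM-dd') under an assumed UTC session time zone.
object ExpectedColStats {
  private val fmt =
    DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC)

  // Formats epoch seconds as a yyyy-MM-dd date string in UTC.
  def fromUnixtime(seconds: Long): String =
    fmt.format(Instant.ofEpochSecond(seconds))

  def main(args: Array[String]): Unit = {
    // (price partition value, ts) pairs taken from the three inserts above.
    val rows = Seq(10 -> 1000L, 100 -> 200000L, 1000 -> 2000000000L)
    rows.foreach { case (price, ts) =>
      println(s"price=$price -> ${fromUnixtime(ts)}")
    }
    // prints:
    // price=10 -> 1970-01-01
    // price=100 -> 1970-01-03
    // price=1000 -> 2033-05-18
  }
}
```

      Under that assumption, the single {1970-01-01} entry above is consistent with the row inserted with ts = 1000, which suggests (but does not prove) that only the file in the price = 10 partition was indexed.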
      

       
       

            People

              Assignee: Unassigned
              Reporter: Vinaykumar Bhat (vinay.bhat)