  1. Apache Hudi
  2. HUDI-8724 Bug fixes - Phase 1
  3. HUDI-8312

Support YYYY-MM-DD partition format with hive


Details

    • Sub-task
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 1.0.1

    Description

      Currently, a partition value in a format like YYYY-MM-DD fails when syncing with Hive. This Jira aims to add a fix so that such a format is supported.
      Steps to reproduce: The table created below uses a custom key generator that combines the simple and timestamp key generators. The timestamp key generator produces partition values in the format YYYY-MM-DD.

      import org.apache.hudi.HoodieSparkUtils
      import org.apache.hudi.common.config.TypedProperties
      import org.apache.hudi.common.util.StringUtils
      import org.apache.hudi.exception.HoodieException
      import org.apache.hudi.functional.TestSparkSqlWithCustomKeyGenerator._
      import org.apache.hudi.testutils.HoodieClientTestUtils.createMetaClient
      import org.apache.hudi.util.SparkKeyGenUtils
      import org.apache.spark.sql.SaveMode
      import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
      import org.joda.time.DateTime
      import org.joda.time.format.DateTimeFormat
      import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
      import org.slf4j.LoggerFactory
          val df = spark.sql(
            s"""SELECT 1 as id, 'a1' as name, 1.6 as price, 1704121827 as ts, 'cat1' as segment
               | UNION
               | SELECT 2 as id, 'a2' as name, 10.8 as price, 1704121827 as ts, 'cat1' as segment
               | UNION
               | SELECT 3 as id, 'a3' as name, 30.0 as price, 1706800227 as ts, 'cat1' as segment
               | UNION
               | SELECT 4 as id, 'a4' as name, 103.4 as price, 1701443427 as ts, 'cat2' as segment
               | UNION
               | SELECT 5 as id, 'a5' as name, 1999.0 as price, 1704121827 as ts, 'cat2' as segment
               | UNION
               | SELECT 6 as id, 'a6' as name, 80.0 as price, 1704121827 as ts, 'cat3' as segment
               |""".stripMargin)

          // Write a MERGE_ON_READ table partitioned by segment (simple) and ts (timestamp-based).
          df.write.format("hudi")
            .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
            .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomAvroKeyGenerator")
            .option("hoodie.datasource.write.partitionpath.field", "segment:simple,ts:timestamp")
            .option("hoodie.datasource.write.recordkey.field", "id")
            .option("hoodie.datasource.write.precombine.field", "name")
            .option("hoodie.table.name", "hudi_table_2")
            .option("hoodie.insert.shuffle.parallelism", "1")
            .option("hoodie.upsert.shuffle.parallelism", "1")
            .option("hoodie.bulkinsert.shuffle.parallelism", "1")
            // ts holds epoch seconds, so it is treated as a scalar timestamp.
            .option("hoodie.keygen.timebased.timestamp.type", "SCALAR")
            // Pattern as given in the report. In Java date patterns 'DD' is day-of-year; 'dd' is day-of-month.
            .option("hoodie.keygen.timebased.output.dateformat", "yyyy-MM-DD")
            .option("hoodie.keygen.timebased.timestamp.scalar.time.unit", "seconds")
            .mode(SaveMode.Overwrite)
            .save("/user/hive/warehouse/hudi_table_2")
      
      // Sync with hive
      /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
        --jdbc-url jdbc:hive2://hiveserver:10000 \
        --user hive \
        --pass hive \
        --partitioned-by segment,ts \
        --base-path /user/hive/warehouse/hudi_table_2 \
        --database default \
        --table hudi_table_2 \
        --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor     
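
To make the partition layout concrete, here is a minimal sketch of how an epoch-seconds `ts` value could map to a `segment/date` partition path. It is not Hudi's key generator code: `PartitionPathSketch`, `tsPartition`, and `partitionPath` are hypothetical names, the UTC time zone is an assumption, and the sketch uses the day-of-month pattern `yyyy-MM-dd` on the assumption that this is what YYYY-MM-DD is meant to express.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object PartitionPathSketch {
  // Assumption: UTC and day-of-month ('dd'); not taken from Hudi's implementation.
  private val formatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC)

  // Formats an epoch-seconds value as a date string partition value.
  def tsPartition(epochSeconds: Long): String =
    formatter.format(Instant.ofEpochSecond(epochSeconds))

  // Joins the simple partition field and the formatted timestamp into one path.
  def partitionPath(segment: String, epochSeconds: Long): String =
    s"$segment/${tsPartition(epochSeconds)}"

  def main(args: Array[String]): Unit = {
    // Sample (segment, ts) pairs taken from the repro data above.
    val rows = Seq(("cat1", 1704121827L), ("cat1", 1706800227L), ("cat2", 1701443427L))
    rows.foreach { case (segment, ts) => println(partitionPath(segment, ts)) }
  }
}
```

The point of the sketch is only that the resulting partition value is a string such as `2024-01-01`, even though the source column `ts` holds integers.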

      Error

      2024-10-06 15:18:22,220 INFO  [main] ddl.JDBCExecutor (JDBCExecutor.java:runSQL(67)) - Executing SQL ALTER TABLE `default`.`hudi_table_2_ro` ADD IF NOT EXISTS   PARTITION (`segment`='cat1',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat1/2024-10-01'   PARTITION (`segment`='cat2',`ts`='2023-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2023-10-01'   PARTITION (`segment`='cat2',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2024-10-01'   PARTITION (`segment`='cat3',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat3/2024-10-01'
      2024-10-06 15:18:22,299 INFO  [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
      Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing hudi_table_2
          at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:180)
          at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
      Caused by: org.apache.hudi.hive.HoodieHiveSyncException: failed to sync the table hudi_table_2_ro
          at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
          at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:203)
          at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
          ... 1 more
      Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table hudi_table_2_ro
          at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:474)
          at org.apache.hudi.hive.HiveSyncTool.validateAndSyncPartitions(HiveSyncTool.java:321)
          at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:261)
          ... 3 more
      Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE `default`.`hudi_table_2_ro` ADD IF NOT EXISTS   PARTITION (`segment`='cat1',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat1/2024-10-01'   PARTITION (`segment`='cat2',`ts`='2023-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2023-10-01'   PARTITION (`segment`='cat2',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat2/2024-10-01'   PARTITION (`segment`='cat3',`ts`='2024-10-01') LOCATION '/user/hive/warehouse/hudi_table_2/cat3/2024-10-01'
          at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:70)
          at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.lambda$addPartitionsToTable$0(QueryBasedDDLExecutor.java:125)
          at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
          at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
          at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.addPartitionsToTable(QueryBasedDDLExecutor.java:125)
          at org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:118)
          at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:516)
          at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:470)
          ... 5 more
      Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column ts of type string as it cannot be converted to type int
          at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
          at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
          at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:313)
          at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
          at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:68)
          ... 12 more
      Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10248]: Cannot add partition column ts of type string as it cannot be converted to type int
          at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
          at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
          at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
          at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
          at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
          at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
          at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
          at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
          at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
          at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
          at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
          at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
          at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
          at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Cannot add partition column ts of type string as it cannot be converted to type int
          at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.validatePartColumnType(BaseSemanticAnalyzer.java:1582)
          at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.validatePartSpec(BaseSemanticAnalyzer.java:1536)
          at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getValidatedPartSpec(DDLSemanticAnalyzer.java:2096)
          at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:2866)
          at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:285)
          at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
          at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
          at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
          at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
          at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
          ... 15 more 
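
The innermost cause says that the partition value for `ts` is a string that cannot be converted to `int`. One plausible reading, stated here as an assumption, is that the Hive table declares `ts` as `int` (the source column holds integer epoch seconds) while the synced partition value is a formatted date string. The sketch below imitates that kind of conversion check; `PartitionTypeCheckSketch` and `convertibleToInt` are hypothetical and are not Hive's validation code.

```scala
import scala.util.Try

object PartitionTypeCheckSketch {
  // Hypothetical stand-in for a string-to-int conversion check on a partition value.
  def convertibleToInt(partitionValue: String): Boolean =
    Try(partitionValue.trim.toInt).isSuccess

  def main(args: Array[String]): Unit = {
    // A dashed date string is not a valid integer literal.
    println(convertibleToInt("2024-10-01"))
    // A purely numeric value that fits in an Int converts without error.
    println(convertibleToInt("20241001"))
  }
}
```

Under this reading, any output format containing non-digit characters would trip the same check, which is consistent with the failure being specific to formats like YYYY-MM-DD.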

          People

            ljain Lokesh Jain