Uploaded image for project: 'Apache Hudi'
  1. Apache Hudi
  2. HUDI-607

Hive sync fails to register tables partitioned by Date Type column

    XMLWordPrintableJSON

Details

    Description

      Issue Description

      As part of spark to avro conversion, Spark's Date type is represented as corresponding Date Logical Type in Avro, which is underneath represented in Avro by physical Integer type. For this reason when forming the Avro records from Spark rows, it is converted to corresponding Epoch day to be stored as corresponding Integer value in the parquet files.

      However, this manifests into a problem that when a Date Type column is chosen as partition column. In this case, Hudi's partition column _hoodie_partition_path also gets the corresponding epoch day integer value when reading the partition field from the avro record, and as a result syncing partitions in hudi table issues a command like the following, where the date is an integer:

      ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS   PARTITION (event_date='17897') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'   PARTITION (event_date='17898') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'   PARTITION (event_date='17899') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'   PARTITION (event_date='17900') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'

      Hive is not able to make sense of the partition field values like 17897 as it is not able to convert it to corresponding date from this string. It actually expects the actual date to be represented in string form.

      So, we need to make sure that Hudi's partition field gets the actual date value in string form, instead of the integer. This change makes sure that when a fields value is retrieved from the Avro record, we check that if its Date Logical Type we return the actual date value, instead of the epoch. After this change the command for sync partitions issues is like:

      ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS   PARTITION (`event_date`='2019-01-01') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'   PARTITION (`event_date`='2019-01-02') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'   PARTITION (`event_date`='2019-01-03') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'   PARTITION (`event_date`='2019-01-04') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'

      Stack Trace

      20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not known, listing all partitions in s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS :com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
      20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 31
      20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208, 18209, 18210, 18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219, 18220, 18221, 18222, 18223, 18224, 18225, 18226, 18227, 18228, 18229, 18230, 18231, 18232, 18233, 18234, 18235, 18236]
      20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
      20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar ADD IF NOT EXISTS   PARTITION (dim_date='18206') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'   PARTITION (dim_date='18207') LOCATION $
      s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'   PARTITION (dim_date='18208') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'   PARTITION (dim_date='18209') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_$
      n_read_aws_hudi_jar/18209'   PARTITION (dim_date='18210') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'   PARTITION (dim_date='18211') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'   PARTITION (dim_date='18212') L$
      CATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'   PARTITION (dim_date='18213') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'   PARTITION (dim_date='18214') LOCATION 's3://feichi-test/fact_hourly_search_term_conversion$
      /merge_on_read_aws_hudi_jar/18214'   PARTITION (dim_date='18215') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'   PARTITION (dim_date='18216') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'   PARTITION (dim_date='1$
      217') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'   PARTITION (dim_date='18218') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'   PARTITION (dim_date='18219') LOCATION 's3://feichi-test/fact_hourly_search_term_co$
      versions/merge_on_read_aws_hudi_jar/18219'   PARTITION (dim_date='18220') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'   PARTITION (dim_date='18221') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'   PARTITION (dim$
      date='18222') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'   PARTITION (dim_date='18223') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'   PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search$
      term_conversions/merge_on_read_aws_hudi_jar/18224'   PARTITION (dim_date='18225') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'   PARTITION (dim_date='18226') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'   PARTIT$
      ON (dim_date='18227') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'   PARTITION (dim_date='18228') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'   PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourl$
      _search_term_conversions/merge_on_read_aws_hudi_jar/18229'   PARTITION (dim_date='18230') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'   PARTITION (dim_date='18231') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
       PARTITION (dim_date='18232') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'   PARTITION (dim_date='18233') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'   PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fa$
      t_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'   PARTITION (dim_date='18235') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'   PARTITION (dim_date='18236') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/
      18236'
      org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
        at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
        at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)

       

      Attachments

        Issue Links

          Activity

            People

              uditme Udit Mehrotra
              uditme Udit Mehrotra
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m