Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Issue Description
During Spark-to-Avro conversion, Spark's Date type is represented as the Avro date logical type, whose underlying physical type in Avro is int. For this reason, when Avro records are formed from Spark rows, each date is converted to its epoch day (the number of days since 1970-01-01) and stored as an integer in the Parquet files.
This becomes a problem when a Date type column is chosen as the partition column. In that case, Hudi's partition column _hoodie_partition_path also receives the epoch-day integer when the partition field is read from the Avro record. As a result, syncing partitions of the Hudi table issues a command like the following, where the date is an integer:
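As a minimal sketch of that conversion, using only Python's standard library rather than Hudi's or Avro's actual code: the date logical type stores the number of days since 1970-01-01. The helper name to_epoch_day is hypothetical and exists only for illustration.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)


def to_epoch_day(d: date) -> int:
    """Days between the Unix epoch and d, the form an Avro date logical type stores."""
    return (d - UNIX_EPOCH).days


print(to_epoch_day(date(2019, 1, 1)))  # 17897
```

The value 17897 is the same integer that appears as a partition value in this issue.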
ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS PARTITION (event_date='17897') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897' PARTITION (event_date='17898') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898' PARTITION (event_date='17899') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899' PARTITION (event_date='17900') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'
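The following sketch shows how an epoch-day integer can leak into both the partition spec and the location when the statement is assembled from raw partition values. It is an illustration under assumptions, not Hudi's implementation: the function build_add_partitions_sql, the table name db.events, and the path s3://bucket/events are all hypothetical placeholders.

```python
def build_add_partitions_sql(table, field, values, base_path):
    """Assemble an ADD PARTITION statement from raw partition values (hypothetical helper)."""
    clauses = [
        f"PARTITION ({field}='{v}') LOCATION '{base_path}/{v}'"
        for v in values
    ]
    return f"ALTER TABLE {table} ADD IF NOT EXISTS " + " ".join(clauses)


# Epoch-day integers read straight from the record end up in the SQL verbatim.
sql = build_add_partitions_sql(
    "db.events", "event_date", [17897, 17898], "s3://bucket/events"
)
print(sql)
```

Whatever string form the partition value has at this point is what Hive receives, so the conversion to a date string has to happen before the statement is built.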
Hive cannot interpret partition field values such as 17897, because it cannot convert this string to the corresponding date. It expects the date itself in string form, for example 2019-01-01.
So we need to make sure that Hudi's partition field gets the actual date value in string form instead of the integer. This change ensures that when a field's value is retrieved from the Avro record, we check whether it has the date logical type and, if so, return the actual date value instead of the epoch day. After this change, the command issued to sync partitions looks like:
ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS PARTITION (`event_date`='2019-01-01') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01' PARTITION (`event_date`='2019-01-02') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02' PARTITION (`event_date`='2019-01-03') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03' PARTITION (`event_date`='2019-01-04') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'
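The check described above can be sketched as follows. This is a simplified illustration in plain Python, not the actual Hudi change: the field's type is assumed to be available as an Avro JSON schema dict such as {"type": "int", "logicalType": "date"}, and the function name partition_value is hypothetical.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def partition_value(field_schema: dict, raw_value) -> str:
    """Return the string to use as the partition value for one field."""
    # Only integers tagged with the date logical type are decoded as epoch days.
    if field_schema.get("logicalType") == "date" and isinstance(raw_value, int):
        return (UNIX_EPOCH + timedelta(days=raw_value)).isoformat()
    # Every other field keeps its plain string form.
    return str(raw_value)


date_schema = {"type": "int", "logicalType": "date"}
int_schema = {"type": "int"}

print(partition_value(date_schema, 17897))  # 2019-01-01
print(partition_value(int_schema, 17897))   # 17897
```

Keying the conversion on the logical type rather than on the physical int type leaves ordinary integer partition columns unchanged.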
Stack Trace
20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not known, listing all partitions in s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS :com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 31
20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208, 18209, 18210, 18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219, 18220, 18221, 18222, 18223, 18224, 18225, 18226, 18227, 18228, 18229, 18230, 18231, 18232, 18233, 18234, 18235, 18236]
20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar ADD IF NOT EXISTS PARTITION (dim_date='18206') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206' PARTITION (dim_date='18207') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207' PARTITION (dim_date='18208') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208' PARTITION (dim_date='18209') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18209' PARTITION (dim_date='18210') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210' PARTITION (dim_date='18211') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211' PARTITION (dim_date='18212') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212' PARTITION (dim_date='18213') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213' PARTITION (dim_date='18214') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18214' PARTITION (dim_date='18215') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215' PARTITION (dim_date='18216') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216' PARTITION (dim_date='18217') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217' PARTITION (dim_date='18218') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218' PARTITION (dim_date='18219') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18219' PARTITION (dim_date='18220') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220' PARTITION (dim_date='18221') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221' PARTITION (dim_date='18222') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222' PARTITION (dim_date='18223') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223' PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18224' PARTITION (dim_date='18225') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225' PARTITION (dim_date='18226') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226' PARTITION (dim_date='18227') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227' PARTITION (dim_date='18228') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228' PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18229' PARTITION (dim_date='18230') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230' PARTITION (dim_date='18231') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231' PARTITION (dim_date='18232') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232' PARTITION (dim_date='18233') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233' PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234' PARTITION (dim_date='18235') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235' PARTITION (dim_date='18236') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18236'
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)