Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
The example data in the quick start guide (https://hudi.apache.org/docs/quick-start-guide.html) has a column `partitionpath` that holds values like `americas/brazil/sao_paulo`. Using the docker environment's spark-shell, you can change the basePath from the quickstart to hdfs://user/hive/warehouse/hudi_trips_cow and write the table. The folder then appears in the HDFS browser, similar to the stock_ticks_cow folder created in the docker demo.
However, syncing the table to Hive with run_sync_tool.sh fails with the error: "java.lang.IllegalArgumentException: Partition key parts [partitionpath] does not match with partition values [americas, brazil, sao_paulo]. Check partition strategy. " The command used was:
/var/hoodie/ws/hudi-hive/run_sync_tool.sh --jdbc-url jdbc:hive2://hiveserver:10000 --user hive --pass hive --partitioned-by partitionpath --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor --base-path /user/hive/warehouse/hudi_trips_cow --database default --table hudi_trips_cow
This error is thrown in `HoodieHiveClient.getPartitionClause`, which uses `extractPartitionValuesInPath` to get a list of partitionValues. The problem is that it compares the number of partitionValues to the number of partition fields passed via --partitioned-by. In this example there is only 1 partition field, "partitionpath", but its value `americas/brazil/sao_paulo` is split on "/" into 3 partitionValues. The counts differ (1 vs. 3), so the check fails and throws the exception.
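To make the mismatch concrete, here is a minimal Java sketch of the check as described above. It is not the Hudi source: the class `PartitionCheckSketch`, the method `checkPartition`, and the field names `region`, `country`, `city` are hypothetical, and the "/" split is an assumed simplification of what `MultiPartKeysValueExtractor` does. Only the error text is taken from the report.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionCheckSketch {

    // Assumed simplification of the multi-part extractor: one value per path segment.
    static List<String> extractPartitionValuesInPath(String partitionPath) {
        return Arrays.asList(partitionPath.split("/"));
    }

    // Hypothetical stand-in for the length check described in this report.
    static void checkPartition(List<String> partitionFields, String partitionPath) {
        List<String> partitionValues = extractPartitionValuesInPath(partitionPath);
        if (partitionFields.size() != partitionValues.size()) {
            throw new IllegalArgumentException("Partition key parts " + partitionFields
                + " does not match with partition values " + partitionValues
                + ". Check partition strategy. ");
        }
    }

    public static void main(String[] args) {
        String path = "americas/brazil/sao_paulo";

        // One partition field vs. three path segments: the check rejects it.
        try {
            checkPartition(Arrays.asList("partitionpath"), path);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Three (hypothetical) partition fields vs. three segments: the check passes.
        checkPartition(Arrays.asList("region", "country", "city"), path);
        System.out.println("three fields: ok");
    }
}
```

In this sketch the check only passes when the number of fields equals the number of path segments, which is why a single `partitionpath` field cannot be paired with a multi-part extractor on a three-level path.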