Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version: 0.14.0
Description
Flink SQL queries on a Hudi table fail when using the hudi-aws-bundle jar, which is needed for metastore sync into AWS Glue. Below are the different issues seen:
Issue 1:
2023-10-07 14:47:03,463 ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinator [] - Executor executes action [sync hive metadata for instant 20231007144701183] error
java.lang.NoClassDefFoundError: software/amazon/awssdk/services/glue/model/EntityNotFoundException
at org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool.initSyncClient(AwsGlueCatalogSyncTool.java:52) ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:114) ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
This issue happens because AwsGlueCatalogSyncTool (hudi-aws module), as packaged in hudi-flink-bundle, has not relocated the AWS SDK, whereas the copy in hudi-aws-bundle has relocated it. The fix is to stop including hudi-aws in hudi-flink-bundle; hudi-flink-bundle need not bring in the hudi-aws classes, since the hudi-aws-bundle jar can be used instead at runtime.
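A quick way to confirm the mismatch from the two jars (a sketch, assuming the 0.14.0 jars from the repro below sit in the current directory; jar ships with the JDK):

# hudi-flink-bundle's AwsGlueCatalogSyncTool references the unrelocated
# software.amazon.awssdk Glue classes but does not ship them, while
# hudi-aws-bundle ships them only under its relocated package prefix,
# so the unrelocated reference can never resolve at runtime.
jar tf hudi-flink1.17-bundle-0.14.0.jar | grep 'glue/model/EntityNotFoundException' || echo "not bundled"
jar tf hudi-aws-bundle-0.14.0.jar | grep 'glue/model/EntityNotFoundException'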
Issue 2:
Caused by: java.lang.NoSuchMethodError: org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(Z)Lorg/apache/hudi/org/apache/avro/Schema;
at org.apache.hudi.util.StreamerUtil.getTableAvroSchema(StreamerUtil.java:431)
at org.apache.hudi.util.StreamerUtil.getLatestTableSchema(StreamerUtil.java:441)
at org.apache.hudi.table.catalog.HoodieHiveCatalog.getTable(HoodieHiveCatalog.java:420)
This issue happens because TableSchemaResolver (hudi-common module), as packaged in hudi-aws-bundle, has not relocated the Avro classes, whereas the copy in hudi-flink-bundle has relocated them. The fix is to stop including hudi-common in hudi-aws-bundle; hudi-aws-bundle need not bring in the hudi-common classes, as it is used in conjunction with a service bundle (hudi-spark-bundle/hudi-flink-bundle) that already provides them.
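The descriptor mismatch is visible with javap (a sketch; javap ships with the JDK):

# The TableSchemaResolver copy inside hudi-aws-bundle returns the unrelocated
# org.apache.avro.Schema, while the caller compiled into hudi-flink-bundle
# expects the relocated org/apache/hudi/org/apache/avro/Schema from the error above.
javap -s -cp hudi-aws-bundle-0.14.0.jar org.apache.hudi.common.table.TableSchemaResolver | grep -A1 getTableAvroSchema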
Issue 3:
Caused by: java.lang.NoSuchMethodError: org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/hudi/org/apache/avro/Schema;)Lorg/apache/parquet/schema/MessageType;
at org.apache.hudi.common.table.TableSchemaResolver.convertAvroSchemaToParquet(TableSchemaResolver.java:288)
at org.apache.hudi.table.catalog.TableOptionProperties.translateFlinkTableProperties2Spark(TableOptionProperties.java:181)
at org.apache.hudi.table.catalog.HoodieHiveCatalog.instantiateHiveTable(HoodieHiveCatalog.java:603)
at org.apache.hudi.table.catalog.HoodieHiveCatalog.createTable(HoodieHiveCatalog.java:468)
This issue happens because AvroSchemaConverter (parquet-avro), as packaged in hudi-aws-bundle, has not relocated the Avro classes, whereas the copy in hudi-flink-bundle has relocated them. The fix is to stop including parquet-avro in hudi-aws-bundle; hudi-aws-bundle need not bring in the parquet-avro classes, as it is used in conjunction with a service bundle (hudi-spark-bundle/hudi-flink-bundle) that already provides them.
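The same checks apply here (a sketch, with the jars in the current directory):

# hudi-aws-bundle ships an AvroSchemaConverter compiled against unrelocated Avro,
# so its convert() descriptor cannot match the relocated
# org/apache/hudi/org/apache/avro/Schema that the Flink bundle passes in.
jar tf hudi-aws-bundle-0.14.0.jar | grep 'parquet/avro/AvroSchemaConverter.class'
javap -s -cp hudi-aws-bundle-0.14.0.jar org.apache.parquet.avro.AvroSchemaConverter | grep -A1 ' convert('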
Repro
cd /usr/lib/flink/lib
wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-flink1.17-bundle/0.14.0/hudi-flink1.17-bundle-0.14.0.jar
wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-aws-bundle/0.14.0/hudi-aws-bundle-0.14.0.jar
flink-yarn-session -d
/usr/lib/flink/bin/sql-client.sh embedded

CREATE CATALOG glue_catalog_for_hudi WITH (
  'type' = 'hudi',
  'mode' = 'hms',
  'table.external' = 'true',
  'default-database' = 'default',
  'hive.conf.dir' = '/etc/hive/conf.dist',
  'catalog.path' = 's3://prabhuflinks3/HUDICDC/warehouse/'
);

USE CATALOG glue_catalog_for_hudi;

CREATE DATABASE IF NOT EXISTS flink_glue_hudi_db;

use flink_glue_hudi_db;

CREATE TABLE `glue_catalog_for_hudi`.`flink_glue_hudi_db`.`Persons_src` (
  ID INT NOT NULL,
  FirstName STRING,
  Age STRING,
  PRIMARY KEY (`ID`) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'write.tasks' = '2',
  'path' = 's3://prabhuflinks3/HUDICDC/warehouse/Persons_src',
  'table.type' = 'COPY_ON_WRITE',
  'read.streaming.enabled' = 'true',
  'read.streaming.check-interval' = '1',
  'hoodie.embed.timeline.server' = 'false',
  'hive_sync.mode' = 'glue'
);
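The CREATE TABLE above already triggers Issue 3 through HoodieHiveCatalog.createTable. Reaching the write-path sync failure from Issue 1 requires a commit; a minimal follow-up in the same SQL session could be (the sample row is hypothetical, not part of the original repro):

INSERT INTO `glue_catalog_for_hudi`.`flink_glue_hudi_db`.`Persons_src` VALUES (1, 'John', '25');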
cc uditme