Apache Hudi / HUDI-7005

Flink SQL Queries on Hudi Table fail when using the hudi-aws-bundle jar


Details

    Description

      Flink SQL queries on a Hudi table fail when the hudi-aws-bundle jar is on the classpath. The hudi-aws-bundle jar is needed to sync the metastore into AWS Glue. Three distinct issues were seen:

      Issue 1:

      2023-10-07 14:47:03,463 ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Executor executes action [sync hive metadata for instant 20231007144701183] error
      java.lang.NoClassDefFoundError: software/amazon/awssdk/services/glue/model/EntityNotFoundException
          at org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool.initSyncClient(AwsGlueCatalogSyncTool.java:52) ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
          at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:114) ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
      

      This issue happens because AwsGlueCatalogSyncTool (hudi-aws module), as packaged in hudi-flink-bundle, does not relocate the AWS SDK, whereas the copy in hudi-aws-bundle does. The fix is to stop including hudi-aws in hudi-flink-bundle: hudi-flink-bundle does not need to ship hudi-aws classes, since the hudi-aws-bundle jar can provide them at runtime.
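      A minimal sketch of the Issue 1 fix, assuming hudi-flink-bundle pulls hudi-aws in through its maven-shade-plugin artifactSet (the surrounding pom structure and the other includes shown here are assumptions; the real include list in packaging/hudi-flink-bundle/pom.xml is longer):

```xml
<!-- packaging/hudi-flink-bundle/pom.xml (sketch; not the full include list) -->
<artifactSet>
  <includes>
    <!-- Drop this include so the unrelocated hudi-aws classes are no longer
         shaded into hudi-flink-bundle; hudi-aws-bundle supplies them, with a
         relocated AWS SDK, at runtime: -->
    <!-- <include>org.apache.hudi:hudi-aws</include> -->
    <include>org.apache.hudi:hudi-flink</include>
  </includes>
</artifactSet>
```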

      Issue 2:

      Caused by: java.lang.NoSuchMethodError: org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(Z)Lorg/apache/hudi/org/apache/avro/Schema;
              at org.apache.hudi.util.StreamerUtil.getTableAvroSchema(StreamerUtil.java:431)
              at org.apache.hudi.util.StreamerUtil.getLatestTableSchema(StreamerUtil.java:441)
              at org.apache.hudi.table.catalog.HoodieHiveCatalog.getTable(HoodieHiveCatalog.java:420)
      

      This issue happens because TableSchemaResolver (hudi-common module), as packaged in hudi-aws-bundle, does not relocate the Avro classes, whereas the copy in hudi-flink-bundle does. The fix is to stop including hudi-common in hudi-aws-bundle: hudi-aws-bundle does not need to ship hudi-common classes, since it is used in conjunction with a service bundle (hudi-spark-bundle or hudi-flink-bundle) that already contains them.

      Issue 3:

      Caused by: java.lang.NoSuchMethodError: org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/hudi/org/apache/avro/Schema;)Lorg/apache/parquet/schema/MessageType;
              at org.apache.hudi.common.table.TableSchemaResolver.convertAvroSchemaToParquet(TableSchemaResolver.java:288)
              at org.apache.hudi.table.catalog.TableOptionProperties.translateFlinkTableProperties2Spark(TableOptionProperties.java:181)
              at org.apache.hudi.table.catalog.HoodieHiveCatalog.instantiateHiveTable(HoodieHiveCatalog.java:603)
              at org.apache.hudi.table.catalog.HoodieHiveCatalog.createTable(HoodieHiveCatalog.java:468)
      

      This issue happens because AvroSchemaConverter (parquet-avro), as packaged in hudi-aws-bundle, does not relocate the Avro classes, whereas the copy in hudi-flink-bundle does. The fix is to stop including parquet-avro in hudi-aws-bundle: hudi-aws-bundle does not need to ship parquet-avro classes, since it is used in conjunction with a service bundle (hudi-spark-bundle or hudi-flink-bundle) that already contains them.
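      Issues 2 and 3 are both fixed in the hudi-aws-bundle pom. A minimal sketch, assuming the bundle's maven-shade-plugin include list names hudi-common and parquet-avro directly (the surrounding structure is an assumption; the real list in packaging/hudi-aws-bundle/pom.xml may differ):

```xml
<!-- packaging/hudi-aws-bundle/pom.xml (sketch; not the full include list) -->
<artifactSet>
  <includes>
    <!-- Drop both includes: the service bundle (hudi-spark-bundle or
         hudi-flink-bundle) already ships these classes with Avro relocated. -->
    <!-- <include>org.apache.hudi:hudi-common</include> -->
    <!-- <include>org.apache.parquet:parquet-avro</include> -->
    <include>org.apache.hudi:hudi-aws</include>
  </includes>
</artifactSet>
```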

      Repro

      cd /usr/lib/flink/lib
      
      wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-flink1.17-bundle/0.14.0/hudi-flink1.17-bundle-0.14.0.jar
      
      wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-aws-bundle/0.14.0/hudi-aws-bundle-0.14.0.jar
      
      flink-yarn-session -d
      
      /usr/lib/flink/bin/sql-client.sh embedded
      
      
      CREATE CATALOG glue_catalog_for_hudi WITH (
        'type' = 'hudi',
        'mode' = 'hms',
        'table.external' = 'true',
        'default-database' = 'default',
        'hive.conf.dir' = '/etc/hive/conf.dist',
        'catalog.path' = 's3://prabhuflinks3/HUDICDC/warehouse/'
      );
      
      
      USE CATALOG glue_catalog_for_hudi;
      
      CREATE DATABASE IF NOT EXISTS flink_glue_hudi_db;
      
      USE flink_glue_hudi_db;
      
      CREATE TABLE `glue_catalog_for_hudi`.`flink_glue_hudi_db`.`Persons_src` (
        ID INT NOT NULL,
        FirstName STRING,
        Age STRING,
        PRIMARY KEY (`ID`) NOT ENFORCED
      )
      WITH (
        'connector' = 'hudi',
        'write.tasks' = '2',
        'path' = 's3://prabhuflinks3/HUDICDC/warehouse/Persons_src',
        'table.type' = 'COPY_ON_WRITE',
        'read.streaming.enabled' = 'true',
        'read.streaming.check-interval' = '1',
        'hoodie.embed.timeline.server' = 'false',
        'hive_sync.mode' = 'glue'
      );
      
      

       
      cc uditme


            People

              Assignee: Unassigned
              Reporter: Prabhu Joseph
              Votes: 0
              Watchers: 2
