FLINK-32590

Fail to read Flink Parquet filesystem table stored in Hive Metastore service.



    Description

      Summary:

      Fail to read Flink Parquet filesystem table stored in Hive Metastore service.
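
      Setup (sketch):

      The table is a plain Flink filesystem table in Parquet format, registered in a Hive catalog so that its metadata lives in the Hive Metastore service. The DDL below is an assumed minimal reproduction, not my exact statements: the catalog name, hive-conf-dir, table name, columns, and path are placeholders.

      -- register a Hive catalog backed by the Hive Metastore service
      CREATE CATALOG myhive WITH (
        'type' = 'hive',
        'hive-conf-dir' = '/path/to/hive/conf'
      );
      USE CATALOG myhive;

      -- a Flink filesystem table stored as Parquet; its definition is persisted in the Metastore
      CREATE TABLE parquet_fs_table (
        id BIGINT,
        name STRING
      ) WITH (
        'connector' = 'filesystem',
        'path' = 'file:///tmp/parquet_fs_table',
        'format' = 'parquet'
      );

      -- reading the table is what triggers the exception reported below
      SELECT * FROM parquet_fs_table;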

      The problem:

      When I try to read a Flink Parquet filesystem table stored in the Hive Metastore service, I get the following exception.

      java.lang.RuntimeException: One or more fetchers have encountered exception
      	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:261) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:131) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:417) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) ~[flink-dist-1.17.1.jar:1.17.1]
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) ~[flink-dist-1.17.1.jar:1.17.1]
      	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_345]
      Caused by: java.lang.NoSuchMethodError: shaded.parquet.org.apache.thrift.TBaseHelper.hashCode(J)I
      	at org.apache.parquet.format.ColumnChunk.hashCode(ColumnChunk.java:812) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at java.util.AbstractList.hashCode(AbstractList.java:541) ~[?:1.8.0_345]
      	at org.apache.parquet.format.RowGroup.hashCode(RowGroup.java:704) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at java.util.HashMap.hash(HashMap.java:340) ~[?:1.8.0_345]
      	at java.util.HashMap.put(HashMap.java:613) ~[?:1.8.0_345]
      	at org.apache.parquet.format.converter.ParquetMetadataConverter.generateRowGroupOffsets(ParquetMetadataConverter.java:1411) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.format.converter.ParquetMetadataConverter.access$600(ParquetMetadataConverter.java:144) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1461) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1437) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.format.converter.ParquetMetadataConverter$RangeMetadataFilter.accept(ParquetMetadataConverter.java:1207) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1437) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:583) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:777) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:658) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:127) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:75) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.file.table.FileInfoExtractorBulkFormat.createReader(FileInfoExtractorBulkFormat.java:109) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.checkSplitOrStartNext(FileSourceSplitReader.java:112) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:65) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:162) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:114) ~[flink-connector-files-1.17.1.jar:1.17.1]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_345]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_345]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_345]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_345]
      	... 1 more
      

      Possible reason:

      When I start the cluster with the "-verbose:class" option, I get the class-loading messages shown below.

      # how I start the cluster
      FLINK_ENV_JAVA_OPTS='-verbose:class' bin/start-cluster.sh
      
      [Loaded shaded.parquet.org.apache.thrift.TBaseHelper from file:/Users/guozhenyang/Tools/flink-1.17.1/lib/flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar]
      [Loaded org.apache.parquet.format.ColumnChunk from file:/Users/guozhenyang/Tools/flink-1.17.1/lib/flink-sql-parquet-1.17.1.jar]
      

      I assume there is a conflict between the shaded libthrift classes bundled in flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar and flink-sql-parquet-1.17.1.jar. The ColumnChunk class is loaded from flink-sql-parquet-1.17.1.jar and calls TBaseHelper.hashCode(J)I (i.e. a static int hashCode(long)), but TBaseHelper itself is resolved from flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar, which presumably bundles an older libthrift release that does not provide this overload.
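
      To double-check this assumption, one can verify which jars bundle the shaded TBaseHelper class and whether the copy that actually gets loaded has the int hashCode(long) overload (the (J)I descriptor from the stack trace). The commands below are only a sketch against the lib directory from the -verbose:class output above; adjust paths as needed.

      cd /Users/guozhenyang/Tools/flink-1.17.1/lib

      # both fat jars are expected to contain a copy of the shaded class
      unzip -l flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar | grep 'shaded/parquet/org/apache/thrift/TBaseHelper'
      unzip -l flink-sql-parquet-1.17.1.jar | grep 'shaded/parquet/org/apache/thrift/TBaseHelper'

      # extract the copy that -verbose:class reported as loaded (the hive connector jar)
      # and list its methods to see whether hashCode(long) is present
      unzip -o flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar 'shaded/parquet/org/apache/thrift/TBaseHelper*.class' -d /tmp/hive-thrift
      javap -p -cp /tmp/hive-thrift shaded.parquet.org.apache.thrift.TBaseHelper | grep hashCode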

People

  Assignee: Unassigned
  Reporter: Guozhen Yang