  1. Apache Hudi
  2. HUDI-3817

Need to specify the Parquet version for hudi-hadoop-mr-bundle when compiling Hudi with -Dspark3


Details

    Description

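      When Hudi is compiled with -Dspark3, the hudi-hadoop-mr module uses Parquet 1.12.2, which conflicts with the Parquet version on Hive's classpath. Querying a MOR realtime (_rt) table from the Hive CLI then fails with a NoClassDefFoundError: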

      hive> select * from h_321_0401_mor_rt;
      OK
      Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
          at org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:177)
          at org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:242)
          at org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:199)
          at org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
          at org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:260)
          at org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:146)
          at org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:137)
          at org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:520)
          at org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:503)
          at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:105)
          at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:138)
          at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:530)
          at org.apache.hudi.common.table.TableSchemaResolver.<init>(TableSchemaResolver.java:72)
          at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:90)
          at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:72)
          at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
          at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
          at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
          at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:74)
          at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:776)
          at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:344)
          at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
          at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
          at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
          at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2777)
          at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
          at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
          at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
          at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) 
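
      The trace above shows the shaded Hudi converter (org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter) calling into the unshaded org.apache.parquet.schema package, which has to be supplied by Hive's own classpath. A minimal diagnostic sketch, assuming it is run with the same classpath as the Hive CLI, can confirm whether that class is visible and which jar provides it. The class name ParquetClasspathCheck and the helper locate are hypothetical and not part of Hudi or Hive.

```java
import java.security.CodeSource;

public class ParquetClasspathCheck {

    /**
     * Returns the location a class was loaded from, or null when the class
     * cannot be found or linked on the current classpath.
     */
    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> c = Class.forName(className, false,
                    ParquetClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown source)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here.
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            // The class reported missing in the stack trace.
            "org.apache.parquet.schema.LogicalTypeAnnotation",
            // A long-standing class from the same package, used for comparison.
            "org.apache.parquet.schema.MessageType"
        };
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

      One way to run it, assuming Hive's jars live under $HIVE_HOME/lib, is java -cp "$HIVE_HOME/lib/*:." ParquetClasspathCheck. If MessageType resolves but LogicalTypeAnnotation does not, the Parquet jar on Hive's classpath most likely predates the version the bundle was compiled against.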


          People

            rex_xiong rex xiong
            rex_xiong rex xiong
            Sagar Sumit, Shiyan Xu