  1. Apache Hudi
  2. HUDI-1718

Querying the incremental view of a MOR table with multi-level partitions fails

Details

    Description

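      HoodieCombineHiveInputFormat uses "," to join the names of multiple partition columns, whereas Hive uses "/" to join them. Because of this mismatch, the joined string (here "p,p1,p2") is not split back into individual columns and ends up being used as a single field name, which Avro rejects. HoodieCombineHiveInputFormat's logic should be modified to match Hive.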
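The mismatch can be reproduced in isolation. The following is a minimal sketch, not Hudi's actual code: the class and method names are hypothetical, and it assumes (based on the error message in this report) that the reading side splits the joined partition string on "/".

```java
import java.util.Arrays;
import java.util.List;

public class PartitionFieldDelimiterDemo {
    // Hypothetical writer side: joins partition column names with a delimiter.
    static String join(List<String> columns, String delimiter) {
        return String.join(delimiter, columns);
    }

    // Hypothetical reader side: assumes the Hive convention of "/" as the separator.
    static List<String> splitOnSlash(String joined) {
        return Arrays.asList(joined.split("/"));
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("p", "p1", "p2");

        // Joined with ",": the split finds no "/" and returns one element.
        System.out.println(splitOnSlash(join(columns, ","))); // prints [p,p1,p2]

        // Joined with "/": the split recovers the three column names.
        System.out.println(splitOnSlash(join(columns, "/"))); // prints [p, p1, p2]
    }
}
```

With the "," join, the single element "p,p1,p2" is what would later be passed on as a field name.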
      Test environment:

      Spark 2.4.5, Hadoop 3.1.1, Hive 3.1.1

       

      step1:

      val df = spark.range(0, 10000).toDF("keyid")
      .withColumn("col3", expr("keyid + 10000000"))
      .withColumn("p", lit(0))
      .withColumn("p1", lit(0))
      .withColumn("p2", lit(6))
      .withColumn("a1", lit(Array[String]("sb1", "rz")))
      .withColumn("a2", lit(Array[String]("sb1", "rz")))

      // bulk_insert df, partitioned by p, p1, p2
      // (merge is a helper whose definition is not included in this report)

      merge(df, 4, "default", "hive_8b", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")

      step2:

      val df = spark.range(0, 10000).toDF("keyid")
      .withColumn("col3", expr("keyid + 10000000"))
      .withColumn("p", lit(0))
      .withColumn("p1", lit(0))
      .withColumn("p2", lit(7))
      .withColumn("a1", lit(Array[String]("sb1", "rz")))
      .withColumn("a2", lit(Array[String]("sb1", "rz")))

      // upsert into table hive_8b

      merge(df, 4, "default", "hive_8b", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")

      step3:

      Start Hive beeline and run:

      set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;

      set hoodie.hive_8b.consume.mode=INCREMENTAL;

      set hoodie.hive_8b.consume.max.commits=3;

      -- this timestamp is earlier than the earliest commit, so the query covers all commits
      set hoodie.hive_8b.consume.start.timestamp=20210325141300;

      select `p`, `p1`, `p2`,`keyid` from hive_8b_rt where `_hoodie_commit_time`>'20210325141300'
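The session properties above embed the base table name (hive_8b), not the name of the _rt table being queried. As a rough sketch of that naming pattern, the helper below builds the same three statements for an arbitrary table; the class and method names are hypothetical and are not part of Hudi.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IncrementalQueryConf {
    // Builds the per-table incremental-consume properties used in this report.
    // Hypothetical helper: only the key pattern is taken from the steps above.
    static Map<String, String> incrementalProps(String table, String startTimestamp, int maxCommits) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hoodie." + table + ".consume.mode", "INCREMENTAL");
        props.put("hoodie." + table + ".consume.max.commits", String.valueOf(maxCommits));
        props.put("hoodie." + table + ".consume.start.timestamp", startTimestamp);
        return props;
    }

    public static void main(String[] args) {
        // Emit the statements in the form they would be typed into beeline.
        incrementalProps("hive_8b", "20210325141300", 3)
            .forEach((key, value) -> System.out.println("set " + key + "=" + value + ";"));
    }
}
```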

       

      2021-03-25 14:14:36,036 | INFO  | AsyncDispatcher event handler | Diagnostics report from attempt_1615883368881_0028_m_000000_3: Error: org.apache.hudi.org.apache.avro.SchemaParseException: Illegal character in: p,p1,p2
        at org.apache.hudi.org.apache.avro.Schema.validateName(Schema.java:1151)
        at org.apache.hudi.org.apache.avro.Schema.access$200(Schema.java:81)
        at org.apache.hudi.org.apache.avro.Schema$Field.<init>(Schema.java:403)
        at org.apache.hudi.org.apache.avro.Schema$Field.<init>(Schema.java:396)
        at org.apache.hudi.avro.HoodieAvroUtils.appendNullSchemaFields(HoodieAvroUtils.java:268)
        at org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.addPartitionFields(HoodieRealtimeRecordReaderUtils.java:286)
        at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:98)
        at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
        at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:53)
        at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
        at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
        at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:123)
        at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getRecordReader(HoodieCombineHiveInputFormat.java:975)
        at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:556)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:183)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:177)
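The exception is thrown while a schema field is being constructed with the name "p,p1,p2". The Avro specification requires names to start with a letter or underscore and to contain only letters, digits, and underscores, so the comma is the illegal character. The check below is a standalone approximation of that rule, written for illustration; it is not the code in Avro's Schema.validateName, and the class name is hypothetical.

```java
public class AvroNameCheck {
    // Approximates the Avro naming rule: [A-Za-z_] followed by [A-Za-z0-9_]*.
    static boolean isValidAvroName(String name) {
        return name != null && name.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    public static void main(String[] args) {
        // Individual partition columns are valid field names.
        System.out.println(isValidAvroName("p"));       // prints true
        System.out.println(isValidAvroName("p1"));      // prints true
        // The comma-joined string is not, which matches the reported error.
        System.out.println(isValidAvroName("p,p1,p2")); // prints false
    }
}
```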

       

            People

              xiaotaotao tao meng