HUDI-4238: Revisit TestCOWDataSourceStorage#testCopyOnWriteStorage


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tests-ci
    • Labels: None

    Description

      Within the PR "Support Hadoop 3.x, Hive 3.x and Spark 3.x" (https://github.com/apache/hudi/pull/5786), testCopyOnWriteStorage has an issue with the test case where `nation` is added to the recordKeys. Debugging further, this appears to be caused by Avro 1.10.2 being used, since it adds the following to the schema:

      ```
      "nation":"Canada"
      ```

      instead of adding

      ```
      "nation": { "bytes": "Canada" }
      ```
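
      For reference, the nested form is how Avro's standard JSON encoding renders a non-null union branch: the value is wrapped in an object keyed by the branch type. A minimal, self-contained sketch of this behavior follows; the schema here is a hypothetical simplification for illustration, not the schema used by the test.

      ```
      import java.io.ByteArrayOutputStream;
      import java.nio.ByteBuffer;
      import java.nio.charset.StandardCharsets;

      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.io.EncoderFactory;
      import org.apache.avro.io.JsonEncoder;

      public class UnionJsonEncodingSketch {
        public static void main(String[] args) throws Exception {
          // Hypothetical schema: "nation" is a union of null and bytes, loosely
          // mirroring the field the test adds to the record keys.
          Schema schema = new Schema.Parser().parse(
              "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
                  + "{\"name\":\"nation\",\"type\":[\"null\",\"bytes\"],\"default\":null}]}");

          GenericRecord record = new GenericData.Record(schema);
          record.put("nation", ByteBuffer.wrap("Canada".getBytes(StandardCharsets.UTF_8)));

          // Avro's JSON encoding wraps the non-null union branch in an object
          // keyed by the branch type name.
          ByteArrayOutputStream out = new ByteArrayOutputStream();
          JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
          new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
          encoder.flush();

          // Prints: {"nation":{"bytes":"Canada"}}
          System.out.println(out.toString(StandardCharsets.UTF_8.name()));
        }
      }
      ```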

      This later causes the test case to fail when `nation` is retrieved from the record, since `getNestedFieldVal` expects the value to be nested rather than a plain String, resulting in the exception below.
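
      As a rough illustration of that constraint, here is a simplified, hypothetical sketch of dot-separated nested-field retrieval on Avro records. It is not the actual HoodieAvroUtils.getNestedFieldVal implementation, but it shows why a value that is a plain String where a nested structure is expected cannot be descended into.

      ```
      import org.apache.avro.generic.GenericRecord;

      public class NestedFieldLookupSketch {

        // Simplified stand-in for nested-field retrieval (hypothetical; not the
        // real Hudi code). The lookup can only descend into values that are
        // themselves GenericRecords, so a plain String where a nested structure
        // is expected trips the error path.
        public static Object getNestedValue(GenericRecord record, String dottedPath) {
          String[] parts = dottedPath.split("\\.");
          Object current = record;
          for (String part : parts) {
            if (!(current instanceof GenericRecord)) {
              throw new IllegalStateException(
                  "Expected a nested record at '" + part + "' but found: " + current);
            }
            current = ((GenericRecord) current).get(part);
          }
          return current;
        }
      }
      ```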

      ```
      at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:514)
      at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldValAsString(HoodieAvroUtils.java:487)
      at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldValAsString(HoodieAvroUtils.java:487)
      at org.apache.hudi.keygen.KeyGenUtils.getRecordKey(KeyGenUtils.java:96)
      at org.apache.hudi.keygen.ComplexAvroKeyGenerator.getRecordKey(ComplexAvroKeyGenerator.java:47)
      at org.apache.hudi.keygen.ComplexKeyGenerator.getRecordKey(ComplexKeyGenerator.java:53)
      at org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:65)
      at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$10(HoodieSparkSqlWriter.scala:279)
      at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
      at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:224)
      at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
      at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
      at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
      at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
      at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
      at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
      at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
      at org.apache.spark.scheduler.Task.run(Task.scala:131)
      at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
      at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:750)
      ```

People

    • Assignee: Unassigned
    • Reporter: Rahil Chertara (rchertara)