Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.2.1
- Fix Version/s: None
- Component/s: None
Description
Describe the bug
We save a table containing a `DecimalType` column from a Spark DataFrame using the `Avro` data format, and we want to query this table both from Spark and directly from the Hive instance that Spark uses. Suppose `DecimalType(6, 3)` is part of the schema.
When we `INSERT` a valid value (e.g. `BigDecimal("333.222")`) through the DataFrame and then `SELECT` from the table in HiveQL, we expect to get the inserted value back. Instead, we get an `AvroTypeException`.
To Reproduce
On Spark 3.2.1 (commit `4f25b3f712`), using `spark-shell` with the Avro package:

```shell
./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:3.2.1
```
Execute the following:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val rdd = sc.parallelize(Seq(Row(BigDecimal("333.222"))))
val schema = new StructType().add(StructField("c1", DecimalType(6, 3), true))
val df = spark.createDataFrame(rdd, schema)
df.show(false) // results in an error, although the expected output is still printed at the end
df.write.mode("overwrite").format("avro").saveAsTable("ws")
```
`df.show(false)` produces the following error before printing the expected output `333.222`:
```
java.lang.AssertionError: assertion failed:
  Decimal$DecimalIsFractional
     while compiling: <console>
        during phase: globalPhase=terminal, enteringPhase=jvm
     library version: version 2.12.15
    compiler version: version 2.12.15
  reconstructed args: -classpath /Users/xsystem/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar:/Users/xsystem/.ivy2/jars/org.tukaani_xz-1.8.jar:/Users/xsystem/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar -Yrepl-class-based -Yrepl-outdir /private/var/folders/01/bm1ky3qj3sq7gb5f345nxlcm0000gn/T/spark-ed7aba34-997a-4950-9ea4-52c61c222660/repl-bd6bbf2b-5647-4306-a5d3-50cdc30fcbc0

  last tree to typer: TypeTree(class Byte)
       tree position: line 6 of <console>
            tree tpe: Byte
              symbol: (final abstract) class Byte in package scala
   symbol definition: final abstract class Byte extends (a ClassSymbol)
      symbol package: scala
       symbol owners: class Byte
           call site: constructor $eval in object $eval in package $line19

== Source file context for tree position ==
     3
     4 object $eval {
     5   lazy val $result = $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0
     6   lazy val $print: _root_.java.lang.String = {
     7     $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw
     8
     9 ""
  at scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185)
  at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525)
  at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
  at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353)
  at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346)
  at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348)
  at scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487)
  at scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799)
  at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805)
  at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28)
  at scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324)
  at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342)
  at scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645)
  at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413)
  at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357)
  at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188)
  at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:357)
  at scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$run$1(UnPickler.scala:96)
  at scala.reflect.internal.pickling.UnPickler$Scan.run(UnPickler.scala:88)
  at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:47)
  at scala.tools.nsc.symtab.classfile.ClassfileParser.unpickleOrParseInnerClasses(ClassfileParser.scala:1186)
  at scala.tools.nsc.symtab.classfile.ClassfileParser.parseClass(ClassfileParser.scala:468)
  at scala.tools.nsc.symtab.classfile.ClassfileParser.$anonfun$parse$2(ClassfileParser.scala:161)
  at scala.tools.nsc.symtab.classfile.ClassfileParser.$anonfun$parse$1(ClassfileParser.scala:147)
  at scala.tools.nsc.symtab.classfile.ClassfileParser.parse(ClassfileParser.scala:130)
  at scala.tools.nsc.symtab.SymbolLoaders$ClassfileLoader.doComplete(SymbolLoaders.scala:343)
  at scala.tools.nsc.symtab.SymbolLoaders$SymbolLoader.complete(SymbolLoaders.scala:250)
  at scala.tools.nsc.symtab.SymbolLoaders$SymbolLoader.load(SymbolLoaders.scala:269)
  at scala.reflect.internal.Symbols$Symbol.exists(Symbols.scala:1104)
  at scala.reflect.internal.Symbols$Symbol.toOption(Symbols.scala:2609)
  at scala.tools.nsc.interpreter.IMain.translateSimpleResource(IMain.scala:340)
  at scala.tools.nsc.interpreter.IMain$TranslatingClassLoader.findAbstractFile(IMain.scala:354)
  at scala.reflect.internal.util.AbstractFileClassLoader.findResource(AbstractFileClassLoader.scala:76)
  at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
  at java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1307)
  at scala.reflect.internal.util.RichClassLoader$.classAsStream$extension(ScalaClassLoader.scala:89)
  at scala.reflect.internal.util.RichClassLoader$.classBytes$extension(ScalaClassLoader.scala:81)
  at scala.reflect.internal.util.ScalaClassLoader.classBytes(ScalaClassLoader.scala:131)
  at scala.reflect.internal.util.ScalaClassLoader.classBytes$(ScalaClassLoader.scala:131)
  at scala.reflect.internal.util.AbstractFileClassLoader.classBytes(AbstractFileClassLoader.scala:41)
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:70)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
  at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:109)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
  at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.codehaus.janino.ClassLoaderIClassLoader.findIClass(ClassLoaderIClassLoader.java:89)
  at org.codehaus.janino.IClassLoader.loadIClass(IClassLoader.java:317)
  at org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:8618)
  at org.codehaus.janino.UnitCompiler.reclassifyName(UnitCompiler.java:8838)
  at org.codehaus.janino.UnitCompiler.reclassifyName(UnitCompiler.java:8529)
  at org.codehaus.janino.UnitCompiler.reclassify(UnitCompiler.java:8388)
  at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6900)
  at org.codehaus.janino.UnitCompiler.access$14600(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$22$2$1.visitAmbiguousName(UnitCompiler.java:6518)
  at org.codehaus.janino.UnitCompiler$22$2$1.visitAmbiguousName(UnitCompiler.java:6515)
  at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4429)
  at org.codehaus.janino.UnitCompiler$22$2.visitLvalue(UnitCompiler.java:6515)
  at org.codehaus.janino.UnitCompiler$22$2.visitLvalue(UnitCompiler.java:6511)
  at org.codehaus.janino.Java$Lvalue.accept(Java.java:4353)
  at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6511)
  at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6490)
  at org.codehaus.janino.Java$Rvalue.accept(Java.java:4321)
  at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6490)
  at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9110)
  at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:7164)
  at org.codehaus.janino.UnitCompiler.access$16200(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$22$2.visitMethodInvocation(UnitCompiler.java:6538)
  at org.codehaus.janino.UnitCompiler$22$2.visitMethodInvocation(UnitCompiler.java:6511)
  at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:5286)
  at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6511)
  at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6490)
  at org.codehaus.janino.Java$Rvalue.accept(Java.java:4321)
  at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6490)
  at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9306)
  at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9192)
  at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9110)
  at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:5055)
  at org.codehaus.janino.UnitCompiler.access$9100(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$16.visitMethodInvocation(UnitCompiler.java:4482)
  at org.codehaus.janino.UnitCompiler$16.visitMethodInvocation(UnitCompiler.java:4455)
  at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:5286)
  at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4455)
  at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5683)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3839)
  at org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3799)
  at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3779)
  at org.codehaus.janino.Java$Assignment.accept(Java.java:4690)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3779)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2366)
  at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1497)
  at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1490)
  at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:3064)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
  at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1573)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1559)
  at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1496)
  at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1490)
  at org.codehaus.janino.Java$Block.accept(Java.java:2969)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2478)
  at org.codehaus.janino.UnitCompiler.access$1900(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1498)
  at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1490)
  at org.codehaus.janino.Java$IfStatement.accept(Java.java:3140)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
  at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1573)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3420)
  at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1362)
  at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1335)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:807)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:975)
  at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:392)
  at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:384)
  at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1445)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:384)
  at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1312)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:833)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:410)
  at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:389)
  at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:384)
  at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1594)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:384)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:362)
  at org.codehaus.janino.UnitCompiler.access$000(UnitCompiler.java:226)
  at org.codehaus.janino.UnitCompiler$1.visitCompilationUnit(UnitCompiler.java:336)
  at org.codehaus.janino.UnitCompiler$1.visitCompilationUnit(UnitCompiler.java:333)
  at org.codehaus.janino.Java$CompilationUnit.accept(Java.java:363)
  at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:333)
  at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:235)
  at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:464)
  at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:314)
  at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:237)
  at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205)
  at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1489)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1586)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1583)
  at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
  at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
  at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
  at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
  at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
  at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1436)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:378)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:331)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:34)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1362)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:204)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:193)
  at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:131)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
error: error while loading Decimal, class file '/Users/xsystem/spark1/assembly/target/scala-2.12/jars/spark-catalyst_2.12-3.2.1.jar(org/apache/spark/sql/types/Decimal.class)' is broken
(class java.lang.RuntimeException/error reading Scala signature of Decimal.class: assertion failed:
  Decimal$DecimalIsFractional
     while compiling: <console>
        during phase: globalPhase=terminal, enteringPhase=jvm
     library version: version 2.12.15
    compiler version: version 2.12.15
  reconstructed args: -classpath /Users/xsystem/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar:/Users/xsystem/.ivy2/jars/org.tukaani_xz-1.8.jar:/Users/xsystem/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar -Yrepl-class-based -Yrepl-outdir /private/var/folders/01/bm1ky3qj3sq7gb5f345nxlcm0000gn/T/spark-ed7aba34-997a-4950-9ea4-52c61c222660/repl-bd6bbf2b-5647-4306-a5d3-50cdc30fcbc0

  last tree to typer: TypeTree(class Byte)
       tree position: line 6 of <console>
            tree tpe: Byte
              symbol: (final abstract) class Byte in package scala
   symbol definition: final abstract class Byte extends (a ClassSymbol)
      symbol package: scala
       symbol owners: class Byte
           call site: constructor $eval in object $eval in package $line19

== Source file context for tree position ==
     3
     4 object $eval {
     5   lazy val $result = res0
     6   lazy val $print: _root_.java.lang.String = {
     7     $iw
     8
     9 ""
)
+-------+
|c1     |
+-------+
|333.222|
+-------+
```
Running Hive 3.1.2 with the debug logger:

```shell
./hive/bin/hive --hiveconf hive.root.logger=DEBUG,console
```

and executing the following query:

```sql
select * from ws;
```

results in an `AvroTypeException` with the following error message:
```
java.io.IOException: org.apache.avro.AvroTypeException: Found topLevelRecord.c1.fixed, expecting union
  at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:602)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2691)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: org.apache.avro.AvroTypeException: Found topLevelRecord.c1.fixed, expecting union
  at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
  at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
  at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
  at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
  at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
  at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
  at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
  at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:193)
  at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:58)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:569)
  ... 16 more
```
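For context, here is our reading of "Found topLevelRecord.c1.fixed, expecting union" in Avro schema-resolution terms. The two schema fragments below are reconstructed by hand for illustration (they are an assumption, not dumped from the actual files): the writer side declares the decimal column as a named `fixed` type carrying the decimal logical type, while the reader side declares a union that contains a `bytes`-backed decimal but no `fixed` branch, so the resolving decoder fails at exactly that position.

```json
{
  "writer_c1_schema_assumed": {
    "type": "fixed",
    "name": "fixed",
    "namespace": "topLevelRecord.c1",
    "size": 3,
    "logicalType": "decimal",
    "precision": 6,
    "scale": 3
  },
  "reader_c1_schema_assumed": [
    "null",
    { "type": "bytes", "logicalType": "decimal", "precision": 6, "scale": 3 }
  ]
}
```

Per the Avro specification, a writer's non-union type resolves against a reader's union only if some branch of the union matches it, and `fixed` and `bytes` are distinct types, so under this assumption no branch matches.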
Expected behavior
We expect the `SELECT` to return `333.222`. With other formats such as Parquet, the round trip behaves as expected. We also did not encounter this issue when writing the table in Spark SQL and then reading it in HiveQL, or when querying only within Spark.
Additional context
We have not yet identified the root cause, but here is what we have tried so far:
We believe this is not a Hive problem, because Hive's `AvroSerDe` serializes and deserializes the value symmetrically.
The bug therefore seems to lie either in Avro itself, or in how Spark constructs the Avro object, which we think happens here:
https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala#L133
https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala#L204
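For reference, the linked lines are where Spark encodes a decimal into Avro's fixed-size representation: the unscaled value as a big-endian two's-complement integer, padded to a fixed byte width. A minimal Python sketch of that byte layout (the helper names and the byte-width loop are ours, written from the Avro decimal logical-type rules, not copied from Spark's implementation):

```python
from decimal import Decimal

def encode_decimal_fixed(value: str, precision: int, scale: int) -> bytes:
    """Encode a decimal as Avro decimal-over-fixed: the unscaled value as a
    big-endian two's-complement integer, padded to a fixed byte width."""
    unscaled = int(Decimal(value).scaleb(scale))  # "333.222" -> 333222
    # Smallest signed byte width that can hold any value of this precision.
    size = 1
    while 10 ** precision - 1 >= 1 << (8 * size - 1):
        size += 1
    return unscaled.to_bytes(size, byteorder="big", signed=True)

def decode_decimal_fixed(raw: bytes, scale: int) -> Decimal:
    """Invert the encoding: read the unscaled integer and reapply the scale."""
    return Decimal(int.from_bytes(raw, byteorder="big", signed=True)).scaleb(-scale)

raw = encode_decimal_fixed("333.222", 6, 3)
print(raw.hex())                     # 0515a6 (3 bytes suffice for precision 6)
print(decode_decimal_fixed(raw, 3))  # 333.222
```

The round trip is lossless, which is consistent with our observation above that the byte payload itself is not the problem; the mismatch appears to be in which Avro container type (`fixed` vs `bytes`) carries these bytes.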