HIVE-7787: Reading Parquet file with enum in Thrift Encoding throws NoSuchFieldError

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.14.0
    • Fix Version/s: None
    • Component/s: Database/Schema, Thrift API
    • Labels:
      None
    • Environment:

      Hive 0.12 CDH 5.1.0, Hadoop 2.3.0 CDH 5.1.0

    • Tags:
      Parquet

      Description

      When reading a Parquet file whose original Thrift schema contains a struct with an enum, the following error is thrown (full stack trace below):

       java.lang.NoSuchFieldError: DECIMAL.
      

      Example Thrift Schema:

      enum MyEnumType {
          EnumOne,
          EnumTwo,
          EnumThree
      }
      
      struct MyStruct {
          1: optional MyEnumType myEnumType;
          2: optional string field2;
          3: optional string field3;
      }
      
      struct outerStruct {
          1: optional list<MyStruct> myStructs
      }
      

      Hive Table:

      CREATE EXTERNAL TABLE mytable (
        mystructs array<struct<myenumtype: string, field2: string, field3: string>>
      )
      ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
      STORED AS
      INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
      OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
      ; 
      

      Error Stack trace:

      Java stack trace for Hive 0.12:
      Caused by: java.lang.NoSuchFieldError: DECIMAL
      	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter.getNewConverter(ETypeConverter.java:146)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:31)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.ArrayWritableGroupConverter.<init>(ArrayWritableGroupConverter.java:45)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:34)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:47)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:36)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:40)
      	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableRecordConverter.<init>(DataWritableRecordConverter.java:32)
      	at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.prepareForRead(DataWritableReadSupport.java:128)
      	at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
      	at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
      	at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
      	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:92)
      	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
      	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
      	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
      	... 16 more
      


          Activity

          Arup Malakar added a comment -

          I tried release 1.0 and still have the same problem, so I am going to reopen the JIRA. I will resubmit the patch when I get time.

          Ryan Blue added a comment -

          I'm closing this per discussion with Arup Malakar.

          Arup Malakar added a comment -

          Ryan Blue, I haven't tried trunk yet; I am using the patch I submitted here. But let's close this issue. I will reopen it if I see the issue again after trying Hive trunk/0.15.

          Ryan Blue added a comment -

          Arup Malakar, HIVE-8909 recently fixed the ArrayWritableGroupConverter problem you ran into here, and I see that you found the classpath error that was causing the original issue. Is it okay to close this issue now?

          Arup Malakar added a comment -

          Looks like ArrayWritableGroupConverter enforces that the struct have either 1 or 2 elements. I am not sure of the rationale behind this, since a struct may have more than two elements. I did a quick patch to omit the check and handle any number of fields. I have tested it and it seems to work for the schema in the description. Given there were explicit checks for the field count to be either 1 or 2, I am not sure this is the right approach. Please take a look.
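
          A minimal sketch of the check being discussed, assuming only what the stack traces in this issue show (the "Field count must be either 1 or 2" IllegalStateException raised in the ArrayWritableGroupConverter constructor); it is not the actual Hive source nor the patch mentioned above:

          public class FieldCountCheckSketch {
            // Current behaviour per the stack trace: only 1- or 2-field element
            // groups are accepted when converting an array of structs.
            static void checkStrict(int fieldCount) {
              if (fieldCount < 1 || fieldCount > 2) {
                throw new IllegalStateException(
                    "Field count must be either 1 or 2: " + fieldCount);
              }
            }

            // Relaxation described above: accept any positive field count and build
            // one converter per field instead of special-casing 1 vs 2.
            static void checkRelaxed(int fieldCount) {
              if (fieldCount < 1) {
                throw new IllegalStateException("Field count must be >= 1: " + fieldCount);
              }
            }

            public static void main(String[] args) {
              checkRelaxed(3); // MyStruct in the description has 3 fields: passes
              checkStrict(3);  // throws IllegalStateException, as in the stack trace below
            }
          }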

          Arup Malakar added a comment -

          The exception in my previous comment was due to the fact that the Hadoop cluster I ran on had an older version of Parquet on its classpath.
          I did the following and got rid of the error: SET mapreduce.job.user.classpath.first=true
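
          A sketch of the same setting applied programmatically, assuming Hadoop 2.x's Configuration API (above it is set through the Hive CLI instead):

          import org.apache.hadoop.conf.Configuration;

          public class ClasspathFirstSketch {
            public static void main(String[] args) {
              // Mirrors "SET mapreduce.job.user.classpath.first=true": ask MapReduce to put
              // user-supplied jars (e.g. a newer Parquet) ahead of the cluster's bundled
              // jars on task classpaths.
              Configuration conf = new Configuration();
              conf.setBoolean("mapreduce.job.user.classpath.first", true);
              System.out.println(conf.get("mapreduce.job.user.classpath.first"));
            }
          }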

          But I hit another issue:

          Diagnostic Messages for this Task:
          Error: java.io.IOException: java.lang.reflect.InvocationTargetException
          	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
          	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
          	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:300)
          	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:247)
          	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:371)
          	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:652)
          	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
          	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
          	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
          	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:415)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
          	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
          Caused by: java.lang.reflect.InvocationTargetException
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
          	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
          	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
          	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:286)
          	... 11 more
          Caused by: java.lang.IllegalStateException: Field count must be either 1 or 2: 3
          	at org.apache.hadoop.hive.ql.io.parquet.convert.ArrayWritableGroupConverter.<init>(ArrayWritableGroupConverter.java:38)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:34)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:47)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:36)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:40)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableRecordConverter.<init>(DataWritableRecordConverter.java:35)
          	at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.prepareForRead(DataWritableReadSupport.java:152)
          	at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
          	at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
          	at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
          	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:92)
          	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
          	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
          	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
          

          It appears that ArrayWritableGroupConverter only allows 1 or 2 elements in the struct contained in an array. Is there a reason for that?

          Is the following schema not supported?

          enum MyEnumType {
              EnumOne,
              EnumTwo,
              EnumThree
          }
          struct MyStruct {
              1: optional MyEnumType myEnumType;
              2: optional string field2;
              3: optional string field3;
          }
          
          struct outerStruct {
              1: optional list<MyStruct> myStructs
          }
          

          I can file another JIRA for this issue.

          Arup Malakar added a comment -

          I tried building Hive from trunk and running it, but I am seeing the same error:

          Caused by: java.lang.NoSuchFieldError: DECIMAL
          	at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter.getNewConverter(ETypeConverter.java:168)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:31)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:40)
          	at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableRecordConverter.<init>(DataWritableRecordConverter.java:35)
          	at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.prepareForRead(DataWritableReadSupport.java:152)
          	at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
          	at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
          	at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
          	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:92)
          	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
          	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
          	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
          	... 16 more
          
          Vikram Dixit K added a comment -

          Raymond Lau, can you verify whether this is fixed by HIVE-6367?

          Thanks
          Vikram.

          Svend Vanderveken added a comment -

          This might be due to HIVE-6367

          Svend Vanderveken added a comment -

          I encountered a very similar issue when importing data from a Hive external table in raw CSV format into a Parquet table with CDH 5.1:

          create external table if not exists testsv.objects_raw (
            objectid string,
            model string,
            owner string,
            attributes map<string,string>)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
           STORED AS TEXTFILE
           location '/test/somefolder';
          

          (load some data in csv format in /test/somefolder)

          create table if not exists testsv.objects (
            objectid string,
            model string,
            owner string,
            attributes map<string,string>)
           ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
           STORED AS
           INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
           OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
          
           insert overwrite table testsv.objects select source.* from testsv.objects_raw source;
          
          2014-09-17 10:58:39,436 Stage-3 map = 100%,  reduce = 0%
          Ended Job = job_1410534905977_0011 with errors
          Error during job, obtaining debugging information...
          Examining task ID: task_1410534905977_0011_m_000000 (and more) from job job_1410534905977_0011
          
          Task with the most failures(4):
          -----
          Task ID:
            task_1410534905977_0011_m_000000
          
          URL:
            http://vm28-hulk-priv:8088/taskdetails.jsp?jobid=job_1410534905977_0011&tipid=task_1410534905977_0011_m_000000
          -----
          Diagnostic Messages for this Task:
          Error: java.io.IOException: java.lang.reflect.InvocationTargetException
                  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
                  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
                  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:346)
                  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:293)
                  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:407)
                  at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:560)
                  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
                  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
                  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
                  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
                  at java.security.AccessController.doPrivileged(Native Method)
                  at javax.security.auth.Subject.doAs(Subject.java:415)
                  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
                  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
          Caused by: java.lang.reflect.InvocationTargetException
                  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
                  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
                  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:332)
                  ... 11 more
          Caused by: java.lang.NoSuchFieldError: DECIMAL
                  at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter.getNewConverter(ETypeConverter.java:146)
                  at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:31)
                  at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64)
                  at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:40)
                  at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableRecordConverter.<init>(DataWritableRecordConverter.java:32)
                  at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.prepareForRead(DataWritableReadSupport.java:128)
                  at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
                  at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
                  at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
                  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:92)
                  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
                  at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
                  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
                  ... 16 more
          
          Raymond Lau added a comment -

          This issue does not occur in Hive 0.12 CDH 5.0.0; ETypeConverter.getNewConverter in that version does not have checks involving the DECIMAL type.
          CDH 5.0.0

          public static Converter getNewConverter(final Class<?> type, final int index, final HiveGroupConverter parent) {
              for (final ETypeConverter eConverter : values()) {
                if (eConverter.getType() == type) {
                  return eConverter.getConverter(type, index, parent);
                }
              }
              throw new IllegalArgumentException("Converter not found ... for type : " + type);
            }
          

          CDH 5.1.0

          public static Converter getNewConverter(final PrimitiveType type, final int index, final HiveGroupConverter parent) {
              if (type.isPrimitive() && (type.asPrimitiveType().getPrimitiveTypeName().equals(PrimitiveType.PrimitiveTypeName.INT96))) {
                //TODO- cleanup once parquet support Timestamp type annotation.
                return ETypeConverter.ETIMESTAMP_CONVERTER.getConverter(type, index, parent);
              }
              if (OriginalType.DECIMAL == type.getOriginalType()) {
                return EDECIMAL_CONVERTER.getConverter(type, index, parent);
              } else if (OriginalType.UTF8 == type.getOriginalType()) {
                return ESTRING_CONVERTER.getConverter(type, index, parent);
              }
          
              Class<?> javaType = type.getPrimitiveTypeName().javaType;
              for (final ETypeConverter eConverter : values()) {
                if (eConverter.getType() == javaType) {
                  return eConverter.getConverter(type, index, parent);
                }
              }
          
              throw new IllegalArgumentException("Converter not found ... for type : " + type);
            }
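
          For reference, NoSuchFieldError is a link-time error: the CDH 5.1.0 code above references OriginalType.DECIMAL, and the JVM raises this error when the OriginalType class actually loaded at run time lacks that constant, which matches the older-Parquet-on-the-classpath explanation in an earlier comment. A standalone sketch of the mechanism, using a hypothetical stand-in enum rather than the real Parquet classes:

          // OriginalTypeSketch stands in for Parquet's OriginalType; hypothetical, for illustration only.
          enum OriginalTypeSketch { UTF8, MAP, LIST, DECIMAL } // version seen at compile time

          public class NoSuchFieldErrorSketch {
            public static void main(String[] args) {
              // Compiled against the enum above. If the jar loaded at run time carries
              // an older OriginalTypeSketch without the DECIMAL constant (e.g. an old
              // Parquet jar earlier on the task classpath), resolving this field
              // reference fails with java.lang.NoSuchFieldError: DECIMAL -- the error
              // reported in this issue. With matching versions it simply prints DECIMAL.
              System.out.println(OriginalTypeSketch.DECIMAL);
            }
          }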
          

            People

            • Assignee:
              Arup Malakar
              Reporter:
              Raymond Lau
            • Votes:
              2
              Watchers:
              6
