HIVE-22833

NPE when using concatenate with AWS Glue

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.6
    • Fix Version/s: None
    • Component/s: Hive, ORC
    • Labels:
      None
    • Environment:

      The Hive version is Hive 2.3.6-amzn-0, running on EMR release label emr-5.28.0, with Glue as the data catalog.

      Description

      When running

      alter table query.payment_transaction partition (insert_date='2017-10-27') concatenate;

      I get an NPE, with the following stack trace in the logs:

      8-4207-837d-4fd6ea87b397): alter table query.payment_transaction partition (insert_date='2017-10-27') concatenate
      2020-02-05T13:55:50,840 ERROR [c8ef752c-2f03-460a-b131-ef3f6be4ddf8 main([])]: ql.Driver (SessionState.java:printError(1130)) - FAILED: SemanticException java.lang.NullPointerException
      org.apache.hadoop.hive.ql.parse.SemanticException: java.lang.NullPointerException
          at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTablePartMergeFiles(DDLSemanticAnalyzer.java:1699)
          at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:305)
          at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
          at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
          at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
          at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
          at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
          at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
          at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
          at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
          at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
          at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
          at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
      Caused by: java.lang.NullPointerException
          at org.apache.hadoop.hive.ql.metadata.Partition.getSkewedColNames(Partition.java:557)
          at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTablePartMergeFiles(DDLSemanticAnalyzer.java:1626)
          ... 19 more
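
      The "Caused by" frame points at Partition.getSkewedColNames. One possible reading (an assumption, not verified against the Hive source or the Glue catalog client) is that the skewed-column metadata of the partition comes back as null from the catalog and is dereferenced without a check. A minimal, self-contained Java sketch of that failure mode follows. The classes SkewedInfo and StorageDescriptor below are hypothetical stand-ins for illustration only, not Hive's real classes:

```java
import java.util.Collections;
import java.util.List;

public class SkewedInfoSketch {
    // Hypothetical stand-in for the skewed-column metadata of a partition.
    static class SkewedInfo {
        List<String> skewedColNames;
        SkewedInfo(List<String> skewedColNames) { this.skewedColNames = skewedColNames; }
    }

    // Hypothetical stand-in for a storage descriptor returned by a catalog.
    static class StorageDescriptor {
        SkewedInfo skewedInfo; // assumed to be null in the failing case
        StorageDescriptor(SkewedInfo skewedInfo) { this.skewedInfo = skewedInfo; }
    }

    // Unguarded access: throws NullPointerException when skewedInfo is null.
    static List<String> skewedColNamesUnsafe(StorageDescriptor sd) {
        return sd.skewedInfo.skewedColNames;
    }

    // Guarded access: treats missing metadata as "no skewed columns".
    static List<String> skewedColNamesSafe(StorageDescriptor sd) {
        if (sd == null || sd.skewedInfo == null || sd.skewedInfo.skewedColNames == null) {
            return Collections.emptyList();
        }
        return sd.skewedInfo.skewedColNames;
    }

    public static void main(String[] args) {
        // Simulates a partition whose skewed-column metadata is absent.
        StorageDescriptor fromCatalog = new StorageDescriptor(null);
        System.out.println("guarded result: " + skewedColNamesSafe(fromCatalog));
        try {
            skewedColNamesUnsafe(fromCatalog);
        } catch (NullPointerException e) {
            System.out.println("unguarded access threw NullPointerException");
        }
    }
}
```

      If this reading is right, the table definition itself would not be at fault, since the table declares no skewed columns; the difference would lie in how the catalog represents that absence.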
      

      The table definition looks like this:

      CREATE TABLE `query.payment_transaction`(
        `player_id` string,
        `seamless_wallet_id` string,
        `transaction_timestamp` timestamp,
        `creation_time` bigint,
        `seamless_wallet_transaction_id` string,
        `currency` string,
        `real_delta` decimal(38,18),
        `real_delta_eur` decimal(38,18),
        `transaction_type` string,
        `payment_method_id` string,
        `execution_provider` string,
        `channel` string,
        `channel_subtype` string,
        `session_id` string)
      PARTITIONED BY (
        `insert_date` string)
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
      LOCATION
        's3://prod.casumo.bigdata.hive/casumo_query/payment_transaction'
      TBLPROPERTIES (
        'spark.sql.create.version'='2.2 or prior',
        'spark.sql.sources.schema.numPartCols'='1',
        'spark.sql.sources.schema.numParts'='1',
        'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"player_id","type":"string","nullable":true,"metadata":{}},{"name":"seamless_wallet_id","type":"string","nullable":true,"metadata":{}},{"name":"transaction_timestamp","type":"timestamp","nullable":true,"metadata":{}},{"name":"creation_time","type":"long","nullable":true,"metadata":{}},{"name":"seamless_wallet_transaction_id","type":"string","nullable":true,"metadata":{}},{"name":"currency","type":"string","nullable":true,"metadata":{}},{"name":"real_delta","type":"decimal(38,18)","nullable":true,"metadata":{}},{"name":"real_delta_eur","type":"decimal(38,18)","nullable":true,"metadata":{}},{"name":"transaction_type","type":"string","nullable":true,"metadata":{}},{"name":"payment_method_id","type":"string","nullable":true,"metadata":{}},{"name":"execution_provider","type":"string","nullable":true,"metadata":{}},{"name":"channel","type":"string","nullable":true,"metadata":{}},{"name":"channel_subtype","type":"string","nullable":true,"metadata":{}},{"name":"session_id","type":"string","nullable":true,"metadata":{}},{"name":"insert_date","type":"string","nullable":true,"metadata":{}}]}',
        'spark.sql.sources.schema.partCol.0'='insert_date')
      

      It is also worth mentioning that we are using AWS Glue as the metastore, and that this table was imported from a regular MySQL metastore.

      Can someone guide me toward a solution for this?

      Thank you all!

            People

            • Assignee: Unassigned
            • Reporter: maduxi Madhava Carrillo