IMPALA-5267

test_seq_writer_hive_compatibility hits error running statement on Hive

      Description

      The ASF master core build hits an error from Hive during the MapReduce job when running the "select count(*) from table" portion of the test_seq_writer_hive_compatibility test. This may be a Hive bug, but we should track down whether anything about this test is triggering it.

      F query_test/test_compressed_formats.py::TestTableWriters::()::test_seq_writer_hive_compatibility[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]
      query_test/test_compressed_formats.py:177: in test_seq_writer_hive_compatibility
      output = self.run_stmt_in_hive('select count(*) from %s' % table_name)
      common/impala_test_suite.py:609: in run_stmt_in_hive
      raise RuntimeError(stderr)
      E RuntimeError: SLF4J: Class path contains multiple SLF4J bindings.
      E SLF4J: Found binding in [jar:file:/data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.12.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      E SLF4J: Found binding in [jar:file:/data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.12.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      E SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      E SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      E scan complete in 5ms
      E Connecting to jdbc:hive2://localhost:11050
      E Connected to: Apache Hive (version 1.1.0-cdh5.12.0-SNAPSHOT)
      E Driver: Hive JDBC (version 1.1.0-cdh5.12.0-SNAPSHOT)
      E Transaction isolation: TRANSACTION_REPEATABLE_READ
      E INFO : Compiling command(queryId=jenkins_20170501011717_5640b961-12ca-4ac9-a823-31d19af5b369): select count(*) from test_seq_writer_hive_compatibility_e3728f35.seq_tbl_GZIP_RECORD
      E INFO : Semantic Analysis Completed
      E INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
      E INFO : Completed compiling command(queryId=jenkins_20170501011717_5640b961-12ca-4ac9-a823-31d19af5b369); Time taken: 0.173 seconds
      E INFO : Executing command(queryId=jenkins_20170501011717_5640b961-12ca-4ac9-a823-31d19af5b369): select count(*) from test_seq_writer_hive_compatibility_e3728f35.seq_tbl_GZIP_RECORD
      E INFO : Query ID = jenkins_20170501011717_5640b961-12ca-4ac9-a823-31d19af5b369
      E INFO : Total jobs = 1
      E INFO : Launching Job 1 out of 1
      E INFO : Starting task [Stage-1:MAPRED] in serial mode
      E INFO : Number of reduce tasks determined at compile time: 1
      E INFO : In order to change the average load for a reducer (in bytes):
      E INFO : set hive.exec.reducers.bytes.per.reducer=<number>
      E INFO : In order to limit the maximum number of reducers:
      E INFO : set hive.exec.reducers.max=<number>
      E INFO : In order to set a constant number of reducers:
      E INFO : set mapreduce.job.reduces=<number>
      E INFO : number of splits:1
      E INFO : Submitting tokens for job: job_local220383829_0007
      E INFO : The url to track the job: http://localhost:8080/
      E INFO : Job running in-process (local Hadoop)
      E INFO : 2017-05-01 01:17:03,363 Stage-1 map = 0%, reduce = 0%
      E ERROR : Ended Job = job_local220383829_0007 with errors
      E ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
      E INFO : MapReduce Jobs Launched:
      E INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
      E INFO : Total MapReduce CPU Time Spent: 0 msec
      E INFO : Completed executing command(queryId=jenkins_20170501011717_5640b961-12ca-4ac9-a823-31d19af5b369); Time taken: 2.767 seconds
      E Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
      E Closing: 0: jdbc:hive2://localhost:11050
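
      Most of the RuntimeError text above is routine client output (SLF4J warnings and INFO lines), because the helper raises the statement's entire stderr. A minimal sketch for pulling out only the error lines, assuming the "E " prefix and the "ERROR :" / "Error:" markers seen in this trace; the function name error_lines is hypothetical and not part of the Impala test suite:

```python
def error_lines(trace_text):
    """Return only the error-level lines from a pytest-style trace.

    Assumes each line may carry pytest's leading 'E ' marker and that
    errors start with 'ERROR :' or 'Error:' as in the trace above.
    """
    found = []
    for raw in trace_text.splitlines():
        line = raw.strip()
        # Drop the pytest 'E ' prefix if present.
        if line.startswith("E "):
            line = line[2:].strip()
        if line.startswith("ERROR :") or line.startswith("Error:"):
            found.append(line)
    return found


# A few lines copied from the trace above.
sample = """\
E SLF4J: Class path contains multiple SLF4J bindings.
E INFO : Total jobs = 1
E ERROR : Ended Job = job_local220383829_0007 with errors
E ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
E INFO : Total MapReduce CPU Time Spent: 0 msec
E Closing: 0: jdbc:hive2://localhost:11050
"""

errors = error_lines(sample)
for line in errors:
    print(line)
```

      On the sample lines this keeps just the two "ERROR :" entries, which is where the MapRedTask failure is reported.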

      The Hive logs contain this error:
      2017-05-01 01:17:02,873 FATAL mr.ExecMapper (ExecMapper.java:map(178)) - java.lang.IllegalStateException: Invalid input path hdfs://localhost:20500/test-warehouse/decimal_tbl/d6=1/decimal_tbl.txt
      at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:410)
      at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:446)
      at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
      at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
      at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
      at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      2017-05-01 01:17:02,873 INFO exec.MapOperator (Operator.java:close(595)) - 137 finished. closing...
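
      Note that the path in the FATAL line points at decimal_tbl, not at the seq_tbl_GZIP_RECORD table the failing query reads. A minimal sketch for checking this kind of mismatch mechanically, assuming the table's directory name under the warehouse matches the unqualified table name; the helper names are hypothetical and not part of the Impala test suite:

```python
import re

# Matches the message format seen in the Hive log above.
INVALID_PATH_RE = re.compile(r"Invalid input path (\S+)")


def invalid_input_paths(log_text):
    """Return every path reported in an 'Invalid input path' message."""
    return INVALID_PATH_RE.findall(log_text)


def path_belongs_to_table(path, table_name):
    """Heuristic (assumption): the unqualified table name appears as a
    directory component of the table's HDFS path."""
    unqualified = table_name.split(".")[-1].lower()
    components = [part.lower() for part in path.split("/")]
    return unqualified in components


log_line = (
    "2017-05-01 01:17:02,873 FATAL mr.ExecMapper (ExecMapper.java:map(178)) - "
    "java.lang.IllegalStateException: Invalid input path "
    "hdfs://localhost:20500/test-warehouse/decimal_tbl/d6=1/decimal_tbl.txt"
)
queried_table = "test_seq_writer_hive_compatibility_e3728f35.seq_tbl_GZIP_RECORD"

paths = invalid_input_paths(log_line)
# Paths that do not look like they belong to the table being queried.
foreign_paths = [p for p in paths if not path_belongs_to_table(p, queried_table)]
print(foreign_paths)
```

      A non-empty foreign_paths list suggests the mapper was handed input from a different table, which would point at interference from another query rather than at this test's own data.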

          Activity

          attilaj Attila Jeges added a comment -

          I believe the Hive error above was caused by another test, 'TestHdfsParquetTableStatsWriter::test_write_statistics_decimal', which ran before 'test_seq_writer_hive_compatibility'. Despite the error, 'test_write_statistics_decimal' eventually succeeded.

          'test_seq_writer_hive_compatibility' failed with the following Hive error:
          2017-05-01 01:17:03,718 ERROR operation.Operation (SQLOperation.java:run(296)) - Error running hive query:
          org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
          at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
          at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:239)
          at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
          at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:415)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
          at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)

          I'm not sure what "return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask" means. We need to investigate further.

          dhecht Dan Hecht added a comment -

          Attila Jeges, any update on this?

          attilaj Attila Jeges added a comment -

          Dan Hecht, the test failed because of a Hive issue that can cause failures when queries run in parallel against a minicluster DB. The test failure does not reproduce consistently. The Hive issue is tracked by HIVE-16345, which was resolved on April 10. Peter Vary, thanks for tracking it down!

          dhecht Dan Hecht added a comment -

          Thanks for the update, Attila Jeges. This doesn't sound like an Impala Blocker to me. Could you please adjust the priority to something more appropriate? Also, if there's nothing more for Impala to do (the only thing would be to pull in a new Hive sooner?), then we might as well resolve this as a dup of the Hive JIRA.

          attilaj Attila Jeges added a comment -

          Dan Hecht, done.


            People

            • Assignee:
              attilaj Attila Jeges
              Reporter:
              joemcdonnell Joe McDonnell