  1. Hive
  2. HIVE-4009

CLI Tests fail randomly due to MapReduce LocalJobRunner race condition

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Hadoop has a race condition (MAPREDUCE-5001) that causes tests to fail randomly when using LocalJobRunner.

      1. HIVE-4009-0.patch
        5 kB
        Brock Noland
      2. HIVE-4009.patch
        5 kB
        Brock Noland

          Activity

          brocknoland Brock Noland added a comment -

          Case 1 in JobClient.getJob():

          2013-02-08 08:26:16,132 FATAL conf.Configuration (Configuration.java:loadResource(2011)) - error parsing conf file:/home/hiveptest/hive/build/test/hadoop-hiveptest/mapred/staging/hiveptest1573470257/.staging/job_local1573470257_0001/job.xml
          java.io.FileNotFoundException: /home/hiveptest/build/test/hadoop-hiveptest/mapred/staging/hiveptest1573470257/.staging/job_local1573470257_0001/job.xml (No such file or directory)
                  at java.io.FileInputStream.open(Native Method)
                  at java.io.FileInputStream.<init>(FileInputStream.java:120)
                  at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1924)
                  at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1877)
                  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1785)
                  at org.apache.hadoop.conf.Configuration.get(Configuration.java:712)
                  at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1951)
                  at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:398)
                  at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:388)
                  at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:174)
                  at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:655)
                  at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:668)
                  at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:282)
                  at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:532)
                  at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:453)
                  at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:689)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
          

          Case 2 in JobClient.getMapTaskReports():

              [junit] java.lang.RuntimeException: java.io.FileNotFoundException: /home/hiveptest/hive/build/test/hadoop-hiveptest/mapred/staging/hiveptest1743741198/.staging/job_local1743741198_0001/job.xml (No such file or directory)
              [junit] 	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2012)
              [junit] 	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1877)
              [junit] 	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1785)
              [junit] 	at org.apache.hadoop.conf.Configuration.get(Configuration.java:712)
              [junit] 	at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1951)
              [junit] 	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:398)
              [junit] 	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:388)
              [junit] 	at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:186)
              [junit] 	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:635)
              [junit] 	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:633)
              [junit] 	at java.security.AccessController.doPrivileged(Native Method)
              [junit] 	at javax.security.auth.Subject.doAs(Subject.java:396)
              [junit] 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1372)
              [junit] 	at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:633)
              [junit] 	at org.apache.hadoop.mapred.JobClient.getTaskReports(JobClient.java:687)
              [junit] 	at org.apache.hadoop.mapred.JobClient.getMapTaskReports(JobClient.java:681)
              [junit] 	at org.apache.hadoop.mapred.JobClient.getMapTaskReports(JobClient.java:700)
              [junit] 	at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:253)
              [junit] 	at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:532)
              [junit] 	at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:453)
              [junit] 	at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:689)
              [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              [junit] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              [junit] 	at java.lang.reflect.Method.invoke(Method.java:597)
              [junit] 	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
              [junit] Caused by: java.io.FileNotFoundException: /home/hiveptest/hive/build/test/hadoop-hiveptest/mapred/staging/hiveptest1743741198/.staging/job_local1743741198_0001/job.xml (No such file or directory)
              [junit] 	at java.io.FileInputStream.open(Native Method)
              [junit] 	at java.io.FileInputStream.<init>(FileInputStream.java:120)
              [junit] 	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1924)
              [junit] 	... 25 more
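
Both traces end in a FileNotFoundException for job.xml under the .staging directory. The following is a minimal sketch of that failure mode, assuming (the thread does not say so explicitly) that the file disappears because the local runner removes the staging directory once the job finishes while the client is still polling. The class StagingRaceSketch and the method tryReadJobXml are hypothetical names used only for illustration; they are not part of Hadoop or Hive.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not Hadoop or Hive code.
class StagingRaceSketch {

    /** Returns true if job.xml could be opened, false if it was already gone. */
    static boolean tryReadJobXml(Path jobXml) throws IOException {
        try (FileInputStream in = new FileInputStream(jobXml.toFile())) {
            in.read();
            return true;
        } catch (FileNotFoundException e) {
            // Same exception type that appears in the traces above.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the per-job .staging directory.
        Path staging = Files.createTempDirectory("staging");
        Path jobXml = staging.resolve("job.xml");
        Files.write(jobXml, "<configuration/>".getBytes());

        // Client polls while the job is still running: the file is present.
        System.out.println("before cleanup: " + tryReadJobXml(jobXml));

        // Assumed behaviour: the runner cleans up as soon as the job ends.
        Files.delete(jobXml);

        // A later poll from the client now finds nothing to read.
        System.out.println("after cleanup: " + tryReadJobXml(jobXml));

        Files.delete(staging);
    }
}
```

In the sketch the two steps run in sequence, so the outcome is fixed. In the real case the polling and the cleanup run concurrently, which is why the failure only shows up intermittently.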
          
          
          brocknoland Brock Noland added a comment -

          Adding link to Phabricator: https://reviews.facebook.net/D8523

          ashutoshc Ashutosh Chauhan added a comment -

          I never hit this in my CLI tests. Brock, can you describe the situation in which you ran into it? Are these HiveServer2 tests?

          brocknoland Brock Noland added a comment -

          Hey,

          I hit this with the parallel test tool. However, I think I was oversubscribing CPU quite a bit, which made this race more likely to be hit. After scaling back the number of unit test threads per host I have not hit it. However, since more parallel testing seems to be coming, let's keep this open for the time being.

          brocknoland Brock Noland added a comment -

          I haven't seen this reproduce in some time. Closing for now.

          brocknoland Brock Noland added a comment -

          I've seen this again. Time to fix it.

          brocknoland Brock Noland added a comment -

          To be clear, although MAPREDUCE-5001 improves the situation in that an exception is not thrown, it's still possible for LJR (LocalJobRunner) to return null and fail. This happens on hosts which are very busy. Let's just not run the racy status section of code when in local mode.
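
The idea of skipping the racy status lookup in local mode can be sketched as follows. This is an illustration only, not the contents of HIVE-4009.patch: the class LocalModeProgressSketch, the method pollStatus, the placeholder strings, and the use of a plain Map in place of a Hadoop configuration are all assumptions. The sketch checks the mapreduce.framework.name key for the value "local"; a real configuration would also apply defaults and may use other keys on older Hadoop versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; names are illustrative, not Hive's actual code.
class LocalModeProgressSketch {

    /** Treats the job as local when mapreduce.framework.name is "local". */
    static boolean isLocalMode(Map<String, String> conf) {
        return "local".equals(conf.get("mapreduce.framework.name"));
    }

    /**
     * Returns a status line for the progress loop. In local mode the lookup
     * is never invoked, so a missing job.xml or a null job cannot fail the run.
     */
    static String pollStatus(Map<String, String> conf, Supplier<String> racyLookup) {
        if (isLocalMode(conf)) {
            return "status unavailable in local mode";
        }
        String status = racyLookup.get();
        // Outside local mode, still tolerate a null result.
        return status != null ? status : "status unknown";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Local mode: the failing lookup is skipped entirely.
        conf.put("mapreduce.framework.name", "local");
        System.out.println(pollStatus(conf, () -> {
            throw new IllegalStateException("job.xml missing");
        }));

        // Cluster mode: the lookup runs as usual.
        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(pollStatus(conf, () -> "map 100% reduce 0%"));
    }
}
```

The trade-off in this sketch is that local-mode runs lose the detailed status output, in exchange for not depending on files that may already have been cleaned up.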

          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12682533/HIVE-4009.patch

          ERROR: -1 due to 1 failed/errored test(s), 6651 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1874/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1874/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1874/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12682533 - PreCommit-HIVE-TRUNK-Build

          szehon Szehon Ho added a comment -

          Makes sense to me not to run the racy section for local-mode. +1

          brocknoland Brock Noland added a comment -

          Thank you Szehon! I hope this will reduce test flakiness. I've committed this to trunk.


            People

            • Assignee:
              brocknoland Brock Noland
              Reporter:
              brocknoland Brock Noland
            • Votes:
              0
              Watchers:
              3
