HIVE-1019

java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: Server Infrastructure
    • Labels: None
    • Release Note:
      Fix for java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
      and java.io.IOException: cannot find dir = .... in partToPartitionInfo!
      when running multiple threads with roughly similar queries.

      Description

      I keep getting errors like this:

      java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

      and:
      java.io.IOException: cannot find dir = hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in partToPartitionInfo!

      when running multiple threads with roughly similar queries.
      I have a patch for this which works for me.
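      For illustration, this failure mode typically needs two threads submitting near-identical queries at the same time, e.g. over JDBC. A minimal sketch of such a driver (the connection URL, credentials, and the "test" table are assumptions for the example, not taken from this issue):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class ConcurrentHiveQueries {
          // Hypothetical connection settings; adjust for your cluster.
          private static final String URL = "jdbc:hive://localhost:10000/default";

          public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Runnable query = new Runnable() {
              public void run() {
                try {
                  Connection con = DriverManager.getConnection(URL, "", "");
                  Statement stmt = con.createStatement();
                  // Two near-identical queries submitted at the same time can end up
                  // with the same job name / plan file, triggering the collision.
                  stmt.execute("SELECT COUNT(1) FROM test");
                  stmt.close();
                  con.close();
                } catch (Exception e) {
                  e.printStackTrace();
                }
              }
            };
            Thread t1 = new Thread(query);
            Thread t2 = new Thread(query);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
          }
        }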

      Attachments

      1. stacktrace2.txt
        4 kB
        Bennie Schut
      2. HIVE-1019-8.patch
        67 kB
        Ning Zhang
      3. HIVE-1019-7.patch
        28 kB
        Bennie Schut
      4. HIVE-1019-6.patch
        29 kB
        Bennie Schut
      5. HIVE-1019-5.patch
        36 kB
        Bennie Schut
      6. HIVE-1019-4.patch
        32 kB
        Bennie Schut
      7. HIVE-1019-3.patch
        17 kB
        Bennie Schut
      8. HIVE-1019-2.patch
        28 kB
        Bennie Schut
      9. HIVE-1019-1.patch
        26 kB
        Bennie Schut
      10. HIVE-1019.patch
        28 kB
        Bennie Schut

        Activity

        Bennie Schut added a comment -

        Just found at least one way to cause/reproduce this:

        CREATE TABLE test_1 (
        test_field STRING
        , network STRING
        , version_type STRING
        , country STRING
        , test_ID BIGINT
        ) PARTITIONED BY (appendkey_id INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE;

        CREATE TABLE test_2 (
        test_field STRING
        , test_ID BIGINT
        ) PARTITIONED BY (appendkey_id INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE;

        CREATE TABLE test_month (
        network STRING
        , version_type STRING
        , country STRING
        , total INT
        ) PARTITIONED BY (month STRING, appendkey_id INT)
        ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE;

        from ( select network, version_type, country
        from test_1 t1 join test_2 t2 on (t1.test_ID=t2.test_ID)) subq1

        insert overwrite table test_month partition(month='2004-07', appendkey_id=0) select network, version_type, country , count total
        group by network, version_type, country

        insert overwrite table test_month partition(month='2004-07', appendkey_id=1) select network, version_type, country , count total
        group by network, version_type, country

        On our server it completely kills the hive service.

        Bennie Schut added a comment -

        On HIVE-1524 they added UUIDs in a few places, but the problem seems a bit more complicated. I used this patch for months without seeing this issue. I just updated to trunk without this patch and I get exactly the same error you have:

        java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLANroot_2011-03-29_12-21-37_996_31570073-87f1-4230-88e6-886bed8b6698 (No such file or directory)

        Leo Alekseyev added a comment -

        I get very similar errors when running multiple jobs through JDBC with lots of mappers. After a while, the jobs start crapping out with e.g.
        java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLAN67464971-325d-43bd-8822-c1a5873eb6ef (No such file or directory)

        There is a UUID after HIVE_PLAN, but I see that this issue is still open, so is the fix not complete? Is it a separate issue? Can someone explain what's going on here so that I can try to set up some reproducible tests?
        I am running a fairly fresh (a few months old) trunk build of Hive.

        Dave Brondsema added a comment -

        Never mind, I got that error because we had an old version (0.4.0) on the Hadoop cluster. Upgrading Hive everywhere fixed it.

        Dave Brondsema added a comment -

        I'm getting this error with the simplest of queries, on 0.5.0 release and a recent build of the 0.6.0 SVN branch.

        I have an empty table 'test'.

        hive> select * from test;
        OK
        Time taken: 2.916 seconds
        hive> select count(1) from test;
        Total MapReduce jobs = 1
        Launching Job 1 out of 1
        Number of reduce tasks determined at compile time: 1
        In order to change the average load for a reducer (in bytes):
          set hive.exec.reducers.bytes.per.reducer=<number>
        In order to limit the maximum number of reducers:
          set hive.exec.reducers.max=<number>
        In order to set a constant number of reducers:
          set mapred.reduce.tasks=<number>
        Starting Job = job_201002192051_14077, Tracking URL = http://hadoop-namenode-1.v39.ch3.sourceforge.com:50030/jobdetails.jsp?jobid=job_201002192051_14077
        Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=hadoop-namenode-1.v39.ch3.sourceforge.com:8021 -kill job_201002192051_14077
        2010-08-26 14:53:11,348 Stage-1 map = 0%,  reduce = 0%
        2010-08-26 14:53:21,413 Stage-1 map = 100%,  reduce = 100%
        Ended Job = job_201002192051_14077 with errors
        
        Failed tasks with most(2) failures : 
        Task URL: http://hadoop-namenode-1.v39.ch3.sourceforge.com:50030/taskdetails.jsp?jobid=job_201002192051_14077&tipid=task_201002192051_14077_m_000000
        
        FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver
        

        If I check the job tracker URL it shows:

        2010-08-26 14:41:05,789 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=                                                                                                                                        
        2010-08-26 14:41:05,865 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1                                                                                                                                                                                        
        2010-08-26 14:41:05,881 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100                                                                                                                                                                                         
        2010-08-26 14:41:05,929 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720                                                                                                                                                                          
        2010-08-26 14:41:05,930 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680                                                                                                                                                                            
        2010-08-26 14:41:05,988 WARN org.apache.hadoop.mapred.TaskTracker: Error running child                                                                                                                                                                                  
        java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)                                                                                                                                                                        
                at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:110)                                                                                                                                                                                   
                at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:244)                                                                                                                                                                                  
                at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:208)                                                                                                                                                                       
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)                                                                                                                                                                                                       
                at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)                                                                                                                                                                                       
        Caused by: java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)                                                                                                                                                                                         
                at java.io.FileInputStream.open(Native Method)                                                                                                                                                                                                                  
                at java.io.FileInputStream.<init>(FileInputStream.java:106)
                at java.io.FileInputStream.<init>(FileInputStream.java:66)
                at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:101)                                                                                                                                                                                   
                ... 4 more
        

        hive.exec.parallel is false, and there are no other jobs running.

        Zheng Shao added a comment -

        The concept of a session is longer-lived than a query. See HIVE-584.

        We should not start a new session inside a query. Instead we should introduce a separate concept (maybe a combination of session id and task id) for that and use that for the PLAN.
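        As a rough illustration of that idea (the class and method names below are hypothetical, not existing Hive APIs), the plan file suffix could combine the session id with a per-query counter and the task id, so that neither similar job names nor queries within one session collide:

          import java.util.concurrent.atomic.AtomicLong;

          public final class PlanIds {
            private static final AtomicLong QUERY_COUNTER = new AtomicLong();

            private PlanIds() {
            }

            // Combine the session id, a per-JVM query counter, and the task id
            // instead of reusing the (non-unique) job name or the bare session id.
            public static String planFileSuffix(String sessionId, String taskId) {
              // e.g. "hadoop_201001291121_42_Stage-1"
              return sessionId + "_" + QUERY_COUNTER.incrementAndGet() + "_" + taskId;
            }
          }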

        Ning Zhang added a comment -

        The changes look fine in general. However, I found a new issue related to this JIRA. I'm uploading a new patch HIVE-1019-8.patch to include the fix as well.

        Basically, the issue is that when we set hive.exec.parallel=true, two parallel jobs read the plan from the same file. This is an existing issue, and it still exists after applying your patch. The reason is that the session ID (which is now used as the plan file suffix) does not change for different sessions. We need to generate a new session ID for each session started in the parallel execution case. A unit test is also added to reproduce the issue on the current trunk.

        Bennie, is it possible to add a new unit test for the case you are solving?

        Ning Zhang added a comment -

        Thanks for the quick reply Bennie. I will review it and get back to you soon.

        Bennie Schut added a comment -

        New patch merged with trunk.

        Bennie Schut added a comment -

        Ok will merge with trunk.

        Ning Zhang added a comment -

        Hi Bennie,

        Thanks for the hard work! Can you rebase your patch to the latest trunk? There are some conflicts when I try to apply the patch. I'll review and commit as soon as possible.

        Thanks,

        Bennie Schut added a comment -

        OK, removed the change to InputSplit and made a few more checkstyle fixes on SessionState. Remaining checkstyle warnings:

        SessionState = 14
        Context = 1
        CombineHiveInputFormat = 2
        Utilities = 22
        HiveInputFormat = 5

        Bennie Schut added a comment -

        > Static inner classes in Java behave the same as stand-alone classes, or as C++ inner classes.
        > Non-static inner classes in Java have an implicit pointer to the parent class.

        Ah, I incorrectly thought a static inner class would be like a normal static class (one per JVM). I'll undo that change. I learned something new today.

        Zheng Shao added a comment -

        Thanks for the update Bennie.

        Static inner classes in Java behave the same as stand-alone classes, or as C++ inner classes.
        Non-static inner classes in Java have an implicit pointer to the parent class.

        I don't think these InputSplit classes should be made non-static. What is the concern with keeping them static?

        Bennie Schut added a comment -

        1. Fixed the imports.
        2. I significantly reduced the number of checkstyle warnings; what's left:
        CombineHiveInputFormat = 2
        Context = 1
        HiveInputFormat = 5
        HiveInputSplit = 1
        SessionState = 21
        Utilities = 22

        I noticed new warnings, like 'HashMap' and 'ArrayList' not being allowed, which were recently added.

        Bennie Schut added a comment -

        1. My editor's fault; I will fix this.
        2. OK, will fix.
        3. I made both CombineHiveInputSplit and HiveInputSplit non-static, if that's what you mean.
        I think both can be used concurrently, so it's risky to keep them static.
        I also don't think CombineFilter should be static. HiveInputSplit was refactored into its own file, and perhaps we can do the same for CombineHiveInputSplit, but I think both should at least be non-static.

        Let me know what you think so I can make the changes you prefer.

        Zheng Shao added a comment -

        There are some syntactic fixes needed:
        1. Please avoid using "import xxx.xxx.*"
        2. The code-style guideline from HIVE-1148 says we will have 100 chars per line, so we don't need to wrap the line if it's less than 100.
        3. There are some unrelated changes to the CombineFileInputFormat.

        Bennie Schut added a comment -

        Let me know if this is good enough or some more refactoring is needed on this.

        Bennie Schut added a comment -

        I've also made some minor changes to get the checkstyle warning count a bit lower.

        Bennie Schut added a comment -

        On Context I've looked at making the id generation more generic, like this:

         
          public static String generateExecutionId() {
            return generateId("hive");
          }

          /**
           * Generate a unique planId.
           */
          public static String generatePlanId() {
            return generateId("plan");
          }

          /**
           * Generate a unique hive.session.id
           * @return unique session id string.
           */
          public static String generateSessionId() {
            return generateId(System.getProperty("user.name"));
          }

          private static String generateId(String type) {
            SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS");
            return type
                + "_" + format.format(new Date())
                + "_" + UUID.randomUUID();
          }
        
        

        Ok perhaps you can take a look at what I have now and see if it's ok?
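        For reference, a tiny standalone check of the format such a generator produces (the demo class name is hypothetical; the pattern mirrors the generateId() sketch above):

          import java.text.SimpleDateFormat;
          import java.util.Date;
          import java.util.UUID;

          public class IdFormatDemo {
            public static void main(String[] args) {
              SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS");
              String planId = "plan" + "_" + format.format(new Date()) + "_" + UUID.randomUUID();
              // Prints something like:
              // plan_2010-02-01_10-15-33_123_6ac26279-ce6e-4445-9d14-a656c2854f8d
              System.out.println(planId);
            }
          }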

        Zheng Shao added a comment -

        Thanks for the quick response Bennie.

        2.2 Can we replace the random number in Context.generateExecutionId() with UUID? That will be the best. We still want to see the date and time easily from the ExecutionId.

        It will be great if you can do these changes together here (I don't think it will make the patch too big).

        Bennie Schut added a comment -

        Hi Zheng,

        1. I never ran it before. That's a lot of warnings. I'll use it from now on in my editor.

        2. Ah yes, it needs a bit of refactoring. I was trying to do a quick and dirty fix to get the job done.
        To give you some background: on my project we are using JdbcTemplates with dbcp and running multiple queries concurrently. We, for instance, import data from one load table into multiple partitions at the same time, so those threads start roughly at the same time and the queries look roughly the same, which makes this an issue.

        2.1 Sounds like a good idea.

        2.2 Would you prefer the Context.generateExecutionId() way of doing it? It theoretically can still collide, although that is extremely unlikely.
        Using a UUID prevents this completely (although at a price). Either way is fine by me.

        Would you like me to make these changes or would this be part of the larger issue?

        Thanks.

        Zheng Shao added a comment -

        Hi Bennie,

        1. Can you run "ant checkstyle"?
        2. Frankly speaking I think the current code in trunk needs bigger refactoring:
        2.1 Utility.setMapRedWork(): We should use the directories in the "Context" class to store the plan at the JobClient (we can pass the full path name of the plan in Configuration), instead of generating a random plan file name.
        2.2 We should fix "hive.session.id" to be unique per session (Let SessionState.makeSessionId() look similar to Context.generateExecutionId()). Then we can use "hive.session.id" in the gWorkMap.

          public static void setMapRedWork(Configuration job, MapredWork w) {
            try {
              // use the default file system of the job
              FileSystem fs = FileSystem.get(job);
              Path planPath = new Path(HiveConf.getVar(job,
                  HiveConf.ConfVars.SCRATCHDIR), "plan." + randGen.nextInt());
        

        What do you think?
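        For what it's worth, a minimal standalone sketch of what 2.2 could look like (this is not the actual SessionState code; the class name is hypothetical), keeping a user/timestamp prefix and adding a random component so that two sessions started in the same minute still get distinct ids:

          import java.text.SimpleDateFormat;
          import java.util.Date;
          import java.util.UUID;

          public final class SessionIdSketch {
            private SessionIdSketch() {
            }

            // Roughly mirrors Context.generateExecutionId(): user name + timestamp + UUID.
            public static String makeSessionId() {
              String user = System.getProperty("user.name");
              String time = new SimpleDateFormat("yyyyMMddHHmm").format(new Date());
              return user + "_" + time + "_" + UUID.randomUUID();
            }
          }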

        Bennie Schut added a comment -

        Is this acceptable to be committed like this?

        Bennie Schut added a comment -

        Ok new patch for trunk.

        Bennie Schut added a comment -

        On errors like this:
        Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://audi.ebuddy.net:9000/user/hive/warehouse/client in partToPartitionInfo!)'
        10/01/28 16:48:15 ERROR exec.ExecDriver: Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://audi.ebuddy.net:9000/user/hive/warehouse/client in partToPartitionInfo!)'
        java.io.IOException: cannot find dir = hdfs://audi.ebuddy.net:9000/user/hive/warehouse/client in partToPartitionInfo!

        The original code uses "mapred.job.name" as a unique id to store a map of plans in the Utilities class. This does not work, however,
        because with a name like "insert overwrite table use...iptolong(cs.ip)(Stage-1)" a second query with multiple joins and a where clause can be a really different query but still get the same name.

        I originally solved this by using the hive.session.id, but I just found out that simply using the hive.session.id isn't going to work if you rapidly start sessions, since it only generates unique ids if the sessions start in different minutes.

        For now I append a UUID to the hive.session.id to make it really unique, and this works perfectly.

        For example:
        "hive.session.id" = "hadoop_201001291121_6ac26279-ce6e-4445-9d14-a656c2854f8d"

        I think anyone using the jdbc driver multithreaded will at some point run into this problem.

        I'll make a new patch.
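        To make the collision concrete, here is a tiny standalone illustration (class and values are hypothetical) of why the truncated job name is not a safe map key while session id plus UUID is:

          import java.util.HashMap;
          import java.util.Map;
          import java.util.UUID;

          public class PlanKeyCollision {
            public static void main(String[] args) {
              Map<String, String> planByKey = new HashMap<String, String>();

              // Two genuinely different queries can share the same truncated job name,
              // so the second plan silently overwrites the first.
              String truncatedName = "insert overwrite table use...iptolong(cs.ip)(Stage-1)";
              planByKey.put(truncatedName, "plan for query A");
              planByKey.put(truncatedName, "plan for query B");
              System.out.println(planByKey.size()); // 1: collision

              // Appending a UUID to the session id gives each query its own key.
              String sessionId = "hadoop_201001291121";
              planByKey.put(sessionId + "_" + UUID.randomUUID(), "plan for query A");
              planByKey.put(sessionId + "_" + UUID.randomUUID(), "plan for query B");
              System.out.println(planByKey.size()); // 3: no collision
            }
          }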

        Bennie Schut added a comment -

        Just noticed I missed something when merging from the previous version. In some cases you still get this error.

        Bennie Schut added a comment -

        OK, new patch including the "Licensed" headers in the new files (HIVE-1019-2.patch).

        Namit Jain added a comment -

        Yes, I meant that.

        Bennie Schut added a comment -

        Ah, I just realized you probably mean the files inside the patch and not the patch itself. I'll fix that tomorrow.

        Bennie Schut added a comment -

        First few lines of the patch when I download it look like this:

        Index: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
        ===================================================================
        --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java (revision 903125)
        +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java (working copy)

        Plus I tried the patch doing this:
        patch -p0 < HIVE-1019-1.patch
        So it should work right?

        It should only be "HIVE-1019-1.patch"; the other one is the old file and a stacktrace showing the problem.

        Namit Jain added a comment -

        The header is missing in the newly created files

        Bennie Schut added a comment -

        Old patch wasn't working on trunk anymore. "HIVE-1019-1.patch" works on trunk again.

        Bennie Schut added a comment -

        I'm using JDBC. I'm not sure you would get the same problem when using the CLI, but I wouldn't be surprised.

        Zheng Shao added a comment -

        Hi Bennie, how do you run multiple threads with roughly similar queries? Are you using HiveServer, JDBC, or ODBC?

        Bennie Schut added a comment -

        FYI, on the Utilities class I used the HIVESESSIONID as the unique id, since getting the unique Hadoop job id would mean large changes to some files; we only have job config info at that point.

        Bennie Schut added a comment -

        This was a two-part problem. On Utilities we sometimes synchronize on "Utilities" and sometimes on "gWorkMap", where in these cases it should always be "gWorkMap". In addition, we use the job name as the unique id to get jobs from this map, but the name is actually not unique: I have job names like "select distinct login_cl...chatsessions_load(Stage-1)" and they are not all the same job.

        The second part is some static inner classes in, for instance, HiveInputFormat and CombineHiveInputFormat.

        This seems to fix the problem for me.
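        For illustration, the locking half of this is the classic mistake of guarding one shared map with two different monitors; a minimal standalone sketch (not the actual Utilities code):

          import java.util.HashMap;
          import java.util.Map;

          public final class GWorkMapSketch {
            private static final Map<String, Object> gWorkMap = new HashMap<String, Object>();

            private GWorkMapSketch() {
            }

            // Readers and writers must synchronize on the SAME monitor (gWorkMap here).
            // Mixing "synchronized (Utilities.class)" in one place with
            // "synchronized (gWorkMap)" in another gives no mutual exclusion at all.
            public static void put(String planId, Object work) {
              synchronized (gWorkMap) {
                gWorkMap.put(planId, work);
              }
            }

            public static Object get(String planId) {
              synchronized (gWorkMap) {
                return gWorkMap.get(planId);
              }
            }
          }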

        Bennie Schut added a comment -

        Stacktrace of the problem.


          People

          • Assignee: Bennie Schut
          • Reporter: Bennie Schut
          • Votes: 2
          • Watchers: 6
