Hive
  1. Hive
  2. HIVE-2725

Fix flaky testing infrastructure

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: Testing Infrastructure
    • Labels:
      None

      Description

      To begin with org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions and org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_inputddl5 are failing on trunk for a while now.

        Activity

        Hide
        Ashutosh Chauhan added a comment -

        They are failing on apache jenkins build server.
        When I try to debug them can't reproduce them locally. Both of them passed when ran individually.
        My local environment information:

        $ svn up
        At revision 1233014.
        $ svn info
        Path: .
        URL: https://svn.apache.org/repos/asf/hive/trunk
        Repository Root: https://svn.apache.org/repos/asf
        Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
        Revision: 1233014
        Node Kind: directory
        Schedule: normal
        Last Changed Author: cws
        Last Changed Rev: 1232766
        Last Changed Date: 2012-01-18 07:10:02 +0000 (Wed, 18 Jan 2012)
        
        $ uname -a
        Linux 2.6.18-238.1 #1 SMP Mon Feb 21 19:27:52 PST 2011 x86_64 x86_64 x86_64 GNU/Linux
        
        $ cat /etc/redhat-release 
        Red Hat Enterprise Linux Server release 5.6 (Tikanga)
        [hortonas@hrt8n30 hive-trunk]$ ant -version
        Apache Ant(TM) version 1.8.2 compiled on December 20 2010
        
        $ ant test -Dtestcase=TestCliDriver -Dqfile=inputddl5.q
        junit] Running org.apache.hadoop.hive.cli.TestCliDriver
            [junit] Begin query: inputddl5.q
            [junit] Copying file: file:/home/ashutosh/hive-trunk/data/files/kv4.txt
            [junit] diff -a /home/ashutosh/hive-trunk/build/ql/test/logs/clientpositive/inputddl5.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/inputddl5.q.out
            [junit] Done query: inputddl5.q
            [junit] Cleaning up TestCliDriver
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 10.351 sec
        
        $ ant test -Dtestcase=TestCliDriver -Dqfile=insert2_overwrite_partitions.q
          [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
            [junit] Begin query: insert2_overwrite_partitions.q
            [junit] Copying file: file:/home/ashutosh/hive-trunk/data/files/kv1.txt
            [junit] Copying file: file:/home/ashutosh/hive-trunk/data/files/kv3.txt
            [junit] Deleted pfile:/home/ashutosh/hive-trunk/build/ql/test/data/warehouse/db2.db/destintable/ds=2011-11-11
            [junit] diff -a /home/ashutosh/hive-trunk/build/ql/test/logs/clientpositive/insert2_overwrite_partitions.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/insert2_overwrite_partitions.q.out
            [junit] Done query: insert2_overwrite_partitions.q
            [junit] Cleaning up TestCliDriver
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 27.886 sec
        

        Does anyone have any idea why do these tests fail on apache jenkins server ?

        Show
        Ashutosh Chauhan added a comment - They are failing on apache jenkins build server. When I try to debug them can't reproduce them locally. Both of them passed when ran individually. My local environment information: $ svn up At revision 1233014. $ svn info Path: . URL: https: //svn.apache.org/repos/asf/hive/trunk Repository Root: https: //svn.apache.org/repos/asf Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 Revision: 1233014 Node Kind: directory Schedule: normal Last Changed Author: cws Last Changed Rev: 1232766 Last Changed Date: 2012-01-18 07:10:02 +0000 (Wed, 18 Jan 2012) $ uname -a Linux 2.6.18-238.1 #1 SMP Mon Feb 21 19:27:52 PST 2011 x86_64 x86_64 x86_64 GNU/Linux $ cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.6 (Tikanga) [hortonas@hrt8n30 hive-trunk]$ ant -version Apache Ant(TM) version 1.8.2 compiled on December 20 2010 $ ant test -Dtestcase=TestCliDriver -Dqfile=inputddl5.q junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Begin query: inputddl5.q [junit] Copying file: file:/home/ashutosh/hive-trunk/data/files/kv4.txt [junit] diff -a /home/ashutosh/hive-trunk/build/ql/test/logs/clientpositive/inputddl5.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/inputddl5.q.out [junit] Done query: inputddl5.q [junit] Cleaning up TestCliDriver [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 10.351 sec $ ant test -Dtestcase=TestCliDriver -Dqfile=insert2_overwrite_partitions.q [junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Begin query: insert2_overwrite_partitions.q [junit] Copying file: file:/home/ashutosh/hive-trunk/data/files/kv1.txt [junit] Copying file: file:/home/ashutosh/hive-trunk/data/files/kv3.txt [junit] Deleted pfile:/home/ashutosh/hive-trunk/build/ql/test/data/warehouse/db2.db/destintable/ds=2011-11-11 [junit] diff -a /home/ashutosh/hive-trunk/build/ql/test/logs/clientpositive/insert2_overwrite_partitions.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/insert2_overwrite_partitions.q.out [junit] Done query: insert2_overwrite_partitions.q [junit] Cleaning up TestCliDriver [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 27.886 sec Does anyone have any idea why do these tests fail on apache jenkins server ?
        Hide
        Ashutosh Chauhan added a comment -

        On the same build machine after applying hive-2589_2.patch following tests pass which apache jenkins reports as failures:

        $ svn info
        Path: .
        URL: https://svn.apache.org/repos/asf/hive/trunk
        Repository Root: https://svn.apache.org/repos/asf
        Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
        Revision: 1233014
        Node Kind: directory
        Schedule: normal
        Last Changed Author: cws
        Last Changed Rev: 1232766
        Last Changed Date: 2012-01-18 07:10:02 +0000 (Wed, 18 Jan 2012)
        
        $ svn diff
        Index: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
        ===================================================================
        --- metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java	(revision 1233014)
        +++ metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java	(working copy)
        @@ -26,14 +26,17 @@
         import java.io.IOException;
         import java.util.AbstractMap;
         import java.util.ArrayList;
        +import java.util.Arrays;
         import java.util.Collections;
         import java.util.Formatter;
         import java.util.HashMap;
        +import java.util.HashSet;
         import java.util.LinkedHashMap;
         import java.util.List;
         import java.util.Map;
         import java.util.Map.Entry;
         import java.util.Properties;
        +import java.util.Set;
         import java.util.Timer;
         import java.util.regex.Pattern;
         
        @@ -1562,7 +1565,23 @@
                     part.getParameters().get(Constants.DDL_TIME) == null) {
                   part.putToParameters(Constants.DDL_TIME, Long.toString(time));
                 }
        + 
        +        Map<String,String> tblParams = tbl.getParameters();
        +        String inheritProps =  hiveConf.getVar(ConfVars.METASTORE_PART_INHERIT_TBL_PROPS).trim();
        +        // Default value is empty string in which case no properties will be inherited.
        +        // * implies all properties needs to be inherited
        +        Set<String> inheritKeys = new HashSet<String>(Arrays.asList(inheritProps.split(",")));
        +        if(inheritKeys.contains("*")){
        +          inheritKeys =  tblParams.keySet();
        +        }
         
        +        for (String key : inheritKeys) {
        +          String paramVal = tblParams.get(key);
        +          if(null != paramVal){ // add the property only if it exists in table properties
        +            part.putToParameters(key, paramVal);
        +          }
        +        }
        +
                 success = ms.addPartition(part);
         
               } finally {
        Index: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        ===================================================================
        --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	(revision 1233014)
        +++ common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	(working copy)
        @@ -119,6 +119,7 @@
               HiveConf.ConfVars.METASTORE_EVENT_EXPIRY_DURATION,
               HiveConf.ConfVars.METASTORE_RAW_STORE_IMPL,
               HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS,
        +      HiveConf.ConfVars.METASTORE_PART_INHERIT_TBL_PROPS,
               };
         
           /**
        @@ -296,6 +297,7 @@
             METASTORE_NON_TRANSACTIONAL_READ("javax.jdo.option.NonTransactionalRead", true),
             METASTORE_CONNECTION_USER_NAME("javax.jdo.option.ConnectionUserName", "APP"),
             METASTORE_END_FUNCTION_LISTENERS("hive.metastore.end.function.listeners", ""),
        +    METASTORE_PART_INHERIT_TBL_PROPS("hive.metastore.partition.inherit.table.properties",""),
         
             // CLI
             CLIIGNOREERRORS("hive.cli.errors.ignore", false),
        
        $ ant test -Dtestcase=TestCliDriver -Dqfile=repair.q
            [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
            [junit] Begin query: repair.q
            [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/repair.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/repair.q.out
            [junit] Done query: repair.q
            [junit] Cleaning up TestCliDriver
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.733 sec
        
        $ ant test -Dtestcase=TestCliDriver -Dqfile=router_join_ppr.q
            [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
            [junit] Begin query: router_join_ppr.q
            [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/router_join_ppr.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out
            [junit] Done query: router_join_ppr.q
            [junit] Cleaning up TestCliDriver
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 23.946 sec
        
        
         [junit] Begin query: sample1.q
            [junit] Deleted file:/home/hortonas/hive-trunk/build/ql/test/data/warehouse/dest1
            [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/sample1.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/sample1.q.out
            [junit] Done query: sample1.q
            [junit] Cleaning up TestCliDriver
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 13.215 sec
        
        $ ant test -Dtestcase=TestCliDriver -Dqfile=source.q
         [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
            [junit] Begin query: source.q
            [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/source.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/source.q.out
            [junit] Done query: source.q
            [junit] Cleaning up TestCliDriver
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.46 sec
        
        
        

        Cant figure out whats wrong with apache jenkins server or in hive test infra.

        Show
        Ashutosh Chauhan added a comment - On the same build machine after applying hive-2589_2.patch following tests pass which apache jenkins reports as failures: $ svn info Path: . URL: https: //svn.apache.org/repos/asf/hive/trunk Repository Root: https: //svn.apache.org/repos/asf Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 Revision: 1233014 Node Kind: directory Schedule: normal Last Changed Author: cws Last Changed Rev: 1232766 Last Changed Date: 2012-01-18 07:10:02 +0000 (Wed, 18 Jan 2012) $ svn diff Index: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java =================================================================== --- metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java (revision 1233014) +++ metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java (working copy) @@ -26,14 +26,17 @@ import java.io.IOException; import java.util.AbstractMap; import java.util.ArrayList; + import java.util.Arrays; import java.util.Collections; import java.util.Formatter; import java.util.HashMap; + import java.util.HashSet; import java.util.LinkedHashMap; import java.util.List; import java.util.Map; import java.util.Map.Entry; import java.util.Properties; + import java.util.Set; import java.util.Timer; import java.util.regex.Pattern; @@ -1562,7 +1565,23 @@ part.getParameters().get(Constants.DDL_TIME) == null ) { part.putToParameters(Constants.DDL_TIME, Long .toString(time)); } + + Map< String , String > tblParams = tbl.getParameters(); + String inheritProps = hiveConf.getVar(ConfVars.METASTORE_PART_INHERIT_TBL_PROPS).trim(); + // Default value is empty string in which case no properties will be inherited. + // * implies all properties needs to be inherited + Set< String > inheritKeys = new HashSet< String >(Arrays.asList(inheritProps.split( "," ))); + if (inheritKeys.contains( "*" )){ + inheritKeys = tblParams.keySet(); + } + for ( String key : inheritKeys) { + String paramVal = tblParams.get(key); + if ( null != paramVal){ // add the property only if it exists in table properties + part.putToParameters(key, paramVal); + } + } + success = ms.addPartition(part); } finally { Index: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java =================================================================== --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (revision 1233014) +++ common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (working copy) @@ -119,6 +119,7 @@ HiveConf.ConfVars.METASTORE_EVENT_EXPIRY_DURATION, HiveConf.ConfVars.METASTORE_RAW_STORE_IMPL, HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS, + HiveConf.ConfVars.METASTORE_PART_INHERIT_TBL_PROPS, }; /** @@ -296,6 +297,7 @@ METASTORE_NON_TRANSACTIONAL_READ( "javax.jdo.option.NonTransactionalRead" , true ), METASTORE_CONNECTION_USER_NAME( "javax.jdo.option.ConnectionUserName" , "APP" ), METASTORE_END_FUNCTION_LISTENERS( "hive.metastore.end.function.listeners" , ""), + METASTORE_PART_INHERIT_TBL_PROPS( "hive.metastore.partition.inherit.table.properties" ,""), // CLI CLIIGNOREERRORS( "hive.cli.errors.ignore" , false ), $ ant test -Dtestcase=TestCliDriver -Dqfile=repair.q [junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Begin query: repair.q [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/repair.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/repair.q.out [junit] Done query: repair.q [junit] Cleaning up TestCliDriver [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.733 sec $ ant test -Dtestcase=TestCliDriver -Dqfile=router_join_ppr.q [junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Begin query: router_join_ppr.q [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/router_join_ppr.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/router_join_ppr.q.out [junit] Done query: router_join_ppr.q [junit] Cleaning up TestCliDriver [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 23.946 sec [junit] Begin query: sample1.q [junit] Deleted file:/home/hortonas/hive-trunk/build/ql/test/data/warehouse/dest1 [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/sample1.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/sample1.q.out [junit] Done query: sample1.q [junit] Cleaning up TestCliDriver [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 13.215 sec $ ant test -Dtestcase=TestCliDriver -Dqfile=source.q [junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Begin query: source.q [junit] diff -a /home/hortonas/hive-trunk/build/ql/test/logs/clientpositive/source.q.out /home/hortonas/hive-trunk/ql/src/test/results/clientpositive/source.q.out [junit] Done query: source.q [junit] Cleaning up TestCliDriver [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.46 sec Cant figure out whats wrong with apache jenkins server or in hive test infra.
        Hide
        Carl Steinbach added a comment -

        I also verified that with HIVE-2589 in place you can run tests individually and get them to pass. However, if you run all of TestCliDriver then 149 tests reliably fail. Someone needs to start with the entire testsuite and gradually cut away tests until they get to smallest set possible that reliably triggers the failure in repair.q, and then drill down from there.

        Show
        Carl Steinbach added a comment - I also verified that with HIVE-2589 in place you can run tests individually and get them to pass. However, if you run all of TestCliDriver then 149 tests reliably fail. Someone needs to start with the entire testsuite and gradually cut away tests until they get to smallest set possible that reliably triggers the failure in repair.q, and then drill down from there.
        Hide
        Ashutosh Chauhan added a comment -

        Thanks, Carl for looking into it. You are spot on. Tests don't fail when run individually but if all the tests are run then 149 of those fail. I digged into it and figured out that there is an bug in existing test code QTestUtil.java . I will upload the patch soon on HIVE-2589 which fixes this bug. All the tests pass with that patch.

        Show
        Ashutosh Chauhan added a comment - Thanks, Carl for looking into it. You are spot on. Tests don't fail when run individually but if all the tests are run then 149 of those fail. I digged into it and figured out that there is an bug in existing test code QTestUtil.java . I will upload the patch soon on HIVE-2589 which fixes this bug. All the tests pass with that patch.
        Hide
        Ashutosh Chauhan added a comment -

        Tests insert2_overwrite_partitions and inputddl5 are failing consistently on apache jenkins server for trunk but passes for 0.8 branch. In my environment both tests pass both for 0.8 and trunk, so I don't have any way to reproduce those failures.

        Show
        Ashutosh Chauhan added a comment - Tests insert2_overwrite_partitions and inputddl5 are failing consistently on apache jenkins server for trunk but passes for 0.8 branch. In my environment both tests pass both for 0.8 and trunk, so I don't have any way to reproduce those failures.
        Hide
        Ashutosh Chauhan added a comment -

        Test inputddl5 is failing consistently on trunk for more than a month now. In my environment it passes, so I am not able to reproduce it. Does anyone else see this as a failure in their environment or have some info about this failing test.

        Show
        Ashutosh Chauhan added a comment - Test inputddl5 is failing consistently on trunk for more than a month now. In my environment it passes, so I am not able to reproduce it. Does anyone else see this as a failure in their environment or have some info about this failing test.
        Hide
        Ashutosh Chauhan added a comment -

        Trunk is green now. Resolving this as fixed. Only remaining issue is HIVE-2811

        Show
        Ashutosh Chauhan added a comment - Trunk is green now. Resolving this as fixed. Only remaining issue is HIVE-2811
        Hide
        Ashutosh Chauhan added a comment -

        This issue is closed now. It was released with the fix in 0.9.0. If there is a problem, please open a new jira and link this one with that.

        Show
        Ashutosh Chauhan added a comment - This issue is closed now. It was released with the fix in 0.9.0. If there is a problem, please open a new jira and link this one with that.

          People

          • Assignee:
            Ashutosh Chauhan
            Reporter:
            Ashutosh Chauhan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development