HIVE-6430: MapJoin hash table has large memory overhead

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row storage structure without an object per row (similar to vectorization).
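      For a sense of scale, the current layout is roughly shaped like the sketch below (illustrative types only, not the actual MJKey/MJRowContainer classes); a primitive-keyed open-addressing table avoids every one of these per-entry objects:

          import java.util.Arrays;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;
          import org.apache.hadoop.io.IntWritable;

          // Illustrative only: ~16 bytes of int payload per entry, but each
          // entry also pays for the HashMap$Entry, the key and container
          // objects, their backing arrays, and one boxed writable per field.
          Map<List<IntWritable>, List<IntWritable>> table = new HashMap<>();
          table.put(Arrays.asList(new IntWritable(1), new IntWritable(2)),   // 2 ints of key
                    Arrays.asList(new IntWritable(3), new IntWritable(4)));  // 2 ints of row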

      Attachments

        1. HIVE-6430.patch
          134 kB
          Sergey Shelukhin
        2. HIVE-6430.01.patch
          149 kB
          Sergey Shelukhin
        3. HIVE-6430.02.patch
          137 kB
          Sergey Shelukhin
        4. HIVE-6430.03.patch
          149 kB
          Sergey Shelukhin
        5. HIVE-6430.04.patch
          158 kB
          Sergey Shelukhin
        6. HIVE-6430.05.patch
          162 kB
          Sergey Shelukhin
        7. HIVE-6430.06.patch
          161 kB
          Sergey Shelukhin
        8. HIVE-6430.07.patch
          169 kB
          Sergey Shelukhin
        9. HIVE-6430.08.patch
          170 kB
          Sergey Shelukhin
        10. HIVE-6430.09.patch
          179 kB
          Sergey Shelukhin
        11. HIVE-6430.10.patch
          195 kB
          Sergey Shelukhin
        12. HIVE-6430.11.patch
          202 kB
          Sergey Shelukhin
        13. HIVE-6430.12.patch
          204 kB
          Sergey Shelukhin
        14. HIVE-6430.12.patch
          204 kB
          Sergey Shelukhin
        15. HIVE-6430.13.patch
          205 kB
          Sergey Shelukhin
        16. HIVE-6430.14.patch
          207 kB
          Sergey Shelukhin


          Activity

            sershe Sergey Shelukhin added a comment -

            Here's the summary of the overhead per entry after both of the above patches go in (before, the overhead in key and value is significantly bigger).

            HashTable
            Entry array: 8+ bytes
            Entry: 32 bytes
            Key and value objects: 32 bytes

            Key
            Byte array object + length: 20 bytes.
            Field count and null mask: 1 byte.
            Rounding to 8 bytes: 0-7 bytes.

            Row
            Fields: 8 bytes.
            Object array object + length: 24 bytes.
            Per-column writable object: 16 bytes (assuming all the fields in writables are useful data).

            "Guaranteed" overhead per entry: 125 bytes, plus writables for row values and padding on key.
            Example double key, row with one field: additional 21 bytes per entry, ~146 total
            Example int key, row with 5 fields: additional 87 bytes per entry, ~212 total
            + some overhead depending on HashMap fullness.
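            For concreteness, the "guaranteed" figure is just the sum of the items above; a back-of-envelope sketch (illustrative variable names, not Hive code):

                int entrySlot    = 8;   // HashTable entry array: one slot per entry
                int entryObject  = 32;  // java.util.HashMap$Entry itself
                int keyValueObjs = 32;  // key and value objects
                int keyByteArray = 20;  // byte[] object header + length for the key
                int keyMeta      = 1;   // field count and null mask
                int rowFields    = 8;   // row container fields
                int rowObjArray  = 24;  // Object[] object header + length
                int guaranteed   = entrySlot + entryObject + keyValueObjs
                    + keyByteArray + keyMeta + rowFields + rowObjArray;  // = 125
                // Each row column then adds a ~16-byte writable, plus 0-7 bytes
                // of key padding: 125 + 5*16 + 7 = 212 for the int-key, 5-field example.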

            So that's a lot of overhead (it depends on the data, of course; if the row contains cat photos in binary, then 150-200 bytes is not much).

            The approach to get rid of per-entry overhead in general involves a hashtable implemented on top of an array, with open addressing, and storing the actual variable-length keys and rows in big flat array(s) of byte[]-s or objects. That would get rid of the key and row object overhead, most of the hashmap overhead, most of the key overhead, and most/some (see below) of the row overhead.

            The good thing about the table is that it's R/O after initial creation and we never delete, so we don't have to worry about many scenarios.

            Details (scroll down for estimates)
            Simple case, assuming we can convert both key and row into bytes:
            Allocate largish fixed-size byte arrays to act as an infinite write buffer (or the array can be reallocated if needed, or a combination). Have a flat, custom-made hash table similar to the HPPC one that would store offsets into that array in the key array (of longs), and would have no value or state arrays. Some additional data, for example lengths or null bitmasks, can also be fit into the key array values.
            When loading, incoming writables would write the keys and values into the write buffer. We know the schema, so we don't have to worry about storing types, field offsets, etc. Then write a fixed-size tail with e.g. the lengths of the key and value, to know what to compare and where the value starts. Because there's no requirement to allocate some number of bytes like there is now, a v-length format can be used if needed to save space... but it shouldn't be too complicated; it probably shouldn't use ORC there. Then the key array uses a standard hashtable put to store the offset of this postfix.
            When getting, the key can still be compared the same as now, as a byte array, with one extra "dereference" from the key array to get to the actual key by index.
            For values, writables will have to be re-created when the row is requested, because everything depends on writables now. Writables will trivially read from the byte array at the offset. Obviously this has a performance cost.
            Note that this is not like the current lazy deserialization:
            1) We do not deserialize on demand - final writables are just written to/read from the byte array, so creating them should be cheaper than deserializing.
            2) Writables are not preserved for future use and are created every time the row is accessed, which has a perf cost but saves memory.
            Total overhead per entry would be around 14-16 bytes, plus some fixed or semi-fixed overhead depending on the write buffer allocation scheme.
            In the above examples overhead will go from 146 and 212 bytes to 16 and 16.
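            To make this concrete, below is a minimal sketch of the simple case, under loud assumptions: a single fixed-size write buffer (rather than reallocation or chaining), linear probing over a power-of-two long[] slot array, at most 255-byte keys and values, and no duplicate keys. The class and method names are illustrative, not Hive's:

                import java.util.Arrays;

                final class FlatTableSketch {
                  private final long[] slots;   // 0 = empty, else tail offset + 1
                  private final byte[] buf;     // the "infinite" write buffer
                  private int writePos;

                  FlatTableSketch(int slotCount, int bufferBytes) {
                    assert Integer.bitCount(slotCount) == 1; // power of two for masking
                    slots = new long[slotCount];
                    buf = new byte[bufferBytes];
                  }

                  void put(byte[] key, byte[] value) {
                    // Append key bytes, then value bytes, then a fixed-size tail
                    // with the lengths, so a slot stores only one offset (to the tail).
                    System.arraycopy(key, 0, buf, writePos, key.length);
                    writePos += key.length;
                    System.arraycopy(value, 0, buf, writePos, value.length);
                    writePos += value.length;
                    int tailOff = writePos;
                    buf[writePos++] = (byte) key.length;    // assumes keys < 256 bytes
                    buf[writePos++] = (byte) value.length;  // assumes values < 256 bytes
                    int slot = Arrays.hashCode(key) & (slots.length - 1);
                    while (slots[slot] != 0) {              // open addressing, linear probe
                      slot = (slot + 1) & (slots.length - 1);
                    }
                    slots[slot] = tailOff + 1;              // +1 so that 0 can mean "empty"
                  }

                  // Returns the offset of the value bytes, or -1 if the key is absent;
                  // the caller re-creates writables from buf at that offset.
                  int get(byte[] key) {
                    int slot = Arrays.hashCode(key) & (slots.length - 1);
                    while (slots[slot] != 0) {
                      int tailOff = (int) (slots[slot] - 1);
                      int keyLen = buf[tailOff] & 0xFF;
                      int valLen = buf[tailOff + 1] & 0xFF;
                      int keyOff = tailOff - valLen - keyLen;
                      if (keyLen == key.length && rangeEquals(keyOff, key)) {
                        return tailOff - valLen;
                      }
                      slot = (slot + 1) & (slots.length - 1);
                    }
                    return -1;
                  }

                  private boolean rangeEquals(int off, byte[] key) {
                    for (int i = 0; i < key.length; i++) {
                      if (buf[off + i] != key[i]) return false;
                    }
                    return true;
                  }
                }

            A real version would additionally need buffer growth, multiple rows per key, and hashing of the probe key without materializing it as a separate byte[].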

            Another alternative is similar, but with only the keys in the byte array, and the values in a separate large Object array operating on the same principles, in writables with all their glory.
            The key array can store indices and lengths for both, probably 2-3 longs per entry depending on what limitations we can accept.
            So the total overhead will be around 16-24 bytes + 16 per field in the row, but writables wouldn't need to be re-created.
            In the above examples overhead will go from 146 and 212 bytes to 32 and 96.
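            For illustration, one hypothetical way the key array could pack the byte-array offset and length of a key into a single long (the 40/24-bit split is an assumed limitation, not a decided one); the value Object[] index would then occupy another long:

                // Assumed split: 40-bit offset into the key byte array,
                // 24-bit key length; a second long would hold the index
                // into the value Object[].
                static long packKeyRef(long byteOffset, int keyLength) {
                  return (byteOffset << 24) | (keyLength & 0xFFFFFFL);
                }
                static long keyOffset(long ref) { return ref >>> 24; }
                static int  keyLength(long ref) { return (int) (ref & 0xFFFFFF); }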

            Tl;dr and estimates
            The bad thing, obviously, is that without key and row objects, all the interfaces around them would cease to exist. This is especially bad for MR due to the convoluted HashTable path with write and read, so in the first cut I think we should go Tez-only and preserve the legacy path with objects for MR.

            There are several good things...

            • We can essentially copy-paste the HPPC long-long hashmap. It probably doesn't fit by itself and we don't need all the features, but it should be simple to convert to the above. So we don't need to code up the open-addressing hashmap ourselves.
            • W.r.t. the interface difference, I looked at the divergent paths; the Tez HT loader obviously would be able to do whatever is needed. MapJoinOperator is the only place where there will be problems - it currently creates the key and then calls get(key). Get can be changed to take the row, so that it would create the key for get as necessary.
            • Code for byte key creation, comparison, validation, etc., and some other code from the above two patches can be reused; plus I know all I need to know about what needs to be done with writables and the bytes behind them.

            sershe Sergey Shelukhin added a comment -

            "all the other fields in writables" should be "all the fields in writables"; cannot edit.

            sershe Sergey Shelukhin added a comment -

            Attempt #2... Presumably not only TableScans can be valid parents, because if I remove all other operators (as in the initial version), the tests fail. Input from someone with better knowledge of the original path would be helpful.


            sershe Sergey Shelukhin added a comment -

            Wrong JIRA.


            sershe Sergey Shelukhin added a comment -

            The new code probably has tons of bugs, but some old tests I ran have passed; let's try HiveQA. I will run the Tez tests.


            sershe Sergey Shelukhin added a comment -

            Reattaching the patch, with some fixes in the new code (not working yet). Looks like QA didn't pick it up.

            hiveqa Hive QA added a comment -

            Overall: -1 no tests executed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12633240/HIVE-6430.patch

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1649/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1649/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Tests exited with: NonZeroExitCodeException
            Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
            + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
            + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
            + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
            + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
            + cd /data/hive-ptest/working/
            + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1649/source-prep.txt
            + [[ false == \t\r\u\e ]]
            + mkdir -p maven ivy
            + [[ svn = \s\v\n ]]
            + [[ -n '' ]]
            + [[ -d apache-svn-trunk-source ]]
            + [[ ! -d apache-svn-trunk-source/.svn ]]
            + [[ ! -d apache-svn-trunk-source ]]
            + cd apache-svn-trunk-source
            + svn revert -R .
            Reverted 'metastore/scripts/upgrade/derby/upgrade.order.derby'
            Reverted 'metastore/scripts/upgrade/mysql/upgrade.order.mysql'
            Reverted 'metastore/scripts/upgrade/mysql/hive-schema-0.13.0.mysql.sql'
            Reverted 'metastore/scripts/upgrade/oracle/upgrade.order.oracle'
            Reverted 'metastore/scripts/upgrade/postgres/upgrade.order.postgres'
            ++ awk '{print $2}'
            ++ egrep -v '^X|^Performing status on external'
            ++ svn status --no-ignore
            + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target metastore/scripts/upgrade/derby/upgrade-0.13.0-to-0.14.0.derby.sql metastore/scripts/upgrade/derby/hive-schema-0.14.0.derby.sql metastore/scripts/upgrade/mysql/upgrade-0.13.0-to-0.14.0.mysql.sql metastore/scripts/upgrade/mysql/hive-schema-0.14.0.mysql.sql metastore/scripts/upgrade/oracle/upgrade-0.13.0-to-0.14.0.oracle.sql metastore/scripts/upgrade/oracle/hive-schema-0.14.0.oracle.sql metastore/scripts/upgrade/postgres/upgrade-0.13.0-to-0.14.0.postgres.sql metastore/scripts/upgrade/postgres/hive-schema-0.14.0.postgres.sql itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
            + svn update
            U    ql/src/test/queries/clientpositive/mapjoin_mapjoin.q
            U    ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out
            U    ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
            U    ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
            U    ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java
            U    ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
            
            Fetching external item into 'hcatalog/src/test/e2e/harness'
            Updated external to revision 1575376.
            
            Updated to revision 1575376.
            + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
            + patchFilePath=/data/hive-ptest/working/scratch/build.patch
            + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
            + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
            + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
            The patch does not appear to apply with p0, p1, or p2
            + exit 1
            '
            

            This message is automatically generated.

            ATTACHMENT ID: 12633240


            sershe Sergey Shelukhin added a comment -

            Ran some regular and some Tez tests; they passed. Will wait for QA and run more Tez tests.


            sershe Sergey Shelukhin added a comment -

            All Tez tests passed; some explain plans changed in details that should be unrelated (like column names), and ordering changed in one file.
            I will see if the trunk files need to be updated again, and/or if ordering needs to be enforced.


            sershe Sergey Shelukhin added a comment -

            gopalv hagleitn this patch is ready for review... if you were looking for good weekend reading.

            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12633496/HIVE-6430.patch

            ERROR: -1 due to 2 failed/errored test(s), 5373 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
            

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1682/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1682/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 2 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12633496


            sershe Sergey Shelukhin added a comment -

            Both of these tests pass for me... looks unrelated. They can be rerun after the review-feedback update.


            sershe Sergey Shelukhin added a comment -

            Addressed most of the RB feedback, except for the refactor, which needs discussion... Also added ASCII art to the comments, and one more memory optimization to truncate the array, after initial tests.


            sershe Sergey Shelukhin added a comment -

            The test changes were not intentional... merged the wrong branch.


            sershe Sergey Shelukhin added a comment -

            Addressed all CR feedback, but the patch still fails some Tez tests. Will address tomorrow.

            Meanwhile, can you review the common code (I may separate it into a different patch), so that we could perhaps put this into Hive 13 in disabled form?


            sershe Sergey Shelukhin added a comment -

            Addressed major review and discussion feedback. I kept the list bit in the ref, though, because putting it in the array makes retrieval of the union a huge pain. Removed the "split" long; now everything is in one place.
            Probably need to write some unit tests, since the q files do not cover all cases. Will do so later today, or maybe Sunday.


            sershe Sergey Shelukhin added a comment -

            Some Tez tests passed, running others.


            sershe Sergey Shelukhin added a comment -

            Forgot one TODO: remove an unnecessary method.

            leftyl Lefty Leverenz added a comment -

            This adds config parameter hive.mapjoin.optimized.hashtable to HiveConf.java but doesn't give a description in hive-default.xml.template or a HiveConf.java comment.

            HIVE-6037 is going to change HiveConf.java and start generating hive-default.xml.template from HiveConf.java, so I suggest putting the parameter description in a jira release note. Then it can be added to the new version of HiveConf.java after HIVE-6037 gets committed.
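            For reference, the parameter can be toggled like any other Configuration boolean; a hypothetical sketch (whether Hive exposes a dedicated ConfVars constant for it is not shown here):

                import org.apache.hadoop.hive.conf.HiveConf;

                // HiveConf extends Hadoop's Configuration, so the generic
                // boolean setter applies. Semantics assumed: false falls back
                // to the legacy object-based hash table.
                HiveConf conf = new HiveConf();
                conf.setBoolean("hive.mapjoin.optimized.hashtable", false);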

            hiveqa Hive QA added a comment -

            Overall: -1 no tests executed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12634889/HIVE-6430.03.patch

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1825/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1825/console

            Messages:

            **** This message was trimmed, see log for full details ****
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/hwi/src/test/resources
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-hwi ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-hwi ---
            [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/test-classes
            [INFO] 
            [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-hwi ---
            [INFO] Tests are skipped.
            [INFO] 
            [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hwi ---
            [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-hwi ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-hwi ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.jar
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive ODBC 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            [INFO] 
            [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-odbc ---
            [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/odbc (includes = [datanucleus.log, derby.log], excludes = [])
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-odbc ---
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-odbc ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-odbc ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-odbc ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-odbc ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/odbc/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-odbc/0.14.0-SNAPSHOT/hive-odbc-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive Shims Aggregator 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            [INFO] 
            [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims-aggregator ---
            [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims (includes = [datanucleus.log, derby.log], excludes = [])
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims-aggregator ---
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims-aggregator ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-aggregator ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/shims/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-shims-aggregator ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims-aggregator ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-shims-aggregator/0.14.0-SNAPSHOT/hive-shims-aggregator-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive TestUtils 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            [INFO] 
            [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-testutils ---
            [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/testutils (includes = [datanucleus.log, derby.log], excludes = [])
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-testutils ---
            [INFO] 
            [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-testutils ---
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/testutils/src/main/resources
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-testutils ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-testutils ---
            [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/classes
            [INFO] 
            [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-testutils ---
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/testutils/src/test/resources
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-testutils ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-testutils ---
            [INFO] No sources to compile
            [INFO] 
            [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-testutils ---
            [INFO] Tests are skipped.
            [INFO] 
            [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-testutils ---
            [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/hive-testutils-0.14.0-SNAPSHOT.jar
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-testutils ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-testutils ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/hive-testutils-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-testutils/0.14.0-SNAPSHOT/hive-testutils-0.14.0-SNAPSHOT.jar
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/testutils/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-testutils/0.14.0-SNAPSHOT/hive-testutils-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive Packaging 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            Downloading: http://repository.apache.org/snapshots/org/apache/hive/hcatalog/hive-hcatalog-hbase-storage-handler/0.14.0-SNAPSHOT/maven-metadata.xml
            Downloading: http://repository.apache.org/snapshots/org/apache/hive/hcatalog/hive-hcatalog-hbase-storage-handler/0.14.0-SNAPSHOT/hive-hcatalog-hbase-storage-handler-0.14.0-SNAPSHOT.pom
            [WARNING] The POM for org.apache.hive.hcatalog:hive-hcatalog-hbase-storage-handler:jar:0.14.0-SNAPSHOT is missing, no dependency information available
            Downloading: http://repository.apache.org/snapshots/org/apache/hive/hcatalog/hive-hcatalog-hbase-storage-handler/0.14.0-SNAPSHOT/hive-hcatalog-hbase-storage-handler-0.14.0-SNAPSHOT.jar
            [INFO] ------------------------------------------------------------------------
            [INFO] Reactor Summary:
            [INFO] 
            [INFO] Hive .............................................. SUCCESS [8.715s]
            [INFO] Hive Ant Utilities ................................ SUCCESS [5.435s]
            [INFO] Hive Shims Common ................................. SUCCESS [3.745s]
            [INFO] Hive Shims 0.20 ................................... SUCCESS [2.563s]
            [INFO] Hive Shims Secure Common .......................... SUCCESS [4.273s]
            [INFO] Hive Shims 0.20S .................................. SUCCESS [2.567s]
            [INFO] Hive Shims 0.23 ................................... SUCCESS [7.933s]
            [INFO] Hive Shims ........................................ SUCCESS [1.228s]
            [INFO] Hive Common ....................................... SUCCESS [6.888s]
            [INFO] Hive Serde ........................................ SUCCESS [10.418s]
            [INFO] Hive Metastore .................................... SUCCESS [35.599s]
            [INFO] Hive Query Language ............................... SUCCESS [1:10.690s]
            [INFO] Hive Service ...................................... SUCCESS [7.930s]
            [INFO] Hive JDBC ......................................... SUCCESS [3.004s]
            [INFO] Hive Beeline ...................................... SUCCESS [2.789s]
            [INFO] Hive CLI .......................................... SUCCESS [1.823s]
            [INFO] Hive Contrib ...................................... SUCCESS [2.640s]
            [INFO] Hive HBase Handler ................................ SUCCESS [2.594s]
            [INFO] Hive HCatalog ..................................... SUCCESS [0.545s]
            [INFO] Hive HCatalog Core ................................ SUCCESS [2.355s]
            [INFO] Hive HCatalog Pig Adapter ......................... SUCCESS [2.462s]
            [INFO] Hive HCatalog Server Extensions ................... SUCCESS [1.779s]
            [INFO] Hive HCatalog Webhcat Java Client ................. SUCCESS [1.624s]
            [INFO] Hive HCatalog Webhcat ............................. SUCCESS [9.865s]
            [INFO] Hive HWI .......................................... SUCCESS [1.245s]
            [INFO] Hive ODBC ......................................... SUCCESS [0.829s]
            [INFO] Hive Shims Aggregator ............................. SUCCESS [0.209s]
            [INFO] Hive TestUtils .................................... SUCCESS [0.640s]
            [INFO] Hive Packaging .................................... FAILURE [1.763s]
            [INFO] ------------------------------------------------------------------------
            [INFO] BUILD FAILURE
            [INFO] ------------------------------------------------------------------------
            [INFO] Total time: 3:28.664s
            [INFO] Finished at: Sat Mar 15 06:39:11 EDT 2014
            [INFO] Final Memory: 74M/461M
            [INFO] ------------------------------------------------------------------------
            [ERROR] Failed to execute goal on project hive-packaging: Could not resolve dependencies for project org.apache.hive:hive-packaging:pom:0.14.0-SNAPSHOT: Could not find artifact org.apache.hive.hcatalog:hive-hcatalog-hbase-storage-handler:jar:0.14.0-SNAPSHOT in apache.snapshots (http://repository.apache.org/snapshots) -> [Help 1]
            [ERROR] 
            [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
            [ERROR] Re-run Maven using the -X switch to enable full debug logging.
            [ERROR] 
            [ERROR] For more information about the errors and possible solutions, please read the following articles:
            [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
            [ERROR] 
            [ERROR] After correcting the problems, you can resume the build with the command
            [ERROR]   mvn <goals> -rf :hive-packaging
            + exit 1
            '
            

            This message is automatically generated.

            ATTACHMENT ID: 12634889


            sershe Sergey Shelukhin added a comment -

            Add unit test, fixes
            hiveqa Hive QA added a comment -

            Overall: -1 at least one test failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12635047/HIVE-6430.04.patch

            ERROR: -1 due to 2 failed/errored test(s), 5417 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
            

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1867/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1867/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 2 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12635047


            sershe Sergey Shelukhin added a comment -

            Rebase; incorporate not enabling this for decimal.

            sershe Sergey Shelukhin added a comment -

            Finally fixed the last glitches and got some memory numbers. Next, I will try some queries on a real cluster...

            On standard test tables (the over10k data file), we join the entire table against 7k rows of the same table on one column, which yields only 407 unique keys. Each row contains 3 columns from the joined table.
            Note that the "from" case already uses LazyFlatRowContainer, so this is on top of the gain from HIVE-6418.

            The usage goes from:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper 1 32 880632
            java.util.HashMap 2 96 880560
            java.util.HashMap$Entry[] 2 65632 880464
            java.util.HashMap$Entry 407 13024 814832
            java.lang.Object[] 810 101008 785488
            org.apache.hadoop.hive.ql.exec.persistence.LazyFlatRowContainer 405 9720 775768
            org.apache.hadoop.io.Text 7000 168000 394760
            byte[] 7001 226776 226776
            org.apache.hadoop.hive.serde2.io.DoubleWritable 7000 168000 168000
            org.apache.hadoop.io.IntWritable 7000 112000 112000
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject 405 6480 25920
            org.apache.hadoop.io.LongWritable 405 9720 9720
            java.lang.String 2 64 120
            char[] 2 56 56
            org.apache.hadoop.hive.serde2.ByteStream$Output 1 24 40

            To:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer 1 32 340664
            org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap 1 48 340392
            java.util.ArrayList 4 96 209344
            java.lang.Object[] 6 152 209304
            org.apache.hadoop.hive.serde2.WriteBuffers 1 56 209256
            byte[] 1 209152 209152
            long[] 1 131088 131088
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$KeyValueWriter 1 40 200

            That is a 61% reduction on top of HIVE-6418.

            If the join is instead on 4 columns (increasing the number of unique keys to 7000, one row per key), memory usage goes from:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper 1 32 2196624
            java.util.HashMap 2 96 2196552
            java.util.HashMap$Entry[] 2 65632 2196456
            java.util.HashMap$Entry 7002 224064 2130824
            java.lang.Object[] 13999 447968 1626656
            org.apache.hadoop.hive.ql.exec.persistence.LazyFlatRowContainer 7000 168000 1066760
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject 6999 111984 839880
            org.apache.hadoop.io.Text 7000 168000 394760
            byte[] 7001 226776 226776
            org.apache.hadoop.io.IntWritable 13999 223984 223984
            org.apache.hadoop.hive.serde2.io.DoubleWritable 7000 168000 168000
            org.apache.hadoop.io.LongWritable 6999 167976 167976
            org.apache.hadoop.hive.serde2.io.ByteWritable 6999 111984 111984
            org.apache.hadoop.hive.serde2.io.ShortWritable 6999 111984 111984
            java.lang.String 2 64 120
            char[] 2 56 56
            org.apache.hadoop.hive.serde2.ByteStream$Output 1 24 40

            To:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer 1 32 452976
            org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap 1 48 452688
            java.util.ArrayList 4 96 321648
            java.lang.Object[] 6 168 321616
            org.apache.hadoop.hive.serde2.WriteBuffers 1 56 321552
            byte[] 1 321448 321448
            long[] 1 131088 131088
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$KeyValueWriter 1 40 216

            That is a 79% reduction on top of HIVE-6418, i.e. roughly 5 times smaller (though this is a rather favorable case).
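
            For context on where the savings come from: the whole table is now one byte[] of serialized rows (via WriteBuffers) plus one long[] of slot references, instead of a graph of Entry, key, and writable objects. Below is a minimal sketch of the slot-packing idea, assuming a hypothetical layout where each slot packs a byte offset and a key length into a single long; the real BytesBytesMultiHashMap layout differs in its details.

                // Hypothetical illustration of a flat hash table slot: instead of a
                // HashMap$Entry object per entry, each slot is one long that packs a
                // 48-bit byte offset and a 16-bit key length; keys and values are
                // serialized back-to-back in a shared byte[].
                final class FlatSlots {
                  static long makeRef(long offset, int keyLength) {
                    return (offset << 16) | (keyLength & 0xFFFFL);
                  }
                  static long offset(long ref)    { return ref >>> 16; }
                  static int  keyLength(long ref) { return (int) (ref & 0xFFFFL); }
                }

            With packing like this, the per-entry overhead is the 8 bytes of the slot itself (plus empty slots from the load factor), rather than the 100+ bytes of object headers and references counted above.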


            sershe Sergey Shelukhin added a comment -

            Fix missing call to seal, some minor stuff
            hiveqa Hive QA added a comment -

            Overall: -1 at least one test failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12635910/HIVE-6430.06.patch

            ERROR: -1 due to 1 failed/errored test(s), 5445 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
            

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1900/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1900/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 1 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12635910


            sershe Sergey Shelukhin added a comment -

            gopalv, do you want to finish the review when you have time?

            sershe Sergey Shelukhin added a comment -

            Tested the patch on real queries. I do see a huge memory reduction (on a modified TPCDS query 72, the worst map task's dump after populating hash tables goes from 7GB to ~1.2GB; I'll need to download the dumps to analyze, but it's pretty clear cut), and the GC time counter goes down from ~1 min total to a few seconds, as expected. However, I also see a huge wall clock time increase during processing (without a corresponding CPU time increase, it looks like). I would expect some tradeoff, but not as much as I'm seeing... will profile more.


            sershe Sergey Shelukhin added a comment -

            Resize has an epic bug: we cannot rely on the slot being derivable from the hash, because probing can move an entry away from its home slot... that was pretty silly.
            I think this also causes some of the perf degradation, because once the table gets rehashed it may be screwed up completely (I ran a query that returns no results so it wouldn't clutter my shell; good thinking there).
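
            The bug class is easy to hit with open addressing: after linear probing, an entry's current slot no longer determines its hash, so a resize that derives the new slot from the old slot index scatters entries incorrectly. A minimal sketch, with hypothetical names, of a rehash that recomputes each slot from a stored (or recomputed) hash instead:

                // Sketch (hypothetical names): resizing an open-addressing table.
                // Probing may have displaced an entry from its "home" slot, so the
                // new slot must come from the entry's hash, never from its old index.
                static long[] resize(long[] refs, int[] hashes) {
                  int newCapacity = refs.length * 2;          // keep a power of two
                  long[] newRefs = new long[newCapacity];
                  for (int i = 0; i < refs.length; i++) {
                    if (refs[i] == 0) continue;               // empty slot
                    int slot = hashes[i] & (newCapacity - 1); // home slot from hash
                    while (newRefs[slot] != 0) {              // linear probe
                      slot = (slot + 1) & (newCapacity - 1);
                    }
                    newRefs[slot] = refs[i];
                  }
                  return newRefs;
                }

            Storing the full hash per entry makes this cheap; the alternative is re-reading and re-hashing every key's bytes during the resize.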


            sershe Sergey Shelukhin added a comment -

            Patch that fixes some issues; the main thing is that Murmur hash (from Guava) is now used. Hashing behavior was very bad with the previous hash code method, and perf suffered a lot.
            There was also an issue with the previously used expand method. To make expand fast, the hash is now stored fully. This is not necessary for anything else, so it's a tradeoff - more memory (+4 bytes per key) versus an expensive rehash. We may revisit it later.
            Fast paths were added to WriteBuffers for the majority of cases, where whatever we are doing is all in one buffer. There's some bug in there that causes some queries to fail, which I'll investigate... I want to upload the patch with what is done; the queries with large map joins that do work now run approximately as fast as before (I will measure more precisely later) in a fraction of the memory.
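
            The single-buffer fast path is the common case: a read only needs the general multi-buffer logic when it spans a chunk boundary. A sketch of the idea, with hypothetical names and a simplified API rather than the actual WriteBuffers one:

                import java.util.ArrayList;
                import java.util.List;

                // Sketch (hypothetical names): WriteBuffers-style storage keeps data
                // in fixed-size byte[] chunks; most reads fit in one chunk and can be
                // served by a single arraycopy.
                final class ChunkedBuffers {
                  private final int wbSizeLog2;
                  private final int wbSize;                 // chunk size, power of two
                  private final List<byte[]> buffers = new ArrayList<>();

                  ChunkedBuffers(int wbSizeLog2) {
                    this.wbSizeLog2 = wbSizeLog2;
                    this.wbSize = 1 << wbSizeLog2;
                  }

                  void addChunk(byte[] chunk) { buffers.add(chunk); } // filled by writer

                  byte[] read(long offset, int length) {
                    int bufIx = (int) (offset >>> wbSizeLog2);
                    int inBuf = (int) (offset & (wbSize - 1));
                    byte[] out = new byte[length];
                    if (inBuf + length <= wbSize) {         // fast path: one chunk
                      System.arraycopy(buffers.get(bufIx), inBuf, out, 0, length);
                    } else {                                // slow path: spans chunks
                      int copied = 0;
                      while (copied < length) {
                        int n = Math.min(length - copied, wbSize - inBuf);
                        System.arraycopy(buffers.get(bufIx), inBuf, out, copied, n);
                        copied += n; bufIx++; inBuf = 0;
                      }
                    }
                    return out;
                  }
                }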


            gopalv Gopal Vijayaraghavan added a comment -

            This is an excellent find!

            The hash collision scenario seems to be affecting the regular hashmap cases as well.

            I flipped the MapJoinKeyBytes::hashCode() over to an inlined murmur, which resulted in a ~2-second savings in my map tasks.
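
            For reference, an inlined Murmur-style hash over the key bytes looks roughly like the following. This is the standard MurmurHash3 x86_32 scheme, not necessarily the exact variant used in the patch:

                // MurmurHash3 x86_32 over a byte array, written out inline; shown to
                // illustrate the mixing that replaced the weak default hashCode.
                static int murmur3(byte[] data, int seed) {
                  int h = seed;
                  int i = 0;
                  for (; i + 4 <= data.length; i += 4) {
                    int k = (data[i] & 0xFF) | ((data[i + 1] & 0xFF) << 8)
                          | ((data[i + 2] & 0xFF) << 16) | ((data[i + 3] & 0xFF) << 24);
                    k *= 0xCC9E2D51; k = Integer.rotateLeft(k, 15); k *= 0x1B873593;
                    h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xE6546B64;
                  }
                  int k = 0;                                // 0-3 tail bytes
                  switch (data.length - i) {
                    case 3: k ^= (data[i + 2] & 0xFF) << 16;  // fall through
                    case 2: k ^= (data[i + 1] & 0xFF) << 8;   // fall through
                    case 1: k ^= (data[i] & 0xFF);
                            k *= 0xCC9E2D51; k = Integer.rotateLeft(k, 15); k *= 0x1B873593;
                            h ^= k;
                  }
                  h ^= data.length;                         // finalization / avalanche
                  h ^= h >>> 16; h *= 0x85EBCA6B;
                  h ^= h >>> 13; h *= 0xC2B2AE35;
                  h ^= h >>> 16;
                  return h;
                }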


            sershe Sergey Shelukhin added a comment -

            We should probably do the same in the actual codebase... I'll file a JIRA.

            sershe Sergey Shelukhin added a comment -

            Fixed bugs, improved tests; TPCDS q27 can now run on the cluster I have access to (it used to fail with OOM even with 8GB containers). In profiling, the results are actually much better now, with little self time for the hashmap.


            sershe Sergey Shelukhin added a comment -

            er, 72

            sershe Sergey Shelukhin added a comment -

            This replaces the Guava murmurhash with an inline one, and adds an (untested) serialization bypass for the serdes (testing a fast query, hashing and byte copies in the serdes are the most prominent differences in my profiled runs). Unfortunately, for the latter I've discovered that the keys given to us are serialized using BinarySortableSerDe, because they come from ReduceSinkOperator. I will need to sync with Gunther tomorrow on this. The most likely outcome is that we'll change the Tez hashtable output to the lazy serde, so we could just copy bytes. An alternative would be to change key serialization to binarysortable, but that's ugly because values would stay on lazybinary, so we would have two paths. Plus, a bunch of changes would be required to binarysortable to avoid byte copies again, and to use RandomAccessOutput instead of its OutputBuffer thing. Yet another alternative is to do the bypass only for values, not keys.

            Regardless, I think we should commit this patch soon (even if off by default) and do additional improvements in separate JIRAs; it's growing too big.
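
            The bypass idea, for what it's worth, is to let the table consume the already-serialized key/value bytes coming out of the shuffle instead of deserializing into writables and re-serializing. A hedged sketch of the shape of such a path - all names here are illustrative, not Hive's API:

                // Hypothetical sketch of a serialization bypass: when the incoming
                // key and value are already serialized byte ranges (as they are when
                // they come from a ReduceSink), append the bytes directly instead of
                // round-tripping through object inspectors.
                final class BypassWriter {
                  interface ByteRange { byte[] bytes(); int start(); int length(); }

                  private final java.io.ByteArrayOutputStream store =
                      new java.io.ByteArrayOutputStream();

                  void put(Object key, Object value) {
                    if (key instanceof ByteRange && value instanceof ByteRange) {
                      append((ByteRange) key);              // fast path: byte copy
                      append((ByteRange) value);
                    } else {
                      throw new UnsupportedOperationException(
                          "serde fallback elided from this sketch");
                    }
                  }

                  private void append(ByteRange r) {
                    store.write(r.bytes(), r.start(), r.length());
                  }
                }

            The catch described above is that this only works when the wire format and the table's storage format agree, which is exactly the BinarySortableSerDe-vs-lazybinary mismatch.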


            gopalv Gopal Vijayaraghavan added a comment -

            LazySerde is not sortable, at least as far as I know - this is why the Reduce Sink produces binary sortables.

            gopalv Gopal Vijayaraghavan added a comment -

            That comment above probably didn't parse - the point is that using lazy keys makes it impossible to generate a min-max range (or more than one range) from the hashtable.
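
            The reason ranges need a sortable encoding: BinarySortable keys are byte-comparable, so a min and max key can be tracked with a plain unsigned byte comparison while the table is loaded; a lazy encoding does not preserve ordering, so no range can be derived. A small sketch of the comparison this relies on:

                // Unsigned lexicographic comparison of two byte[] keys. Deriving
                // min/max ranges this way is only valid when the key encoding is
                // order-preserving - the property BinarySortableSerDe has and lazy
                // encodings lack.
                static int compareUnsigned(byte[] a, byte[] b) {
                  int n = Math.min(a.length, b.length);
                  for (int i = 0; i < n; i++) {
                    int d = (a[i] & 0xFF) - (b[i] & 0xFF);
                    if (d != 0) return d;
                  }
                  return a.length - b.length;
                }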

            hiveqa Hive QA added a comment -

            Overall: -1 no tests executed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12641640/HIVE-6430.09.patch

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/31/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/31/console

            Messages:

            **** This message was trimmed, see log for full details ****
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:68:4: 
            Decision can match input such as "LPAREN KW_NULL BITWISEOR" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:68:4: 
            Decision can match input such as "LPAREN CharSetName CharSetLiteral" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:68:4: 
            Decision can match input such as "LPAREN KW_NULL NOTEQUAL" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:115:5: 
            Decision can match input such as "KW_CLUSTER KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:127:5: 
            Decision can match input such as "KW_PARTITION KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:138:5: 
            Decision can match input such as "KW_DISTRIBUTE KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:149:5: 
            Decision can match input such as "KW_SORT KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:166:7: 
            Decision can match input such as "STAR" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:179:5: 
            Decision can match input such as "KW_STRUCT" using multiple alternatives: 4, 6
            
            As a result, alternative(s) 6 were disabled for that input
            warning(200): IdentifiersParser.g:179:5: 
            Decision can match input such as "KW_ARRAY" using multiple alternatives: 2, 6
            
            As a result, alternative(s) 6 were disabled for that input
            warning(200): IdentifiersParser.g:179:5: 
            Decision can match input such as "KW_UNIONTYPE" using multiple alternatives: 5, 6
            
            As a result, alternative(s) 6 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_TRUE" using multiple alternatives: 3, 8
            
            As a result, alternative(s) 8 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_DATE StringLiteral" using multiple alternatives: 2, 3
            
            As a result, alternative(s) 3 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_NULL" using multiple alternatives: 1, 8
            
            As a result, alternative(s) 8 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_FALSE" using multiple alternatives: 3, 8
            
            As a result, alternative(s) 8 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_SORT KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "KW_BETWEEN KW_MAP LPAREN" using multiple alternatives: 8, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_ORDER KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_UNION KW_ALL" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:518:5: 
            Decision can match input such as "{AMPERSAND..BITWISEXOR, DIV..DIVIDE, EQUAL..EQUAL_NS, GREATERTHAN..GREATERTHANOREQUALTO, KW_AND, KW_ARRAY, KW_BETWEEN..KW_BOOLEAN, KW_CASE, KW_DOUBLE, KW_FLOAT, KW_IF, KW_IN, KW_INT, KW_LIKE, KW_MAP, KW_NOT, KW_OR, KW_REGEXP, KW_RLIKE, KW_SMALLINT, KW_STRING..KW_STRUCT, KW_TINYINT, KW_UNIONTYPE, KW_WHEN, LESSTHAN..LESSTHANOREQUALTO, MINUS..NOTEQUAL, PLUS, STAR, TILDE}" using multiple alternatives: 1, 3
            
            As a result, alternative(s) 3 were disabled for that input
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-exec ---
            [INFO] 
            [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-exec ---
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] Copying 1 resource
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-exec ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-exec ---
            [INFO] Compiling 1687 source files to /data/hive-ptest/working/apache-svn-trunk-source/ql/target/classes
            [INFO] -------------------------------------------------------------
            [WARNING] COMPILATION WARNING : 
            [INFO] -------------------------------------------------------------
            [WARNING] Note: Some input files use or override a deprecated API.
            [WARNING] Note: Recompile with -Xlint:deprecation for details.
            [WARNING] Note: Some input files use unchecked or unsafe operations.
            [WARNING] Note: Recompile with -Xlint:unchecked for details.
            [INFO] 4 warnings 
            [INFO] -------------------------------------------------------------
            [INFO] -------------------------------------------------------------
            [ERROR] COMPILATION ERROR : 
            [INFO] -------------------------------------------------------------
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,27] cannot find symbol
            symbol  : variable tmpSerDe
            location: class org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,12] internal error; cannot instantiate org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor.<init> at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor to ()
            [INFO] 2 errors 
            [INFO] -------------------------------------------------------------
            [INFO] ------------------------------------------------------------------------
            [INFO] Reactor Summary:
            [INFO] 
            [INFO] Hive .............................................. SUCCESS [9.353s]
            [INFO] Hive Ant Utilities ................................ SUCCESS [5.818s]
            [INFO] Hive Shims Common ................................. SUCCESS [3.953s]
            [INFO] Hive Shims 0.20 ................................... SUCCESS [2.640s]
            [INFO] Hive Shims Secure Common .......................... SUCCESS [4.766s]
            [INFO] Hive Shims 0.20S .................................. SUCCESS [2.439s]
            [INFO] Hive Shims 0.23 ................................... SUCCESS [8.866s]
            [INFO] Hive Shims ........................................ SUCCESS [1.196s]
            [INFO] Hive Common ....................................... SUCCESS [13.038s]
            [INFO] Hive Serde ........................................ SUCCESS [10.412s]
            [INFO] Hive Metastore .................................... SUCCESS [34.091s]
            [INFO] Hive Query Language ............................... FAILURE [53.832s]
            [INFO] Hive Service ...................................... SKIPPED
            [INFO] Hive JDBC ......................................... SKIPPED
            [INFO] Hive Beeline ...................................... SKIPPED
            [INFO] Hive CLI .......................................... SKIPPED
            [INFO] Hive Contrib ...................................... SKIPPED
            [INFO] Hive HBase Handler ................................ SKIPPED
            [INFO] Hive HCatalog ..................................... SKIPPED
            [INFO] Hive HCatalog Core ................................ SKIPPED
            [INFO] Hive HCatalog Pig Adapter ......................... SKIPPED
            [INFO] Hive HCatalog Server Extensions ................... SKIPPED
            [INFO] Hive HCatalog Webhcat Java Client ................. SKIPPED
            [INFO] Hive HCatalog Webhcat ............................. SKIPPED
            [INFO] Hive HCatalog Streaming ........................... SKIPPED
            [INFO] Hive HWI .......................................... SKIPPED
            [INFO] Hive ODBC ......................................... SKIPPED
            [INFO] Hive Shims Aggregator ............................. SKIPPED
            [INFO] Hive TestUtils .................................... SKIPPED
            [INFO] Hive Packaging .................................... SKIPPED
            [INFO] ------------------------------------------------------------------------
            [INFO] BUILD FAILURE
            [INFO] ------------------------------------------------------------------------
            [INFO] Total time: 2:35.321s
            [INFO] Finished at: Thu Apr 24 17:10:24 EDT 2014
            [INFO] Final Memory: 56M/629M
            [INFO] ------------------------------------------------------------------------
            [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-exec: Compilation failure: Compilation failure:
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,27] cannot find symbol
            [ERROR] symbol  : variable tmpSerDe
            [ERROR] location: class org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,12] internal error; cannot instantiate org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor.<init> at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor to ()
            [ERROR] -> [Help 1]
            [ERROR] 
            [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
            [ERROR] Re-run Maven using the -X switch to enable full debug logging.
            [ERROR] 
            [ERROR] For more information about the errors and possible solutions, please read the following articles:
            [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
            [ERROR] 
            [ERROR] After correcting the problems, you can resume the build with the command
            [ERROR]   mvn <goals> -rf :hive-exec
            + exit 1
            '
            

            This message is automatically generated.

            ATTACHMENT ID: 12641640


            sershe Sergey Shelukhin added a comment -

            Make the bypass work... it still has a hack to remove the ReduceSinkOp tag on the hashtable side. The join-to-mapjoin conversion code is very convoluted; I need to get hold of the ReduceSink that feeds the hashtable values and remove the tag output from there reliably. Will read the code later, and perf test with this.

            hiveqa Hive QA added a comment -

            Overall: -1 at least one test failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12642056/HIVE-6430.10.patch

            ERROR: -1 due to 46 failed/errored test(s), 5424 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tblproperties
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_unset_table_property
            org.apache.hadoop.hive.ql.exec.persistence.TestBytesBytesMultiHashMap.testPutGetMultiple
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/55/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/55/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 46 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12642056

            leftyl Lefty Leverenz added a comment -

            This adds hive.mapjoin.optimized.hashtable and hive.mapjoin.optimized.hashtable.wbsize to HiveConf.java. They both need descriptions – I assume "wb" means write buffer.

            The descriptions can go in HiveConf comments or a release note for now, or you can patch hive-default.xml.template and I'll add a comment on HIVE-6586 (for HIVE-6037, Synchronize HiveConf with hive-default.xml.template and support show conf).

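            For reference, a sketch of what the two entries might look like in hive-default.xml.template; the wording and default values below are illustrative, not necessarily the committed text:

              <property>
                <name>hive.mapjoin.optimized.hashtable</name>
                <value>true</value>
                <description>Whether Hive should use the memory-optimized hash table for MapJoin.</description>
              </property>
              <property>
                <name>hive.mapjoin.optimized.hashtable.wbsize</name>
                <value>10485760</value>
                <description>The optimized hashtable stores data in a chain of write buffers ("wb");
                this is the size of one buffer in bytes. A larger buffer may make the hashtable
                slightly faster, but allocates unneeded memory for small tables.</description>
              </property>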

            sershe Sergey Shelukhin added a comment -

            OK, I found another dumb bug in this patch (this time in the MJO wiring). It doesn't actually alter the results, but it seems to cause a lot of useless work. I will probably fix it tomorrow.

            sershe Sergey Shelukhin added a comment -

            Meanwhile, the serialization bypass appears to work; no more arraycopy. I need to replace the byte-removal hack with not tagging in ReduceSink, but after reading the code that creates ReduceSinks for this case, I think I may have approached the limits of sanity... will also look tomorrow.

            sershe Sergey Shelukhin added a comment -

            Fix all things. The skipTag path actually doesn't work all the time, and a warning is output in several Tez tests. Debugging that code is very difficult; I will continue tomorrow. It's probably ready to check in though, since we can just remove the tag.
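            To illustrate the tag discussion above: a minimal Java sketch, assuming the tag is a single byte appended to each serialized key. The class and method names here are hypothetical, not Hive's actual API.

              import java.util.Arrays;

              final class TagSketch {
                  // The "byte-removal hack": materialize a tag-free copy of the key.
                  // Correct, but costs an arraycopy for every row loaded into the hash table.
                  static byte[] stripTagByCopy(byte[] key) {
                      return Arrays.copyOf(key, key.length - 1);
                  }

                  // The cheaper route: if ReduceSink never writes the tag for this case,
                  // the buffer can be consumed in place as (bytes, offset, length) with no copy.
                  static int hashKey(byte[] bytes, int offset, int length) {
                      int h = 1;
                      for (int i = offset; i < offset + length; i++) {
                          h = 31 * h + bytes[i];
                      }
                      return h;
                  }
              }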

            gopalv Gopal Vijayaraghavan added a comment -

            I will build for my nightly runs with this patch turned on.
            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12642582/HIVE-6430.11.patch

            ERROR: -1 due to 7 failed/errored test(s), 5430 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
            org.apache.hadoop.hive.ql.exec.persistence.TestBytesBytesMultiHashMap.testPutGetMultiple
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/86/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/86/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 7 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12642582


            sershe Sergey Shelukhin added a comment -

            Fix the tag issue, CR feedback.

            sershe Sergey Shelukhin added a comment -

            Fix a small bug.
            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12642792/HIVE-6430.12.patch

            ERROR: -1 due to 6 failed/errored test(s), 5433 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/98/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/98/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 6 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12642792

            leftyl Lefty Leverenz added a comment -

            Thanks for the parameter descriptions in hive-default.xml.template. But patch 12 has a duplicate description for hive.mapjoin.optimized.hashtable.


            sershe Sergey Shelukhin added a comment -

            Will remove it on commit. hagleitn, can you take a look? t3rmin4t0r signed off on RB, but he's not formally a committer.

            hagleitn Gunther Hagleitner added a comment -

            This is neat. Some comments on RB.

            sershe Sergey Shelukhin added a comment -

            CR feedback. The RB link was never posted in the JIRA, apparently... it's at https://reviews.apache.org/r/18936/
            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12644187/HIVE-6430.13.patch

            ERROR: -1 due to 3 failed/errored test(s), 5439 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/175/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/175/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 3 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12644187


            sershe Sergey Shelukhin added a comment -

            ping?

            hagleitn Gunther Hagleitner added a comment -

            +1 looks good!
            leftyl Lefty Leverenz added a comment -

            +1 for parameter documentation.


            sershe Sergey Shelukhin added a comment -

            Will commit this evening.

            gopalv Gopal Vijayaraghavan added a comment -

            Is there any solution for the partial-build problem? I have to "mvn clean" for every build after this patch.

            [ERROR] /grid/5/dev/gopalv/tez-autobuild/hive/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java:[224,35] method put in interface java.util.Map<K,V> cannot be applied to given types;
            [ERROR] required: org.apache.hadoop.hive.ql.exec.Operator<?>,java.util.List<org.apache.hadoop.hive.ql.exec.Operator<?>>
            [ERROR] found: org.apache.hadoop.hive.ql.exec.MapJoinOperator,java.util.List<org.apache.hadoop.hive.ql.exec.Operator<? extends org.apache.hadoop.hive.ql.plan.OperatorDesc>>
            [ERROR] reason: actual argument java.util.List<org.apache.hadoop.hive.ql.exec.Operator<? extends org.apache.hadoop.hive.ql.plan.OperatorDesc>> cannot be converted to java.util.List<org.apache.hadoop.hive.ql.exec.Operator<?>>
            [ERROR] -> [Help 1]
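            For what it's worth, this is the standard generics-invariance error. A minimal standalone repro, using hypothetical stand-in types rather than the actual Hive classes, might look like this:

              import java.util.ArrayList;
              import java.util.HashMap;
              import java.util.List;
              import java.util.Map;

              class Desc {}
              class Op<T extends Desc> {}

              public class WildcardRepro {
                  public static void main(String[] args) {
                      Map<Op<?>, List<Op<?>>> map = new HashMap<Op<?>, List<Op<?>>>();
                      List<Op<? extends Desc>> children = new ArrayList<Op<? extends Desc>>();

                      // Does not compile: List<Op<? extends Desc>> is not a List<Op<?>>,
                      // because generic types are invariant in their type argument, even
                      // though every Op<? extends Desc> is itself an Op<?>.
                      // map.put(new Op<Desc>(), children);

                      // One legal workaround: copy into a list with the matching element type.
                      List<Op<?>> copy = new ArrayList<Op<?>>(children);
                      map.put(new Op<Desc>(), copy);
                  }
              }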

            gopalv Gopal Vijayaraghavan added a comment -

            It seems to break only on JDK 7 javac, and only on rebuilds with modifications - never on "mvn clean package" builds.

            sershe Sergey Shelukhin added a comment -

            Hmm... I cannot repro this; I tried JDK 6 and 7, clean builds and incremental ones, with modifications. Can you make an addendum patch that fixes it, so I can apply it on top?

            gopalv Gopal Vijayaraghavan added a comment -

            I can confirm that if I do an "mvn install" once, the problem goes away for a day (it always fails only on the first build of the day with the patch).

            If I had to guess, that's because my Maven update interval for snapshots is once a day. Once you commit this, the .m2/ version from apache-snapshots will match up and my builds won't break anymore (hopefully).

            Commit this, and if it breaks again for me, I'll post an addendum as a new patch.
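            If the once-a-day snapshot interval is indeed the cause, forcing Maven to re-check snapshot artifacts on a given build should confirm it. The -U (--update-snapshots) flag is standard Maven, not specific to this patch:

              mvn clean install -U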

            sershe Sergey Shelukhin added a comment -

            Reproed it on SVN; it is not related to this patch, but I'm fixing it anyway. I'm assuming the +1 stands...

            sershe Sergey Shelukhin added a comment -

            Committed to trunk.
            leftyl Lefty Leverenz added a comment -

            The configuration parameters hive.mapjoin.optimized.hashtable and hive.mapjoin.optimized.hashtable.wbsize need to be documented in the wiki for release 0.14.0.


            sershe Sergey Shelukhin added a comment -

            They are already documented in the config template, as far as I recall. Should we have that copied to the wiki automatically somehow?
            leftyl Lefty Leverenz added a comment -

            We don't have a way to add parameters to the wiki automatically. Yes, they're in the template file and I've got them on my wiki to-do list, but feel free to take care of them yourself if you have time.

            Mapjoin parameters don't have a section of their own, but they're listed together in order of Hive release (except for a couple of hive.skewjoin.mapjoin parameters), so these belong after hive.mapjoin.lazy.hashtable.
            thejas Thejas Nair added a comment -

            This has been fixed in 0.14 release. Please open new jira if you see any issues.

            leftyl Lefty Leverenz added a comment -

            Doc done, with links from the Tez parameter section:
            Configuration Properties – hive.mapjoin.optimized.hashtable
            Configuration Properties – hive.mapjoin.optimized.hashtable.wbsize
            Configuration Properties – Tez

            akolb Alex Kolbasov added a comment -

            misha@cloudera.com FYI.

            sershe Sergey Shelukhin added a comment -

            This has since been superseded by the vectorized mapjoin, which improves the hashtable further and specializes it for Java types and special cases.
            misha@cloudera.com Misha Dmitriev added a comment -

            Thank you akolb! This is nice work, of the kind I wish I could do more of.

            People

              Assignee: sershe Sergey Shelukhin
              Reporter: sershe Sergey Shelukhin
              Votes: 0
              Watchers: 8