HIVE-6430: MapJoin hash table has large memory overhead

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row storage structure without an object per row (similar to vectorization).
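      For a sense of scale, the current layout is roughly shaped like the sketch below (illustrative types only, not the actual MJKey/MJRowContainer classes); a primitive-keyed open-addressing table avoids every one of these per-entry objects:

          import java.util.Arrays;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;
          import org.apache.hadoop.io.IntWritable;

          // Illustrative only: ~16 bytes of int payload per entry, but each
          // entry also pays for the HashMap$Entry, the key and container
          // objects, their backing arrays, and one boxed writable per field.
          Map<List<IntWritable>, List<IntWritable>> table = new HashMap<>();
          table.put(Arrays.asList(new IntWritable(1), new IntWritable(2)),   // 2 ints of key
                    Arrays.asList(new IntWritable(3), new IntWritable(4)));  // 2 ints of row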

      Attachments

        1. HIVE-6430.patch
          134 kB
          Sergey Shelukhin
        2. HIVE-6430.01.patch
          149 kB
          Sergey Shelukhin
        3. HIVE-6430.02.patch
          137 kB
          Sergey Shelukhin
        4. HIVE-6430.03.patch
          149 kB
          Sergey Shelukhin
        5. HIVE-6430.04.patch
          158 kB
          Sergey Shelukhin
        6. HIVE-6430.05.patch
          162 kB
          Sergey Shelukhin
        7. HIVE-6430.06.patch
          161 kB
          Sergey Shelukhin
        8. HIVE-6430.07.patch
          169 kB
          Sergey Shelukhin
        9. HIVE-6430.08.patch
          170 kB
          Sergey Shelukhin
        10. HIVE-6430.09.patch
          179 kB
          Sergey Shelukhin
        11. HIVE-6430.10.patch
          195 kB
          Sergey Shelukhin
        12. HIVE-6430.11.patch
          202 kB
          Sergey Shelukhin
        13. HIVE-6430.12.patch
          204 kB
          Sergey Shelukhin
        14. HIVE-6430.12.patch
          204 kB
          Sergey Shelukhin
        15. HIVE-6430.13.patch
          205 kB
          Sergey Shelukhin
        16. HIVE-6430.14.patch
          207 kB
          Sergey Shelukhin


          Activity

            sershe Sergey Shelukhin added a comment -

            Here's the summary of the overhead per entry after both of the above patches go in (before, the overhead in key and value is significantly bigger).

            HashTable
            Entry array: 8+ bytes
            Entry: 32 bytes
            Key and value objects: 32 bytes

            Key
            Byte array object + length: 20 bytes.
            Field count and null mask: 1 byte.
            Rounding to 8 bytes: 0-7 bytes.

            Row
            Fields: 8 bytes.
            Object array object + length: 24 bytes.
            Per-column writable object: 16 bytes (assuming all the fields in writables are useful data).

            "Guaranteed" overhead per entry: 125 bytes, plus writables for row values and padding on key.
            Example double key, row with one field: additional 21 bytes per entry, ~146 total
            Example int key, row with 5 fields: additional 87 bytes per entry, ~212 total
            + some overhead depending on HashMap fullness.
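            For concreteness, the "guaranteed" figure is just the sum of the items above; a back-of-envelope sketch (illustrative variable names, not Hive code):

                int entrySlot    = 8;   // HashTable entry array: one slot per entry
                int entryObject  = 32;  // java.util.HashMap$Entry itself
                int keyValueObjs = 32;  // key and value objects
                int keyByteArray = 20;  // byte[] object header + length for the key
                int keyMeta      = 1;   // field count and null mask
                int rowFields    = 8;   // row container fields
                int rowObjArray  = 24;  // Object[] object header + length
                int guaranteed   = entrySlot + entryObject + keyValueObjs
                    + keyByteArray + keyMeta + rowFields + rowObjArray;  // = 125
                // Each row column then adds a ~16-byte writable, plus 0-7 bytes
                // of key padding: 125 + 5*16 + 7 = 212 for the int-key, 5-field example.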

            So that's a lot of overhead (it depends on the data, of course; if the row contains cat photos in binary, then 150-200 bytes is not much).

            The approach to get rid of per-entry overhead in general involves a hashtable implemented on top of an array, with open addressing, and storing the actual variable-length keys and rows in big flat array(s) of byte[]-s or objects. That would get rid of the key and row object overhead, most of the hashmap overhead, most of the key overhead, and most/some (see below) of the row overhead.

            The good thing about the table is that it's R/O after initial creation and we never delete, so we don't have to worry about many scenarios.

            Details (scroll down for estimates)
            Simple case, assuming we can convert both key and row into bytes:
            Allocate largish fixed-size byte arrays to act as an infinite write buffer (or the array can be reallocated if needed, or a combination). Have a flat, custom-made hash table similar to the HPPC one that would store offsets into that array in the key array (of longs), and would have no value or state arrays. Some additional data, for example lengths or null bitmasks, can also be fit into the key array values.
            When loading, incoming writables would write the keys and values into the write buffer. We know the schema, so we don't have to worry about storing types, field offsets, etc. Then write a fixed-size tail with e.g. the lengths of the key and value, to know what to compare and where the value starts. Because there's no requirement to allocate some number of bytes like there is now, a v-length format can be used if needed to save space... but it shouldn't be too complicated; it probably shouldn't use ORC there. Then the key array uses a standard hashtable put to store the offset of this postfix.
            When getting, the key can still be compared the same as now, as a byte array, with one extra "dereference" from the key array to get to the actual key by index.
            For values, writables will have to be re-created when the row is requested, because everything depends on writables now. Writables will trivially read from the byte array at the offset. Obviously this has a performance cost.
            Note that this is not like the current lazy deserialization:
            1) We do not deserialize on demand - final writables are just written to/read from the byte array, so creating them should be cheaper than deserializing.
            2) Writables are not preserved for future use and are created every time the row is accessed, which has a perf cost but saves memory.
            Total overhead per entry would be around 14-16 bytes, plus some fixed or semi-fixed overhead depending on the write buffer allocation scheme.
            In the above examples overhead will go from 146 and 212 bytes to 16 and 16.
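            To make this concrete, below is a minimal sketch of the simple case, under loud assumptions: a single fixed-size write buffer (rather than reallocation or chaining), linear probing over a power-of-two long[] slot array, at most 255-byte keys and values, and no duplicate keys. The class and method names are illustrative, not Hive's:

                import java.util.Arrays;

                final class FlatTableSketch {
                  private final long[] slots;   // 0 = empty, else tail offset + 1
                  private final byte[] buf;     // the "infinite" write buffer
                  private int writePos;

                  FlatTableSketch(int slotCount, int bufferBytes) {
                    assert Integer.bitCount(slotCount) == 1; // power of two for masking
                    slots = new long[slotCount];
                    buf = new byte[bufferBytes];
                  }

                  void put(byte[] key, byte[] value) {
                    // Append key bytes, then value bytes, then a fixed-size tail
                    // with the lengths, so a slot stores only one offset (to the tail).
                    System.arraycopy(key, 0, buf, writePos, key.length);
                    writePos += key.length;
                    System.arraycopy(value, 0, buf, writePos, value.length);
                    writePos += value.length;
                    int tailOff = writePos;
                    buf[writePos++] = (byte) key.length;    // assumes keys < 256 bytes
                    buf[writePos++] = (byte) value.length;  // assumes values < 256 bytes
                    int slot = Arrays.hashCode(key) & (slots.length - 1);
                    while (slots[slot] != 0) {              // open addressing, linear probe
                      slot = (slot + 1) & (slots.length - 1);
                    }
                    slots[slot] = tailOff + 1;              // +1 so that 0 can mean "empty"
                  }

                  // Returns the offset of the value bytes, or -1 if the key is absent;
                  // the caller re-creates writables from buf at that offset.
                  int get(byte[] key) {
                    int slot = Arrays.hashCode(key) & (slots.length - 1);
                    while (slots[slot] != 0) {
                      int tailOff = (int) (slots[slot] - 1);
                      int keyLen = buf[tailOff] & 0xFF;
                      int valLen = buf[tailOff + 1] & 0xFF;
                      int keyOff = tailOff - valLen - keyLen;
                      if (keyLen == key.length && rangeEquals(keyOff, key)) {
                        return tailOff - valLen;
                      }
                      slot = (slot + 1) & (slots.length - 1);
                    }
                    return -1;
                  }

                  private boolean rangeEquals(int off, byte[] key) {
                    for (int i = 0; i < key.length; i++) {
                      if (buf[off + i] != key[i]) return false;
                    }
                    return true;
                  }
                }

            A real version would additionally need buffer growth, multiple rows per key, and hashing of the probe key without materializing it as a separate byte[].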

            Another alternative is similar, but with only the keys in the byte array, and the values in a separate large Object array operating on the same principles, in writables with all their glory.
            The key array can store indices and lengths for both, probably 2-3 longs per entry depending on what limitations we can accept.
            So the total overhead will be around 16-24 bytes + 16 per field in the row, but writables wouldn't need to be re-created.
            In the above examples overhead will go from 146 and 212 bytes to 32 and 96.
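            For illustration, one hypothetical way the key array could pack the byte-array offset and length of a key into a single long (the 40/24-bit split is an assumed limitation, not a decided one); the value Object[] index would then occupy another long:

                // Assumed split: 40-bit offset into the key byte array,
                // 24-bit key length; a second long would hold the index
                // into the value Object[].
                static long packKeyRef(long byteOffset, int keyLength) {
                  return (byteOffset << 24) | (keyLength & 0xFFFFFFL);
                }
                static long keyOffset(long ref) { return ref >>> 24; }
                static int  keyLength(long ref) { return (int) (ref & 0xFFFFFF); }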

            Tl;dr and estimates
            The bad thing, obviously, is that without key and row objects, all the interfaces around them would cease to exist. This is especially bad for MR due to the convoluted HashTable path with write and read, so in the first cut I think we should go Tez-only and preserve the legacy path with objects for MR.

            There are several good things...

            • We can essentially copy-paste the HPPC long-long hashmap. It probably doesn't fit by itself and we don't need all the features, but it should be simple to convert to the above. So we don't need to code up the open-addressing hashmap ourselves.
            • W.r.t. the interface difference, I looked at the divergent paths; the Tez HT loader obviously would be able to do whatever is needed. MapJoinOperator is the only place where there will be problems - it currently creates the key and then calls get(key). Get can be changed to take the row, so that it would create the key for get as necessary.
            • Code for byte key creation, comparison, validation, etc., and some other code from the above two patches can be reused; plus I know all I need to know about what needs to be done with writables and the bytes behind them.

            sershe Sergey Shelukhin added a comment -

            "all the other fields in writables" should be "all the fields in writables"; cannot edit.

            sershe Sergey Shelukhin added a comment -

            Attempt #2... Presumably not only TableScans can be valid parents, because if I remove all other operators (as in the initial version), the tests fail. Input from someone with better knowledge of the original path would be helpful.


            sershe Sergey Shelukhin added a comment -

            Wrong JIRA.


            sershe Sergey Shelukhin added a comment -

            The new code probably has tons of bugs, but some old tests I ran have passed; let's try HiveQA. I will run the Tez tests.


            sershe Sergey Shelukhin added a comment -

            Reattaching the patch, with some fixes in the new code (not working yet). Looks like QA didn't pick it up.

            hiveqa Hive QA added a comment -

            Overall: -1 no tests executed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12633240/HIVE-6430.patch

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1649/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1649/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Tests exited with: NonZeroExitCodeException
            Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
            + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
            + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
            + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
            + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
            + cd /data/hive-ptest/working/
            + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1649/source-prep.txt
            + [[ false == \t\r\u\e ]]
            + mkdir -p maven ivy
            + [[ svn = \s\v\n ]]
            + [[ -n '' ]]
            + [[ -d apache-svn-trunk-source ]]
            + [[ ! -d apache-svn-trunk-source/.svn ]]
            + [[ ! -d apache-svn-trunk-source ]]
            + cd apache-svn-trunk-source
            + svn revert -R .
            Reverted 'metastore/scripts/upgrade/derby/upgrade.order.derby'
            Reverted 'metastore/scripts/upgrade/mysql/upgrade.order.mysql'
            Reverted 'metastore/scripts/upgrade/mysql/hive-schema-0.13.0.mysql.sql'
            Reverted 'metastore/scripts/upgrade/oracle/upgrade.order.oracle'
            Reverted 'metastore/scripts/upgrade/postgres/upgrade.order.postgres'
            ++ awk '{print $2}'
            ++ egrep -v '^X|^Performing status on external'
            ++ svn status --no-ignore
            + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target metastore/scripts/upgrade/derby/upgrade-0.13.0-to-0.14.0.derby.sql metastore/scripts/upgrade/derby/hive-schema-0.14.0.derby.sql metastore/scripts/upgrade/mysql/upgrade-0.13.0-to-0.14.0.mysql.sql metastore/scripts/upgrade/mysql/hive-schema-0.14.0.mysql.sql metastore/scripts/upgrade/oracle/upgrade-0.13.0-to-0.14.0.oracle.sql metastore/scripts/upgrade/oracle/hive-schema-0.14.0.oracle.sql metastore/scripts/upgrade/postgres/upgrade-0.13.0-to-0.14.0.postgres.sql metastore/scripts/upgrade/postgres/hive-schema-0.14.0.postgres.sql itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
            + svn update
            U    ql/src/test/queries/clientpositive/mapjoin_mapjoin.q
            U    ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out
            U    ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
            U    ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
            U    ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java
            U    ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
            
            Fetching external item into 'hcatalog/src/test/e2e/harness'
            Updated external to revision 1575376.
            
            Updated to revision 1575376.
            + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
            + patchFilePath=/data/hive-ptest/working/scratch/build.patch
            + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
            + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
            + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
            The patch does not appear to apply with p0, p1, or p2
            + exit 1
            '
            

            This message is automatically generated.

            ATTACHMENT ID: 12633240


            sershe Sergey Shelukhin added a comment -

            Ran some regular and some Tez tests; they passed. Will wait for QA and run more Tez tests.


            sershe Sergey Shelukhin added a comment -

            All Tez tests passed; some explain plans changed in details that should be unrelated (like column names), and ordering changed in one file.
            I will see if the trunk files need to be updated again, and/or if ordering needs to be enforced.


            sershe Sergey Shelukhin added a comment -

            gopalv hagleitn this patch is ready for review... if you were looking for good weekend reading.

            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12633496/HIVE-6430.patch

            ERROR: -1 due to 2 failed/errored test(s), 5373 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
            

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1682/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1682/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 2 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12633496


            sershe Sergey Shelukhin added a comment -

            Both of these tests pass for me... looks unrelated. They can be rerun after the review-feedback update.


            sershe Sergey Shelukhin added a comment -

            Addressed most of the RB feedback, except for the refactor, which needs discussion... Also added ASCII art to the comments, and one more memory optimization to truncate the array, after initial tests.


            sershe Sergey Shelukhin added a comment -

            The test changes were not intentional... merged the wrong branch.


            sershe Sergey Shelukhin added a comment -

            Addressed all CR feedback, but the patch still fails some Tez tests. Will address tomorrow.

            Meanwhile, can you review the common code (I may separate it into a different patch), so that we could perhaps put this into Hive 13 in disabled form?


            sershe Sergey Shelukhin added a comment -

            Addressed major review and discussion feedback. I kept the list bit in the ref, though, because putting it in the array makes retrieval of the union a huge pain. Removed the "split" long; now everything is in one place.
            Probably need to write some unit tests, since the q files do not cover all cases. Will do so later today, or maybe Sunday.


            sershe Sergey Shelukhin added a comment -

            Some Tez tests passed, running others.


            sershe Sergey Shelukhin added a comment -

            Forgot one TODO: remove an unnecessary method.

            leftyl Lefty Leverenz added a comment -

            This adds config parameter hive.mapjoin.optimized.hashtable to HiveConf.java but doesn't give a description in hive-default.xml.template or a HiveConf.java comment.

            HIVE-6037 is going to change HiveConf.java and start generating hive-default.xml.template from HiveConf.java, so I suggest putting the parameter description in a jira release note. Then it can be added to the new version of HiveConf.java after HIVE-6037 gets committed.
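            For reference, the parameter can be toggled like any other Configuration boolean; a hypothetical sketch (whether Hive exposes a dedicated ConfVars constant for it is not shown here):

                import org.apache.hadoop.hive.conf.HiveConf;

                // HiveConf extends Hadoop's Configuration, so the generic
                // boolean setter applies. Semantics assumed: false falls back
                // to the legacy object-based hash table.
                HiveConf conf = new HiveConf();
                conf.setBoolean("hive.mapjoin.optimized.hashtable", false);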

            hiveqa Hive QA added a comment -

            Overall: -1 no tests executed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12634889/HIVE-6430.03.patch

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1825/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1825/console

            Messages:

            **** This message was trimmed, see log for full details ****
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/hwi/src/test/resources
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-hwi ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-hwi ---
            [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/test-classes
            [INFO] 
            [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-hwi ---
            [INFO] Tests are skipped.
            [INFO] 
            [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hwi ---
            [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-hwi ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-hwi ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.jar
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive ODBC 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            [INFO] 
            [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-odbc ---
            [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/odbc (includes = [datanucleus.log, derby.log], excludes = [])
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-odbc ---
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-odbc ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-odbc ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-odbc ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-odbc ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/odbc/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-odbc/0.14.0-SNAPSHOT/hive-odbc-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive Shims Aggregator 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            [INFO] 
            [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims-aggregator ---
            [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims (includes = [datanucleus.log, derby.log], excludes = [])
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims-aggregator ---
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims-aggregator ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-aggregator ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/shims/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-shims-aggregator ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims-aggregator ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-shims-aggregator/0.14.0-SNAPSHOT/hive-shims-aggregator-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive TestUtils 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            [INFO] 
            [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-testutils ---
            [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/testutils (includes = [datanucleus.log, derby.log], excludes = [])
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-testutils ---
            [INFO] 
            [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-testutils ---
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/testutils/src/main/resources
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-testutils ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-testutils ---
            [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/classes
            [INFO] 
            [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-testutils ---
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/testutils/src/test/resources
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-testutils ---
            [INFO] Executing tasks
            
            main:
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/tmp
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/warehouse
                [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/tmp/conf
                 [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/tmp/conf
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-testutils ---
            [INFO] No sources to compile
            [INFO] 
            [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-testutils ---
            [INFO] Tests are skipped.
            [INFO] 
            [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-testutils ---
            [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/hive-testutils-0.14.0-SNAPSHOT.jar
            [INFO] 
            [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-testutils ---
            [INFO] 
            [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-testutils ---
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/testutils/target/hive-testutils-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-testutils/0.14.0-SNAPSHOT/hive-testutils-0.14.0-SNAPSHOT.jar
            [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/testutils/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-testutils/0.14.0-SNAPSHOT/hive-testutils-0.14.0-SNAPSHOT.pom
            [INFO]                                                                         
            [INFO] ------------------------------------------------------------------------
            [INFO] Building Hive Packaging 0.14.0-SNAPSHOT
            [INFO] ------------------------------------------------------------------------
            Downloading: http://repository.apache.org/snapshots/org/apache/hive/hcatalog/hive-hcatalog-hbase-storage-handler/0.14.0-SNAPSHOT/maven-metadata.xml
            Downloading: http://repository.apache.org/snapshots/org/apache/hive/hcatalog/hive-hcatalog-hbase-storage-handler/0.14.0-SNAPSHOT/hive-hcatalog-hbase-storage-handler-0.14.0-SNAPSHOT.pom
            [WARNING] The POM for org.apache.hive.hcatalog:hive-hcatalog-hbase-storage-handler:jar:0.14.0-SNAPSHOT is missing, no dependency information available
            Downloading: http://repository.apache.org/snapshots/org/apache/hive/hcatalog/hive-hcatalog-hbase-storage-handler/0.14.0-SNAPSHOT/hive-hcatalog-hbase-storage-handler-0.14.0-SNAPSHOT.jar
            [INFO] ------------------------------------------------------------------------
            [INFO] Reactor Summary:
            [INFO] 
            [INFO] Hive .............................................. SUCCESS [8.715s]
            [INFO] Hive Ant Utilities ................................ SUCCESS [5.435s]
            [INFO] Hive Shims Common ................................. SUCCESS [3.745s]
            [INFO] Hive Shims 0.20 ................................... SUCCESS [2.563s]
            [INFO] Hive Shims Secure Common .......................... SUCCESS [4.273s]
            [INFO] Hive Shims 0.20S .................................. SUCCESS [2.567s]
            [INFO] Hive Shims 0.23 ................................... SUCCESS [7.933s]
            [INFO] Hive Shims ........................................ SUCCESS [1.228s]
            [INFO] Hive Common ....................................... SUCCESS [6.888s]
            [INFO] Hive Serde ........................................ SUCCESS [10.418s]
            [INFO] Hive Metastore .................................... SUCCESS [35.599s]
            [INFO] Hive Query Language ............................... SUCCESS [1:10.690s]
            [INFO] Hive Service ...................................... SUCCESS [7.930s]
            [INFO] Hive JDBC ......................................... SUCCESS [3.004s]
            [INFO] Hive Beeline ...................................... SUCCESS [2.789s]
            [INFO] Hive CLI .......................................... SUCCESS [1.823s]
            [INFO] Hive Contrib ...................................... SUCCESS [2.640s]
            [INFO] Hive HBase Handler ................................ SUCCESS [2.594s]
            [INFO] Hive HCatalog ..................................... SUCCESS [0.545s]
            [INFO] Hive HCatalog Core ................................ SUCCESS [2.355s]
            [INFO] Hive HCatalog Pig Adapter ......................... SUCCESS [2.462s]
            [INFO] Hive HCatalog Server Extensions ................... SUCCESS [1.779s]
            [INFO] Hive HCatalog Webhcat Java Client ................. SUCCESS [1.624s]
            [INFO] Hive HCatalog Webhcat ............................. SUCCESS [9.865s]
            [INFO] Hive HWI .......................................... SUCCESS [1.245s]
            [INFO] Hive ODBC ......................................... SUCCESS [0.829s]
            [INFO] Hive Shims Aggregator ............................. SUCCESS [0.209s]
            [INFO] Hive TestUtils .................................... SUCCESS [0.640s]
            [INFO] Hive Packaging .................................... FAILURE [1.763s]
            [INFO] ------------------------------------------------------------------------
            [INFO] BUILD FAILURE
            [INFO] ------------------------------------------------------------------------
            [INFO] Total time: 3:28.664s
            [INFO] Finished at: Sat Mar 15 06:39:11 EDT 2014
            [INFO] Final Memory: 74M/461M
            [INFO] ------------------------------------------------------------------------
            [ERROR] Failed to execute goal on project hive-packaging: Could not resolve dependencies for project org.apache.hive:hive-packaging:pom:0.14.0-SNAPSHOT: Could not find artifact org.apache.hive.hcatalog:hive-hcatalog-hbase-storage-handler:jar:0.14.0-SNAPSHOT in apache.snapshots (http://repository.apache.org/snapshots) -> [Help 1]
            [ERROR] 
            [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
            [ERROR] Re-run Maven using the -X switch to enable full debug logging.
            [ERROR] 
            [ERROR] For more information about the errors and possible solutions, please read the following articles:
            [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
            [ERROR] 
            [ERROR] After correcting the problems, you can resume the build with the command
            [ERROR]   mvn <goals> -rf :hive-packaging
            + exit 1
            '
            

            This message is automatically generated.

            ATTACHMENT ID: 12634889


            sershe Sergey Shelukhin added a comment -

            Add unit test, fixes
            hiveqa Hive QA added a comment -

            Overall: -1 at least one test failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12635047/HIVE-6430.04.patch

            ERROR: -1 due to 2 failed/errored test(s), 5417 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
            

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1867/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1867/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 2 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12635047


            sershe Sergey Shelukhin added a comment -

            Rebase; incorporate not enabling this for decimal.

            sershe Sergey Shelukhin added a comment -

            Finally fixed the last glitches and got some memory numbers. Next, I will try some queries on a real cluster...

            On standard test tables (the over10k data file), we join the entire table against 7k rows of the same table on one column, which yields only 407 unique keys. Each row contains 3 columns from the joined table.
            Note that the "from" case already uses LazyFlatRowContainer, so this is on top of the gain from HIVE-6418.

            The usage goes from:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper 1 32 880632
            java.util.HashMap 2 96 880560
            java.util.HashMap$Entry[] 2 65632 880464
            java.util.HashMap$Entry 407 13024 814832
            java.lang.Object[] 810 101008 785488
            org.apache.hadoop.hive.ql.exec.persistence.LazyFlatRowContainer 405 9720 775768
            org.apache.hadoop.io.Text 7000 168000 394760
            byte[] 7001 226776 226776
            org.apache.hadoop.hive.serde2.io.DoubleWritable 7000 168000 168000
            org.apache.hadoop.io.IntWritable 7000 112000 112000
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject 405 6480 25920
            org.apache.hadoop.io.LongWritable 405 9720 9720
            java.lang.String 2 64 120
            char[] 2 56 56
            org.apache.hadoop.hive.serde2.ByteStream$Output 1 24 40

            To:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer 1 32 340664
            org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap 1 48 340392
            java.util.ArrayList 4 96 209344
            java.lang.Object[] 6 152 209304
            org.apache.hadoop.hive.serde2.WriteBuffers 1 56 209256
            byte[] 1 209152 209152
            long[] 1 131088 131088
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$KeyValueWriter 1 40 200

            That is a 61% reduction on top of HIVE-6418.

            If the join is instead on 4 columns (increasing the number of unique keys to 7000, one row per key), memory usage goes from:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper 1 32 2196624
            java.util.HashMap 2 96 2196552
            java.util.HashMap$Entry[] 2 65632 2196456
            java.util.HashMap$Entry 7002 224064 2130824
            java.lang.Object[] 13999 447968 1626656
            org.apache.hadoop.hive.ql.exec.persistence.LazyFlatRowContainer 7000 168000 1066760
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject 6999 111984 839880
            org.apache.hadoop.io.Text 7000 168000 394760
            byte[] 7001 226776 226776
            org.apache.hadoop.io.IntWritable 13999 223984 223984
            org.apache.hadoop.hive.serde2.io.DoubleWritable 7000 168000 168000
            org.apache.hadoop.io.LongWritable 6999 167976 167976
            org.apache.hadoop.hive.serde2.io.ByteWritable 6999 111984 111984
            org.apache.hadoop.hive.serde2.io.ShortWritable 6999 111984 111984
            java.lang.String 2 64 120
            char[] 2 56 56
            org.apache.hadoop.hive.serde2.ByteStream$Output 1 24 40

            To:

            Class Objects Shallow Size Retained Size
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer 1 32 452976
            org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap 1 48 452688
            java.util.ArrayList 4 96 321648
            java.lang.Object[] 6 168 321616
            org.apache.hadoop.hive.serde2.WriteBuffers 1 56 321552
            byte[] 1 321448 321448
            long[] 1 131088 131088
            org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$KeyValueWriter 1 40 216

            That is a 79% reduction on top of HIVE-6418, i.e. roughly 5 times smaller (though this is a rather favorable case).
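
            For context on where the savings come from: the whole table is now one byte[] of serialized rows (via WriteBuffers) plus one long[] of slot references, instead of a graph of Entry, key, and writable objects. Below is a minimal sketch of the slot-packing idea, assuming a hypothetical layout where each slot packs a byte offset and a key length into a single long; the real BytesBytesMultiHashMap layout differs in its details.

                // Hypothetical illustration of a flat hash table slot: instead of a
                // HashMap$Entry object per entry, each slot is one long that packs a
                // 48-bit byte offset and a 16-bit key length; keys and values are
                // serialized back-to-back in a shared byte[].
                final class FlatSlots {
                  static long makeRef(long offset, int keyLength) {
                    return (offset << 16) | (keyLength & 0xFFFFL);
                  }
                  static long offset(long ref)    { return ref >>> 16; }
                  static int  keyLength(long ref) { return (int) (ref & 0xFFFFL); }
                }

            With packing like this, the per-entry overhead is the 8 bytes of the slot itself (plus empty slots from the load factor), rather than the 100+ bytes of object headers and references counted above.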


            sershe Sergey Shelukhin added a comment -

            Fix missing call to seal, some minor stuff
            hiveqa Hive QA added a comment -

            Overall: -1 at least one test failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12635910/HIVE-6430.06.patch

            ERROR: -1 due to 1 failed/errored test(s), 5445 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
            

            Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1900/testReport
            Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1900/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 1 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12635910


            sershe Sergey Shelukhin added a comment -

            gopalv, do you want to finish the review when you have time?

            sershe Sergey Shelukhin added a comment -

            Tested the patch on real queries. I do see a huge memory reduction (on a modified TPCDS query 72, the worst map task's dump after populating hash tables goes from 7GB to ~1.2GB; I'll need to download the dumps to analyze, but it's pretty clear cut), and the GC time counter goes down from ~1 min total to a few seconds, as expected. However, I also see a huge wall clock time increase during processing (without a corresponding CPU time increase, it looks like). I would expect some tradeoff, but not as much as I'm seeing... will profile more.


            sershe Sergey Shelukhin added a comment -

            Resize has an epic bug: we cannot rely on the slot being derivable from the hash, because probing can move an entry away from its home slot... that was pretty silly.
            I think this also causes some of the perf degradation, because once the table gets rehashed it may be screwed up completely (I ran a query that returns no results so it wouldn't clutter my shell; good thinking there).
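
            The bug class is easy to hit with open addressing: after linear probing, an entry's current slot no longer determines its hash, so a resize that derives the new slot from the old slot index scatters entries incorrectly. A minimal sketch, with hypothetical names, of a rehash that recomputes each slot from a stored (or recomputed) hash instead:

                // Sketch (hypothetical names): resizing an open-addressing table.
                // Probing may have displaced an entry from its "home" slot, so the
                // new slot must come from the entry's hash, never from its old index.
                static long[] resize(long[] refs, int[] hashes) {
                  int newCapacity = refs.length * 2;          // keep a power of two
                  long[] newRefs = new long[newCapacity];
                  for (int i = 0; i < refs.length; i++) {
                    if (refs[i] == 0) continue;               // empty slot
                    int slot = hashes[i] & (newCapacity - 1); // home slot from hash
                    while (newRefs[slot] != 0) {              // linear probe
                      slot = (slot + 1) & (newCapacity - 1);
                    }
                    newRefs[slot] = refs[i];
                  }
                  return newRefs;
                }

            Storing the full hash per entry makes this cheap; the alternative is re-reading and re-hashing every key's bytes during the resize.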


            sershe Sergey Shelukhin added a comment -

            Patch that fixes some issues; the main thing is that Murmur hash (from Guava) is now used. Hashing behavior was very bad with the previous hash code method, and perf suffered a lot.
            There was also an issue with the previously used expand method. To make expand fast, the hash is now stored fully. This is not necessary for anything else, so it's a tradeoff - more memory (+4 bytes per key) versus an expensive rehash. We may revisit it later.
            Fast paths were added to WriteBuffers for the majority of cases, where whatever we are doing is all in one buffer. There's some bug in there that causes some queries to fail, which I'll investigate... I want to upload the patch with what is done; the queries with large map joins that do work now run approximately as fast as before (I will measure more precisely later) in a fraction of the memory.
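
            The single-buffer fast path is the common case: a read only needs the general multi-buffer logic when it spans a chunk boundary. A sketch of the idea, with hypothetical names and a simplified API rather than the actual WriteBuffers one:

                import java.util.ArrayList;
                import java.util.List;

                // Sketch (hypothetical names): WriteBuffers-style storage keeps data
                // in fixed-size byte[] chunks; most reads fit in one chunk and can be
                // served by a single arraycopy.
                final class ChunkedBuffers {
                  private final int wbSizeLog2;
                  private final int wbSize;                 // chunk size, power of two
                  private final List<byte[]> buffers = new ArrayList<>();

                  ChunkedBuffers(int wbSizeLog2) {
                    this.wbSizeLog2 = wbSizeLog2;
                    this.wbSize = 1 << wbSizeLog2;
                  }

                  void addChunk(byte[] chunk) { buffers.add(chunk); } // filled by writer

                  byte[] read(long offset, int length) {
                    int bufIx = (int) (offset >>> wbSizeLog2);
                    int inBuf = (int) (offset & (wbSize - 1));
                    byte[] out = new byte[length];
                    if (inBuf + length <= wbSize) {         // fast path: one chunk
                      System.arraycopy(buffers.get(bufIx), inBuf, out, 0, length);
                    } else {                                // slow path: spans chunks
                      int copied = 0;
                      while (copied < length) {
                        int n = Math.min(length - copied, wbSize - inBuf);
                        System.arraycopy(buffers.get(bufIx), inBuf, out, copied, n);
                        copied += n; bufIx++; inBuf = 0;
                      }
                    }
                    return out;
                  }
                }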


            gopalv Gopal Vijayaraghavan added a comment -

            This is an excellent find!

            The hash collision scenario seems to be affecting the regular hashmap cases as well.

            I flipped the MapJoinKeyBytes::hashCode() over to an inlined murmur, which resulted in a ~2-second savings in my map tasks.
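
            For reference, an inlined Murmur-style hash over the key bytes looks roughly like the following. This is the standard MurmurHash3 x86_32 scheme, not necessarily the exact variant used in the patch:

                // MurmurHash3 x86_32 over a byte array, written out inline; shown to
                // illustrate the mixing that replaced the weak default hashCode.
                static int murmur3(byte[] data, int seed) {
                  int h = seed;
                  int i = 0;
                  for (; i + 4 <= data.length; i += 4) {
                    int k = (data[i] & 0xFF) | ((data[i + 1] & 0xFF) << 8)
                          | ((data[i + 2] & 0xFF) << 16) | ((data[i + 3] & 0xFF) << 24);
                    k *= 0xCC9E2D51; k = Integer.rotateLeft(k, 15); k *= 0x1B873593;
                    h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xE6546B64;
                  }
                  int k = 0;                                // 0-3 tail bytes
                  switch (data.length - i) {
                    case 3: k ^= (data[i + 2] & 0xFF) << 16;  // fall through
                    case 2: k ^= (data[i + 1] & 0xFF) << 8;   // fall through
                    case 1: k ^= (data[i] & 0xFF);
                            k *= 0xCC9E2D51; k = Integer.rotateLeft(k, 15); k *= 0x1B873593;
                            h ^= k;
                  }
                  h ^= data.length;                         // finalization / avalanche
                  h ^= h >>> 16; h *= 0x85EBCA6B;
                  h ^= h >>> 13; h *= 0xC2B2AE35;
                  h ^= h >>> 16;
                  return h;
                }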


            sershe Sergey Shelukhin added a comment -

            We should probably do the same in the actual codebase... I'll file a JIRA.

            sershe Sergey Shelukhin added a comment -

            Fixed bugs, improved tests; TPCDS q27 can now run on the cluster I have access to (it used to fail with OOM even with 8GB containers). In profiling, the results are actually much better now, with little self time for the hashmap.


            sershe Sergey Shelukhin added a comment -

            er, 72

            sershe Sergey Shelukhin added a comment -

            This replaces the Guava murmurhash with an inline one, and adds an (untested) serialization bypass for the serdes (testing a fast query, hashing and byte copies in the serdes are the most prominent differences in my profiled runs). Unfortunately, for the latter I've discovered that the keys given to us are serialized using BinarySortableSerDe, because they come from ReduceSinkOperator. I will need to sync with Gunther tomorrow on this. The most likely outcome is that we'll change the Tez hashtable output to the lazy serde, so we could just copy bytes. An alternative would be to change key serialization to binarysortable, but that's ugly because values would stay on lazybinary, so we would have two paths. Plus, a bunch of changes would be required to binarysortable to avoid byte copies again, and to use RandomAccessOutput instead of its OutputBuffer thing. Yet another alternative is to do the bypass only for values, not keys.

            Regardless, I think we should commit this patch soon (even if off by default) and do additional improvements in separate JIRAs; it's growing too big.
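
            The bypass idea, for what it's worth, is to let the table consume the already-serialized key/value bytes coming out of the shuffle instead of deserializing into writables and re-serializing. A hedged sketch of the shape of such a path - all names here are illustrative, not Hive's API:

                // Hypothetical sketch of a serialization bypass: when the incoming
                // key and value are already serialized byte ranges (as they are when
                // they come from a ReduceSink), append the bytes directly instead of
                // round-tripping through object inspectors.
                final class BypassWriter {
                  interface ByteRange { byte[] bytes(); int start(); int length(); }

                  private final java.io.ByteArrayOutputStream store =
                      new java.io.ByteArrayOutputStream();

                  void put(Object key, Object value) {
                    if (key instanceof ByteRange && value instanceof ByteRange) {
                      append((ByteRange) key);              // fast path: byte copy
                      append((ByteRange) value);
                    } else {
                      throw new UnsupportedOperationException(
                          "serde fallback elided from this sketch");
                    }
                  }

                  private void append(ByteRange r) {
                    store.write(r.bytes(), r.start(), r.length());
                  }
                }

            The catch described above is that this only works when the wire format and the table's storage format agree, which is exactly the BinarySortableSerDe-vs-lazybinary mismatch.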


            gopalv Gopal Vijayaraghavan added a comment -

            LazySerde is not sortable, at least as far as I know - this is why the Reduce Sink produces binary sortables.

            gopalv Gopal Vijayaraghavan added a comment -

            That comment above probably didn't parse - the point is that using lazy keys makes it impossible to generate a min-max range (or more than one range) from the hashtable.
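
            The reason ranges need a sortable encoding: BinarySortable keys are byte-comparable, so a min and max key can be tracked with a plain unsigned byte comparison while the table is loaded; a lazy encoding does not preserve ordering, so no range can be derived. A small sketch of the comparison this relies on:

                // Unsigned lexicographic comparison of two byte[] keys. Deriving
                // min/max ranges this way is only valid when the key encoding is
                // order-preserving - the property BinarySortableSerDe has and lazy
                // encodings lack.
                static int compareUnsigned(byte[] a, byte[] b) {
                  int n = Math.min(a.length, b.length);
                  for (int i = 0; i < n; i++) {
                    int d = (a[i] & 0xFF) - (b[i] & 0xFF);
                    if (d != 0) return d;
                  }
                  return a.length - b.length;
                }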

            hiveqa Hive QA added a comment -

            Overall: -1 no tests executed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12641640/HIVE-6430.09.patch

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/31/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/31/console

            Messages:

            **** This message was trimmed, see log for full details ****
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:68:4: 
            Decision can match input such as "LPAREN KW_NULL BITWISEOR" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:68:4: 
            Decision can match input such as "LPAREN CharSetName CharSetLiteral" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:68:4: 
            Decision can match input such as "LPAREN KW_NULL NOTEQUAL" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:115:5: 
            Decision can match input such as "KW_CLUSTER KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:127:5: 
            Decision can match input such as "KW_PARTITION KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:138:5: 
            Decision can match input such as "KW_DISTRIBUTE KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:149:5: 
            Decision can match input such as "KW_SORT KW_BY LPAREN" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:166:7: 
            Decision can match input such as "STAR" using multiple alternatives: 1, 2
            
            As a result, alternative(s) 2 were disabled for that input
            warning(200): IdentifiersParser.g:179:5: 
            Decision can match input such as "KW_STRUCT" using multiple alternatives: 4, 6
            
            As a result, alternative(s) 6 were disabled for that input
            warning(200): IdentifiersParser.g:179:5: 
            Decision can match input such as "KW_ARRAY" using multiple alternatives: 2, 6
            
            As a result, alternative(s) 6 were disabled for that input
            warning(200): IdentifiersParser.g:179:5: 
            Decision can match input such as "KW_UNIONTYPE" using multiple alternatives: 5, 6
            
            As a result, alternative(s) 6 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_TRUE" using multiple alternatives: 3, 8
            
            As a result, alternative(s) 8 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_DATE StringLiteral" using multiple alternatives: 2, 3
            
            As a result, alternative(s) 3 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_NULL" using multiple alternatives: 1, 8
            
            As a result, alternative(s) 8 were disabled for that input
            warning(200): IdentifiersParser.g:261:5: 
            Decision can match input such as "KW_FALSE" using multiple alternatives: 3, 8
            
            As a result, alternative(s) 8 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_SORT KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "KW_BETWEEN KW_MAP LPAREN" using multiple alternatives: 8, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_ORDER KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER KW_BY" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_UNION KW_ALL" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:393:5: 
            Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO" using multiple alternatives: 2, 9
            
            As a result, alternative(s) 9 were disabled for that input
            warning(200): IdentifiersParser.g:518:5: 
            Decision can match input such as "{AMPERSAND..BITWISEXOR, DIV..DIVIDE, EQUAL..EQUAL_NS, GREATERTHAN..GREATERTHANOREQUALTO, KW_AND, KW_ARRAY, KW_BETWEEN..KW_BOOLEAN, KW_CASE, KW_DOUBLE, KW_FLOAT, KW_IF, KW_IN, KW_INT, KW_LIKE, KW_MAP, KW_NOT, KW_OR, KW_REGEXP, KW_RLIKE, KW_SMALLINT, KW_STRING..KW_STRUCT, KW_TINYINT, KW_UNIONTYPE, KW_WHEN, LESSTHAN..LESSTHANOREQUALTO, MINUS..NOTEQUAL, PLUS, STAR, TILDE}" using multiple alternatives: 1, 3
            
            As a result, alternative(s) 3 were disabled for that input
            [INFO] 
            [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-exec ---
            [INFO] 
            [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-exec ---
            [INFO] Using 'UTF-8' encoding to copy filtered resources.
            [INFO] Copying 1 resource
            [INFO] Copying 3 resources
            [INFO] 
            [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-exec ---
            [INFO] Executing tasks
            
            main:
            [INFO] Executed tasks
            [INFO] 
            [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-exec ---
            [INFO] Compiling 1687 source files to /data/hive-ptest/working/apache-svn-trunk-source/ql/target/classes
            [INFO] -------------------------------------------------------------
            [WARNING] COMPILATION WARNING : 
            [INFO] -------------------------------------------------------------
            [WARNING] Note: Some input files use or override a deprecated API.
            [WARNING] Note: Recompile with -Xlint:deprecation for details.
            [WARNING] Note: Some input files use unchecked or unsafe operations.
            [WARNING] Note: Recompile with -Xlint:unchecked for details.
            [INFO] 4 warnings 
            [INFO] -------------------------------------------------------------
            [INFO] -------------------------------------------------------------
            [ERROR] COMPILATION ERROR : 
            [INFO] -------------------------------------------------------------
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,27] cannot find symbol
            symbol  : variable tmpSerDe
            location: class org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,12] internal error; cannot instantiate org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor.<init> at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor to ()
            [INFO] 2 errors 
            [INFO] -------------------------------------------------------------
            [INFO] ------------------------------------------------------------------------
            [INFO] Reactor Summary:
            [INFO] 
            [INFO] Hive .............................................. SUCCESS [9.353s]
            [INFO] Hive Ant Utilities ................................ SUCCESS [5.818s]
            [INFO] Hive Shims Common ................................. SUCCESS [3.953s]
            [INFO] Hive Shims 0.20 ................................... SUCCESS [2.640s]
            [INFO] Hive Shims Secure Common .......................... SUCCESS [4.766s]
            [INFO] Hive Shims 0.20S .................................. SUCCESS [2.439s]
            [INFO] Hive Shims 0.23 ................................... SUCCESS [8.866s]
            [INFO] Hive Shims ........................................ SUCCESS [1.196s]
            [INFO] Hive Common ....................................... SUCCESS [13.038s]
            [INFO] Hive Serde ........................................ SUCCESS [10.412s]
            [INFO] Hive Metastore .................................... SUCCESS [34.091s]
            [INFO] Hive Query Language ............................... FAILURE [53.832s]
            [INFO] Hive Service ...................................... SKIPPED
            [INFO] Hive JDBC ......................................... SKIPPED
            [INFO] Hive Beeline ...................................... SKIPPED
            [INFO] Hive CLI .......................................... SKIPPED
            [INFO] Hive Contrib ...................................... SKIPPED
            [INFO] Hive HBase Handler ................................ SKIPPED
            [INFO] Hive HCatalog ..................................... SKIPPED
            [INFO] Hive HCatalog Core ................................ SKIPPED
            [INFO] Hive HCatalog Pig Adapter ......................... SKIPPED
            [INFO] Hive HCatalog Server Extensions ................... SKIPPED
            [INFO] Hive HCatalog Webhcat Java Client ................. SKIPPED
            [INFO] Hive HCatalog Webhcat ............................. SKIPPED
            [INFO] Hive HCatalog Streaming ........................... SKIPPED
            [INFO] Hive HWI .......................................... SKIPPED
            [INFO] Hive ODBC ......................................... SKIPPED
            [INFO] Hive Shims Aggregator ............................. SKIPPED
            [INFO] Hive TestUtils .................................... SKIPPED
            [INFO] Hive Packaging .................................... SKIPPED
            [INFO] ------------------------------------------------------------------------
            [INFO] BUILD FAILURE
            [INFO] ------------------------------------------------------------------------
            [INFO] Total time: 2:35.321s
            [INFO] Finished at: Thu Apr 24 17:10:24 EDT 2014
            [INFO] Final Memory: 56M/629M
            [INFO] ------------------------------------------------------------------------
            [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-exec: Compilation failure: Compilation failure:
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,27] cannot find symbol
            [ERROR] symbol  : variable tmpSerDe
            [ERROR] location: class org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer
            [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java:[242,12] internal error; cannot instantiate org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor.<init> at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.GetAdaptor to ()
            [ERROR] -> [Help 1]
            [ERROR] 
            [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
            [ERROR] Re-run Maven using the -X switch to enable full debug logging.
            [ERROR] 
            [ERROR] For more information about the errors and possible solutions, please read the following articles:
            [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
            [ERROR] 
            [ERROR] After correcting the problems, you can resume the build with the command
            [ERROR]   mvn <goals> -rf :hive-exec
            + exit 1
            '
            

            This message is automatically generated.

            ATTACHMENT ID: 12641640


            sershe Sergey Shelukhin added a comment -

            Make the bypass work... it still has a hack to remove the ReduceSinkOp tag on the hashtable side. The join-to-mapjoin conversion code is very convoluted; I need to get hold of the ReduceSink that feeds the hashtable values and remove the tag output from there reliably. Will read the code later, and perf test with this.

            hiveqa Hive QA added a comment -

            Overall: -1 at least one test failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12642056/HIVE-6430.10.patch

            ERROR: -1 due to 46 failed/errored test(s), 5424 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tblproperties
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_1
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_unset_table_property
            org.apache.hadoop.hive.ql.exec.persistence.TestBytesBytesMultiHashMap.testPutGetMultiple
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/55/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/55/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 46 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12642056

            leftyl Lefty Leverenz added a comment -

            This adds hive.mapjoin.optimized.hashtable and hive.mapjoin.optimized.hashtable.wbsize to HiveConf.java. They both need descriptions – I assume "wb" means write buffer.

            The descriptions can go in HiveConf comments or a release note for now, or you can patch hive-default.xml.template and I'll add a comment on HIVE-6586 (for HIVE-6037, Synchronize HiveConf with hive-default.xml.template and support show conf).

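            For reference, a sketch of what the two entries might look like in hive-default.xml.template; the wording and default values below are illustrative, not necessarily the committed text:

              <property>
                <name>hive.mapjoin.optimized.hashtable</name>
                <value>true</value>
                <description>Whether Hive should use the memory-optimized hash table for MapJoin.</description>
              </property>
              <property>
                <name>hive.mapjoin.optimized.hashtable.wbsize</name>
                <value>10485760</value>
                <description>The optimized hashtable stores data in a chain of write buffers ("wb");
                this is the size of one buffer in bytes. A larger buffer may make the hashtable
                slightly faster, but allocates unneeded memory for small tables.</description>
              </property>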

            sershe Sergey Shelukhin added a comment -

            OK, I found another dumb bug in this patch (this time in the MJO wiring). It doesn't actually alter the results, but it seems to cause a lot of useless work. I will probably fix it tomorrow.

            sershe Sergey Shelukhin added a comment -

            Meanwhile, the serialization bypass appears to work; no more arraycopy. I need to replace the byte-removal hack with not tagging in ReduceSink, but after reading the code that creates ReduceSinks for this case, I think I may have approached the limits of sanity... will also look tomorrow.

            sershe Sergey Shelukhin added a comment -

            Fix all things. The skipTag path actually doesn't work all the time, and a warning is output in several Tez tests. Debugging that code is very difficult; I will continue tomorrow. It's probably ready to check in though, since we can just remove the tag.
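            To illustrate the tag discussion above: a minimal Java sketch, assuming the tag is a single byte appended to each serialized key. The class and method names here are hypothetical, not Hive's actual API.

              import java.util.Arrays;

              final class TagSketch {
                  // The "byte-removal hack": materialize a tag-free copy of the key.
                  // Correct, but costs an arraycopy for every row loaded into the hash table.
                  static byte[] stripTagByCopy(byte[] key) {
                      return Arrays.copyOf(key, key.length - 1);
                  }

                  // The cheaper route: if ReduceSink never writes the tag for this case,
                  // the buffer can be consumed in place as (bytes, offset, length) with no copy.
                  static int hashKey(byte[] bytes, int offset, int length) {
                      int h = 1;
                      for (int i = offset; i < offset + length; i++) {
                          h = 31 * h + bytes[i];
                      }
                      return h;
                  }
              }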

            gopalv Gopal Vijayaraghavan added a comment -

            I will build for my nightly runs with this patch turned on.
            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12642582/HIVE-6430.11.patch

            ERROR: -1 due to 7 failed/errored test(s), 5430 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
            org.apache.hadoop.hive.ql.exec.persistence.TestBytesBytesMultiHashMap.testPutGetMultiple
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/86/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/86/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 7 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12642582


            sershe Sergey Shelukhin added a comment -

            Fix the tag issue, CR feedback.

            sershe Sergey Shelukhin added a comment -

            Fix a small bug.
            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12642792/HIVE-6430.12.patch

            ERROR: -1 due to 6 failed/errored test(s), 5433 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/98/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/98/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 6 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12642792

            leftyl Lefty Leverenz added a comment -

            Thanks for the parameter descriptions in hive-default.xml.template. But patch 12 has a duplicate description for hive.mapjoin.optimized.hashtable.


            sershe Sergey Shelukhin added a comment -

            Will remove it on commit. hagleitn, can you take a look? t3rmin4t0r signed off on RB, but he's not formally a committer.

            hagleitn Gunther Hagleitner added a comment -

            This is neat. Some comments on RB.

            sershe Sergey Shelukhin added a comment -

            CR feedback. The RB link was never posted in the JIRA, apparently... it's at https://reviews.apache.org/r/18936/
            hiveqa Hive QA added a comment -

            Overall: -1 at least one tests failed

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12644187/HIVE-6430.13.patch

            ERROR: -1 due to 3 failed/errored test(s), 5439 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
            org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
            org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/175/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/175/console

            Messages:

            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 3 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12644187


            sershe Sergey Shelukhin added a comment -

            ping?

            hagleitn Gunther Hagleitner added a comment -

            +1 looks good!
            leftyl Lefty Leverenz added a comment -

            +1 for parameter documentation.


            sershe Sergey Shelukhin added a comment -

            Will commit this evening.

            gopalv Gopal Vijayaraghavan added a comment -

            Is there any solution for the partial-build problem? I have to "mvn clean" for every build after this patch.

            [ERROR] /grid/5/dev/gopalv/tez-autobuild/hive/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java:[224,35] method put in interface java.util.Map<K,V> cannot be applied to given types;
            [ERROR] required: org.apache.hadoop.hive.ql.exec.Operator<?>,java.util.List<org.apache.hadoop.hive.ql.exec.Operator<?>>
            [ERROR] found: org.apache.hadoop.hive.ql.exec.MapJoinOperator,java.util.List<org.apache.hadoop.hive.ql.exec.Operator<? extends org.apache.hadoop.hive.ql.plan.OperatorDesc>>
            [ERROR] reason: actual argument java.util.List<org.apache.hadoop.hive.ql.exec.Operator<? extends org.apache.hadoop.hive.ql.plan.OperatorDesc>> cannot be converted to java.util.List<org.apache.hadoop.hive.ql.exec.Operator<?>>
            [ERROR] -> [Help 1]
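            For what it's worth, this is the standard generics-invariance error. A minimal standalone repro, using hypothetical stand-in types rather than the actual Hive classes, might look like this:

              import java.util.ArrayList;
              import java.util.HashMap;
              import java.util.List;
              import java.util.Map;

              class Desc {}
              class Op<T extends Desc> {}

              public class WildcardRepro {
                  public static void main(String[] args) {
                      Map<Op<?>, List<Op<?>>> map = new HashMap<Op<?>, List<Op<?>>>();
                      List<Op<? extends Desc>> children = new ArrayList<Op<? extends Desc>>();

                      // Does not compile: List<Op<? extends Desc>> is not a List<Op<?>>,
                      // because generic types are invariant in their type argument, even
                      // though every Op<? extends Desc> is itself an Op<?>.
                      // map.put(new Op<Desc>(), children);

                      // One legal workaround: copy into a list with the matching element type.
                      List<Op<?>> copy = new ArrayList<Op<?>>(children);
                      map.put(new Op<Desc>(), copy);
                  }
              }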

            gopalv Gopal Vijayaraghavan added a comment -

            It seems to break only on JDK 7 javac, and only on rebuilds with modifications - never on "mvn clean package" builds.

            sershe Sergey Shelukhin added a comment -

            Hmm... I cannot repro this; I tried JDK 6 and 7, clean builds and incremental ones, with modifications. Can you make an addendum patch that fixes it, so I can apply it on top?

            gopalv Gopal Vijayaraghavan added a comment -

            I can confirm that if I do an "mvn install" once, the problem goes away for a day (it always fails only on the first build of the day with the patch).

            If I had to guess, that's because my Maven update interval for snapshots is once a day. Once you commit this, the .m2/ version from apache-snapshots will match up and my builds won't break anymore (hopefully).

            Commit this, and if it breaks again for me, I'll post an addendum as a new patch.
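            If the once-a-day snapshot interval is indeed the cause, forcing Maven to re-check snapshot artifacts on a given build should confirm it. The -U (--update-snapshots) flag is standard Maven, not specific to this patch:

              mvn clean install -U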

            sershe Sergey Shelukhin added a comment -

            Reproed it on SVN; it is not related to this patch, but I'm fixing it anyway. I'm assuming the +1 stands...

            sershe Sergey Shelukhin added a comment -

            Committed to trunk.
            leftyl Lefty Leverenz added a comment -

            The configuration parameters hive.mapjoin.optimized.hashtable and hive.mapjoin.optimized.hashtable.wbsize need to be documented in the wiki for release 0.14.0.


            sershe Sergey Shelukhin added a comment -

            They are already documented in the config template, as far as I recall. Should we have that copied to the wiki automatically somehow?
            leftyl Lefty Leverenz added a comment -

            We don't have a way to add parameters to the wiki automatically. Yes, they're in the template file and I've got them on my wiki to-do list, but feel free to take care of them yourself if you have time.

            Mapjoin parameters don't have a section of their own, but they're listed together in order of Hive release (except for a couple of hive.skewjoin.mapjoin parameters), so these belong after hive.mapjoin.lazy.hashtable.
            thejas Thejas Nair added a comment -

            This has been fixed in 0.14 release. Please open new jira if you see any issues.

            leftyl Lefty Leverenz added a comment -

            Doc done, with links from the Tez parameter section:
            Configuration Properties – hive.mapjoin.optimized.hashtable
            Configuration Properties – hive.mapjoin.optimized.hashtable.wbsize
            Configuration Properties – Tez

            akolb Alex Kolbasov added a comment -

            misha@cloudera.com FYI.

            sershe Sergey Shelukhin added a comment -

            This has since been superseded by the vectorized mapjoin, which improves the hashtable further and specializes it for Java types and special cases.
            misha@cloudera.com Misha Dmitriev added a comment -

            Thank you akolb! This is nice work, of the kind I wish I could do more of.

            People

              Assignee: sershe Sergey Shelukhin
              Reporter: sershe Sergey Shelukhin
              Votes: 0
              Watchers: 8