Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Environment: Ubuntu LXC + -Xmx512m client opts
Save memory in HashTableSink for key-only map-joins, by reusing a single EMPTY_ROW_CONTAINER object
Description
The map-join hashtable sink in the local-task creates an in-memory hashtable with the following code.
    Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias], ...
    MapJoinRowContainer rowContainer = tableContainer.get(key);
    if (rowContainer == null) {
      rowContainer = new MapJoinRowContainer();
      rowContainer.add(value);
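But for a query where joinValues[alias].size() == 0 (a key-only map-join), this results in a large number of unnecessary allocations: every new key gets its own MapJoinRowContainer holding an empty value array, even though all of them are identical. This case would be better served by a copy-on-write default value container and a single pre-allocated zero-length Object[], which is immutable (a zero-length array is the only kind of immutable array there is in Java).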
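The idea can be illustrated with a minimal sketch. It does not use Hive's classes: KeyOnlySinkSketch, RowContainer, EMPTY_VALUES and put() are hypothetical stand-ins, and only the EMPTY_ROW_CONTAINER name is taken from the issue title. It assumes that a key seen once with no values can point at the shared container, and that a second row for the same key triggers a private copy (copy-on-write).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the hashtable sink; not Hive's actual implementation.
class KeyOnlySinkSketch {
    // A zero-length array has no elements to overwrite, so one instance can be shared.
    static final Object[] EMPTY_VALUES = new Object[0];

    // Simplified row container (hypothetical): just a list of value arrays.
    static class RowContainer {
        final List<Object[]> rows = new ArrayList<>();

        RowContainer() { }

        // Copy constructor used for the copy-on-write step.
        RowContainer(RowContainer other) {
            rows.addAll(other.rows);
        }

        void add(Object[] value) {
            rows.add(value);
        }
    }

    // Single shared container holding exactly one empty row.
    static final RowContainer EMPTY_ROW_CONTAINER = new RowContainer();
    static {
        EMPTY_ROW_CONTAINER.add(EMPTY_VALUES);
    }

    final Map<Object, RowContainer> table = new HashMap<>();

    void put(Object key, Object[] value) {
        RowContainer rc = table.get(key);
        if (rc == null) {
            if (value.length == 0) {
                // Key-only row: reuse the shared container, allocate nothing.
                table.put(key, EMPTY_ROW_CONTAINER);
                return;
            }
            rc = new RowContainer();
            table.put(key, rc);
        } else if (rc == EMPTY_ROW_CONTAINER) {
            // Copy-on-write: a second row for this key needs its own container,
            // because the shared one must never be modified.
            rc = new RowContainer(EMPTY_ROW_CONTAINER);
            table.put(key, rc);
        }
        rc.add(value.length == 0 ? EMPTY_VALUES : value);
    }
}
```

In this sketch, a table where every key appears once allocates no containers at all beyond the shared one; only duplicate keys pay for a private copy.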
The query tested is roughly the following, which scans all of customer_demographics in the hash-sink:
    select c_salutation, count(1)
    from customer
    JOIN customer_demographics
      ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk
    group by c_salutation
    limit 10;
When run against current trunk, the query fails with an OOM with 512 MB of RAM.
    2013-08-23 05:11:26 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 292418944 percentage: 0.579
    Execution failed with exit status: 3
    Obtaining error information