[DRILL-6028] Allow splitting generated code in ChainedHashTable into blocks to avoid "code too large" error - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.10.0
Fix Version/s: 1.13.0
Component/s: None
Labels:
- ready-to-commit

Description

Allow splitting generated code in ChainedHashTable into blocks to avoid "code too large" error.

REPRODUCE
File 1200_columns.csv

0,1,2,3...1200
0,1,2,3...1200

Query

select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
union
select columns[0], columns[1]...columns[1200] from dfs.`1200_columns.csv`
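
The reproduction inputs are long enough that typing them by hand is impractical, so a small generator may help. The sketch below is only an illustration: the class name ReproSketch and its methods are hypothetical, and it assumes the elided parts of the file and the query are the consecutive integers and column references up to the last index shown.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper for building the reproduction inputs; not part of Drill. */
final class ReproSketch {

    /** One CSV row holding the integers 0..lastIndex (assumed meaning of the elided row). */
    static String csvRow(int lastIndex) {
        return IntStream.rangeClosed(0, lastIndex)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    /** The UNION query over columns[0]..columns[lastIndex] of the given file. */
    static String unionQuery(int lastIndex, String file) {
        String cols = IntStream.rangeClosed(0, lastIndex)
                .mapToObj(i -> "columns[" + i + "]")
                .collect(Collectors.joining(", "));
        String select = "select " + cols + " from dfs.`" + file + "`";
        return select + "\nunion\n" + select;
    }

    public static void main(String[] args) {
        // Two identical rows, as in the file shown above, followed by the query text.
        System.out.println(csvRow(1200));
        System.out.println(csvRow(1200));
        System.out.println(unionQuery(1200, "1200_columns.csv"));
    }
}
```

Redirecting the first two output lines to 1200_columns.csv and running the printed query should give inputs of the same shape as the ones described here.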

Error

Error: SYSTEM ERROR: CompileException: File 'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]', Line -7886, Column 24: HashTableGen10.java:57650: error: code too large
        public boolean isKeyMatchInternalBuild(int incomingRowIdx, int htRowIdx)
                       ^ (compiler.err.limit.code)

ROOT CAUSE
DRILL-4715 added the ability to ensure that the size of a generated method does not exceed the 64k limit imposed by the JVM. BlkCreateMode.TRUE_IF_BOUND was added to create a new block only when the number of expressions added reaches the upper bound defined by exec.java.compiler.exp_in_method_size. Once the number of expressions in a method reaches that bound, an inner method is created and called from the current one.
Example:

public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe) throws SchemaChangeException {
    // some logic

    // the remaining expressions continue in the generated inner method
    doSetup0(incomingBuild, incomingProbe);
}
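
The splitting rule described above can be sketched outside of Drill. The following is a simplified illustration, not Drill's code generator: MethodSplitSketch, split, and the maxExprs parameter are hypothetical names, with maxExprs standing in for the bound read from exec.java.compiler.exp_in_method_size, and the statements are plain strings rather than real expressions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of bound-based method splitting; not Drill's generator. */
final class MethodSplitSketch {

    /**
     * Emits source for a method called name. Each time maxExprs statements have
     * been written, the current method ends with a call to a new inner method
     * (name0, name1, ...) and the remaining statements continue there.
     */
    static List<String> split(String name, List<String> statements, int maxExprs) {
        if (maxExprs < 1) {
            throw new IllegalArgumentException("maxExprs must be at least 1");
        }
        List<String> methods = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        String current = name;
        int innerIndex = 0;
        int count = 0;
        for (String stmt : statements) {
            if (count == maxExprs) {
                // Bound reached: chain to a fresh inner method and close this one.
                String next = name + innerIndex++;
                body.append("    ").append(next).append("();\n");
                methods.add("void " + current + "() {\n" + body + "}\n");
                body.setLength(0);
                current = next;
                count = 0;
            }
            body.append("    ").append(stmt).append('\n');
            count++;
        }
        methods.add("void " + current + "() {\n" + body + "}\n");
        return methods;
    }

    public static void main(String[] args) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            stmts.add("eval" + i + "();");
        }
        // With a bound of 2, five statements end up in doSetup, doSetup0 and doSetup1.
        split("doSetup", stmts, 2).forEach(System.out::print);
    }
}
```

Because every emitted method holds at most maxExprs statements plus one chaining call, its size is bounded by the configured limit rather than by the number of columns in the query.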

During code generation, ChainedHashTable added all code in its methods in one block (using BlkCreateMode.FALSE) because the getHashBuild and getHashProbe methods contained state and thus could not be split. In these methods a hash was generated for each key expression. For the first key the seed was 0; for each subsequent key the hash was generated using the hash of the previous key as the seed.
To allow splitting of these methods, the following was done:
1. Method signatures were changed: a new parameter, seedValue, was added. Previously the starting seed value was hard-coded during code generation (set to 0); now it is passed as a method parameter.
2. Previously the hash function calls for all keys were transformed into one logical expression, which did not allow splitting. Now a separate logical expression is created for each key, so splitting is possible. The new seedValue parameter is used as a seed holder to pass the seed value on to the next key.
3. ParameterExpression was added to generate a reference to a method parameter during code generation.
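
To see why passing the seed as a parameter makes splitting safe, here is a minimal sketch for three keys. It is an illustration under stated assumptions, not the generated code: SeedChainSketch and getHashBuildUnsplit are hypothetical names, the row is a plain int array, and hash32 is a stand-in mixing function rather than Drill's HashHelper.hash32.

```java
/** Hypothetical illustration of seed threading across split methods; not generated Drill code. */
final class SeedChainSketch {

    /** Stand-in seeded hash. This is NOT Drill's HashHelper.hash32. */
    static int hash32(int key, int seed) {
        return 31 * seed + Integer.hashCode(key);
    }

    /** Before: all keys folded into one nested expression, starting seed hard-coded to 0. */
    static int getHashBuildUnsplit(int[] row) {
        return hash32(row[2], hash32(row[1], hash32(row[0], 0)));
    }

    /** After: one expression per key; the running hash travels in seedValue. */
    static int getHashBuild(int[] row, int seedValue) {
        seedValue = hash32(row[0], seedValue);
        return getHashBuild0(row, seedValue);
    }

    /** Inner method for the second key; it needs no state beyond its parameters. */
    static int getHashBuild0(int[] row, int seedValue) {
        seedValue = hash32(row[1], seedValue);
        return getHashBuild1(row, seedValue);
    }

    /** Inner method for the last key; returns the final hash. */
    static int getHashBuild1(int[] row, int seedValue) {
        seedValue = hash32(row[2], seedValue);
        return seedValue;
    }

    public static void main(String[] args) {
        int[] row = {7, 11, 13};
        // Both variants chain the same seeds, so the two printed values are equal.
        System.out.println(getHashBuildUnsplit(row));
        System.out.println(getHashBuild(row, 0));
    }
}
```

Since each per-key step depends only on its inputs, a generator is free to cut the chain between any two keys and continue in an inner method, which a single nested expression does not allow.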

Code example:

    public int getHashBuild(int incomingRowIdx, int seedValue)
        throws SchemaChangeException
    {
        {
            NullableVarCharHolder out3 = new NullableVarCharHolder();
            {
                out3 .isSet = vv0 .getAccessor().isSet((incomingRowIdx));
                if (out3 .isSet == 1) {
                    out3 .buffer = vv0 .getBuffer();
                    long startEnd = vv0 .getAccessor().getStartEnd((incomingRowIdx));
                    out3 .start = ((int) startEnd);
                    out3 .end = ((int)(startEnd >> 32));
                }
            }
            IntHolder seedValue4 = new IntHolder();
            seedValue4 .value = seedValue;
            //---- start of eval portion of hash32 function. ----//
            IntHolder out5 = new IntHolder();
            {
                final IntHolder out = new IntHolder();
                NullableVarCharHolder in = out3;
                IntHolder seed = seedValue4;
                 
Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
    if (in.isSet == 0) {
        out.value = seed.value;
    } else
    {
        out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(in.start, in.end, in.buffer, seed.value);
    }
}
 
                out5 = out;
            }
            //---- end of eval portion of hash32 function. ----//
            seedValue = out5 .value;
        }
        return getHashBuild0((incomingRowIdx), (seedValue));
    }

Examples of code generation:
HashTableGen5_for_40_columns_BEFORE.java - code compiles
HashTableGen5_for_40_columns_AFTER.java - code compiles

HashTableGen5_for_1200_columns_BEFORE.java - error during compilation, method too large
HashTableGen5_for_1200_columns_AFTER.java - code compiles since methods were split

Attachments

- HashTableGen5_for_1200_columns_AFTER.java (10.86 MB, 21/Dec/17 11:30, Arina Ielchiieva)
- HashTableGen5_for_1200_columns_BEFORE.java (10.61 MB, 21/Dec/17 11:30, Arina Ielchiieva)
- HashTableGen5_for_40_columns_AFTER.java (366 kB, 21/Dec/17 11:30, Arina Ielchiieva)
- HashTableGen5_for_40_columns_BEFORE.java (359 kB, 21/Dec/17 11:30, Arina Ielchiieva)

Issue Links

links to

GitHub Pull Request #1071

People

Assignee: Arina Ielchiieva
Reporter: Arina Ielchiieva
Reviewer: Paul Rogers
Votes: 0
Watchers: 3

Dates

Created: 12/Dec/17 16:45
Updated: 19/Jan/18 13:04
Resolved: 22/Dec/17 12:49