[IMPALA-9747] More fine-grained codegen for text file scanners - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Implemented
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels: None

Description

Currently, if the materialization of any column cannot be codegen'd for some reason (e.g. it is CHAR(N)), codegen is cancelled for the whole text scanner; see:
https://github.com/apache/impala/blob/b5805de3e65fd1c7154e4169b323bb38ddc54f4f/be/src/exec/text-converter.cc#L112
https://github.com/apache/impala/blob/58273fff601dcc763ac43f7cc275a174a2e18b6b/be/src/exec/hdfs-scanner.cc#L342

It would be much better to use the non-codegen'd path only for the problematic columns, use the codegen'd materialization for the rest, and always do conjunct evaluation with codegen.

The codegen'd path orders slots based on the conjuncts that use them and evaluates each conjunct as soon as the slots it needs become available, so if the row is dropped, the remaining slots do not need to be materialized. A simple solution would be to always do the non-codegen'd slot materialization first, so that those slots are ready if a conjunct needs them. Moving the columns that are not used by conjuncts to the end could be a further optimization.
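The ordering proposed above could be sketched roughly as follows. This is only an illustration, not Impala code: SlotInfo, MaterializationRank and OrderSlotsForMaterialization are hypothetical names, and the real scanner works with slot descriptors and orders conjunct-related slots in more detail than this three-way split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-slot summary (an assumption for this sketch, not
// Impala's SlotDescriptor).
struct SlotInfo {
  int slot_id;
  bool codegen_supported;  // false for types such as CHAR(N)
  bool used_by_conjunct;   // true if at least one conjunct reads the slot
};

// Lower rank means earlier materialization:
//   0: slots that need the non-codegen'd path (done first, so they are
//      ready whenever a conjunct needs them)
//   1: codegen'd slots used by conjuncts
//   2: codegen'd slots not used by any conjunct (moved to the end)
int MaterializationRank(const SlotInfo& s) {
  if (!s.codegen_supported) return 0;
  return s.used_by_conjunct ? 1 : 2;
}

// Returns the slots in the proposed materialization order. stable_sort
// keeps the existing relative order within each rank, so an ordering
// already chosen for conjunct evaluation is preserved inside rank 1.
std::vector<SlotInfo> OrderSlotsForMaterialization(std::vector<SlotInfo> slots) {
  const std::size_t original_size = slots.size();
  std::stable_sort(slots.begin(), slots.end(),
                   [](const SlotInfo& a, const SlotInfo& b) {
                     return MaterializationRank(a) < MaterializationRank(b);
                   });
  assert(slots.size() == original_size);  // reordering only, nothing dropped
  return slots;
}
```

With this split, a row rejected by an early conjunct still pays for every non-codegen'd slot, which is the trade-off that makes the approach "simple" rather than optimal.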

This came up while working on the materialization of BINARY columns, which need base64 decoding during materialization.

Issue Links

relates to: IMPALA-7332 Add Char codegen support to text scanner (Open)

People

Assignee: Daniel Becker

Reporter: Csaba Ringhofer

Votes: 0

Watchers: 4

Dates

Created: 14/May/20 13:56

Updated: 29/Jun/20 10:29

Resolved: 29/Jun/20 10:29