[HIVE-22891] Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode - ASF JIRA

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: None
Labels: None
Target Version/s: 4.0.0

Description

    try {
      // TODO: refactor this out
      if (pathToPartInfo == null) {
        MapWork mrwork;
        if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
          mrwork = (MapWork) Utilities.getMergeWork(jobConf);
          if (mrwork == null) {
            mrwork = Utilities.getMapWork(jobConf);
          }
        } else {
          mrwork = Utilities.getMapWork(jobConf);
        }
        pathToPartInfo = mrwork.getPathToPartitionInfo();
      }

      PartitionDesc part = extractSinglePartSpec(hsplit);
      inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
    } catch (HiveException e) {
      throw new IOException(e);
    }

The code above, from CombineHiveRecordReader.java, was introduced in HIVE-15147. It overwrites inputFormat based on the PartitionDesc. This is unnecessary in non-LLAP execution mode, because HiveInputFormat.wrapForLlap() simply returns the previously defined inputFormat in that case. The call to extractSinglePartSpec() has serious performance implications: when there is a large number of small files, each call takes approximately 2 to 3 seconds. As a result, the same query runs much faster on Hive 1.x / Hive 2 than on the latest Hive. In the task log below, roughly 1.8 seconds pass between the start of the ORC read and the warning logged by CombineHiveRecordReader.

2020-02-11 07:15:04,701 INFO [main] org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from 

2020-02-11 07:15:06,468 WARN [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}}

2020-02-11 07:15:06,468 INFO [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting org.apache.hadoop.mapred.FileSplit
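
The idea in the description can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use Hive classes: LlapGuardSketch, extractPartSpec(), resolveUnguarded(), resolveGuarded(), and the boolean llap flag are hypothetical stand-ins. The only assumption taken from the description is that wrapping returns the original input format outside LLAP mode, so the expensive partition lookup can be skipped there.

```java
// Hypothetical stand-in, NOT Hive code: models why the partition lookup
// is wasted work when the execution mode is not LLAP.
final class LlapGuardSketch {
    // Counts how often the (assumed expensive) partition lookup runs.
    static int extractCalls = 0;

    // Stand-in for extractSinglePartSpec(); the real call is the slow one.
    static String extractPartSpec() {
        extractCalls++;
        return "logdate=2020-02-03";
    }

    // Stand-in for HiveInputFormat.wrapForLlap(): per the description,
    // the input format comes back unchanged outside LLAP mode.
    static String wrapForLlap(String inputFormat, String part, boolean llap) {
        return llap ? "llap(" + inputFormat + ", " + part + ")" : inputFormat;
    }

    // Shape of the code quoted in the description: always pays for the lookup.
    static String resolveUnguarded(String inputFormat, boolean llap) {
        String part = extractPartSpec();
        return wrapForLlap(inputFormat, part, llap);
    }

    // Guarded variant: skips the lookup when the wrapper would be a no-op.
    static String resolveGuarded(String inputFormat, boolean llap) {
        if (!llap) {
            return inputFormat;
        }
        String part = extractPartSpec();
        return wrapForLlap(inputFormat, part, true);
    }

    public static void main(String[] args) {
        extractCalls = 0;
        String unguarded = resolveUnguarded("orc", false);
        int unguardedCalls = extractCalls;

        extractCalls = 0;
        String guarded = resolveGuarded("orc", false);
        int guardedCalls = extractCalls;

        // Both variants yield the same format; only the lookup count differs.
        System.out.println(unguarded.equals(guarded));
        System.out.println(unguardedCalls + " vs " + guardedCalls);
    }
}
```

In this sketch both variants return the same input format in non-LLAP mode, and only the unguarded one performs the lookup. How the real code decides whether LLAP is active is not stated in the description and is left out here.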

Attachments

HIVE-22891.01.patch, 14/Feb/20 14:33, 4 kB, Syed Shameerur Rahman
HIVE-22891.02.patch, 17/Feb/20 06:58, 4 kB, Syed Shameerur Rahman
HIVE-22891.03.patch, 22/Feb/20 04:13, 3 kB, Syed Shameerur Rahman

Issue Links

links to

GitHub Pull Request #914

People

Assignee: Syed Shameerur Rahman

Reporter: Syed Shameerur Rahman

Votes: 0

Watchers: 2

Dates

Created: 14/Feb/20 14:28

Updated: 27/Feb/24 22:23

Resolved: 25/Feb/20 23:06