Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
- hive + tez
- Reviewed
Description
When running a TPC-DS query (query_27) on a 200 GB dataset (ORC format), a significant amount of time was spent in split computation.
Profiling revealed that:
1. A lot of time was spent in Config's substituteVars (regex), called from the HiveInputFormat.getSplits() method.
2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo().
I will attach the profiler snapshots soon.
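The two hotspots above share a pattern: work that could be done once per getSplits() call is repeated per file or per lookup. The following is a minimal, self-contained sketch of hoisting that work out of the loop. It is not the patch for this issue and uses no Hive or Hadoop APIs; SplitComputationSketch, FakeFileSystem, substitute, computeSplits, and the base.dir property are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for split computation; none of these names are Hive or Hadoop APIs.
class SplitComputationSketch {
    // Counts how often the expensive handle is constructed.
    static final AtomicInteger handlesCreated = new AtomicInteger();

    // Compiled once and reused, rather than recompiled on every lookup.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Stand-in for a filesystem handle that is expensive to create.
    static final class FakeFileSystem {
        FakeFileSystem() { handlesCreated.incrementAndGet(); }
        String describe(String path) { return "split:" + path; }
    }

    // Expands ${name} references from props in a single pass;
    // unknown names are left as they are.
    static String substitute(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumes props contains the (hypothetical) key "base.dir".
    static List<String> computeSplits(List<String> files, Map<String, String> props) {
        // Resolved once per call instead of once per file.
        String baseDir = substitute(props.get("base.dir"), props);
        // Created once per call instead of once per file.
        FakeFileSystem fs = new FakeFileSystem();
        List<String> splits = new ArrayList<>();
        for (String file : files) {
            splits.add(fs.describe(baseDir + "/" + file));
        }
        return splits;
    }
}
```

With this shape, the number of regex substitutions and handle creations stays constant as the number of input files grows, which is the property the profile suggests was missing.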
Attachments
Issue Links
- relates to HIVE-11035 PPD: Orc Split elimination fails because filterColumns=[-1] (Resolved)