Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Reviewed
Description
Executing the following query in LLAP (single instance) on a 5-node cluster caused heavy GC pressure:
select a.type, a.city, a.frequency, b.city, b.country, b.lat, b.lon
from (
  select 'depart' as type, origin as city, count(origin) as frequency
  from flights
  group by origin
  order by frequency desc, type
) as a
left join airports as b on a.city = b.iata
order by frequency desc;
The flights table has 7000+ partitions in S3. Profiling showed that a large number of objects were created solely for path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but HiveInputFormat::pushProjectionsAndFilters still ends up doing a lot of comparisons.
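To illustrate the kind of pattern that can produce this allocation pressure, here is a minimal standalone sketch. It is not the Hive code or the actual patch: the class PathLookupSketch and its methods are hypothetical, java.net.URI stands in for Hadoop's Path, and partition paths are assumed to have no trailing slash. The naive lookup scans every partition directory for each split and allocates a fresh object per comparison; the indexed lookup normalizes each partition path once and then does one hash lookup per directory level of the split.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and do not exist in Hive.
class PathLookupSketch {

    // Naive: for each split, scan all partition directories and parse a new
    // URI for every comparison. Work and allocations grow with
    // (number of splits) x (number of partitions).
    static String findAliasNaive(String splitPath, Map<String, String> partitionToAlias) {
        String splitDir = URI.create(splitPath).getPath();
        for (Map.Entry<String, String> e : partitionToAlias.entrySet()) {
            String partDir = URI.create(e.getKey()).getPath(); // allocated per comparison
            if (splitDir.equals(partDir) || splitDir.startsWith(partDir + "/")) {
                return e.getValue();
            }
        }
        return null;
    }

    // Normalize every partition directory once, up front.
    static Map<String, String> buildIndex(Map<String, String> partitionToAlias) {
        Map<String, String> index = new HashMap<>();
        for (Map.Entry<String, String> e : partitionToAlias.entrySet()) {
            index.put(URI.create(e.getKey()).getPath(), e.getValue());
        }
        return index;
    }

    // Indexed: walk up the split's parent directories, one hash lookup per
    // level. Allocations now depend on path depth, not partition count.
    static String findAliasIndexed(String splitPath, Map<String, String> index) {
        String dir = URI.create(splitPath).getPath();
        while (!dir.isEmpty()) {
            String alias = index.get(dir);
            if (alias != null) {
                return alias;
            }
            int slash = dir.lastIndexOf('/');
            if (slash < 0) {
                break;
            }
            dir = dir.substring(0, slash);
        }
        return null;
    }
}
```

With 7000+ partitions, the naive variant parses thousands of paths for every split it resolves, while the indexed variant touches only as many entries as the split path has directory levels.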