Hive / HIVE-15422

HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Executing the following query in LLAP (single instance) on a 5-node cluster caused heavy GC pressure.

      select a.type, a.city, a.frequency, b.city, b.country, b.lat, b.lon
      from (select 'depart' as type, origin as city, count(origin) as frequency
            from flights
            group by origin
            order by frequency desc, type) as a
      left join airports as b on a.city = b.iata
      order by frequency desc;
      

      The flights table has over 7000 partitions in S3. Profiling showed that a large number of objects were created solely by path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but HiveInputFormat::pushProjectionsAndFilters still performs many of them.
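
      To illustrate why per-split path comparison becomes expensive with thousands of partitions, here is a minimal, self-contained sketch. It is not taken from the attached patches and does not use Hive or Hadoop classes; the class PathLookupSketch, its methods, the bucket name, and the alias names are all hypothetical. The linear variant re-normalizes every partition path for every split, allocating short-lived strings each time. The indexed variant normalizes the partition paths once and then does hash lookups while walking up the split's parent directories.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the HiveInputFormat implementation.
final class PathLookupSketch {

    // Drop scheme and authority: "s3a://bucket/t/p=1" becomes "/t/p=1".
    static String normalize(String path) {
        return URI.create(path).getPath();
    }

    // Linear variant: one normalization (and its allocations) per
    // partition path, repeated for every split that is looked up.
    static List<String> findAliasesLinear(Map<String, List<String>> pathToAliases,
                                          String splitPath) {
        String split = normalize(splitPath);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            String dir = normalize(e.getKey());
            if (split.equals(dir) || split.startsWith(dir + "/")) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    // Normalize all partition paths once, up front.
    static Map<String, List<String>> normalizeKeys(Map<String, List<String>> pathToAliases) {
        Map<String, List<String>> out = new HashMap<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            out.computeIfAbsent(normalize(e.getKey()), k -> new ArrayList<>())
               .addAll(e.getValue());
        }
        return out;
    }

    // Indexed variant: walk from the split path up through its parent
    // directories, doing one hash lookup per level instead of scanning
    // every partition.
    static List<String> findAliasesIndexed(Map<String, List<String>> normalized,
                                           String splitPath) {
        List<String> result = new ArrayList<>();
        String current = normalize(splitPath);
        while (current != null && !current.isEmpty()) {
            List<String> aliases = normalized.get(current);
            if (aliases != null) {
                result.addAll(aliases);
            }
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pathToAliases = new HashMap<>();
        pathToAliases.put("s3a://bucket/flights/year=2007", Arrays.asList("flights"));
        pathToAliases.put("s3a://bucket/flights/year=2008", Arrays.asList("flights"));
        String split = "s3a://bucket/flights/year=2008/part-00000";

        System.out.println(findAliasesLinear(pathToAliases, split));
        System.out.println(findAliasesIndexed(normalizeKeys(pathToAliases), split));
    }
}
```

      With P partitions and S splits, the linear variant performs on the order of P × S normalizations, while the indexed variant performs P normalizations once plus a handful of lookups per split, bounded by the directory depth.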

      Attachments

        1. HIVE-15422.1.patch
          6 kB
          Rajesh Balamohan
        2. HIVE-15422.2.patch
          6 kB
          Rajesh Balamohan
        3. HIVE-15422.3.patch
          6 kB
          Rajesh Balamohan
        4. Profiler_Snapshot_HIVE-15422.png
          177 kB
          Rajesh Balamohan

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Rajesh Balamohan (rajesh.balamohan)
            Votes: 0
            Watchers: 2
