HIVE-23218

LlapRecordReader queue limit computation is not optimal

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: llap
    • Labels:
      None

      Description

      After OrcEncodedDataConsumer::decodeBatch decodes a batch, the data is enqueued into a queue in LlapRecordReader. The limit for this queue is also determined in LlapRecordReader. If the limit is minimal, the enqueue ends up waiting for 100 ms until the queue has capacity.

      https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L168

      https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L590

      https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L260
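
The waiting behaviour described above can be illustrated with a small, self-contained sketch. This is not the LlapRecordReader code: the class name BoundedQueueSketch, the method enqueueWithRetries, and the retry loop are assumptions made for illustration, and only the 100 ms wait comes from the description. It shows that a producer offering into a full bounded queue blocks for one timeout slice per attempt, so a limit that is too small turns directly into stalls.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueSketch {
    /**
     * Hypothetical helper: tries to enqueue one item, waiting up to 100 ms per
     * attempt. Returns the number of attempts that timed out before the item
     * was accepted (or before maxAttempts was reached).
     */
    public static int enqueueWithRetries(BlockingQueue<String> queue, String item, int maxAttempts) {
        int timedOut = 0;
        try {
            while (timedOut < maxAttempts) {
                // offer(e, timeout, unit) blocks until space is free or the timeout expires.
                if (queue.offer(item, 100, TimeUnit.MILLISECONDS)) {
                    return timedOut;
                }
                timedOut++;
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and give up.
            Thread.currentThread().interrupt();
        }
        return timedOut;
    }

    public static void main(String[] args) {
        // Capacity 1 and already full, with no consumer draining it.
        BlockingQueue<String> small = new ArrayBlockingQueue<>(1);
        small.add("batch-0");

        long start = System.nanoTime();
        int waits = enqueueWithRetries(small, "batch-1", 3);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("timed-out attempts: " + waits + ", elapsed ms: " + elapsedMs);
    }
}
```

With a larger capacity the same call returns immediately with zero timed-out attempts, which is why the computed limit matters.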

      determineQueueLimit takes all columns into consideration, even though only a few columns are needed for the projection. Here is an example.

      create table test_acid(a1 string, a2 string, a3 string, a4 string, a5 string, a6 string, a7 string, a8 string, a9 string, a10 string,
      a11 string, a22 string, a33 string, a44 string, a55 string, a66 string, a77 string, a88 string, a99 string, a100 string,
      a111 decimal(25,2), a222 decimal(25,2), a333 decimal(25,2), a444 decimal(25,2), a555 decimal(25,2), a666 decimal(25,2), a777 decimal(25,2),
       a888 decimal(25,2), a999 decimal(25,2), a1000 decimal(25,2)) stored as orc;
      
      insert into table test_acid values ("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10",
      "a11","a22","a33","a44","a55","a66","a77","a88","a99","a100",
      10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23
      );
      
      select a44, count(*) from test_acid where a44 like "a4%" group by a44 order by a44;
      
      

      For this query, the predicted queue size is 138, because the computation takes all fields into account instead of just 2. This causes unwanted delays in adding data to the queue.
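
The effect of counting every column can be sketched with a simplified model. Everything here is an assumption for illustration: the class QueueLimitSketch, the per-type weights (3 for string, 2 for decimal), the budget, and the min/max bounds are hypothetical and are not taken from determineQueueLimit, so the numbers do not reproduce the 138 above. The sketch only shows the shape of the problem: when the limit is a fixed budget divided by the total column weight, weighing all 30 columns gives a much smaller limit than weighing only the projected column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class QueueLimitSketch {
    /**
     * Hypothetical model: the limit is a fixed budget divided by the summed
     * column weights, clamped to [min, max]. Not the Hive implementation.
     */
    public static int queueLimit(long budget, List<Integer> columnWeights, int min, int max) {
        if (columnWeights.isEmpty()) {
            return max; // nothing to weigh, so allow the maximum
        }
        long totalWeight = 0;
        for (int weight : columnWeights) {
            totalWeight += weight;
        }
        long limit = budget / totalWeight;
        return (int) Math.max(min, Math.min(max, limit));
    }

    public static void main(String[] args) {
        final int stringWeight = 3;  // assumed weight, illustration only
        final int decimalWeight = 2; // assumed weight, illustration only

        // All columns of test_acid: 20 strings and 10 decimals.
        List<Integer> allColumns = new ArrayList<>(Collections.nCopies(20, stringWeight));
        allColumns.addAll(Collections.nCopies(10, decimalWeight));

        // Only the column the query actually reads (a44, a string).
        List<Integer> projected = Collections.singletonList(stringWeight);

        long budget = 8000; // assumed budget, illustration only
        System.out.println("all columns: " + queueLimit(budget, allColumns, 1, 10000));
        System.out.println("projected:   " + queueLimit(budget, projected, 1, 10000));
    }
}
```

Under these assumed weights the full schema yields a limit of 100, while the projected column alone yields 2666, so restricting the computation to the columns that are actually read leaves far more room in the queue.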

        Attachments

        1. HIVE-23218.3.patch
          9 kB
          Ramesh Kumar Thangarajan
        2. HIVE-23218.2.patch
          9 kB
          Ramesh Kumar Thangarajan

            People

            • Assignee:
              rameshkumar Ramesh Kumar Thangarajan
              Reporter:
              rajesh.balamohan Rajesh Balamohan