Hive / HIVE-16972

FetchOperator: filter out inputSplits which length is zero

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.1.0, 2.1.1
    • Fix Version/s: None
    • Component/s: Physical Optimizer
    • Labels: None

    Description

      • Background
        The basic workflow of a typical HQL query has two steps:
        1. compile and execute
        2. fetch results
        In many cases, fetching results from HDFS is not a concern (results are read from HDFS only if the planning step generated MapReduce jobs). However, the number of result files on HDFS and the distribution of data across them can determine whether the query ultimately succeeds, especially with HiveServer2. We have some map-only queries, e.g.:
        select * from myTable where date > '20170101' and date <= '20170301' and id = 88;
        

        This query generates more than 20,000 files on HDFS (see the attached screenshot), and most of them are empty, so the result rows are spread very sparsely across the files. If a HiveServer2 client sends a TFetchResultsRequest with a 90 s timeout and maxRows set to 1024, FetchOperator cannot fetch 1024 rows within 90 seconds, and the client marks the request as failed due to timeout. The reason is that fetching results from an empty file is expensive: in our HDFS cluster (5000+ DataNodes), reading an empty file takes almost 100 ms, so reading 1,000 empty files takes about 100 ms × 1,000 = 100 s, which exceeds the 90 s timeout. Filtering out those empty files or splits would speed up result fetching.
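        The filtering idea and the timeout arithmetic above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use Hive or Hadoop classes: `EmptySplitFilter`, `Split`, `dropEmpty`, and `openCostSeconds` are hypothetical names introduced only for illustration, with `Split.getLength()` standing in for the length an input split reports. The sketch also assumes that a reported length of zero really means the split holds no rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not FetchOperator code and not the attached patch.
final class EmptySplitFilter {

    /** Hypothetical stand-in for an input split: a file path plus its reported length in bytes. */
    static final class Split {
        private final String path;
        private final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }

        String getPath() { return path; }

        long getLength() { return length; }
    }

    /** Returns only the splits that report a non-zero length, preserving their order. */
    static List<Split> dropEmpty(List<Split> splits) {
        List<Split> nonEmpty = new ArrayList<>();
        for (Split split : splits) {
            if (split.getLength() > 0) {
                nonEmpty.add(split);
            }
        }
        return nonEmpty;
    }

    /** Estimated seconds spent reading `files` files at a fixed per-file cost in milliseconds. */
    static double openCostSeconds(int files, long perFileMillis) {
        return files * perFileMillis / 1000.0;
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(
                new Split("part-00000", 0L),
                new Split("part-00001", 4096L),
                new Split("part-00002", 0L));

        // Only the split with data remains to be read.
        System.out.println("splits to read: " + dropEmpty(splits).size());

        // 1,000 empty files at roughly 100 ms each, as in the description above.
        System.out.println("estimated cost (s): " + openCostSeconds(1000, 100L));
    }
}
```

        With the figures from the description, skipping 1,000 empty files would avoid roughly 100 s of reads, which is the difference between missing and meeting the 90 s timeout.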

      Attachments

        1. HIVE-16972.6.patch
          3 kB
          Chaozhong Yang
        2. HIVE-16972.5.patch
          2 kB
          Chaozhong Yang
        3. HIVE-16972.4.patch
          2 kB
          Chaozhong Yang
        4. HIVE-16972.3.patch
          2 kB
          Chaozhong Yang
        5. HIVE-16972.2.patch
          1 kB
          Chaozhong Yang
        6. HIVE-16972.patch
          1 kB
          Chaozhong Yang

            People

              Assignee: debugger87 Chaozhong Yang
              Reporter: debugger87 Chaozhong Yang
              Votes: 2
              Watchers: 2
