  1. Hive
  2. HIVE-19847

Create Separate getInputSummary Service


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 4.0.0
    • Fix Version/s: None
    • Component/s: HiveServer2
    • Labels: None

    Description

      The Hive org.apache.hadoop.hive.ql.exec.Utilities.java file has taken on a life of its own. We should consider separating out the various components into their own classes. For this ticket, I propose separating out the getInputSummary functionality into its own class.

      There are several issues with the current implementation:

      1. It is synchronized, so only one query can compute its file input summary at a time. A query that deals with a large data set with a large number of files can block other queries for a long period of time. This is especially painful when most queries use a small data set, but a large data set is submitted on occasion.
      2. For each query, time is spent setting up and tearing down a thread pool.
      3. It uses deprecated code.
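
To make the first two issues concrete, here is a minimal sketch of the pattern being described. It is not the actual Utilities.getInputSummary code: the class name PerCallPoolSummary, the summarize method, and the use of plain file lengths in place of real file system calls are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not the real Hive implementation.
final class PerCallPoolSummary {
  private static final Object LOCK = new Object();

  private PerCallPoolSummary() { }

  // Sums the given file lengths. Each length stands in for one per-path lookup.
  static long summarize(List<Long> fileLengths, int threads) {
    synchronized (LOCK) { // issue 1: only one caller can be in here at a time
      // issue 2: a new pool is set up for every call...
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      try {
        List<Future<Long>> futures = new ArrayList<>();
        for (Long len : fileLengths) {
          futures.add(pool.submit(() -> len));
        }
        long total = 0;
        for (Future<Long> f : futures) {
          total += f.get();
        }
        return total;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e);
      } finally {
        pool.shutdown(); // ...and torn down again before the lock is released
      }
    }
  }
}
```

In this sketch a caller with thousands of entries holds LOCK for the whole computation, so a caller with a handful of entries has to wait for it to finish.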

      I propose breaking it out into its own class and creating a single thread pool that all queries draw from. The bottleneck then becomes the number of available threads rather than a single query, so if a big query is running when a small query is submitted, the smaller query can still proceed.
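
A rough sketch of what such a class could look like follows. The class name SharedInputSummaryService, the pool size of 4, and the use of plain file lengths as the unit of work are assumptions for illustration, not the contents of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a summary service backed by one shared pool.
final class SharedInputSummaryService {
  // Arbitrary placeholder; a real service would likely make this configurable.
  private static final int POOL_SIZE = 4;

  // Created once and shared by every caller. Daemon threads let the JVM exit.
  private static final ExecutorService POOL =
      Executors.newFixedThreadPool(POOL_SIZE, r -> {
        Thread t = new Thread(r, "input-summary-worker");
        t.setDaemon(true);
        return t;
      });

  private SharedInputSummaryService() { }

  // Not synchronized: concurrent callers interleave their tasks on the pool.
  static long summarize(List<Long> fileLengths) {
    List<Future<Long>> futures = new ArrayList<>();
    for (Long len : fileLengths) {
      futures.add(POOL.submit(() -> len));
    }
    long total = 0;
    try {
      for (Future<Long> f : futures) {
        total += f.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return total;
  }
}
```

Because callers only contend for worker threads, a small request waits for free threads rather than for a lock held by a large request. A fixed pool still serves queued tasks in submission order, so a real implementation might need additional fairness controls.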

      With regard to setup and teardown: if a query uses 15 threads to perform this summary action and then finishes, it tears down those threads, and the next query may immediately create 15 new threads for processing. With a single shared pool, the threads are reused across queries instead of being torn down and recreated.
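
The difference in thread churn can be observed with a counting thread factory. The sketch below is illustrative only: ThreadChurnDemo and its methods are hypothetical names, and the tasks are no-ops standing in for the real summary work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo comparing thread creation for per-query and shared pools.
final class ThreadChurnDemo {
  private ThreadChurnDemo() { }

  // Builds daemon threads and counts how many were ever created.
  static ThreadFactory countingFactory(AtomicInteger created) {
    return r -> {
      created.incrementAndGet();
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    };
  }

  // Submits `tasks` no-op jobs (one simulated query) and waits for all of them.
  static void runQuery(ExecutorService pool, int tasks) {
    List<Future<?>> futures = new ArrayList<>();
    for (int i = 0; i < tasks; i++) {
      futures.add(pool.submit(() -> { }));
    }
    try {
      for (Future<?> f : futures) {
        f.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  // Each query builds its own pool and shuts it down afterwards.
  static int threadsCreatedWithPerQueryPools(int queries, int threadsPerQuery) {
    AtomicInteger created = new AtomicInteger();
    for (int q = 0; q < queries; q++) {
      ExecutorService pool =
          Executors.newFixedThreadPool(threadsPerQuery, countingFactory(created));
      try {
        runQuery(pool, threadsPerQuery);
      } finally {
        pool.shutdown();
      }
    }
    return created.get();
  }

  // All queries run on one pool that outlives each individual query.
  static int threadsCreatedWithSharedPool(int queries, int threadsPerQuery) {
    AtomicInteger created = new AtomicInteger();
    ExecutorService pool =
        Executors.newFixedThreadPool(threadsPerQuery, countingFactory(created));
    try {
      for (int q = 0; q < queries; q++) {
        runQuery(pool, threadsPerQuery);
      }
    } finally {
      pool.shutdown();
    }
    return created.get();
  }
}
```

Comparing threadsCreatedWithPerQueryPools(2, 15) with threadsCreatedWithSharedPool(2, 15) shows the per-query variant creating a fresh set of threads for the second query, while the shared pool reuses the threads it already has.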

      Attachments

        1. HIVE-19847.1.patch
          54 kB
          David Mollitor
        2. HIVE-19847.2.patch
          56 kB
          David Mollitor
        3. HIVE-19847.3.patch
          50 kB
          David Mollitor
        4. HIVE-19847.4.patch
          50 kB
          David Mollitor
        5. HIVE-19847.5.patch
          50 kB
          David Mollitor
        6. HIVE-19847.6.patch
          50 kB
          David Mollitor

          People

            Assignee: David Mollitor (belugabehr)
            Reporter: David Mollitor (belugabehr)
