Hive / HIVE-14217 Druid integration / HIVE-14468

Implement Druid query based input format


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Druid integration
    • Labels: None

    Description

      The input format is responsible for generating the splits and creating the record readers.

      • For Timeseries, TopN, and GroupBy queries, we create a single split containing the broker address and the query. The record reader then submits the query to the broker, retrieves the results, parses them, and generates records.
      • For Select queries, Druid has the concept of a threshold (limit), which is used to retrieve the query results over multiple requests. Hence, we first issue a Druid Segment Metadata query to obtain the number of rows in the datasource. We then create (number of rows / default_threshold) splits, where default_threshold is the Hive configuration property hive.druid.select.threshold. Each split contains the broker address and a Select JSON query restricted to a start and end date range; currently we assume records are uniformly distributed across the time dimension. The record readers handle the splits independently: each submits its query to the broker, retrieves the results, parses them, and generates records. This way we can parallelize the retrieval of results for these queries.
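
      The split computation for Select queries could be sketched roughly as follows. This is only an illustration of the idea described above, not the code in the patch: the class and method names (SelectSplitSketch, numSplits, splitIntervals) are hypothetical, rounding the division up and always emitting at least one split are assumptions, and each date range is rendered as an ISO-8601 "start/end" string.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the split computation; not Hive's actual classes. */
final class SelectSplitSketch {

    /**
     * Number of splits = number of rows / threshold.
     * Rounding up and returning at least 1 are assumptions of this sketch.
     */
    static int numSplits(long numRows, long threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        long n = (numRows + threshold - 1) / threshold;
        return (int) Math.max(1L, n);
    }

    /**
     * Cuts [start, end) into n contiguous date ranges of (almost) equal length,
     * which matches the uniform-distribution assumption. Each range is
     * formatted as an ISO-8601 "start/end" interval string.
     */
    static List<String> splitIntervals(Instant start, Instant end, int n) {
        if (n <= 0 || end.isBefore(start)) {
            throw new IllegalArgumentException("need n > 0 and start <= end");
        }
        long startMs = start.toEpochMilli();
        long totalMs = end.toEpochMilli() - startMs;
        long step = totalMs / n;   // base length of every range
        long rem = totalMs % n;    // leftover millis, spread over the first ranges
        List<String> intervals = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            long lo = startMs + step * i + Math.min(i, rem);
            long hi = startMs + step * (i + 1) + Math.min(i + 1, rem);
            intervals.add(Instant.ofEpochMilli(lo) + "/" + Instant.ofEpochMilli(hi));
        }
        return intervals;
    }
}
```

      For example, with 25000 rows and a threshold of 10000 (an illustrative value), numSplits returns 3, and splitIntervals then yields three date ranges. Each range would be placed into its own Select query and paired with the broker address to form one split. With n = 1 the result is a single range covering the whole interval, which corresponds to the single-split case used for Timeseries, TopN, and GroupBy queries.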


          People

            Assignee: jcamacho Jesús Camacho Rodríguez
            Reporter: jcamacho Jesús Camacho Rodríguez