  1. Hive
  2. HIVE-22221

Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.2
    • Fix Version/s: 4.0.0
    • Component/s: llap, UDF

      Description

      When querying through the LLAP external client, LlapBaseInputFormat#getSplits() invokes the get_splits() UDTF (GenericUDTFGetSplits) under the hood.

      GenericUDTFGetSplits returns LlapInputSplit objects, in which planBytes[] occupies around 90% of the split size.
      Depending on the data size, the number of partitions, and the plan, an LlapInputSplit can grow up to 1 MB, with planBytes[] being common to all the splits and occupying more than 850 KB. Depending on the HS2 heap size, this sometimes also causes an OOM on HS2.

      This can be resolved by separating the common parts from the actual splits and reassembling them on the client side.
      We can also provide an option that lets the client say it does not want them reassembled, so that it can take control of the reassembly itself.

      The splits can be broken up as follows:
      1) schema split
      2) plan split
      3) actual split 1
      4) actual split 2 ... and so on.
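
      The ordering above can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches: the class SplitReassemblySketch, the FullSplit holder, the method names, and the byte sizes in main() are all hypothetical, and plain byte arrays stand in for the real split classes. It only shows the idea of sending the shared schema and plan once and reattaching them to each actual split on the client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
public class SplitReassemblySketch {

    /** Stand-in for a reassembled split: shared schema + shared plan + per-split payload. */
    static final class FullSplit {
        final byte[] schemaBytes;
        final byte[] planBytes;
        final byte[] fragmentBytes;

        FullSplit(byte[] schemaBytes, byte[] planBytes, byte[] fragmentBytes) {
            this.schemaBytes = schemaBytes;
            this.planBytes = planBytes;
            this.fragmentBytes = fragmentBytes;
        }
    }

    /** Server side: emit the schema once, the plan once, then one entry per actual split. */
    static List<byte[]> toCompactSequence(byte[] schema, byte[] plan, List<byte[]> fragments) {
        List<byte[]> out = new ArrayList<>(fragments.size() + 2);
        out.add(schema);
        out.add(plan);
        out.addAll(fragments);
        return out;
    }

    /** Client side: reattach the shared schema and plan (by reference) to every actual split. */
    static List<FullSplit> reassemble(List<byte[]> compact) {
        if (compact.size() < 2) {
            throw new IllegalArgumentException("expected a schema entry and a plan entry");
        }
        byte[] schema = compact.get(0);
        byte[] plan = compact.get(1);
        List<FullSplit> result = new ArrayList<>(compact.size() - 2);
        for (int i = 2; i < compact.size(); i++) {
            result.add(new FullSplit(schema, plan, compact.get(i)));
        }
        return result;
    }

    /** Sum of the sizes of all entries, i.e. what would be held or transferred. */
    static long totalBytes(List<byte[]> parts) {
        long total = 0;
        for (byte[] part : parts) {
            total += part.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Sizes are made up for the demonstration.
        byte[] schema = new byte[1_000];
        byte[] plan = new byte[850_000];
        List<byte[]> fragments = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            fragments.add(new byte[3_000]);
        }
        List<byte[]> compact = toCompactSequence(schema, plan, fragments);
        List<FullSplit> full = reassemble(compact);
        System.out.println("compact bytes: " + totalBytes(compact));
        System.out.println("reassembled splits: " + full.size());
    }
}
```

      A client that opts out of reassembly would, in this sketch, simply consume the compact sequence directly and skip the reassemble() step.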

      This greatly reduces the memory footprint on the server side (in my case, from 5 GB for ~5000 splits to around 15 MB) and hence the amount of data transferred. It also eliminates the OOM on the HS2 side.

      cc Jason Dere Sankar Hariappan Thejas Nair

        Attachments

        1. HIVE-22221.1.patch
          44 kB
          Shubham Chaurasia
        2. HIVE-22221.2.patch
          45 kB
          Shubham Chaurasia
        3. HIVE-22221.3.patch
          45 kB
          Shubham Chaurasia
        4. HIVE-22221.4.patch
          45 kB
          Shubham Chaurasia
        5. HIVE-22221.5.patch
          45 kB
          Shubham Chaurasia


            People

            • Assignee:
              Shubham Chaurasia
              Reporter:
              Shubham Chaurasia

              Dates

              • Created:
                Updated:
                Resolved:

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 50m
