Hive / HIVE-20552

Get Schema from LogicalPlan faster


Details

    Description

      Currently, getting the schema of a query requires compiling it, optimizing it, and generating a Tez plan. This adds overhead when only the LogicalPlan is needed.

      1. Copy the method HiveMaterializedViewsRegistry.parseQuery, making it public static and putting it in a utility class.
      2. Change the return statement of the method to return analyzer.getResultSchema();
      3. Change the return type of the method to List<FieldSchema>
      4. Call the new method from GenericUDTFGetSplits.createPlanFragment, replacing the current code that does this:

       if (num == 0) {
         // Schema only
         return new PlanFragment(null, schema, null);
       }
      

      The call should also be moved earlier in getPlanFragment, right after the HiveConf is created, so that it bypasses the code that uses HiveTxnManager and Driver.
      5. Convert the List<FieldSchema> to org.apache.hadoop.hive.llap.Schema.
      6. Return from getPlanFragment with:

       return new PlanFragment(null, schema, null);
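      A minimal sketch of the control flow in steps 4 to 6, written in plain Java so it runs without Hive on the classpath. FieldSchema, FieldDesc, Schema and PlanFragment below are simplified, hypothetical stand-ins for the Hive classes (their real constructors and fields differ); parseQueryAndGetSchema is a hypothetical name for the utility method from steps 1 to 3 and returns a fixed schema instead of running the semantic analyzer; the HiveConf setup is omitted. What it illustrates is that a schema-only request (num == 0) returns before the expensive planning path is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the Hive types named in the steps
// (metastore FieldSchema, llap Schema/FieldDesc, GenericUDTFGetSplits' PlanFragment).
// The real classes have different constructors and fields.
class SchemaOnlySketch {

  record FieldSchema(String name, String type, String comment) {}

  record FieldDesc(String name, String typeName) {}

  record Schema(List<FieldDesc> columns) {}

  // plan and splits are Object placeholders; only the schema matters here.
  record PlanFragment(Object plan, Schema schema, Object splits) {}

  // Counts entries into the (unmodeled) full planning path.
  static int expensivePathCalls = 0;

  // Hypothetical stand-in for the utility method from steps 1-3. In Hive it would
  // parse and analyze the query and return analyzer.getResultSchema();
  // here it returns a fixed two-column schema so the sketch is runnable.
  static List<FieldSchema> parseQueryAndGetSchema(String query) {
    List<FieldSchema> result = new ArrayList<>();
    result.add(new FieldSchema("id", "int", null));
    result.add(new FieldSchema("name", "string", null));
    return result;
  }

  // Step 5: convert the List<FieldSchema> into the (simplified) LLAP schema shape.
  static Schema convertSchema(List<FieldSchema> fieldSchemas) {
    List<FieldDesc> columns = new ArrayList<>();
    for (FieldSchema fs : fieldSchemas) {
      columns.add(new FieldDesc(fs.name(), fs.type()));
    }
    return new Schema(columns);
  }

  // Steps 4 and 6: answer a schema-only request (num == 0) up front,
  // before any transaction manager or Driver work would happen.
  static PlanFragment createPlanFragment(String query, int num) {
    if (num == 0) {
      Schema schema = convertSchema(parseQueryAndGetSchema(query));
      return new PlanFragment(null, schema, null);
    }
    // Placeholder for the existing path: compile, optimize, generate the Tez plan.
    expensivePathCalls++;
    throw new UnsupportedOperationException("full planning path is not modeled in this sketch");
  }

  public static void main(String[] args) {
    PlanFragment fragment = createPlanFragment("SELECT id, name FROM t", 0);
    System.out.println(fragment.schema());
  }
}
```

      The expensivePathCalls counter exists only so the sketch can show that the full planning path is never entered for a schema-only request.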

      Attachments

        1. HIVE-20552.1.patch (5 kB, Teddy Choi)
        2. HIVE-20552.2.patch (7 kB, Teddy Choi)
        3. HIVE-20552.3.patch (8 kB, Teddy Choi)


          People

            Assignee: Teddy Choi
            Reporter: Teddy Choi
            Votes: 0
            Watchers: 3


              Time Tracking

              Original estimate: Not Specified
              Remaining estimate: 0h
              Time spent: 20m
