  1. Apache Drill
  2. DRILL-4256

Performance regression in hive planning

Details

    Description

      Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a

      The fix for reading Hive tables backed by HBase caused a performance regression in query planning. The data set used in the test below has ~3700 partitions, and the filter in the query ensures that only 1 partition is selected.

      Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
      Query : explain plan for select count(*) from lineitem_partitioned where `year`=2015 and `month`=1 and `day` =1;
      Time : ~25 seconds
      
      Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
      Query : explain plan for select count(*) from lineitem_partitioned where `year`=2015 and `month`=1 and `day` =1;
      Time : ~6.5 seconds
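
To put the two timings side by side, here is a minimal sketch that computes the slowdown from the approximate numbers reported above. The variable names and the `slowdown` helper are hypothetical and not part of Drill, and the faster commit is simply treated as the comparison baseline.

```python
# Approximate planning times (seconds) as reported in this issue.
REGRESSED_COMMIT = "76f41e18207e3e3e987fef56ee7f1695dd6ddd7a"
BASELINE_COMMIT = "1ea3d6c3f144614caf460648c1c27c6d0f5b06b8"  # assumed baseline
planning_seconds = {REGRESSED_COMMIT: 25.0, BASELINE_COMMIT: 6.5}


def slowdown(baseline, regressed):
    """Return how many times slower `regressed` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline time must be positive")
    return regressed / baseline


factor = slowdown(planning_seconds[BASELINE_COMMIT],
                  planning_seconds[REGRESSED_COMMIT])
extra = planning_seconds[REGRESSED_COMMIT] - planning_seconds[BASELINE_COMMIT]
print(f"planning is ~{factor:.1f}x slower (+{extra:.1f} s)")
# prints: planning is ~3.8x slower (+18.5 s)
```

Because both timings are rough, the factor should be read as "roughly 4x" and not as a precise measurement.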
      

      Since the data is large, I couldn't attach it here. Reach out to me if you need additional information.

      Attachments

        1. jstack.tgz
          51 kB
          Rahul Kumar Challapalli

          People

            venki387 Venki Korukanti
            rkins Rahul Kumar Challapalli
            dgu-atmapr dgu-atmapr
            Votes: 0
            Watchers: 4

