Apache Drill / DRILL-1072

Drill is very slow when we have a large number of text files

    Details

      Description

      git.commit.id.abbrev=efa3274
      Build# 26178

      As the total number of files in the directory below increases, Drill becomes very slow. The elapsed times for the query below at different file counts are listed after it.

      Each file contains just one number and has a '.tbl' extension.
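      To reproduce the setup, a directory of single-number '.tbl' files can be generated with a short script. This is a minimal Python sketch under stated assumptions: the helper name make_test_files, the file naming scheme and the number written to each file are hypothetical. The report only says that each file holds one number and uses the '.tbl' extension. The generated files then have to be placed wherever /drill/testdata/morefiles resolves for the dfs storage plugin.

```python
import os
import tempfile


def make_test_files(directory, count):
    """Create `count` files in `directory`, each holding a single number.

    Hypothetical reproduction helper: the file names and the number written
    to each file are assumptions; only the '.tbl' extension and the
    one-number-per-file layout come from the report.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        # Assumed naming scheme, zero-padded so files sort in creation order.
        path = os.path.join(directory, f"file_{i:05d}.tbl")
        with open(path, "w") as f:
            f.write(f"{i}\n")  # exactly one number per file
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Writes into a fresh temporary directory; copy the result to the
    # location that dfs.`/drill/testdata/morefiles` points at.
    target = os.path.join(tempfile.mkdtemp(), "morefiles")
    created = make_test_files(target, 100)
    print(f"wrote {len(created)} files to {target}")
```

      Calling it with 100, 250, 500, 1000 and 5000 (each into an empty directory) gives the file counts used in the timings.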

      select count(*) from dfs.`/drill/testdata/morefiles`;

      Files    Time (seconds)
      100         5.183
      250        15.021
      500        26.846
      1000       69.835
      5000     1573.589
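      The timings above can be turned into a per-file cost and a local growth exponent (the k in t ~ c * n^k between two consecutive measurements). This Python sketch only re-derives figures from the reported numbers; the function names are illustrative.

```python
import math

# (file count, elapsed seconds) as reported for the count query
TIMINGS = [(100, 5.183), (250, 15.021), (500, 26.846), (1000, 69.835), (5000, 1573.589)]


def per_file_ms(timings):
    """Average wall-clock time per file, in milliseconds."""
    return [(n, 1000.0 * t / n) for n, t in timings]


def local_exponents(timings):
    """Exponent k in t ~ n**k between each pair of consecutive measurements.

    k near 1 means time grows linearly with the number of files;
    k near 2 means roughly quadratic growth.
    """
    out = []
    for (n0, t0), (n1, t1) in zip(timings, timings[1:]):
        out.append((n0, n1, math.log(t1 / t0) / math.log(n1 / n0)))
    return out


if __name__ == "__main__":
    for n, ms in per_file_ms(TIMINGS):
        print(f"{n:>5} files: {ms:7.2f} ms/file")
    for n0, n1, k in local_exponents(TIMINGS):
        print(f"{n0:>5} -> {n1:>5} files: exponent {k:.2f}")
```

      On the reported data the per-file cost rises from about 52 ms at 100 files to about 315 ms at 5000 files. The local exponent stays between roughly 0.8 and 1.4 up to 1000 files but is about 1.94 between 1000 and 5000 files, i.e. close to quadratic at the high end. With a single measurement per file count, this is a rough indication rather than a proper fit.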

      The logs contain these messages repeatedly when executing against 5000 files:

      22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
      22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
      22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
      22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
      22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
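      In the excerpt above, each 'vector value capacity' / 'vector byte capacity' / 'text scan batch size' group appears to mark one DrillTextRecordReader being set up, presumably one per file. That reading is an assumption based only on the log pattern. A small Python sketch can measure the gap between consecutive setups and collect the reported batch sizes from a saved log; the regular expression assumes the exact line layout quoted here, and the function names are hypothetical.

```python
import re
import sys
from datetime import datetime

# Assumed layout: "HH:MM:SS.mmm [thread] LEVEL logger - message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[[^\]]*\] (\w+) (\S+) - (.*)$")


def parse(lines):
    """Yield (timestamp, message) for every line matching the layout."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield datetime.strptime(m.group(1), "%H:%M:%S.%f"), m.group(4)


def summarize(lines):
    """Return (ms between consecutive reader setups, batch sizes)."""
    setups, batches = [], []
    for ts, msg in parse(lines):
        if msg.startswith("vector value capacity"):
            setups.append(ts)  # treated as the start of one reader setup
        elif msg.startswith("text scan batch size"):
            batches.append(int(msg.rsplit(" ", 1)[1]))
    gaps = [round((b - a).total_seconds() * 1000) for a, b in zip(setups, setups[1:])]
    return gaps, batches


if __name__ == "__main__":
    # Usage: python summarize_log.py drillbit.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            gaps, batches = summarize(f)
        print("ms between reader setups:", gaps)
        print("batch sizes:", batches)
```

      Applied to the 15 lines above, it reports gaps of 22, 23, 172 and 24 ms and batch sizes 5, 0, 5, 0, 5. If a gap of around 22 ms per file held across all 5000 files, reader setup would account for roughly 110 seconds of the 1573 seconds reported, so the excerpt alone is too short to show where the rest of the time goes.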


          Activity

          Rahul Challapalli created the issue.
          Jacques Nadeau set Fix Version/s to 0.5.0.
          Jacques Nadeau set Assignee to Steven Phillips (sphillips).
          Sudheesh Katkam set Due Date to 15/Aug/14.
          Jacques Nadeau changed Fix Version/s from 0.5.0 to 0.6.0.
          Parth Chandra changed Fix Version/s from 0.6.0 to 0.8.0.
          Jacques Nadeau changed Priority from Major to Minor.
          Jacques Nadeau changed Fix Version/s from 0.8.0 to 0.9.0.
          Tony Stevenson changed Workflow from "no-reopen-closed, patch-avail, testing" to "Drill workflow".
          Chris Westin changed Fix Version/s from 0.9.0 to 1.0.0.
          Chris Westin added a link: this issue is duplicated by DRILL-1681.
          Chris Westin changed Fix Version/s from 1.0.0 to 1.1.0.
          Chris Westin changed Fix Version/s from 1.1.0 to 1.2.0.
          Parth Chandra changed Fix Version/s from 1.2.0 to 1.4.0.

            People

            Assignee: Steven Phillips
            Reporter: Rahul Challapalli
            Votes: 0
            Watchers: 3
