  Apache Drill / DRILL-1072

Drill is very slow when we have a large number of text files

    Details

      Description

      git.commit.id.abbrev=efa3274
      Build# 26178

      As the total number of files under the directory below increases, Drill becomes very slow. The results below show how long the same query takes for different file counts.

      Each file contains a single number and has a '.tbl' extension.
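
For anyone trying to reproduce this, a dataset of the same shape could be generated with a sketch like the one below. It is not part of the original report: the function name `make_tbl_files`, the file naming scheme, and the random values are assumptions for illustration, and a temporary directory stands in for the directory used in the query.

```python
import random
import tempfile
from pathlib import Path


def make_tbl_files(directory: Path, count: int, seed: int = 0) -> list[Path]:
    """Create `count` files named NNNNN.tbl, each holding a single integer."""
    rng = random.Random(seed)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = directory / f"{i:05d}.tbl"
        # One number per file, as in the report; the value itself is arbitrary.
        path.write_text(f"{rng.randint(0, 9999)}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # A temporary directory stands in for the real test data directory.
    with tempfile.TemporaryDirectory() as tmp:
        created = make_tbl_files(Path(tmp) / "morefiles", 100)
        print(len(created), "files written")
```

Changing the `count` argument to 250, 500, 1000, or 5000 would give the other dataset sizes listed in the results.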

      select count(*) from dfs.`/drill/testdata/morefiles`;

      100 files — 5.183 seconds
      250 files — 15.021 seconds
      500 files — 26.846 seconds
      1000 files — 69.835 seconds
      5000 files — 1573.589 seconds
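
Dividing each elapsed time by its file count makes the growth easier to see. The short sketch below does only that arithmetic on the numbers reported above; the helper name `per_file_ms` is made up for illustration.

```python
# (file count, elapsed seconds) as reported in the results above
timings = [
    (100, 5.183),
    (250, 15.021),
    (500, 26.846),
    (1000, 69.835),
    (5000, 1573.589),
]


def per_file_ms(results):
    """Return (file count, average milliseconds per file) pairs."""
    return [(n, round(secs / n * 1000, 1)) for n, secs in results]


for n, ms in per_file_ms(timings):
    print(f"{n:>5} files: {ms:>6.1f} ms per file")
```

The average stays in the range of roughly 50 to 70 ms per file up to 1000 files, then rises to about 315 ms per file at 5000 files. Going from 100 to 5000 files is a 50x increase in file count but about a 300x increase in elapsed time, so the slowdown is worse than linear.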

      The logs contain these messages repeatedly when executing against 5000 files:

      22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
      22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
      22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
      22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
      22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
      22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
      22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
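
The spacing between the repeated groups of messages can be measured directly from the timestamps. The sketch below is an illustration only: the function name `group_gaps_ms` is hypothetical, the sample lines reuse the timestamps from the excerpt with the thread id shortened, and what each group represents inside Drill is not stated in the report.

```python
import re

# Matches the timestamp of each "vector value capacity" message, which opens
# every repeated three-line group in the excerpt.
GROUP_START = re.compile(
    r"^\s*(\d{2}):(\d{2}):(\d{2})\.(\d{3}) .*vector value capacity"
)


def group_gaps_ms(log_text: str) -> list[int]:
    """Return milliseconds between consecutive group-opening messages.

    Assumes all timestamps fall on the same day (no midnight rollover).
    """
    starts = []
    for line in log_text.splitlines():
        match = GROUP_START.match(line)
        if match:
            hours, minutes, seconds, millis = map(int, match.groups())
            starts.append(((hours * 60 + minutes) * 60 + seconds) * 1000 + millis)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Timestamps copied from the excerpt; thread id shortened to "..." for brevity.
SAMPLE = "\n".join(
    f"{ts} [...:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader"
    " - vector value capacity 65536"
    for ts in ("22:02:22.818", "22:02:22.840", "22:02:22.863",
               "22:02:23.035", "22:02:23.059")
)

print(group_gaps_ms(SAMPLE))
```

For the five groups in the excerpt, the gaps are 22, 23, 172, and 24 ms. Running the same function over the full log would show whether the longer pauses become more frequent as the query progresses.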

    People

      Assignee: Unassigned
      Reporter: Rahul Challapalli (rkins)
      Votes: 0
      Watchers: 3
