Apache Drill / DRILL-4379

Unexpected Table Behavior with only one subdirectory vs. Many


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: None

    Description

      A common practice is to use subdirectories below a main directory as a partitioning device. Say you have a table named "myawesomedata" that receives new data every day. It is valuable to create the main directory and then one subdirectory per day, so that queries running against only certain days of data can be optimized.

      /myawesomedata/
      /myawesomedata/2016-02-01
      /myawesomedata/2016-02-02
      /myawesomedata/2016-02-03
      /myawesomedata/2016-02-04
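Drill exposes each directory level below the queried table root as an implicit column: the first level is `dir0`, the next `dir1`, and so on. The Python sketch below models that mapping for the layout above. It is a simplified illustration, not Drill's implementation; the helper `implicit_dir_columns` and the data file name `part-0.json` are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def implicit_dir_columns(table_root: str, file_path: str) -> dict[str, str]:
    """Map each directory between table_root and a data file to dir0, dir1, ...

    Hypothetical helper: a simplified model of Drill's implicit directory columns.
    """
    relative = PurePosixPath(file_path).relative_to(table_root)
    # Every path component except the final file name is one directory level.
    dirs = relative.parts[:-1]
    return {f"dir{i}": name for i, name in enumerate(dirs)}


files = [
    "/myawesomedata/2016-02-01/part-0.json",
    "/myawesomedata/2016-02-02/part-0.json",
]
for f in files:
    # e.g. /myawesomedata/2016-02-01/part-0.json {'dir0': '2016-02-01'}
    print(f, implicit_dir_columns("/myawesomedata", f))
```

Under this model, a filter such as `dir0 = '2016-02-01'` simply selects the files whose first directory level has that name.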

      I have identified a condition in which, if there is ONLY one subdirectory, queries do not return the results a user would expect.

      Example:

      With the layout above, if I run the query

      select count(1) from `myawesomedata`;

      I get an accurate count across all subdirectories.

      If I run:

      select count(1) from `myawesomedata` where dir0 = '2016-02-01';

      I get an accurate count of only the 2016-02-01 subdirectory.
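The two queries above can be modeled as follows: without a filter every subdirectory is counted, and with a `dir0` filter only the matching subdirectory is considered (Drill can prune non-matching directories before scanning). This is a minimal Python sketch, not Drill's planner; the helper `count_rows` is hypothetical, and the per-subdirectory row counts are made-up sample values standing in for the real data files.

```python
from __future__ import annotations

# Hypothetical row counts per subdirectory; stand-ins for the real data files.
rows_per_dir = {
    "2016-02-01": 120,
    "2016-02-02": 95,
    "2016-02-03": 130,
    "2016-02-04": 110,
}


def count_rows(table: dict[str, int], dir0: str | None = None) -> int:
    """Model of `select count(1) from myawesomedata [where dir0 = ...]`.

    Without a filter, every subdirectory is counted. With a dir0 filter,
    subdirectories whose name does not match are skipped (pruned).
    """
    if dir0 is None:
        return sum(table.values())
    return sum(n for name, n in table.items() if name == dir0)


print(count_rows(rows_per_dir))                     # 455
print(count_rows(rows_per_dir, dir0="2016-02-01"))  # 120
```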

      However, if I delete subdirectories 2016-02-02, 2016-02-03, and 2016-02-04 and am left with:

      /myawesomedata/
      /myawesomedata/2016-02-01

      Then if I run

      select count(1) from `myawesomedata`;

      It returns the accurate count (which is just that of the 2016-02-01 directory).

      However, if I run

      select count(1) from `myawesomedata` where dir0 = '2016-02-01';

      It takes much longer (15 seconds, versus near-instant for the other queries) and returns no results, even though this is the same query that worked with two or more subdirectories. In short, when there is only one subdirectory, a query asking for only that directory does not behave the way it does when there are more subdirectories. This is an unexpected user experience, and I believe it could cause user frustration and unexpected results when querying data with Drill.
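The expectation in this report can be stated as an invariant: when `2016-02-01` is the only subdirectory, filtering on `dir0 = '2016-02-01'` selects every row, so it should return the same count as the unfiltered query. The Python sketch below checks that invariant against a simplified model. The helpers `count_rows` and `check_single_subdirectory` and the row count of 120 are hypothetical; the report describes Drill 1.4.0 returning no results for the filtered query instead.

```python
from __future__ import annotations


def count_rows(table: dict[str, int], dir0: str | None = None) -> int:
    """Simplified model of count(1) with an optional dir0 filter (hypothetical)."""
    if dir0 is None:
        return sum(table.values())
    return sum(n for name, n in table.items() if name == dir0)


def check_single_subdirectory(table: dict[str, int]) -> None:
    """Assert that filtered and unfiltered counts agree for a one-subdirectory table."""
    assert len(table) == 1, "expects exactly one subdirectory"
    (only_dir,) = table  # the single subdirectory name
    total = count_rows(table)
    filtered = count_rows(table, dir0=only_dir)
    # The filter matches the only subdirectory, so the counts must be equal.
    assert filtered == total, f"dir0 filter returned {filtered}, expected {total}"


check_single_subdirectory({"2016-02-01": 120})
print("invariant holds")
```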


          People

            Assignee: Unassigned
            Reporter: John Omernik (mandoskippy)
            Votes: 0
            Watchers: 2
