Apache Drill / DRILL-5414

Issue with Querying Directories

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels: None
    • Environment: Kubernetes running Debian GNU/Linux 8 containers.
      openjdk version "1.8.0_111".
      AWS.
      Using S3 buckets.

    Description

      Hi

      Thanks for Apache Drill; it's pretty awesome.

      I'm hoping to use Drill's directory querying and have structured my data archive in S3 to test it. However, I've hit an issue with directory querying.

      My directory structure in S3 looks like this:
      s3/devices_by_id/device_id/2016/11/12/<filename>.json.gz

      From the documentation I figured the following queries were equivalent:

      select count(*) from `s3`.`/deviceid/xyz/2016/11/`;
      +---------+
      | EXPR$0  |
      +---------+
      | 286049  |
      +---------+
      1 row selected (10.351 seconds)

      select count(*) from `s3`.`/deviceid/` where dir0='xyz' and dir1='2016' and dir2='11';

      But this latter query just hangs, and no profile appears in the UI. When I press Ctrl-C, I get:

      +--+
      |  |
      +--+
      +--+
      No rows selected (1481.727 seconds)

      If I try to run an explain plan, that also hangs.
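
      For reference, an explain plan request for the filtered query would take roughly the following form in Drill. This is a sketch that assumes the same `s3` storage plugin and `/deviceid/` path as the queries above; it is not necessarily the exact statement that was run:

```sql
-- Sketch only: asks Drill for the plan of the dir0/dir1/dir2 query without executing it.
-- Assumes the `s3` storage plugin and the /deviceid/ path used in the queries above.
explain plan for
select count(*)
from `s3`.`/deviceid/`
where dir0 = 'xyz' and dir1 = '2016' and dir2 = '11';
```

      If this statement also fails to return, the time is probably being spent during planning rather than execution, although that is an inference and not something the log confirms.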

      There are 13283 compressed JSON files in total under 2016/11 in the S3 bucket.

      The log doesn't show much information.

      Can anyone help with this, please? I can provide more information as required. Hopefully this is not user error.

          People

            Assignee: Unassigned
            Reporter: Paul Makkar (mobime)