Apache Drill / DRILL-4648

select count(*) on csv file fails with UNSUPPORTED_OPERATION

    Description

      Running select count(*) against a CSV file fails with the following error:
      0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `views/db/test.csv`;
      Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

      column name columns
      column index
      Fragment 0:0

      [Error Id: b38a1e44-c2f5-44a3-9960-6062debc6b50 on xxxxxx.compute.internal:31010] (state=,code=0)

      If we refer to a column in the file by name, the query works, e.g.:

      0: jdbc:drill:drillbit=10.1.101.10> select count(COLUMN_ONE) from `views/db/test.csv`;
      +---------+
      | EXPR$0  |
      +---------+
      | 1       |
      +---------+
      1 row selected (0.144 seconds)
      0: jdbc:drill:drillbit=10.1.101.10>

      The test.csv file contents:
      ~/D❯❯❯ cat test.csv
      "COLUMN_ONE","COLUMN_TWO"
      "Hello","World"
      ~/D❯❯❯

      Drill is talking to a file mounted on Alluxio.

      More info:
      Mounting s3 directly gives the following results:
      With extractHeader NOT turned on:
      0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
      +---------+
      | EXPR$0  |
      +---------+
      | 2       |
      +---------+
      1 row selected (0.951 seconds)
      0: jdbc:drill:drillbit=10.1.101.10>

      With extractHeader = true:

      0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
      Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

      column name columns
      column index
      Fragment 0:0

      [Error Id: 5609cf0d-7553-44b5-bd90-40bce1c020a9 on ixxxxxx.compute.internal:31010] (state=,code=0)
      0: jdbc:drill:drillbit=10.1.101.10>
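
      For reference, the counts reported above (2 without header extraction, 1 for count(COLUMN_ONE) with it enabled) are consistent with the first line of the file being consumed as column names. The following Python sketch only illustrates that expectation using the standard csv module. It is not Drill code, and count_rows and header_names are hypothetical helper names introduced here.

```python
import csv
import io

# Contents of test.csv exactly as shown in this report.
CSV_TEXT = '"COLUMN_ONE","COLUMN_TWO"\n"Hello","World"\n'


def count_rows(text, extract_header):
    """Count rows, treating the first line as column names when extract_header is set."""
    rows = list(csv.reader(io.StringIO(text)))
    if extract_header and rows:
        rows = rows[1:]  # the header line supplies names, not data
    return len(rows)


def header_names(text):
    """Return the column names taken from the first line of the file."""
    return next(csv.reader(io.StringIO(text)))


print(count_rows(CSV_TEXT, extract_header=False))  # 2
print(count_rows(CSV_TEXT, extract_header=True))   # 1
print(header_names(CSV_TEXT))                      # ['COLUMN_ONE', 'COLUMN_TWO']
```

      Under this reading, count(*) with extractHeader enabled would be expected to return 1 rather than fail.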

      Workspace file:

      {
        "type": "file",
        "enabled": true,
        "connection": "s3a://<my-bucket>",
        "config": {
          "fs.s3a.access.key": "xxx",
          "fs.s3a.secret.key": "xxx"
        },
        "workspaces": {
          "root": { "location": "/", "writable": false, "defaultInputFormat": null },
          "tmp": { "location": "/tmp", "writable": true, "defaultInputFormat": null }
        },
        "formats": {
          "psv": { "type": "text", "extensions": [ "tbl" ], "delimiter": "|" },
          "csv": { "type": "text", "extensions": [ "csv" ], "extractHeader": true, "delimiter": "," },
          "tsv": { "type": "text", "extensions": [ "tsv" ], "delimiter": "\t" },
          "parquet": { "type": "parquet" },
          "json": { "type": "json", "extensions": [ "json" ] },
          "avro": { "type": "avro" },
          "sequencefile": { "type": "sequencefile", "extensions": [ "seq" ] },
          "csvh": { "type": "text", "extensions": [ "csvh" ], "extractHeader": true, "delimiter": "," }
        }
      }


            People

              Assignee: Unassigned
              Reporter: Peter McTaggart (pdmct)
              Votes: 0
              Watchers: 2
