[DRILL-4648] select count(*) on csv file fails with UNSUPPORTED_OPERATION - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 1.10.0
Component/s: Execution - Data Types, Functions - Drill, Storage - Text & CSV
Labels: None

Description

Running select count(*) on a CSV file fails with the following error:
0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `views/db/test.csv`;
Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

column name columns
column index
Fragment 0:0

[Error Id: b38a1e44-c2f5-44a3-9960-6062debc6b50 on xxxxxx.compute.internal:31010] (state=,code=0)

If we refer to a column in the file by name, the query works, e.g.:

0: jdbc:drill:drillbit=10.1.101.10> select count(COLUMN_ONE) from `views/db/test.csv`;
---------
EXPR$0
---------
---------
1 row selected (0.144 seconds)
0: jdbc:drill:drillbit=10.1.101.10>

The test.csv file contents:
~/D❯❯❯ cat test.csv
"COLUMN_ONE","COLUMN_TWO"
"Hello","World"
~/D❯❯❯
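
For reference, the row counts one would expect from this file can be sketched outside Drill. This is only an illustration using Python's standard csv module, not Drill's reader; the names TEST_CSV and count_rows are hypothetical, and the assumption is that count(*) should count data rows only when the header line is extracted.

```python
import csv
import io

# Contents of test.csv exactly as shown in this report.
TEST_CSV = '"COLUMN_ONE","COLUMN_TWO"\n"Hello","World"\n'


def count_rows(text, extract_header):
    """Count rows in CSV text.

    When extract_header is true, the first line is treated as column
    names and is not counted (assumed semantics, for illustration only).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if extract_header and rows:
        rows = rows[1:]  # drop the header line
    return len(rows)


if __name__ == "__main__":
    print(count_rows(TEST_CSV, extract_header=True))   # header excluded
    print(count_rows(TEST_CSV, extract_header=False))  # header counted as data
```

Under that assumption the file has one data row with header extraction and two rows without it.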

Drill is talking to a file mounted on Alluxio.

More info:
Mounting S3 directly gives the following results:
With extractHeader NOT turned on:
0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
---------
EXPR$0
---------
---------
1 row selected (0.951 seconds)
0: jdbc:drill:drillbit=10.1.101.10>

With extractHeader = true:

0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

column name columns
column index
Fragment 0:0

[Error Id: 5609cf0d-7553-44b5-bd90-40bce1c020a9 on ixxxxxx.compute.internal:31010] (state=,code=0)
0: jdbc:drill:drillbit=10.1.101.10>

Workspace file:

{
  "type": "file",
  "enabled": true,
  "connection": "s3a://<my-bucket>",
  "config": {
    "fs.s3a.access.key": "xxx",
    "fs.s3a.secret.key": "xxx"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [ "tbl" ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [ "csv" ],
      "extractHeader": true,
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [ "tsv" ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [ "json" ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [ "seq" ]
    },
    "csvh": {
      "type": "text",
      "extensions": [ "csvh" ],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}

Issue Links

relates to

DRILL-4919 Fix select count(1) / count(*) on csv with header

Closed

People

Assignee: Unassigned
Reporter: Peter McTaggart
Votes: 0
Watchers: 2

Dates

Created: 02/May/16 03:12
Updated: 05/Jan/18 13:08
Resolved: 05/Jan/18 13:08