Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.17.0
Fix Version/s: None
Component/s: None
Environment: 64 GB machine running on AWS.
Description
Running a `SELECT *` query against an empty Parquet file (i.e. one with correct column metadata written, but no rows) triggers an `IndexOutOfBoundsException`.
I have an empty Parquet file with the following schema:
$ parquet-tools schema dispute.parquet
message parquet_go_root {
  required int32 dispute_id (INT_32) = 0;
  required binary title (UTF8) = 0;
  optional int32 start_date (DATE) = 0;
  optional int32 end_date (DATE) = 0;
  optional binary docket_number (UTF8) = 0;
  required binary route (UTF8) = 0;
  required binary jurisdiction (UTF8) = 0;
}
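For anyone without the attachment, a file of roughly the same shape can be generated with a short script. The sketch below uses pyarrow, which is an assumption on my part: the root name `parquet_go_root` suggests the original was written by a Go library, so a pyarrow-written file will differ in its root message name and possibly in other footer details, and may not trigger the same error. The function name `write_empty_dispute_file` is hypothetical.

```python
def write_empty_dispute_file(path):
    """Write a Parquet file with the dispute columns and zero rows."""
    # Imported here so that pyarrow is only required when a file is written.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # nullable=False corresponds to "required" in the parquet-tools output;
    # fields left at the default (nullable) correspond to "optional".
    schema = pa.schema([
        pa.field("dispute_id", pa.int32(), nullable=False),
        pa.field("title", pa.string(), nullable=False),
        pa.field("start_date", pa.date32()),
        pa.field("end_date", pa.date32()),
        pa.field("docket_number", pa.string()),
        pa.field("route", pa.string(), nullable=False),
        pa.field("jurisdiction", pa.string(), nullable=False),
    ])

    # empty_table() builds a zero-row table that still carries every column.
    pq.write_table(schema.empty_table(), path)
    return schema


if __name__ == "__main__":
    write_empty_dispute_file("dispute.parquet")
```

Running `parquet-tools schema` on the output should list the same seven columns, though the annotations may be printed slightly differently.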
If I then run the following query via the Drill web UI:
SELECT * FROM dfs.`/data/dispute.parquet`
then I get the following error from Drill:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: Index: 0, Size: 0
Please, refer to logs for more information.
[Error Id: a93e1aa1-a7e6-4bc9-9f11-c42b9f6fe108 on e531a6492cf4:31010]
The expected result is an empty result set (i.e. 0 rows).
I've attached the Parquet file in question and the relevant entries from drillbit.log.
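To reproduce this without the web UI, the same query can be submitted through Drill's REST endpoint (`POST /query.json`). This is a minimal sketch, assuming a Drillbit whose web port is reachable at `http://localhost:8047` with no authentication enabled; the helper names `build_query_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumption: default Drill web port on the local machine, no auth.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_URL):
    """Build the POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = run_query("SELECT * FROM dfs.`/data/dispute.parquet`")
    # Print whatever row list the response carries, if any.
    print(result.get("rows"))
```

Building the request separately from sending it makes it easy to inspect the exact JSON payload before it goes to the server.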