Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Files whose names begin with . or _ are ignored by Hive and other tools, as these are typically logs and status files written out by tools like MapReduce. Drill should not read them when querying a directory containing Parquet files.
Currently it fails with the error:
message: "Failure while setting up Foreman. < AssertionError:[ Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#78:ProjectRel.NONE.ANY([]).[](child=rel#15:Subset#1.ENUMERABLE.ANY([]).[],p_partkey=$1,p_type=$2), rel#8:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, drillTestDirDencTpchSF100, part])] ] < DrillRuntimeException:[ java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file com.mapr.fs.MapRFileStatus@99c9d45e ] < IOException:[ Could not read footer: java.io.IOException: Could not read footer for file com.mapr.fs.MapRFileStatus@99c9d45e ] < IOException:[ Could not read footer for file com.mapr.fs.MapRFileStatus@99c9d45e ] < IOException:[ Open failed for file: /drill/testdata/dencSF100/part/.impala_insert_staging, error: Invalid argument (22) ]"
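The requested behaviour can be illustrated with a minimal sketch. This is not Drill's actual implementation: the class name HiddenFileFilter and both method names are hypothetical, and the code works on plain path strings rather than on Hadoop file statuses. It only shows the rule described above, which is to skip any entry whose final path component starts with . or _ before the Parquet footers are read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Drill or Hadoop.
final class HiddenFileFilter {

    // Returns true when the last path component starts with '.' or '_',
    // the convention Hive and MapReduce use for logs and status files.
    static boolean isHidden(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.startsWith(".") || name.startsWith("_");
    }

    // Keeps only the entries a directory scan should try to read.
    static List<String> visibleFiles(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (!isHidden(p)) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

With this rule, /drill/testdata/dencSF100/part/.impala_insert_staging from the error above would be skipped, as would _SUCCESS and _logs, while an ordinary data file (for example a hypothetical part-00000.parquet) would still be scanned. In a Hadoop-based codebase the same predicate would most likely be expressed as an org.apache.hadoop.fs.PathFilter passed to the directory listing call, though where exactly Drill should apply it is left open here.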
Issue Links
- is related to DRILL-2577 Parquet scan fails when directory contains _SUCCESS or _logs (Resolved)
- relates to DRILL-2424 Ignore hidden files in directory path (Closed)