Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.15.0
Fix Version/s: None
Component/s: None
Description
Drill seems to apply different hidden-file filtering rules depending on the file type.
- Parquet: hidden files (names starting with ".") are filtered out whether we query the directory or the files with *
/*
DirPqt
|--sub1.pqt
|--sub2.pqt
|--.sub3.pqt
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/*`);
=> 2
/* It's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/.*`);
=> 1
/* But I don't know how to request visible and hidden files simultaneously (except with a UNION) */
- CSV, JSON: whether hidden files (names starting with ".") are filtered out depends on whether the query targets the directory or the files
/*
DirCSVH
|--sub1.csvh
|--sub2.csvh
|--.sub3.csvh
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/*`);
=> 3
/* As with Parquet, it's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/.*`);
=> 1
/* It's also possible to request only the visible files */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/[^.]*`);
=> 2
/* But I don't know how to request visible and hidden files simultaneously (except with a UNION) */
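For comparison, the three glob results in the CSV case (3, 1 and 2) are what plain glob matching with no hidden-file filter would return. Here is a minimal sketch in Python to illustrate that; it uses Python's `fnmatch` module, not Drill or Hadoop code, and `glob_match` is a hypothetical helper name. Note that `fnmatch` writes the negated class as `[!.]` where the query above uses `[^.]`.

```python
from fnmatch import fnmatchcase

# Directory content taken from the CSV example above.
FILES = ["sub1.csvh", "sub2.csvh", ".sub3.csvh"]


def glob_match(files, pattern):
    """Return the names matching a glob pattern, with no hidden-file filter."""
    return [name for name in files if fnmatchcase(name, pattern)]


all_files = glob_match(FILES, "*")         # a leading "." is not special here
hidden_only = glob_match(FILES, ".*")      # only names starting with "."
visible_only = glob_match(FILES, "[!.]*")  # fnmatch spelling of [^.]*

print(len(all_files), len(hidden_only), len(visible_only))  # prints: 3 1 2
```

The Parquet case differs only for the `*` pattern, which returns 2 instead of 3, so an extra hidden-file filter seems to be applied there.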
Some issues already deal with the hidden-file problem, for example DRILL-2424.
But I did not find any details about this filtering in the documentation. I found that hidden files start with "." or "_", but maybe there are other cases?
It is a little strange that the filtering rules differ depending on the file type.
It is also impractical that there is no simple way to say whether or not we want hidden files. For example, with something like:
SELECT * FROM ....`MyDir/[.]?*`;
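To make the request concrete, here is a minimal sketch in Python of the behaviour I would expect: one rule for every file type, with an explicit switch for hidden files. This is not Drill code. `select_files` and `include_hidden` are hypothetical names, and the prefix list only contains the two prefixes mentioned above, so it may be incomplete.

```python
# Prefixes named in this report; assumption: there may be others.
HIDDEN_PREFIXES = (".", "_")


def select_files(names, include_hidden=False):
    """Hypothetical helper: keep hidden files only when explicitly requested."""
    if include_hidden:
        return list(names)
    return [name for name in names if not name.startswith(HIDDEN_PREFIXES)]


names = ["sub1.pqt", "sub2.pqt", ".sub3.pqt"]
visible = select_files(names)
everything = select_files(names, include_hidden=True)

print(visible)     # prints: ['sub1.pqt', 'sub2.pqt']
print(everything)  # prints: ['sub1.pqt', 'sub2.pqt', '.sub3.pqt']
```

With a rule like this, the result would not depend on the file format or on whether the query targets the directory or a glob.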