  1. Apache Drill
  2. DRILL-7219

Ignore hidden file problems

Details

    Description

      Drill seems to apply different hidden-file filtering rules depending on the file type.

      • Parquet: hidden files (names starting with ".") are filtered out whether the query targets the directory or its files with *
        /* DirPqt
           |--sub1.pqt
           |--sub2.pqt
           |--.sub3.pqt
        */
        SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt`);
        => 2
        SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/*`);
        => 2
        /* It is possible to query the hidden file explicitly */
        SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/.*`);
        => 1
        /* But I don't know how to query visible and hidden files at once (other than with a UNION) */
        
      • CSV, JSON: whether hidden files (names starting with ".") are filtered out depends on whether the query targets the directory or its files with *
        /* DirCSVH
           |--sub1.csvh
           |--sub2.csvh
           |--.sub3.csvh
        */
        SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH`);
        => 2
        SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/*`);
        => 3
        /* As with Parquet, it is possible to query the hidden file explicitly */
        SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/.*`);
        => 1
        /* It is also possible to query only the visible files */
        SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/[^.]*`);
        => 2
        /* But I don't know how to query visible and hidden files at once (other than with a UNION) */
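
To make the discrepancy concrete, here is a small Python sketch that models the two behaviours reported above for a directory query and a `*` query. It is not Drill's implementation: `glob_match`, `drop_hidden` and `count_files` are hypothetical helpers, the "." and "_" prefixes are the ones mentioned in this report, and Python's `fnmatch` is only a stand-in for whatever glob matching Drill actually performs (it writes character-class negation as `[!.]` rather than `[^.]`).

```python
from fnmatch import fnmatchcase

# File names from the DirCSVH example in this report.
FILES = ["sub1.csvh", "sub2.csvh", ".sub3.csvh"]


def glob_match(names, pattern):
    """Return the names matching a shell-style pattern.

    With fnmatch, '*' also matches names that start with a dot.
    """
    return [n for n in names if fnmatchcase(n, pattern)]


def drop_hidden(names, prefixes=(".", "_")):
    """Drop names starting with a hidden-file prefix (assumed: '.' or '_')."""
    return [n for n in names if not n.startswith(prefixes)]


def count_files(names, pattern, filter_hidden):
    """Count matches, optionally applying the hidden-file filter afterwards."""
    matched = glob_match(names, pattern)
    return len(drop_hidden(matched) if filter_hidden else matched)


if __name__ == "__main__":
    # Hidden filter applied after '*': the count reported for Parquet.
    print(count_files(FILES, "*", filter_hidden=True))
    # No hidden filter after '*': the count reported for CSV/JSON.
    print(count_files(FILES, "*", filter_hidden=False))
    # Explicit patterns for hidden-only and visible-only files.
    print(count_files(FILES, ".*", filter_hidden=False))
    print(count_files(FILES, "[!.]*", filter_hidden=False))
```

In this model the only difference between the two file types is whether the hidden-file filter runs after the `*` expansion, which is the inconsistency described in this report.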
        

      Some existing issues deal with hidden files, for example DRILL-2424,
      but I did not find any description of this filtering in the documentation. I found that hidden file names start with "." or "_", but maybe there are other cases?

      It is a little strange that the filtering rules differ depending on the file type.
      It is also impractical that there is no simple way to say whether or not we want hidden files, for example with something like:

      SELECT * FROM ....`MyDir/[.]?*`;
      

          People

            Assignee: Unassigned
            Reporter: benj (benj641)