  1. Apache Drill
  2. DRILL-5747

Drill should place directory name fields in a consistent position relative to regular columns for SELECT * queries


Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      Today, the star column (*) in Drill expands into a list of regular columns plus the directory name fields, such as dir0 and dir1. However, Drill does not place the directory name fields in a consistent position relative to the regular columns.

      For instance, for Parquet files, dir0 is placed after the list of regular columns.

      select * from dfs.tmp.parquetTbl where dir0 = 1990;
      +--------------+--------------+--------------+--------------+-------+
      | N_NATIONKEY  |    N_NAME    | N_REGIONKEY  |  N_COMMENT   | dir0  |
      +--------------+--------------+--------------+--------------+-------+
      | 0            | [B@5527446   | 0            | [B@684fa264  | 1990  |
      | 1            | [B@442e88bc  | 1            | [B@4b13119c  | 1990  |
      | 2            | [B@50e93f45  | 1            | [B@138f483   | 1990  |
      | 3            | [B@423cc515  | 1            | [B@23af07ac  | 1990  |
      | 4            | [B@3820bf81  | 4            | [B@6dfccaf0  | 1990  |
      | 5            | [B@6f6f8af9  | 0            | [B@40d1a97   | 1990  |
      | 6            | [B@784cb194  | 3            | [B@731ea93f  | 1990  |
      | 7            | [B@61f9a224  | 3            | [B@4c041bbc  | 1990  |
      | 8            | [B@21b8faa1  | 2            | [B@774e7152  | 1990  |
      | 9            | [B@3ef1fbaf  | 2            | [B@c2be72    | 1990  |
      | 10           | [B@71652ec1  | 4            | [B@29e0bb10  | 1990  |
      | 11           | [B@61192cea  | 4            | [B@3bd3e873  | 1990  |
      | 12           | [B@5541f4b4  | 2            | [B@5d288126  | 1990  |
      | 13           | [B@e371592   | 4            | [B@42692b88  | 1990  |
      | 14           | [B@6a90fc8   | 0            | [B@454b16e2  | 1990  |
      | 15           | [B@44cb72f8  | 0            | [B@8e91b11   | 1990  |
      | 16           | [B@7feffda8  | 0            | [B@64f66236  | 1990  |
      | 17           | [B@6ba9fb02  | 1            | [B@649e7786  | 1990  |
      | 18           | [B@5fb93205  | 2            | [B@7783175b  | 1990  |
      | 19           | [B@3f7294a9  | 3            | [B@7b7e03c9  | 1990  |
      | 20           | [B@e2ac076   | 4            | [B@18c18a3e  | 1990  |
      | 21           | [B@4a5af924  | 2            | [B@1a9ad09f  | 1990  |
      | 22           | [B@29f6845e  | 3            | [B@776c4cd7  | 1990  |
      | 23           | [B@6728f481  | 3            | [B@31cc7610  | 1990  |
      | 24           | [B@665b2dfa  | 1            | [B@6c27ac95  | 1990  |
      +--------------+--------------+--------------+--------------+-------+
      

      Notice that in the above output, dir0 (with value 1990) is the last column.

      However, for JSON files, dir0 is placed before the list of regular columns.

      select * from dfs.tmp.jsonTbl where dir0 = 1990;
      +-------+------+
      | dir0  |  a   |
      +-------+------+
      | 1990  | 100  |
      | 1990  | 200  |
      +-------+------+
      

      It would be good to present the directory name fields in the same position regardless of file format or storage plugin. IMHO, it makes sense to put the directory name fields in front of the list of regular columns (the behavior that the JSON format exhibits today).

      This ticket is opened to modify Drill's ScanBatch code so that directory name fields are ordered consistently, as described above.
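
      To make the proposed ordering concrete, here is a minimal sketch of the reordering rule only. It is not Drill's ScanBatch code: the class DirColumnOrder and the method dirColumnsFirst are hypothetical names, and the sketch assumes directory name fields follow the dir0, dir1, ... naming shown above, so a regular column that happens to match that pattern would also be moved to the front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of Drill's code base.
public class DirColumnOrder {
    // Assumption: directory name fields are named dir0, dir1, ...
    private static final Pattern DIR_COLUMN = Pattern.compile("dir\\d+");

    // Returns the columns with directory name fields first.
    // Within each group, the original relative order is preserved.
    public static List<String> dirColumnsFirst(List<String> columns) {
        List<String> dirs = new ArrayList<>();
        List<String> regular = new ArrayList<>();
        for (String column : columns) {
            if (DIR_COLUMN.matcher(column).matches()) {
                dirs.add(column);
            } else {
                regular.add(column);
            }
        }
        List<String> ordered = new ArrayList<>(dirs);
        ordered.addAll(regular);
        return ordered;
    }

    public static void main(String[] args) {
        // Column order produced today for the Parquet example above.
        List<String> parquetOrder =
            List.of("N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT", "dir0");
        // prints [dir0, N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT]
        System.out.println(dirColumnsFirst(parquetOrder));
    }
}
```

      Applied to the JSON example, the order (dir0, a) is left unchanged, so both formats would end up with the directory name fields in front.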

          People

            jni Jinfeng Ni