Apache Drill / DRILL-5553

SELECT *, columns produces nonsense results

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Storage - Text & CSV
    • Labels:
      None

      Description

      This is a slight variation on the case discussed in DRILL-5551.

      Input file: CSV with headers:

      a,b,c
      10,foo,bar
      

      As in DRILL-5550, the CSV format plugin is configured to read the header row.

      Run this (admittedly strange) query:

      SELECT *, columns FROM `dfs.data.example.csv`
      

      The resulting schema is:

      BatchSchema [fields=[
      a(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
      b(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
      c(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
      columns(INT:OPTIONAL) [$bits$(UINT1:REQUIRED), columns(INT:OPTIONAL)]], 
      selectionVector=NONE]
      

      To make it easier to read:

      a(VARCHAR:REQUIRED), 
      b(VARCHAR:REQUIRED),
      c(VARCHAR:REQUIRED),
      columns(INT:OPTIONAL)
      

      In DRILL-5551, columns changes meaning from an array of columns to a normal, blank column. Here it changes meaning again, becoming a nullable INT (Drill's usual placeholder type for missing columns).
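To make the shifting meaning concrete, here is a toy model in Python of how a header-aware text reader could arrive at the schema shown above. It is a sketch under stated assumptions, not Drill's implementation: the names resolve_projection, Column and MISSING_COLUMN_TYPE are hypothetical, and the model assumes that "*" expands to one VARCHAR:REQUIRED column per header while any name that matches no header falls back to the nullable INT placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str  # e.g. "VARCHAR:REQUIRED"


# Hypothetical constant standing in for Drill's placeholder type
# for columns that the reader cannot find.
MISSING_COLUMN_TYPE = "INT:OPTIONAL"


def resolve_projection(select_list, headers):
    """Toy model (not Drill code) of header-mode projection resolution.

    "*" expands to one VARCHAR:REQUIRED column per header. Any other
    item that matches a header becomes VARCHAR:REQUIRED; anything else,
    including "columns", is treated as a missing column.
    """
    schema = []
    for item in select_list:
        if item == "*":
            schema.extend(Column(h, "VARCHAR:REQUIRED") for h in headers)
        elif item in headers:
            schema.append(Column(item, "VARCHAR:REQUIRED"))
        else:
            # With headers enabled, "columns" is just an unknown name,
            # so it gets the nullable INT placeholder.
            schema.append(Column(item, MISSING_COLUMN_TYPE))
    return schema


for col in resolve_projection(["*", "columns"], ["a", "b", "c"]):
    print(f"{col.name}({col.type})")
# prints:
# a(VARCHAR:REQUIRED)
# b(VARCHAR:REQUIRED)
# c(VARCHAR:REQUIRED)
# columns(INT:OPTIONAL)
```

In this model nothing distinguishes columns from any other unknown name once headers are enabled, which is why it silently degrades to the placeholder type instead of being rejected.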

      Expected:

      1. Per DRILL-5552, no other column reference may appear alongside "*".
      2. If item 1 is not fixed, the scanner (or text reader) should reject any query that combines either "*" or "columns" with other column references.
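A minimal sketch of the check described in item 2, again in Python and purely illustrative: check_projection is a hypothetical name, identifiers are compared case-insensitively on the assumption that Drill's identifier matching is case-insensitive, and a real fix would live in the scanner or text reader rather than in a standalone function.

```python
def check_projection(select_list):
    """Raise ValueError if "*" or "columns" appears alongside any other item.

    Hypothetical validation rule; not part of Drill's API.
    """
    # Assumption: identifiers are matched case-insensitively.
    special = {"*", "columns"}
    has_special = any(item.lower() in special for item in select_list)
    if has_special and len(select_list) > 1:
        raise ValueError(
            "'*' or 'columns' cannot be combined with other column references: "
            + ", ".join(select_list)
        )


check_projection(["*"])        # accepted
check_projection(["a", "b"])   # accepted
try:
    check_projection(["*", "columns"])  # the query from this issue
except ValueError as err:
    print(err)
# prints: '*' or 'columns' cannot be combined with other column references: *, columns
```

Rejecting the select list up front trades flexibility for predictability: the user gets an explicit error instead of a schema in which columns quietly turns into a nullable INT.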

            People

            • Assignee: Unassigned
            • Reporter: Paul Rogers (paul-rogers)
            • Votes: 0
            • Watchers: 1
