Apache Drill / DRILL-5553

SELECT *, columns produces nonsense results


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Storage - Text & CSV
    • Labels: None

    Description

      Consider the case discussed in DRILL-5551, with a slight variation.

      Input file: CSV with headers:

      a,b,c
      10,foo,bar
      

      As in DRILL-5550, the CSV format plugin is configured to read column names from the header line.
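      For context, the text reader can expose a row in two ways. With headers enabled, the first line names the columns and each field becomes a named column; without headers, the whole row is exposed as a single "columns" array. The following is a simplified Java sketch of those two shapes only. The class and method names are hypothetical, the parsing ignores quoting and escaping, and this is not Drill's reader code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two row shapes a text reader can produce.
// Not Drill's reader: no quoting, escaping, or type handling.
public class CsvHeaderModes {

    // Header mode: the first line names the columns; each data field
    // becomes a separate named column.
    static Map<String, String> readWithHeader(String headerLine, String dataLine) {
        String[] names = headerLine.split(",", -1);
        String[] values = dataLine.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            // Assumption of this sketch: a missing trailing field becomes "".
            row.put(names[i], i < values.length ? values[i] : "");
        }
        return row;
    }

    // No-header mode: every field goes into one array named "columns".
    static Map<String, List<String>> readWithoutHeader(String dataLine) {
        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("columns", new ArrayList<>(Arrays.asList(dataLine.split(",", -1))));
        return row;
    }

    public static void main(String[] args) {
        System.out.println(readWithHeader("a,b,c", "10,foo,bar"));
        // {a=10, b=foo, c=bar}
        System.out.println(readWithoutHeader("10,foo,bar"));
        // {columns=[10, foo, bar]}
    }
}
```

      In header mode there is no "columns" array at all, which is why projecting "columns" in this configuration is already questionable.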

      Run this (admittedly strange) query:

      SELECT *, columns FROM `dfs.data.example.csv`
      

      The resulting schema is:

      BatchSchema [fields=[
      a(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
      b(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
      c(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
      columns(INT:OPTIONAL) [$bits$(UINT1:REQUIRED), columns(INT:OPTIONAL)]], 
      selectionVector=NONE]
      

      To make it easier to read:

      a(VARCHAR:REQUIRED), 
      b(VARCHAR:REQUIRED),
      c(VARCHAR:REQUIRED),
      columns(INT:OPTIONAL)
      

      In DRILL-5551, "columns" changes meaning from an array of columns to an ordinary, blank column. Here, it changes meaning again, this time to a nullable INT (our normal "placeholder" type for missing columns).
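      One way to see how the reported schema could arise is a simplified Java model. It assumes, based only on the output above and not on Drill's source, that "*" expands to the header columns and that any projected name absent from the file receives the nullable INT placeholder. The class, method, and constant names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of projection resolution for a CSV read in header mode,
// inferred from the reported schema; not Drill's actual scanner code.
public class ProjectionModel {

    static final String HEADER_TYPE = "VARCHAR:REQUIRED";
    // Placeholder type given to projected columns the file does not contain.
    static final String MISSING_TYPE = "INT:OPTIONAL";

    static Map<String, String> resolve(List<String> projection, List<String> header) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (String col : projection) {
            if (col.equals("*")) {
                // The wildcard expands to every header column.
                for (String h : header) {
                    schema.put(h, HEADER_TYPE);
                }
            } else if (header.contains(col)) {
                // Exact-match lookup is a simplification of this sketch.
                schema.put(col, HEADER_TYPE);
            } else {
                // In header mode "columns" is just another name; the file has
                // no such column, so it gets the missing-column placeholder.
                schema.put(col, MISSING_TYPE);
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of("*", "columns"), List.of("a", "b", "c")));
        // {a=VARCHAR:REQUIRED, b=VARCHAR:REQUIRED, c=VARCHAR:REQUIRED, columns=INT:OPTIONAL}
    }
}
```

      Under these assumptions "columns" gets no special treatment once headers are in use, which matches the schema reported above.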

      Expected:

      1. Per DRILL-5552, no other column reference should be allowed alongside "*".
      2. If item 1 is not fixed, the scanner (or text reader) should forbid combining either "*" or "columns" with other column references.
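      The rule in item 2 could be enforced by a check along the following lines. This is a hypothetical Java sketch, not existing Drill code; the class and method names are illustrative. It assumes that an indexed reference such as "columns[0]" counts as a reference to "columns" itself (so SELECT columns[0], columns[1] stays legal), and it compares names case-insensitively.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: "*" or "columns" may not be combined with references
// to any other column.
public class ProjectionCheck {

    // Root name of a reference: "columns[0]" -> "columns", lower-cased
    // because this sketch treats identifiers as case-insensitive.
    static String root(String ref) {
        int bracket = ref.indexOf('[');
        String name = bracket < 0 ? ref : ref.substring(0, bracket);
        return name.trim().toLowerCase(Locale.ROOT);
    }

    static void validate(List<String> projection) {
        Set<String> roots = new TreeSet<>();
        for (String ref : projection) {
            roots.add(root(ref));
        }
        boolean special = roots.contains("*") || roots.contains("columns");
        if (special && roots.size() > 1) {
            throw new IllegalArgumentException(
                "\"*\" and \"columns\" cannot be combined with other column references: " + roots);
        }
    }

    public static void main(String[] args) {
        validate(List.of("columns[0]", "columns[1]")); // allowed: only "columns"
        validate(List.of("a", "b"));                   // allowed: no special name
        try {
            validate(List.of("*", "columns"));         // the query in this issue
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      Failing fast like this would turn the query in this issue into a clear error instead of a schema in which "columns" silently changes type.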



          People

            Assignee: Unassigned
            Reporter: Paul Rogers
            Votes: 0
            Watchers: 1
