DRILL-5492 (sub-task of DRILL-5498: CSV text reader does not handle duplicate header names)

CSV reader does not validate header names, causes nonsense output

Details

    • Sub-task
    • Status: Resolved
    • Minor
    • Resolution: Fixed

    Description

      Consider the same test case as in DRILL-5491, but with a slightly different input file:

      ___
      a,b,c
      d,e,f
      

      The underscores represent three spaces: use spaces in the real test.

      In this case, the code discussed in DRILL-5491 finds some characters and happily returns the following array:

      ["   "]
      

      The field name of three blanks is returned to the client to produce the following bizarre output:

      2 row(s):
          
      a
      d
      

      The blank line is normally the header, but the header here was considered to be three blanks. (In fact, the blanks are actually printed.)

      Since the blanks were considered to be a field, the file is assumed to have only one field, so only the first column was returned.
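
The failure mode described above can be reproduced with a small stand-alone sketch. This is not Drill's reader code: the class `NaiveHeaderDemo` and its methods `naiveHeader` and `project` are hypothetical names, and the sketch only assumes that the header line is split on commas without trimming and that each row is cut down to the number of header fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT the actual Drill reader.
final class NaiveHeaderDemo {

    // Splits the header line on commas with no trimming and no validation.
    static String[] naiveHeader(String headerLine) {
        return headerLine.split(",", -1);
    }

    // Keeps only as many columns per row as there are header names.
    static List<String[]> project(String[] header, List<String> dataLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : dataLines) {
            String[] cols = line.split(",", -1);
            String[] kept = new String[header.length];
            for (int i = 0; i < header.length; i++) {
                kept[i] = i < cols.length ? cols[i] : "";
            }
            rows.add(kept);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Three spaces on the header line, as in the test file above.
        String[] header = naiveHeader("   ");
        System.out.println("field count: " + header.length);
        for (String[] row : project(header, Arrays.asList("a,b,c", "d,e,f"))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because the three blanks count as one field name, every row is reduced to its first column, which matches the output shown in this report.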

      The expected behavior is that spaces are trimmed from field names, so the field name list would be empty and a User Error would be thrown. As it stands, it is confusing to the user why a blank line produces an NPE, other inputs produce the ExecutionSetupException shown in DRILL-5491, and still others produce blank headings. Behavior should be consistent.
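
A minimal sketch of the expected behavior, under stated assumptions: the class `HeaderValidator`, the method `parseHeader`, and the exception `InvalidHeaderException` are hypothetical names standing in for whatever the reader and Drill's user-facing error mechanism actually use. The sketch trims each name and rejects a header in which every name is blank, so an empty line and a line of spaces fail the same way. How to treat a single blank name among valid ones is left open here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and error type are assumptions, not Drill APIs.
final class HeaderValidator {

    // Stand-in for a user-facing error raised on an unusable header.
    static final class InvalidHeaderException extends RuntimeException {
        InvalidHeaderException(String message) {
            super(message);
        }
    }

    // Trims every field name and fails if no non-blank name remains.
    static List<String> parseHeader(String headerLine) {
        if (headerLine == null) {
            throw new InvalidHeaderException("CSV file has no header line");
        }
        List<String> names = new ArrayList<>();
        boolean anyNonBlank = false;
        for (String raw : headerLine.split(",", -1)) {
            String name = raw.trim();
            anyNonBlank |= !name.isEmpty();
            names.add(name);
        }
        if (!anyNonBlank) {
            throw new InvalidHeaderException("CSV header line contains no field names");
        }
        return names;
    }
}
```

With this check, `parseHeader("   ")` and `parseHeader("")` both raise the same error, while `parseHeader(" a , b ,c")` yields the names `a`, `b`, and `c`.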

          People

            paul-rogers Paul Rogers