Apache Drill / DRILL-3178

csv reader should allow newlines inside quotes

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.9.0
    • Component/s: Storage - Text & CSV
    • Labels:
      None
    • Environment:

      Ubuntu Trusty 14.04.2 LTS

      Description

      When reading a CSV file that contains newlines within quoted strings, e.g. via

      select * from dfs.`/tmp/q.csv`;

      Drill 1.0 says:

      Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException: Error processing input: Cannot use newline character within quoted string

      But many tools produce CSV files with newlines in quoted strings, so Drill should be able to handle them.
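
      For anyone trying to reproduce this, the following is a minimal sketch that builds a small CSV with a newline inside a quoted field. The column names and values are hypothetical (the contents of the original /tmp/q.csv are not given in this report). It uses Python's standard csv module to show that such a file has more physical lines than logical records.

```python
import csv
import io


def make_sample_csv() -> str:
    """Build a small CSV whose second data field contains an embedded newline."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["id", "note"])                    # hypothetical header
    writer.writerow(["1", "first line\nsecond line"])  # newline inside the field
    writer.writerow(["2", "plain"])
    return buf.getvalue()


sample = make_sample_csv()

# Count physical lines (newline characters) versus logical CSV records.
physical_lines = sample.count("\n")
records = list(csv.reader(io.StringIO(sample, newline="")))

print(physical_lines, len(records))
```

      Writing `sample` to a file and running the query above against it should exercise the same code path, assuming the file is read with Drill's CSV format plugin.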

      Workaround: the csvquote program (https://github.com/dbro/csvquote) can encode embedded commas and newlines, and even decode them later if desired.
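
      The idea behind this workaround can be sketched as follows. This is not csvquote's actual implementation, only an illustration of the technique: the placeholder characters (0x1E and 0x1F) and the function names are assumptions, and the sketch handles only "\n" and "," inside double-quoted fields.

```python
RECORD_SEP = "\x1e"  # assumed stand-in for an embedded newline
UNIT_SEP = "\x1f"    # assumed stand-in for an embedded comma


def encode_embedded(text: str) -> str:
    """Replace newlines and commas that occur inside double quotes."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state is preserved.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == "\n":
            out.append(RECORD_SEP)
        elif in_quotes and ch == ",":
            out.append(UNIT_SEP)
        else:
            out.append(ch)
    return "".join(out)


def decode_embedded(text: str) -> str:
    """Restore the original newlines and commas after processing."""
    return text.replace(RECORD_SEP, "\n").replace(UNIT_SEP, ",")


raw = 'id,note\n1,"first line\nsecond, line"\n2,plain\n'
encoded = encode_embedded(raw)
restored = decode_embedded(encoded)
```

      After encoding, every record occupies exactly one physical line, so a reader that rejects newlines in quoted strings can process the file; decoding the query output restores the original values.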

        Attachments

        1. drill-3178.patch
          8 kB
          F Méthot

              People

              • Assignee:
                fmethot F Méthot
                Reporter:
                nealmcb Neal McBurnett
                Reviewer:
                Krystal
              • Votes:
                5
                Watchers:
                10