Apache Drill / DRILL-1712

Quoted CSV parsing

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: Future
    • Component/s: None
    • Labels: None
    • Environment: MapR 4.0.1 M5

    Description

      When querying CSV files, Drill does not strip the enclosing double quotes from quoted fields and instead returns them as part of the data. The directory /tmp/hari in MapR-FS contains two simple CSV files, one quoted and one unquoted, to show the difference.

      0: jdbc:drill:> select * from dfs.`/tmp/hari` limit 10;
      +------------+
      |  columns   |
      +------------+
      | ["1","2","3"] |
      | ["4","5","6"] |
      | ["7","8","9"] |
      | ["\"1\"","\"2\"","\"3\""] |
      | ["\"4\"","\"5\"","\"6\""] |
      | ["\"7\"","\"8\"","\"9\""] |
      +------------+
      6 rows selected (0.238 seconds)
      
      cat hari/hari.csv
      1,2,3
      4,5,6
      7,8,9
      cat hari/hari2.csv
      "1","2","3"
      "4","5","6"
      "7","8","9"
      

      The quotes should not be included in the data; they are only delimiters around the field values.
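
To make the expected behaviour concrete, here is a minimal sketch in Python rather than in Drill itself. It uses the standard `csv` module as a stand-in for a quote-aware reader and compares it with a naive comma split, which reproduces the output reported above. The names `QUOTED_CSV`, `naive_split` and `quote_aware_parse` are hypothetical and chosen for illustration only; nothing here reflects how Drill's text reader is implemented.

```python
import csv
import io

# Contents of hari2.csv as shown in this report.
QUOTED_CSV = '"1","2","3"\n"4","5","6"\n"7","8","9"\n'


def naive_split(text):
    """Split each line on commas only; the quotes stay in the values.

    This mirrors the behaviour reported in this issue.
    """
    return [line.split(",") for line in text.splitlines()]


def quote_aware_parse(text):
    """Parse with csv.reader, which treats double quotes as field enclosures."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print(naive_split(QUOTED_CSV)[0])        # quotes kept as data
    print(quote_aware_parse(QUOTED_CSV)[0])  # quotes stripped
```

With a quote-aware reader, both hari.csv and hari2.csv would yield the same rows, which is the result requested here.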

      This is related to DRILL-950 but is not the same issue.

      Regards,

      Hari Sekhon
      http://www.linkedin.com/in/harisekhon

          People

            Assignee: Paul Rogers (paul-rogers)
            Reporter: Hari Sekhon (harisekhon)