Apache Drill / DRILL-8496

Drill query fails when the escape character (which is part of the data) is just before the quote


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.21.1
    • None
    • None

    Description

      I have the following CSV:

      "id"^"first_name"^"last_name"^"email"^"gender"
      "1"^"John"^"143 \\"^"ewilkes0@buzzfeed.com"^"Male"
      "2"^"Willaim"^"Khan"^"bmacdonald1@microsoft.com"^"Male"

      When I run the Drill query

        SELECT * FROM dfs.`C:\Users\achyu\Documents\dir2`

      I get the following error:

      UserRemoteException: DATA_READ ERROR: Unexpected character '101' following quoted value of CSV field. Expecting '94'. Cannot parse CSV input.
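The error reports characters by their decimal codes. A quick Python check (assuming the numbers are plain character code points, which is how they appear to be printed) decodes them: 101 is the letter e and 94 is the configured field delimiter ^. In other words, the reader closed a quoted value and then presumably found the e of ewilkes0 where it expected a delimiter.

```python
# Decode the character codes quoted in the Drill error message.
unexpected = chr(101)  # character the reader actually found
expected = chr(94)     # character it expected (the field delimiter)
print(f"found {unexpected!r}, expected {expected!r}")  # found 'e', expected '^'
```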

      This is my dfs storage plugin configuration for the csv format in Apache Drill (I am using version 1.21.1):

      "csv": {
        "type": "text",
        "extensions": [ "csv" ],
        "lineDelimiter": "\n",
        "fieldDelimiter": "^",
        "quote": "\"",
        "escape": "\\",
        "comment": "#",
        "extractHeader": true
      }
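Because the configuration is JSON, "escape": "\\" decodes to a single backslash, which is the same character that appears in the data right before the closing quote. A small Python check of the config as posted (wrapped in braces so it parses as a standalone JSON object) confirms this:

```python
import json

# The "csv" format block from the storage plugin, wrapped in braces to form a valid JSON object.
config_text = r'''{
  "csv": { "type": "text", "extensions": [ "csv" ], "lineDelimiter": "\n",
           "fieldDelimiter": "^", "quote": "\"", "escape": "\\", "comment": "#",
           "extractHeader": true }
}'''

csv_format = json.loads(config_text)["csv"]
escape = csv_format["escape"]
quote = csv_format["quote"]

# JSON "\\" is one backslash character; JSON "\"" is one double-quote character.
print(len(escape), repr(escape))  # 1 '\\'
print(repr(quote))
```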

      It turns out the failure is caused by this particular portion:

      "143 \\"

      In this CSV,

      143 \\

      is part of the data; the backslash is not meant as an escape character. But because it sits immediately before the closing quote, the reader apparently treats it as escaping that quote, and the query fails. If I add a space between the backslash and the closing quote, the query works fine.
      I believe this is a bug.
      Are there any insights (for example, how to escape the escape character when it comes right before a quote) or a workaround?
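The failure can be reproduced with a simplified model of quoted-field parsing. This is a hypothetical Python sketch, not Drill's actual text reader: it assumes that inside a quoted value the escape character is special only when immediately followed by the quote character, and that any other quote closes the value. Under that assumption, the last backslash in 143 \\ plus the following quote is read as an escaped literal quote, the ^ after it becomes part of the value, the next quote closes the value, and the e that follows produces the same message as the reported error. The names parse_record and CsvParseError are illustrative only.

```python
class CsvParseError(ValueError):
    """Raised when a record cannot be split into fields (illustrative only)."""


def parse_record(line, delim="^", quote='"', escape="\\"):
    """Split one record into fields using a simplified quoted-CSV model.

    Assumption (not Drill's code): inside a quoted value, `escape` immediately
    followed by `quote` yields a literal quote; every other character, including
    a lone escape character, is taken literally; any other quote closes the value.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted value: collect characters until the closing quote.
            i += 1
            value = []
            while True:
                if i >= n:
                    raise CsvParseError("Unterminated quoted value.")
                ch = line[i]
                if ch == escape and i + 1 < n and line[i + 1] == quote:
                    value.append(quote)  # escape + quote -> literal quote
                    i += 2
                elif ch == quote:
                    i += 1  # closing quote
                    break
                else:
                    value.append(ch)
                    i += 1
            # After a closing quote, only the delimiter or end of record is legal.
            if i < n and line[i] != delim:
                raise CsvParseError(
                    f"Unexpected character '{ord(line[i])}' following quoted value "
                    f"of CSV field. Expecting '{ord(delim)}'. Cannot parse CSV input."
                )
            fields.append("".join(value))
        else:
            # Unquoted value: read up to the next delimiter or end of record.
            end = line.find(delim, i)
            end = n if end == -1 else end
            fields.append(line[i:end])
            i = end
        if i >= n:
            return fields
        i += 1  # skip the delimiter


if __name__ == "__main__":
    row = r'"1"^"John"^"143 \\"^"ewilkes0@buzzfeed.com"^"Male"'
    try:
        parse_record(row)
    except CsvParseError as err:
        print(err)  # same wording as the DATA_READ ERROR above
    # Using the quote character itself as the escape (doubled quotes) in this model:
    print(parse_record(row, escape='"'))
```

In this model, putting a space before the closing quote (the workaround mentioned above) avoids the problem, and so does setting the escape character to the quote character itself (RFC 4180-style doubled quotes), which turns the backslash into an ordinary character. Whether Drill's reader behaves the same way with "escape": "\"" is an assumption worth verifying against the real data.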

       


          People

            Assignee: Unassigned
            Reporter: achyut09