  1. Apache Drill
  2. DRILL-2760

Quoted strings from CSV file appear in query output in different forms


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 1.2.0
    • Component/s: Storage - Text & CSV
    • Labels:
      None

      Description

      Quoted strings from a CSV file appear in query output in different forms, as shown in the section below.
      Quotes should NOT appear in query output. Strings must be stripped of their leading and trailing quotes. (I am referring to this character: " )

      Snippet of data from the airports.csv file: the first three lines, of which the first contains header information.

      [root@centos-01 airport_CSV_data]# head -3 airports.csv
      "id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
      6523,"00A","heliport","Total Rf Heliport",40.07080078125,-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,
      6524,"00AK","small_airport","Lowell Field",59.94919968,-151.695999146,450,"NA","US","US-AK","Anchor Point","no","00AK",,"00AK",,,
      

      case 1) Quotes are not escaped; they appear in the output as is.

      0: jdbc:drill:> select columns[0] id,columns[1] ident,columns[2] type,columns[3] name,columns[4] latitude_deg,columns[5] longitude_deg,columns[6] elevation_ft,columns[7] continent,columns[8] iso_country,columns[9] iso_region,columns[10] municipality,columns[11] scheduled_service,columns[12] gps_code,columns[13] iata_code, columns[14] local_code,columns[15] home_link,columns[16] wikipedia_link,columns[17] keywords from `airports.csv` limit 3;
      +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
      |     id     |   ident    |    type    |    name    | latitude_deg | longitude_deg | elevation_ft | continent  | iso_country | iso_region | municipality | scheduled_service |  gps_code  | iata_code  | local_code | home_link  | wikipedia_link |  keywords  |
      +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
      | "id"       | "ident"    | "type"     | "name"     | "latitude_deg" | "longitude_deg" | "elevation_ft" | "continent" | "iso_country" | "iso_region" | "municipality" | "scheduled_service" | "gps_code" | "iata_code" | "local_code" | "home_link" | "wikipedia_link" | "keywords" |
      | 6523       | "00A"      | "heliport" | "Total Rf Heliport" | 40.07080078125 | -74.9336013793945 | 11           | "NA"       | "US"        | "US-PA"    | "Bensalem"   | "no"              | "00A"      |            | "00A"      |            |                | null       |
      | 6524       | "00AK"     | "small_airport" | "Lowell Field" | 59.94919968  | -151.695999146 | 450          | "NA"       | "US"        | "US-AK"    | "Anchor Point" | "no"              | "00AK"     |            | "00AK"     |            |                | null       |
      +------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
      3 rows selected (0.155 seconds)
      

      case 2) Quotes appear in the query output, but they are escaped with a backslash character.

      0: jdbc:drill:> select * from `airports.csv` limit 3;
      +------------+
      |  columns   |
      +------------+
      | ["\"id\"","\"ident\"","\"type\"","\"name\"","\"latitude_deg\"","\"longitude_deg\"","\"elevation_ft\"","\"continent\"","\"iso_country\"","\"iso_region\"","\"municipality\"","\"scheduled_service\"","\"gps_code\"","\"iata_code\"","\"local_code\"","\"home_link\"","\"wikipedia_link\"","\"keywords\""] |
      | ["6523","\"00A\"","\"heliport\"","\"Total Rf Heliport\"","40.07080078125","-74.9336013793945","11","\"NA\"","\"US\"","\"US-PA\"","\"Bensalem\"","\"no\"","\"00A\"","","\"00A\"","",""] |
      | ["6524","\"00AK\"","\"small_airport\"","\"Lowell Field\"","59.94919968","-151.695999146","450","\"NA\"","\"US\"","\"US-AK\"","\"Anchor Point\"","\"no\"","\"00AK\"","","\"00AK\"","",""] |
      +------------+
      3 rows selected (0.097 seconds)
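One plausible reading of the two forms is that the stored value is the same in both cases (the field with its quotes still attached) and only the rendering differs: a single column is printed as plain text, while the whole `columns` array is printed as JSON-style text, where an embedded `"` has to be escaped. The report does not state this, so treat it as an assumption. The sketch below uses Python's `json` module only to illustrate the effect; it does not call Drill.

```python
import json

# A field value with the quotes left in, as seen in case 1.
raw_field = '"00A"'

# Printed on its own, the value shows bare quotes.
print(raw_field)                      # "00A"

# Serialized inside a JSON array, the same value gains backslashes,
# because '"' must be escaped within a JSON string.
rendered = json.dumps(["6523", raw_field])
print(rendered)                       # ["6523", "\"00A\""]

# Decoding the JSON text gives back the identical value.
assert json.loads(rendered)[1] == raw_field
```

Under that assumption, stripping the quotes when the file is read would fix both forms at once, since there would be no quote character left to print or to escape.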
      

            People

            • Assignee: Steven Phillips (sphillips)
            • Reporter: Khurram Faraaz (khfaraaz)