Apache Drill / DRILL-8457

Allow configuring csv parser in http storage plugin configuration


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: Future
    • Fix Version/s: 1.22.0
    • Component/s: Storage - HTTP
    • Labels: None

    Description

      Currently there is no way to configure the CSV parser when the HTTP plugin is used. As a result, some files cannot be parsed (e.g. when any column has more than 4096 characters, or when the file uses a delimiter other than `,`).

      Since we use the HTTP plugin quite often at DataWalk, we've changed our internal fork of Drill so that the following parser/format properties can be configured using an additional `csvOptions` field:

       

      {
        "csvOptions": {
          "delimiter": "\t",
          "quote": "\"",
          "quote_escape": "\"",
          "line_separator": "\n",
          "header_extraction_enabled": null,
          "number_of_rows_to_skip": 0,
          "number_of_records_to_read": -1,
          "line_separator_detection_enabled": true,
          "max_columns": 512,
          "max_chars_per_column": 4096,
          "skip_empty_lines": true,
          "ignore_leading_whitespaces": true,
          "ignore_trailing_whitespaces": true,
          "null_value": null
        }
      }

      I'd be glad to get feedback on whether creating a PR with these changes would bring any value to Drill.
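
      To illustrate why these options matter, here is a minimal sketch that emulates a few of the `csvOptions` keys above (`delimiter`, `quote`, `number_of_rows_to_skip`, `max_chars_per_column`, `skip_empty_lines`) using Python's standard `csv` module. It is not Drill's parser or the code from our fork. The helper name `parse_with_options` is hypothetical, and the fallback values (`,` delimiter, 4096 characters per column) are taken from the limits mentioned in this description.

```python
import csv
import io


def parse_with_options(text, options):
    """Parse delimited text using a subset of the proposed csvOptions keys.

    Hypothetical helper for illustration only; it mimics the option names
    from this issue with Python's stdlib csv module, not Drill's parser.
    """
    delimiter = options.get("delimiter", ",")
    quote = options.get("quote", '"')
    rows_to_skip = options.get("number_of_rows_to_skip", 0)
    max_chars = options.get("max_chars_per_column", 4096)
    skip_empty = options.get("skip_empty_lines", True)

    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    rows = []
    for index, row in enumerate(reader):
        if index < rows_to_skip:
            continue  # skip leading rows, counted from the start of the input
        if skip_empty and not row:
            continue  # csv.reader yields [] for a blank line
        for value in row:
            if len(value) > max_chars:
                raise ValueError(f"column exceeds {max_chars} characters")
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Tab-separated input with one column longer than 4096 characters.
    data = "id\tnote\n1\t" + "x" * 5000 + "\n"

    # With the fixed defaults, the long line is read as one oversized column.
    try:
        parse_with_options(data, {})
    except ValueError as error:
        print("defaults failed:", error)

    # With the options overridden, the same input parses into two columns.
    rows = parse_with_options(
        data, {"delimiter": "\t", "max_chars_per_column": 8192}
    )
    print(len(rows), "rows,", len(rows[0]), "columns")
```

      The same input fails with the fixed defaults and parses once the delimiter and column limit are overridden, which is the situation the `csvOptions` field is meant to address.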

          People

            Assignee: Unassigned
            Reporter: Zbigniew Tomanek (ztomanek-dw)
            Charles Givre