Apache Drill / DRILL-6096

Provide mechanisms to specify field delimiters and quoted text for TextRecordWriter


    Details

      Description

      Currently, there is no way for a user to specify the field delimiter when writing records as text output. Furthermore, if a field contains the delimiter, there is no mechanism for quoting it.

      By default, quotes should be used to enclose non-numeric fields being written.

      Description of the implemented changes:

      Two options were added to control the text writer output:
      store.text.writer.add_header - indicates whether a header line should be written to the created text file. Default is true.
      store.text.writer.force_quotes - indicates whether all values should be quoted. Default is false, which means only values that contain special characters (line or field separators) are quoted.
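
      As a minimal sketch of how these options might be combined in one session: the statements below assume the built-in csv text format is available as a store.format value, and the table name t_quoted is hypothetical. With force_quotes enabled, every written value should be enclosed in quotes; with add_header disabled, no header line should be written.

      ```sql
      -- Write using a text-based format (assumes csv is configured in the dfs plugin)
      alter session set `store.format` = 'csv';
      -- Do not write a header line with column names
      alter session set `store.text.writer.add_header` = false;
      -- Quote every value, not only values that contain separators
      alter session set `store.text.writer.force_quotes` = true;
      -- t_quoted is a hypothetical table name
      create table dfs.tmp.t_quoted as select 1 as id, 'a,b' as name from (values(1));
      -- Restore the defaults (add_header = true, force_quotes = false)
      alter session reset `store.text.writer.add_header`;
      alter session reset `store.text.writer.force_quotes`;
      ```

      With force_quotes left at its default of false, only the value 'a,b' would be expected to be quoted, since it contains the comma field delimiter.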

      Line and field separators, as well as quote and escape characters, can be configured in the text format configuration (for example, through the Web UI). A user can create a dedicated format for writing data and then use it when creating files. Such a format can also be used to read the written data back.

        "formats": {
          "write_text": {
            "type": "text",
            "extensions": [
              "txt"
            ],
            "lineDelimiter": "\n",
            "fieldDelimiter": "!",
            "quote": "^",
            "escape": "^"
          }
        },
      ...
      

      Next, set the new format as the session's output format and create a text file:

      alter session set `store.format` = 'write_text';
      create table dfs.tmp.t as select 1 as id from (values(1));
      
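
      Because the files are written with the write_text settings, one way to read them back is a table function that supplies the same delimiter, quote and escape characters. This is a sketch that assumes table-function syntax for text format options is available in the Drill version in use. Since store.text.writer.add_header defaults to true, extractHeader is enabled here so that the first line is treated as column names rather than as data.

      ```sql
      -- Read dfs.tmp.t back with the same delimiter, quote and escape characters
      -- that the write_text format used when writing
      select *
      from table(dfs.tmp.`t`(
        type => 'text',
        fieldDelimiter => '!',
        quote => '^',
        escape => '^',
        extractHeader => true));
      ```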

      Notes:
      1. Data is written using univocity-parsers, which limits the line separator to at most 2 characters. Drill allows a line separator longer than 2 characters, because it can read data split by a line separator of any length, but writing data with such a separator will throw an exception.
      2. extractHeader in the text format configuration does not affect whether a header is written to the text file; only store.text.writer.add_header controls that. extractHeader is used only when reading data.


              People

              • Assignee: Arina Ielchiieva
                Reporter: Kunal Khatua
                Reviewer: Vova Vysotskyi
              • Votes: 0
                Watchers: 3
