Apache AsterixDB / ASTERIXDB-3429

Allow configurable escape character for delimited text parser


Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.9.10
    • EXT - External data

    Description

      The current CSV/delimited text parser supports RFC 4180-style escaping rules. In brief, a string field can be quoted by surrounding it with ", and a " inside a quoted field is escaped by doubling it as "".

      However, many CSV files do not follow this convention. For example, the following row, which uses C-style backslash escaping of quotes, is invalid under RFC 4180:

      1,"The \"quick\" fox, jumped over the lazy dog "

      The CSV parser should make the escape character configurable, so that these nonstandard but common files can be parsed.
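
      The difference between the two conventions can be sketched with Python's standard csv module, which already exposes a configurable escape character. This only illustrates the parsing behavior being requested; it is not AsterixDB's parser or its configuration syntax, and the helper name parse_row is hypothetical.

```python
import csv

# The same logical record written in the two conventions discussed above.
RFC4180_ROW = '1,"The ""quick"" fox, jumped over the lazy dog "'
C_STYLE_ROW = r'1,"The \"quick\" fox, jumped over the lazy dog "'


def parse_row(line, escapechar=None):
    """Parse one delimited line (hypothetical helper for illustration).

    With escapechar=None, quotes inside a quoted field must be doubled
    (RFC 4180 style). With an escape character set, quote doubling is
    turned off and the escape character protects the quote instead.
    """
    reader = csv.reader(
        [line],
        delimiter=",",
        quotechar='"',
        escapechar=escapechar,
        doublequote=escapechar is None,
    )
    return next(reader)


if __name__ == "__main__":
    # RFC 4180 input with the default rules: two fields.
    print(parse_row(RFC4180_ROW))
    # C-style input with a backslash escape character: the same two fields.
    print(parse_row(C_STYLE_ROW, escapechar="\\"))
    # C-style input with the default rules: the row is not split as intended.
    print(parse_row(C_STYLE_ROW))
```

      With the escape character set to a backslash, the C-style row yields the same two fields as its RFC 4180 equivalent; under the default rules it does not.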

       


          People

            imaxon Ian Maxon