Pig / PIG-4623

Fix CSV parsing failure caused by a newline character inside a double-quoted field


Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.18.0
    • Component/s: piggybank
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A newline character inside a double-quoted field is valid CSV. For example, the following CSV document should be treated as a SINGLE record:

        Iphone,"
        { ItemName : Cheez-It 21 Ounce}
        ",

      However, the current implementation of getNext() in the org.apache.pig.piggybank.storage.CSVLoader class does not handle this case: it splits the record at the embedded newline and sees separate lines of data, when the whole thing should be treated as a single record.
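      To illustrate the rule the description relies on, here is a minimal, hypothetical sketch of quote-aware CSV splitting. It is not the code in the attached PIG-4623-1.patch or CSVLoader.java; the class QuoteAwareCsv and its parse() method are invented for illustration. It assumes RFC 4180-style quoting (fields wrapped in double quotes, "" as an escaped quote) and parses a whole in-memory string. The real CSVLoader.getNext() receives its input line by line from a Hadoop record reader, so a fix there additionally has to keep reading past a line boundary while a quote is still open; this sketch sidesteps that by tracking the in-quotes state across the entire text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig or piggybank.
class QuoteAwareCsv {

    // Splits CSV text into records (lists of fields). A newline ends a record
    // only when it appears outside double quotes; inside quotes it is kept as
    // part of the field value. A doubled quote ("") inside a quoted field
    // yields one literal quote character.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean recordStarted = false; // true once the current record has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // newlines inside quotes are kept verbatim
                }
            } else if (c == '"') {
                inQuotes = true;
                recordStarted = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
                recordStarted = true;
            } else if (c == '\n') {
                // Unquoted newline: end of the current record.
                fields.add(field.toString());
                records.add(fields);
                fields = new ArrayList<>();
                field.setLength(0);
                recordStarted = false;
            } else if (c != '\r') { // drop the CR of a CRLF outside quotes
                field.append(c);
                recordStarted = true;
            }
        }
        // Flush a final record that has no trailing newline.
        if (recordStarted) {
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        // The example document from the description, with its embedded newlines.
        String csv = "Iphone,\"\n{ ItemName : Cheez-It 21 Ounce}\n\",\n";
        List<List<String>> records = parse(csv);
        System.out.println(records.size() + " record(s), "
                + records.get(0).size() + " field(s)");
    }
}
```

      Running main() prints "1 record(s), 3 field(s)": the example is read as one record whose second field contains both newlines, instead of being split at each line break. The key design point is that the in-quotes flag, not the newline character, decides where a record ends.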

      This pull request fixes the above issue.

      (Note: a CSV document can be validated at http://csvlint.io/.)

      Attachments

        1. TestCSVStorage.java
          5 kB
          Ken Wu
        2. PIG-4623-1.patch
          8 kB
          Daniel Dai
        3. CSVLoader.java
          9 kB
          Ken Wu


          People

            Assignee: Ken Wu (ken11223)
            Reporter: Ken Wu (ken11223)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated (original estimate): 24h
                Remaining: 24h
                Logged: Not Specified