  1. Hive
  2. HIVE-1898

The ESCAPED BY clause does not seem to pick up newlines in columns and the line terminator cannot be changed

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • Fix Version/s: None
    • Labels:
      None

      Description

      If I want to preserve column data that contains newlines (web-crawl output, for instance), I cannot use the ESCAPED BY clause to escape them, although other characters such as commas escape fine. This may be because the line terminator, which is locked to newline, is applied first to split the input into records, and only afterwards are the fields (and their escapes) processed.
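
      The suspected ordering can be illustrated with a minimal sketch. It is a simplified model only, not Hive or Hadoop code: the helper names (split_unescaped, read_rows_line_first), the backslash escape character, and the sample row are all assumptions made for illustration. It shows an escaped comma surviving inside a field while an escaped newline still cuts the record in two, because the line split never looks at the escape character.

```python
# Illustrative model of "split lines first, then fields".
# Not Hive or Hadoop code; every name below is hypothetical.

ESCAPE = "\\"
FIELD_SEP = ","
LINE_SEP = "\n"


def split_unescaped(text, sep, escape=ESCAPE):
    """Split text on sep, treating escape + any character as that literal character."""
    parts, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            buf.append(text[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)  # includes a dangling escape at end of input
            i += 1
    parts.append("".join(buf))
    return parts


def read_rows_line_first(data):
    """Cut records on every newline before the escape character is consulted,
    then split each record into fields with escape handling."""
    return [split_unescaped(line, FIELD_SEP) for line in data.split(LINE_SEP)]


# One logical row: an id, a title with an escaped comma,
# and a body with an escaped newline.
raw = "1,hello\\, world,first line\\\nsecond line"

rows = read_rows_line_first(raw)
for row in rows:
    print(row)
```

      In this model the escaped comma stays inside the second field, but the single logical row comes back as two records, and the first one ends with a dangling escape character.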

      This seems to be related to:

      "SerDe should escape some special characters"
      https://issues.apache.org/jira/browse/HIVE-136

      and

      "Implement "LINES TERMINATED BY""
      https://issues.apache.org/jira/browse/HIVE-302

      where the comment at https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435 says:

      "This is not fixable currently because the line terminator is determined by LineRecordReader.LineReader which is in the Hadoop land."

              People

              • Assignee:
                Unassigned
              • Reporter:
                Josh Patterson (jpatterson)
              • Votes:
                4
              • Watchers:
                12