Hive / HIVE-1898

The ESCAPED BY clause does not seem to pick up newlines in columns and the line terminator cannot be changed

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • None
    • None

    Description

      If I want to preserve data in columns that contain a newline (web-crawl data, for instance), I cannot use the ESCAPED BY clause to escape the newlines, although other characters, such as commas, escape fine. This may be because the line terminators, which are locked to newlines, are picked up first, and only then are the fields processed.
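
The suspected ordering can be illustrated with a small toy model. This is only a sketch of the hypothesis above, not Hive's or Hadoop's actual code; the function names, the comma delimiter, and the backslash escape character are all assumptions chosen for illustration.

```python
# Toy model of the suspected two-phase parsing; NOT Hive's or Hadoop's code.
# All names below are hypothetical.

def split_records(data):
    """Phase 1: cut the input at every newline, ignoring any escape character."""
    return data.split("\n")


def split_fields(record, delim=",", escape="\\"):
    """Phase 2: split one record on delim, honoring the escape character."""
    fields, current, i = [], [], 0
    while i < len(record):
        ch = record[i]
        if ch == escape and i + 1 < len(record):
            current.append(record[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == delim:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


# One logical row with two columns; the second column contains an escaped
# comma and an escaped newline.
raw = "page1,title\\, with comma\\\nsecond line"

rows = [split_fields(r) for r in split_records(raw)]
for row in rows:
    print(row)
```

In this model the escaped comma survives inside the second column, but the escaped newline has already been consumed as a record boundary in phase 1, so the single logical row comes out as two records, which matches the behaviour described above.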

      This seems to be related to:

      "SerDe should escape some special characters"
      https://issues.apache.org/jira/browse/HIVE-136

      and

      "Implement "LINES TERMINATED BY""
      https://issues.apache.org/jira/browse/HIVE-302

      where the comment at https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435 states:

      "This is not fixable currently because the line terminator is determined by LineRecordReader.LineReader which is in the Hadoop land."

            People

              Assignee: Unassigned
              Reporter: Josh Patterson (jpatterson)
              Votes: 4
              Watchers: 11
