Hive / HIVE-11996

Row Delimiter other than '\n' throws error in Hive.


Details

    Description

      ERROR CODE and ERROR TEXT:

      " LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\u0001'' (state=42000,code=40000)"

      ISSUE DESCRIPTION:

      The Hive Language Manual states that changing the line delimiter is possible:

      row_format
        : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
              [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
              [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
        | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

      Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable

      But on defining [LINES TERMINATED BY char], an error is raised stating that Hive only supports newline '\n' right now, which essentially means that the choice of line terminator is static. Why this appears as a configurable item in the DDL is unclear.

      This limitation seems to be hardcoded here:
      https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171

      IMPACT:

      When storing free-form data such as emails or comments, it is fairly common for a '\n' character to crop up inside a field. Much free-form ETL on Linux, with the majority of ETL tools, also adds a $ (newline character) to maintain formatting.
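To make the impact concrete, here is a minimal Python sketch, not part of Hive, of why an embedded '\n' breaks a newline-delimited table and of one possible pre-load workaround: escaping line breaks inside each field. The function names and the sample rows are hypothetical, and the sketch assumes Hive's default field delimiter (Ctrl-A, '\x01') and that field values do not themselves contain that delimiter.

```python
FIELD_DELIM = "\x01"  # Hive's default field delimiter (Ctrl-A)


def escape_newlines(value: str) -> str:
    """Turn raw line breaks into the two-character sequences \\n and \\r."""
    # Backslashes are escaped first so the transformation stays reversible.
    return (value.replace("\\", "\\\\")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n"))


def unescape_newlines(value: str) -> str:
    """Reverse escape_newlines, e.g. when reading the data back out."""
    mapping = {"n": "\n", "r": "\r", "\\": "\\"}
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            # Unknown escapes are passed through unchanged.
            out.append(mapping.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def to_row(fields):
    """Join escaped fields into one physical line of the data file."""
    return FIELD_DELIM.join(escape_newlines(f) for f in fields)


# Hypothetical sample: two logical records, one with an embedded newline.
rows = [["1", "Hello,\nplease call me back."], ["2", "Thanks"]]

raw_file = "\n".join(FIELD_DELIM.join(r) for r in rows)
safe_file = "\n".join(to_row(r) for r in rows)

# A reader that splits on '\n' sees three rows in the raw file, two in the safe one.
raw_line_count = len(raw_file.split("\n"))
safe_line_count = len(safe_file.split("\n"))
```

The escaped file keeps one physical line per logical record, at the cost of having to unescape the field on the way out. It is a workaround for the data, not a substitute for a configurable line terminator.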

      Because the Hive Language Manual shows this as a configurable property, it also leads to misleading solution designs that fail once the CREATE statement is run in the development phase.

      Having the ability to choose your row delimiter is a very basic necessity, and it is alarming that this is not supported up to Hive 14, to the best of my knowledge.

      SOLUTION:

      A possible solution is being worked on over here:

      https://issues.apache.org/jira/browse/MAPREDUCE-2254


          People

            Assignee: Unassigned
            Reporter: Sameer Gupta (netsameer)
