Hive / HIVE-26105

SHOW COLUMNS shows extra values if a column comment contains specific Chinese characters


Details

    Description

      The issue occurs because the Unicode code point of one of the Chinese characters contains the byte value of '\r' (CR). The character is 名 (U+540D): its low-order byte is 0x0D, which is decimal 13, the code for CR. The Hadoop line reader (used by the fetch task in Hive) interprets that byte as a line terminator and treats whatever follows it as a new value, so an extra row containing junk appears in the output. SHOW COLUMNS does not need the comments, so only the column names should be written to the result file.

      https://github.com/apache/hadoop/blob/0fbd96a2449ec49f840d93e1c7d290c5218ef4ea/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L238
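
The mechanism described above can be reproduced outside Hive with a minimal sketch. It assumes the row text is narrowed to one byte per char (the behaviour of java.io.DataOutputStream.writeBytes); the issue does not say which write path Hive uses, so that step is an assumption. CrSplitDemo and its methods are hypothetical names, and java.io.BufferedReader stands in for the Hadoop line reader because it also treats CR, LF and CRLF as terminators.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CrSplitDemo {
    // Two rows: "fld1<TAB>comment" and "fld2". The comment is the one from the
    // reproduction (fld1), written with escapes; its third char is U+540D.
    static final String ROWS = "fld1\t\u73ED\u6B21\u540D\u79F0\nfld2\n";

    /** Low-order byte of a UTF-16 code unit: what survives byte narrowing. */
    static int lowByte(char c) {
        return c & 0xFF;
    }

    /** Assumed write path: keep only the low byte of every char. */
    static byte[] narrow(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    /** Counts records with a reader that splits on LF, CR and CRLF. */
    static int countLines(byte[] data, Charset cs) {
        int n = 0;
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), cs))) {
            while (in.readLine() != null) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.printf("low byte of U+540D: 0x%02X%n", lowByte('\u540D'));
        // Narrowed bytes contain 0x0D, so the first row is cut in two.
        System.out.println("narrowed: "
                + countLines(narrow(ROWS), StandardCharsets.ISO_8859_1));
        // The UTF-8 encoding of the same text contains no 0x0D byte.
        System.out.println("UTF-8:    "
                + countLines(ROWS.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8));
    }
}
```

Under these assumptions the narrowed bytes yield three records for two rows, matching the extra junk row in the report, while the UTF-8 bytes yield two.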

       

      create table tbl_test  (fld0 string COMMENT  '期 ' , fld string COMMENT '期末日期', fld1 string COMMENT '班次名称', fld2  string COMMENT '排班人数');
      
      show columns from tbl_test;
      +--------+
      | field  |
      +--------+
      | fld    |
      | fld0   |
      | fld1   |
      | �      |
      | fld2   |
      +--------+
      5 rows selected (171.809 seconds)
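
The fix proposed in the description, writing only column names for SHOW COLUMNS, could look roughly like the sketch below. Column and formatShowColumns are hypothetical names for illustration and are not Hive classes or methods; the point is only that comment text never reaches the result file.

```java
import java.util.ArrayList;
import java.util.List;

final class ShowColumnsSketch {
    /** Hypothetical stand-in for column metadata; not a Hive class. */
    static final class Column {
        final String name;
        final String comment;

        Column(String name, String comment) {
            this.name = name;
            this.comment = comment;
        }
    }

    /** Emits one line per column, containing the name only. */
    static String formatShowColumns(List<Column> columns) {
        StringBuilder sb = new StringBuilder();
        for (Column c : columns) {
            // The comment is deliberately skipped: SHOW COLUMNS does not display it.
            sb.append(c.name).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Column> cols = new ArrayList<>();
        cols.add(new Column("fld1", "\u73ED\u6B21\u540D\u79F0")); // comment contains U+540D
        cols.add(new Column("fld2", "\u6392\u73ED\u4EBA\u6570"));
        System.out.print(formatShowColumns(cols));
    }
}
```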
       


          People

            maheshk114 mahesh kumar behera

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 0.5h
