Hive / HIVE-18265

desc formatted/extended or show create table cannot fully display the result when a field or table comment contains a tab character


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1, 3.1.0
    • Fix Version/s: None
    • Component/s: CLI
    • Labels: None

    Description

      Here are some examples:

      create table test_comment (id1 string comment 'full_\tname1', id2 string comment 'full_\tname2', id3 string comment 'full_\tname3') stored as textfile;

      When executing `show create table test_comment`, the console shows the following:

      createtab_stmt
      CREATE TABLE `test_comment`(
      `id1` string COMMENT 'full_
      `id2` string COMMENT 'full_
      `id3` string COMMENT 'full_
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
      STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.TextInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
      LOCATION
      'hdfs://xxx/user/huanghui/warehouse/huanghuitest.db/test_comment'
      TBLPROPERTIES (
      'transient_lastDdlTime'='1513095570')

      The output of `desc formatted test_comment` is similar:

      col_name data_type comment
      # col_name data_type comment

      id1 string full_
      id2 string full_
      id3 string full_

      # Detailed Table Information
      (remaining output omitted)...

      With `desc extended test_comment`, the problem is even more obvious:

      col_name data_type comment
      id1 string full_
      id2 string full_
      id3 string full_

      Detailed Table Information Table(tableName:test_comment, dbName:huanghuitest, owner:huanghui, createTime:1513095570, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id1, type:string, comment:full_ name1), FieldSchema(name:id2, type:string, comment:full_

      The rest of the content is lost.
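      The stray `full_ name1` fragment in the `desc extended` output can be reproduced with the small simulation below. It assumes that each result row is split on tabs into exactly as many fields as the result schema has columns (three here: col_name, data_type, comment), that anything after the last field is dropped, and that the CLI prints the surviving fields joined by tabs. The row text is a shortened, hypothetical version of the real `Detailed Table Information` line, and `splitRow` is an illustrative helper, not a Hive API.

```java
import java.util.ArrayList;
import java.util.List;

public class DescExtendedTruncation {

    // Illustrative helper (not a Hive API): split a row on '\t' into at most
    // numColumns fields. Text after the last expected field is discarded,
    // similar to LazyStruct.parse() when lastColumnTakesRest is false.
    static List<String> splitRow(String row, int numColumns) {
        List<String> fields = new ArrayList<>();
        int begin = 0;
        while (fields.size() < numColumns) {
            int tab = row.indexOf('\t', begin);
            if (tab < 0) {
                // No more separators: the remainder is the final field.
                fields.add(row.substring(begin));
                break;
            }
            fields.add(row.substring(begin, tab));
            begin = tab + 1;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the "Detailed Table Information" row.
        // Only the first tab is a real column separator; the others come from
        // the column comments 'full_\tname1', 'full_\tname2', 'full_\tname3'.
        String row = "Detailed Table Information\t"
                + "Table(tableName:test_comment, cols:[FieldSchema(name:id1, comment:full_\tname1), "
                + "FieldSchema(name:id2, comment:full_\tname2), "
                + "FieldSchema(name:id3, comment:full_\tname3)])";

        // desc extended returns three columns: col_name, data_type, comment.
        List<String> fields = splitRow(row, 3);

        // Print the surviving fields joined by tabs, as the CLI output appears to do.
        System.out.println(String.join("\t", fields));
    }
}
```

      Running `main` prints the header, the table description up to `comment:full_`, and then `name1), FieldSchema(name:id2, comment:full_`, separated by tabs. The real column separator plus the tab in id1's comment already produce three fields, so id1's comment appears intact (its tab shown as whitespace) while everything after the tab in id2's comment is dropped, which matches the output above.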

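      The content is not really lost; it just cannot be displayed correctly. Hive stores the result in a LazyStruct, and LazyStruct uses '\t' as its field separator, so a tab inside a comment is treated as the end of a field, and whatever follows the last expected field is discarded while parsing: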

      // LazyStruct.java#parse() (excerpt)
      // Go through all bytes in the byte[]
      while (fieldByteEnd <= structByteEnd) {
        if (fieldByteEnd == structByteEnd || bytes[fieldByteEnd] == separator) {
          // Reached the end of a field?
          if (lastColumnTakesRest && fieldId == fields.length - 1) {
            fieldByteEnd = structByteEnd;
          }
          startPosition[fieldId] = fieldByteBegin;
          fieldId++;
          if (fieldId == fields.length || fieldByteEnd == structByteEnd) {
            // All fields have been parsed, or bytes have been parsed.
            // We need to set the startPosition of fields.length to ensure we
            // can use the same formula to calculate the length of each field.
            // For missing fields, their starting positions will all be the same,
            // which will make their lengths to be -1 and uncheckedGetField will
            // return these fields as NULLs.
            // When lastColumnTakesRest is false, breaking here discards every
            // byte after the last expected field (e.g. text after a tab in a comment).
            for (int i = fieldId; i <= fields.length; i++) {
              startPosition[i] = fieldByteEnd + 1;
            }
            break;
          }
          fieldByteBegin = fieldByteEnd + 1;
          fieldByteEnd++;
        } else {
          // ... (handling of non-separator bytes, including escapes, omitted)
        }
      }

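      One possible direction for a fix is to escape tab characters in comments before they are written into a tab-delimited result row, so the split shown in the excerpt above no longer sees extra separators. The sketch below is hypothetical: `escapeForTextResult` is an illustrative name, it is not taken from Hive or from the attached patches, and it only handles backslashes and tabs.

```java
import java.util.Arrays;

public class CommentEscapeSketch {

    // Hypothetical helper (not from Hive or the attached patches): make a comment
    // safe to embed in a tab-delimited result row by escaping tabs. Backslashes
    // are escaped too, so the output can be decoded unambiguously.
    static String escapeForTextResult(String comment) {
        if (comment == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(comment.length());
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\\') {
                sb.append("\\\\");
            } else if (c == '\t') {
                sb.append("\\t");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A desc-style row: col_name, data_type, comment.
        String row = "id1\tstring\t" + escapeForTextResult("full_\tname1");

        // Splitting on tab now yields exactly the three expected columns.
        String[] fields = row.split("\t", -1);
        System.out.println(fields.length + " " + Arrays.toString(fields));
    }
}
```

      This prints `3 [id1, string, full_\tname1]`. Escaping the backslash as well lets a reader tell a real tab (shown as `\t`) from a literal backslash followed by `t` (shown as `\\t`); the trade-off is that the comment is displayed in escaped form rather than verbatim.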
      Attachments

        1. HIVE-18265.patch
          4 kB
          Hui Huang
        2. HIVE-18265.2.patch
          6 kB
          Hui Huang
        3. HIVE-18265.1.patch
          7 kB
          Hui Huang


          People

            Assignee: BIGrey Hui Huang
            Reporter: BIGrey Hui Huang
            Votes: 1
            Watchers: 5
