Hive / HIVE-10983

SerDeUtils bug when Text is reused


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.14.0, 1.0.0, 1.2.0
    • Fix Version/s: None
    • Component/s: API, CLI
    • Labels:
    • Environment:

      Hadoop 2.3.0-cdh5.0.0
      Hive 0.14

      Description

      The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they call Text.getBytes(), which is the wrong method to use here.
      Text.getBytes() returns the raw backing array, but only the data up to Text.getLength() is valid. A better approach is to use copyBytes() when the returned array must be exactly the length of the data.
      However, copyBytes() was only added after Hadoop 1.
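      As a self-contained illustration with no Hadoop dependency, the sketch below uses a hypothetical ReusableText class modeled on the behavior of org.apache.hadoop.io.Text described above: the backing array is reused and only grown, while the valid length is tracked separately. The strings are shortened, illustrative rows. Decoding the whole array returns stale bytes from the earlier, longer row; decoding only getLength() bytes, or copying them first as copyBytes() does, returns just the current row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextReuseDemo {

    /** Hypothetical stand-in for org.apache.hadoop.io.Text: the backing
     *  array is reused and only grows; `length` marks the valid prefix. */
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(byte[] data) {
            if (bytes.length < data.length) {
                bytes = new byte[data.length]; // grow only when needed
            }
            System.arraycopy(data, 0, bytes, 0, data.length);
            length = data.length; // bytes past `length` are left untouched
        }

        byte[] getBytes() { return bytes; }   // raw array, may hold stale bytes
        int getLength() { return length; }    // number of valid bytes
        byte[] copyBytes() { return Arrays.copyOf(bytes, length); } // exact-length copy
    }

    public static void main(String[] args) {
        ReusableText text = new ReusableText();
        text.set("session=3151,thread=254".getBytes(StandardCharsets.UTF_8)); // row 1
        text.set("thread=223".getBytes(StandardCharsets.UTF_8));              // row 2, shorter

        // Buggy: decodes the entire backing array, including leftovers from row 1.
        String wrong = new String(text.getBytes(), StandardCharsets.UTF_8);
        // Correct: decode only the valid prefix.
        String right = new String(text.getBytes(), 0, text.getLength(), StandardCharsets.UTF_8);

        System.out.println(wrong); // thread=22351,thread=254
        System.out.println(right); // thread=223
    }
}
```

      The stale suffix in the buggy output mirrors the symptom reported below, where a later row carries the tail of an earlier one.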
      

      When I query data from an LZO table, I find that in the results the length of the current row is always larger than that of the previous row, and sometimes the current row contains content from the previous row. For example, I executed this query:

      select *   from web_searchhub where logdate=2015061003
      

      The result is shown below. Notice that the second row contains content from the first row: its tail, "ession=3151,thread=254", is left over from the first row.

      INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
      INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> session=901,thread=223ession=3151,thread=254 2015061003
      

      The content of the original LZO file is shown below (just 2 rows).

      INFO [03:00:05.635] <b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb> session=3148,thread=285
      INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
      

      I think this error is caused by Text reuse, and I have found a solution.
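      A minimal sketch of a length-bounded conversion follows, assuming the transform methods decode the Text bytes with one charset and re-encode them with another. The names toUtf8 and fromUtf8 are hypothetical; this is not Hive's SerDeUtils code. They take a raw array plus a valid length (with Hadoop, text.getBytes() and text.getLength()), so a reused buffer that still holds a longer earlier row cannot leak into the result. GBK matches the table's serialization.encoding; the fallback to ISO-8859-1 is an assumption that only keeps the sketch runnable on a JDK without extended charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Transform {

    // GBK ships in most JDKs (jdk.charsets module); fall back to ISO-8859-1
    // so the sketch still runs on a minimal runtime. ASCII rows encode the same.
    static final Charset SOURCE = Charset.isSupported("GBK")
            ? Charset.forName("GBK") : StandardCharsets.ISO_8859_1;

    /** Re-encode the first `length` bytes of `raw` from `from` into UTF-8. */
    static byte[] toUtf8(byte[] raw, int length, Charset from) {
        return new String(raw, 0, length, from).getBytes(StandardCharsets.UTF_8);
    }

    /** Re-encode the first `length` bytes of UTF-8 `raw` into `to`. */
    static byte[] fromUtf8(byte[] raw, int length, Charset to) {
        return new String(raw, 0, length, StandardCharsets.UTF_8).getBytes(to);
    }

    public static void main(String[] args) {
        // Simulate a reused read buffer: row 2 is shorter than row 1.
        byte[] buffer = new byte[64];
        String[] rows = {
            "msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254",
            "session=901,thread=223"
        };
        for (String row : rows) {
            byte[] encoded = row.getBytes(SOURCE);
            System.arraycopy(encoded, 0, buffer, 0, encoded.length);
            // Only the valid prefix is converted, so each row prints unchanged.
            byte[] utf8 = toUtf8(buffer, encoded.length, SOURCE);
            System.out.println(new String(utf8, StandardCharsets.UTF_8));
        }
    }
}
```

      Passing the length explicitly works on any Hadoop version, which avoids depending on copyBytes() and also skips the extra array copy.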

      Additionally, the table was created with the following SQL:

      CREATE EXTERNAL TABLE `web_searchhub`(
        `line` string)
      PARTITIONED BY (
        `logdate` string)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\\U0000'
      WITH SERDEPROPERTIES (
        'serialization.encoding'='GBK')
      STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
                OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
      LOCATION
        'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
      

        Attachments

        1. HIVE-10983.5.patch.txt
          0.8 kB
          Xiaowei Wang
        2. HIVE-10983.4.patch.txt
          0.7 kB
          Xiaowei Wang
        3. HIVE-10983.3.patch.txt
          0.7 kB
          Xiaowei Wang
        4. HIVE-10983.2.patch.txt
          0.8 kB
          Xiaowei Wang
        5. HIVE-10983.1.patch.txt
          0.6 kB
          Xiaowei Wang


            People

            • Assignee: Xiaowei Wang (wisgood)
            • Reporter: Xiaowei Wang (wisgood)
