Hive / HIVE-11325

Infinite loop in HiveHFileOutputFormat

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels: None

    Description

      No idea why hbase_handler_bulk.q does not catch this if it's being run regularly in Hive builds, but here's the gist of the issue:

      The condition at https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164 indicates that we keep looping until we find a file whose last path component (the name) is equal to the column family name.

      In execution, however, the iteration becomes an actual infinite loop, because the file we end up considering as srcDir is actually the region file, whose name will never match the family name.

      This is an example of the IPC exchange that the listing loop of a task at 100% progress gets stuck repeating:

      2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c" startAfter: "" needLocation: false}
      2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive sending #510346
      2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got value #510346
      2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getListing took 0ms
      2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
      

      The path we are getting out of the listing results is /user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c, but instead of checking the path's parent (family), we loop infinitely over its hashed filename 97112ac1c09548ae87bd85af072d2e8c because it does not match family.

      The task therefore stays in the infinite loop until the MR framework kills it due to an idle task timeout (and since the subsequent task attempts fail outright, the job then fails).
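
      The hang described above can be reproduced in miniature without a cluster. The following is only a sketch under stated assumptions, not the actual HiveHFileOutputFormat code: the in-memory TREE map, the shortened paths, the starting directory, and the maxListings cap are all hypothetical and exist only so the demonstration terminates. It relies on the behaviour visible in the log, where listing a file returns that same file as the single entry.

```java
import java.util.List;
import java.util.Map;

class ListingLoopSketch {
    // Hypothetical in-memory tree: directory path -> child paths.
    // Any path that is not a key is treated as a plain file.
    static final Map<String, List<String>> TREE = Map.of(
        "/out/attempt_0", List.of("/out/attempt_0/family"),
        "/out/attempt_0/family", List.of("/out/attempt_0/family/97112ac1"));

    // Listing a directory returns its children; listing a file returns the file itself.
    static List<String> list(String path) {
        return TREE.getOrDefault(path, List.of(path));
    }

    // Last path component (the "name").
    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Follows the single entry of each listing until its name equals the family name.
    // Returns the number of listings performed, or -1 if maxListings was reached first.
    static int descend(String start, String family, int maxListings) {
        String srcDir = start;
        for (int n = 1; n <= maxListings; n++) {
            List<String> entries = list(srcDir);
            if (entries.size() != 1) {
                throw new IllegalStateException("Expected exactly one entry in " + srcDir);
            }
            srcDir = entries.get(0);
            if (name(srcDir).equals(family)) {
                return n;
            }
        }
        return -1; // an uncapped loop would keep issuing the same listing forever
    }

    public static void main(String[] args) {
        // Starting above the family directory: the name matches after one listing.
        System.out.println(descend("/out/attempt_0", "family", 1000));
        // Starting at the family directory: the only entry is the region file,
        // which then lists as itself on every later iteration.
        System.out.println(descend("/out/attempt_0/family", "family", 1000));
    }
}
```

      Once srcDir points at the region file, every further listing returns that same file, so the name comparison can never succeed and only the artificial cap ends the loop.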

      While calling getPath().getParent() will resolve that, is the infinite loop even necessary, especially given that we throw exceptions if there are no entries or if there is more than one entry?
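
      One way to picture the loop-free alternative raised here is a single listing followed by a check of the entry's name and then of its parent's name. This is a hedged sketch, not the contents of the attached patch: resolveFamilyDir, the string-based paths, and the exception messages are hypothetical, and a real change would presumably work on Hadoop Path and FileStatus objects instead.

```java
import java.util.List;

class SingleListingSketch {
    // Last path component (the "name").
    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Parent path, or null when there is no parent component.
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i > 0 ? path.substring(0, i) : null;
    }

    // Resolves the family directory from one listing result, with no loop.
    // Accepts either the family directory itself or a file directly inside it.
    static String resolveFamilyDir(List<String> entries, String family) {
        if (entries == null || entries.isEmpty()) {
            throw new IllegalStateException("No family directories found");
        }
        if (entries.size() != 1) {
            throw new IllegalStateException("Multiple family directories found");
        }
        String entry = entries.get(0);
        if (name(entry).equals(family)) {
            return entry;
        }
        String parentDir = parent(entry);
        if (parentDir != null && name(parentDir).equals(family)) {
            return parentDir;
        }
        // Fail fast instead of listing again.
        throw new IllegalStateException("Entry " + entry + " is not under family " + family);
    }

    public static void main(String[] args) {
        System.out.println(
            resolveFamilyDir(List.of("/out/attempt_0/family/97112ac1"), "family"));
    }
}
```

      Because a mismatch raises an exception rather than triggering another listing, a surprising directory layout would surface as an immediate task failure instead of an idle timeout.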

      Attachments

        1. HIVE-11325.patch (2 kB, Harsh J)

          People

            Assignee: Unassigned
            Reporter: Harsh J (qwertymaniac)
            Votes: 0
            Watchers: 4
