  1. Hive
  2. HIVE-11878

ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: Hive
    • Labels:

      Description

      When we register a jar on the Hive console, Hive creates a fresh URLClassLoader that includes the path of the jar being registered plus all the jar paths of the parent class loader. The parent class loader is the current thread context class loader. Once the URLClassLoader is created, Hive sets it as the current thread context class loader.

      So if we register multiple jars in Hive, there will be multiple URLClassLoaders created, each classloader including the jars from its parent and the one extra jar to be registered. The last URLClassLoader created will end up as the current ThreadContextClassLoader. (See details: org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
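
      The chaining described above can be sketched as follows. This is a minimal illustration, not the actual code in org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath: the class name ChainedLoaderSketch, the helper jarUrl, and the jar file names are assumptions made for the example, and the jars do not need to exist for the loaders to be constructed.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChainedLoaderSketch {

    // Hypothetical helper: turns a file name into a URL without a checked exception.
    static URL jarUrl(String fileName) {
        try {
            return Paths.get(fileName).toAbsolutePath().toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Sketch of the strategy: the new loader copies the parent's URLs (when the
    // parent is a URLClassLoader), appends the new jar, and keeps the old loader
    // as its parent.
    static URLClassLoader addToClassPath(ClassLoader parent, URL newJar) {
        List<URL> urls = new ArrayList<>();
        if (parent instanceof URLClassLoader) {
            urls.addAll(Arrays.asList(((URLClassLoader) parent).getURLs()));
        }
        urls.add(newJar);
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        try {
            // Registering j1: u1 becomes the thread context class loader.
            URLClassLoader u1 = addToClassPath(original, jarUrl("j1.jar"));
            thread.setContextClassLoader(u1);

            // Registering j2: u2 wraps u1 and replaces it as the context loader.
            URLClassLoader u2 = addToClassPath(thread.getContextClassLoader(), jarUrl("j2.jar"));
            thread.setContextClassLoader(u2);

            System.out.println("u2 parent is u1: " + (u2.getParent() == u1));
            System.out.println("u1 URLs: " + Arrays.toString(u1.getURLs()));
            System.out.println("u2 URLs: " + Arrays.toString(u2.getURLs()));
        } finally {
            thread.setContextClassLoader(original);
        }
    }
}
```

      Each registration adds one more loader to the chain, and only the newest loader sees the newest jar.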

      Here is an example in which this strategy leads to a ClassNotFoundException.
      We register two jars, j1 and j2, on the Hive console. j1 contains the UDF class c1, which internally relies on class c2 in jar j2. We register j1 first: the URLClassLoader u1 is created and set as the thread context class loader. We register j2 next: a new URLClassLoader u2 is created with u1 as its parent, and u2 becomes the new thread context class loader. Note that u2 includes the paths to both j1 and j2, whereas u1 only has the path to j1 (for details see org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).

      Now when we register class c1 as a temporary function in Hive, the class is loaded using

       Class.forName("c1", true, Thread.currentThread().getContextClassLoader())

      The current thread context class loader is u2, and it has the path to class c1. However, class loaders delegate to their parent first, so in this case c1 is found and defined by u1.

      Now c1 from jar j1 has u1 as its defining class loader. If a method in c1 (say initialize) references class c2, c2 will not be found: references made by a class are resolved through that class's own defining loader, which here is u1, and u1 has no path to j2.
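
      The lookup order in this scenario can be traced with a toy model. The sketch below does not load any real classes; DelegationModel and its Loader type are hypothetical stand-ins that only mimic parent-first delegation over sets of class names, using the same u1/u2 and c1/c2 names as the example above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DelegationModel {

    // Toy stand-in for a class loader: a parent link plus the class names
    // reachable through this loader's own jar paths.
    static final class Loader {
        final String name;
        final Loader parent;
        final Set<String> classes;

        Loader(String name, Loader parent, String... classes) {
            this.name = name;
            this.parent = parent;
            this.classes = new HashSet<>(Arrays.asList(classes));
        }

        // Parent-first lookup: returns the loader that would define the class,
        // or null if neither the parents nor this loader can find it.
        Loader findDefiningLoader(String className) {
            if (parent != null) {
                Loader fromParent = parent.findDefiningLoader(className);
                if (fromParent != null) {
                    return fromParent;
                }
            }
            return classes.contains(className) ? this : null;
        }
    }

    public static void main(String[] args) {
        Loader u1 = new Loader("u1", null, "c1");        // paths: j1
        Loader u2 = new Loader("u2", u1, "c1", "c2");    // paths: j1, j2

        // Class.forName("c1", true, u2): u2 delegates to u1, which defines c1.
        Loader c1Definer = u2.findDefiningLoader("c1");
        System.out.println("c1 defined by " + c1Definer.name);

        // c1 then references c2; the lookup starts at c1's defining loader.
        Loader c2Definer = c1Definer.findDefiningLoader("c2");
        System.out.println("c2 visible from c1's loader: " + (c2Definer != null));
    }
}
```

      In the model, c1 resolves to u1 even though the request went to u2, and the follow-up lookup of c2 from u1 finds nothing, which corresponds to the ClassNotFoundException described above.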

      I've added a qtest that demonstrates the problem. Please see the attached patch.

        Attachments

        1. HIVE-11878.patch
          12 kB
          Ratandeep Ratti
        2. HIVE-11878_qtest.patch
          11 kB
          Ratandeep Ratti
        3. HIVE-11878_approach3.patch
          16 kB
          Ratandeep Ratti
        4. HIVE-11878_approach3_per_session_clasloader.patch
          32 kB
          Ratandeep Ratti
        5. HIVE-11878 ClassLoader Issues when Registering Jars.pptx
          165 kB
          Anthony Hsu
        6. HIVE-11878_approach3_with_review_comments.patch
          32 kB
          Ratandeep Ratti
        7. HIVE-11878_approach3_with_review_comments1.patch
          32 kB
          Ratandeep Ratti
        8. HIVE-11878.2.patch
          32 kB
          Jason Dere
        9. HIVE-11878.3.patch
          30 kB
          Jason Dere
        10. HIVE-11878.4.patch
          33 kB
          Ratandeep Ratti
        11. HIVE-11878.4.patch.branch-1
          34 kB
          Jason Dere

              People

              • Assignee:
                rdsr Ratandeep Ratti
                Reporter:
                rdsr Ratandeep Ratti