Hive / HIVE-11878

ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: Hive

    Description

      When we register a jar on the Hive console, Hive creates a fresh URLClassLoader that includes the path of the jar being registered plus all the jar paths of the parent classloader. The parent classloader is the current ThreadContextClassLoader. Once the URLClassLoader is created, Hive sets it as the current ThreadContextClassLoader.

      So if we register multiple jars in Hive, there will be multiple URLClassLoaders created, each classloader including the jars from its parent and the one extra jar to be registered. The last URLClassLoader created will end up as the current ThreadContextClassLoader. (See details: org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
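The chaining described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not Hive's actual code: `register`, `fileUrl`, and `lists` are hypothetical helpers, and the jar names are placeholders that do not need to exist on disk.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChainedLoaderSketch {
    // Hypothetical helper: builds a new loader holding the parent's URLs plus
    // one extra jar, then installs it as the thread context class loader.
    static URLClassLoader register(ClassLoader parent, URL jar) {
        List<URL> urls = new ArrayList<>();
        if (parent instanceof URLClassLoader) {
            urls.addAll(Arrays.asList(((URLClassLoader) parent).getURLs()));
        }
        urls.add(jar);
        URLClassLoader child = new URLClassLoader(urls.toArray(new URL[0]), parent);
        Thread.currentThread().setContextClassLoader(child);
        return child;
    }

    // Hypothetical helper: turns a (possibly non-existent) file path into a URL.
    static URL fileUrl(String path) {
        try {
            return new File(path).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True if the loader's own search path contains a jar with this file name.
    static boolean lists(URLClassLoader loader, String jarName) {
        for (URL url : loader.getURLs()) {
            if (url.getPath().endsWith("/" + jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        try {
            URLClassLoader u1 = register(original, fileUrl("j1.jar"));
            URLClassLoader u2 = register(u1, fileUrl("j2.jar"));
            System.out.println("u1 lists j2.jar: " + lists(u1, "j2.jar"));
            System.out.println("u2 lists j1.jar: " + lists(u2, "j1.jar"));
            System.out.println("u2 lists j2.jar: " + lists(u2, "j2.jar"));
            System.out.println("u2 parent is u1: " + (u2.getParent() == u1));
        } finally {
            // Restore the original context class loader.
            Thread.currentThread().setContextClassLoader(original);
        }
    }
}
```

Each registration adds one more loader to the chain, and only the newest loader lists every registered jar.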

      Here is an example in which the above strategy can lead to a ClassNotFoundException.
      We register two jars, j1 and j2, in the Hive console. j1 contains the UDF class c1, which internally relies on class c2 in jar j2. We register j1 first: the URLClassLoader u1 is created and set as the ThreadContextClassLoader. We register j2 next: a new URLClassLoader u2 is created with u1 as its parent, and u2 becomes the new ThreadContextClassLoader. Note that u2 includes the paths to both j1 and j2, whereas u1 only has the path to j1 (for details see org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).

      Now when we register class c1 as a temporary function in Hive, we load the class using

       Class.forName("c1", true, Thread.currentThread().getContextClassLoader())

      The current ThreadContextClassLoader is u2, and it has the path to class c1, but class loaders work by delegating to the parent class loader first. In this case class c1 will be found and defined by class loader u1, not u2.
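Parent-first delegation can be observed with the standard java.net.URLClassLoader alone. The sketch below is an illustration, not part of the attached patch: `ParentFirstDemo` and `definingLoaderVia` are hypothetical names. The child loader is given the demo class's own location when that is available, mirroring how u2 also has the path to j1, yet the class is still defined by the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.security.CodeSource;

class ParentFirstDemo {
    // Loads className through the requested loader and reports which loader
    // actually defined the resulting class.
    static ClassLoader definingLoaderVia(ClassLoader requested, String className) {
        try {
            return Class.forName(className, true, requested).getClassLoader();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ParentFirstDemo.class.getClassLoader();
        // Give the child the same location the parent loaded this class from,
        // when the runtime exposes it; otherwise fall back to an empty path.
        CodeSource src = ParentFirstDemo.class.getProtectionDomain().getCodeSource();
        URL[] urls = (src != null && src.getLocation() != null)
                ? new URL[] { src.getLocation() }
                : new URL[0];
        try (URLClassLoader child = new URLClassLoader(urls, parent)) {
            ClassLoader definer = definingLoaderVia(child, ParentFirstDemo.class.getName());
            System.out.println("defined by parent: " + (definer == parent)); // true
            System.out.println("defined by child:  " + (definer == child));  // false
        }
    }
}
```

The loader passed to Class.forName only starts the search. The loader that ends up defining the class is the first one in the parent chain that can find it.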

      Now c1 from jar j1 has u1 as its class loader. If a method in c1 (say initialize) that references class c2 is called, c2 will not be found, because the class loader used to search for c2 is u1 (the defining class loader of the referencing class is used to resolve the classes it references), and u1 only has the path to j1.
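The whole failure scenario can be traced with a toy model that needs no jars. This is a simplified simulation under stated assumptions, not real class loading: `ToyLoader` is a hypothetical stand-in that records only a parent and the class names visible on its own path, and lookup is strictly parent-first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DelegationModel {
    // Toy stand-in for a class loader: a name, a parent, and the class names
    // reachable on its own search path.
    static final class ToyLoader {
        final String name;
        final ToyLoader parent;
        final Set<String> visible;

        ToyLoader(String name, ToyLoader parent, String... classNames) {
            this.name = name;
            this.parent = parent;
            this.visible = new HashSet<>(Arrays.asList(classNames));
        }

        // Parent-first lookup: returns the loader that would define the class,
        // or null if neither this loader nor any ancestor can find it.
        ToyLoader definingLoaderOf(String className) {
            if (parent != null) {
                ToyLoader found = parent.definingLoaderOf(className);
                if (found != null) {
                    return found;
                }
            }
            return visible.contains(className) ? this : null;
        }
    }

    public static void main(String[] args) {
        ToyLoader u1 = new ToyLoader("u1", null, "c1");     // only j1
        ToyLoader u2 = new ToyLoader("u2", u1, "c1", "c2"); // j1 and j2

        // The temporary function is loaded through u2, the context loader.
        ToyLoader definerOfC1 = u2.definingLoaderOf("c1");
        System.out.println("c1 defined by: " + definerOfC1.name); // u1

        // c1's references are resolved starting from c1's defining loader.
        ToyLoader definerOfC2 = definerOfC1.definingLoaderOf("c2");
        System.out.println("c2 found from c1: " + (definerOfC2 != null)); // false

        // Asking u2 directly would have worked.
        System.out.println("c2 found from u2: " + (u2.definingLoaderOf("c2") != null)); // true
    }
}
```

In the model, c2 is reachable from u2 but not from u1, which is why the outcome depends on which loader defined c1 rather than on which loader was asked for it.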

      I've added a qtest to demonstrate the problem. Please see the attached patch.

      Attachments

        1. HIVE-11878 ClassLoader Issues when Registering Jars.pptx (165 kB, Anthony Hsu)
        2. HIVE-11878.patch (12 kB, Ratandeep Ratti)
        3. HIVE-11878.4.patch.branch-1 (34 kB, Jason Dere)
        4. HIVE-11878.4.patch (33 kB, Ratandeep Ratti)
        5. HIVE-11878.3.patch (30 kB, Jason Dere)
        6. HIVE-11878.2.patch (32 kB, Jason Dere)
        7. HIVE-11878_qtest.patch (11 kB, Ratandeep Ratti)
        8. HIVE-11878_approach3.patch (16 kB, Ratandeep Ratti)
        9. HIVE-11878_approach3_with_review_comments1.patch (32 kB, Ratandeep Ratti)
        10. HIVE-11878_approach3_with_review_comments.patch (32 kB, Ratandeep Ratti)
        11. HIVE-11878_approach3_per_session_clasloader.patch (32 kB, Ratandeep Ratti)

            People

              Assignee: Ratandeep Ratti (rdsr)
              Reporter: Ratandeep Ratti (rdsr)
              Votes: 0
              Watchers: 12
