Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
I found this code a while back and have been thinking about OSGi again for Nutch.
Our justification here is that we want to dynamically create InteractiveSeleniumHandlers and inject the code into the .job artifacts, which can then be used in the next round of fetching.
The code looks like the following:
+ List<URL> nutchConfigurationClasspathURLs = new ArrayList<URL>();
+
+ // Collect classpath URLs from Hadoop's Configuration class CL
+ URLClassLoader hadoopBundleConfigurationClassLoader = (URLClassLoader) conf.getClassLoader();
+ for (URL hadoopBundleClasspathURL : hadoopBundleConfigurationClassLoader.getURLs()) {
+   nutchConfigurationClasspathURLs.add(hadoopBundleClasspathURL);
+ }
+
+ // Append classpath URLs from current thread, which ostensibly include a Nutch job file
+ URLClassLoader tccl = (URLClassLoader) Thread.currentThread().getContextClassLoader();
+ for (URL tcclClasspathURL : tccl.getURLs()) {
+   nutchConfigurationClasspathURLs.add(tcclClasspathURL);
+ }
+
+ URLClassLoader nutchConfigurationClassLoader = new URLClassLoader(nutchConfigurationClasspathURLs.toArray(new URL[0]));
+ // Reset the Configuration object's CL to the new one
+ conf.setClassLoader(nutchConfigurationClassLoader);
The call to Thread.currentThread().getContextClassLoader() is the secret sauce, since the thread context class loader is the one expected to see the Nutch job file. What are people's thoughts on this approach?
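One caveat with the snippet above is that both casts to URLClassLoader are unchecked: on Java 9 and later the default application class loader is no longer a URLClassLoader, so either cast can throw a ClassCastException depending on how the JVM was launched. Below is a minimal sketch of a more defensive variant, not a proposed patch. The class name ClassLoaderMerge and the methods collectUrls and merge are hypothetical and not part of Nutch or Hadoop. It skips loaders that do not expose URLs, removes duplicate entries, and takes the parent loader explicitly.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch or Hadoop.
final class ClassLoaderMerge {

    private ClassLoaderMerge() {
    }

    // Copies the loader's classpath URLs into the map, keyed by their string
    // form so that duplicates collapse. Loaders that are not URLClassLoaders
    // (including null) are skipped instead of being cast blindly.
    static void collectUrls(ClassLoader loader, Map<String, URL> into) {
        if (loader instanceof URLClassLoader) {
            for (URL url : ((URLClassLoader) loader).getURLs()) {
                into.putIfAbsent(url.toExternalForm(), url);
            }
        }
    }

    // Builds a new URLClassLoader whose search path is the URLs of 'first'
    // followed by any URLs of 'second' not already present.
    static URLClassLoader merge(ClassLoader first, ClassLoader second, ClassLoader parent) {
        Map<String, URL> urls = new LinkedHashMap<>();
        collectUrls(first, urls);
        collectUrls(second, urls);
        return new URLClassLoader(urls.values().toArray(new URL[0]), parent);
    }
}
```

Under these assumptions, the last step of the original snippet would become something like conf.setClassLoader(ClassLoaderMerge.merge(conf.getClassLoader(), Thread.currentThread().getContextClassLoader(), conf.getClassLoader())). Passing the existing Configuration class loader as the parent is a design choice of this sketch: classes visible to a loader that is not a URLClassLoader stay reachable through parent delegation. The original code instead relies on the default parent of new URLClassLoader(URL[]), which is the system class loader.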
We have, from time to time over the years, discussed Nutch, and I spoke with bdelacretaz a good few years ago at ApacheCon, but I don't have the time to implement total OSGi coverage of the Nutch codebase.