Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
0.4.0
None
Environment: Linux x86-64, Java 1.6.0_20
Description
I'm getting ClassNotFoundException errors during Hadoop's map phase: the task cannot find my class org.apache.hadoop.chukwa.extraction.demux.processor.mapper.XmlBasedDemux, which I've packaged in a JAR named data-collection-demux-0.1.jar.
The problem seems to be in the values of these two properties in the Hadoop job configuration:
<property>
  <name>mapred.job.classpath.files</name>
  <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>
<property>
  <name>mapred.cache.files</name>
  <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>
The problem seems to stem from the call to DistributedCache.addFileToClassPath, which is being passed a Path in fully qualified URI form (e.g. hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar), whereas the DistributedCache API expects a filesystem-based path (e.g. /chukwa/demux/data-collection-demux-0.1.jar). I'm not sure why, but the FileStatus objects returned by FileSystem.listStatus carry fully qualified URI paths instead of filesystem-based paths.
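To make the two path forms concrete, here is a minimal sketch that uses only java.net.URI from the JDK (not the Hadoop API) to split the value seen in the job configuration into its filesystem part (scheme and authority) and its filesystem-based path part. The class and method names are hypothetical and exist only for illustration.

```java
import java.net.URI;

public class QualifiedPathDemo {
    // Hypothetical helper: returns only the path component of a possibly
    // fully qualified path, e.g. "/chukwa/demux/x.jar".
    static String pathPart(String qualified) {
        return URI.create(qualified).getPath();
    }

    // Hypothetical helper: returns "scheme://authority" for a fully qualified
    // path, or null when the input is already a bare filesystem path.
    static String fsPart(String qualified) {
        URI uri = URI.create(qualified);
        if (uri.getScheme() == null) {
            return null;
        }
        if (uri.getAuthority() == null) {
            return uri.getScheme() + ":";
        }
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        // The value that ended up in mapred.job.classpath.files
        String value = "hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar";
        System.out.println(fsPart(value));   // hdfs://localhost:9000
        System.out.println(pathPart(value)); // /chukwa/demux/data-collection-demux-0.1.jar
    }
}
```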
As a workaround, I hacked the addParsers method of the Demux class to strip the "hdfs://localhost:9000" prefix from the path string, and my class is now found. I will try to provide a patch today that reads the value of Hadoop's fs.default.name and strips that prefix from the paths returned in Demux.java.
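The proposed patch could look roughly like the sketch below. It is not the committed fix: the helper name stripDefaultFs is hypothetical, and the default filesystem is passed in as a plain string so the example runs without Hadoop on the classpath (in Demux it would be read from the job configuration's fs.default.name). It strips that prefix, tolerating an optional trailing slash, and leaves any path it does not recognise unchanged.

```java
public class StripDefaultFs {
    // Hypothetical helper: if `qualified` starts with the default filesystem
    // URI (the fs.default.name value), drop that prefix so only the
    // filesystem-based path remains; otherwise return the input unchanged.
    static String stripDefaultFs(String qualified, String defaultFs) {
        if (defaultFs == null || defaultFs.isEmpty()) {
            return qualified;
        }
        // Treat "hdfs://host:port/" and "hdfs://host:port" the same way.
        String prefix = defaultFs.endsWith("/")
                ? defaultFs.substring(0, defaultFs.length() - 1)
                : defaultFs;
        // Require the "/" after the prefix so "hdfs://localhost:90001/..." is not matched.
        if (qualified.startsWith(prefix + "/")) {
            return qualified.substring(prefix.length());
        }
        return qualified;
    }

    public static void main(String[] args) {
        // Assumed value; in Demux this would come from the job configuration.
        String defaultFs = "hdfs://localhost:9000";
        String listed = "hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar";
        System.out.println(stripDefaultFs(listed, defaultFs));
    }
}
```

A plain string prefix only matches when the listed path is spelled exactly like fs.default.name (same host name, same port). Taking the path component with java.net.URI#getPath avoids depending on that spelling, at the cost of also discarding the scheme and authority of paths that live on a different filesystem.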