Details
Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Not A Problem
0.2
None
None
Description
At present, suppose we run a Pig script like the one below without registering hive-metastore.jar or libthrift.jar:
A = LOAD 'orders' USING org.apache.hcatalog.pig.HCatLoader();
B = FOREACH A GENERATE o_custkey;
C = LIMIT B 10;
DUMP C;
Each mapper then throws an exception like the following:
java.lang.RuntimeException: could not instantiate 'org.apache.hcatalog.pig.HCatLoader' with arguments 'null'
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:504)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:154)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:106)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:594)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:308)
    at org.apache.hadoop.mapred.Child.main(Child.java:156)
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException
    at org.apache.hcatalog.pig.HCatLoader.<init>(HCatLoader.java:55)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at java.lang.Class.newInstance0(Class.java:355)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:474)
    ... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.api.NoSuchObjectException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 13 more
In theory, the Hive metastore and Thrift jars are needed by HCatLoader/HCatStorer only when they run on the client side; they have no use on the slave side. Users should not need to register those jars in their scripts, and the jars should not be distributed to the nodes where MR tasks run.
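To confirm which side is actually missing the classes, a small classpath probe can be run in the JVM in question. The following is only a minimal sketch: the class name `ClasspathProbe` and the method `isAvailable` are hypothetical and not part of HCatalog or Pig. The first probed class name comes from the stack trace above; `org.apache.thrift.TException` is assumed to be a representative class from libthrift.jar.

```java
public class ClasspathProbe {

    /**
     * Returns true if the named class can be located by this class's
     * class loader. The class is not initialized, only looked up.
     */
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is absent.
            // LinkageError (e.g. NoClassDefFoundError): one of its dependencies is absent.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            // The class the mapper failed to load, taken from the trace above.
            "org.apache.hadoop.hive.metastore.api.NoSuchObjectException",
            // Assumed representative class from libthrift.jar.
            "org.apache.thrift.TException"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

Running `main` with the same classpath as the failing task shows whether hive-metastore.jar and libthrift.jar are visible there. It does not by itself show whether HCatLoader needs them on that side.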