1) The local file system classes handled in the shims all share the same file name (class name) and are compiled conditionally, depending on the Hadoop version at compile time. This may cause problems when the same Hive jar file is deployed to clusters running different Hadoop versions. The current shims are implemented by naming the classes differently and using ShimsLoader to pick the correct class at execution time, which allows the same Hive jar files to be deployed to different Hadoop clusters.
2) data/conf/hive-site.xml: the fs.pfile.impl setting is not needed if ShimsLoader is used as described above.
3) The default value of hive.exec.mode.local.auto differs between HiveConf.java and conf/hive-default.xml. The two should be the same to avoid confusion.
4) ctas.q.out: Do you know why the GlobalTableID was changed?
5) MapRedTask.java:149: The plan file name is no longer randomized as it was before. This may cause problems when parallel execution mode is enabled and multiple MapRedTasks are running at the same time (e.g., parallel multi-table inserts).
6) If there are two MapRed tasks, MR2 depends on MR1, and MR1 is chosen to run locally, it seems MR2 has to run locally as well, since the intermediate files are stored in the local file system? What about parallel execution, when MR1 and MR2 run in parallel and only one of them is local? It seems the information on whether a task is "local" is stored in Context (and HiveConf), which is shared among parallel MR tasks?
7) ExecDriver.localizeMRTmpFileImpl changes FileSinkDesc.dirName after the MR tasks have been generated, which breaks the dynamic partition code that runs when the FileSinkOperator is generated. In particular, DynamicPartitionCtx also stores the dirName, so it has to be changed in localizeMRTmpFileImpl as well.
8) MoveTask previously moved the intermediate directory in HDFS to the final directory, also in HDFS. In local mode, should we change the MoveTask execution as well?
9) Driver.java:100: The two functions are made static. Should they be moved to Utilities?
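
To illustrate the concern in point 5, here is a minimal sketch of one way to give each task its own plan file so that concurrently running tasks cannot overwrite each other's plans. The class PlanFileNames, the method uniquePlanFile, the "plan-" prefix, and the scratch directory path are all hypothetical names chosen for illustration; they are not taken from the patch or from Hive, and the patch may well want a different scheme.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper, not part of Hive: shows one way to name plan files uniquely per task.
final class PlanFileNames {
    private PlanFileNames() {}

    /** Returns a plan file path under scratchDir that differs on every call. */
    static Path uniquePlanFile(Path scratchDir) {
        // A random UUID makes a name collision between parallel tasks practically impossible.
        return scratchDir.resolve("plan-" + UUID.randomUUID() + ".xml");
    }

    public static void main(String[] args) {
        // Assumed scratch directory, used only to build path names; nothing is written to disk.
        Path scratchDir = Paths.get("/tmp/hive-scratch");
        Set<Path> seen = new HashSet<>();
        for (int i = 0; i < 4; i++) {
            Path planFile = uniquePlanFile(scratchDir);
            if (!seen.add(planFile)) {
                throw new IllegalStateException("duplicate plan file: " + planFile);
            }
            System.out.println(planFile);
        }
    }
}
```

The point of the sketch is only that the name should be derived per task rather than fixed; any scheme that guarantees distinct names for tasks running in parallel would address the same problem.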