Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Invalid
Description
Oozie does not validate EL functions when a coordinator job is submitted. As a result, coordinator actions can remain in WAITING state if the user has written an invalid EL expression.
For example:
<?xml version="1.0" encoding="UTF-8"?>
<coordinator-app xmlns="uri:oozie:coordinator:0.4" name="my_coord" frequency="${coord:hours(1)}" start="${startTime}" end="${endTime}" timezone="UTC">
  <controls>
    <concurrency>1</concurrency>
    <execution>FIFO</execution>
  </controls>
  <datasets>
    <dataset name="my_dataset" frequency="${coord:hours(1)}" initial-instance="${initInstanceTime}" timezone="UTC">
      <uri-template>hcat://${HCAT_SERVER}/${HCAT_DB_NAME}/${TABLE_NAME}/dt=${YEAR}${MONTH}${DAY};hr=${HOUR}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="my_dataset_name" dataset="my_dataset">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${oozieAppWorkflowPath}/my_workflow.xml</app-path>
      <configuration>
        <property>
          <name>yyyymmdd</name>
          <value>${coord:formatTime(coord:nominalTime(), 'DAY')}</value>
        </property>
        <property>
          <name>hh</name>
          <value>${coord:formatTime(coord:nominalTime(),'HH')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
Once the job is running, Oozie finds that the dependency is available:
2017-04-25 16:51:53,503 DEBUG DependencyChecker:526 [pool-11-thread-66] - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0100010100101-saley-C] ACTION[0100010100101-saley-C@1] Dependency [hcat://localhost:9098/my_database/my_table/dt=20170411;hr=02] is available
The problem is with the EL function, which then fails to evaluate:
2017-04-25 16:51:53,506 ERROR CoordPushDependencyCheckXCommand:517 [pool-11-thread-66] - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0100010100101-saley-C] ACTION[0100010100101-saley-C@1] XException, org.apache.oozie.command.CommandException: E1021: Coord Action Input Check Error: E1021: Coord Action Input Check Error: Unable to evaluate :${coord:formatTime(coord:nominalTime(), 'DAY')}: <configuration> <property> <name>yyyymmdd</name> <value>${coord:formatTime(coord:nominalTime(), 'DAY')}</value> </property>
The coord action remained in WAITING state.
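A likely reason the expression cannot be evaluated is the format string 'DAY'. The Oozie documentation describes the second argument of coord:formatTime as a java.text.SimpleDateFormat pattern. Assuming that holds for the version in use, 'DAY' is rejected because 'A' is not a valid pattern letter. The class and method names in this minimal sketch are hypothetical and not part of Oozie:

```java
import java.text.SimpleDateFormat;

public class FormatPatternCheck {
    /**
     * Returns null if SimpleDateFormat accepts the pattern,
     * otherwise the message of the exception it throws.
     * Assumption: coord:formatTime passes its second argument to SimpleDateFormat.
     */
    static String validate(String pattern) {
        try {
            new SimpleDateFormat(pattern);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // 'DAY' and 'HH' come from the coordinator above; 'yyyyMMdd' is a guess
        // at what the property named yyyymmdd was meant to use.
        for (String p : new String[] {"DAY", "HH", "yyyyMMdd"}) {
            String err = validate(p);
            System.out.println(p + " -> " + (err == null ? "accepted" : "rejected: " + err));
        }
    }
}
```

Under this assumption only 'DAY' is rejected, which matches the single expression named in the E1021 error.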
Solution:
Oozie should fail at job submission time. Currently, users are expected to do a dry run of the coordinator before actually running it, but in practice most users submit the job directly. We should perform a dry run by default at submission to catch such errors. While working on the fix, I found some buggy test cases that would have been caught had a dry run been performed first; I am fixing those as well.
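The fail-fast idea can be illustrated outside Oozie with a small pre-submission check. This is only a sketch under stated assumptions, not Oozie's dry-run implementation: the class name, the method name, and the simplified regular expression are all hypothetical, and it only looks at the format argument of coord:formatTime, assuming that argument is a SimpleDateFormat pattern.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubmitTimeElCheck {
    // Captures the quoted second argument of coord:formatTime(<expr>, '<pattern>').
    // Simplification: the first argument must not contain a comma.
    private static final Pattern FORMAT_TIME =
        Pattern.compile("coord:formatTime\\([^,]*,\\s*'([^']*)'\\s*\\)");

    /** Returns every formatTime pattern in the XML text that SimpleDateFormat rejects. */
    static List<String> findBadPatterns(String coordinatorXml) {
        List<String> bad = new ArrayList<>();
        Matcher m = FORMAT_TIME.matcher(coordinatorXml);
        while (m.find()) {
            String pattern = m.group(1);
            try {
                new SimpleDateFormat(pattern);
            } catch (IllegalArgumentException e) {
                bad.add(pattern);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String xml =
            "<value>${coord:formatTime(coord:nominalTime(), 'DAY')}</value>"
          + "<value>${coord:formatTime(coord:nominalTime(),'HH')}</value>";
        List<String> bad = findBadPatterns(xml);
        if (!bad.isEmpty()) {
            // A submission-time check would refuse the job here instead of
            // leaving the action in WAITING later.
            System.out.println("Rejecting submission, invalid format patterns: " + bad);
        }
    }
}
```

A real dry run evaluates the whole coordinator definition rather than one function, so a check like this would catch only a narrow class of mistakes.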