  1. Oozie
  2. OOZIE-2310

If the Hadoop configuration is not configured, you get a NullPointerException on job submission

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.1.0
    • Fix Version/s: 5.3.0
    • Component/s: core
    • Labels: None

    Description

      A user reported an NPE on startup here:
      http://mail-archives.apache.org/mod_mbox/oozie-user/201507.mbox/%3cCALBGZ8oZ0GZ+hf76nQYKxiATHH5g2gbQ_0sQ78uQv_=r4Hct=Q@mail.gmail.com%3e

      I did some digging, and the problem is that Oozie is trying to load the ShareLib, but the FileSystem class variable is null because the ShareLibService wasn't able to create it on init. That would normally cause Oozie to fail on startup, but the default value of oozie.service.ShareLibService.fail.fast.on.startup is false, so the failure is logged and ignored.

      The code in question is this:

      try {
          fs = FileSystem.get(has.createJobConf(uri.getAuthority()));
          //cache action key sharelib conf list
          cacheActionKeySharelibConfList();
          updateLauncherLib();
          updateShareLib();
      }
      catch (Throwable e) {
          if (failOnfailure) {
              LOG.error("Sharelib initialization fails", e);
              throw new ServiceException(ErrorCode.E0104, getClass().getName(), "Sharelib initialization fails. ", e);
          }
          else {
              // We don't want to actually fail init by throwing an Exception, so only create the ServiceException and
              // log it
              ServiceException se = new ServiceException(ErrorCode.E0104, getClass().getName(),
                      "Not able to cache sharelib. An Admin needs to install the sharelib with oozie-setup.sh and issue the "
                              + "'oozie admin' CLI command to update the sharelib", e);
              LOG.error(se);
          }
      }
      

      where failOnfailure is false by default. So, fs ends up being null, and if anything later tries to use it, you get an NPE.
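
      To make the failure mode concrete, here is a minimal standalone sketch of the same pattern. It does not use any Oozie or Hadoop classes: SwallowedInitDemo, createFileSystem and describeFileSystem are hypothetical stand-ins, and the IOException only simulates FileSystem.get failing on an unconfigured Hadoop.

```java
import java.io.IOException;

// Hypothetical stand-in for ShareLibService; none of these names are Oozie's API.
class SwallowedInitDemo {
    private Object fs;                   // plays the role of the FileSystem field
    private final boolean failOnFailure; // plays the role of failOnfailure

    SwallowedInitDemo(boolean failOnFailure) {
        this.failOnFailure = failOnFailure;
    }

    // Simulates FileSystem.get(...) failing because Hadoop is not configured.
    private static Object createFileSystem() throws IOException {
        throw new IOException("simulated: Hadoop configuration not found");
    }

    void init() {
        try {
            fs = createFileSystem();
        } catch (Throwable e) {
            if (failOnFailure) {
                throw new IllegalStateException("Sharelib initialization fails", e);
            }
            // Logged and ignored, as in the snippet above, so fs stays null.
            System.err.println("Not able to cache sharelib: " + e.getMessage());
        }
    }

    // Any later use of fs dereferences null.
    String describeFileSystem() {
        return fs.toString();
    }

    public static void main(String[] args) {
        SwallowedInitDemo service = new SwallowedInitDemo(false);
        service.init(); // returns normally despite the failure
        try {
            service.describeFileSystem();
        } catch (NullPointerException e) {
            System.out.println("NPE on first use of fs");
        }
    }
}
```

      With the flag set to true, the same init call fails immediately instead of deferring the problem to the first use of fs.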

      I think we should do two things here:

      1. Creating the FileSystem should be in a different try-catch so that failOnfailure doesn't affect it. The original intention of that behavior was to ignore ShareLib failures, not Hadoop failures.
      2. We should improve the default Hadoop configuration (i.e. oozie.service.HadoopAccessorService.hadoop.configurations). This has been a problem for a while now: out of the box, Oozie doesn't work even for a local pseudo-cluster because of this config's default. If that's not possible, we need to make it more obvious that users must configure this before doing anything.
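
      A rough sketch of what item 1 could look like, written as a standalone class rather than a patch. SplitInitSketch and its parameters are hypothetical and are not taken from the Oozie code base: the Callable stands in for the FileSystem.get call and the Runnable stands in for the three ShareLib caching calls.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the proposed split; names are illustrative, not Oozie's API.
class SplitInitSketch {
    private Object fs;
    private boolean shareLibCached;
    private final boolean failOnFailure;

    SplitInitSketch(boolean failOnFailure) {
        this.failOnFailure = failOnFailure;
    }

    void init(Callable<Object> fileSystemFactory, Runnable shareLibLoader) {
        // Step 1: a file system failure always fails init, whatever the flag says.
        try {
            fs = fileSystemFactory.call();
        } catch (Exception e) {
            throw new IllegalStateException(
                    "Could not create FileSystem; check the Hadoop configuration", e);
        }

        // Step 2: only ShareLib caching keeps the lenient, flag-controlled behavior.
        try {
            shareLibLoader.run();
            shareLibCached = true;
        } catch (RuntimeException e) {
            if (failOnFailure) {
                throw new IllegalStateException("Sharelib initialization fails", e);
            }
            System.err.println("Not able to cache sharelib: " + e.getMessage());
        }
    }

    boolean hasFileSystem() {
        return fs != null;
    }

    boolean isShareLibCached() {
        return shareLibCached;
    }
}
```

      The point of the split is that a missing ShareLib can still be tolerated, while a missing or broken Hadoop configuration surfaces at startup with a message that names the likely cause.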

            People

              Assignee: dbist13 Artem Ervits
              Reporter: rkanter Robert Kanter
              Votes: 0
              Watchers: 2
