SPARK-13622

Issue creating level db file for YARN shuffle service if URI is used in yarn.nodemanager.local-dirs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: Spark Core, YARN
    • Labels: None
    • Environment: cdh 5.5.2

    Description

      After enabling the Spark external shuffle service for YARN in order to use dynamic resource allocation, I hit the following LevelDB creation errors on the NodeManager:
      2016-03-02 03:13:59,692 ERROR org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening leveldb file file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb. Creating new file, will not be able to recover state for existing applications
      org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /usr/lib/hadoop-yarn/file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb/LOCK: No such file or directory
      at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
      at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
      at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:100)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:81)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:56)
      at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:236)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:255)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
      2016-03-02 03:13:59,694 WARN org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error deleting file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb
      2016-03-02 03:13:59,694 ERROR org.apache.spark.network.yarn.YarnShuffleService: Failed to initialize external shuffle service
      java.io.IOException: Unable to create state store
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:129)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:81)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:56)
      at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:236)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:255)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
      Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /usr/lib/hadoop-yarn/file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb/LOCK: No such file or directory
      at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
      at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
      at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:127)
      ... 14 more
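
      The doubled path in the IO error (/usr/lib/hadoop-yarn/ followed by file:/data/...) is consistent with the URI string being handled as a plain relative file name. The following is a minimal sketch of that behaviour using only java.io.File on a Unix-like system; the class and method names are hypothetical, and this is not the code path taken inside Spark or leveldbjni.

```java
import java.io.File;

// Hypothetical demo class: shows how java.io.File interprets a URI-style string.
final class UriAsFileNameDemo {

    // True only when File considers the string an absolute path.
    static boolean isAbsolute(String name) {
        return new File(name).isAbsolute();
    }

    // A relative name is resolved against the JVM working directory (user.dir).
    static String resolve(String name) {
        return new File(name).getAbsolutePath();
    }

    public static void main(String[] args) {
        // Assumed example value, modelled on the path in the log above.
        String name = "file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb";
        System.out.println("absolute? " + isAbsolute(name));
        System.out.println("resolved: " + resolve(name));
    }
}
```

      Because the string does not start with "/", it is treated as relative and prefixed with the process working directory, which would explain the /usr/lib/hadoop-yarn/ prefix if that is where the NodeManager was started.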

      In the yarn-site.xml config file, I set the path using a URI:
      <property>
        <description>List of directories to store localized files in.</description>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///data/yarn/cache/${user.name}/nm-local-dir</value>
      </property>

      If I remove the scheme from this setting, so that the value is /data/yarn/cache/${user.name}/nm-local-dir, the NodeManager no longer hits this issue and the LevelDB file is created correctly.
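
      One way a consumer of yarn.nodemanager.local-dirs could tolerate both forms is to strip a file: scheme before building a local path. The sketch below uses only java.net.URI; LocalDirPaths and toLocalPath are hypothetical names, the sample values assume ${user.name} has already been expanded, and this is an illustration rather than the fix that was applied in Spark.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: normalizes a local-dirs entry to a plain filesystem path.
final class LocalDirPaths {

    // Returns the path component for file: URIs, and the input unchanged otherwise.
    static String toLocalPath(String dir) {
        URI uri;
        try {
            uri = new URI(dir);
        } catch (URISyntaxException e) {
            // Not parseable as a URI (for example, it contains spaces): keep it as-is.
            return dir;
        }
        if ("file".equalsIgnoreCase(uri.getScheme()) && uri.getPath() != null) {
            return uri.getPath();
        }
        return dir;
    }

    public static void main(String[] args) {
        System.out.println(toLocalPath("file:///data/yarn/cache/yarn/nm-local-dir"));
        System.out.println(toLocalPath("/data/yarn/cache/yarn/nm-local-dir"));
    }
}
```

      Both calls in main yield the same scheme-less path, which matches the form of the value that works in the configuration above.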


          People

            Assignee: Nicolas Fraison (ashangit)
            Reporter: Nicolas Fraison (ashangit)