  1. Spark
  2. SPARK-13622

Issue creating level db file for YARN shuffle service if URI is used in yarn.nodemanager.local-dirs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: Spark Core, YARN
    • Labels: None
    • Environment: cdh 5.5.2

    Description

      After activating the Spark external shuffle service for YARN in order to use dynamic resource allocation, I get the following LevelDB creation errors on the NodeManager:
      2016-03-02 03:13:59,692 ERROR org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening leveldb file file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb. Creating new file, will not be able to recover state for existing applications
      org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /usr/lib/hadoop-yarn/file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb/LOCK: No such file or directory
      at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
      at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
      at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:100)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:81)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:56)
      at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:236)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:255)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
      2016-03-02 03:13:59,694 WARN org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error deleting file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb
      2016-03-02 03:13:59,694 ERROR org.apache.spark.network.yarn.YarnShuffleService: Failed to initialize external shuffle service
      java.io.IOException: Unable to create state store
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:129)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:81)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:56)
      at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:236)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:255)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
      at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
      Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /usr/lib/hadoop-yarn/file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb/LOCK: No such file or directory
      at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
      at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
      at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:127)
      ... 14 more

      In the yarn-site.xml config file, I set the path using a URI:
      <property>
      <description>List of directories to store localized files in.</description>
      <name>yarn.nodemanager.local-dirs</name>
      <value>file:///data/yarn/cache/${user.name}/nm-local-dir</value>
      </property>

      If I remove the scheme from this setting, so that the value is /data/yarn/cache/${user.name}/nm-local-dir, the NodeManager no longer hits this issue and the LevelDB file is created correctly.
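
The error path in the log (/usr/lib/hadoop-yarn/file:/data/...) suggests that the configured URI string was handed to the file system as a plain file name and then resolved against a working directory. The following is a minimal Java sketch of that effect and of one possible normalization. It is not the patch that resolved this issue; the class LocalDirPaths and the helper toLocalPath are hypothetical names, and the sketch assumes the configuration value has already had variables such as ${user.name} expanded.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; not code from Spark or Hadoop.
final class LocalDirPaths {

    /**
     * Converts a local-dirs entry, given either as a plain path or as a
     * file: URI, into a plain filesystem path. Anything that is not a
     * hierarchical file: URI is returned unchanged.
     */
    static String toLocalPath(String entry) {
        try {
            URI uri = new URI(entry);
            if ("file".equalsIgnoreCase(uri.getScheme()) && uri.getPath() != null) {
                return uri.getPath();
            }
        } catch (URISyntaxException e) {
            // Not parseable as a URI: treat the entry as a plain path.
        }
        return entry;
    }

    public static void main(String[] args) {
        String configured = "file:/data/yarn/cache/yarn/nm-local-dir";

        // Used as-is, the string does not start with a path separator, so
        // java.io.File treats it as a relative path and resolves it against
        // the current working directory.
        File naive = new File(configured, "registeredExecutors.ldb");
        System.out.println("absolute? " + naive.isAbsolute());
        System.out.println("resolved: " + naive.getAbsolutePath());

        // After stripping the scheme, the path is the intended absolute one.
        File normalized = new File(toLocalPath(configured), "registeredExecutors.ldb");
        System.out.println("normalized: " + normalized.getPath());
    }
}
```

Running main from the NodeManager's working directory would print a resolved path prefixed by that directory, which matches the shape of the path in the LevelDB error above. Parsing with java.net.URI rather than stripping a "file:" prefix by hand is a design choice made here so that file:/data/... and file:///data/... are handled the same way.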

          People

            Assignee: ashangit Nicolas Fraison
            Reporter: ashangit Nicolas Fraison