SPARK-23460

PySpark concurrency issue with the Python egg cache directory

Details

    • Type: Question
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Incomplete
    • Affects Version/s: 2.1.2
    • Fix Version/s: None
    • Component/s: PySpark
    • YARN last

    Description

      We are experiencing intermittent failures when running tasks on PySpark while installing dependencies through --py-files with a Python egg. We set the following (otherwise we get a permission-denied error on the egg cache):

      --conf "spark.executorEnv.PYTHON_EGG_CACHE=./.python-eggs"

       

      Error:

      INFO - File "build/bdist.linux-x86_64/egg/ua_parser/user_agent_parser.py", line 409, in <module>
      INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 904, in resource_filename
      INFO - self, resource_name
      INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1380, in get_resource_filename
      INFO - return self._extract_resource(manager, zip_path)
      INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1405, in _extract_resource
      INFO - self.egg_name, self._parts(zip_path)
      INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 984, in get_cache_path
      INFO - self.extraction_error()
      INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 950, in extraction_error
      INFO - raise err
      INFO - ExtractionError: Can't extract file(s) to egg cache
      INFO - 
      INFO - The following error occurred while trying to extract file(s) to the Python egg
      INFO - cache:
      INFO - 
      INFO - [Errno 17] File exists: './.python-eggs'
      INFO - 
      INFO - The Python egg cache directory is currently set to:
      INFO - 
      INFO - ./.python-eggs/
      INFO - 
      INFO - Perhaps your account does not have write access to this directory? You can
      INFO - change the cache directory by setting the PYTHON_EGG_CACHE environment
      INFO - variable to point to an accessible directory.

       

      We build the package with the option `zip_safe=False`, but PySpark still uses the egg cache directory regardless.

      Is there any way around this?
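
One possible workaround, offered as a sketch rather than a confirmed fix: the `[Errno 17] File exists` error is consistent with several Python workers trying to create `./.python-eggs` at the same time, although the report does not confirm that this is the cause. Under that assumption, each worker could either create the directory itself while tolerating a concurrent creator, or use a cache directory of its own. The helper names `ensure_egg_cache` and `use_private_egg_cache` below are hypothetical and are not part of PySpark or setuptools.

```python
import errno
import os
import tempfile


def ensure_egg_cache(path):
    """Create the egg cache directory, tolerating a concurrent creator."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Another process may have created the directory first. That is
        # acceptable as long as the path really is a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path


def use_private_egg_cache(base_dir="."):
    """Point PYTHON_EGG_CACHE at a directory unique to this process."""
    ensure_egg_cache(base_dir)
    # mkdtemp picks a unique name, so two workers never share a cache path.
    private_dir = tempfile.mkdtemp(prefix="python-eggs-", dir=base_dir)
    os.environ["PYTHON_EGG_CACHE"] = private_dir
    return private_dir
```

Either helper would need to run on the executor before the egg's resources are first extracted, for example at the top of the function passed to `mapPartitions`, before importing `ua_parser`. A private directory per process costs extra disk space and repeated extraction, so it is a trade-off rather than a free fix.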

          People

            Assignee: Unassigned
            Reporter: Vasilev Dmitiry