Apache Arrow / ARROW-5049

[Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Versions: 0.12.0, 0.12.1, 0.13.0
    • Fix Version: 0.14.0
    • Component: Python

    Description

      When I initialize a pyarrow FileSystem to connect to an HDFS cluster in Spark, libhdfs throws this error:

      org/apache/hadoop/fs/FileSystem class not found
      

      I printed out the CLASSPATH; its value uses directory and wildcard entries:

      ../share/hadoop/hdfs;spark/spark-2.0.2-bin-hadoop2.7/jars...
      

      The value is set by Spark, but libhdfs must load classes from explicit jar files.

      The root cause: in hdfs.py, _maybe_set_hadoop_classpath() only checks for the substring "hadoop" in the CLASSPATH, not for an actual jar file:

      def _maybe_set_hadoop_classpath():
          if 'hadoop' in os.environ.get('CLASSPATH', ''):
              return
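      Because of this, a Spark-provided CLASSPATH containing "hadoop" in a directory name passes the check even though no jar is listed, and pyarrow never resolves the real classpath. A minimal sketch of a stricter check follows; the `_classpath_has_hadoop_jar` helper and the `hadoop-common` regex pattern are assumptions for illustration, not necessarily the exact patch that was merged:

      ```python
      import os
      import re

      # Hypothetical helper (not in pyarrow): returns True only when the
      # classpath names a concrete hadoop-common jar file, rather than a
      # directory or wildcard entry that merely contains the word 'hadoop'.
      def _classpath_has_hadoop_jar(classpath):
          return re.search(r'hadoop-common[^/]+\.jar', classpath) is not None

      def _maybe_set_hadoop_classpath():
          # Skip only if an explicit Hadoop jar is already on the CLASSPATH;
          # otherwise fall through and resolve jar paths (e.g. via
          # `hadoop classpath --glob`) as pyarrow does today.
          if _classpath_has_hadoop_jar(os.environ.get('CLASSPATH', '')):
              return
          ...
      ```

      With this check, the Spark-style value "../share/hadoop/hdfs;spark/spark-2.0.2-bin-hadoop2.7/jars" no longer short-circuits the function, so pyarrow would go on to set explicit jar paths for libhdfs.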

      People

        Assignee: Tiger068
        Reporter: Tiger068

      Time Tracking

        Logged: 3h