Apache Arrow / ARROW-4802

[Python] Hadoop classpath discovery broken when HADOOP_HOME is a symlink

Details

    Description

      From https://github.com/apache/arrow/issues/3748:

      CLASSPATH discovery was recently changed in d911850 to resolve ARROW-2113 and ARROW-3768.

      Specifically, the logic used to find all jars under HADOOP_HOME invokes the find command directly (arrow/python/pyarrow/hdfs.py, line 144 in d911850):

        find_args = ('find', os.environ['HADOOP_HOME'], '-name', '*.jar')

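      This does not work when HADOOP_HOME is a symlink: find does not follow symbolic links by default, so it never descends into the target directory and no jars are found. In that case '-L' needs to be passed to the find command.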
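A minimal sketch of the suggested change, assuming the jar search stays a subprocess call to find. The helper names build_find_args and find_hadoop_jars are hypothetical and are not taken from pyarrow; only the '-L' flag comes from the report above.

```python
import os
import subprocess


def build_find_args(hadoop_home):
    # Hypothetical helper, not pyarrow's actual code.
    # '-L' must come before the starting path; it makes find follow
    # symbolic links, including a symlinked starting directory.
    return ('find', '-L', hadoop_home, '-name', '*.jar')


def find_hadoop_jars(hadoop_home):
    # Run find and return the matching jar paths, sorted for stable output.
    output = subprocess.check_output(build_find_args(hadoop_home))
    return sorted(output.decode('utf-8').splitlines())
```

Called as find_hadoop_jars(os.environ['HADOOP_HOME']), this should return the jars whether HADOOP_HOME is a real directory or a symlink to one. Resolving the path first with os.path.realpath would be another option, at the cost of reporting the resolved paths rather than the ones under the symlink.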

      CLASSPATH can still be set explicitly, but this is a change in behavior, since HADOOP_HOME symlinks previously worked without issue.
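As an illustration of the explicit workaround, the sketch below joins a list of jar paths into a CLASSPATH value. The function set_classpath and its env parameter are hypothetical, introduced only for this example; the assumption is that CLASSPATH has to be in place before the HDFS connection is opened.

```python
import os


def set_classpath(jar_paths, env=None):
    # Hypothetical helper: writes CLASSPATH into the given mapping,
    # defaulting to the current process environment.
    env = os.environ if env is None else env
    # Java expects classpath entries separated by the platform path separator.
    env['CLASSPATH'] = os.pathsep.join(jar_paths)
    return env['CLASSPATH']
```

Passing a plain dict as env keeps the example side-effect free; in real use the default (os.environ) is what a subsequent connection would see.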

          People

            Tiger068 Tiger068
            emkornfield@gmail.com Micah Kornfield

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1.5h