  Spark / SPARK-19705

Preferred location supporting HDFS Cache for FileScanRDD


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Although NewHadoopRDD and HadoopRDD take the HDFS cache into account when calculating preferredLocations, FileScanRDD does not.
      The enhancement is straightforward to implement for large files, where a FilePartition contains only a single HDFS file.
      The enhancement would also significantly improve performance for cached HDFS partitions.
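A minimal sketch of the proposed behaviour, written as a language-neutral Python model rather than Spark code. `Block`, `FileSlice`, and `preferred_locations` are hypothetical names. The `hdfs_cache_` prefix is assumed to mirror how Spark's scheduler tags HDFS-cache locations. The cap of three hosts and the `localhost` filter are assumed to follow FileScanRDD's existing byte-count ranking. Ranking cached hosts ahead of hosts with more on-disk bytes is one possible policy, not one the issue specifies. Following the issue, the cache preference applies only when the partition holds a single file.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List

# Assumed tag, modeled on the "hdfs_cache_" prefix Spark's scheduler uses
# to mark HDFS-cache task locations.
HDFS_CACHE_TAG = "hdfs_cache_"


@dataclass
class Block:
    """Hypothetical stand-in for one HDFS block location."""
    offset: int
    length: int
    hosts: List[str]
    cached_hosts: List[str] = field(default_factory=list)


@dataclass
class FileSlice:
    """Hypothetical stand-in for one file slice inside a FilePartition."""
    start: int
    length: int
    blocks: List[Block]


def preferred_locations(files: List[FileSlice], max_hosts: int = 3) -> List[str]:
    """Rank hosts for a partition, preferring HDFS-cached hosts for single-file partitions."""
    host_bytes = defaultdict(int)    # bytes readable from any replica on a host
    cached_bytes = defaultdict(int)  # bytes readable from the HDFS cache on a host
    for f in files:
        end = f.start + f.length
        for b in f.blocks:
            # Count only the part of the block that this slice actually reads.
            overlap = min(end, b.offset + b.length) - max(f.start, b.offset)
            if overlap <= 0:
                continue
            for h in b.hosts:
                if h != "localhost":
                    host_bytes[h] += overlap
            for h in b.cached_hosts:
                if h != "localhost":
                    cached_bytes[h] += overlap

    # Apply the cache preference only to single-file partitions;
    # otherwise fall back to plain byte counts.
    if len(files) != 1:
        cached_bytes.clear()

    candidates = set(host_bytes) | set(cached_bytes)
    # Most cached bytes first, then most total bytes, then host name for a stable order.
    ranked = sorted(candidates, key=lambda h: (-cached_bytes[h], -host_bytes[h], h))
    return [HDFS_CACHE_TAG + h if cached_bytes[h] > 0 else h for h in ranked[:max_hosts]]


if __name__ == "__main__":
    f = FileSlice(0, 256, [Block(0, 128, ["h1", "h2", "h3"], ["h3"]),
                           Block(128, 128, ["h1", "h2", "h4"])])
    print(preferred_locations([f]))  # ['hdfs_cache_h3', 'h1', 'h2']
```

In this model, a single-file partition whose first block is cached on `h3` sends the task to `h3` first. A multi-file partition keeps the plain byte-count ordering.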


          People

            Assignee: Unassigned
            Reporter: gagan taneja (tanejagagan)
            Sandy Ryza
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: