Beam / BEAM-673

Data locality for DoFns

    Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: runner-spark
    • Labels: None

      Description

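      When input is read from a distributed filesystem that exposes block locations, such as HDFS, the Spark runner should be able to hint to Spark the preferred locations of each split, so that tasks can be scheduled on or near the nodes that hold the data.
      Spark already does this for Hadoop RDDs; see for example: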
      https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L249
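
      To make the idea concrete, here is a minimal, self-contained sketch of how a split's host list could be turned into a locality hint. It does not use the Spark or Beam APIs: `LocatedSplit` and `LocalityHints` are hypothetical names introduced only for illustration, loosely modeled on Hadoop's `InputSplit.getLocations()`. In Spark itself, an RDD supplies such hints by overriding `getPreferredLocations`, which appears to be what the linked code does. Dropping "localhost" is an assumption about what a useful hint looks like, not a statement about what the runner should do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a source split that knows which hosts hold its data,
// similar in spirit to Hadoop's InputSplit.getLocations().
interface LocatedSplit {
    String[] getLocations();
}

final class LocalityHints {
    private LocalityHints() {}

    // Returns the host names to suggest to the scheduler for this split.
    // An empty list means "no preference".
    static List<String> preferredLocations(LocatedSplit split) {
        List<String> preferred = new ArrayList<>();
        String[] hosts = split.getLocations();
        if (hosts == null) {
            return preferred; // the split carries no location information
        }
        for (String host : hosts) {
            // Assumption: "localhost" does not identify a cluster node, so skip it.
            boolean usable = host != null && !host.isEmpty() && !"localhost".equals(host);
            if (usable && !preferred.contains(host)) {
                preferred.add(host); // keep first occurrence, preserve order
            }
        }
        return preferred;
    }

    public static void main(String[] args) {
        LocatedSplit split = () -> new String[] {"datanode-1", "localhost", "datanode-2"};
        System.out.println(preferredLocations(split));
    }
}
```

      In a real implementation the result would be returned from the RDD's `getPreferredLocations` for the partition that wraps the split; how the Spark runner would carry this information from a Beam source to the RDD is left open here.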

              People

              • Assignee: Unassigned
              • Reporter: Amit Sela (amitsela)
              • Votes: 0
              • Watchers: 8