  1. Spark
  2. SPARK-10965

Optimize filesEqualRecursive

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.5.2
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Target Version/s:

      Description

      When we try to download dependencies and a file already exists at the destination, we check whether the two are equal (recursively, if they are directories). For files, we compare their bytes. These dependencies can be jars, which can be really large, and byte-by-byte comparisons can be very slow.

      I think it'd be better to do a checksum.
      Here's the code in question:
      https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L500
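
      A minimal sketch of what the checksum idea could look like, not the code in Utils.scala. The names `ChecksumCompare`, `sha256` and `filesEqualByChecksum` are hypothetical, and SHA-256 is just one possible digest. Note that hashing still reads every byte of both files, so the sketch compares file sizes first as a cheap early exit; a larger win would likely need the checksum of one side to be known in advance.

```scala
import java.io.{File, FileInputStream}
import java.security.{DigestInputStream, MessageDigest}

// Hypothetical helper object; none of these names come from Spark.
object ChecksumCompare {

  // Stream a file through SHA-256 and return the digest bytes.
  def sha256(file: File): Array[Byte] = {
    val digest = MessageDigest.getInstance("SHA-256")
    val in = new DigestInputStream(new FileInputStream(file), digest)
    try {
      val buffer = new Array[Byte](8192)
      // Reading through the stream updates the digest as a side effect.
      while (in.read(buffer) != -1) {}
    } finally {
      in.close()
    }
    digest.digest()
  }

  // Recursive equality check: directories are compared entry by entry
  // (sorted by name), regular files by size and then by checksum.
  def filesEqualByChecksum(a: File, b: File): Boolean = {
    if (a.isDirectory && b.isDirectory) {
      val as = Option(a.listFiles()).getOrElse(Array.empty[File]).sortBy(_.getName)
      val bs = Option(b.listFiles()).getOrElse(Array.empty[File]).sortBy(_.getName)
      as.length == bs.length && as.zip(bs).forall { case (x, y) =>
        x.getName == y.getName && filesEqualByChecksum(x, y)
      }
    } else if (a.isFile && b.isFile) {
      // Different sizes can never be equal, so skip hashing in that case.
      a.length() == b.length() && MessageDigest.isEqual(sha256(a), sha256(b))
    } else {
      false
    }
  }
}
```

      In this sketch, two large jars of different sizes are rejected without reading either file; same-size files are still read in full, once each.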

            People

            • Assignee: Unassigned
            • Reporter: Mark Grover (mgrover)
            • Votes: 0
            • Watchers: 1
