Spark / SPARK-5739

Size exceeds Integer.MAX_VALUE in File Map

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.1.1
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels:
      None
    • Environment:

      Spark 1.1.1 on a 12-node cluster. Each node has 128 GB of RAM and 24 cores. The data set is only 40 GB, and each node runs 48 parallel tasks.

      Description

      I ran the KMeans algorithm on randomly generated data, and this problem occurred after several iterations. I tried several times, and the problem reproduced each time.

      Because the data is randomly generated, I suspect this is a bug. Alternatively, if random data can legitimately produce a size larger than Integer.MAX_VALUE, could we check the size before mapping the file?

      015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes
      [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
      java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
      at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
      at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
      at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
      at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
      at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
      at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
      at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598)
      at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869)
      at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
      at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
      at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
      at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
      at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
      at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
      at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270)
      at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
      at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
      at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338)
      at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348)
      at KMeansDataGenerator$.main(kmeans.scala:105)
      at KMeansDataGenerator.main(kmeans.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
      at java.lang.reflect.Method.invoke(Method.java:619)
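
The trace above ends in FileChannelImpl.map, which rejects any single mapping larger than Integer.MAX_VALUE bytes. The check suggested in the description might look roughly like the sketch below. This is an illustration only, not Spark's code: the names MapSizeCheck, fitsInSingleMapping, and mapChecked are hypothetical, and the only behavior taken from the JDK is that FileChannel.map throws IllegalArgumentException for an oversized request.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; none of these names exist in Spark.
class MapSizeCheck {

    // True if a region of `length` bytes is within the size FileChannel.map accepts.
    public static boolean fitsInSingleMapping(long length) {
        return length >= 0 && length <= Integer.MAX_VALUE;
    }

    // Maps a read-only region, but fails first with a message that names the block size.
    public static MappedByteBuffer mapChecked(FileChannel channel, long offset, long length)
            throws IOException {
        if (!fitsInSingleMapping(length)) {
            throw new IllegalArgumentException(
                "Cannot map " + length + " bytes in one call; limit is " + Integer.MAX_VALUE);
        }
        return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("map-size-check", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        long tooBig = Integer.MAX_VALUE + 1L;

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // A small region maps normally.
            System.out.println("mapped bytes: " + mapChecked(ch, 0, 4).remaining());

            // The raw call rejects the oversized request, as in the trace.
            try {
                ch.map(FileChannel.MapMode.READ_ONLY, 0, tooBig);
            } catch (IllegalArgumentException e) {
                System.out.println("FileChannel.map: " + e.getMessage());
            }

            // The guarded call fails before reaching FileChannel.map.
            try {
                mapChecked(ch, 0, tooBig);
            } catch (IllegalArgumentException e) {
                System.out.println("mapChecked: " + e.getMessage());
            }
        }
    }
}
```

A guard like this only makes the failure easier to diagnose by reporting the offending size. It does not make an oversized block readable, so the block would still need to be smaller or be read in several pieces.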

              People

              • Assignee: Unassigned
              • Reporter: DjvuLee
              • Votes: 0
              • Watchers: 3
