  1. Spark
  2. SPARK-30264

Unexpected behaviour when using persist MEMORY_ONLY in RDD


Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0, 2.4.4
    • Fix Version/s: None
    • Component/s: Java API

    Description

      The persist method behaves differently with MEMORY_ONLY than with MEMORY_ONLY_SER:

      persist(StorageLevel.MEMORY_ONLY()).distinct().count() returns 1

      while persist(StorageLevel.MEMORY_ONLY_SER()).distinct().count() returns 100.

      I expect both to return the same result. The correct result is 100; for some reason, MEMORY_ONLY causes all the objects in the RDD to be the same one.
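The report does not identify a cause, and GenericMain.java is not reproduced here. One possible explanation, assuming the RDD is read from the attached Avro file through a Hadoop input format that reuses a single mutable record object, is that MEMORY_ONLY keeps references to that one object while MEMORY_ONLY_SER serializes each record's contents as it is read. The plain-Java sketch below has no Spark dependency and only illustrates that mechanism; `RecordReuseDemo`, `MutableRecord`, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecordReuseDemo {

    // Hypothetical mutable record, standing in for whatever the reader returns.
    static final class MutableRecord {
        int id;
    }

    // Counts the distinct ids held by a list of cached records.
    private static int countDistinctIds(List<MutableRecord> cache) {
        Set<Integer> ids = new HashSet<>();
        for (MutableRecord r : cache) {
            ids.add(r.id);
        }
        return ids.size();
    }

    // The reader reuses one instance and the cache stores references to it
    // (assumed analogue of deserialized, MEMORY_ONLY-style storage).
    static int distinctWhenStoringReferences(int n) {
        MutableRecord reused = new MutableRecord();
        List<MutableRecord> cache = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.id = i;      // the reader overwrites the same object
            cache.add(reused);  // every cache entry points at that object
        }
        return countDistinctIds(cache);
    }

    // The reader still reuses one instance, but each record is copied before
    // it is stored (assumed analogue of serializing each record on write).
    static int distinctWhenStoringCopies(int n) {
        MutableRecord reused = new MutableRecord();
        List<MutableRecord> cache = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.id = i;
            MutableRecord copy = new MutableRecord();
            copy.id = reused.id; // snapshot of the current contents
            cache.add(copy);
        }
        return countDistinctIds(cache);
    }

    public static void main(String[] args) {
        System.out.println(distinctWhenStoringReferences(100)); // prints 1
        System.out.println(distinctWhenStoringCopies(100));     // prints 100
    }
}
```

If this is what happens in the reported job, copying each record in a map step before calling persist should make both storage levels agree.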

      Attachments

        1. users8.avro
          0.6 kB
          moshe ohaion
        2. GenericMain.java
          4 kB
          moshe ohaion

          People

            Assignee: Unassigned
            Reporter: moshe ohaion
