Spark / SPARK-21510

Add isMaterialized() and eager persist() to Dataset APIs

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

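      Currently, beginners using Spark often do not realize that our persist API is lazy: calling persist() only marks the Dataset for caching, and nothing is stored until an action runs. They also do not know the most efficient way to materialize a persisted Dataset. Sometimes they just call collect(), which is very expensive when the data set is big because it pulls every row back to the driver.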
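
      To illustrate the gap, here is a minimal sketch of how an eager persist can be approximated with the existing API: call persist() and then run a cheap action such as count(), which scans every partition without shipping rows to the driver. The helper name persistEagerly, the default storage level, and the local master setting are assumptions made for this example; they are not part of Spark and not necessarily what this issue would have added.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.storage.StorageLevel

object EagerPersistSketch {
  // Hypothetical helper (not a Spark API): persist a Dataset and force it to be cached.
  def persistEagerly[T](
      ds: Dataset[T],
      level: StorageLevel = StorageLevel.MEMORY_AND_DISK): Dataset[T] = {
    val persisted = ds.persist(level) // lazy: only marks the Dataset for caching
    persisted.count()                 // action: evaluates every partition and fills the cache
    persisted
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("eager-persist-sketch")
      .getOrCreate()

    val cached = persistEagerly(spark.range(0, 1000000))
    println(s"storage level after eager persist: ${cached.storageLevel}")

    cached.unpersist()
    spark.stop()
  }
}
```

      count() is used instead of collect() because it returns a single number to the driver; the cost is one full pass over the data, which is the same pass that populates the cache.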

      In addition, we need an API to verify whether a Dataset has been both cached and materialized.
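
      As a rough sketch of what can be checked today, Dataset.storageLevel reports whether a Dataset has been marked for caching, but it does not say whether the cached data has actually been computed. The helper name isMarkedForCaching and the local session settings below are assumptions for illustration, not existing or proposed Spark APIs.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheStatusSketch {
  // Hypothetical helper (not a Spark API): true once persist() or cache() has been called.
  // It cannot tell whether any partition has actually been computed and stored.
  def isMarkedForCaching(ds: Dataset[_]): Boolean =
    ds.storageLevel != StorageLevel.NONE

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("cache-status-sketch")
      .getOrCreate()

    val ds = spark.range(0, 1000)
    println(s"before persist: ${isMarkedForCaching(ds)}")

    ds.persist() // lazy: no job has run yet, so nothing is stored on the executors
    println(s"after persist, before any action: ${isMarkedForCaching(ds)}")

    ds.unpersist()
    spark.stop()
  }
}
```

      The check changes as soon as persist() is called, before any job has run, which is why a separate isMaterialized()-style check would be needed to know that the data is really in the cache.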

          People

            Assignee: Xiao Li (smilegator)
            Reporter: Xiao Li (smilegator)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: