  1. Spark
  2. SPARK-25603 Generalize Nested Column Pruning
  3. SPARK-26837

Pruning nested fields from object serializers

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      In SPARK-26619, we made a change to prune unnecessary individual serializers when serializing objects. This issue extends SPARK-26619: we can further prune nested fields from object serializers if they are not used.

      For example, in the following query, we use only one field of a struct column:

      // Assumes a SparkSession named `spark` is in scope (as in spark-shell).
      import spark.implicits._
      val data = Seq((("a", 1), 1), (("b", 2), 2), (("c", 3), 3))
      val df = data.toDS().map(t => (t._1, t._2 + 1)).select("_1._1")
      

      So, instead of having a serializer that creates a two-field struct, we can prune the unused field from it.
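
      One way to observe the effect is to print the query plans and look at the fields built by the object serializer (the SerializeFromObject node) in the optimized plan. The following is a minimal sketch, not part of the original report: the object name NestedSerializerPruningExample, the method selectNestedField, and the local[1] master are assumptions made for illustration, and the exact plan text varies by Spark version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object; the name is illustrative only.
object NestedSerializerPruningExample {

  // Builds the query from the description: map over a Dataset of nested
  // tuples, then select a single nested field of the struct column `_1`.
  def selectNestedField(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val data = Seq((("a", 1), 1), (("b", 2), 2), (("c", 3), 3))
    data.toDS().map(t => (t._1, t._2 + 1)).select("_1._1")
  }

  def main(args: Array[String]): Unit = {
    // A local single-threaded session is assumed here for easy experimentation.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-serializer-pruning")
      .getOrCreate()

    val df = selectNestedField(spark)

    // Prints the parsed, analyzed, optimized, and physical plans so the
    // serializer expressions can be inspected.
    df.explain(true)

    spark.stop()
  }
}
```

      Comparing the optimized plan on a build with and without this change should show whether the serializer still constructs both fields of the struct or only the one that the query selects.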

            People

              Assignee: viirya L. C. Hsieh
              Reporter: viirya L. C. Hsieh
              Votes: 0
              Watchers: 1