SPARK-13574: Improve Parquet dictionary decoding for strings

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

      Description

      Currently, the Parquet reader copies the dictionary value for each data value. This is costly for string columns because decoding expands the dictionary into one copy per row. Instead, the data values should point to the same backing memory.
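
The difference between the two decoding strategies can be sketched as follows. This is an illustrative sketch only, not the code from the linked pull requests: the class DictionaryDecodeSketch, its fields, and its methods are hypothetical names, and the layout (one shared byte buffer plus an offsets array) is an assumption chosen to keep the example small.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; none of these names come from the Spark code base.
class DictionaryDecodeSketch {
    // All distinct strings stored back to back in one shared buffer.
    final byte[] bytes;
    // Entry i occupies bytes[offsets[i] .. offsets[i + 1]).
    final int[] offsets;

    DictionaryDecodeSketch(String... values) {
        byte[][] encoded = new byte[values.length][];
        offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        bytes = new byte[offsets[values.length]];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, bytes, offsets[i], encoded[i].length);
        }
    }

    // Copying decode: allocates a fresh byte[] for every row, so a
    // dictionary entry referenced by N rows is duplicated N times.
    byte[][] decodeByCopy(int[] ids) {
        byte[][] rows = new byte[ids.length][];
        for (int r = 0; r < ids.length; r++) {
            rows[r] = Arrays.copyOfRange(bytes, offsets[ids[r]], offsets[ids[r] + 1]);
        }
        return rows;
    }

    // Referencing decode: each row is an (offset, length) pair into the
    // shared buffer, so no string bytes are copied.
    int[] decodeByReference(int[] ids) {
        int[] refs = new int[ids.length * 2];
        for (int r = 0; r < ids.length; r++) {
            refs[2 * r] = offsets[ids[r]];
            refs[2 * r + 1] = offsets[ids[r] + 1] - offsets[ids[r]];
        }
        return refs;
    }

    // Materializes one row from its (offset, length) reference.
    String read(int[] refs, int row) {
        return new String(bytes, refs[2 * row], refs[2 * row + 1], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DictionaryDecodeSketch dict = new DictionaryDecodeSketch("spark", "parquet");
        int[] ids = {0, 1, 1, 0, 1};

        long copiedBytes = 0;
        for (byte[] row : dict.decodeByCopy(ids)) {
            copiedBytes += row.length;
        }
        int[] refs = dict.decodeByReference(ids);

        System.out.println("string bytes allocated by copying decode: " + copiedBytes);
        System.out.println("string bytes shared by referencing decode: " + dict.bytes.length);
        System.out.println("row 2 read through its reference: " + dict.read(refs, 2));
    }
}
```

In this sketch the copying path allocates string bytes in proportion to the number of rows, while the referencing path stores only two integers per row and leaves the string bytes in the dictionary buffer. The trade-off is that the dictionary buffer must stay alive for as long as any row refers to it.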


          Activity

          apachespark Apache Spark added a comment -

          User 'nongli' has created a pull request for this issue:
          https://github.com/apache/spark/pull/11434

          apachespark Apache Spark added a comment -

          User 'nongli' has created a pull request for this issue:
          https://github.com/apache/spark/pull/11454

          davies Davies Liu added a comment -

          Issue resolved by pull request 11454
          https://github.com/apache/spark/pull/11454


            People

            • Assignee: Nong Li (nongli)
            • Reporter: Nong Li (nongli)
            • Votes: 0
            • Watchers: 4

              Dates

              • Created:
                Updated:
                Resolved:
