Spark / SPARK-13574

Improve parquet dictionary decoding for strings

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Currently, the Parquet reader copies the dictionary value for each data value. This is costly for string columns, because the dictionary is exploded during decode: every row gets its own copy of the string bytes, even when many rows share the same dictionary entry. Instead, the data values should point into the dictionary's backing memory.
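      The difference between the two decoding strategies can be sketched as follows. This is a minimal illustration, not Spark's reader code: the `StringDictionary` class, its packed-buffer layout, and the two `decode_*` functions are hypothetical names introduced here, and Python's `memoryview` stands in for a pointer into shared backing memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringDictionary:
    """Hypothetical dictionary page: all entries packed into one buffer."""
    data: bytes
    offsets: list  # entry i occupies data[offsets[i]:offsets[i + 1]]

    @classmethod
    def from_strings(cls, values):
        offsets = [0]
        chunks = []
        for value in values:
            encoded = value.encode("utf-8")
            chunks.append(encoded)
            offsets.append(offsets[-1] + len(encoded))
        return cls(b"".join(chunks), offsets)

    def view(self, dict_id):
        """Return a zero-copy slice of the backing buffer for one entry."""
        start, end = self.offsets[dict_id], self.offsets[dict_id + 1]
        return memoryview(self.data)[start:end]


def decode_copying(dictionary, ids):
    # Behavior described as the problem: a fresh bytes object per row,
    # so string memory grows with the row count.
    return [bytes(dictionary.view(i)) for i in ids]


def decode_referencing(dictionary, ids):
    # Proposed behavior: each row holds a view into dictionary.data,
    # so no string bytes are copied during decode.
    return [dictionary.view(i) for i in ids]


dictionary = StringDictionary.from_strings(["spark", "parquet", "sql"])
ids = [1, 1, 0, 2, 1]  # dictionary-encoded column values

copied = decode_copying(dictionary, ids)
referenced = decode_referencing(dictionary, ids)
```

      Both lists decode to the same strings, but every element of `referenced` is backed by the single `dictionary.data` buffer, whereas `copied` holds one separate byte string per row.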

            People

            • Assignee: Nong Li (nongli)
            • Reporter: Nong Li (nongli)
            • Votes: 0
            • Watchers: 4
