Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-15380

Generate code that stores a float/double value in each column from ColumnarBatch when DataFrame.cache() is used

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • None
    • None
    • SQL
    • None

    Description

      When DataFrame.cache() is called, data will be stored as column-oriented storage in CachedBatch. The current Catalyst generates Java program to store a computed value to an InternalRow and then the value is stored into CachedBatch even if data is read from ColumnarBatch for ParquetReader.
      This JIRA generates Java code to store a value into a ColumnarBatch, and store data from the ColumnarBatch to the CachedBatch. This JIRA handles only float and double types for a value.

      Attachments

        Activity

          People

            Unassigned Unassigned
            kiszk Kazuaki Ishizaki
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: