Pig / PIG-5242

Evaluate DataFrame API for Pig on Spark


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: spark
    • Labels: None

    Description

      Currently, Pig on Spark uses RDDs. The higher-level DataFrame API offers many optimization opportunities, such as the Catalyst query optimizer and more efficient serialization (Project Tungsten). We should investigate how Pig on Spark could migrate from RDDs to DataFrames, and whether doing so improves performance.


          People

            Assignee: Nándor Kollár (nkollar)
            Reporter: Nándor Kollár (nkollar)
            Votes: 0
            Watchers: 1
