MRQL / MRQL-98

Improve Data Serialization in Spark Evaluation

Details

    • Improvement
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.9.8
    • None
    • Run-Time/Spark
    • None

    Description

      MRQL data (MRData) are serialized as Writable (for Hadoop Map-Reduce), Java Serializable (for Spark), and CopyableValue (for Flink). Until now, the Spark MRQL engine used a wrapper for MRData (called MRContainer) to serialize data using the Writable methods. Some data used in Spark mode, though, were left unwrapped, so Spark fell back to the default Java serialization for them, which was inefficient. With this patch, MRData becomes Serializable with custom, efficient serialization methods. My performance evaluation of the Pagerank query over 10 million links, run on a cluster with 16 cores, shows a 38% improvement compared to the old Spark evaluation.
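
      As a rough illustration of the approach described above (not the actual patch), a class can stay Java Serializable while routing serialization through compact Writable-style methods by defining private writeObject/readObject hooks. The names SerializationSketch and LongValue are hypothetical stand-ins for MRData subclasses, and the write/readFields pair only mimics the shape of Hadoop's Writable interface so the sketch has no Hadoop dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical stand-in for an MRData value; not the real MRQL class.
    static class LongValue implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: the field is skipped by default Java serialization
        // and is written only by the custom hooks below.
        private transient long value;

        LongValue(long value) { this.value = value; }

        long get() { return value; }

        // Writable-style compact encoding (same shape as Hadoop's Writable).
        void write(DataOutput out) throws IOException { out.writeLong(value); }

        void readFields(DataInput in) throws IOException { value = in.readLong(); }

        // Java serialization hooks: delegate to the compact encoding.
        // ObjectOutputStream is a DataOutput; ObjectInputStream is a DataInput.
        private void writeObject(ObjectOutputStream out) throws IOException {
            write(out);
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            readFields(in);
        }
    }

    // Serialize an object with standard Java serialization.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserialize an object with standard Java serialization.
    static Object deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LongValue copy = (LongValue) deserialize(serialize(new LongValue(42L)));
        System.out.println(copy.get());
    }
}
```

      With hooks like these, any code path that uses plain Java serialization (as Spark does by default) picks up the custom encoding automatically, so no separate wrapper object is needed for values that would otherwise be left unwrapped.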

            People

              fegaras Leonidas Fegaras
