  1. Spark
  2. SPARK-8160 Tungsten style external aggregation
  3. SPARK-9517

BytesToBytesMap should encode data the same way as UnsafeExternalSorter


Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.5.0
    • SQL
    • None
    • Spark 1.5 release

    Description

      BytesToBytesMap currently encodes key/value data in the following format:

      8B key length, key data, 8B value length, value data
      

      UnsafeExternalSorter, on the other hand, encodes data this way:

      4B record length, data
      

      As a result, we cannot pass records encoded by BytesToBytesMap directly into UnsafeExternalSorter for sorting. However, if we rearrange data slightly, we can then pass the key/value records directly into UnsafeExternalSorter:

      4B key+value length, 4B key length, key data, value data
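
      To make the layouts concrete, here is a minimal Python sketch of the current and proposed encodings. It is an illustration only, not Spark's implementation. The function names are hypothetical, and little-endian byte order is assumed. The first field holds exactly key length + value length, as written above; the description does not say whether that length should also count the 4B key length field.

```python
import struct

# Assumption: little-endian, fixed-width signed integers ("<q" = 8B, "<i" = 4B).


def encode_old(key: bytes, value: bytes) -> bytes:
    """Current map layout: 8B key length, key data, 8B value length, value data."""
    return struct.pack("<q", len(key)) + key + struct.pack("<q", len(value)) + value


def encode_new(key: bytes, value: bytes) -> bytes:
    """Proposed layout: 4B key+value length, 4B key length, key data, value data."""
    header = struct.pack("<ii", len(key) + len(value), len(key))
    return header + key + value


def decode_new(record: bytes):
    """Split a record in the proposed layout back into (key, value)."""
    total_len, key_len = struct.unpack_from("<ii", record, 0)
    key_start = 8  # skip the two 4B header fields
    key = record[key_start:key_start + key_len]
    value = record[key_start + key_len:key_start + total_len]
    return key, value


record = encode_new(b"id", b"spark")
key, value = decode_new(record)
```

      Because the proposed record starts with a single 4B length field, a reader that expects "4B record length, data" can consume it without first parsing separate key and value lengths. The key length stored next lets the map recover the key/value boundary.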
      


          People

            rxin Reynold Xin