IMPALA-13194

Fast-serialize position delete records

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.5.0
    • Component/s: Backend
    • Labels: ghx-label-10

    Description

      Currently, the serialization of position delete records is very wasteful. Each record contains two slots, 'file_path' and 'pos'. During serialization we do the following for every record:

      1. Write a fixed-size tuple that has a StringValue slot and a BigInt slot (20 bytes in total).
      2. Copy the StringValue's contents (the file path) after the tuple.
      3. Convert the StringValue slot so that it holds an offset to the string data instead of a pointer.

      So we end up having something like this:

      +-------------+--------+----------------+-------------+--------+----------------+-----+
      | StringValue | BigInt |   File path    | StringValue | BigInt |   File path    | ... |
      +-------------+--------+----------------+-------------+--------+----------------+-----+
      | ptr, len    |     42 | /.../a.parquet | ptr, len    |     43 | /.../a.parquet | ... |
      +-------------+--------+----------------+-------------+--------+----------------+-----+
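
To make the cost of this layout concrete, here is a minimal sketch that only does the size arithmetic. The function name `PerRecordLayoutBytes` and the constant `kTupleBytes` are hypothetical names used for illustration and are not Impala code; the 20-byte tuple size is the one quoted in the steps above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Fixed-size tuple from the description: StringValue slot + BigInt slot.
constexpr std::size_t kTupleBytes = 20;

// Bytes used by the per-record layout: every record carries its fixed-size
// tuple followed by a full copy of its file path. (Hypothetical helper.)
std::size_t PerRecordLayoutBytes(const std::vector<std::string>& file_paths) {
  std::size_t total = 0;
  for (const std::string& path : file_paths) {
    total += kTupleBytes + path.size();
  }
  return total;
}
```

The total grows by the full path length for every record, even when all records point to the same data file.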
      

      Storing the file paths this way is very redundant, and in the end we have a huge buffer that we need to compress and send over the network. Moreover, we copy the file paths in memory twice:

      1. From input row batch to the KrpcDataStreamSender::Channel's temporary row batch
      2. From the temporary row batch to the outbound row batch (during serialization)

      The position delete files store the delete records in ascending order. This means adjacent records mostly have the same file path. So we could just buffer the position delete records up to the Channel's capacity, then serialize the data in a more efficient way.
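
One possible shape of such a serialization is sketched below, assuming a hypothetical wire layout that writes each file path once per run of adjacent records. The struct `PosDelete`, the helpers `AppendRaw` and `SerializeGrouped`, and the byte layout are all assumptions made for illustration; they are not the format or the API that Impala uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of one buffered position delete record.
struct PosDelete {
  std::string file_path;
  int64_t pos;
};

// Appends the raw bytes of a trivially copyable value (native byte order).
template <typename T>
void AppendRaw(std::vector<uint8_t>* out, const T& value) {
  const uint8_t* p = reinterpret_cast<const uint8_t*>(&value);
  out->insert(out->end(), p, p + sizeof(T));
}

// Assumed layout, repeated for each run of adjacent equal file paths:
//   [uint32 path_len][path bytes][uint32 num_positions][int64 pos]...
std::vector<uint8_t> SerializeGrouped(const std::vector<PosDelete>& records) {
  std::vector<uint8_t> out;
  std::size_t i = 0;
  while (i < records.size()) {
    // Find the end of the run of adjacent records sharing this file path.
    std::size_t j = i + 1;
    while (j < records.size() &&
           records[j].file_path == records[i].file_path) {
      ++j;
    }
    const std::string& path = records[i].file_path;
    AppendRaw(&out, static_cast<uint32_t>(path.size()));
    out.insert(out.end(), path.begin(), path.end());
    AppendRaw(&out, static_cast<uint32_t>(j - i));
    for (std::size_t k = i; k < j; ++k) AppendRaw(&out, records[k].pos);
    i = j;
  }
  return out;
}
```

Under these assumptions, two records that share a 10-byte path take 4 + 10 + 4 + 2 * 8 = 34 bytes, compared with 2 * (20 + 10) = 60 bytes in the per-record layout, and the saving grows with the length of each run.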

          People

            boroknagyz Zoltán Borók-Nagy