  1. Pig
  2. PIG-4656

Improve String serialization and comparator performance in BinInterSedes

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None

      Description

      Two major optimizations can be done:

      • PIG-1472 added multiple data types to store different sizes (byte, short, int). This can be simplified by using WritableUtils.writeVInt. For byte and short there is no difference compared to the current approach, but for int it could be beneficial, since many numbers could be written with 3 bytes instead of 4. For example, 32768 is written using 3 bytes with WritableUtils.writeVInt, whereas currently 4 bytes (int) are used.
      • String comparison in BinInterSedesTupleRawComparator constructs a String from each buffer before comparing. It should instead compare the bytes directly, as Text.Comparator does.
        str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
        str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
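
The 3-byte figure in the first point can be checked with a small sketch. The class below is not Hadoop code: `VIntSizeSketch` and `vintSize` are hypothetical names, and the method re-derives the encoded length from the layout documented for WritableUtils.writeVLong (a single byte for values from -112 to 127, otherwise one leading byte plus the minimal number of magnitude bytes), so that it runs without Hadoop on the classpath.

```java
// Sketch only: mirrors the documented Hadoop vint layout, not the Hadoop source.
final class VIntSizeSketch {

    /** Number of bytes a Hadoop-style variable-length integer would occupy. */
    static int vintSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1; // small values are stored directly in a single byte
        }
        if (value < 0) {
            value = ~value; // negatives store the one's complement magnitude
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        int dataBytes = (dataBits + 7) / 8;
        return dataBytes + 1; // plus one leading byte encoding sign and length
    }

    public static void main(String[] args) {
        long[] samples = {100, 200, 32767, 32768, 65535, 65536, 16777216};
        for (long v : samples) {
            System.out.println(v + " -> " + vintSize(v) + " byte(s) as vint, 4 as fixed int");
        }
    }
}
```

Under this layout, values from 32768 through 65535 take 3 bytes, while values of 16777216 and above take 5, one more than a fixed int, so the saving depends on the written numbers being mostly small.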
        
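
For the second point, a minimal sketch of comparing the serialized UTF-8 bytes without building String objects. Text.Comparator delegates to WritableComparator.compareBytes in Hadoop; the stand-alone `compareBytes` below is an assumed equivalent (unsigned, lexicographic, shorter prefix first) written so the example has no Hadoop dependency, and `Utf8ByteCompareSketch` and `compareUtf8` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: a stand-in for byte-wise comparison, not the actual patch.
final class Utf8ByteCompareSketch {

    /** Lexicographic comparison of two byte ranges, treating bytes as unsigned. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff; // mask to avoid signed-byte ordering
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // equal up to the shorter length: shorter sorts first
    }

    /** Demo helper: encodes both strings and compares the encoded bytes. */
    static int compareUtf8(String x, String y) {
        byte[] bx = x.getBytes(StandardCharsets.UTF_8);
        byte[] by = y.getBytes(StandardCharsets.UTF_8);
        return compareBytes(bx, 0, bx.length, by, 0, by.length);
    }

    public static void main(String[] args) {
        System.out.println(Integer.signum(compareUtf8("apple", "banana")));
        System.out.println(Integer.signum(compareUtf8("ab", "abc")));
        System.out.println(Integer.signum(compareUtf8("same", "same")));
    }
}
```

One design point to keep in mind: byte order over UTF-8 follows Unicode code point order, whereas String.compareTo follows UTF-16 code unit order. The two agree for most text but can differ when a supplementary character is compared with a character in the U+E000 to U+FFFF range, so the sort order may change slightly for such data.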

        Attachments

        1. PIG-4656-1.patch
          11 kB
          Rohini Palaniswamy

            People

            • Assignee:
              Rohini Palaniswamy
              Reporter:
              Rohini Palaniswamy
            • Votes:
              1
              Watchers:
              2
