Pig / PIG-4656

Improve String serialization and comparator performance in BinInterSedes

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • None
    • 0.18.0
    • None
    • None

    Description

      Two major optimizations can be done:

      • PIG-1472 added multiple data types to store sizes of different widths (byte, short, int). This can be simplified by using WritableUtils.writeVInt. For byte and short sizes there is no difference compared to the current approach, but for int sizes it could be beneficial, since many values could be written in 3 bytes instead of 4. For example, 32768 is written in 3 bytes with WritableUtils.writeVInt, whereas 4 bytes (int) are used currently.
      • String comparison in BinInterSedesTupleRawComparator currently constructs two String objects for every comparison, as in the code below. It should instead compare the serialized bytes directly, like Text.Comparator.
        str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8);
        str2 = new String(bb2.array(), bb2.position(), casz2, BinInterSedes.UTF8);
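The byte-level comparison proposed in the second point could look roughly like the sketch below. It is not taken from the attached patch. The class name Utf8BytesCompareSketch and its compareBytes helper are hypothetical, written here so the example runs without Hadoop on the classpath. The helper follows the same idea as Hadoop's WritableComparator.compareBytes, which Text.Comparator relies on: walk both byte ranges and compare each byte as an unsigned value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Pig or Hadoop.
final class Utf8BytesCompareSketch {

    // Lexicographic comparison of two byte ranges, treating each byte as unsigned.
    // No String objects are allocated.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff;
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // One range is a prefix of the other (or they are equal): shorter sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "banana".getBytes(StandardCharsets.UTF_8);
        int byBytes = compareBytes(x, 0, x.length, y, 0, y.length);
        int byString = "apple".compareTo("banana");
        // Both results are negative: "apple" sorts before "banana" either way.
        System.out.println(Integer.signum(byBytes) + " " + Integer.signum(byString));
    }
}
```

In the comparator, the call would presumably be along the lines of compareBytes(bb1.array(), bb1.position(), casz1, bb2.array(), bb2.position(), casz2), reusing the variables from the snippet above. One point to check before adopting this: unsigned byte order over UTF-8 matches Unicode code point order, while String.compareTo orders by UTF-16 code units, so the two can disagree for strings containing supplementary characters.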
        
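To make the size argument in the first point concrete, here is a minimal sketch that encodes a few sizes both ways and compares the byte counts. VIntSizeSketch and its methods are hypothetical names. encodeVInt re-implements the variable-length layout used by Hadoop's WritableUtils.writeVInt so that the example runs standalone; real code would call WritableUtils.writeVInt directly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration class, not part of Pig or Hadoop.
final class VIntSizeSketch {

    // Standalone re-implementation of the layout written by
    // Hadoop's WritableUtils.writeVInt / writeVLong (assumed equivalent).
    static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long i = value;
        if (i >= -112 && i <= 127) {
            out.write((int) i);          // small values fit in a single byte
            return out.toByteArray();
        }
        int len = -112;                  // marker base for positive values
        if (i < 0) {
            i ^= -1L;                    // one's complement for negatives
            len = -120;                  // marker base for negative values
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;                       // one step per data byte needed
        }
        out.write(len);                  // first byte encodes sign and length
        int dataBytes = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = dataBytes; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.write((int) ((i >> shift) & 0xFF));   // big-endian data bytes
        }
        return out.toByteArray();
    }

    // Fixed-width 4-byte encoding, as DataOutput.writeInt would produce.
    static byte[] encodeFixedInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        for (int size : new int[] {127, 128, 32768, 1 << 24}) {
            System.out.println(size + ": vint=" + encodeVInt(size).length
                    + " bytes, int=" + encodeFixedInt(size).length + " bytes");
        }
    }
}
```

With this layout, 32768 takes 3 bytes as stated in the description. The saving applies to values up to 65535; values from 65536 up to 2^24 - 1 take 4 bytes, and values of 2^24 or more take 5 bytes, one more than a plain int, so the net benefit depends on how sizes are distributed in practice.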

      Attachments

        1. PIG-4656-1.patch
          11 kB
          Rohini Palaniswamy

          People

            Assignee: Rohini Palaniswamy
            Reporter: Rohini Palaniswamy
            Votes: 1
            Watchers: 2