  1. Pig
  2. PIG-628

PERFORMANCE: Misc. optimizations including optimization in Tuple serialization, set up of PigMapReduce & PigCombiner, accessing index in POLocalRearrange


Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.2.0
    • 0.2.0
    • None
    • None

    Description

      • Currently DefaultTuple.write() needlessly writes a null/not-null marker. Nulls are already handled by PigNullableWritable for keys and by NullableTuple for values, and null tuples nested inside a tuple are written out as nulls in DataReaderWriter.writeDatum. The marker in DefaultTuple can therefore be dropped.
      • In PigMapReduce and PigCombiner, the roots and leaves of the plans are recalculated in every reduce() call. They can instead be computed once in configure().
      • Each call of POLocalRearrange.getNext() creates a new lroutput tuple whose first position holds the index, the second the key and the third the value. This can be optimized by keeping a tuple member in POLocalRearrange that is reused across getNext() calls. Further, the first position of this tuple can be pre-filled with the index in the setIndex() method of POLocalRearrange at script compile time.
      • In POCombinerPackage, the metadata structures used to determine which parts of the value are present in the key can be set up in the setKeyInfo() method at compile time. This works because POCombinerPackage is currently used only with a "group by", so there is only one input (index = 0) and the metadata need not be looked up at run time by input index.
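
      The second and third points above can be illustrated with a minimal sketch. It does not use Pig's classes: LocalRearrangeSketch and ReducerSketch are hypothetical stand-ins, an Object[] stands in for the lroutput tuple, and a list of operator names stands in for a plan. It only shows the general pattern of reusing one output holder with a pre-filled index and of computing per-job values once in configure(), and should not be read as the contents of the attached patch.

```java
import java.util.List;

/** Hypothetical stand-in for POLocalRearrange; not Pig's actual class. */
final class LocalRearrangeSketch {
    // Single output holder reused across calls: [index, key, value].
    private final Object[] lrOutput = new Object[3];

    /** Called once when the plan is built; pre-fills slot 0 with the index. */
    void setIndex(byte index) {
        lrOutput[0] = index;
    }

    /** Called per record; overwrites only the key and value slots. */
    Object[] getNext(Object key, Object value) {
        lrOutput[1] = key;
        lrOutput[2] = value;
        return lrOutput; // same instance every time, no per-call allocation
    }
}

/** Hypothetical stand-in for a reducer that needs the root and leaf of a plan. */
final class ReducerSketch {
    private final List<String> plan; // assumed: operator names in plan order
    private String root;
    private String leaf;

    ReducerSketch(List<String> plan) {
        this.plan = plan;
    }

    /** Runs once per task: derive root and leaf here instead of in reduce(). */
    void configure() {
        root = plan.get(0);
        leaf = plan.get(plan.size() - 1);
    }

    /** Runs per key: only reads the cached root and leaf. */
    String reduce(String key) {
        return root + "->" + leaf + ":" + key;
    }
}
```

      One consequence of reusing the output holder is that a caller must consume or copy the returned object before the next getNext() call, because the next call overwrites its key and value slots.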

      Attachments

        1. PIG-628.patch
          21 kB
          Pradeep Kamath

        Activity

          People

            pkamath Pradeep Kamath
            pkamath Pradeep Kamath

            Dates

              Created:
              Updated:
              Resolved: