Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3823

Use newTupleNoCopy instead of newTuple wherever possible to avoid object copy penalty

Add voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      The readFields() method of all Tuple implementations do a mFields.clear() and then load data into the same tuple. If that was changed to do

      if (mFields.size() > 0) {
                  mFields = new ArrayList<Object>(mFields.size());
              }
      

      many places in code where we do TupleFactory.newTuple(List c); can be replaced with TupleFactory.newTupleNoCopy(List c);. This will avoid a expensive System.arrayCopy() call which is a native method.

      POPackage.java getValueTuple()

      if( keyLookupSize > 0) {
      
                  // we have some fields of the "value" in the
                  // "key".
                  int finalValueSize = keyLookupSize + val.size();
                  copy = mTupleFactory.newTuple(finalValueSize);
                  int valIndex = 0; // an index for accessing elements from
                                    // the value (val) that we have currently
                  for(int i = 0; i < finalValueSize; i++) {
                      Integer keyIndex = keyLookup.get(i);
                      if(keyIndex == null) {
                          // the field for this index is not in the
                          // key - so just take it from the "value"
                          // we were handed
                          copy.set(i, val.get(valIndex));
                          valIndex++;
                      } .....
              } else if (isProjectStar) {
                  // the whole "value" is present in the "key"
                  copy = mTupleFactory.newTuple(keyAsTuple.getAll());   
              } else {
                  // there is no field of the "value" in the
                  // "key" - so just make a copy of what we got
                  // as the "value"
                  copy = mTupleFactory.newTuple(val.getAll());
              }
      

      Some cases might take a slight hit in GC due to new ArrayList initialization. For eg: if( keyLookupSize > 0) condition in above code as it does not do a newTuple. But other 2 cases would greatly benefit as we can avoid arraylist copy. Same in POFRJoin.

      Will run some tests to validate the theory and post a patch.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            rohini Rohini Palaniswamy
            rohini Rohini Palaniswamy

            Dates

              Created:
              Updated:

              Slack

                Issue deployment