[PIG-3409] org.apache.pig.data.DefaultTuple hashcode perfomance issue - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 0.11
Fix Version/s: None
Component/s: impl
Labels:
None

Description

I've met serious perfomance issue.
please see visualvm screenshot.

Here is hashCode implementation from the class:

 @Override
    public int hashCode() {
        int hash = 17;
        for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
            Object o = it.next();
            if (o != null) {
                hash = 31 * hash + o.hashCode();
            }
        }
        return hash;
    }

I don't see any reason here to iterate over the whole tuple, aggregate hash value and then return it.

I can fix it, if it's possible to take part in dev process. I'm new to it

The idea for any join:
If we have a plan we know for sure which relations would be joined.
It means that we can precalculate hashcode values.
The difference is: m+n hashcode calculations or m*n (current implementation).
It think it should bring significant perfomance boost.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Sergey

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 02/Aug/13 10:46

Updated:: 09/Aug/13 08:18

Time Tracking

Estimated:

Remaining:

Logged:

Not Specified