HIVE-1751

Optimize ColumnarStructObjectInspector.getStructFieldData()

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      ColumnarStructObjectInspector.getStructFieldData() is a heavily used function and is expensive.
      Optimizing it, together with ColumnarStruct.uncheckedGetField(), which it calls, should benefit most queries.

        Activity

        Namit Jain added a comment -

        Committed. Thanks Siying

        Namit Jain added a comment -

        +1. Running tests.

        Siying Dong added a comment -

        ExprNodeColumnEvaluator.evaluate() is a very heavily used function. For most queries, it is called multiple times per row. In the group-by query in the benchmark, it is called as many as 10 times per row.

        This function call sometimes takes 17%-20% of CPU time. Usually ExprNodeColumnEvaluator.evaluate() itself takes 2%-3%, UnionStructObjectInspector.getStructFieldData() itself takes 2%-3%, and ColumnarStruct.uncheckedGetField() itself takes 3%.

        It's hard to come up with a general solution that reduces the cost in a structural way. I tried several small code rewrites and hope we can get slight improvements:

        1. nullSequence is passed in once through the constructor instead of on every call.
        2. Restructure ColumnarStruct a little.
        3. In ExprNodeColumnEvaluator, special-case the single-level field reference, which is the common case when referring to a column.
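
        To illustrate item 1, here is a minimal sketch of moving a null marker from a per-call argument to a constructor argument. It is not the patch attached to this issue and not Hive code: the classes NullSequenceSketch, PerCallColumn and Column, and the marker value used in main, are assumptions made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not Hive code: all class and method names are invented for illustration.
final class NullSequenceSketch {

    /** Before: the caller hands the null marker to every field access. */
    static final class PerCallColumn {
        String get(byte[] raw, byte[] nullSequence) {
            return Arrays.equals(raw, nullSequence) ? null : new String(raw, StandardCharsets.UTF_8);
        }
    }

    /** After: the null marker is fixed once, when the column object is built. */
    static final class Column {
        private final byte[] nullSequence;

        Column(byte[] nullSequence) {
            this.nullSequence = nullSequence.clone();
        }

        String get(byte[] raw) {
            // Same comparison as above, but with one argument fewer on the hot path.
            return Arrays.equals(raw, nullSequence) ? null : new String(raw, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Example marker value, chosen only for this demo.
        byte[] marker = "\\N".getBytes(StandardCharsets.UTF_8);
        Column column = new Column(marker);
        System.out.println(column.get("42".getBytes(StandardCharsets.UTF_8)));
        System.out.println(column.get(marker));
    }
}
```

        Both variants do the same comparison; the second only shortens the call signature and keeps the marker next to the data it describes, which is why any speedup from this kind of change is expected to be small.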

        When optimizing functions that already take only 3%, it's hard to verify the performance gain, since experiments vary slightly from run to run anyway.

        I think 1 and 2 make the code more readable anyway. I ran the benchmark many times and consistently saw about a 1% improvement as well.
        3 might make the code less readable, but I saw about a 5% improvement on a simple group-by query.
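
        As a rough picture of what item 3 looks like, the sketch below special-cases a single-level column reference so that it skips the general path walk. It is an illustration under stated assumptions, not the committed change: ColumnEvalSketch, Row, ListRow and ColumnEvaluator are hypothetical names, and a column reference is modeled simply as a path of field positions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hive code: all names below are invented for illustration.
final class ColumnEvalSketch {

    /** Stand-in for a struct-like row: a field is looked up by position. */
    interface Row {
        Object field(int index);
    }

    /** List-backed row; a nested struct is just another Row stored in a field. */
    static final class ListRow implements Row {
        private final List<Object> values;

        ListRow(Object... values) {
            this.values = Arrays.asList(values);
        }

        @Override
        public Object field(int index) {
            return values.get(index);
        }
    }

    /** Evaluates a column reference given as a path of field positions, e.g. {2} or {1, 0}. */
    static final class ColumnEvaluator {
        private final int[] path;
        private final boolean singleLevel;
        private final int firstIndex;

        ColumnEvaluator(int... path) {
            if (path.length == 0) {
                throw new IllegalArgumentException("path must not be empty");
            }
            this.path = path.clone();
            this.singleLevel = path.length == 1;
            this.firstIndex = path[0];
        }

        Object evaluate(Row row) {
            if (singleLevel) {
                // Fast path for the common case: one lookup, no loop.
                return row.field(firstIndex);
            }
            // General path: walk nested structs one level at a time.
            Object current = row;
            for (int index : path) {
                if (current == null) {
                    return null;
                }
                current = ((Row) current).field(index);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        Row row = new ListRow("a", new ListRow(10, 20), "c");
        System.out.println(new ColumnEvaluator(2).evaluate(row));    // single-level reference
        System.out.println(new ColumnEvaluator(1, 1).evaluate(row)); // nested reference
    }
}
```

        The extra branch and the duplicated lookup are the readability cost mentioned above; the general loop would return the same result for a one-element path.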

        Ashutosh Chauhan added a comment -

        Can you provide some info how you are planning to optimize it?


          People

          • Assignee:
            Siying Dong
            Reporter:
            Siying Dong