Details

    • Type: Sub-task
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      VectorizedRowBatch exposes its members as public fields to avoid method-call overhead. The alternative is to hide them behind accessor methods and rely on the JIT compiler to inline those methods.
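To make the two options concrete, here is a minimal sketch of both styles. The classes `PublicColumn` and `EncapsulatedColumn` are hypothetical, simplified stand-ins invented for illustration; they are not the actual Hive classes, and the sketch makes no claim about which style is faster.

```java
// Hypothetical, simplified stand-ins for illustration; not the actual Hive classes.
class ColumnAccessSketch {

    /** Column whose backing array is a public field that callers read directly. */
    static class PublicColumn {
        public final long[] vector;
        PublicColumn(int size) { vector = new long[size]; }
    }

    /** The same data hidden behind per-element accessor methods. */
    static class EncapsulatedColumn {
        private final long[] vector;
        EncapsulatedColumn(int size) { vector = new long[size]; }
        long get(int i) { return vector[i]; }
        void set(int i, long value) { vector[i] = value; }
    }

    /** Inner loop over the raw array: no method call per element. */
    static long sumDirect(PublicColumn col, int size) {
        long[] v = col.vector;
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += v[i];
        }
        return sum;
    }

    /** Inner loop through get(): one call per element unless the JIT inlines it. */
    static long sumViaAccessor(EncapsulatedColumn col, int size) {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += col.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 1024;
        PublicColumn a = new PublicColumn(size);
        EncapsulatedColumn b = new EncapsulatedColumn(size);
        for (int i = 0; i < size; i++) {
            a.vector[i] = i;
            b.set(i, i);
        }
        // Both styles compute the same result; only the access path differs.
        System.out.println(sumDirect(a, size) == sumViaAccessor(b, size));
    }
}
```

Whether the accessor version costs anything in practice depends on the JIT inlining `get`, which is the open question in this issue.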

        Activity

        Eric Hanson added a comment -

        This is by design. We forgo the normal encapsulation on purpose to get a performance benefit in vectorized query execution. I recommend we close this issue.

        Brock Noland added a comment -

        Eric Hanson, if we can show that it is faster, then I am totally fine with keeping this as is. What evidence is there that the encapsulation causes performance overhead?

        Brian Goetz, an authority on the topic, writes items like the one below all over the web:

        How can developers write Java code that performs well?
        
        The answer may seem counterintuitive. Often, the way to write fast code in Java applications is to write dumb code -- code that is straightforward, clean, and follows the most obvious object-oriented principles. This has to do with the nature of dynamic compilers, which are big pattern-matching engines. Because compilers are written by humans who have schedules and time budgets, the compiler developers focus their efforts on the most common code patterns, because that's where they get the most leverage. So if you write code using straightforward object-oriented principles, you'll get better compiler optimization than if you write gnarly, hacked-up, bit-banging code that looks really clever but that the compiler can't optimize effectively.
        

        http://www.oracle.com/technetwork/articles/javase/devinsight-1-139780.html

        Eric Hanson added a comment -

        Hi Brock,

        I'm in favor of encapsulation for most code. But this is different because this is a low-level performance enhancement project that has some research behind it. The theory behind the vectorized query execution technique that we use was published in this paper:

        Peter Boncz et al., MonetDB/X100: Hyper-Pipelining Query Execution, Proceedings of the CIDR Conference, 2005. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C26BD72358252F6A301DA1FF6E37D44B?doi=10.1.1.324.9516&rep=rep1&type=pdf

        Please see the performance numbers in the paper.

        State-of-the-art query execution systems, like those in Microsoft SQL Server, Vectorwise, Vertica, and ParAccel/Redshift (in no particular order), all use this strategy or something like it. It's well known in the industry that this is a place where being architecture-conscious pays big dividends. That requires some violation of encapsulation.

        It is possible that the compiler would inline some functions for us in the inner loops of the vector "for" loops, but for the most primitive operations, like arithmetic and comparisons, relying on the compiler is too much of a risk in most cases. Arguably, using put/get methods to access columns, rather than the array access we use in our VectorExpression subclasses, would probably not lose much performance. But we already decided to use array access to get columns, and it is used in hundreds of places in the code. I think it is a reasonable choice, and it is not necessary to change it.

        -Eric
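The distinction drawn in the comment above, between per-element work in the inner loop and column lookup done once per batch, can be sketched as follows. `Batch`, `LongColumn`, `getColumn`, and `getVector` are hypothetical names chosen for illustration and are assumptions, not the real VectorizedRowBatch API. In both variants the inner loop runs over raw arrays; only the once-per-batch column lookup differs.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-ins for illustration; not the actual Hive classes.
class BatchAccessorSketch {

    static class LongColumn {
        public final long[] vector;
        LongColumn(int capacity) { vector = new long[capacity]; }
        long[] getVector() { return vector; }  // column-level getter (assumed name)
    }

    static class Batch {
        public final LongColumn[] cols;
        public int size;  // number of valid rows in this batch
        Batch(int numCols, int capacity) {
            cols = new LongColumn[numCols];
            for (int c = 0; c < numCols; c++) {
                cols[c] = new LongColumn(capacity);
            }
        }
        LongColumn getColumn(int index) { return cols[index]; }  // assumed name
    }

    /** Columns fetched through public fields; inner loop touches only arrays. */
    static void addScalarDirect(Batch batch, int inCol, int outCol, long scalar) {
        long[] in = batch.cols[inCol].vector;
        long[] out = batch.cols[outCol].vector;
        int n = batch.size;
        for (int i = 0; i < n; i++) {
            out[i] = in[i] + scalar;
        }
    }

    /** Columns fetched through getters, called once per batch, not per row. */
    static void addScalarViaGetters(Batch batch, int inCol, int outCol, long scalar) {
        long[] in = batch.getColumn(inCol).getVector();
        long[] out = batch.getColumn(outCol).getVector();
        int n = batch.size;
        for (int i = 0; i < n; i++) {
            out[i] = in[i] + scalar;
        }
    }

    public static void main(String[] args) {
        Batch batch = new Batch(3, 1024);
        batch.size = 4;
        for (int i = 0; i < batch.size; i++) {
            batch.cols[0].vector[i] = i * 10L;
        }
        addScalarDirect(batch, 0, 1, 5L);
        addScalarViaGetters(batch, 0, 2, 5L);
        // Both variants fill their output column identically.
        System.out.println(Arrays.equals(batch.cols[1].vector, batch.cols[2].vector));
    }
}
```

Because the getters here run only a couple of times per batch, any call overhead is amortized over all rows in the batch, which is consistent with the remark that column-level accessors would probably cost little.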

        Brock Noland added a comment -

        Eric,

        Thank you for the comment. This issue should not be resolved until committers are in agreement. Please reopen.

        Eric Hanson added a comment -

        Reopening per Brock's request. I'd be happy to discuss this further.


          People

          • Assignee:
            Jitendra Nath Pandey
            Reporter:
            Jitendra Nath Pandey
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
