Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically-dispatched memcmp results in a >5x improvement for large strings.

      On my machine, memcmp mainly uses the SSE4.1 ptest instruction, after detecting at run time that SSE4.1 is available. The StringCompare benchmark is 5 years old and likely out of date by now.
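
      As a rough illustration of the idea (a sketch only, not the code in the patch), a length-aware comparison can be layered on memcmp as shown below. The name CompareBytes and its signature are hypothetical and do not come from the Impala source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper for illustration; not Impala's actual implementation.
// Lexicographically compares two byte strings that may differ in length.
// Returns a negative value, zero, or a positive value when the first string
// sorts before, equal to, or after the second.
inline int CompareBytes(const char* s1, int len1, const char* s2, int len2) {
  assert(len1 >= 0 && len2 >= 0);
  const int n = std::min(len1, len2);
  // memcmp compares bytes as unsigned char. With glibc, the concrete
  // implementation is selected at run time based on the CPU's features.
  // Skip the call for n == 0 so that null pointers are never passed in.
  const int result =
      (n == 0) ? 0 : std::memcmp(s1, s2, static_cast<std::size_t>(n));
  if (result != 0) return result;
  // The common prefix is equal, so the shorter string sorts first.
  return len1 - len2;
}
```

      With this sketch, a predicate such as s <= repeat("a", 2048) corresponds to CompareBytes(...) <= 0 on the two byte ranges.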

      To replicate:

      create table long_strings (s string) stored as parquet;
      insert into long_strings values (repeat("a", 2048));
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
      select count(*) from long_strings where s <= repeat("a", 2048);
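
      To give a sense of the data volume these statements produce, the sketch below works out the row count. It assumes every INSERT completes and that "limit 10" yields exactly 10 rows; the function names are hypothetical.

```cpp
#include <cstdint>

// Each "insert ... from long_strings a, long_strings b" adds rows * rows
// rows (a cross join of the table with itself) to the existing rows.
constexpr std::int64_t RowsAfterSelfJoins(std::int64_t rows, int self_joins) {
  for (int i = 0; i < self_joins; ++i) rows += rows * rows;
  return rows;
}

// Row count after the whole script: one initial row, five self cross joins,
// then one cross join against a 10-row subquery.
constexpr std::int64_t RowsAfterScript() {
  const std::int64_t rows = RowsAfterSelfJoins(1, 5);
  return rows + rows * 10;
}

// Raw string bytes scanned by the final query, at 2048 bytes per row.
constexpr std::int64_t RawStringBytes() { return RowsAfterScript() * 2048; }
```

      Under those assumptions the table grows 1, 2, 6, 42, 1806, 3263442 and ends at 35897862 rows, or about 73.5 GB of raw string bytes before any Parquet encoding or compression.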
      

        Activity

        jbapple Jim Apple added a comment -

        Patch available here: https://gerrit.cloudera.org/6768

        jbapple Jim Apple added a comment -
        IMPALA-5273: Replace StringCompare with glibc memcmp
        
        glibc's memcmp, which dispatches dynamically based on the instructions
        the processor supports, uses sse4.1's ptest, which is faster than our
        implementation.
        
        I ran the benchmark below. The final query sped up by about 5x with
        this patch.
        
            create table long_strings (s string) stored as parquet;
            insert into long_strings values (repeat("a", 2048));
            insert into long_strings select a.s from long_strings a,
              long_strings b;
            insert into long_strings select a.s from long_strings a,
              long_strings b;
            insert into long_strings select a.s from long_strings a,
              long_strings b;
            insert into long_strings select a.s from long_strings a,
              long_strings b;
            insert into long_strings select a.s from long_strings a,
              long_strings b;
            insert into long_strings select a.s from long_strings a,
              (select * from long_strings limit 10) b;
            select count(*) from long_strings where s <= repeat("a", 2048);
        
        Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
        Reviewed-on: http://gerrit.cloudera.org:8080/6768
        Reviewed-by: Jim Apple <jbapple-impala@apache.org>
        Tested-by: Impala Public Jenkins
        

          People

          • Assignee:
            jbapple Jim Apple
            Reporter:
            jbapple Jim Apple
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: