Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically dispatched memcmp makes comparisons of large strings more than 5x faster.
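A memcmp-based comparison of two length-delimited strings can be sketched as follows: compare the shared prefix with memcmp, and if the prefixes are equal, the shorter string sorts first. This is an illustration under stated assumptions, not the committed Impala code. `CompareStrings` is a hypothetical name, not Impala's actual StringCompare signature, and the sketch assumes each string is passed as a pointer plus an explicit length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not Impala's StringCompare): three-way comparison of two
// byte strings given as pointer + length, with no NUL terminator required.
// Returns a negative value, zero, or a positive value, like memcmp.
int CompareStrings(const char* s1, std::size_t n1, const char* s2, std::size_t n2) {
  // A null pointer is only acceptable for an empty string.
  assert(s1 != nullptr || n1 == 0);
  assert(s2 != nullptr || n2 == 0);
  const std::size_t common = std::min(n1, n2);
  // Compare the shared prefix with memcmp, which glibc dispatches at run time
  // to an implementation suited to the current CPU.
  if (common > 0) {
    const int result = std::memcmp(s1, s2, common);
    if (result != 0) return result;
  }
  // Equal prefixes: the shorter string sorts first.
  if (n1 < n2) return -1;
  if (n1 > n2) return 1;
  return 0;
}
```

Skipping the memcmp call when the common length is zero avoids passing null pointers to memcmp, which is undefined behavior even for a zero-length comparison.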

      On my machine, memcmp mainly uses the SSE4.1 PTEST instruction, which it selects after detecting at run time that SSE4.1 is available. The StringCompare benchmark is five years old and is likely out of date by now.
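glibc picks its memcmp variant at run time (through IFUNC resolution inside the library) based on the CPU's features. The sketch below imitates that pattern in ordinary C++ with a function pointer chosen once from a CPU feature check. It is a simplified analogue, not how glibc is written: `CompareFn`, `ByteLoopCompare`, `LibcCompare`, `CpuHasSse41`, and `SelectCompare` are hypothetical names, and `LibcCompare` merely forwards to memcmp rather than issuing PTEST itself. `__builtin_cpu_supports` is a GCC/Clang builtin for x86 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Signature shared by every candidate implementation (hypothetical).
using CompareFn = int (*)(const void*, const void*, std::size_t);

// Portable fallback: compare one byte at a time.
int ByteLoopCompare(const void* a, const void* b, std::size_t n) {
  assert((a != nullptr && b != nullptr) || n == 0);
  const unsigned char* p = static_cast<const unsigned char*>(a);
  const unsigned char* q = static_cast<const unsigned char*>(b);
  for (std::size_t i = 0; i < n; ++i) {
    if (p[i] != q[i]) return p[i] < q[i] ? -1 : 1;
  }
  return 0;
}

// Stand-in for a vectorized variant; here it simply forwards to libc memcmp.
int LibcCompare(const void* a, const void* b, std::size_t n) {
  return n == 0 ? 0 : std::memcmp(a, b, n);
}

// True if the running CPU reports SSE4.1. Uses the GCC/Clang builtin on x86;
// on other targets it conservatively reports false.
bool CpuHasSse41() {
#if defined(__x86_64__) || defined(__i386__)
  __builtin_cpu_init();
  return __builtin_cpu_supports("sse4.1") != 0;
#else
  return false;
#endif
}

// Choose an implementation from the detected CPU features.
CompareFn SelectCompare() {
  return CpuHasSse41() ? LibcCompare : ByteLoopCompare;
}
```

Resolving the choice once and caching the returned pointer, rather than testing CPU flags on every call, keeps the per-comparison overhead to a single indirect call.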

      To replicate:

      create table long_strings (s string) stored as parquet;
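      -- Start with a single 2048-byte string.
      insert into long_strings values (repeat("a", 2048));
      -- Each self-join below appends n * n rows, where n is the current row count.
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      -- Append 10 * n more rows by joining with a 10-row subset of the table.
      insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
      -- Compare every stored 2048-byte string against an equal 2048-byte constant.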
      select count(*) from long_strings where s <= repeat("a", 2048);
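Each `insert ... select` above cross-joins the table with itself, so a table with n rows gains n * n rows, and the last insert adds 10 * n rows. The arithmetic below, assuming the table starts empty and each statement runs exactly once in order, gives the number of 2048-byte rows the final query scans. `ReproRowCount` is a hypothetical helper used only for this calculation.

```cpp
#include <cstdint>

// Row count of long_strings after the repro statements, assuming each comma
// join is an unfiltered cross join (n rows x n rows appends n * n rows).
constexpr std::int64_t ReproRowCount() {
  std::int64_t n = 1;   // insert ... values (repeat("a", 2048))
  for (int i = 0; i < 5; ++i) {
    n += n * n;         // insert ... from long_strings a, long_strings b
  }
  n += 10 * n;          // insert ... (select * from long_strings limit 10) b
  return n;
}

// Compile-time check of the result: 1 -> 2 -> 6 -> 42 -> 1806 -> 3263442 -> x11.
static_assert(ReproRowCount() == 35897862, "unexpected repro row count");
```

That is 35,897,862 rows, or about 73.5 GB of string data before Parquet encoding and compression, so the per-row comparison cost is likely to show up clearly in the query time.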
      

            People

            • Assignee: Jim Apple (jbapple)
            • Reporter: Jim Apple (jbapple)
            • Votes: 0
            • Watchers: 1
