IMPALA / IMPALA-5273

StringCompare is very slow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

    Description

      Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically dispatched memcmp yields a more than 5x speedup on large strings.

      On my machine, memcmp mainly uses the SSE4.1 ptest instruction, which glibc selects after detecting at run time that SSE4.1 is available. The StringCompare benchmark is five years old and is likely out of date.
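For illustration, a memcmp-based comparison could look roughly like the sketch below. The name CompareBytes and its signature are hypothetical and are not taken from the Impala source. The sketch assumes both pointers are valid even when a length is zero, and it orders bytes as unsigned values, which is what memcmp does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not Impala's actual StringCompare signature).
// Returns a negative value, zero, or a positive value when the first
// string sorts before, equal to, or after the second.
inline int CompareBytes(const char* s1, std::size_t n1,
                        const char* s2, std::size_t n2) {
  // Compare only the prefix that both strings share.
  const std::size_t common = std::min(n1, n2);
  // std::memcmp compares bytes as unsigned char; with glibc the concrete
  // implementation is chosen at run time based on the CPU's features.
  const int result = std::memcmp(s1, s2, common);
  if (result != 0) return result;
  // The shared prefix is identical, so the shorter string sorts first.
  return (n1 > n2) - (n1 < n2);
}
```

The point of the design is that the function carries no hand-written SIMD code: instruction selection is left entirely to the C library.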

      To replicate:

      -- Start with a single 2048-character string.
      create table long_strings (s string) stored as parquet;
      insert into long_strings values (repeat("a", 2048));
      -- Each self cross join appends n*n rows to the n already present:
      -- 1 -> 2 -> 6 -> 42 -> 1806 -> 3,263,442 rows.
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      insert into long_strings select a.s from long_strings a, long_strings b;
      -- Append 10 more copies of every row, for roughly 36 million rows in total.
      insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
      -- Every row equals the literal, so each comparison scans all 2048 bytes.
      select count(*) from long_strings where s <= repeat("a", 2048);
      

          People

            Assignee: Jim Apple
            Reporter: Jim Apple