IMPALA-5965

Avoid per-value switch on NeedsConversionInline() when decoding dictionary-encoded strings and timestamps


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Backend

    Description

      I noticed a minor inefficiency in how the Parquet scanner handles NeedsConversion(). In cases where its result is not a compile-time constant, such as dictionary-encoded strings and timestamps, we branch on it once per value. This is probably only a few instructions, but in this hot part of the code that matters.

      I did a quick benchmark and saw scan time drop from ~2.25s to ~2.11s on this query:

      use tpch_parquet; 
      set num_nodes=1;
      set mt_dop=1;
      select min(l_returnflag), min(l_linestatus) from biglineitem;
      summary;
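
The per-value switch described above can be illustrated with a minimal C++ sketch. This is not Impala's actual code: `SketchReader`, `Convert`, `DecodePerValueBranch`, `DecodeBatch` and `DecodeHoisted` are hypothetical names, and the "conversion" is a placeholder. It only shows the general idea of testing a runtime flag once per batch and dispatching to a loop that is specialized on a template parameter, so the inner loop carries no per-value branch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column reader; not Impala's actual class.
struct SketchReader {
  // Known only at runtime, but fixed for the lifetime of the reader.
  bool needs_conversion;

  // Placeholder conversion, assumed for illustration only.
  static int64_t Convert(int64_t v) { return v + 1; }

  // Before: the runtime flag is tested once per decoded value.
  void DecodePerValueBranch(const std::vector<int64_t>& in,
                            std::vector<int64_t>* out) const {
    out->clear();
    for (int64_t v : in) {
      out->push_back(needs_conversion ? Convert(v) : v);
    }
  }

  // The flag is a template parameter here, so the compiler can drop the
  // untaken side of the branch in each instantiation.
  template <bool NEEDS_CONVERSION>
  void DecodeBatch(const std::vector<int64_t>& in,
                   std::vector<int64_t>* out) const {
    out->clear();
    for (int64_t v : in) {
      if (NEEDS_CONVERSION) {
        out->push_back(Convert(v));
      } else {
        out->push_back(v);
      }
    }
  }

  // After: the runtime flag is tested once per batch, outside the loop.
  void DecodeHoisted(const std::vector<int64_t>& in,
                     std::vector<int64_t>* out) const {
    if (needs_conversion) {
      DecodeBatch<true>(in, out);
    } else {
      DecodeBatch<false>(in, out);
    }
  }
};
```

Both paths produce the same output; the hoisted version trades a little code size (two instantiations of the loop) for a branch-free inner loop.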
      

          People

            tarmstrong Tim Armstrong