KUDU-2231

"materializing_iterator_do_pushdown=true" causes a simple query to run slowly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0, 1.6.0
    • Fix Version/s: 1.5.1, 1.7.0, 1.6.1
    • Component/s: master, tserver
    • Labels: None

    Description

      I ran the following SQL query repeatedly
      while refreshing the 8050/scans page at the same time.

      sql:

      select count(xx_id), count(yy_id), count(time) from test_table where event_id = 29983;
      

      "Cells read from disk" is much greater than the table size when materializing_iterator_do_pushdown = true (the default).

      After setting materializing_iterator_do_pushdown = false,
      "Cells read from disk" dropped to a reasonable value (close to the table size)
      and the query ran faster.
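The comparison above can be expressed as a simple ratio. The following is a minimal sketch, assuming "Cells read from disk" is reported per column, so that a single full pass over a column reads about as many cells as the table has rows. The helper name `read_amplification` and the sample readings are hypothetical; the readings are not the values from the attached screenshots.

```python
# Row count taken from the report: select count(1) from test_table
TABLE_ROWS = 19_510_709


def read_amplification(cells_read: int, table_rows: int = TABLE_ROWS) -> float:
    """Return cells read as a multiple of one full pass over a column."""
    if table_rows <= 0:
        raise ValueError("table_rows must be positive")
    return cells_read / table_rows


# Hypothetical per-column readings, NOT taken from the screenshots.
hypothetical_cells_read = {
    "event_id": 19_510_709,  # equal to the row count: one full pass
    "xx_id": 58_532_127,     # three times the row count
}

for column, cells in hypothetical_cells_read.items():
    print(f"{column}: {read_amplification(cells):.2f}x")
```

A ratio near 1 matches the behavior reported with the flag set to false; a ratio well above 1 is the symptom described with the default setting.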

      Details follow.

      Table under test:

      CREATE TABLE rawdata.test_table (
        day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
        user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
        time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
        event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
        distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION DEFAULT_COMPRESSION,
        ...
        ...  other fields ...
        ...
        PRIMARY KEY (day, user_id, time, _offset)
      )
      PARTITION BY HASH (user_id) PARTITIONS 9
      STORED AS KUDU
      TBLPROPERTIES ( ... );
      

      Table size in rows (select count(1) from test_table): 19510709

      CASE 1, materializing_iterator_do_pushdown = true
      756ACA6F105F0905EBCB79B940FFCE86.jpg

      CASE 2, materializing_iterator_do_pushdown = false (the query ran faster)
      F8C604537B8E921DDCCA78995DC11BDA.jpg

      It looks like Kudu scans the table multiple times for this simple query, presumably because of a bug.

          People

            Assignee: Dan Burkert (danburkert)
            Reporter: DawnZhang (dawn110110)