IMPALA-3780

Uncompressed text scanner is slow when reading strings that significantly exceed the HDFS block size

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.7.0
    • Component/s: Backend
    • Labels:

      Description

      create table x as select repeat(' ', 256 * 1024 * 1024);
      select count(*) from x;
      

      I observed after adding logging that the scanner was issuing many small 1024-byte scan ranges:

      I0623 10:20:47.557819 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268382208
      I0623 10:20:47.563014 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268383232
      I0623 10:20:47.563601 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268384256
      I0623 10:20:47.564059 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268385280
      I0623 10:20:47.564450 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268386304
      I0623 10:20:47.564824 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268387328
      I0623 10:20:47.565206 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268388352
      I0623 10:20:47.565558 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268389376
      I0623 10:20:47.565938 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268390400
      I0623 10:20:47.566298 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268391424
      I0623 10:20:47.566640 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268392448
      I0623 10:20:47.566963 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268393472
      I0623 10:20:47.567358 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268394496
      I0623 10:20:47.567790 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268395520
      I0623 10:20:47.568143 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268396544
      I0623 10:20:47.568578 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268397568
      I0623 10:20:47.568936 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268398592
      I0623 10:20:47.569310 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268399616
      I0623 10:20:47.569677 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268400640
      I0623 10:20:47.570036 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268401664
      I0623 10:20:47.570395 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268402688
      I0623 10:20:47.570732 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268403712
      I0623 10:20:47.571077 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268404736
      I0623 10:20:47.571424 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268405760
      I0623 10:20:47.571790 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268406784
      I0623 10:20:47.572145 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268407808
      I0623 10:20:47.572485 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268408832
      I0623 10:20:47.572835 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268409856
      I0623 10:20:47.573196 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268410880
      I0623 10:20:47.573559 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268411904
      I0623 10:20:47.573904 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268412928
      I0623 10:20:47.574261 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268413952
      I0623 10:20:47.574631 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268414976
      I0623 10:20:47.574952 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268416000
      I0623 10:20:47.575294 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268417024
      I0623 10:20:47.575670 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268418048
      I0623 10:20:47.576092 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268419072
      I0623 10:20:47.576462 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268420096
      I0623 10:20:47.576815 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268421120
      I0623 10:20:47.577217 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268422144
      I0623 10:20:47.577611 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268423168
      I0623 10:20:47.577955 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268424192
      I0623 10:20:47.578313 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268425216
      I0623 10:20:47.578652 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268426240
      I0623 10:20:47.579011 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268427264
      I0623 10:20:47.579371 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268428288
      I0623 10:20:47.579727 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268429312
      I0623 10:20:47.580072 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268430336
      I0623 10:20:47.580392 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268431360
      I0623 10:20:47.580736 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268432384
      I0623 10:20:47.581080 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268433408
      I0623 10:20:47.581418 24267 disk-io-mgr.cc:526] Range for  hdfs://localhost:20500/test-warehouse/x/3b4203657863c272-77b1da9f79b0f998_2139861805_data.0.: len: 1024 offset 268434432
      

      This appears to be driven by the "NEXT_BLOCK_READ_SIZE" constant in the text scanner, which sets the size of each read issued while trying to find the end of a field that extends into the next HDFS block. We could avoid this in a couple of ways:

      • Increase the constant. It's unclear why it's so low: I think the cost of reading additional data is probably negligible, at least up to tens or hundreds of KB.
      • Ramp up the read size, e.g. repeated doubling up to 8 MB.
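
      To make the second option concrete, here is a minimal sketch of a doubling read-size policy, together with a comparison of how many reads each policy needs to cover a given number of bytes. This is not the Impala implementation: the function names and constants below are hypothetical, and only the 1024-byte starting size and the 8 MB cap are taken from this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration. Only the 1024-byte read size and
// the 8 MB cap come from the issue text; the names are made up.
constexpr int64_t kInitialReadSize = 1024;
constexpr int64_t kMaxReadSize = 8 * 1024 * 1024;

// Returns the size of the read that follows a read of 'prev_size' bytes:
// double it, but never exceed 'max_size'.
inline int64_t NextReadSize(int64_t prev_size, int64_t max_size) {
  assert(prev_size > 0 && prev_size <= max_size);
  return std::min(prev_size * 2, max_size);
}

// Number of reads needed to cover 'bytes' when every read has the same size.
inline int64_t CountFixedReads(int64_t bytes, int64_t read_size) {
  assert(bytes >= 0 && read_size > 0);
  return (bytes + read_size - 1) / read_size;  // Ceiling division.
}

// Number of reads needed to cover 'bytes' when the read size starts at
// 'initial_size' and doubles after each read until it reaches 'max_size'.
inline int64_t CountDoublingReads(int64_t bytes, int64_t initial_size,
                                  int64_t max_size) {
  assert(bytes >= 0);
  int64_t reads = 0;
  int64_t size = initial_size;
  while (bytes > 0) {
    bytes -= std::min(size, bytes);  // The last read may be partial.
    ++reads;
    size = NextReadSize(size, max_size);
  }
  return reads;
}
```

      Under these assumptions, covering a full 256 MB with fixed 1024-byte reads takes CountFixedReads(256 MB, 1024) = 262144 reads, while CountDoublingReads(256 MB, kInitialReadSize, kMaxReadSize) takes 45. Starting small keeps the cost low in the common case where the field ends shortly after the block boundary.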

            People

            • Assignee: Tim Armstrong (tarmstrong)
            • Reporter: Tim Armstrong (tarmstrong)