Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4722

test_scratch_disk.py fails sporadically when asserting logfile content

    Details

      Description

      In automated tests we're seeing sporadic failures of tests/custom_cluster/test_scratch_disk.py. It looks like the log contents may not have been flushed to disk before calling self.assert_impalad_log_contains("INFO", "Using scratch directory ", expected_count=1). This can be fixed by adding --logbuflevel=-1 or --logbufsecs=0 to the startup flags.

        Activity

        Hide
        lv Lars Volker added a comment -

        Adding '--logbuflevel=-1' didn't help. The test failed again, this time after 5560 iterations. I had it print the last line of the logfile that it saw and the log message that the test expects got printed 180ms after that last line. The log check is performed right after starting the cluster, which waits until all backends show up in the /backends debug page.

        The TmpFileMgr is initialized in impala-server.cc:260, the webserver is and cluster membership is initialized after this. I don't see why the registration of backends should complete before the TmpFileMgr initialization is printed to the logs. Does anyone have an idea?

        The last line of the log that the test saw was the query options at I0104 16:24:06.227200 24470 impala-server.cc:1133

        I0104 16:24:06.227200 24470 impala-server.cc:1133] Default query options:TQueryOptions {
          01: abort_on_error (bool) = false,
          02: max_errors (i32) = 0,
          03: disable_codegen (bool) = false,
          04: batch_size (i32) = 0,
          05: num_nodes (i32) = 0,
          06: max_scan_range_length (i64) = 0,
          07: num_scanner_threads (i32) = 0,
          08: max_io_buffers (i32) = 0,
          09: allow_unsupported_formats (bool) = false,
          10: default_order_by_limit (i64) = -1,
          11: debug_action (string) = "",
          12: mem_limit (i64) = 0,
          13: abort_on_default_limit_exceeded (bool) = false,
          15: hbase_caching (i32) = 0,
          16: hbase_cache_blocks (bool) = false,
          17: parquet_file_size (i64) = 0,
          18: explain_level (i32) = 1,
          19: sync_ddl (bool) = false,
          23: disable_cached_reads (bool) = false,
          24: disable_outermost_topn (bool) = false,
          25: rm_initial_mem (i64) = 0,
          26: query_timeout_s (i32) = 0,
          28: appx_count_distinct (bool) = false,
          29: disable_unsafe_spills (bool) = false,
          31: exec_single_node_rows_threshold (i32) = 100,
          32: optimize_partition_key_scans (bool) = false,
          33: replica_preference (i32) = 0,
          34: schedule_random_replica (bool) = false,
          35: scan_node_codegen_threshold (i64) = 1800000,
          36: disable_streaming_preaggregations (bool) = false,
          37: runtime_filter_mode (i32) = 2,
          38: runtime_bloom_filter_size (i32) = 1048576,
          39: runtime_filter_wait_time_ms (i32) = 0,
          40: disable_row_runtime_filtering (bool) = false,
          41: max_num_runtime_filters (i32) = 10,
          42: parquet_annotate_strings_utf8 (bool) = false,
          43: parquet_fallback_schema_resolution (i32) = 0,
          45: s3_skip_insert_staging (bool) = true,
          46: runtime_filter_min_size (i32) = 1048576,
          47: runtime_filter_max_size (i32) = 16777216,
          48: prefetch_mode (i32) = 1,
          49: strict_mode (bool) = false,
          50: scratch_limit (i64) = -1,
          51: enable_expr_rewrites (bool) = true,
        }
        I0104 16:24:06.335373 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162201.16165
        I0104 16:24:06.335451 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162215.17099
        I0104 16:24:06.337698 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162227.17942
        I0104 16:24:06.337749 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162240.18898
        I0104 16:24:06.337774 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162256.19865
        I0104 16:24:06.337800 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162309.20710
        I0104 16:24:06.340106 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162159.16102
        I0104 16:24:06.340132 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162215.17099
        I0104 16:24:06.340153 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162229.18004
        I0104 16:24:06.407470 24470 tmp-file-mgr.cc:109] Using scratch directory /tmp/tmpeursQw/impala-scratch on disk 1
        I0104 16:24:06.407840 24470 simple-logger.cc:83] Logging to: /tmp//profiles//impala_profile_log_1.1-1483575846407
        I0104 16:24:06.408027 24470 impala-server.cc:493] Event logging is disabled
        I0104 16:24:06.408041 24470 impala-server.cc:401] Lineage logging is disabled
        I0104 16:24:06.431881 24470 impala-server.cc:1857] Impala Beeswax Service listening on 21000
        I0104 16:24:06.433992 24470 impala-server.cc:1879] Impala HiveServer2 Service listening on 21050
        I0104 16:24:06.435863 24470 impala-server.cc:1898] ImpalaInternalService listening on 22000
        I0104 16:24:06.436725 24470 thrift-server.cc:449] ThriftServer 'backend' started on port: 22000
        I0104 16:24:06.448998 24470 thrift-server.cc:449] ThriftServer 'beeswax-frontend' started on port: 21000
        I0104 16:24:06.453007 24470 thrift-server.cc:449] ThriftServer 'hiveserver2-frontend' started on port: 21050
        I0104 16:24:06.453022 24470 exec-env.cc:241] Starting global services
        I0104 16:24:06.462091 24470 exec-env.cc:318] Using global memory limit: 8.35 GB
        I0104 16:24:06.463999 24470 webserver.cc:233] Starting webserver on 0.0.0.0:25000
        I0104 16:24:06.464025 24470 webserver.cc:247] Document root: /home/lv/i2
        I0104 16:24:06.464481 24470 webserver.cc:331] Webserver started
        I0104 16:24:06.464524 24470 simple-scheduler.cc:118] Starting simple scheduler
        I0104 16:24:06.464694 24470 simple-scheduler.cc:132] Simple-scheduler using 127.0.0.1 as IP address
        I0104 16:24:06.470088 24470 statestore-subscriber.cc:181] Starting statestore subscriber
        I0104 16:24:06.472921 24470 thrift-server.cc:449] ThriftServer 'StatestoreSubscriber' started on port: 23000
        I0104 16:24:06.472936 24470 statestore-subscriber.cc:193] Registering with statestore
        I0104 16:24:06.475055 24470 statestore-subscriber.cc:165] Subscriber registration ID: e146a9dafa99184d:22444f0f0ce6f381
        I0104 16:24:06.475244 24470 statestore-subscriber.cc:197] statestore registration successful
        I0104 16:24:06.475464 24470 impalad-main.cc:101] Impala has started.
        I0104 16:24:06.489202 25026 simple-scheduler.cc:271] Registering local backend with statestore
        W0104 16:24:08.695480 25026 HiveConf.java:2789] HiveConf of name hive.access.conf.url does not exist
        I0104 16:24:11.343762 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162200.16135
        I0104 16:24:11.343788 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162214.17069
        I0104 16:24:11.343802 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162229.18004
        I0104 16:24:11.346206 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162214.17069
        I0104 16:24:11.346220 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162228.17974
        I0104 16:24:16.348821 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162159.16102
        I0104 16:24:16.348847 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162213.17037
        I0104 16:24:16.348862 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162228.17974
        I0104 16:24:16.351285 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162213.17037
        I0104 16:24:16.351299 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162227.17942
        I0104 16:24:21.353953 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162227.17942
        I0105 03:02:01.819298 25025 thrift-util.cc:111] TSocket::read() recv() <Host: ::ffff:127.0.0.1 Port: 43704>Connection reset by peer
        I0105 03:02:01.819434 25025 thrift-util.cc:111] TAcceptQueueServer client died: ECONNRESET
        I0105 03:02:01.819787 25026 thrift-util.cc:111] TSocket::read() recv() <Host: ::ffff:127.0.0.1 Port: 43706>Connection reset by peer
        I0105 03:02:01.819806 25026 thrift-util.cc:111] TAcceptQueueServer client died: ECONNRESET
        
        Show
        lv Lars Volker added a comment - Adding '--logbuflevel=-1' didn't help. The test failed again, this time after 5560 iterations. I had it print the last line of the logfile that it saw and the log message that the test expects got printed 180ms after that last line. The log check is performed right after starting the cluster, which waits until all backends show up in the /backends debug page. The TmpFileMgr is initialized in impala-server.cc:260, the webserver is and cluster membership is initialized after this. I don't see why the registration of backends should complete before the TmpFileMgr initialization is printed to the logs. Does anyone have an idea? The last line of the log that the test saw was the query options at I0104 16:24:06.227200 24470 impala-server.cc:1133 I0104 16:24:06.227200 24470 impala-server.cc:1133] Default query options:TQueryOptions { 01: abort_on_error (bool) = false, 02: max_errors (i32) = 0, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 08: max_io_buffers (i32) = 0, 09: allow_unsupported_formats (bool) = false, 10: default_order_by_limit (i64) = -1, 11: debug_action (string) = "", 12: mem_limit (i64) = 0, 13: abort_on_default_limit_exceeded (bool) = false, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) = false, 23: disable_cached_reads (bool) = false, 24: disable_outermost_topn (bool) = false, 25: rm_initial_mem (i64) = 0, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, 32: optimize_partition_key_scans (bool) = false, 33: replica_preference (i32) = 0, 34: schedule_random_replica (bool) = false, 35: scan_node_codegen_threshold (i64) = 1800000, 36: disable_streaming_preaggregations (bool) = false, 37: runtime_filter_mode (i32) = 2, 38: runtime_bloom_filter_size (i32) = 1048576, 39: runtime_filter_wait_time_ms (i32) = 0, 40: disable_row_runtime_filtering (bool) = false, 41: max_num_runtime_filters (i32) = 10, 42: parquet_annotate_strings_utf8 (bool) = false, 43: parquet_fallback_schema_resolution (i32) = 0, 45: s3_skip_insert_staging (bool) = true, 46: runtime_filter_min_size (i32) = 1048576, 47: runtime_filter_max_size (i32) = 16777216, 48: prefetch_mode (i32) = 1, 49: strict_mode (bool) = false, 50: scratch_limit (i64) = -1, 51: enable_expr_rewrites (bool) = true, } I0104 16:24:06.335373 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162201.16165 I0104 16:24:06.335451 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162215.17099 I0104 16:24:06.337698 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162227.17942 I0104 16:24:06.337749 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162240.18898 I0104 16:24:06.337774 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162256.19865 I0104 16:24:06.337800 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.WARNING.20170104-162309.20710 I0104 16:24:06.340106 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162159.16102 I0104 16:24:06.340132 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162215.17099 I0104 16:24:06.340153 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162229.18004 I0104 16:24:06.407470 24470 tmp-file-mgr.cc:109] Using scratch directory /tmp/tmpeursQw/impala-scratch on disk 1 I0104 16:24:06.407840 24470 simple-logger.cc:83] Logging to: /tmp//profiles//impala_profile_log_1.1-1483575846407 I0104 16:24:06.408027 24470 impala-server.cc:493] Event logging is disabled I0104 16:24:06.408041 24470 impala-server.cc:401] Lineage logging is disabled I0104 16:24:06.431881 24470 impala-server.cc:1857] Impala Beeswax Service listening on 21000 I0104 16:24:06.433992 24470 impala-server.cc:1879] Impala HiveServer2 Service listening on 21050 I0104 16:24:06.435863 24470 impala-server.cc:1898] ImpalaInternalService listening on 22000 I0104 16:24:06.436725 24470 thrift-server.cc:449] ThriftServer 'backend' started on port: 22000 I0104 16:24:06.448998 24470 thrift-server.cc:449] ThriftServer 'beeswax-frontend' started on port: 21000 I0104 16:24:06.453007 24470 thrift-server.cc:449] ThriftServer 'hiveserver2-frontend' started on port: 21050 I0104 16:24:06.453022 24470 exec-env.cc:241] Starting global services I0104 16:24:06.462091 24470 exec-env.cc:318] Using global memory limit: 8.35 GB I0104 16:24:06.463999 24470 webserver.cc:233] Starting webserver on 0.0.0.0:25000 I0104 16:24:06.464025 24470 webserver.cc:247] Document root: /home/lv/i2 I0104 16:24:06.464481 24470 webserver.cc:331] Webserver started I0104 16:24:06.464524 24470 simple-scheduler.cc:118] Starting simple scheduler I0104 16:24:06.464694 24470 simple-scheduler.cc:132] Simple-scheduler using 127.0.0.1 as IP address I0104 16:24:06.470088 24470 statestore-subscriber.cc:181] Starting statestore subscriber I0104 16:24:06.472921 24470 thrift-server.cc:449] ThriftServer 'StatestoreSubscriber' started on port: 23000 I0104 16:24:06.472936 24470 statestore-subscriber.cc:193] Registering with statestore I0104 16:24:06.475055 24470 statestore-subscriber.cc:165] Subscriber registration ID: e146a9dafa99184d:22444f0f0ce6f381 I0104 16:24:06.475244 24470 statestore-subscriber.cc:197] statestore registration successful I0104 16:24:06.475464 24470 impalad-main.cc:101] Impala has started. I0104 16:24:06.489202 25026 simple-scheduler.cc:271] Registering local backend with statestore W0104 16:24:08.695480 25026 HiveConf.java:2789] HiveConf of name hive.access.conf.url does not exist I0104 16:24:11.343762 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162200.16135 I0104 16:24:11.343788 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162214.17069 I0104 16:24:11.343802 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162229.18004 I0104 16:24:11.346206 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162214.17069 I0104 16:24:11.346220 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162228.17974 I0104 16:24:16.348821 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162159.16102 I0104 16:24:16.348847 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162213.17037 I0104 16:24:16.348862 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162228.17974 I0104 16:24:16.351285 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162213.17037 I0104 16:24:16.351299 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.ERROR.20170104-162227.17942 I0104 16:24:21.353953 24478 logging-support.cc:139] Old log file deleted during log rotation: /tmp//impalad.lv-desktop.lv.log.INFO.20170104-162227.17942 I0105 03:02:01.819298 25025 thrift-util.cc:111] TSocket::read() recv() <Host: ::ffff:127.0.0.1 Port: 43704>Connection reset by peer I0105 03:02:01.819434 25025 thrift-util.cc:111] TAcceptQueueServer client died: ECONNRESET I0105 03:02:01.819787 25026 thrift-util.cc:111] TSocket::read() recv() <Host: ::ffff:127.0.0.1 Port: 43706>Connection reset by peer I0105 03:02:01.819806 25026 thrift-util.cc:111] TAcceptQueueServer client died: ECONNRESET
        Hide
        lv Lars Volker added a comment -

        I found out that the with_args decorator doesn't work in test_scratch_disk.py since all tests are responsible for starting the minicluster themselves. I added -logbuflevel=-1 to the startup args inside a test and will run it in a loop again.

        Show
        lv Lars Volker added a comment - I found out that the with_args decorator doesn't work in test_scratch_disk.py since all tests are responsible for starting the minicluster themselves. I added -logbuflevel=-1 to the startup args inside a test and will run it in a loop again.
        Hide
        lv Lars Volker added a comment -

        IMPALA-4722: Disable log caching in test_scratch_disk

        test_scratch_disk fails sporadically when trying to assert the presence
        of log messages. This is probably caused by log caching, since after
        such failures the log files do contains the lines in question.

        I manually tested this by running the tests repeatedly for 2 days (10k
        runs).

        To make future diagnosis of similar problems easier, this change also
        adds more output to assert_impalad_log_contains().

        Change-Id: I9f21284338ee7b4374aca249b6556282b0148389
        Reviewed-on: http://gerrit.cloudera.org:8080/5669
        Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
        Tested-by: Impala Public Jenkins

        Show
        lv Lars Volker added a comment - IMPALA-4722 : Disable log caching in test_scratch_disk test_scratch_disk fails sporadically when trying to assert the presence of log messages. This is probably caused by log caching, since after such failures the log files do contains the lines in question. I manually tested this by running the tests repeatedly for 2 days (10k runs). To make future diagnosis of similar problems easier, this change also adds more output to assert_impalad_log_contains(). Change-Id: I9f21284338ee7b4374aca249b6556282b0148389 Reviewed-on: http://gerrit.cloudera.org:8080/5669 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins

          People

          • Assignee:
            lv Lars Volker
            Reporter:
            lv Lars Volker
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development