IMPALA-9421

Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.

Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Clients
    • Labels: None

    Description

      A SHOW DATABASES operation takes about 3-4 seconds, and sometimes 8-9 seconds, in impala-shell when connected to a coordinator over hs2-http with LDAP authentication:

      $ impala-shell.sh --protocol='hs2-http' --ssl -i "impala-coordinator:443" -u username -l
      
      impala-shell> show databases;
      +------------------------+----------------------------------------------+
      | name                   | comment                                      |
      +------------------------+----------------------------------------------+
      | _impala_builtins       | System database for Impala builtin functions |
      | airline_ontime_orc     |                                              |
      | airline_ontime_parquet |                                              |
      | default                | Default Hive database                        |
      +------------------------+----------------------------------------------+
      Fetched 4 row(s) in 8.87s
      

      The impala-coordinator logs show that multiple new connections are set up and authenticated while this single statement runs:

      I0225 16:07:58.143942   317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client <Host: 127.0.0.1 Port: 50216>
      I0225 16:07:58.144042   321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client <Host: 127.0.0.1 Port: 50216>
      I0225 16:07:58.144101   321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client <Host: 127.0.0.1 Port: 50216>
      I0225 16:07:58.144338 128883 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:07:58.155827 128883 authentication.cc:273] LDAP bind successful
      I0225 16:07:58.155901 128883 impala-hs2-server.cc:1085] PingImpalaHS2Service(): request=TPingImpalaHS2ServiceReq {
        01: sessionHandle (struct) = TSessionHandle {
          01: sessionId (struct) = THandleIdentifier {
            01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h",
            02: secret (string) = "\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04",
          },
        },
      }
      I0225 16:07:58.876168   317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client <Host: 127.0.0.1 Port: 50222>
      I0225 16:07:58.876317   320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client <Host: 127.0.0.1 Port: 50222>
      I0225 16:07:58.876364   320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client <Host: 127.0.0.1 Port: 50222>
      I0225 16:07:58.876847 128884 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:07:58.887931 128884 authentication.cc:273] LDAP bind successful
      I0225 16:07:58.888008 128884 impala-hs2-server.cc:442] ExecuteStatement(): request=TExecuteStatementReq {
        01: sessionHandle (struct) = TSessionHandle {
          01: sessionId (struct) = THandleIdentifier {
            01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h",
            02: secret (string) = "\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04",
          },
        },
        02: statement (string) = "show databases",
        03: confOverlay (map) = map<string,string>[1] {
          "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020",
        },
        04: runAsync (bool) = true,
      }
      I0225 16:07:58.888049 128884 impala-hs2-server.cc:230] TExecuteStatementReq: TExecuteStatementReq {
        01: sessionHandle (struct) = TSessionHandle {
          01: sessionId (struct) = THandleIdentifier {
            01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h",
            02: secret (string) = "\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04",
          },
        },
        02: statement (string) = "show databases",
        03: confOverlay (map) = map<string,string>[1] {
          "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020",
        },
        04: runAsync (bool) = true,
      }
      I0225 16:07:58.898981 128884 impala-hs2-server.cc:268] TClientRequest.queryOptions: TQueryOptions {
        01: abort_on_error (bool) = false,
        02: max_errors (i32) = 100,
        03: disable_codegen (bool) = false,
        04: batch_size (i32) = 0,
        05: num_nodes (i32) = 0,
        06: max_scan_range_length (i64) = 0,
        07: num_scanner_threads (i32) = 0,
        11: debug_action (string) = "",
        12: mem_limit (i64) = 0,
        15: hbase_caching (i32) = 0,
        16: hbase_cache_blocks (bool) = false,
        17: parquet_file_size (i64) = 0,
        18: explain_level (i32) = 1,
        19: sync_ddl (bool) = false,
        24: disable_outermost_topn (bool) = false,
        26: query_timeout_s (i32) = 0,
        28: appx_count_distinct (bool) = false,
        29: disable_unsafe_spills (bool) = false,
        31: exec_single_node_rows_threshold (i32) = 100,
        32: optimize_partition_key_scans (bool) = false,
        33: replica_preference (i32) = 0,
        34: schedule_random_replica (bool) = false,
        36: disable_streaming_preaggregations (bool) = false,
        37: runtime_filter_mode (i32) = 2,
        38: runtime_bloom_filter_size (i32) = 1048576,
        39: runtime_filter_wait_time_ms (i32) = 0,
        40: disable_row_runtime_filtering (bool) = false,
        41: max_num_runtime_filters (i32) = 10,
        42: parquet_annotate_strings_utf8 (bool) = false,
        43: parquet_fallback_schema_resolution (i32) = 0,
        45: s3_skip_insert_staging (bool) = true,
        46: runtime_filter_min_size (i32) = 1048576,
        47: runtime_filter_max_size (i32) = 16777216,
        48: prefetch_mode (i32) = 1,
        49: strict_mode (bool) = false,
        50: scratch_limit (i64) = -1,
        51: enable_expr_rewrites (bool) = true,
        52: decimal_v2 (bool) = true,
        53: parquet_dictionary_filtering (bool) = true,
        54: parquet_array_resolution (i32) = 0,
        55: parquet_read_statistics (bool) = true,
        56: default_join_distribution_mode (i32) = 0,
        57: disable_codegen_rows_threshold (i32) = 50000,
        58: default_spillable_buffer_size (i64) = 2097152,
        59: min_spillable_buffer_size (i64) = 65536,
        60: max_row_size (i64) = 524288,
        61: idle_session_timeout (i32) = 900,
        62: compute_stats_min_sample_size (i64) = 1073741824,
        63: exec_time_limit_s (i32) = 0,
        64: shuffle_distinct_exprs (bool) = true,
        65: max_mem_estimate_for_admission (i64) = 0,
        66: thread_reservation_limit (i32) = 3000,
        67: thread_reservation_aggregate_limit (i32) = 0,
        68: kudu_read_mode (i32) = 0,
        69: allow_erasure_coded_files (bool) = false,
        70: timezone (string) = "",
        71: scan_bytes_limit (i64) = 0,
        72: cpu_limit_s (i64) = 0,
        73: topn_bytes_limit (i64) = 536870912,
        74: client_identifier (string) = "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020",
        75: resource_trace_ratio (double) = 0,
        76: num_remote_executor_candidates (i32) = 3,
        77: num_rows_produced_limit (i64) = 0,
        78: planner_testcase_mode (bool) = false,
        79: default_file_format (i32) = 4,
        80: parquet_timestamp_type (i32) = 0,
        81: parquet_read_page_index (bool) = true,
        82: parquet_write_page_index (bool) = true,
        84: disable_hdfs_num_rows_estimate (bool) = false,
        86: spool_query_results (bool) = true,
        87: default_transactional_type (i32) = 1,
        88: statement_expression_limit (i32) = 250000,
        89: max_statement_length_bytes (i32) = 16777216,
        90: disable_data_cache (bool) = false,
        91: max_result_spooling_mem (i64) = 104857600,
        92: max_spilled_result_spooling_mem (i64) = 1073741824,
        93: disable_hbase_num_rows_estimate (bool) = false,
        94: fetch_rows_timeout_ms (i64) = 10000,
        95: now_string (string) = "",
        96: parquet_object_store_split_size (i64) = 268435456,
        97: mem_limit_executors (i64) = 0,
        98: broadcast_bytes_limit (i64) = 34359738368,
      }
      I0225 16:07:58.899091 128884 impala-server.cc:987] Found local timezone "UTC".
      I0225 16:07:58.900794 128884 impala-server.cc:1042] ac4832ea4ab1a2be:38f5a41400000000] Registered query query_id=ac4832ea4ab1a2be:38f5a41400000000 session_id=ab40d10d2f539bab:68142328ee7a3286
      I0225 16:07:58.901051 128884 Frontend.java:1499] ac4832ea4ab1a2be:38f5a41400000000] Analyzing query: show databases db: default
      I0225 16:07:58.901293 128884 BaseAuthorizationChecker.java:110] ac4832ea4ab1a2be:38f5a41400000000] Authorization check took 0 ms
      I0225 16:07:58.901369 128884 Frontend.java:1541] ac4832ea4ab1a2be:38f5a41400000000] Analysis and authorization finished.
      I0225 16:07:58.903031 128884 impala-server.cc:1080] Query ac4832ea4ab1a2be:38f5a41400000000 has idle timeout of 10m
      I0225 16:07:58.903087 128884 impala-hs2-server.cc:512] ExecuteStatement(): return_val=TExecuteStatementResp {
        01: status (struct) = TStatus {
          01: statusCode (i32) = 0,
        },
        02: operationHandle (struct) = TOperationHandle {
          01: operationId (struct) = THandleIdentifier {
            01: guid (string) = "\xbe\xa2\xb1J\xea2H\xac\x00\x00\x00\x00\x14\xa4\xf58",
            02: secret (string) = "\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04",
          },
          02: operationType (i32) = 0,
          03: hasResultSet (bool) = true,
        },
      }
      I0225 16:07:59.617283   317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client <Host: 127.0.0.1 Port: 50244>
      I0225 16:07:59.617388   321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client <Host: 127.0.0.1 Port: 50244>
      I0225 16:07:59.617424   321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client <Host: 127.0.0.1 Port: 50244>
      I0225 16:07:59.617705 128886 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:07:59.629288 128886 authentication.cc:273] LDAP bind successful
      I0225 16:07:59.629354 128886 impala-hs2-server.cc:812] GetResultSetMetadata(): query_id=ac4832ea4ab1a2be:38f5a41400000000
      I0225 16:07:59.629410 128886 impala-hs2-server.cc:847] GetResultSetMetadata(): return_val=TGetResultSetMetadataResp {
        01: status (struct) = TStatus {
          01: statusCode (i32) = 0,
        },
        02: schema (struct) = TTableSchema {
          01: columns (list) = list<struct>[2] {
            [0] = TColumnDesc {
              01: columnName (string) = "name",
              02: typeDesc (struct) = TTypeDesc {
                01: types (list) = list<struct>[1] {
                  [0] = TTypeEntry {
                    01: primitiveEntry (struct) = TPrimitiveTypeEntry {
                      01: type (i32) = 7,
                    },
                  },
                },
              },
              03: position (i32) = 0,
            },
            [1] = TColumnDesc {
              01: columnName (string) = "comment",
              02: typeDesc (struct) = TTypeDesc {
                01: types (list) = list<struct>[1] {
                  [0] = TTypeEntry {
                    01: primitiveEntry (struct) = TPrimitiveTypeEntry {
                      01: type (i32) = 7,
                    },
                  },
                },
              },
              03: position (i32) = 1,
            },
          },
        },
      }
      I0225 16:08:00.347491 128862 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:08:00.535367 128862 authentication.cc:273] LDAP bind successful
      I0225 16:08:01.253826   317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client <Host: 127.0.0.1 Port: 50256>
      I0225 16:08:01.253938   320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client <Host: 127.0.0.1 Port: 50256>
      I0225 16:08:01.253988   320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client <Host: 127.0.0.1 Port: 50256>
      I0225 16:08:01.254253 128887 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:08:01.264217 128887 authentication.cc:273] LDAP bind successful
      I0225 16:08:01.982829   317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client <Host: 127.0.0.1 Port: 50282>
      I0225 16:08:01.982926   321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client <Host: 127.0.0.1 Port: 50282>
      I0225 16:08:01.982965   321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client <Host: 127.0.0.1 Port: 50282>
      I0225 16:08:01.983230 128901 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:08:07.029768 128901 authentication.cc:273] LDAP bind successful
      I0225 16:08:07.747694 128860 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:08:07.758265 128860 authentication.cc:273] LDAP bind successful
      I0225 16:08:07.758330 128860 impala-hs2-server.cc:778] CloseOperation(): query_id=ac4832ea4ab1a2be:38f5a41400000000
      I0225 16:08:07.758345 128860 impala-server.cc:1121] UnregisterQuery(): query_id=ac4832ea4ab1a2be:38f5a41400000000
      I0225 16:08:07.758352 128860 impala-server.cc:1223] Cancel(): query_id=ac4832ea4ab1a2be:38f5a41400000000
      I0225 16:08:08.463980   317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client <Host: 127.0.0.1 Port: 50354>
      I0225 16:08:08.464076   320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client <Host: 127.0.0.1 Port: 50354>
      I0225 16:08:08.464107   320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client <Host: 127.0.0.1 Port: 50354>
      I0225 16:08:08.464340 128913 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
      I0225 16:08:08.474979 128913 authentication.cc:273] LDAP bind successful
      I0225 16:08:28.186151 128883 impala-server.cc:1957] Connection 5f417413b1c20e07:c36931be0b8aa094 from client 127.0.0.1:50216 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      I0225 16:08:28.933374 128884 impala-server.cc:1957] Connection 0b418f1d01e5c8c7:ee2c4c7a963619a6 from client 127.0.0.1:50222 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      I0225 16:08:29.658985 128886 impala-server.cc:1957] Connection c74c832b5368b85a:49e58a673f5d7ba9 from client 127.0.0.1:50244 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      I0225 16:08:30.565888 128862 impala-server.cc:1957] Connection ce47f868d7284d03:d86d2e769c3726a4 from client 127.0.0.1:50142 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      I0225 16:08:31.294651 128887 impala-server.cc:1957] Connection df4d3b98d939f104:01f15c102203cc81 from client 127.0.0.1:50256 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      I0225 16:08:37.060230 128901 impala-server.cc:1957] Connection c6481bc01063b9c4:b9c726aeb14b91a6 from client 127.0.0.1:50282 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      I0225 16:08:37.789135 128860 impala-server.cc:1957] Connection ad43f983d8502f74:106705062799569e from client 127.0.0.1:50136 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      I0225 16:08:38.505254 128913 impala-server.cc:1957] Connection d146ef0df1cad245:1e4d8c8dd97fcc81 from client 127.0.0.1:50354 to server hiveserver2-http-frontend closed. The connection had 1 associated session(s).
      

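      It looks like impala-shell sets up a new connection and performs a new LDAP bind for every RPC it makes, which adds overhead. In the log above most binds complete in about 10 ms, but one (thread 128901) takes about 5 seconds, and consecutive RPCs start roughly 0.7 seconds apart.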
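
To quantify the per-RPC authentication cost, the bind attempts in a coordinator log can be paired with their success lines. The following is a minimal sketch, not part of Impala: the function name `ldap_bind_durations` and the regular expression are assumptions made for illustration, and the pairing by thread id assumes each thread logs an attempt followed by its own success line, as in the excerpt above. The sample lines are copied from that excerpt.

```python
import re
from datetime import datetime

# glog prefix: I<mmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
LINE_RE = re.compile(r"^I(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) \S+\] (.*)$")


def ldap_bind_durations(lines):
    """Return a list of (thread_id, seconds) for each bind attempt/success pair."""
    started = {}    # thread id -> timestamp of the pending bind attempt
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines of multi-line messages are skipped
        _, stamp, thread_id, message = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S.%f")
        if message.startswith("Trying simple LDAP bind"):
            started[thread_id] = when
        elif message.startswith("LDAP bind successful") and thread_id in started:
            elapsed = (when - started.pop(thread_id)).total_seconds()
            durations.append((thread_id, elapsed))
    return durations


# Two bind pairs copied from the coordinator log in this report.
SAMPLE = [
    "I0225 16:07:58.144338 128883 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work",
    "I0225 16:07:58.155827 128883 authentication.cc:273] LDAP bind successful",
    "I0225 16:08:01.983230 128901 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work",
    "I0225 16:08:07.029768 128901 authentication.cc:273] LDAP bind successful",
]

if __name__ == "__main__":
    for thread_id, seconds in ldap_bind_durations(SAMPLE):
        print(f"thread {thread_id}: bind took {seconds:.3f}s")
```

The sketch ignores the date field and would miscompute a bind that spans midnight; for a short excerpt like this one that limitation should not matter.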

      Please investigate whether it's possible to speed things up by reusing connections.
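
As an illustration of what reusing connections would change, the sketch below compares opening a new HTTP connection per request with sending all requests over one persistent connection. It is a stand-alone model under stated assumptions, not impala-shell code: the local server, `CountingHandler`, `send_rpcs`, and `count_connections` are hypothetical names, the request body is a placeholder rather than a Thrift message, and no LDAP is involved. It only counts how many TCP connections the server accepts.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Local stand-in for an HTTP endpoint; counts accepted connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        super().setup()
        # setup() runs once per accepted TCP connection, not once per request.
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the placeholder request body
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def send_rpcs(port, count, reuse):
    """Send `count` POST requests, either on one connection or one each."""
    conn = None
    for _ in range(count):
        if conn is None or not reuse:
            conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("POST", "/", body=b"rpc")
        conn.getresponse().read()  # response must be fully read before reuse
        if not reuse:
            conn.close()
    if reuse and conn is not None:
        conn.close()


def count_connections(count, reuse):
    """Run a throwaway local server and return how many connections it saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        send_rpcs(server.server_address[1], count, reuse)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("new connection per RPC:", count_connections(5, reuse=False))
    print("persistent connection: ", count_connections(5, reuse=True))
```

With a persistent connection the server should see a single connection setup for the whole sequence. Whether the coordinator would still authenticate each request on a reused connection is a separate question that this sketch does not address.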

          People

            Assignee: Unassigned
            Reporter: Attila Jeges (attilaj)
            Votes: 0
            Watchers: 3
