IMPALA-3080

Slow performance on kerberos cluster - Possible inefficient implementation of TSaslTransport::write

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.6.0
    • Component/s: Perf Investigation
    • Labels: None

    Description

      Test results for Kerberized and non-Kerberized clusters (CDH 5.4.9):
      Driver version: 2.5.31
      Dataset: 11 STRING columns and 50000 rows (same as the one used for CDH 5.0 testing)
      Linux driver:

      Kerberized Impala:                   4.232 s
      Non-Kerberized Impala (NOSASL):      1.88 s
      Kerberized Hive:                     1.6 s
      Non-Kerberized Hive (SASL-PLAIN):    1.413 s

      Impala sent 7694354 bytes, versus 5770412 bytes sent by Hive. A closer look at the Wireshark/tcpdump trace shows that the majority of the TCP packets from Impala contain 1, 2, or 4 bytes of data, whereas the majority of the TCP packets from Hive contain 1460 bytes of data. Impala sends the entire dataset in 33126 TCP packets, whereas Hive uses 4882 TCP packets.
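      To make the packet-size observation concrete, here is a minimal Python sketch. It is not the Thrift or Impala code: `CountingSink`, `CoalescingWriter`, and the frame layout in `send_frame` (a 1-byte tag, a 4-byte length, then the payload) are hypothetical names and assumptions chosen only for illustration. It shows how issuing each field as a separate write produces several tiny writes, while buffering them and flushing once produces a single larger write, and it computes the average bytes per packet from the figures reported above.

```python
class CountingSink:
    """Stand-in for a socket: records the size of every write it receives."""

    def __init__(self):
        self.writes = []

    def write(self, data: bytes) -> None:
        self.writes.append(len(data))


class CoalescingWriter:
    """Accumulates small writes and passes them to the sink once per flush."""

    def __init__(self, sink):
        self._sink = sink
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data

    def flush(self) -> None:
        if self._buf:
            self._sink.write(bytes(self._buf))
            self._buf.clear()


def send_frame(out, payload: bytes) -> None:
    """Hypothetical framing (an assumption): tag, length, payload as three writes."""
    out.write(b"\x01")                          # 1-byte tag
    out.write(len(payload).to_bytes(4, "big"))  # 4-byte big-endian length
    out.write(payload)


# Unbuffered: every field reaches the sink as its own write.
direct = CountingSink()
send_frame(direct, b"x" * 100)

# Buffered: the same frame reaches the sink as one write.
coalesced = CountingSink()
writer = CoalescingWriter(coalesced)
send_frame(writer, b"x" * 100)
writer.flush()

# Average bytes per TCP packet, taking the reported totals at face value.
impala_avg = 7694354 / 33126   # roughly 232 bytes per packet
hive_avg = 5770412 / 4882      # roughly 1182 bytes per packet
```

      In this sketch `direct.writes` is `[1, 4, 100]` and `coalesced.writes` is `[105]`. Whether each small write becomes its own TCP packet on a real connection depends on socket options and kernel behavior, so the mapping from writes to packets here is only suggestive.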

            People

              Assignee: mmokhtar Mostafa Mokhtar
              Reporter: anujphadke Anuj Phadke
              Votes: 1
              Watchers: 12
