Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-9128

Improve debugging for slow sends in KrpcDataStreamSender

    XMLWordPrintableJSON

Details

    • ghx-label-13

    Description

      I'm trying to debug a problem that appears to be caused by a slow RPC:

          Fragment F00
            Instance 754fc21ba4744310:d58fd04200000020 (host=xxxxxxxxx)
              Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/120.48 MB 
              - AverageThreadTokens: 1.00 (1.0)
              - BloomFilterBytes: 0 B (0)
              - InactiveTotalTime: 0ns (0)
              - PeakMemoryUsage: 3.2 MiB (3337546)
              - PeakReservation: 2.0 MiB (2097152)
              - PeakUsedReservation: 0 B (0)
              - PerHostPeakMemUsage: 6.7 MiB (6987376)
              - RowsProduced: 7 (7)
              - TotalNetworkReceiveTime: 0ns (0)
              - TotalNetworkSendTime: 3.6m (215354065071)
              - TotalStorageWaitTime: 4ms (4552708)
              - TotalThreadsInvoluntaryContextSwitches: 2 (2)
              - TotalThreadsTotalWallClockTime: 3.6m (215924079474)
                - TotalThreadsSysTime: 24ms (24386000)
                - TotalThreadsUserTime: 505ms (505714000)
              - TotalThreadsVoluntaryContextSwitches: 3,623 (3623)
              - TotalTime: 3.6m (215801961705)
              Fragment Instance Lifecycle Event Timeline
                Prepare Finished: 1ms (1812344)
                Open Finished: 322ms (322905753)
                First Batch Produced: 447ms (447050377)
                First Batch Sent: 447ms (447054546)
                ExecInternal Finished: 3.6m (215802284852)
              Buffer pool
                - AllocTime: 0ns (0)
                - CumulativeAllocationBytes: 0 B (0)
                - CumulativeAllocations: 0 (0)
                - InactiveTotalTime: 0ns (0)
                - PeakReservation: 0 B (0)
                - PeakUnpinnedBytes: 0 B (0)
                - PeakUsedReservation: 0 B (0)
                - ReadIoBytes: 0 B (0)
                - ReadIoOps: 0 (0)
                - ReadIoWaitTime: 0ns (0)
                - ReservationLimit: 0 B (0)
                - TotalTime: 0ns (0)
                - WriteIoBytes: 0 B (0)
                - WriteIoOps: 0 (0)
                - WriteIoWaitTime: 0ns (0)
              Fragment Instance Lifecycle Timings
                - ExecTime: 3.6m (215479380267)
                  - ExecTreeExecTime: 124ms (124299400)
                - InactiveTotalTime: 0ns (0)
                - OpenTime: 321ms (321088906)
                  - ExecTreeOpenTime: 572.04us (572045)
                - PrepareTime: 1ms (1426412)
                  - ExecTreePrepareTime: 233.32us (233318)
                - TotalTime: 0ns (0)
              KrpcDataStreamSender (dst_id=3)
                - EosSent: 58 (58)
                - InactiveTotalTime: 3.6m (215354085858)
                - PeakMemoryUsage: 464.4 KiB (475504)
                - RowsSent: 7 (7)
                - RpcFailure: 0 (0)
                - RpcRetry: 0 (0)
                - SerializeBatchTime: 99.87us (99867)
                - TotalBytesSent: 207 B (207)
                - TotalTime: 3.6m (215355336381)
                - UncompressedRowBatchSize: 267 B (267)
      
      

      We should add some diagnostics that will allow us to figure out which RPCs are slow and whether there's a pattern about which host is the problem. E.g. maybe we should log if the RPC time exceeds a configured threshold.

      It may also be useful to include some stats about the wait time, e.g. a histogram of the wait times, so that we can see if it's an outlier or general slowness.

      Attachments

        Issue Links

          Activity

            People

              tarmstrong Tim Armstrong
              tarmstrong Tim Armstrong
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: