  Kudu / KUDU-2329

Random RPC timeout errors when inserting rows in a Kudu table


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.5.0
    • Fix Version/s: n/a
    • Component/s: rpc, server
    • Labels: None

    Description

      When executing inserts into a Kudu table, we experience errors at seemingly random times. We first hit one of these errors during a bulk update of a Kudu table via Spark (in Scala):

      kuduContext.updateRows(dataFrame, "table_name")

      The error message in Spark was the following:

      java.lang.RuntimeException: failed to write 579 rows from DataFrame to Kudu; sample errors: Timed out: can not complete before timeout: Batch{operations=6, tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x0000000F, 0x00000010), ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3, DeadlineTracker(timeout=30000, elapsed=30090), Traces: [0ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the channel, [10011ms] delaying RPC due to Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the channel, [10033ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, [20050ms] received from server 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the channel, [20050ms] delaying RPC due to Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the channel, [20072ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, [30090ms] received from server 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the channel, [30090ms] delaying RPC due to Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the channel)}

      (+ 4 more errors similar to this one in the error message)
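      For what it's worth, the trace is internally consistent: it shows attempt=3 against DeadlineTracker(timeout=30000), with each attempt burning roughly a 10-second read timeout until the 30-second operation deadline ran out. A minimal, hypothetical Java sketch of that arithmetic (the names are ours, not the Kudu client's):

      ```java
      // Hypothetical sketch (not the Kudu client's actual code) of the retry
      // pattern visible in the trace above: each attempt blocks until a
      // per-attempt read timeout (~10 s here), and the RPC fails for good
      // once the overall operation deadline (30 s) is exhausted.
      public class RetrySketch {
          /** Number of attempts made before the overall deadline is exhausted. */
          static int attemptsBeforeDeadline(long deadlineMs, long attemptCostMs) {
              long elapsed = 0;
              int attempts = 0;
              while (elapsed < deadlineMs) {   // DeadlineTracker-style check
                  attempts++;
                  elapsed += attemptCostMs;    // simulated read timeout per attempt
              }
              return attempts;
          }

          public static void main(String[] args) {
              // The trace shows timeout=30000 and ~10 s read timeouts per attempt.
              System.out.println(attemptsBeforeDeadline(30000, 10000)); // prints 3
          }
      }
      ```

      This matches the [0ms] / [10011ms] / [20072ms] send timestamps in the trace: three sends fit under the 30-second deadline before the batch is abandoned.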

      We first thought the problem was in our Spark code, but when we tried a simple "INSERT INTO" statement from the Impala shell into a Kudu table, we got the following error:

      [.............................] > insert into test_kudu values (282, 'hola');
      Query: insert into test_kudu values (282, 'hola')
      Query submitted at: ......................
      Query progress can be monitored at: ........................
      WARNINGS: Kudu error(s) reported, first error: Timed out: Failed to write batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): Failed to write to server: 071bcafbb1644678a697c474662047b7 (.........................:7050): Write RPC to ....................:7050 timed out after 179.949s (SENT)

      Error in Kudu table 'impala:kudu_db.test_kudu': Timed out: Failed to write batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): Failed to write to server: 071bcafbb1644678a697c474662047b7 (...........................:7050): Write RPC to ......................:7050 timed out after 179.949s (SENT)

      To make things even more confusing, despite this error in the Impala shell, the inserted rows eventually (though not immediately) showed up in the table, so the writes were in fact applied.

      We also tried tweaking the Kudu timeout configuration values that we had previously set, but that didn't help; the problem kept appearing.
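      (For reference, when using the Java client directly, the timeouts we adjusted are the ones exposed on the client builder. This is only a sketch: the master address and the 60-second values below are placeholders, not our cluster's actual settings.)

      ```java
      import org.apache.kudu.client.KuduClient;

      // Sketch only: raising the client-side timeouts when building a
      // Kudu Java client. "kudu-master:7051" and 60000 are placeholders.
      KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051")
              .defaultOperationTimeoutMs(60000)       // write/scan operation timeout
              .defaultAdminOperationTimeoutMs(60000)  // DDL / admin operation timeout
              .build();
      ```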

      Furthermore, we don't always get these errors; they appear only at random times. For example, right now we are seeing errors only in the Spark update described above, while the Impala shell is working fine.

      After everything we have tried, we are fairly certain this is a bug in Kudu, although we find it strange that it is undocumented, and it is certainly hard to reproduce.

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Héctor Gutiérrez (hectorgut47)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: