  Apache Drill / DRILL-2550

Drillbit disconnect from ZK results in drillbit being lost until restart


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: Future
    • Component/s: Execution - Flow
    • Labels: None

    Description

      I'm not quite sure whether this is an issue, or even whether it's important; maybe someone can think of a situation where it becomes a bigger problem.

      Steps to reproduce:
      1. Start up drillbits on multiple nodes. (They all come up and form an 8-node cluster.)
      2. Start executing a long-running query.
      3. Use tcpkill to kill all connections between one node and the ZooKeeper port (5181).
      Drill behaves very gracefully here. I see a clear error message: "Query failed: ForemanException: One more more nodes lost connectivity during query. Identified node was atsqa6c61.qa.lab"

      However, once I allow connections again, the node does not rejoin the cluster until its drillbit is restarted.
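      This report does not say why the node stays missing. One plausible mechanism, offered as an assumption rather than a diagnosis, goes like this. Suppose each drillbit advertises itself through a ZooKeeper ephemeral node that it creates only at startup. When the outage outlasts the session timeout, ZooKeeper expires the session and deletes every ephemeral node that session owned. Reconnecting then yields a fresh session, but nothing recreates the entry. The toy model below is hypothetical and self-contained: `ToyRegistry`, `ToyDrillbit` and the `reRegisterOnNewSession` flag are invented for illustration, and it uses no ZooKeeper, Curator, or Drill API.

      ```java
      import java.util.HashMap;
      import java.util.Map;
      import java.util.Set;
      import java.util.TreeSet;

      // Hypothetical toy model of ephemeral registration; it does NOT use the ZooKeeper,
      // Curator, or Drill APIs, and every class name here is invented for illustration.
      public class EphemeralRegistryDemo {

          /** Stands in for the coordination service: each entry is owned by one session. */
          static final class ToyRegistry {
              private final Map<String, Long> ephemeral = new HashMap<>();
              private long nextSession = 1;

              long openSession() {
                  return nextSession++;
              }

              void registerEphemeral(String node, long session) {
                  ephemeral.put(node, session);
              }

              /** Expiring a session deletes every entry that session owned. */
              void expireSession(long session) {
                  ephemeral.values().removeIf(owner -> owner == session);
              }

              Set<String> liveNodes() {
                  return new TreeSet<>(ephemeral.keySet());
              }
          }

          /** A cluster member that registers at startup and may or may not re-register later. */
          static final class ToyDrillbit {
              private final String name;
              private final ToyRegistry registry;
              private final boolean reRegisterOnNewSession;
              private long session;

              ToyDrillbit(String name, ToyRegistry registry, boolean reRegisterOnNewSession) {
                  this.name = name;
                  this.registry = registry;
                  this.reRegisterOnNewSession = reRegisterOnNewSession;
              }

              void start() {
                  session = registry.openSession();
                  registry.registerEphemeral(name, session);
              }

              /** Models a connection loss that outlasts the session timeout. */
              void loseConnection() {
                  registry.expireSession(session);
              }

              /** Connectivity returns and the client is handed a brand-new session. */
              void reconnect() {
                  session = registry.openSession();
                  if (reRegisterOnNewSession) {
                      registry.registerEphemeral(name, session);
                  }
              }
          }

          public static void main(String[] args) {
              for (boolean reRegister : new boolean[] {false, true}) {
                  ToyRegistry registry = new ToyRegistry();
                  ToyDrillbit bit = new ToyDrillbit("drillbit-1", registry, reRegister);
                  bit.start();
                  bit.loseConnection();
                  bit.reconnect();
                  System.out.println("reRegister=" + reRegister + " live=" + registry.liveNodes());
              }
          }
      }
      ```

      Running `main` prints `reRegister=false live=[]` followed by `reRegister=true live=[drillbit-1]`. A real fix would more likely react to ZooKeeper's session-expired event (or a Curator connection-state listener) than check a boolean flag. The flag here only puts both outcomes side by side.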


          People

            Assignee: Unassigned
            Reporter: Ramana Inukonda Nagaraj (inramana)
            Votes: 0
            Watchers: 2
