Accumulo / ACCUMULO-422

Bulk import failing when tablet server dies


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.4.0
    • None
    • Environment: 10-node cluster running 1.4.0-SNAPSHOT

    Description

      Saw this issue while running the random walk test with agitation. The bulk import code picks random tablet servers and asks them to bulk load files. If a tablet server dies, it takes 30 seconds for the master to see that the server's ZooKeeper lock was lost. During this 30-second window the bulk import code will still try to use the dead tserver and fail. After three failures it marks the file as a failure, and all of this happens within a second.
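The retry behavior described above can be modeled with a minimal Java sketch. It is not Accumulo's bulk import code. `NaiveBulkLoad`, `assignFile`, `loadOn` and `TransportFailure` are hypothetical names, and `TransportFailure` stands in for Thrift's `TTransportException`. Server selection is passed in as a `Supplier` so the sketch makes no claim about exactly how Accumulo chooses servers. The dead server is assumed to fail immediately, which is consistent with the report that all three failures happen within a second, long before the master notices the lost lock.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical model of the retry loop in the report; not Accumulo's actual code. */
public class NaiveBulkLoad {
    static final int MAX_ATTEMPTS = 3; // "fails three times" per the report

    /** Stand-in for Thrift's TTransportException (hypothetical). */
    static class TransportFailure extends Exception {
        TransportFailure(String msg) { super(msg); }
    }

    /** Simulated RPC: a dead tserver fails immediately instead of timing out. */
    static void loadOn(String server, String file, Set<String> deadServers) throws TransportFailure {
        if (deadServers.contains(server)) {
            throw new TransportFailure("connection refused: " + server);
        }
    }

    /** Returns true if the file was loaded, false if it was marked as a failure. */
    static boolean assignFile(String file, Supplier<String> pickServer, Set<String> deadServers) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            // The dead server is still in the master's live set, so it can be picked.
            String server = pickServer.get();
            try {
                loadOn(server, file, deadServers);
                return true;
            } catch (TransportFailure e) {
                // Nothing records that this server is dead, so it may be chosen again.
            }
        }
        return false; // three fast failures: the file is marked as failed
    }

    public static void main(String[] args) {
        Set<String> dead = Set.of("tserver3");
        // Worst case: every attempt lands on the dead server.
        System.out.println(assignFile("f1.rf", () -> "tserver3", dead)); // prints false
    }
}
```

Because each failed attempt costs almost nothing, the attempt budget is exhausted well inside the 30-second window in which the dead server still looks alive.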

      The bulk import code should probably catch TTransportException and blacklist the tablet server for that bulk import transaction.
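The suggested fix can be sketched in Java under the same simplified model. This is only an illustration of the idea in the report, not the change that resolved the issue. `BlacklistingBulkLoad`, `assignFile`, `pickServer` and `TransportFailure` are hypothetical names, and `TransportFailure` again stands in for `TTransportException`. The key point is that a server that throws a transport error goes into a blacklist owned by the current transaction, and it is excluded from every later pick in that transaction.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of per-transaction blacklisting; names are illustrative. */
public class BlacklistingBulkLoad {
    static final int MAX_ATTEMPTS = 3;

    /** Stand-in for Thrift's TTransportException (hypothetical). */
    static class TransportFailure extends Exception {
        TransportFailure(String msg) { super(msg); }
    }

    private final List<String> liveServers;                // master's possibly stale view
    private final Set<String> deadServers;                 // ground truth, for simulation only
    private final Set<String> blacklist = new HashSet<>(); // scoped to this transaction
    private final Random random;

    BlacklistingBulkLoad(List<String> liveServers, Set<String> deadServers, long seed) {
        this.liveServers = liveServers;
        this.deadServers = deadServers;
        this.random = new Random(seed);
    }

    /** Simulated RPC: a dead tserver fails immediately. */
    private void loadOn(String server, String file) throws TransportFailure {
        if (deadServers.contains(server)) {
            throw new TransportFailure("connection refused: " + server);
        }
    }

    /** Picks a random server not blacklisted in this transaction, or null if none remain. */
    private String pickServer() {
        List<String> candidates = new ArrayList<>();
        for (String s : liveServers) {
            if (!blacklist.contains(s)) {
                candidates.add(s);
            }
        }
        return candidates.isEmpty() ? null : candidates.get(random.nextInt(candidates.size()));
    }

    /** Returns true if the file was loaded, false if it was marked as a failure. */
    boolean assignFile(String file) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            String server = pickServer();
            if (server == null) {
                return false; // every known server has been blacklisted
            }
            try {
                loadOn(server, file);
                return true;
            } catch (TransportFailure e) {
                blacklist.add(server); // skip this server for the rest of the transaction
            }
        }
        return false;
    }

    Set<String> blacklisted() {
        return blacklist;
    }
}
```

The blacklist belongs to the per-transaction object, so a tablet server that comes back is eligible again in the next bulk import. The 30-second lock-loss delay itself is unchanged. The blacklist only stops one dead server from using up every retry during that window.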


People

    • Assignee: Unassigned
    • Reporter: Keith Turner (kturner)