Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.5, 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: logger, master
    • Labels:

      Description

      I shut down a single-node instance uncleanly. When I restarted it, the logger did not have enough memory to perform the log sort; it got an OOME and died. I edited accumulo-env.sh to give the logger process more memory, then restarted the logger process. However, the log recovery never restarted.

      The master was continually printing messages like the following:

      06 17:07:16,609 [master.CoordinateRecoveryTask] DEBUG: Copying 65c48045-88c1-48e4-93d3-4865a9a86050 from xxx.xxx.xxx.xxx:11224 (for 1210.306000 seconds) 0.0
      

      After 20 minutes I restarted the master, and log recovery then proceeded.

        Activity

        Keith Turner added a comment -

        Originally the logger had a max of 128m of heap. I think I copied accumulo-env.sh.512MBBstandalone-native-example. I upped the logger heap to 512m max.

        Eric Newton added a comment -

        It does restart, but it takes a long time to time out (an hour?!?). We need to use an API to get the status from the logger; using HDFS to communicate is too much of a kludge.

        Keith Turner added a comment -

        Should probably notice that the logger lost its zookeeper lock.

        Keith Turner added a comment - edited

        ACCUMULO-388 should test the fix for this issue
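
The two suggestions in the comments above (ask the logger for its status directly, and notice when the logger has lost its ZooKeeper lock) can be illustrated with a minimal sketch. This is not Accumulo code and not the fix that was committed; the class `RecoveryWatchdog`, its `Decision` enum, and the two supplier callbacks are hypothetical names introduced only for illustration. The sketch assumes the master can obtain two things per recovery: whether the logger still holds its lock, and a progress fraction like the `0.0` shown in the log message above. If the lock is gone, or progress has not advanced within a stall timeout, the recovery is restarted instead of waiting for a long overall timeout.

```java
import java.util.function.BooleanSupplier;
import java.util.function.DoubleSupplier;

/**
 * Hypothetical sketch, not Accumulo code: decides whether the master should
 * keep waiting on a logger's sort/copy or give up and restart the recovery.
 */
final class RecoveryWatchdog {
    enum Decision { KEEP_WAITING, RESTART_RECOVERY, DONE }

    // Assumption: backed by a check that the logger still holds its ZooKeeper lock.
    private final BooleanSupplier loggerHoldsLock;
    // Assumption: backed by a status call to the logger, returning 0.0 to 1.0.
    private final DoubleSupplier progress;
    private final long stallTimeoutMillis;

    private double lastProgress = -1.0; // below any real value, so the first check counts as an advance
    private long lastAdvanceMillis;

    RecoveryWatchdog(BooleanSupplier loggerHoldsLock, DoubleSupplier progress,
                     long stallTimeoutMillis, long nowMillis) {
        this.loggerHoldsLock = loggerHoldsLock;
        this.progress = progress;
        this.stallTimeoutMillis = stallTimeoutMillis;
        this.lastAdvanceMillis = nowMillis;
    }

    /** Called periodically by the master with the current time in milliseconds. */
    Decision check(long nowMillis) {
        // A logger that lost its lock has died or restarted; its work is gone.
        if (!loggerHoldsLock.getAsBoolean()) {
            return Decision.RESTART_RECOVERY;
        }
        double current = progress.getAsDouble();
        if (current >= 1.0) {
            return Decision.DONE;
        }
        // Any forward movement resets the stall clock.
        if (current > lastProgress) {
            lastProgress = current;
            lastAdvanceMillis = nowMillis;
            return Decision.KEEP_WAITING;
        }
        // No movement: restart only once the stall has lasted the full timeout.
        return (nowMillis - lastAdvanceMillis >= stallTimeoutMillis)
                ? Decision.RESTART_RECOVERY
                : Decision.KEEP_WAITING;
    }

    public static void main(String[] args) {
        boolean[] lockHeld = {true};
        double[] fraction = {0.0};
        RecoveryWatchdog watchdog =
                new RecoveryWatchdog(() -> lockHeld[0], () -> fraction[0], 60_000L, 0L);
        System.out.println(watchdog.check(1_000L)); // prints KEEP_WAITING
        lockHeld[0] = false;                        // simulate the logger dying
        System.out.println(watchdog.check(2_000L)); // prints RESTART_RECOVERY
    }
}
```

Passing the clock value into `check` rather than reading it inside keeps the sketch deterministic, which matters if a test such as the one proposed for ACCUMULO-388 needs to simulate a stalled logger without actually waiting.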


          People

          • Assignee:
            Keith Turner
            Reporter:
            Keith Turner
