HBase / HBASE-5702

MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a MonitoredTask per call

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.94.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This bug is so easy to reproduce that I'm wondering why it hasn't been reported yet. Stop any number of region servers on a 0.94 or 0.96 cluster and the master interface will show one task per stopped region server, saying the following:

      Processing schema change exclusion for region server = sv4r27s44,62023,1333402175340 | RUNNING (since 5sec ago) | No schema change in progress. Skipping exclusion for server = sv4r27s44,62023,1333402175340 (since 5sec ago)

      It stays there until the master cleans it up:

      WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status Processing schema change exclusion for region server = sv4r27s44,62023,1333402175340: status=No schema change in progress. Skipping exclusion for server = sv4r27s44,62023,1333402175340, state=RUNNING, startTime=1333404636419, completionTime=-1 appears to have been leaked

      It's not clear to me why it's using a MonitoredTask in the first place.

        Issue Links

          Activity

          Jean-Daniel Cryans created issue -
          Jean-Daniel Cryans added a comment -

          Looking at the code it seems it's leaking in other places... and that message shouldn't even be there in the first place because hbase.instant.schema.alter.enabled isn't enabled on this cluster.

          Jean-Daniel Cryans added a comment -

          Now that I've tested "instant" schema updates, there are many more issues with MonitoredTask than I originally thought.

          Lars Hofhansl added a comment -

          Seems bad, upping to critical.

          Lars Hofhansl made changes -
          Priority: Major [ 3 ] -> Critical [ 2 ]
          Ted Yu added a comment -

          The following method in ServerManager.java doesn't check the current value for hbase.instant.schema.alter.enabled config:

            private void excludeRegionServerFromSchemaChanges(final ServerName serverName) {
              this.services.getSchemaChangeTracker()
                  .excludeRegionServerForSchemaChanges(serverName.getServerName());
            }
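
One way the missing check could look is sketched below. This is a minimal, self-contained illustration and not the actual ServerManager code: the `SchemaChangeTracker` interface and the `Map`-based configuration are hypothetical stand-ins for the HBase classes, and the default of `false` for hbase.instant.schema.alter.enabled is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaExclusionGuard {
    // Config key quoted in the comment above.
    static final String INSTANT_SCHEMA_ALTER_KEY = "hbase.instant.schema.alter.enabled";

    /** Hypothetical stand-in for the master's schema change tracker. */
    interface SchemaChangeTracker {
        void excludeRegionServerForSchemaChanges(String serverName);
    }

    private final Map<String, String> conf;      // stand-in for a Hadoop Configuration
    private final SchemaChangeTracker tracker;

    SchemaExclusionGuard(Map<String, String> conf, SchemaChangeTracker tracker) {
        this.conf = conf;
        this.tracker = tracker;
    }

    /** Calls the tracker only when instant schema alter is enabled; returns whether it did. */
    boolean excludeRegionServerFromSchemaChanges(String serverName) {
        // Assumed default: the feature is off unless explicitly enabled.
        boolean enabled = Boolean.parseBoolean(
            conf.getOrDefault(INSTANT_SCHEMA_ALTER_KEY, "false"));
        if (!enabled) {
            return false; // no tracker call, so no task is created for this server
        }
        tracker.excludeRegionServerForSchemaChanges(serverName);
        return true;
    }

    public static void main(String[] args) {
        List<String> excluded = new ArrayList<>();
        Map<String, String> conf = new HashMap<>();
        SchemaExclusionGuard guard = new SchemaExclusionGuard(conf, excluded::add);

        guard.excludeRegionServerFromSchemaChanges("sv4r27s44,62023,1333402175340");
        System.out.println("feature off, tracker calls: " + excluded.size());

        conf.put(INSTANT_SCHEMA_ALTER_KEY, "true");
        guard.excludeRegionServerFromSchemaChanges("sv4r27s44,62023,1333402175340");
        System.out.println("feature on, tracker calls: " + excluded.size());
    }
}
```

With a guard like this, stopping a region server on a cluster that never enabled the feature would not reach the tracker at all, so the "No schema change in progress" task would not be created.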
          
          Subbu M Iyer added a comment -

          I will take a look at it and address it as soon as possible.

          Subbu M Iyer made changes -
          Assignee Subbu M Iyer [ iamknome ]
          Subbu M Iyer added a comment -

          JD,

          Can you share with me the "many more issues with MonitoredTask" that you have mentioned? I believe the problems you have seen are all related to the MonitoredTask reporting during instant schema change process and not with the actual schema change process itself? Please let me know so we can categorize and address the issues accordingly.

          Jean-Daniel Cryans added a comment -

          Yeah when I opened this jira it had a narrow scope but I think it could be much bigger. Basically if you try to instant alter a table you'll see there's about two or three (sorry I can't be more precise at the moment, I'm testing something else right now) tasks that are leaked.

          I can see some issues just looking at the code:

          MasterSchemaChangeTracker

          • processCompletedSchemaChanges: creates a task, sets the status, never closes it. It's a misuse of MonitoredTask I think, a task is normally something that's long running and you need to report progress.
          • processAlterStatus: creates a task but stops it right away, also creates one then kills it.
          • handleFailedOrExpiredSchemaChanges: creates a task, sets the status, never closes it. Also it has an extra white space.
          • createSchemaChangeNode: creates a task then closes/aborts it right away

          SchemaChangeTracker

          • handleSchemaChange: creates a task, sets the status, never closes it.
          • reportAndLogSchemaRefreshError: can create task and set a status but doesn't close it.

          There really should only be 1 task throughout the alter process.
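
The "one task for the whole alter" idea can be sketched as follows. This is only an illustration under stated assumptions: `Task` is a minimal stand-in and not the HBase MonitoredTask class (its `setStatus`, `markComplete`, and `abort` methods are modeled on what MonitoredTask appears to offer), and `runAlter` and the step names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SingleTaskAlter {
    /** Minimal stand-in for a monitored task; not the HBase class. */
    static final class Task {
        enum State { RUNNING, COMPLETE, ABORTED }

        final String description;
        String status = "";
        State state = State.RUNNING;

        Task(String description) { this.description = description; }

        void setStatus(String s) { status = s; }
        void markComplete(String s) { status = s; state = State.COMPLETE; }
        void abort(String s) { status = s; state = State.ABORTED; }
    }

    /**
     * Runs the named steps in order under a single task. The task always
     * leaves the RUNNING state: COMPLETE if every step succeeds, ABORTED if
     * a step throws, so nothing is left behind to be reported as leaked.
     */
    static Task runAlter(String table, Map<String, Runnable> steps) {
        Task task = new Task("Instant schema change for table " + table);
        try {
            for (Map.Entry<String, Runnable> step : steps.entrySet()) {
                task.setStatus(step.getKey()); // progress update on the same task
                step.getValue().run();
            }
            task.markComplete("Schema change complete");
        } catch (RuntimeException e) {
            task.abort("Schema change failed: " + e.getMessage());
        }
        return task;
    }

    public static void main(String[] args) {
        Map<String, Runnable> steps = new LinkedHashMap<>();
        steps.put("Creating schema change node", () -> { });
        steps.put("Waiting for region servers to refresh schema", () -> { });
        Task task = runAlter("t1", steps);
        System.out.println(task.state + ": " + task.status);
    }
}
```

The point of the shape is that each step only updates the status of the task it was handed, and the single caller that created the task is also the one responsible for completing or aborting it.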

          Lars Hofhansl made changes -
          Link This issue is related to HBASE-5715 [ HBASE-5715 ]
          Lars Hofhansl added a comment -

          Unscheduling, since HBASE-4213 was reverted.

          Lars Hofhansl made changes -
          Fix Version/s 0.94.0 [ 12316419 ]
          Fix Version/s 0.96.0 [ 12320040 ]

            People

            • Assignee: Subbu M Iyer
            • Reporter: Jean-Daniel Cryans
            • Votes: 0
            • Watchers: 1
