HBASE-5798

NPE running hbck on 0.94 out of reportTablesInFlux

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.94.0, 0.95.2
    • Fix Version/s: None
    • Component/s: hbck
    • Labels:
      None
    • Tags:
      noob

      Description

      Got this playing w/ hbck going against the 0.94RC:

      12/04/16 17:03:14 INFO util.HBaseFsck: getHTableDescriptors == tableNames => []
      Exception in thread "main" java.lang.NullPointerException
              at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:553)
              at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:344)
              at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:380)
              at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3033)
      
      1. HBASE-5798_trunk.patch
        4 kB
        Anoop Sam John
      2. HBASE-5798_94.patch
        4 kB
        Anoop Sam John

        Issue Links

          Activity

          Anoop Sam John made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Anoop Sam John added a comment -

          The issue with NPE is fixed as part of HBASE-5928.

          Anoop Sam John made changes -
          Link This issue relates to HBASE-6015 [ HBASE-6015 ]
          Anoop Sam John added a comment -

          Raised https://issues.apache.org/jira/browse/HBASE-6015 to track the second point

          Anoop Sam John added a comment -

          The NPE itself is fixed by the patch on HBASE-5928.
          We can close this issue.

          There is also another point (issue) with the HBCK rerun, mentioned as the 2nd point in my earlier comment. Better to raise a new issue to handle that.
          @Jon, OK with you?

          Anoop Sam John added a comment -

          I have a doubt about my patch now.
          Should we track the skipped regions, or the regions included in the 1st run?

          Anoop Sam John made changes -
          Attachment HBASE-5798_94.patch [ 12522980 ]
          Attachment HBASE-5798_trunk.patch [ 12522981 ]
          Anoop Sam John added a comment -

          Patch for trunk and 0.94

          Jonathan Hsieh added a comment -

          Returning an empty array is valid. I dug a little into the master side as well – it returns an empty array when an invalid set of table names is passed.

          Jonathan Hsieh made changes -
          Assignee Jonathan Hsieh [ jmhsieh ] Anoop Sam John [ anoopsamjohn ]
          Jonathan Hsieh added a comment -

          Anoop – go for it.

          Anoop Sam John added a comment -

          Jon, I can provide a patch tomorrow addressing both the points I have mentioned [if that is OK with you].

          Jonathan Hsieh added a comment -

          Anoop – do you guys want to take this on or should I?

          Jonathan Hsieh added a comment -

          I think #2 makes sense, but would need to be tested to verify (it is a legacy of the original hbck – I didn't change this).

          Anoop Sam John made changes -
          Affects Version/s 0.94.0 [ 12316419 ]
          Affects Version/s 0.96.0 [ 12320040 ]
          Component/s hbck [ 12315702 ]
          Anoop Sam John added a comment -

          @Jon
          Yes, I also don't like adding a null check.
          Also, what about point 2: when HBCK reruns after the fix, can we set timelag = 0?

          Jonathan Hsieh added a comment -

          I started a run of the unit test suite testing this fix – for a method like this, I prefer returning empty arrays instead of null arrays.

          diff --git src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java src/main/java/org/apache/hadoop/hbase/cli
          index ee16e72..44b7c11 100644
          --- src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          +++ src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          @@ -1691,7 +1691,7 @@ public class HBaseAdmin implements Abortable, Closeable {
            /**
            * Get tableDescriptors
            * @param tableNames List of table names
          - * @return HTD[] the tableDescriptor
          + * @return HTD[] the tableDescriptor (never null)
            * @throws IOException if a remote or network exception occurs
            */
             public HTableDescriptor[] getTableDescriptors(List<String> tableNames)
          diff --git src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java src/main/java/org/apache/hadoop/h
          index 820e2a9..f183b15 100644
          --- src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          +++ src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          @@ -2195,7 +2195,7 @@ public class HConnectionManager {
           
               @Override
               public HTableDescriptor[] getHTableDescriptors(List<String> tableNames) throws IOException {
          -      if (tableNames == null || tableNames.isEmpty()) return null;
          +      if (tableNames == null || tableNames.isEmpty()) return new HTableDescriptor[0];
                 MasterKeepAliveConnection master = getKeepAliveMaster();
                 try {
                   return master.getHTableDescriptors(tableNames);
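
The contract change in the diff above (return an empty array instead of null) can be illustrated outside HBase with a minimal sketch. The class `EmptyArrayContract`, the `Descriptor` type, and `getDescriptors` are hypothetical stand-ins invented for this example; they are not HBase classes, and the lookup body is a placeholder for the real master call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EmptyArrayContract {
    // Hypothetical stand-in for HTableDescriptor; it only carries a table name.
    public static final class Descriptor {
        final String tableName;

        public Descriptor(String tableName) {
            this.tableName = tableName;
        }
    }

    // Mirrors the guard in the diff: null or empty input yields an empty array, never null.
    public static Descriptor[] getDescriptors(List<String> tableNames) {
        if (tableNames == null || tableNames.isEmpty()) {
            return new Descriptor[0];
        }
        // Placeholder for the remote lookup: build one descriptor per requested name.
        List<Descriptor> found = new ArrayList<>();
        for (String name : tableNames) {
            found.add(new Descriptor(name));
        }
        return found.toArray(new Descriptor[0]);
    }

    public static void main(String[] args) {
        Descriptor[] none = getDescriptors(Collections.<String>emptyList());
        // Callers can read .length directly, with no null check.
        System.out.println("Number of Tables: " + none.length);
    }
}
```

With this contract, a caller that prints the array length keeps working when no table names are passed, which is the case that triggered the reported stack trace.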
          
          Anoop Sam John added a comment -

          @Ram, yes, this is the same issue. I found the reason.
          The scenario in our test is as follows.
          There is one table, and one region of that table was not assigned to any RS. The HBCK tool fixes this issue, and then HBCK runs again.
          At this point getHTableDescriptors() does not find any table in the cluster and returns null, so reportTablesInFlux() -> errors.print("Number of Tables: " + allTables.length); throws an NPE.

          Why does getHTableDescriptors() return no tables at this point, even though one table exists in the cluster? Because this table was modified recently: HBCK just changed the HRegionInfo of the table's region by assigning it to one of the RS.

          For the fix:
          1. We need a null check in reportTablesInFlux(), I think.
          2. When HBCK reruns after the fix, can we set timelag = 0?
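
The scenario and the two proposed fixes above can be sketched as a small standalone program. This is only an illustration under stated assumptions: `TablesInFluxSketch`, `TableState`, `stableTables`, `lookupOld`, and `countTables` are hypothetical names, the 60-second lag is an illustrative value, and the filter assumes the time lag works by skipping anything modified inside the window. None of this is the actual HBaseFsck code.

```java
import java.util.ArrayList;
import java.util.List;

public class TablesInFluxSketch {
    // Hypothetical record of a table and the time its metadata last changed.
    public static final class TableState {
        final String name;
        final long lastModifiedMs;

        public TableState(String name, long lastModifiedMs) {
            this.name = name;
            this.lastModifiedMs = lastModifiedMs;
        }
    }

    // Assumed time-lag filter: keep only tables not modified within the last timeLagMs.
    public static List<String> stableTables(List<TableState> tables, long nowMs, long timeLagMs) {
        List<String> stable = new ArrayList<>();
        for (TableState t : tables) {
            if (nowMs - t.lastModifiedMs >= timeLagMs) {
                stable.add(t.name);
            }
        }
        return stable;
    }

    // Behaviour described in the comment: an empty name list makes the lookup return null.
    public static String[] lookupOld(List<String> names) {
        if (names == null || names.isEmpty()) {
            return null;
        }
        return names.toArray(new String[0]);
    }

    // Point 1: a null check at the caller instead of reading .length directly.
    public static int countTables(String[] allTables) {
        return allTables == null ? 0 : allTables.length;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<TableState> tables = new ArrayList<>();
        tables.add(new TableState("t1", now - 1_000L)); // just touched by the repair

        // With a 60 s lag the only table is skipped, so the lookup returns null.
        String[] result = lookupOld(stableTables(tables, now, 60_000L));
        // Reading result.length here would throw a NullPointerException.
        System.out.println("Number of Tables: " + countTables(result));

        // Point 2: with a lag of 0 the recently modified table is included again.
        String[] rerun = lookupOld(stableTables(tables, now, 0L));
        System.out.println("Number of Tables: " + countTables(rerun));
    }
}
```

The first call reports 0 tables through the guarded count, and the second reports 1 because a zero lag no longer hides the table the repair just modified.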

          Jonathan Hsieh made changes -
          Field Original Value New Value
          Assignee Jonathan Hsieh [ jmhsieh ]
          ramkrishna.s.vasudevan added a comment -

          @Stack
          We also got an NPE in hbck. We have not found the reason yet, and we are not sure whether it is the same as this one.

          stack added a comment -

          Error is transient. Subsequent runs worked.

          stack created issue -

            People

            • Assignee:
              Anoop Sam John
              Reporter:
              stack
            • Votes:
              0
              Watchers:
              4
