HBASE-21755

RS aborts while performing replication with wal dir on hdfs, root dir on s3


    Description

      Environment/Configuration

      • hbase.wal.dir : Configured to be on hdfs
      • hbase.rootdir : Configured to be on s3

      In a replication scenario, while trying to look up the archived log (in WALEntryStream#getArchivedLog, see WALEntryStream.java#L314), we get the following exception:

      2019-01-21 17:43:55,440 ERROR [RS_REFRESH_PEER-regionserver/host2:22222-1.replicationSource,2.replicationSource.wal-reader.host2%2C22222%2C1548063439555.host2%2C22222%2C1548063439555.regiongroup-1,2] regionserver.ReplicationSource: Unexpected exception in RS_REFRESH_PEER-regionserver/host2:22222-1.replicationSource,2.replicationSource.wal-reader.host2%2C22222%2C1548063439555.host2%2C22222%2C1548063439555.regiongroup-1,2 currentPath=hdfs://dummy_path/hbase/WALs/host2,22222,1548063439555/host2%2C22222%2C1548063439555.host2%2C22222%2C1548063439555.regiongroup-1.1548063492594
      java.lang.IllegalArgumentException: Wrong FS: s3a://xxxxxx/hbase128/oldWALs/host2%2C22222%2C1548063439555.host2%2C22222%2C1548063439555.regiongroup-1.1548063492594, expected: hdfs://dummy_path
      	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:781)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:246)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1622)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1619)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1634)
      	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:465)
      	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1742)
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.getArchivedLog(WALEntryStream.java:319)
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.resetReader(WALEntryStream.java:404)
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.reset(WALEntryStream.java:161)
      	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:148)
      2019-01-21 17:43:55,444 ERROR [RS_REFRESH_PEER-regionserver/host2:22222-1.replicationSource,2.replicationSource.wal-reader.host2%2C22222%2C1548063439555.host2%2C22222%2C1548063439555.regiongroup-1,2] regionserver.HRegionServer: ***** ABORTING region server host2,22222,1548063439555: Unexpected exception in RS_REFRESH_PEER-regionserver/host2:22222-1.replicationSource,2.replicationSource.wal-reader.host2%2C22222%2C1548063439555.host2%2C22222%2C1548063439555.regiongroup-1,2 *****
      java.lang.IllegalArgumentException: Wrong FS: s3a://xxxxxx/hbase128/oldWALs/host2%2C22222%2C1548063439555.host2%2C22222%2C1548063439555.regiongroup-1.1548063492594, expected: hdfs://dummy_path
      	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:781)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:246)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1622)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1619)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1634)
      	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:465)
      	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1742)
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.getArchivedLog(WALEntryStream.java:319)
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.resetReader(WALEntryStream.java:404)
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.reset(WALEntryStream.java:161)
      	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:148)
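      The failure mode: the archive candidate path is built from hbase.rootdir (an s3a URI), but it is then checked with the WAL filesystem, which in this setup is HDFS-backed, so FileSystem.checkPath rejects it with "Wrong FS". A small illustration with hypothetical values (bucket and file names are made up; Path construction only, no filesystem access):

        // Hypothetical values; mirrors how the current lookup builds the path.
        Path rootDir = new Path("s3a://my-bucket/hbase");                      // from hbase.rootdir
        Path oldLogDir = new Path(rootDir, HConstants.HREGION_OLDLOGDIR_NAME); // s3a://my-bucket/hbase/oldWALs
        Path archivedLogLocation = new Path(oldLogDir, "someWALFileName");     // still an s3a URI
        // fs.exists(archivedLogLocation) then fails inside FileSystem.checkPath,
        // because an s3a path can never belong to the hdfs:// filesystem behind fs.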

      The current code is:

        private Path getArchivedLog(Path path) throws IOException {
          Path rootDir = FSUtils.getRootDir(conf);
      
          // Try found the log in old dir
          Path oldLogDir = new Path(rootDir, HConstants.HREGION_OLDLOGDIR_NAME);
          Path archivedLogLocation = new Path(oldLogDir, path.getName());
          if (fs.exists(archivedLogLocation)) {
            LOG.info("Log " + path + " was moved to " + archivedLogLocation);
            return archivedLogLocation;
          }
          // ... (rest of the method elided)
          return path;
        }
      

      It builds the old-WAL path from the root dir (hbase.rootdir), while it should use the WAL dir (hbase.wal.dir) and that directory's filesystem.
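      One possible direction, following from the above (a sketch, not a committed patch): resolve the archive location against the WAL root dir and use that directory's filesystem. This assumes FSUtils.getWALRootDir(conf), which falls back to hbase.rootdir when no separate WAL dir is configured, and reuses the surrounding class's conf, LOG and path as in the snippet above:

        private Path getArchivedLog(Path path) throws IOException {
          // Resolve against the WAL root (hbase.wal.dir, falling back to hbase.rootdir)
          // and use that directory's filesystem instead of the root dir's.
          Path walRootDir = FSUtils.getWALRootDir(conf);
          FileSystem walFs = walRootDir.getFileSystem(conf);

          // Try to find the log in the old WAL dir under the WAL root
          Path oldLogDir = new Path(walRootDir, HConstants.HREGION_OLDLOGDIR_NAME);
          Path archivedLogLocation = new Path(oldLogDir, path.getName());
          if (walFs.exists(archivedLogLocation)) {
            LOG.info("Log " + path + " was moved to " + archivedLogLocation);
            return archivedLogLocation;
          }
          // ... (rest of the lookup, as in the current code)
          return path;
        }

      With hbase.wal.dir on HDFS, the lookup then stays on the filesystem that actually hosts the WAL archive, and the "Wrong FS" abort should not occur.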

People

  Assignee: Unassigned
  Reporter: Nihal Jain (nihaljain.cs)