HBase / HBASE-11595

WAL files with encryption not flushed properly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 0.98.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Reported using HBase 0.98.3 and HDFS 2.4.1

      None of the data written before the failure had been flushed, so it exists only in the WAL files. During distributed log splitting, the encrypted WAL either has not been written out and synced the way an unencrypted WAL is, or is unreadable:

      2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
      

      This file is still moved to oldWALs even though splitting failed.

      Setting 'hbase.regionserver.wal.encryption' to false allows data recovery.

        Activity

        Andrew Purtell added a comment -

        My attempt to reproduce this issue:

        1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a dev box.
        2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver also on this dev box. Set dfs.replication and hbase.regionserver.hlog.tolerable.lowreplication to 1. Set up a keystore and enabled WAL encryption.
        3. Created a test table.
        4. Used YCSB to write 1000 rows to the test table. No flushes observed.
        5. Used the shell to count the number of records in the test table. Count = 1000 rows
        6. kill -9 the regionserver process.
        7. Started a new regionserver process. Observed log splitting and replay in the regionserver log, no errors.
        8. Used the shell to count the number of records in the test table. Count = 1000 rows
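
        The settings from step 2 can be sketched as a small Java program that renders the corresponding property entries. This is a minimal illustration, not the configuration actually used in the test: the class and method names are hypothetical, the keystore path, password, and key alias are placeholders, and the encryption-related property names are assumed from the WAL encryption section of the HBase reference guide rather than taken from this issue. Where exactly dfs.replication was set is not stated above; it is listed alongside the other properties only for illustration.

        ```java
        import java.util.LinkedHashMap;
        import java.util.Map;

        public class EncryptedWalSiteConfig {

            /** Properties for step 2; keystore location, password and alias are placeholders. */
            static Map<String, String> siteProperties() {
                Map<String, String> p = new LinkedHashMap<>();
                // Named in the reproduction steps: single datanode, so one replica is tolerated.
                p.put("dfs.replication", "1");
                p.put("hbase.regionserver.hlog.tolerable.lowreplication", "1");
                // Assumed names from the HBase reference guide (not stated in this issue).
                p.put("hbase.crypto.keyprovider",
                      "org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider");
                p.put("hbase.crypto.keyprovider.parameters",
                      "jceks:///path/to/hbase.jks?password=CHANGEME");
                p.put("hbase.crypto.master.key.name", "hbase");
                p.put("hbase.regionserver.hlog.reader.impl",
                      "org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader");
                p.put("hbase.regionserver.hlog.writer.impl",
                      "org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter");
                p.put("hbase.regionserver.wal.encryption", "true");
                return p;
            }

            /** Escapes the characters that are unsafe inside XML element text. */
            static String escape(String s) {
                return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
            }

            /** Renders the properties in the <configuration>/<property> layout of a site file. */
            static String toSiteXml(Map<String, String> props) {
                StringBuilder sb = new StringBuilder("<configuration>\n");
                for (Map.Entry<String, String> e : props.entrySet()) {
                    sb.append("  <property>\n")
                      .append("    <name>").append(escape(e.getKey())).append("</name>\n")
                      .append("    <value>").append(escape(e.getValue())).append("</value>\n")
                      .append("  </property>\n");
                }
                return sb.append("</configuration>\n").toString();
            }

            public static void main(String[] args) {
                System.out.print(toSiteXml(siteProperties()));
            }
        }
        ```

        The reader, writer, and encryption properties are kept together here because the later discussion on this issue turns on the reader class being changed independently of the others.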
        Andrew Purtell added a comment -

        Tried the above procedure a few times. Not able to reproduce. Replied to the reporter on user@hbase with a request to run through the above and provide more information.

        Andrew Purtell added a comment -

        Further discussion on dev@ revealed the reporter changed the WAL reader class in the site configuration back to default between crash and restart. Therefore, the encrypted WAL could not be read. There might still be an issue here with a "corrupt" WAL being improperly moved to oldWALs/ but let's follow up on another issue. Resolving as invalid.
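
        The mismatch described above can be illustrated with a small pre-restart check. This is only a sketch under stated assumptions: the class and method are hypothetical, the configurations are modeled as plain maps rather than loaded through HBase, and the property and class names for the WAL reader are assumed from the HBase reference guide rather than from this issue. It flags the case where WALs were written with encryption enabled but the configuration used at restart no longer names the secure reader.

        ```java
        import java.util.HashMap;
        import java.util.Map;

        public class WalReaderCheck {
            static final String ENCRYPTION_KEY = "hbase.regionserver.wal.encryption";
            // Assumed property and class names (HBase reference guide, WAL encryption).
            static final String READER_KEY = "hbase.regionserver.hlog.reader.impl";
            static final String SECURE_READER =
                "org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader";
            static final String DEFAULT_READER =
                "org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader";

            /**
             * Returns true when WALs written under {@code before} were encrypted but the
             * reader configured in {@code after} is not the secure reader.
             */
            static boolean encryptedWalsMayBeUnreadable(Map<String, String> before,
                                                        Map<String, String> after) {
                boolean wroteEncrypted =
                    Boolean.parseBoolean(before.getOrDefault(ENCRYPTION_KEY, "false"));
                String readerAfter = after.getOrDefault(READER_KEY, DEFAULT_READER);
                return wroteEncrypted && !SECURE_READER.equals(readerAfter);
            }

            public static void main(String[] args) {
                // Configuration in effect when the regionserver crashed.
                Map<String, String> before = new HashMap<>();
                before.put(ENCRYPTION_KEY, "true");
                before.put(READER_KEY, SECURE_READER);

                // Configuration at restart: reader class reverted to the default.
                Map<String, String> after = new HashMap<>(before);
                after.put(READER_KEY, DEFAULT_READER);

                if (encryptedWalsMayBeUnreadable(before, after)) {
                    System.out.println("WARNING: encrypted WALs exist but the secure WAL reader is not configured");
                }
            }
        }
        ```

        The check is a string comparison only; it does not open any WAL file, so it cannot tell whether a particular file is actually encrypted.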


          People

          • Assignee:
            Unassigned
            Reporter:
            Andrew Purtell
          • Votes:
            0
            Watchers:
            5
