HBASE-1155

Verify that FSDataOutputStream.sync() works

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.90.0
    • Component/s: master, regionserver
    • Labels:
      None

      Description

      In order to guarantee that an HLog sync() flushes the data to the HDFS, we will need to invoke FSDataOutputStream.sync() per HADOOP-4379.

      Currently, there is no access to the underlying FSDataOutputStream from SequenceFile.Writer, as it is a package private member.

      1. patch.txt
        7 kB
        Jim Kellerman

        Issue Links

          Activity

          stack added a comment -

          Resolving this issue as won't fix. Being dealt with over in HBASE-1470.

          stack added a comment -

          Have been testing latest patches in HADOOP-4379. See notes there on current state.

          stack added a comment -

          Moving out of 0.20.0. It doesn't work in hadoop 0.20.0. Hopefully 0.21.0.

          Jim Kellerman added a comment -

          Moving out of 0.19.1 because it is unlikely we will have a patch for HADOOP-4379 soon enough.

          Jim Kellerman added a comment -

          Yes an hour is way too long. I asked in HADOOP-4379 if there is a way to speed it up.

          stack added a comment -

          An hour is unacceptable, don't you think? Regions can't be off-line an hour. Is there a timeout we can adjust in hadoop?

          Jim Kellerman added a comment -

          Patch that uses new APIs to recover the file lease and read from the last log file being written by the region server.

          It does work, but slowly. As noted in HADOOP-4379, it takes almost an hour to recover the file lease when the clusters are loaded.

          2009-02-25 21:39:16,843 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 3 of 3: hdfs:/x.y.com:8100/hbase/log_10.76.44.139_1235597506284_8020/hlog.dat.1235597820662
          2009-02-25 21:39:16,847 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Triggering lease recovery.
          ...
          2009-02-25 22:37:12,755 INFO org.apache.hadoop.hbase.regionserver.HLog: log file splitting completed for hdfs://x.y.com:8100/hbase/log_10.76.44.139_1235597506284_8020
          
          Jim Kellerman added a comment -

          To clarify Stack's question, he said:

          > ----Original Message----
          > From: Michael Stack
          > Sent: Wednesday, February 11, 2009 12:42 PM
          > To: Jim Kellerman (POWERSET)
          > Subject: RE: [jira] Commented: (HBASE-1155) Verify that
          > FSDataoutputStream.sync() works
          >
          > On a loaded cluster do appends persist or get their knickers in a twist?
          > St.Ack

          The answer to this question is TBD. I have yet to test how it works in a loaded cluster. So far, I have only verified in a simple test that it works. More to come soon...

          stack added a comment -

          Looks great. You think it works when lots of concurrent edits written?

          Jim Kellerman added a comment -

          Each record is approximately 1024 bytes.
          One block is either 1,048,576 bytes (1 MB) or 67,108,864 bytes (64 MB).

          A 1 MB block holds 1,002 records:
          1,026,048 bytes written; overhead is 22.48 bytes/record.

          Expected overhead for 64 MB is 1,441,792 bytes;
          expected number of records for 64 MB is 64,128.

          A 64 MB block holds 64,157 records:
          65,696,768 bytes written; overhead is 1,412,096 bytes,
          i.e. 22.01 bytes/record.

          So overhead is ~22-23 bytes/record.
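          The arithmetic above can be sanity-checked with a short script (the record size, block sizes, and record counts are taken directly from the numbers in this comment):

```python
# Check the per-record overhead figures quoted above.
# Payload is ~1024 bytes/record; "bytes written" in the comment is the
# total record payload (records * 1024), so overhead is the remainder
# of the block.
RECORD_SIZE = 1024

def overhead_per_record(block_size, records):
    """Bytes of the block not occupied by record payload, per record."""
    payload = records * RECORD_SIZE
    return (block_size - payload) / records

mb1 = overhead_per_record(1_048_576, 1_002)     # 1 MB block, 1,002 records
mb64 = overhead_per_record(67_108_864, 64_157)  # 64 MB block, 64,157 records
print(f"{mb1:.2f} bytes/record, {mb64:.2f} bytes/record")  # 22.48, 22.01
```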

          ========================================

          Without the patch, the best we can do is read up to the end of the last
          full block. If we write 1024 records into 1 MB blocks, we can read 1,002
          records (~ the number of records in a block).

          If we write 70,000 records into 64 MB blocks, we can read 64,157
          records back.

          If less than a block is written, we get back nothing. We only get up
          to the last full block.
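          A minimal model of this "last full block" behavior (the records-per-block figures are the measured values above; this is a sketch, not HDFS code):

```python
def readable_without_patch(records_written, records_per_block):
    """Without the patch only fully written blocks are readable;
    any partially written final block is lost."""
    full_blocks = records_written // records_per_block
    return full_blocks * records_per_block

# 70,000 records into 64 MB blocks (64,157 records/block): one full block
print(readable_without_patch(70_000, 64_157))  # 64157
# Less than one block written: nothing comes back
print(readable_without_patch(900, 1_002))      # 0
```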

          ========================================

          With the patch, 1MB block size and no syncs:

          • Writing 1024 records, none are recovered
          • Writing 1200 records, 1188 are recovered
          • Writing 1500 records, 1499 are recovered
          • Writing 1000 records, 994 are recovered

          There seems to be a problem with writing about 1024 records to a 1 MB
          block size file if there are no syncs. Writing more than 1024 records
          (e.g., 1500) works, as does writing fewer (e.g., 1000 records: 994 are
          recoverable; 900 records: 870 are recoverable). So there appears to be
          a problem with writing close to 1 MB of data into a 1 MB block size
          with no syncs. Writing somewhat more or somewhat less than 1024
          records seems to work.

          ========================================

          With the patch, it appears that the block size is irrelevant and it is
          possible to read up to the last sync for 64MB blocks.

          With a 64MB block size:

          • If the sync rate is 1, it is possible to read every record written.
          • With a sync rate of 100, it is possible to read up to the last multiple of 100.
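          The 64 MB behavior amounts to "reads reach the last completed sync point", which can be sketched as:

```python
def readable_with_patch(records_written, sync_every):
    """With the patch, a reader can recover up to the last sync() call,
    regardless of block boundaries."""
    return (records_written // sync_every) * sync_every

print(readable_with_patch(64_157, 1))   # sync on every record: all 64157
print(readable_with_patch(1_234, 100))  # sync every 100 writes: 1200
```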

          With a 1MB block size:

          • Cancelling the writer's lease seems to take a lot longer.
          • Sometimes it seems to never recover the lease. (e.g., write 1024 records, sync every 100 writes, 1MB block size)

          More testing to do: try writing close to 64MB with a 64MB block size and see if it experiences the non-recoverability that writing ~1MB with 1MB block size does.

          Jim Kellerman added a comment -

          @Stack

          Yes, there are not many changes. Will work on this as soon as I finish up what I am currently working on.

          stack added a comment -

          We have a sequence file local to hbase. We can just change our copy?

          Jim Kellerman added a comment -

          Simple testing using the test programs that I attached to HADOOP-4379 would seem to indicate that the patch for 4379 works. However, we need more testing in the HBase environment to verify that the patch is sufficient.

          Jim Kellerman added a comment - edited

          The latest patch for HADOOP-4379 combined with HADOOP-5027 seems to solve the problems that we have seen. As for Doug Judd's problem with getting the length of the file, that is not an issue for HBase, as we do not look at the length of the file.

          We need more testing to confirm.


            People

            • Assignee:
              stack
            • Reporter:
              Jim Kellerman
            • Votes:
              0
            • Watchers:
              0
