Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      There was some advocacy on hbase-dev@ of using Avro for serialization of HBase WAL records. The idea is that Hadoop core is moving away from Writables and Avro is the blessed replacement.

      I think we have these criteria for its use:
      1) Performance of writing Avro records is no worse than that of writing Writables into a SequenceFile.
      2) Space consumed by Avro serialization is no worse than that of Writables.
      3) The file format is amenable to appends (it cannot require valid trailers, etc.).

      I'll put up a patch so we can try it out.
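
      As a rough, out-of-band way to sanity check criteria 1 and 2, one could binary-encode a representative entry with Avro's generic API and compare the byte count (and wall-clock time over many iterations) against the same entry written as a Writable into a SequenceFile. The WALEntry schema and field names below are purely illustrative, not the schema from the attached patches, and the encoder calls are from a recent Avro release rather than the version under discussion here.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

// Rough size check for an illustrative WAL-entry-like record. The schema is
// hypothetical, not the one defined in the attached patches.
public class AvroSizeCheck {
  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"WALEntry\",\"fields\":["
      + "{\"name\":\"row\",\"type\":\"bytes\"},"
      + "{\"name\":\"family\",\"type\":\"bytes\"},"
      + "{\"name\":\"qualifier\",\"type\":\"bytes\"},"
      + "{\"name\":\"timestamp\",\"type\":\"long\"},"
      + "{\"name\":\"value\",\"type\":\"bytes\"}]}";

  public static void main(String[] args) throws IOException {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    GenericRecord entry = new GenericData.Record(schema);
    entry.put("row", ByteBuffer.wrap("row-0001".getBytes("UTF-8")));
    entry.put("family", ByteBuffer.wrap("cf".getBytes("UTF-8")));
    entry.put("qualifier", ByteBuffer.wrap("q".getBytes("UTF-8")));
    entry.put("timestamp", System.currentTimeMillis());
    entry.put("value", ByteBuffer.wrap(new byte[100]));

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(entry, encoder);
    encoder.flush();

    // Compare against the size of the same entry written as a Writable into
    // a SequenceFile (criterion 2); loop and time it for criterion 1.
    System.out.println("Avro binary size: " + out.size() + " bytes");
  }
}
{code}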

      1. HBASE-2055.patch
        17 kB
        Andrew Purtell
      2. HBASE-2055-v2.patch
        18 kB
        Andrew Purtell
      3. HBASE-2055-v3.patch
        20 kB
        Andrew Purtell
      4. HBASE-2055-v4.patch
        22 kB
        Andrew Purtell
      5. jackson-core-asl-1.0.1.jar
        133 kB
        Andrew Purtell
      6. jackson-mapper-asl-1.0.1.jar
        264 kB
        Andrew Purtell
      7. paranamer-1.5.jar
        28 kB
        Andrew Purtell
      8. TEST-org.apache.hadoop.hbase.regionserver.wal.TestHLog.txt.gz
        6 kB
        Andrew Purtell
      9. TEST-org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.txt.gz
        13 kB
        Andrew Purtell
      10. TEST-org.apache.hadoop.hbase.TestFullLogReconstruction.txt.gz
        782 kB
        Andrew Purtell
      11. test-site.patch
        0.9 kB
        Andrew Purtell


          Activity

          Doug Cutting added a comment -

          I agree that SYNC_INTERVAL should be configurable.

          Note that the current plan is to support appends but no longer to support changing the schema in a file. The schema is included only once, at the start of the file. If you have further comments, please add them to AVRO-160.

          Jeff Hammerbacher added a comment -

          Hey Andy,

          I don't think putting a snapshot of trunk into your svn would be a great idea. The 1.3 release will have a finalized design and implementation of the file object container.

          As for the configurable SYNC_INTERVAL: it makes sense to me to make this a configuration parameter. That said, it would be worth raising your concerns on the AVRO JIRA rather than in this issue.

          Thanks,
          Jeff

          Andrew Purtell added a comment -

          Sorry, above I meant SYNC_INTERVAL, not SYNC_SIZE. Also it looks like the DataFileWriter as implemented for AVRO-160 will hold up to SYNC_INTERVAL bytes in a buffer before writing out the block. We want to hsync after a group of related commits in the WAL whether SYNC_INTERVAL is reached or not, but also have the stream marked with a sync marker at each SYNC_INTERVAL. This is basically what my v3 or v4 patch does. It also writes a copy of the schema just after the sync marker so we have an opportunity to resynchronize a reader on each block regardless of how many previous blocks are corrupt (perhaps all).
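
          A minimal sketch of the write-side scheme described above, for readers following along (this is not the code in the attached patches; the class, the length-prefixed framing, and the constructor arguments are all made up for illustration):

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the write path described above (illustrative only, not the code
// in the attached patches). Records are length-prefixed and appended; a sync
// marker plus a copy of the schema is emitted roughly every syncInterval
// bytes, and sync() may be called after a group of related commits whether
// or not the interval has been reached.
class BlockMarkerWriter {
  private final DataOutputStream out;   // e.g. wrapping an FSDataOutputStream on HDFS
  private final byte[] syncMarker;      // marker chosen when the file is created
  private final byte[] schemaJson;      // the Avro schema as UTF-8 JSON
  private final int syncInterval;       // e.g. 64 * 1024
  private long bytesSinceMarker = 0;

  BlockMarkerWriter(OutputStream raw, byte[] syncMarker, byte[] schemaJson,
                    int syncInterval) throws IOException {
    this.out = new DataOutputStream(raw);
    this.syncMarker = syncMarker;
    this.schemaJson = schemaJson;
    this.syncInterval = syncInterval;
    writeMarkerAndSchema();             // the file starts with a marker + schema
  }

  void append(byte[] serializedRecord) throws IOException {
    out.writeInt(serializedRecord.length);
    out.write(serializedRecord);
    bytesSinceMarker += 4 + serializedRecord.length;
    if (bytesSinceMarker >= syncInterval) {
      writeMarkerAndSchema();           // a reader can resynchronize here
    }
  }

  // Called after a group of related commits regardless of the interval;
  // with HDFS this is where hflush()/hsync() on the underlying stream goes.
  void sync() throws IOException {
    out.flush();
  }

  private void writeMarkerAndSchema() throws IOException {
    out.write(syncMarker);
    out.writeInt(schemaJson.length);
    out.write(schemaJson);
    bytesSinceMarker = 0;
  }
}
{code}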

          Andrew Purtell added a comment -

          @Jeff: When that patch is committed we can look at putting a snapshot of Avro trunk on ours. Also, I see that SYNC_SIZE is a constant. Should it be configurable? We want 64k; others might want something different.
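
          If the interval does become configurable on the HBase side as well, it would presumably be read from the Configuration like any other tuning knob. A sketch, assuming a made-up property name (not an actual HBase setting) and the present-day HBaseConfiguration.create() factory:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hypothetical: read the WAL sync interval from hbase-site.xml, defaulting
// to 64 KB. The property name is invented here for illustration; it is not
// an actual HBase setting.
public class SyncIntervalConfig {
  public static int getSyncInterval() {
    Configuration conf = HBaseConfiguration.create();
    return conf.getInt("hbase.regionserver.wal.avro.sync.interval", 64 * 1024);
  }
}
{code}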

          Jeff Hammerbacher added a comment -

          Hey Andy,

          Grab the latest Java patch from https://issues.apache.org/jira/browse/AVRO-160; the new file format puts metadata in the header, rather than the footer. For future maintenance, it may be easier to stick with the default Avro file object container.

          Later,
          Jeff
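
          For reference, a minimal sketch of the default Avro file object container Jeff mentions, using the recent Java API and an illustrative two-field schema (nothing here comes from the attached patches):

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Sketch of the stock Avro file object container: the schema and metadata go
// in the header, blocks are delimited by sync markers, and the file can be
// appended to block by block.
public class AvroContainerExample {
  public static void main(String[] args) throws IOException {
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"WALEntry\",\"fields\":["
        + "{\"name\":\"row\",\"type\":\"string\"},"
        + "{\"name\":\"value\",\"type\":\"bytes\"}]}");
    File file = new File("/tmp/wal-example.avro");

    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, file);                   // schema written once, in the header
    GenericRecord r = new GenericData.Record(schema);
    r.put("row", "row-0001");
    r.put("value", ByteBuffer.wrap(new byte[16]));
    writer.append(r);
    writer.sync();                                 // force out the current block
    writer.close();

    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        file, new GenericDatumReader<GenericRecord>());
    for (GenericRecord rec : reader) {
      System.out.println(rec);
    }
    reader.close();
  }
}
{code}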

          Andrew Purtell added a comment -

          v4 should make it easier to extend the reader and writer. THBase will need to do this in order to function.

          Lars George added a comment -

          Great stuff Andy, this block marker makes total sense in the context of log splitting as well, which needs blocks of the log to be presorted by the RSs before they get applied. With the marker this is a natural fit. BigTable does 64k chunks too (during sorts).

          Andrew Purtell added a comment -

          v3 writes a sync marker every 64K which includes a copy of the schema.

          The file is initialized with a sync marker. The reader scans from the start of the file until it finds a valid sync marker and then reads in the schema.

          This is a fair amount of overhead – 1 byte per record, 1K per 64K of data – but does mean edits from corrupt logs can be partially recovered.
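
          Reduced to a sketch, the resynchronization on the read side amounts to scanning forward until the full marker is matched and only then trusting what follows. The marker value and framing here are hypothetical, not the layout the v3 patch uses:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Sketch of reader-side resynchronization: slide a window over the stream
// until it matches the sync marker, skipping any corrupt bytes on the way.
// The marker and framing are hypothetical, not the attached patches' layout.
class MarkerScanner {
  // Returns true when positioned just past the next sync marker (the schema
  // copy and the block's records follow), false if EOF was reached first.
  static boolean skipToNextMarker(InputStream in, byte[] marker) throws IOException {
    byte[] window = new byte[marker.length];
    int filled = 0;
    int b;
    while ((b = in.read()) != -1) {
      if (filled < window.length) {
        window[filled++] = (byte) b;
      } else {
        // Shift the window left by one and append the new byte.
        System.arraycopy(window, 1, window, 0, window.length - 1);
        window[window.length - 1] = (byte) b;
      }
      if (filled == window.length && Arrays.equals(window, marker)) {
        return true;
      }
    }
    return false;
  }
}
{code}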

          ryan rawson added a comment -

          I am wondering what would happen if the header was mangled?

          Does it make sense to put the schema in multiple places, like superblocks in ext3?

          Andrew Purtell added a comment -

          The v2 patch passes all tests. Also, in this version we write the schema as a file header and use it to initialize the reader.

          In case anyone is curious, we are not using Avro's bundled file I/O package because that file format puts schema and metadata into a trailer, so it seems unsuitable for a log which may be truncated as part of "normal" operation.
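
          A minimal sketch of the schema-as-header idea, with invented framing (length-prefixed schema JSON), rather than whatever the v2 patch actually writes:

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

// Sketch of a schema-as-header layout (illustrative framing, not the v2
// patch): the writer emits the length-prefixed schema JSON once at the top
// of the file; the reader parses it back to initialize a datum reader.
class SchemaHeader {
  static void writeHeader(DataOutputStream out, Schema schema) throws IOException {
    byte[] json = schema.toString().getBytes("UTF-8");
    out.writeInt(json.length);
    out.write(json);
  }

  static GenericDatumReader<GenericRecord> readHeader(DataInputStream in) throws IOException {
    byte[] json = new byte[in.readInt()];
    in.readFully(json);
    Schema schema = new Schema.Parser().parse(new String(json, "UTF-8"));
    return new GenericDatumReader<GenericRecord>(schema);
  }
}
{code}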

          Andrew Purtell added a comment -

          I put up a patch which implements an HLog reader and writer that uses Avro for serialization. Some basic functionality works, but TestHLog fails all three cases with EOFExceptions, always when reading the last field of a particular (truncated?) record. Surprisingly, TestFullLogReconstruction succeeds.

          Andrew Purtell added a comment -

          No idea if it works yet.

          Jeff Hammerbacher added a comment -

          Additional criterion: Avro has clients for reading and writing data files in several languages, which will facilitate writing debugging and profiling utilities.


            People

             • Assignee: Unassigned
             • Reporter: Andrew Purtell
             • Votes: 0
             • Watchers: 4
