Apache Avro / AVRO-1685

Allow specifying sync in DataFileWriter.create

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Allows specifying the 16-byte sync when creating an Avro container file via DataFileWriter

      Description

      Currently, DataFileWriter generates a random 16-byte sync each time a new file is created. As a result, writing exactly the same data with a new file writer produces a slightly different file, because the sync differs.

      I'd like to be able to generate the exact same file multiple times. To do so, I need a way to specify the 16-byte sync.

      I've created a patch that adds an overload of create(...) taking a byte[] sync as its third parameter. If the sync is null, a random one is generated using the same internal static generateSync() method as before. If it is not null, its length is checked and it is used as the file's sync. The other two overloads of create(...) have been modified to call the three-parameter version with a null sync.
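
The null-or-validated handling described above can be sketched in isolation. This is a minimal stand-in, not the code from the patch: the class SyncSketch, the method resolveSync, the use of SecureRandom for the random case, and the choice of IOException with this message are all assumptions made for illustration.

```java
import java.io.IOException;
import java.security.SecureRandom;

// Hypothetical sketch of the sync handling described in this issue; not Avro's source.
final class SyncSketch {
    static final int SYNC_SIZE = 16;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Assumption: plain random bytes stand in for the internal generateSync().
    static byte[] generateSync() {
        byte[] sync = new byte[SYNC_SIZE];
        RANDOM.nextBytes(sync);
        return sync;
    }

    // null -> generate a random sync; otherwise check the length and use it as given.
    static byte[] resolveSync(byte[] sync) throws IOException {
        if (sync == null) {
            return generateSync();
        }
        if (sync.length != SYNC_SIZE) {
            // Assumption: the exception type and message are illustrative only.
            throw new IOException("sync must be exactly " + SYNC_SIZE + " bytes, got " + sync.length);
        }
        return sync;
    }

    public static void main(String[] args) throws IOException {
        byte[] fixed = new byte[SYNC_SIZE];
        System.out.println(resolveSync(fixed) == fixed); // caller-supplied sync is used as-is
        System.out.println(resolveSync(null).length);    // random sync has the fixed size
    }
}
```

Under this arrangement the existing two-parameter overloads only need to pass null, which keeps their behaviour unchanged.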

      The patch includes three additional tests that check the error cases (invalid sync length) and verify that writing the same file twice results in the same byte array (i.e. an exact match).
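
To show why a fixed sync makes the output reproducible, here is a toy sketch that does not use Avro at all. The class ReproducibleFileSketch and its writeToyFile method are hypothetical, and the layout (magic bytes, sync, payload, sync) is a simplification rather than the real container encoding. It only illustrates that a marker embedded in the output decides whether two writes of the same data match byte for byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical toy writer; the layout is a simplification, not the Avro file format.
final class ReproducibleFileSketch {
    private static final int SYNC_SIZE = 16;

    // Writes magic + sync + payload + sync. A null sync means "pick a random one".
    static byte[] writeToyFile(byte[] payload, byte[] sync) {
        byte[] marker = sync;
        if (marker == null) {
            marker = new byte[SYNC_SIZE];
            new SecureRandom().nextBytes(marker);
        }
        byte[] magic = {'O', 'b', 'j', 1};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(magic, 0, magic.length);
        out.write(marker, 0, marker.length);
        out.write(payload, 0, payload.length);
        out.write(marker, 0, marker.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = "same records".getBytes(StandardCharsets.UTF_8);
        byte[] sync = new byte[SYNC_SIZE];
        Arrays.fill(sync, (byte) 7);

        // Same payload and same sync: the two outputs are compared byte for byte.
        byte[] first = writeToyFile(payload, sync);
        byte[] second = writeToyFile(payload, sync);
        System.out.println(Arrays.equals(first, second));
    }
}
```

Passing null for the sync in both calls gives outputs that differ only in the marker bytes, which is the situation the description starts from.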

        Attachments

        1. AVRO-1685.patch
          5 kB
          Sehrope Sarkuni

            People

            • Assignee:
              Sehrope Sarkuni
            • Reporter:
              Sehrope Sarkuni
            • Votes:
              0
            • Watchers:
              4