  1. Apache Avro
  2. AVRO-1244

Provide a SeekableInput implementation for FileSystem retrieved output streams


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: java
    • Labels: None

    Description

      To use the DataFileWriter#appendTo API, one needs to pass an object that implements the SeekableInput interface. Avro provides a ready-made utility (SeekableFileInput) for files that can be represented by a File object, but in Hadoop land, files on HDFS and other FileSystems cannot be represented by a File object, so users have to take a longer route and implement the interface themselves.

      We can add a simple HadoopSeekableFSInput (or similar) that takes Hadoop-provided objects and wraps them in a SeekableInput, ready to pass to Avro.

      I propose something of the following type:

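      import java.io.IOException;

      import org.apache.avro.file.SeekableInput;
      import org.apache.hadoop.fs.FSDataInputStream;

      // Adapts a Hadoop FSDataInputStream to Avro's SeekableInput interface.
      public class HadoopSeekableFSInput implements SeekableInput {
          private final FSDataInputStream in;
          // The stream does not expose its length, so the caller supplies it.
          private final long length;

          // Constructor name must match the class name.
          public HadoopSeekableFSInput(FSDataInputStream in, long length) {
            this.in = in;
            this.length = length;
          }

          @Override
          public void close() throws IOException {
            in.close();
          }

          @Override
          public void seek(long p) throws IOException {
            in.seek(p);
          }

          @Override
          public long tell() throws IOException {
            return in.getPos();
          }

          @Override
          public long length() throws IOException {
            return length;
          }

          @Override
          public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
          }
        }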
      

      The above can be constructed by users via a simple call such as new HadoopSeekableFSInput(fs.open(filePath), fs.getFileStatus(filePath).getLen()).

      Ideally this class would belong in the Avro core module, but core does not depend on Hadoop Common today, so some other module may be a more suitable home.

      This lets users write Avro-append code such as https://gist.github.com/QwertyManiac/4724582 more easily.
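
      To make the five-method contract behind such a wrapper easier to try out without a cluster, here is a minimal JDK-only sketch of the same adapter shape. It is an illustration under stated assumptions, not part of the proposal: SeekableInputLike is a local stand-in that mirrors the methods used above (declared here only so the snippet compiles without Avro or Hadoop on the classpath), and ChannelSeekableInput is a hypothetical name for an adapter over the JDK's SeekableByteChannel.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

/**
 * Local stand-in (assumption) mirroring the seek/tell/length/read/close
 * methods of the interface discussed above, so this compiles with the JDK alone.
 */
interface SeekableInputLike extends Closeable {
  void seek(long p) throws IOException;

  long tell() throws IOException;

  long length() throws IOException;

  int read(byte[] b, int off, int len) throws IOException;
}

/** Hypothetical adapter: exposes any SeekableByteChannel through the contract above. */
final class ChannelSeekableInput implements SeekableInputLike {
  private final SeekableByteChannel channel;

  ChannelSeekableInput(SeekableByteChannel channel) {
    this.channel = channel;
  }

  @Override
  public void seek(long p) throws IOException {
    channel.position(p); // move the read position to an absolute offset
  }

  @Override
  public long tell() throws IOException {
    return channel.position(); // current absolute offset
  }

  @Override
  public long length() throws IOException {
    return channel.size(); // unlike the Hadoop stream, the channel knows its size
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    // Reads up to len bytes into b starting at off; returns -1 at end of data.
    return channel.read(ByteBuffer.wrap(b, off, len));
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
```

      The shape is the same as in the Hadoop proposal: each method delegates to the underlying stream, and only length() differs, because FSDataInputStream does not report a length and the caller has to obtain it from the file status.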

          People

            Assignee: Unassigned
            Reporter: Harsh J (qwertymaniac)