  1. Hadoop Common
  2. HADOOP-9168

The Naming and Inheritance for RecordReader, LineRecordReader, LineReader

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.21.0, 2.0.2-alpha, 0.23.5
    • Fix Version/s: None
    • Component/s: util
    • Hadoop Flags: Incompatible change

    Description

      I feel LineReader is not the correct name, since it reads up to a given delimiter rather than strictly to the end of a line.

      How about Text Record Reader?
      That sounds correct, but LineReader is not a RecordReader by inheritance;
      by functionality, though, it is a record reader.
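
To make the naming concern concrete, here is a minimal sketch of a reader that returns records up to an arbitrary delimiter, of which a newline is only one case. The class DelimitedReaderSketch and its methods are hypothetical stand-ins written for this illustration; they are not Hadoop's LineReader and do not use its buffering.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Hadoop's LineReader.
class DelimitedReaderSketch {
    private final InputStream in;
    private final byte[] delimiter;

    DelimitedReaderSketch(InputStream in, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.in = in;
        this.delimiter = delimiter;
    }

    // Returns the next record without its delimiter, or null at end of input.
    String readRecord() throws IOException {
        byte[] buf = new byte[64];
        int len = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (len == buf.length) {
                buf = Arrays.copyOf(buf, len * 2); // grow the record buffer
            }
            buf[len++] = (byte) b;
            if (endsWithDelimiter(buf, len)) {
                return new String(buf, 0, len - delimiter.length, StandardCharsets.UTF_8);
            }
        }
        // Input exhausted: emit a trailing record only if some bytes were read.
        return len == 0 ? null : new String(buf, 0, len, StandardCharsets.UTF_8);
    }

    // True when the last delimiter.length bytes of buf[0..len) equal the delimiter.
    private boolean endsWithDelimiter(byte[] buf, int len) {
        if (len < delimiter.length) {
            return false;
        }
        int offset = len - delimiter.length;
        for (int i = 0; i < delimiter.length; i++) {
            if (buf[offset + i] != delimiter[i]) {
                return false;
            }
        }
        return true;
    }

    // Convenience helper for the example: split a whole string into records.
    static List<String> readAll(String text, String delimiter) {
        DelimitedReaderSketch reader = new DelimitedReaderSketch(
            new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
            delimiter.getBytes(StandardCharsets.UTF_8));
        List<String> records = new ArrayList<>();
        try {
            String record;
            while ((record = reader.readRecord()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Calling DelimitedReaderSketch.readAll("a;;b;;c", ";;") yields the records a, b and c, while passing "\n" as the delimiter gives ordinary line-by-line behaviour. Nothing in the reading logic is specific to lines.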

      Now, looking at it from a different angle:

      In general, an InputFormat has two main responsibilities:
      1) Read a split.
      2) Generate key and value pairs from what is read from the split.

      TextInputFormat has a RecordReader, which is extended by LineRecordReader,
      which in turn uses another class, LineReader.

      So we have:
      LineReader, which does the reading of the file.
      LineRecordReader, which generates the key and value.
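
The current arrangement can be sketched roughly as follows. This is a simplified model, not the Hadoop source: the classes carry a Sketch suffix to mark them as stand-ins, plain Long and String replace Hadoop's writable types, split handling is omitted, and the dump helper is hypothetical. It only shows that the key/value class inherits from the record-reader base while the class that reads bytes is used by composition.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the RecordReader base: it is about key/value pairs, not bytes.
abstract class RecordReaderSketch<K, V> {
    abstract boolean nextKeyValue() throws IOException;
    abstract K getCurrentKey();
    abstract V getCurrentValue();
}

// Stand-in for LineReader: does the raw reading and knows nothing of keys.
class LineReaderSketch {
    private final InputStream in;

    LineReaderSketch(InputStream in) {
        this.in = in;
    }

    // Fills 'out' with the next line (newline excluded) and returns the number
    // of bytes consumed, newline included; 0 means end of input.
    // Assumes single-byte (ASCII) text to keep the sketch short.
    int readLine(StringBuilder out) throws IOException {
        out.setLength(0);
        int consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            consumed++;
            if (b == '\n') {
                break;
            }
            out.append((char) b);
        }
        return consumed;
    }
}

// Stand-in for LineRecordReader: is-a record reader, has-a line reader.
class LineRecordReaderSketch extends RecordReaderSketch<Long, String> {
    private final LineReaderSketch reader;
    private long pos = 0;
    private Long key;
    private String value;

    LineRecordReaderSketch(InputStream in) {
        this.reader = new LineReaderSketch(in);
    }

    @Override
    boolean nextKeyValue() throws IOException {
        StringBuilder line = new StringBuilder();
        int consumed = reader.readLine(line);
        if (consumed == 0) {
            key = null;
            value = null;
            return false;
        }
        key = pos;              // key: byte offset where the line starts
        value = line.toString();
        pos += consumed;
        return true;
    }

    @Override
    Long getCurrentKey() {
        return key;
    }

    @Override
    String getCurrentValue() {
        return value;
    }

    // Hypothetical helper for the example: list every pair as "key=value".
    static List<String> dump(String text) {
        LineRecordReaderSketch rr = new LineRecordReaderSketch(
            new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)));
        List<String> pairs = new ArrayList<>();
        try {
            while (rr.nextKeyValue()) {
                pairs.add(rr.getCurrentKey() + "=" + rr.getCurrentValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }
}
```

For the input "ab\ncd\n", LineRecordReaderSketch.dump returns the pairs 0=ab and 3=cd: the byte offsets come from the key/value class, while the bytes themselves come from the line reader.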

      I would suggest:

      RecordReader to be renamed KeyValueGenerator,
      LineRecordReader to be renamed TextInputKeyValueGenerator,
      LineReader to be renamed delimitedTextReader.

      The generic attributes of LineReader (such as start, pos, end, buffer, bufferBytes, etc.) should be abstracted into a class called RecordReader,
      since they are all specific to reading the given input.

      The delimitedTextReader class could then extend RecordReader.
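
One way the suggested hierarchy might look is sketched below. Everything in it is an assumption made for illustration: the method signatures, the in-memory byte buffer, the single-byte delimiter and the dump helper are not part of the proposal, and delimitedTextReader is capitalised as DelimitedTextReader only to follow Java naming conventions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Proposed RecordReader: holds only the generic reading state.
abstract class RecordReader {
    protected final byte[] buffer;
    protected final int start;
    protected final int end;
    protected int pos;

    RecordReader(byte[] buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
        this.pos = start;
    }

    int position() {
        return pos;
    }

    // Returns the next record, or null when the range is exhausted.
    abstract String readRecord();
}

// Proposed replacement name for LineReader.
class DelimitedTextReader extends RecordReader {
    private final byte delimiter;

    DelimitedTextReader(byte[] buffer, int start, int end, byte delimiter) {
        super(buffer, start, end);
        this.delimiter = delimiter;
    }

    @Override
    String readRecord() {
        if (pos >= end) {
            return null;
        }
        int recordStart = pos;
        while (pos < end && buffer[pos] != delimiter) {
            pos++;
        }
        String record = new String(buffer, recordStart, pos - recordStart,
                                   StandardCharsets.UTF_8);
        if (pos < end) {
            pos++; // step over the delimiter
        }
        return record;
    }
}

// Proposed replacement name for the current RecordReader.
abstract class KeyValueGenerator<K, V> {
    abstract boolean nextKeyValue();
    abstract K getCurrentKey();
    abstract V getCurrentValue();
}

// Proposed replacement name for LineRecordReader.
class TextInputKeyValueGenerator extends KeyValueGenerator<Long, String> {
    private final DelimitedTextReader reader;
    private Long key;
    private String value;

    TextInputKeyValueGenerator(DelimitedTextReader reader) {
        this.reader = reader;
    }

    @Override
    boolean nextKeyValue() {
        long offset = reader.position(); // offset at which the record starts
        String record = reader.readRecord();
        if (record == null) {
            return false;
        }
        key = offset;
        value = record;
        return true;
    }

    @Override
    Long getCurrentKey() {
        return key;
    }

    @Override
    String getCurrentValue() {
        return value;
    }

    // Hypothetical helper for the example: list every pair as "key=value".
    static List<String> dump(String text, char delimiter) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        TextInputKeyValueGenerator gen = new TextInputKeyValueGenerator(
            new DelimitedTextReader(bytes, 0, bytes.length, (byte) delimiter));
        List<String> pairs = new ArrayList<>();
        while (gen.nextKeyValue()) {
            pairs.add(gen.getCurrentKey() + "=" + gen.getCurrentValue());
        }
        return pairs;
    }
}
```

With these names, TextInputKeyValueGenerator.dump("k1|k2|k3", '|') returns 0=k1, 3=k2 and 6=k3. The split between reading (RecordReader, DelimitedTextReader) and pair generation (KeyValueGenerator, TextInputKeyValueGenerator) is then visible in the class names themselves.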

      With these changes the names would make better sense. We must also look into compatibility: the change might be unfit to deploy unless a new API is introduced.

          People

            Assignee: Unassigned
            Reporter: Gelesh (gelesh)
            Votes: 0
            Watchers: 6

              Time Tracking

                Original Estimate: 96h
                Remaining Estimate: 96h
                Time Spent: Not Specified