Hadoop Common / HADOOP-2285

TextInputFormat is slow compared to reading files.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None

    Description

      The LineRecordReader reads from the source one byte at a time, which appears to be about half as fast as it would be if the readLine method were defined directly on the in-memory buffer rather than on an InputStream.
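      The attached patches are not reproduced here. As a minimal sketch of the idea only (not the code from those patches, and not Hadoop's actual LineReader or LineRecordReader API), the class below contrasts two approaches: calling InputStream.read() once per byte, and refilling a byte[] buffer with one bulk read and then scanning that buffer for '\n' directly. The names BufferedLineReader, readLine, and readLineByteByByte are hypothetical, chosen for this illustration; the sketch assumes '\n' line endings with an optional preceding '\r' and UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration (not Hadoop's real LineReader): finds line
 * boundaries by scanning an in-memory buffer instead of calling
 * InputStream.read() once per byte.
 */
public class BufferedLineReader {
    private final InputStream in;
    private final byte[] buffer;
    private int bufferLength = 0; // number of valid bytes currently in buffer
    private int bufferPosn = 0;   // index of the next byte to examine

    public BufferedLineReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    public String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawAnyBytes = false;
        while (true) {
            if (bufferPosn >= bufferLength) {
                // Refill with one bulk read instead of many single-byte reads.
                bufferLength = in.read(buffer, 0, buffer.length);
                bufferPosn = 0;
                if (bufferLength <= 0) {
                    bufferLength = 0;
                    return sawAnyBytes ? toLine(line) : null;
                }
            }
            // Scan the buffer directly for the newline byte.
            int start = bufferPosn;
            while (bufferPosn < bufferLength && buffer[bufferPosn] != '\n') {
                bufferPosn++;
            }
            line.write(buffer, start, bufferPosn - start);
            sawAnyBytes = true;
            if (bufferPosn < bufferLength) {
                bufferPosn++; // consume the '\n'
                return toLine(line);
            }
            // No newline in this chunk: loop to refill and keep accumulating.
        }
    }

    /** The slower pattern for contrast: one InputStream.read() call per byte. */
    public static String readLineByteByByte(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b = in.read();
        if (b == -1) {
            return null; // end of stream before any bytes
        }
        while (b != -1 && b != '\n') {
            line.write(b);
            b = in.read();
        }
        return toLine(line);
    }

    private static String toLine(ByteArrayOutputStream line) {
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        // Drop a trailing '\r' so "\r\n" line endings are handled as well.
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "alpha\r\nbeta\ngamma".getBytes(StandardCharsets.UTF_8);
        // A tiny buffer forces lines to span several refills.
        BufferedLineReader reader =
            new BufferedLineReader(new ByteArrayInputStream(data), 4);
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    }
}
```

      The small buffer size in main is only there to exercise lines that straddle refills; a real reader would use a much larger buffer (tens of kilobytes), so that most lines are found by the inner scan loop without touching the stream at all.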

      Attachments

        1. fast-line.patch (15 kB), Owen O'Malley
        2. fast-line2.patch (16 kB), Arun Murthy
        3. fast-line3.patch (16 kB), Christopher Douglas


          People

            Assignee: Owen O'Malley (omalley)
            Reporter: Owen O'Malley (omalley)
            Votes: 3
            Watchers: 2
