Hadoop Common / HADOOP-40

bufferSize argument is ignored in FileSystem.create(File, boolean, int)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: fs
    • Labels:
      None

      Description

      org.apache.hadoop.fs.FileSystem.create(File f, boolean overwrite, int bufferSize)

      ignores the input parameter bufferSize.
      Instead, it passes the internal configuration further down, which includes a buffer size but not the parameter value.
      This works fine within the file system, since everything that calls create extracts the buffer size from the same config.
      MapReduce, however, is probably affected by this; see

      org.apache.hadoop.io.SequenceFile.Sorter.MergeQueue.MergeQueue(int size, String outName, boolean done)

      The attached patch would fix it.
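
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the attached patch and not the Hadoop source: the `Conf` class, the key `example.buffer.size`, the `SizedStream` wrapper, and both `create...` methods are hypothetical stand-ins, used only to show the difference between reading the size from the configuration and honoring the caller's argument.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

final class BufferSizeSketch {
    // Hypothetical configuration key, chosen for this sketch only.
    static final String KEY = "example.buffer.size";

    // Hypothetical stand-in for a configuration object.
    static final class Conf {
        private final Map<String, Integer> values = new HashMap<>();
        int getInt(String key, int defaultValue) { return values.getOrDefault(key, defaultValue); }
        void setInt(String key, int value) { values.put(key, value); }
    }

    // A buffered stream that remembers the buffer size it was built with.
    static final class SizedStream extends BufferedOutputStream {
        final int bufferSize;
        SizedStream(OutputStream out, int bufferSize) {
            super(out, bufferSize);
            this.bufferSize = bufferSize;
        }
    }

    // Shape of the reported bug: the bufferSize argument is accepted but never used;
    // the size comes from the configuration instead.
    static SizedStream createIgnoringArgument(Conf conf, int bufferSize) {
        return new SizedStream(new ByteArrayOutputStream(), conf.getInt(KEY, 4096));
    }

    // Shape of the fix: the argument is passed down to the stream that needs it.
    static SizedStream createHonoringArgument(Conf conf, int bufferSize) {
        return new SizedStream(new ByteArrayOutputStream(), bufferSize);
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.setInt(KEY, 4096);
        System.out.println("ignoring argument: " + createIgnoringArgument(conf, 65536).bufferSize);
        System.out.println("honoring argument: " + createHonoringArgument(conf, 65536).bufferSize);
    }
}
```

In this sketch a caller that asks for a large buffer, as a sort or merge step might, silently gets the configured default from the first method and the requested size from the second.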

      1. bufferSize.patch
        2 kB
        Doug Cutting
      2. BufferSize.patch
        2 kB
        Konstantin Shvachko

        Activity

        Doug Cutting added a comment -

        I don't think we should modify the configuration in this case, since that will affect code which uses this configuration that runs later. SequenceFile uses very large buffers when sorting and merging, in order to minimize disk seeks, and we don't want everything to start using such large buffers.

        So why not just pass the missing parameter down? I've attached a patch that does this. Does this look good to you?
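
        Editorial illustration (not part of the comment or of either patch): a minimal sketch of the side effect described above, assuming a shared, mutable configuration object. The `Conf` class, the key `example.buffer.size`, and all method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedConfigSketch {
    // Hypothetical configuration key and default, chosen for this sketch only.
    static final String KEY = "example.buffer.size";
    static final int DEFAULT_SIZE = 4096;

    // Hypothetical stand-in for a shared, mutable configuration object.
    static final class Conf {
        private final Map<String, Integer> values = new HashMap<>();
        int getInt(String key, int defaultValue) { return values.getOrDefault(key, defaultValue); }
        void setInt(String key, int value) { values.put(key, value); }
    }

    // Approach A: request a large buffer by overwriting the shared setting.
    // The new value stays in the configuration after this call returns.
    static int openViaConfigMutation(Conf conf, int wantedSize) {
        conf.setInt(KEY, wantedSize);
        return conf.getInt(KEY, DEFAULT_SIZE);
    }

    // Approach B: request a large buffer by passing the size as a parameter.
    // The shared configuration is left untouched.
    static int openViaParameter(Conf conf, int wantedSize) {
        return wantedSize;
    }

    // An unrelated caller that runs later and reads its size from the configuration.
    static int laterCallerBufferSize(Conf conf) {
        return conf.getInt(KEY, DEFAULT_SIZE);
    }

    public static void main(String[] args) {
        Conf mutated = new Conf();
        openViaConfigMutation(mutated, 1 << 20);
        System.out.println("later caller after mutation:  " + laterCallerBufferSize(mutated));

        Conf untouched = new Conf();
        openViaParameter(untouched, 1 << 20);
        System.out.println("later caller after parameter: " + laterCallerBufferSize(untouched));
    }
}
```

        With approach A, every later reader of the configuration inherits the large buffer; with approach B, only the call that asked for it does.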

        Doug Cutting added a comment -

        I just committed my patch for this.

        Konstantin Shvachko added a comment -

        Yes. This is the right way of doing it.
        It particularly makes sense, since the main file and the checksum file
        do not necessarily need to share the same buffer size.
        When the file buffer is large, the checksum buffer doesn't need to be large at all.


          People

          • Assignee:
            Unassigned
            Reporter:
            Konstantin Shvachko
          • Votes:
            0
            Watchers:
            0
