  1. Pig
  2. PIG-201

BufferedPositionedInputStream is not buffered

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.2.0
    • None
    • None

    Description

      BufferedPositionedInputStream is actually not buffered, which (I guess) leads to constant round trips to DFS because bytes are read one by one. I just wrapped the provided input stream in a good old BufferedInputStream in the constructor.
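      For illustration, here is a minimal sketch of that kind of fix. It is not the attached patch and not the actual Pig class: PositionedStreamSketch, CountingInputStream and BufferingDemo are hypothetical names, and the position tracking is an assumption about what such a stream does. The only part taken from the description is wrapping the provided stream in a BufferedInputStream in the constructor.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a remote source: counts how often it is asked for data. */
class CountingInputStream extends InputStream {
    private final InputStream in;
    int calls = 0;

    CountingInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

/** Sketch of a position-tracking stream that buffers its source (assumed design, not Pig's code). */
class PositionedStreamSketch extends InputStream {
    private final InputStream in;
    private long pos = 0;

    PositionedStreamSketch(InputStream source) {
        // The change described above: wrap the provided stream once, in the constructor.
        this.in = new BufferedInputStream(source);
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != -1) {
            pos++; // count only bytes actually delivered to the caller
        }
        return c;
    }

    long getPosition() {
        return pos;
    }
}

class BufferingDemo {
    /** Reads n bytes one at a time and returns how many calls reached the source. */
    static int callsToSource(int n, boolean buffered) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[n]));
        InputStream in = buffered ? new PositionedStreamSketch(source) : source;
        while (in.read() != -1) {
            // drain the stream byte by byte, as a record parser might
        }
        return source.calls;
    }
}
```

      With byte-by-byte reads, the unbuffered path hits the source once per byte, while the buffered path only hits it when the internal buffer needs refilling. Whether that translates into a comparable speedup depends on how expensive each call to the underlying stream really is.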

      I measured a 40% performance boost on a script that reads and writes 3.7 GB in DFS through PigStorage on one node. I guess the impact may be greater on a real HDFS cluster with actual network round trips.

      FYI, the issue was found while profiling with the YourKit Java profiler. Useful toy...

      Attachments

        1. BufferedPositionedInputStream.patch
          0.7 kB
          Mathieu Poumeyrol

          People

            kali Mathieu Poumeyrol
            kali Mathieu Poumeyrol
