Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
BufferedPositionedInputStream is actually not buffered, leading (I guess) to constant round trips to DFS as bytes are read one by one. I just wrapped the input stream passed to the constructor in a good old BufferedInputStream.
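To illustrate the idea, here is a minimal sketch of the wrapping, not the actual patch. PositionedStreamSketch, CountingInputStream and BufferingDemo are hypothetical names made up for this example, and a ByteArrayInputStream stands in for the DFS stream. It only shows how wrapping the stream in the constructor reduces the number of reads that reach the underlying stream when the caller reads byte by byte.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: counts how many read calls reach the underlying stream.
class CountingInputStream extends InputStream {
    private final InputStream in;
    int calls;

    CountingInputStream(InputStream in) { this.in = in; }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        calls++;
        return in.read(buf, off, len);
    }
}

// Hypothetical stand-in for the class named in this report, not Pig's source.
class PositionedStreamSketch extends InputStream {
    private final InputStream in;
    private long pos;

    PositionedStreamSketch(InputStream raw, long startPos) {
        // The change described above: wrap the provided stream so that
        // single-byte reads are served from an in-memory buffer.
        this.in = new BufferedInputStream(raw);
        this.pos = startPos;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) pos++; // advance the position only when a byte was read
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) pos += n;
        return n;
    }

    long getPosition() { return pos; }

    @Override
    public void close() throws IOException { in.close(); }
}

class BufferingDemo {
    // Reads `size` bytes one at a time and returns the number of read calls
    // that reached the underlying stream.
    static int underlyingReads(int size, boolean wrap) {
        CountingInputStream counting =
            new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = wrap ? new PositionedStreamSketch(counting, 0) : counting;
        try {
            while (in.read() != -1) { /* consume byte by byte */ }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + underlyingReads(100_000, false));
        System.out.println("buffered:   " + underlyingReads(100_000, true));
    }
}
```

With the wrapper in place, the underlying stream should only be hit once per buffer refill rather than once per byte. On a real DFS each of those calls may be a round trip.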
I measured a 40% performance boost on a script that reads and writes 3.7 GB in DFS through PigStorage on one node. I guess the impact may be greater on a real HDFS cluster with actual network round trips.
FYI, the issue was found while profiling with the YourKit Java profiler. Useful toy...