(LocalFileSystem) Test

==================================================================================================================
(LocalFileSystem) Using bulk decompression in RCFile->ValueBuffer->readFields; the test adds some noise between two RCFile reads and after writing, to avoid the disk cache.
==================================================================================================================
Write RCFile with 10 random string columns and 100000 rows cost 5911 milliseconds. And the file's on disk size is 11501112
Write SequenceFile with 10 random string columns and 100000 rows cost 8626 milliseconds. And the file's on disk size is 13046020
Read only one column of a RCFile with 10 random string columns and 100000 rows cost 259 milliseconds.
Read only first and last columns of a RCFile with 10 random string columns and 100000 rows cost 181 milliseconds.
Read all columns of a RCFile with 10 random string columns and 100000 rows cost 498 milliseconds.
Read SequenceFile with 10 random string columns and 100000 rows cost 7002 milliseconds.
Write RCFile with 25 random string columns and 100000 rows cost 9762 milliseconds. And the file's on disk size is 28725817
Write SequenceFile with 25 random string columns and 100000 rows cost 19971 milliseconds. And the file's on disk size is 32246409
Read only one column of a RCFile with 25 random string columns and 100000 rows cost 233 milliseconds.
Read only first and last columns of a RCFile with 25 random string columns and 100000 rows cost 269 milliseconds.
Read all columns of a RCFile with 25 random string columns and 100000 rows cost 1082 milliseconds.
Read SequenceFile with 25 random string columns and 100000 rows cost 16539 milliseconds.
Write RCFile with 40 random string columns and 100000 rows cost 16458 milliseconds. And the file's on disk size is 45940679
Write SequenceFile with 40 random string columns and 100000 rows cost 37513 milliseconds. And the file's on disk size is 51436799
Read only one column of a RCFile with 40 random string columns and 100000 rows cost 261 milliseconds.
Read only first and last columns of a RCFile with 40 random string columns and 100000 rows cost 301 milliseconds.
Read all columns of a RCFile with 40 random string columns and 100000 rows cost 1698 milliseconds.
Read SequenceFile with 40 random string columns and 100000 rows cost 25415 milliseconds.

Summary (LocalFileSystem, bulk decompression in RCFile->ValueBuffer->readFields, noise added between two RCFile reads and after writing to avoid the disk cache):

column number | RCFile size (bytes) | RCFile read 1 column (ms) | RCFile read first and last columns (ms) | RCFile read all columns (ms) | SequenceFile size (bytes) | SequenceFile read all columns (ms)
10 | 11501112 | 259 | 181 | 498 | 13046020 | 7002
25 | 28725817 | 233 | 269 | 1082 | 32246409 | 16539
40 | 45940679 | 261 | 301 | 1698 | 51436799 | 25415

==================================================================================================================
(LocalFileSystem) Without bulk decompression in RCFile->ValueBuffer->readFields; the test adds some noise between two RCFile reads and after writing, to avoid the disk cache.
==================================================================================================================
Write RCFile with 10 random string columns and 100000 rows cost 5841 milliseconds. And the file's on disk size is 11501112
Write SequenceFile with 10 random string columns and 100000 rows cost 8662 milliseconds. And the file's on disk size is 13046020
Read only one column of a RCFile with 10 random string columns and 100000 rows cost 1804 milliseconds.
Read only first and last columns of a RCFile with 10 random string columns and 100000 rows cost 3262 milliseconds.
Read all columns of a RCFile with 10 random string columns and 100000 rows cost 15956 milliseconds.
Read SequenceFile with 10 random string columns and 100000 rows cost 6927 milliseconds.
Write RCFile with 25 random string columns and 100000 rows cost 9695 milliseconds. And the file's on disk size is 28725817
Write SequenceFile with 25 random string columns and 100000 rows cost 19953 milliseconds. And the file's on disk size is 32246409
Read only one column of a RCFile with 25 random string columns and 100000 rows cost 1761 milliseconds.
Read only first and last columns of a RCFile with 25 random string columns and 100000 rows cost 3310 milliseconds.
Read all columns of a RCFile with 25 random string columns and 100000 rows cost 39492 milliseconds.
Read SequenceFile with 25 random string columns and 100000 rows cost 15983 milliseconds.
Write RCFile with 40 random string columns and 100000 rows cost 15240 milliseconds. And the file's on disk size is 45940679
Write SequenceFile with 40 random string columns and 100000 rows cost 31893 milliseconds. And the file's on disk size is 51436799
Read only one column of a RCFile with 40 random string columns and 100000 rows cost 1843 milliseconds.
Read only first and last columns of a RCFile with 40 random string columns and 100000 rows cost 3386 milliseconds.
Read all columns of a RCFile with 40 random string columns and 100000 rows cost 63759 milliseconds.
Read SequenceFile with 40 random string columns and 100000 rows cost 25256 milliseconds.

Summary (LocalFileSystem, no bulk decompression in RCFile->ValueBuffer->readFields, noise added between two RCFile reads and after writing to avoid the disk cache):

column number | RCFile size (bytes) | RCFile read 1 column (ms) | RCFile read first and last columns (ms) | RCFile read all columns (ms) | SequenceFile size (bytes) | SequenceFile read all columns (ms)
10 | 11501112 | 1804 | 3262 | 15956 | 13046020 | 6927
25 | 28725817 | 1761 | 3310 | 39492 | 32246409 | 15983
40 | 45940679 | 1843 | 3386 | 63759 | 51436799 | 25256

(DistributedFileSystem) Test

==================================================================================================================
(DistributedFileSystem) Using bulk decompression in RCFile->ValueBuffer->readFields; the test adds some noise between two RCFile reads and after writing, to avoid the disk cache.
==================================================================================================================
Write RCFile with 10 random string columns and 100000 rows cost 12591 milliseconds. And the file's on disk size is 11501112
Write SequenceFile with 10 random string columns and 100000 rows cost 16740 milliseconds. And the file's on disk size is 13046020
Read only one column of a RCFile with 10 random string columns and 100000 rows cost 2381 milliseconds.
Read only first and last columns of a RCFile with 10 random string columns and 100000 rows cost 3516 milliseconds.
Read all columns of a RCFile with 10 random string columns and 100000 rows cost 9898 milliseconds.
Read SequenceFile with 10 random string columns and 100000 rows cost 18053 milliseconds.
Write RCFile with 25 random string columns and 100000 rows cost 30285 milliseconds. And the file's on disk size is 28725817
Write SequenceFile with 25 random string columns and 100000 rows cost 40758 milliseconds. And the file's on disk size is 32246409
Read only one column of a RCFile with 25 random string columns and 100000 rows cost 3754 milliseconds.
Read only first and last columns of a RCFile with 25 random string columns and 100000 rows cost 5254 milliseconds.
Read all columns of a RCFile with 25 random string columns and 100000 rows cost 22521 milliseconds.
Read SequenceFile with 25 random string columns and 100000 rows cost 43258 milliseconds.
Write RCFile with 40 random string columns and 100000 rows cost 47525 milliseconds. And the file's on disk size is 45940679
Write SequenceFile with 40 random string columns and 100000 rows cost 64169 milliseconds. And the file's on disk size is 51436799
Read only one column of a RCFile with 40 random string columns and 100000 rows cost 5597 milliseconds.
Read only first and last columns of a RCFile with 40 random string columns and 100000 rows cost 8225 milliseconds.
Read all columns of a RCFile with 40 random string columns and 100000 rows cost 40304 milliseconds.
Read SequenceFile with 40 random string columns and 100000 rows cost 69278 milliseconds.

Summary (DistributedFileSystem, bulk decompression in RCFile->ValueBuffer->readFields, noise added between two RCFile reads and after writing to avoid the disk cache):

column number | RCFile size (bytes) | RCFile read 1 column (ms) | RCFile read first and last columns (ms) | RCFile read all columns (ms) | SequenceFile size (bytes) | SequenceFile read all columns (ms)
10 | 11501112 | 2381 | 3516 | 9898 | 13046020 | 18053
25 | 28725817 | 3754 | 5254 | 22521 | 32246409 | 43258
40 | 45940679 | 5597 | 8225 | 40304 | 51436799 | 69278
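Only the LocalFileSystem runs were measured both with and without bulk decompression, so that is where its effect can be quantified directly. As a rough illustration, the Java sketch below recomputes the "read all columns" speedup from the two LocalFileSystem summary tables. The class name `RcFileSpeedup` and its `speedup` helper are hypothetical and not part of RCFile or Hive; the timings are copied from the results above.

```java
// Hypothetical helper (not part of RCFile/Hive): recomputes the bulk vs.
// non-bulk decompression speedup from the LocalFileSystem "read all columns"
// timings reported above, in milliseconds.
class RcFileSpeedup {

    // Ratio of the timing without bulk decompression to the timing with it.
    static double speedup(long withoutBulkMs, long withBulkMs) {
        if (withBulkMs <= 0) {
            throw new IllegalArgumentException("timing must be positive");
        }
        return (double) withoutBulkMs / withBulkMs;
    }

    public static void main(String[] args) {
        int[] columns = {10, 25, 40};
        long[] readAllBulk = {498, 1082, 1698};       // with bulk decompression
        long[] readAllNoBulk = {15956, 39492, 63759}; // without bulk decompression
        for (int i = 0; i < columns.length; i++) {
            System.out.printf("%d columns: %.1fx faster with bulk decompression%n",
                    columns[i], speedup(readAllNoBulk[i], readAllBulk[i]));
        }
    }
}
```

Applied to the one-column reads (1804/259, 1761/233, 1843/261), the same ratio comes out at roughly 7x to 7.6x, compared with about 32x to 38x when all columns are read.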