Hadoop Common / HADOOP-2406

Micro-benchmark to measure read/write times through InputFormats

Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: fs, test
    • Labels: None

    Description

      The attached test writes and reads X GB of data to and from the default filesystem through SequenceFileInputFormat and TextInputFormat, with LzoCodec, with GzipCodec, and without compression. SequenceFiles are tested with both block and record compression.

      The following results are pretty stable. They use 10GB of data through RawLocalFileSystem with 5-word keys and 20-word values, as generated by RandomTextWriter with the same seed for each file:

      Writes:

      Format  Compression  Type    Time (sec)  Filesize (bytes)
      SEQ     LZO          BLOCK          318     8,604,288,397
      SEQ     LZO          RECORD         367    11,689,969,413
      SEQ     ZIP          BLOCK          929     2,827,697,769
      SEQ     ZIP          RECORD        1737     9,324,730,365
      SEQ     none         -              201    11,282,745,683
      TXT     LZO          -              742    12,671,065,769
      TXT     ZIP          -             1320     2,597,397,680
      TXT     none         -              392    10,818,058,643
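
      To make the file sizes easier to compare, each one can be divided by the uncompressed size for the same format. The following is a minimal sketch of that arithmetic, not part of the attached patch. The names WRITE_SIZES and size_ratios are hypothetical, and "none" and "-" are labels chosen here for the table's blank cells.

```python
# File sizes in bytes, copied from the "Writes" table.
# "none" and "-" are labels chosen here for the table's blank cells.
WRITE_SIZES = {
    ("SEQ", "LZO", "BLOCK"): 8_604_288_397,
    ("SEQ", "LZO", "RECORD"): 11_689_969_413,
    ("SEQ", "ZIP", "BLOCK"): 2_827_697_769,
    ("SEQ", "ZIP", "RECORD"): 9_324_730_365,
    ("SEQ", "none", "-"): 11_282_745_683,
    ("TXT", "LZO", "-"): 12_671_065_769,
    ("TXT", "ZIP", "-"): 2_597_397_680,
    ("TXT", "none", "-"): 10_818_058_643,
}


def size_ratios(sizes):
    """Divide each file size by the uncompressed size of the same format."""
    baselines = {
        fmt: size for (fmt, codec, _), size in sizes.items() if codec == "none"
    }
    return {key: size / baselines[key[0]] for key, size in sizes.items()}


if __name__ == "__main__":
    ratios = size_ratios(WRITE_SIZES)
    # Print the smallest ratio (best compression) first.
    for (fmt, codec, ctype), ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{fmt} {codec:<4} {ctype:<6} {ratio:.3f}")
```

      A ratio above 1.0 means the compressed file came out larger than the uncompressed file of the same format.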

      Reads:

      Format  Compression  Type    Time (sec)
      SEQ     LZO          BLOCK          150
      SEQ     LZO          RECORD         281
      SEQ     ZIP          BLOCK          155
      SEQ     ZIP          RECORD         548
      SEQ     none         -              209
      TXT     LZO          -              620
      TXT     ZIP          -              355
      TXT     none         -              284
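
      The read times can be compared in the same spirit by expressing each one as a multiple of a baseline run. The sketch below assumes the uncompressed SequenceFile read (209 sec) as the baseline. That choice is an assumption made here, as are the names READ_TIMES and relative_read_times and the "none" and "-" labels for blank cells.

```python
# Read times in seconds, copied from the "Reads" table.
# "none" and "-" are labels chosen here for the table's blank cells.
READ_TIMES = {
    ("SEQ", "LZO", "BLOCK"): 150,
    ("SEQ", "LZO", "RECORD"): 281,
    ("SEQ", "ZIP", "BLOCK"): 155,
    ("SEQ", "ZIP", "RECORD"): 548,
    ("SEQ", "none", "-"): 209,
    ("TXT", "LZO", "-"): 620,
    ("TXT", "ZIP", "-"): 355,
    ("TXT", "none", "-"): 284,
}


def relative_read_times(times, baseline=("SEQ", "none", "-")):
    """Express every read time as a multiple of the baseline run's time."""
    base = times[baseline]
    return {key: seconds / base for key, seconds in times.items()}


if __name__ == "__main__":
    relative = relative_read_times(READ_TIMES)
    # Print the fastest configuration first.
    for (fmt, codec, ctype), factor in sorted(relative.items(), key=lambda kv: kv[1]):
        print(f"{fmt} {codec:<4} {ctype:<6} {factor:.2f}x")
```

      Values below 1.0 are reads that finished faster than the uncompressed SequenceFile read, and values above 1.0 are slower.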

      Of note:

      • LZO-compressed TextOutput is larger than the uncompressed output (HADOOP-2402), and lzop cannot read it.
      • Zip (GzipCodec) compression is expensive. Short values are responsible for the unimpressive compression of record-compressed SequenceFiles.
      • TextInputFormat is slow (HADOOP-2285). TextOutputFormat also looks suspect.

      Attachments

        1. 2406-1.patch
          28 kB
          Christopher Douglas
        2. 2406-0.patch
          28 kB
          Christopher Douglas

          People

            Assignee: Christopher Douglas (cdouglas)
            Reporter: Christopher Douglas (cdouglas)