Hadoop Map/Reduce / MAPREDUCE-1465

archive partSize should be configurable

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: harchive
    • Labels: None

    Description

      The archive part size is currently set to 2GB. Archiving 10^5 small files took 52 minutes because the job ran with only 1 mapper.

      -bash-3.1$ time $H archive ${Q} -archiveName ${DIR}.3.har -p ${PARENT} ${DIR} ${PARENT}
      10/02/06 01:55:14 INFO mapred.JobClient: Running job: job_201002042035_5737
      ...
      10/02/06 02:47:18 INFO mapred.JobClient:  map 100% reduce 100%
      10/02/06 02:47:19 INFO mapred.JobClient: Job complete: job_201002042035_5737
      ...
      10/02/06 02:47:19 INFO mapred.JobClient:     Reduce input records=100002
      
      real    52m27.188s
      user    0m29.314s
      sys     0m1.276s
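
The link between the part size and the single mapper can be sketched as follows. This is a rough model, not the archive tool's actual code: it assumes the number of map tasks is about the total input size divided by the part size, with a minimum of one. The function name `estimate_map_tasks`, the 10 KB average file size, and the 64 MB alternative part size are hypothetical values chosen for illustration; only the 2GB part size and the 10^5 file count come from the description above.

```python
# Part size stated in the description: 2 GB.
DEFAULT_PART_SIZE = 2 * 1024**3


def estimate_map_tasks(total_bytes: int, part_size: int = DEFAULT_PART_SIZE) -> int:
    """Estimate the map task count under the assumed rule:
    one map per full part of input, and never fewer than one map."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return max(1, total_bytes // part_size)


# 10^5 files as in the description; the 10 KB average size is hypothetical.
num_files = 10**5
avg_file_bytes = 10 * 1024
total_bytes = num_files * avg_file_bytes  # about 1 GB, below one 2 GB part

maps_with_default = estimate_map_tasks(total_bytes)                 # 1
maps_with_64mb = estimate_map_tasks(total_bytes, 64 * 1024**2)      # 15

print(maps_with_default, maps_with_64mb)
```

Under these assumptions, any input smaller than 4GB ends up with a single map task at the fixed part size, while a smaller configurable part size would spread the same input over more maps.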
      

            People

              Assignee: mahadev Mahadev Konar
              Reporter: szetszwo Tsz-wo Sze
              Votes: 0
              Watchers: 2
