  1. Hadoop Map/Reduce
  2. MAPREDUCE-210

want InputFormat for zip files

Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved

    Description

      HDFS is inefficient with large numbers of small files. Thus one might pack many small files into large compressed archives. But for efficient map-reduce operation, it is desirable to be able to split inputs into smaller chunks, with one or more small original files per split. The zip format, unlike tar, keeps a central directory of its entries, which permits enumeration of the files in the archive without scanning the entire archive. Thus a zip InputFormat could efficiently divide large archives into splits that each contain one or more archived files.
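
      The split-planning idea described above can be sketched in plain Java. This is only an illustration, not the attached patch: the class ZipSplitSketch, its Split type, and the targetBytes parameter are hypothetical names, and java.util.zip.ZipFile reads a local file rather than HDFS, so a real InputFormat would need to read the central directory through the Hadoop FileSystem API instead.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipSplitSketch {

    /** Hypothetical split: the archived files that one map task would process. */
    public static final class Split {
        public final List<String> entryNames = new ArrayList<>();
        public long compressedBytes = 0;
    }

    /** Packs entries, in archive order, into splits of roughly targetBytes each. */
    public static List<Split> group(List<String> names, List<Long> sizes, long targetBytes) {
        List<Split> splits = new ArrayList<>();
        Split current = new Split();
        for (int i = 0; i < names.size(); i++) {
            long size = sizes.get(i);
            // Close the current split if this entry would push it past the target,
            // but never emit an empty split (an oversized entry gets its own split).
            if (!current.entryNames.isEmpty()
                    && current.compressedBytes + size > targetBytes) {
                splits.add(current);
                current = new Split();
            }
            current.entryNames.add(names.get(i));
            current.compressedBytes += size;
        }
        if (!current.entryNames.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    /** Lists entries from the zip central directory, then groups them into splits. */
    public static List<Split> planSplits(String zipPath, long targetBytes) throws IOException {
        List<String> names = new ArrayList<>();
        List<Long> sizes = new ArrayList<>();
        // ZipFile enumerates entries from the central directory,
        // so no entry data is decompressed here.
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                names.add(entry.getName());
                // getCompressedSize() returns -1 when unknown; treat that as 0.
                sizes.add(Math.max(0L, entry.getCompressedSize()));
            }
        }
        return group(names, sizes, targetBytes);
    }
}
```

      Grouping by compressed size is one possible policy. Grouping by entry count or by uncompressed size would work the same way, since both are also available from the central directory.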

      Attachments

        1. ZipInputFormat_fixed.patch
          15 kB
          Ankur Bansal

          People

            indrajeetapache indrajit
            cutting Doug Cutting
            Votes: 11
            Watchers: 30
