Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      HDFS is inefficient with large numbers of small files, so one might pack many small files into large, compressed archives. But for efficient map-reduce operation, it is desirable to be able to split inputs into smaller chunks, with one or more of the original small files per split. The zip format, unlike tar, permits enumeration of the files in an archive without scanning the entire archive. Thus a zip InputFormat could efficiently split large archives into splits that contain one or more archived files.
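The premise above, that the central directory lets us list a zip's entries without scanning the archive body, can be checked with plain java.util.zip. This is an illustrative sketch; the class and file names are mine, not part of any patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Enumerate a zip's entries from its central directory without
// decompressing or scanning the archive body.
public class ZipEnumerationDemo {

    // Pack the given name -> content pairs into a zip file at 'path'.
    static void createZip(Path path, Map<String, String> files) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(path))) {
            for (Map.Entry<String, String> e : files.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue().getBytes("UTF-8"));
                out.closeEntry();
            }
        }
    }

    // List entry names straight from the central directory.
    static List<String> listEntries(Path path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path.toFile())) {
            for (Enumeration<? extends ZipEntry> en = zip.entries(); en.hasMoreElements();) {
                names.add(en.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".zip");
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "alpha");
        files.put("b.txt", "beta");
        createZip(tmp, files);
        System.out.println(listEntries(tmp)); // [a.txt, b.txt]
        Files.delete(tmp);
    }
}
```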

        Activity

        Ankur added a comment -

        Proposed Implementation Approach
        --------------------------------------------------

        1. Implement class ZipInputFormat to extend FileInputFormat.

        2. Override the getSplits() method to read each file's
        InputStream and construct a ZipInputStream out of it.

        3. Create FileSplits in a way that each file split has the following
        properties

        • FileSplit.start = start index of a zip entry.
        • FileSplit.length = end index of a zip entry.
        • fileSplit.file = Zip file.
        • Sum of compressed size of zip entries <= splitSize

        For example, start = 3, length = 6 signifies that zip entries 3 to 6
        will be read from the zip file of this split.

        4. Implement class ZipRecordReader to read each zip entry in its split
        using LineRecordReader.

        5. Each zip entry will be treated as a text file.

        6. Implement the necessary unit test case classes.
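The packing rule in step 3 (sum of compressed sizes of the entries in a split <= splitSize) can be sketched outside Hadoop. SplitGrouper is a hypothetical name, and the long[] of compressed sizes stands in for what getSplits() would obtain from ZipEntry#getCompressedSize():

```java
import java.util.ArrayList;
import java.util.List;

// Group consecutive zip entries into splits while the sum of their
// compressed sizes stays within splitSize.
public class SplitGrouper {

    // Returns {start, length} index pairs: each split covers entries
    // [start, start + length).
    static List<long[]> group(long[] compressedSizes, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        int start = 0;
        long acc = 0;
        for (int i = 0; i < compressedSizes.length; i++) {
            // Close the current split if adding this entry would overflow
            // it (an oversized single entry still gets its own split).
            if (i > start && acc + compressedSizes[i] > splitSize) {
                splits.add(new long[] {start, i - start});
                start = i;
                acc = 0;
            }
            acc += compressedSizes[i];
        }
        if (start < compressedSizes.length) {
            splits.add(new long[] {start, compressedSizes.length - start});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Entries of 40, 40, 40, 100 bytes with splitSize 100
        // yield splits starting at entries 0, 2, and 3.
        for (long[] s : group(new long[] {40, 40, 40, 100}, 100)) {
            System.out.println(s[0] + "+" + s[1]);
        }
    }
}
```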

        Questions:
        =========
        1. Is there a need to implement a ZipCodec (like GzipCodec and DefaultCodec) ?
        2. Should the ZipRecordReader be flexible enough to treat the individual zip entries in a
        FileSplit as being a text file or a sequence file ?

        Please feel free to comment on anything I missed that might be required.
        Also, any suggestions or recommendations to make the implementation better will be greatly
        appreciated.

        -Ankur

        Doug Cutting added a comment -

        > 2. Override the getSplits() method to read each file's InputStream

        I think getSplits() should construct a split for each element of java.util.zip.ZipFile#entries().

        > 3. Create FileSplits [ ... ]

        We should probably extend FileSplit or InputSplit specifically for zip files. The fields needed per split are the archive file's path and the path of the file within the archive. I don't think there's much point in supporting splits smaller than a file within the zip archive, so start and end offsets are not required here.

        > 4. Implement class ZipRecordReader to read each zip entry in its split
        Using LineRecordReader.

        We should be able to use LineRecordReader directly, passing its constructor the result of ZipFile#getInputStream().
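A plain-Java approximation of this suggestion, with BufferedReader standing in for LineRecordReader and an illustrative class name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// One entry per split, read line by line: BufferedReader plays the
// role that LineRecordReader would play inside Hadoop.
public class EntryLineReaderDemo {

    // Reads all lines of one named entry via ZipFile#getInputStream.
    static List<String> readLines(Path zipPath, String entryName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName); // direct lookup, no scan
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(zip.getInputStream(entry), "UTF-8"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            out.putNextEntry(new ZipEntry("data.txt"));
            out.write("one\ntwo\nthree".getBytes("UTF-8"));
            out.closeEntry();
        }
        System.out.println(readLines(tmp, "data.txt")); // [one, two, three]
        Files.delete(tmp);
    }
}
```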

        Ankur added a comment -
        • This patch does not modify any existing source file and adds 3 new files
          1. ZipInputFormat.java
          2. ZipSplit.java
          3. TestZipInputFormat.java
        • The ZipInputFormat simply creates one split for each zip entry in an input zip file.
        • Each split is of type ZipSplit and is read using a LineRecordReader.
        • TestZipInputFormat is the unit test code that tests the ZipInputFormat with different zip files
          having different number of entries.
        • More information is available in the javadoc
        Ankur added a comment -

        Attaching the patch file

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12373681/ZipInputFormat.patch
        against trunk revision r614192.

        @author +1. The patch does not contain any @author tags.

        javadoc -1. The javadoc tool appears to have generated messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to introduce 2 new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1673/console

        This message is automatically generated.

        Ankur added a comment -

        The following issues reported by QA were fixed:

        1. Findbugs errors in ZipInputFormat.java were fixed. The streams are now closed properly in isSplitable() and getSplits() methods.
        2. Javadoc comments fixed and verified that no new javadoc warnings are generated after applying the patch.
        3. Fixed formatting in the code.
        4. core-tests and contrib-tests are now passing after the above changes.

        Kindly verify.

        Doug Cutting added a comment -

        Some comments:

        • isSplittable throws an exception when an empty zip archive is passed. Instead, an empty zip file should just provide no keys and values, but not throw exceptions.
        • in getSplits, there's no need to explicitly test that each file exists. Instead, we can rely on open() throwing an exception if a file does not exist.
        • getRecordReader should not loop calling getNextEntry(), but instead just call getEntry(String).

        Oh, wait. On that last point, it looks like getEntry() is only available on ZipFile, and we cannot create a ZipFile except from a File. With an InputStream we must use ZipInputStream, which does not support getEntry(), since InputStream doesn't support random access. Sigh. This considerably reduces the utility of this InputFormat. GNU Classpath's implementation of java.util.zip.ZipFile uses a RandomAccessFile, which we could implement, but, alas, we can't use GNU's code at Apache because it is under the GPL.

        Zlib includes a zip file parser (minizip) that's under a BSD-like license and that permits random access to zip file entries from a user-supplied input stream. So we could do it in C. Sigh.
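The sequential-access limitation above can be made concrete: with ZipInputStream the only way to reach an entry is to walk past every entry before it. A small illustration (class and entry names are mine):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// ZipInputStream only supports a forward scan with getNextEntry(),
// unlike ZipFile#getEntry, which jumps straight to an entry via the
// central directory.
public class SequentialScanDemo {

    // Walks the stream until 'target' is reached; returns how many
    // entries were visited (or -1 if absent).
    static int entriesVisited(InputStream in, String target) throws IOException {
        int visited = 0;
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                visited++;
                if (e.getName().equals(target)) {
                    return visited;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        // Reaching the last of 3 entries costs a walk over all 3.
        System.out.println(entriesVisited(
                new ByteArrayInputStream(bytes.toByteArray()), "c.txt")); // 3
    }
}
```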

        Doug Cutting added a comment -

        So, while implementing a zip InputFormat based on native code would be a lot more work, it would also have some distinct advantages:

        • it could handle archives greater than 2GB;
        • it would know where within the archive each split resides, so that splits could be properly localized;
        • once HDFS implements append, it could provide appendable archives.

        None of these are possible with java.util.zip.

        Ankur added a comment -

        Some questions.
        1. How is a java.io.InputStream passed and used in native code? The header file represents it as a jobject, which I tried casting to FILE * and reading; it did not work as expected.

        2. Can a native method call return structures that can be converted to Java objects? If so, how?
        Basically I want to be able to return an array of C structures where each element holds the following information:

        • The path of the entry
        • The number of the entry
        • Offset of the entry in the zip file
          So that this info can be converted to an array of ZipSplit.

        I am new to JNI, so things are less than obvious for me; a little help on JNI will be greatly appreciated.

        Arun C Murthy added a comment -

        > How is a java.io.InputStream passed and used in native code. The header file represents it as a jobject which I tried casting to FILE * and reading, it did not work as expected.

        I'm not sure what exactly you are trying, but the way I implemented the native codecs was to read data from the InputStream in the Java layer, put the data into a direct-buffer and then pass it to the native zlib library.

        The stream you are talking about is the handle to the zlib stream, which is zlib specific. That just represents the state of the zlib stream.

        > Can a native method call return structures that can be converted to java objects ? If so how ?

        I'm sure that can be done via some hoops, but would be quite involved (I think).

        Some details here: http://java.sun.com/docs/books/jni/html/other.html#30942

        JNI Documentation from Sun: http://java.sun.com/docs/books/jni/html/jniTOC.html

        Hope that helps.
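The Java half of the direct-buffer approach described above might look like this sketch; the JNI handoff itself is omitted, and the class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Sketch of the Java side of the direct-buffer bridge: read from an
// InputStream in Java, stage the bytes in a direct ByteBuffer, then
// (in the real code) pass that buffer to the native zlib layer, which
// would access it via GetDirectBufferAddress.
public class DirectBufferBridge {

    // Fills 'direct' from 'in'; returns the byte count read (-1 on EOF).
    static int fillDirectBuffer(InputStream in, ByteBuffer direct) throws IOException {
        byte[] chunk = new byte[direct.remaining()];
        int n = in.read(chunk, 0, chunk.length);
        if (n > 0) {
            direct.put(chunk, 0, n);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        InputStream in = new ByteArrayInputStream("payload".getBytes("UTF-8"));
        System.out.println(fillDirectBuffer(in, buf)); // 7
    }
}
```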

        Doug Cutting added a comment -

        > I'm not sure what exactly you are trying [ ... ]

        The need here is to read from an FSInputStream, returned from an arbitrary FileSystem implementation, from C. In particular, we need to be able to make callbacks from C to Java for read() and seek(). (I think open() and close() can be handled entirely in Java, and tell() can be implemented entirely in C.)

        Ankur added a comment -

        > The need here is to ...
        Callback from C to Java is fine for read(). But seek() might be an issue since for true random access we need to be able to seek forward and backwards from
        1. start of the stream
        2. current pos of the stream
        3. end of the stream

        After taking a deep dive into the minizip code and implementing some POC code, I am not sure how a seek() callback from C to Java might be implemented in a way that can be leveraged from the existing minizip parser code. Any suggestions?
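For reference, one way the origin-relative seek that minizip expects (SEEK_SET/SEEK_CUR/SEEK_END) could be translated onto an absolute-position seek, assuming the stream length is known up front. This is a sketch, not tested against minizip:

```java
// Bridges minizip-style (offset, whence) seeks onto a stream that only
// understands absolute positions, like FSDataInputStream#seek.
public class WhenceSeek {
    // minizip-style origins
    static final int SEEK_SET = 0; // from start of stream
    static final int SEEK_CUR = 1; // from current position
    static final int SEEK_END = 2; // from end of stream

    // Maps (offset, whence) onto an absolute position. 'current' would
    // come from a tell() callback; 'length' must be supplied at open
    // time, since the stream itself cannot seek past EOF.
    static long resolve(long offset, int whence, long current, long length) {
        switch (whence) {
            case SEEK_SET: return offset;
            case SEEK_CUR: return current + offset;
            case SEEK_END: return length + offset; // offset is usually <= 0
            default: throw new IllegalArgumentException("bad whence: " + whence);
        }
    }

    public static void main(String[] args) {
        // e.g. minizip asking for "22 bytes before EOF" in a 1000-byte file:
        System.out.println(resolve(-22, SEEK_END, 0, 1000)); // 978
    }
}
```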

        Just to give an idea, here is some sample code for read() that I implemented.

        // including zlib & minizip libraries
        #include "unzip.h"

        // including java library
        #include <jni.h>
        #include "ZipInputFormat.h"

        // read() callback handed to minizip: calls back into the Java
        // stream's read(byte[], int, int) and copies the bytes into buf.
        uLong ZCALLBACK fread_file_func
        ( voidpf opaque, voidpf stream, void* buf, uLong size)
        {
            jlong bytesRead = -1;
            JNIEnv *env = (JNIEnv *) opaque;
            jobject javaStream = (jobject) stream;
            jclass dataInputStream = (*env)->GetObjectClass(env, javaStream);
            jmethodID MID_read = (*env)->GetMethodID(env, dataInputStream, "read", "([BII)I");

            if (MID_read == NULL) {
                printf("\nfread_file_func(): read() method not found");
            } else {
                jbyteArray byteArray = (*env)->NewByteArray(env, size);
                bytesRead = (*env)->CallIntMethod(env, javaStream, MID_read,
                                                  byteArray, 0, size);
                (*env)->GetByteArrayRegion(env, byteArray, 0, bytesRead,
                                           (jbyte *) buf);
                printf("\nNumber of bytes read: %ld\n", (long) bytesRead);
            }
            return bytesRead;
        }

        // the native function exposed to Java, declared as a static method
        // dataStream is of type java.io.DataInputStream.
        // zipClass is of type ZipInputFormat
        JNIEXPORT void JNICALL Java_ZipInputFormat_display
        (JNIEnv *env, jclass zipClass, jobject dataStream)
        {
            unsigned char *buf = (unsigned char *) malloc(sizeof(unsigned char) * 1024 * 64);
            fread_file_func(env, dataStream, buf, 64 * 1024);
            free(buf);
        }
        Doug Cutting added a comment -

        Right, you can't seek a DataInputStream. Instead use FSDataInputStream, which is seekable.

        Ankur added a comment -

        OK, but I should be able to move the offset to the end of the stream, since the central directory structure of a zip file is at the end.
        Presently, FSDataInputStream.seek() throws an IOException and does not change the stream position if I try to position it past the
        end of the stream, unlike fseek(), which positions the offset at the end of the stream.

        Is there a workaround for this, or is it functionality that needs to be added?

        Doug Cutting added a comment -

        Since the file is being accessed read-only, we can call FileSystem#getStatus(Path).getLen() and pass the file length from Java to C with the FSInputStream when we open the archive. Would that work?

        Arguably we should add a method to Seekable that returns the length, or perhaps adopt the convention that attempts to seek past EOF leave the pointer at EOF, but I don't think that's required for this issue.
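A pure-Java illustration of what passing the length buys: knowing the file length, a reader can walk back from the end to the End Of Central Directory record (signature PK\x05\x06) without ever seeking past EOF. RandomAccessFile stands in for the C-side reader here, and the class name is mine:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Locate the End Of Central Directory record by scanning backward from
// the known file length, then read the total entry count out of it.
public class EocdLocator {

    // The total-entry-count field sits at offset 10 of the EOCD record.
    static int entryCountFromEocd(File zip) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zip, "r")) {
            long len = raf.length();
            // EOCD is 22 bytes plus an optional comment of up to 64 KB.
            long stop = Math.max(0, len - 22 - 0xFFFF);
            for (long pos = len - 22; pos >= stop; pos--) {
                raf.seek(pos);
                if (raf.read() == 0x50 && raf.read() == 0x4b
                        && raf.read() == 0x05 && raf.read() == 0x06) {
                    raf.seek(pos + 10);            // total entry count field
                    int lo = raf.read(), hi = raf.read();
                    return lo | (hi << 8);         // little-endian short
                }
            }
        }
        return -1; // no EOCD found: not a zip archive
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("eocd", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            for (String name : new String[] {"a", "b", "c"}) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        System.out.println(entryCountFromEocd(tmp.toFile())); // 3
        Files.delete(tmp);
    }
}
```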

        Ankur added a comment -

        Here is what I did
        1. Implemented JNI callbacks in C that callback Java for open, read, close, seek and tell on a FSDataInputStream.
        2. Implemented some JAVA test code to verify that callbacks work correctly.
        3. Made changes to existing Makefile of minizip to compile and build my C code as shared object.
        4. Placed the ".so" file in $LD_LIBRARY_PATH directory.

        The integration was successful and worked beautifully. The callbacks worked perfectly to ensure that the zip file opened
        as an FSDataInputStream was opened and read correctly.

        However, sigh. I found that the minizip parsing code didn't work correctly for zip files > 2 GB.

        The code uses uLong (unsigned long, 4 bytes) instead of jlong (signed long long, 8 bytes).
        Replacing uLong with jlong wouldn't work, as the code performs a lot of bit-shifting operations. (I tried this.)

        Also, the parsing code relies on directory-structure entries being in 32-bit format and will require RE-WORK
        based upon knowledge of 64-bit entries, keeping in mind backward compatibility with 32-bit entries.

        Note: the I/O callback APIs implemented by me make use of jlong in read(), seek() and tell().

        QUESTION: Is the RE-WORK really required, or is there a workaround that I am missing?

        Ankur added a comment -

        Small correction:
        =============
        The parsing code works for zip archives up to 4 GB (not 2 GB).
        It fails to process zip files larger than 4 GB correctly.

        After a little more research I figured out that Zip64 format support (for files > 4 GB)
        is not presently implemented in the minizip code.

        So it looks like, if we need support for files > 4 GB, the minizip parsing and reading code
        would definitely require re-work. In other words, the minizip code would need to be "extended"
        to support the Zip64 format.

        This in turn further increases the scope of work.

        Any suggestions or recommendations?

        Doug Cutting added a comment -

        It looks like minizip is out, then. The unzip code is based on a file descriptor, but there are only 35 lines that touch that file descriptor, so it might not be too hard to modify it to read from something else. But then we have to maintain a branched version of that. Sigh.

        Ankur added a comment -

        > ...so it might not be too hard to modify it to read from something else.
        Actually, I already spent a fair amount of time setting things up and adding new code (I/O APIs that can be plugged into the unzip code) so that a zip file name is passed from Java to a native C call, which then uses unzip APIs to do open/read/seek operations on it.

        What's different is that my custom I/O APIs are used to construct the I/O function-pointer structure that is passed to the unzip APIs. The custom I/O APIs are responsible for making Java callbacks whenever the unzip APIs request an I/O operation through them.

        My concern is not that part, but the APIs of unzip.c that do all the low-level bit shifting, directory parsing, reading and uncompressing, since it is that part which fails for files > 4 GB. Modifying that part would mean two things:

        1. We would be extending the unzip code in minizip to support the ZIP64 format for our needs, and we would be required to maintain it.
        2. Any modification would require decent knowledge of the format and would need to ensure backward compatibility with the older ZIP format.

        So the question here is: do we go ahead and extend the minizip code for the ZIP64 format? (This would be quite involved, I think.)
        Or do we stick with the present limitation of 4 GB and schedule it for later?

        Doug Cutting added a comment -

        Sorry, I wasn't clear. I was thinking we might try using, instead of minizip, the source code for the unzip command line executable, http://www.info-zip.org/UnZip.html, which uses file-io directly, but not in too many places.

        Ankur added a comment -

        Thanks for clarifying . But even Unzip in its present release 5.52 does not serve our purpose of supporting large files ( > 4GB) since it does not take care of extra headers in Zip64 format that are used specifically for supporting large archives.

        This is well documented and clearly stated in the FAQ, http://www.info-zip.org/FAQ.html#limits.
        An excerpt from that page:

        "Also note that in August 2001, PKWARE released PKZIP 4.50 with support for large files and archives via a pair of new header types, "PK\x06\x06" and "PK\x06\x07". So far these headers are undocumented, but most of their fields are fairly obvious. We don't yet know when Zip and UnZip will support this extension to the format. In the short term, it is possible to improve Zip and UnZip's capabilities slightly on certain Linux systems (and probably other Unix-like systems) by recompiling with the -DLARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 options. This will allow the utilities to handle uncompressed data files greater than 2 GB in size, as long as the total size of the archive containing them is less than 2 GB."

        This leaves us with few options: either we find something else that implements the Zip64 extension, under a license that permits including it in our code, or we implement these extensions ourselves in the minizip code, which we would then have to test extensively and maintain. Sigh.
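        As a side note, the two header signatures mentioned in the excerpt are easy to recognize programmatically. The following is a rough, self-contained Java sketch (illustrative only, not part of any patch; class and method names are my own) that distinguishes the classic end-of-central-directory record ("PK\x05\x06") from the Zip64 one ("PK\x06\x06") by scanning a buffer such as the tail of an archive:

```java
public class Zip64Probe {
    // Little-endian 32-bit record signatures from the ZIP specification.
    static final long EOCD_SIG       = 0x06054b50L; // "PK\x05\x06" (classic)
    static final long ZIP64_EOCD_SIG = 0x06064b50L; // "PK\x06\x06" (Zip64)
    static final long ZIP64_LOC_SIG  = 0x07064b50L; // "PK\x06\x07" (Zip64 locator)

    // Reads an unsigned 32-bit little-endian value.
    static long readLE32(byte[] b, int off) {
        return (b[off] & 0xffL)
             | ((b[off + 1] & 0xffL) << 8)
             | ((b[off + 2] & 0xffL) << 16)
             | ((b[off + 3] & 0xffL) << 24);
    }

    /** Scans a buffer for a Zip64 end-of-central-directory signature. */
    static boolean hasZip64Eocd(byte[] tail) {
        for (int i = 0; i + 4 <= tail.length; i++) {
            if (readLE32(tail, i) == ZIP64_EOCD_SIG) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] classic = {0x50, 0x4b, 0x05, 0x06};
        byte[] zip64   = {0x50, 0x4b, 0x06, 0x06};
        System.out.println(hasZip64Eocd(classic)); // false
        System.out.println(hasZip64Eocd(zip64));   // true
    }
}
```

        A parser that finds only the classic signature on a > 4GB archive would then know its offsets may be truncated 32-bit values.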

        Ankur added a comment -

        Or, as another option, we can have our own implementation of ZipInputStream purely in Java (no native code), based upon Sun's java.util.zip.ZipInputStream, with additions and modifications to:

        1. Work with a Seekable stream (like FSDataInputStream).
        2. Read only the central directory structure to obtain file information, instead of sequentially
        reading the whole archive (as Sun's implementation does).
        3. Make sure Zip64 headers are processed correctly.

        This approach has the following advantages:

        1. A pure Java Zip stream parser supporting Zip64 format (No native code).
        2. Support for Random as well as Sequential access.
        3. No dependency on any external components.
        4. Ease of modification for adding append when HDFS provides this facility.
        5. Possibility of donating our parser as a Zip64 compliant java zip parser to open source in future.

        The above of course requires a lot of work, but given the advantages I feel it's worth it.

        Your opinion ?
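
        To make point 2 concrete, here is a rough, self-contained sketch (not the proposed implementation; names and layout are illustrative) of reading the classic end-of-central-directory record to learn the entry count and central-directory offset without streaming through the archive. It uses java.util.zip only to build a small test archive in memory:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EocdDemo {
    // Classic end-of-central-directory signature "PK\x05\x06", little-endian.
    static final long EOCD_SIG = 0x06054b50L;

    static int readLE16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    static long readLE32(byte[] b, int off) {
        return readLE16(b, off) | ((long) readLE16(b, off + 2) << 16);
    }

    /** Scans backwards for the EOCD record; returns its offset, or -1. */
    static int findEocd(byte[] zip) {
        // The EOCD record is at least 22 bytes and may be followed by a comment.
        for (int i = zip.length - 22; i >= 0; i--) {
            if (readLE32(zip, i) == EOCD_SIG) return i;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny two-entry archive in memory.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("hello " + name).getBytes("UTF-8"));
                zos.closeEntry();
            }
        }
        byte[] zip = bos.toByteArray();

        int eocd = findEocd(zip);
        int totalEntries = readLE16(zip, eocd + 10); // total entry count field
        long cdOffset = readLE32(zip, eocd + 16);    // central directory offset field
        System.out.println(totalEntries + " entries, central dir at " + cdOffset);
    }
}
```

        With a seekable stream, the same backwards scan could be done over just the last few kilobytes of a multi-gigabyte archive, which is what makes split construction cheap.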

        Doug Cutting added a comment -

        One of the major attractions of the zip format for Hadoop is that it provides interoperability with standard tools. But if we generate >4GB archives that shell tools cannot access, interoperability is broken. Folks might as well then use SequenceFile or some other Hadoop-specific format. So, until standard shell tools support access to >4GB zip archives, I see little motivation for Hadoop to support this.

        Ankur added a comment -

        So do we wait for standard tools to support files > 4 GB before making a Zip InputFormat available in Hadoop?

        Ankur added a comment -

        Also, I would be thankful if you could recommend other bugs/issues that I can fix to make useful contributions.

        Devaraj Das added a comment -

        Cancelling the patch since this work is not complete yet and may require further discussion.

        Grant Mackey added a comment -

        I currently have a use for this cancelled patch. Are there bits of the code that need to be modified for it to run properly on Hadoop 0.17, or should I be able to drop them into the mapred directory and go?

        Ankur added a comment -

        There are 2 problems with this patch.

        1. It does not split the zip files efficiently. This is because there is no way in Java to construct a zip input stream that permits random seeks given a zip entry name.
        2. Java's handling of large zip files is not robust.

        The plan was to modify the code to make use of an external zip parsing library that is compatible with the Apache license. It was decided to use the zip/unzip (standard shell tools) code via JNI, but support for large zip files is still missing from unzip (Zip 3.0 is out with large zip file support).

        So at the moment, we are just waiting for UnZip 6.0 to come out, and will modify the code accordingly.
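
        For what it's worth on problem 1: java.util.zip.ZipFile (unlike ZipInputStream) does do a central-directory lookup by entry name, but its constructor requires a local File, which is exactly why it cannot be driven by an HDFS-backed seekable stream. A minimal sketch of that local-file behaviour (class and method names are illustrative):

```java
import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipFileLookupDemo {
    /** Writes a small three-entry archive to a temp file. */
    static File makeSampleZip() throws Exception {
        File f = File.createTempFile("demo", ".zip");
        f.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(f.toPath()))) {
            for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes("UTF-8")); // entry content == entry name
                zos.closeEntry();
            }
        }
        return f;
    }

    /** Reads one entry by name via the central directory, no sequential scan. */
    static String readEntry(File zipFile, String name) throws Exception {
        try (ZipFile zf = new ZipFile(zipFile)) {   // requires a local File
            ZipEntry e = zf.getEntry(name);          // central-directory lookup
            byte[] buf = new byte[(int) e.getSize()];
            try (InputStream in = zf.getInputStream(e)) {
                int n = 0;
                while (n < buf.length) n += in.read(buf, n, buf.length - n);
            }
            return new String(buf, "UTF-8");
        }
    }

    public static void main(String[] args) throws Exception {
        File f = makeSampleZip();
        System.out.println(readEntry(f, "c.txt")); // prints "c.txt"
    }
}
```

        A pure-Java parser over a seekable stream would essentially replicate what ZipFile does here, minus the local-File requirement.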

        steve_l added a comment -

        The most tested/stable Apache-licensed Java unzip code is in Ant's codebase; you can either take/fork that or try and get the changes back in, which, with suitable tests, I am sure will be happily accepted.

        Patrick Angeles added a comment -

        Any updates on this issue? What's the current thinking on shell tools + JNI versus Ant's unzip code? Anything I can do to contribute? Regards...


          People

          • Assignee:
            indrajit
          • Reporter:
            Doug Cutting
          • Votes:
            9
          • Watchers:
            23
