[FLUME-600] Have collector source create names that are both lexicographically and chronologically ordered - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.9.3
Fix Version/s: 0.9.4
Component/s: Sinks+Sources
Labels: None

Release Note:
This patch changes the default naming convention for the files that the collector writes out. Output file names will now have the following format: <prefix>yyyyMMdd-HHmmssSSSSz.<12digitNanos>.<8charTid>
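As a rough illustration of why a name in this shape sorts chronologically, here is a minimal Python sketch. It is not Flume's code: the helper name `collector_style_name`, the `log.` prefix, and the sample values are hypothetical, and the fraction-of-second and zone fields are approximated with three millisecond digits and a numeric UTC offset.

```python
from datetime import datetime, timedelta, timezone

def collector_style_name(prefix, when, nanos, tid):
    """Build '<prefix><timestamp><zone>.<12-digit nanos>.<8-char tid>'.

    Hypothetical helper, not Flume's code. `when` must be timezone-aware.
    """
    millis = when.microsecond // 1000
    # Every field is fixed-width and zero-padded, most significant first.
    stamp = f"{when:%Y%m%d-%H%M%S}{millis:03d}{when:%z}"
    return f"{prefix}{stamp}.{nanos:012d}.{tid:08d}"

start = datetime(2011, 5, 9, 17, 55, 0, 123000, tzinfo=timezone.utc)
names = [
    collector_style_name("log.", start + timedelta(seconds=s), nanos=42 + s, tid=7)
    for s in (0, 1, 60, 3600)
]

# Plain string sorting preserves creation order.
assert sorted(names) == names
print(names[0])  # log.20110509-175500123+0000.000000000042.00000007
```

The ordering only holds while all writers use the same zone offset; names stamped with different offsets would no longer compare chronologically as strings.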

Description

We're transitioning to Hadoop. Until then, we're parsing the files that Flume drops on S3.

S3's API says that keys will be returned in lexicographic order. It's easy to ask S3:

"Given I am on 2011-03-17/0400/flume-1.seq, give me one file."

Assuming the next lexicographically ordered file is 2011-03-17/0400/flume-2.seq, then you don't have to do any cumbersome faux-directory sweeping (since S3 doesn't know about directories per se). You can let Amazon do that work for you.
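The "give me the next key after this one" request can be simulated locally to show why the naming matters. This is only a sketch over an in-memory sorted list, not a call to S3; `next_key_after` and the `flume-10` / zero-padded keys are hypothetical examples added for illustration.

```python
from bisect import bisect_right

def next_key_after(sorted_keys, marker):
    """Return the first key strictly greater than marker, or None."""
    i = bisect_right(sorted_keys, marker)
    return sorted_keys[i] if i < len(sorted_keys) else None

# Unpadded counters: string order is not numeric order.
keys = sorted([
    "2011-03-17/0400/flume-1.seq",
    "2011-03-17/0400/flume-2.seq",
    "2011-03-17/0400/flume-10.seq",
])
print(next_key_after(keys, "2011-03-17/0400/flume-1.seq"))
# 2011-03-17/0400/flume-10.seq

# Zero-padded counters: string order matches write order.
padded = sorted([
    "2011-03-17/0400/flume-01.seq",
    "2011-03-17/0400/flume-02.seq",
    "2011-03-17/0400/flume-10.seq",
])
print(next_key_after(padded, "2011-03-17/0400/flume-01.seq"))
# 2011-03-17/0400/flume-02.seq
```

With unpadded names the "next" key skips from file 1 to file 10, which is why the request is for names whose lexicographic order equals the order in which they were written.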

We don't have any requirements about sprintf-style formatting of the file names; we only need them to be written in order.

Rob

Attachments

ASF.LICENSE.NOT.GRANTED--0001-FLUME-600-Have-collector-create-names-that-are-both-.patch
09/May/11 17:56
6 kB
Jonathan Hsieh

Issue Links

is related to

FLUME-48 collectorSink writes files with .seq suffix even though text files are written out.

Closed

People

Assignee: Jonathan Hsieh

Reporter: Disabled imported user

Dates

Created: 12/Apr/11 03:53

Updated: 06/Aug/11 00:14

Resolved: 09/May/11 17:55