[NUTCH-1357] All gora mapreduce functionality should go through StorageUtils - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Auto Closed
Affects Version/s: nutchgora
Fix Version/s: 2.5
Component/s: None
Labels: None

Description

I am trying to make the concept of crawlId work for ALL Nutch jobs. The biggest reason it does not work as expected seems to be the various ways Gora MapReduce is used in Nutch.

Some jobs use StorageUtils, some use GoraMapper/GoraReducer, and some even use GoraInputFormat/GoraOutputFormat directly. But the only place where a crawlId is translated into a schema name is StorageUtils! I am currently converting all Gora* MapReduce initialization code to StorageUtils calls.
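
To illustrate why routing everything through one helper matters, here is a minimal, self-contained sketch. It is not the actual StorageUtils code: the class name CrawlSchemaSketch, the configuration key, and the "crawlId_" prefix convention are all assumptions made for illustration. The point is only that when the translation lives in a single method, every job that calls it resolves the same schema name, while a job that sets up its input/output formats directly would skip the translation entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the single helper that every job would call.
// This is NOT the real Nutch StorageUtils; names and conventions are assumed.
final class CrawlSchemaSketch {

    // Assumed configuration key holding the crawl id (illustrative only).
    static final String CRAWL_ID_KEY = "storage.crawl.id";

    private CrawlSchemaSketch() {
    }

    // Translate a crawl id into a schema name. With no crawl id set,
    // the base schema name is used unchanged.
    static String schemaName(String baseSchema, String crawlId) {
        if (crawlId == null || crawlId.isEmpty()) {
            return baseSchema;
        }
        // Assumed convention: prefix the base schema with the crawl id.
        return crawlId + "_" + baseSchema;
    }

    // Single entry point: jobs pass their configuration and get the schema
    // to read from and write to, so no job can forget the translation.
    static String resolveSchema(Map<String, String> conf, String baseSchema) {
        return schemaName(baseSchema, conf.get(CRAWL_ID_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveSchema(conf, "webpage"));

        conf.put(CRAWL_ID_KEY, "crawl1");
        System.out.println(resolveSchema(conf, "webpage"));
    }
}
```

A job that bypasses such a helper and configures the Gora input/output formats itself would keep using the unprefixed schema even when a crawlId is set, which matches the inconsistency described above.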

People

Assignee: Unassigned

Reporter: Ferdy

Votes: 0

Watchers: 1

Dates

Created: 09/May/12 13:08

Updated: 13/Oct/19 22:35

Resolved: 13/Oct/19 22:35