Details
Improvement
Status: Closed
Major
Resolution: Auto Closed
nutchgora
None
None
Description
I am trying to make the concept of crawlId work for ALL Nutch jobs. The biggest reason it does not work as expected seems to be the various ways Gora MapReduce is used in Nutch.
Some jobs use StorageUtils, some use GoraMapper/GoraReducer, and some even use GoraInputFormat/GoraOutputFormat directly. But the only place where a crawlId is translated into a schema name is StorageUtils! I am currently converting all calls to the Gora* MapReduce initialization code into StorageUtils calls.
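To illustrate why routing every job through one helper matters, here is a minimal sketch of the idea. It does not use the real Nutch or Gora classes: the class name CrawlSchemaNames, the configuration keys, the plain Map standing in for the job configuration, and the "crawlId_schema" naming pattern are all assumptions made for illustration, not the actual StorageUtils code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a single job-initialization helper.
// None of these names come from Nutch or Gora.
class CrawlSchemaNames {
    // Assumed configuration keys, for illustration only.
    static final String CRAWL_ID_KEY = "crawl.id";
    static final String SCHEMA_KEY = "schema.name";

    // Translate a crawlId into a schema name. With no crawlId,
    // the base schema name is used unchanged.
    static String schemaNameFor(String crawlId, String baseSchema) {
        if (crawlId == null || crawlId.isEmpty()) {
            return baseSchema;
        }
        return crawlId + "_" + baseSchema;
    }

    // Every job would call this one method, so the translation
    // cannot be skipped by a job that sets up its I/O by hand.
    static Map<String, String> initJob(Map<String, String> conf, String baseSchema) {
        Map<String, String> jobConf = new HashMap<>(conf);
        jobConf.put(SCHEMA_KEY, schemaNameFor(conf.get(CRAWL_ID_KEY), baseSchema));
        return jobConf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CRAWL_ID_KEY, "test");
        System.out.println(initJob(conf, "webpage").get(SCHEMA_KEY));
    }
}
```

A job that configures its input and output formats directly would bypass initJob and read or write the base schema regardless of crawlId, which is the inconsistency described above.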