Uploaded image for project: 'Nutch'
  1. Nutch
  2. NUTCH-711

Indexer failing after upgrade to Hadoop 0.19.1

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      After upgrade to Hadoop 0.19.1 Reducer is initialized in a different order than before (see http://svn.apache.org/viewvc?view=rev&revision=736239). IndexingFilters populate current JobConf with field options that are required for IndexerOutputFormat to function properly. However, the filters are instantiated in Reducer.configure(), which is now called after the OutputFormat is initialized, and not before as previously.

      The workaround for now is to instantiate IndexinigFilters once again inside IndexerOutputFormat. This issue should be revisited before 1.1 in order to find a better solution.

      See this thread for more information: http://www.lucidimagination.com/search/document/7c62c625c7ea17fe/problem_with_crawling_using_the_latest_1_0_trunk

        Attachments

        1. patch.txt
          0.7 kB
          Andrzej Bialecki

          Activity

            People

            • Assignee:
              ab Andrzej Bialecki
              Reporter:
              ab Andrzej Bialecki
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: