Nutch / NUTCH-1290

crawlId not supported by all Tools


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: nutchgora
    • Component/s: indexer
    • Labels: None
    • Patch Info: Patch Available

    Description

      See also: https://issues.apache.org/jira/browse/NUTCH-907

      The StorageUtils class exposes a createDataStore method, which uses the default schema for a persistent class as specified in the Gora configuration.
      This method ignores Nutch's storage.schema property and the notion of a crawlId.
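To make the difference concrete, here is a minimal, self-contained sketch of the naming rule that crawlId support implies. It assumes the convention used by Nutch 2.x web stores, where a non-empty crawlId is prepended to the schema name with an underscore (so `webpage` becomes `crawl1_webpage`). The class and method names and the `webpage` default are hypothetical illustrations, not the actual StorageUtils code.

```java
import java.util.Map;

// Hypothetical illustration, not Nutch's StorageUtils: models how a store
// name could be derived from storage.schema and an optional crawlId.
class SchemaNameSketch {

  // Assumed default schema name when storage.schema is not set.
  static final String DEFAULT_SCHEMA = "webpage";

  // Models createDataStore: always the default, whatever the config says.
  static String defaultStoreName() {
    return DEFAULT_SCHEMA;
  }

  // Models createWebStore: honours storage.schema and prefixes the crawlId.
  static String webStoreName(Map<String, String> conf, String crawlId) {
    String schema = conf.getOrDefault("storage.schema", DEFAULT_SCHEMA);
    if (crawlId == null || crawlId.isEmpty()) {
      return schema;
    }
    return crawlId + "_" + schema;
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of("storage.schema", "pages");
    System.out.println(defaultStoreName());           // webpage
    System.out.println(webStoreName(conf, "crawl1")); // crawl1_pages
  }
}
```

A tool built on the first method reads and writes the same store no matter which crawl it is asked to process, which is the inconsistency this issue describes.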

      Two tools use this method instead of the createWebStore method (which does support the storage.schema property and a crawlId):

      o.a.n.indexer.IndexerReducer (IndexerJob)
      o.a.n.util.domain.DomainStatistics

      I propose that these two tools start using the createWebStore method and that we remove the createDataStore method from StorageUtils.
      Also, these two tools should support the crawlId command-line parameter.
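As a rough sketch of the second part of the proposal, a tool could pull a `-crawlId <id>` option out of its arguments and store it in its configuration before opening the web store. The configuration key `storage.crawl.id` is an assumption (check `Nutch.CRAWL_ID_KEY` in your version), and the class and method names are hypothetical; a plain `Map` stands in for the Hadoop `Configuration` to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: how IndexerJob or DomainStatistics could accept
// "-crawlId <id>" and record it where createWebStore would look for it.
class CrawlIdArgsSketch {

  // Assumed configuration key for the crawl id.
  static final String CRAWL_ID_KEY = "storage.crawl.id";

  // Copies "-crawlId <id>" into conf and returns the remaining arguments.
  static String[] applyCrawlId(String[] args, Map<String, String> conf) {
    List<String> rest = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      if ("-crawlId".equals(args[i])) {
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("-crawlId requires a value");
        }
        conf.put(CRAWL_ID_KEY, args[++i]);
      } else {
        rest.add(args[i]);
      }
    }
    return rest.toArray(new String[0]);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    String[] rest = applyCrawlId(new String[] {"arg0", "-crawlId", "crawl1"}, conf);
    System.out.println(conf.get(CRAWL_ID_KEY)); // crawl1
    System.out.println(rest.length);            // 1
  }
}
```

Removing the option from the argument list lets each tool keep its existing positional-argument handling unchanged.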

      Attachments

        1. NUTCH-1290.patch
          5 kB
          Mathijs Homminga


          People

            Assignee: Unassigned
            Reporter: Mathijs Homminga
            Votes: 0
            Watchers: 0
