Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      This patch contains a new utility for checking the configuration of the indexing filters. The IndexingFiltersChecker fetches and parses a URL, runs the indexing filters on it, and displays the fields obtained together with the first 100 characters of each value.

      Example usage: ./nutch org.apache.nutch.indexer.IndexingFiltersChecker http://www.lemonde.fr/
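      The output step described above can be sketched in plain Java. This is a hypothetical stand-in, not the committed code: Nutch's document class is replaced by a `Map` from field name to a list of values, and the names `FieldPreview`, `truncate` and `formatFields` are made up for illustration. Fetching, parsing and running the indexing filters are out of scope; only the "field name, tab, first 100 characters of each value" printing is shown, using the same `" :\t"` separator as the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the checker's printing step, without any Nutch dependency. */
public class FieldPreview {

    /** Maximum number of characters shown per value, as stated in the issue description. */
    static final int MAX_CHARS = 100;

    /** Returns at most maxChars leading characters of the value's string form. */
    static String truncate(Object value, int maxChars) {
        String str = String.valueOf(value);
        return str.substring(0, Math.min(maxChars, str.length()));
    }

    /** Builds one "name :\tvalue" line per field value, in field insertion order. */
    static List<String> formatFields(Map<String, List<Object>> doc) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Object>> e : doc.entrySet()) {
            List<Object> values = e.getValue();
            if (values == null) {
                continue; // a field without a value list produces no output
            }
            for (Object value : values) {
                lines.add(e.getKey() + " :\t" + truncate(value, MAX_CHARS));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in document; a real run would take these fields from the indexing filters.
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("title", List.of("Example title"));
        doc.put("content", List.of("x".repeat(250)));
        formatFields(doc).forEach(System.out::println);
    }
}
```

      A multi-valued field simply yields one output line per value, which mirrors the loop in the original patch.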

      1. NUTCH-783.patch
        4 kB
        Julien Nioche

        Issue Links

          Activity

          Julien Nioche added a comment -

          Removed tag 1.1.
          Will rename to IndexingPluginsChecker later.

          Markus Jelsma added a comment -

          What's this? Shouldn't it be closed?

          Julien Nioche added a comment -

          Why should it be closed? As stated in the description and later comments, it is used to test indexing plugins, so it is not directly tied to Lucene or Solr.

          Markus Jelsma added a comment -

          You're right. Shouldn't it be marked for a version then?

          Julien Nioche added a comment -

          Marked for 2.0; the patch might need adapting.

          Markus Jelsma added a comment - edited

          Yes, this code is not compatible with the Nutch 1.4 API:

          +      List<String> values = doc.getFieldValues(fname);
          +      if (values != null) {
          +        for (String value : values){
          +          int minText = Math.min(100, value.length());
          +          System.out.println(fname + " :\t" + value.substring(0, minText));
          +        }
          +      }
          

          I changed it to:

                List<Object> values = Arrays.asList(doc.getFieldValue(fname));
                if (values != null) {
                  for (Object value : values) {
                    String str = value.toString();
                    int minText = Math.min(100, str.length());
                    System.out.println(fname + " :\t" + str.substring(0, minText));
                  }
                }
          

          It works now. I think it would be nice to have in both 1.4 and 2.0.
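          The replacement snippet wraps the result of `doc.getFieldValue(fname)` in `Arrays.asList`. Assuming that call returns a single `Object` (as the snippet's own usage implies), `Arrays.asList` always builds a one-element list, so the `values != null` check can never be false: a missing field becomes a list holding `null`, and `value.toString()` would then throw. At most one value per field is printed. The standalone sketch below isolates that behavior; `ValueListGuard`, `naive`, `guarded` and `printValues` are hypothetical names, and this is an illustration rather than the committed code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of wrapping a single, possibly null, field value in a list. */
public class ValueListGuard {

    /** What the quoted snippet does: varargs wraps the Object into new Object[]{fieldValue}. */
    static List<Object> naive(Object fieldValue) {
        return Arrays.asList(fieldValue);
    }

    /** Null-safe variant: a missing value yields an empty list instead of [null]. */
    static List<Object> guarded(Object fieldValue) {
        return fieldValue == null
                ? Collections.<Object>emptyList()
                : Collections.singletonList(fieldValue);
    }

    /** Prints up to 100 characters of each value, like the checker's loop; returns the line count. */
    static int printValues(String fname, List<Object> values) {
        int printed = 0;
        for (Object value : values) {
            String str = value.toString(); // throws NullPointerException for a null element
            System.out.println(fname + " :\t" + str.substring(0, Math.min(100, str.length())));
            printed++;
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(naive(null).size());   // the list holds a single null element
        System.out.println(guarded(null).size()); // nothing to print
        printValues("title", guarded("Example title"));
    }
}
```

          With the guarded helper, a field that is absent simply prints nothing instead of failing inside the loop.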

          Julien Nioche added a comment -

          Why not? Let's rename it to IndexingFiltersChecker. Markus, do you want to look after this one, or shall I do it?

          Markus Jelsma added a comment -

          Alright! I can include the changes in a new patch and add it as a new command. Also agreed on renaming the guy.

          Markus Jelsma added a comment -

          Committed for 1.4 in rev 1145117. Cheers to Julien for this nice utility.

          Markus Jelsma added a comment -

          Changed description to reflect name change.

          Julien Nioche added a comment -

          Thanks for committing it, Markus.


            People

            • Assignee:
              Markus Jelsma
              Reporter:
              Julien Nioche
            • Votes:
              1
              Watchers:
              1
