Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      This patch contains a new utility for checking the configuration of the indexing filters. The IndexingFiltersChecker reads and parses a URL, runs the indexing filters on it, and displays the fields obtained along with the first
      100 characters of each value.

      Example usage: ./nutch org.apache.nutch.indexer.IndexingFiltersChecker http://www.lemonde.fr/

      1. NUTCH-783.patch
        4 kB
        Julien Nioche

        Issue Links

          Activity

          Transition          Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> Resolved    524d 22h 45m            1                 Markus Jelsma   11/Jul/11 11:47
          Resolved -> Closed  8m 56s                  1                 Julien Nioche   11/Jul/11 11:56
          Julien Nioche made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          Julien Nioche added a comment -

          Thanks for committing it Markus

          Markus Jelsma made changes -
          Description
          Original value:
          This patch contains a new utility which allows to check the configuration of the indexing filters. The IndexerChecker reads and parses a URL and run the indexers on it. Displays the fields obtained and the first
           100 characters of their value.

          Can be used e.g. ./nutch org.apache.nutch.indexer.IndexingFiltersChecker http://www.lemonde.fr/

          New value:
          This patch contains a new utility which allows to check the configuration of the indexing filters. The IndexingFiltersChecker reads and parses a URL and run the indexers on it. Displays the fields obtained and the first
           100 characters of their value.

          Can be used e.g. ./nutch org.apache.nutch.indexer.IndexingFiltersChecker http://www.lemonde.fr/

          Markus Jelsma made changes -
          Summary: IndexingFiltersChecker Utilty -> IndexingFiltersChecker Utility
          Description
          Original value:
          This patch contains a new utility which allows to check the configuration of the indexing filters. The IndexerChecker reads and parses a URL and run the indexers on it. Displays the fields obtained and the first
           100 characters of their value.

          Can be used e.g. ./nutch org.apache.nutch.indexer.IndexerChecker http://www.lemonde.fr/

          New value:
          This patch contains a new utility which allows to check the configuration of the indexing filters. The IndexerChecker reads and parses a URL and run the indexers on it. Displays the fields obtained and the first
           100 characters of their value.

          Can be used e.g. ./nutch org.apache.nutch.indexer.IndexingFiltersChecker http://www.lemonde.fr/

          Markus Jelsma added a comment -

          Changed description to reflect name change.

          Markus Jelsma made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Markus Jelsma added a comment -

          Committed for 1.4 in rev 1145117. Cheers to Julien for this nice utility.

          Markus Jelsma made changes -
          Link This issue relates to NUTCH-1038 [ NUTCH-1038 ]
          Markus Jelsma made changes -
          Summary: IndexerChecker Utilty -> IndexingFiltersChecker Utilty
          Markus Jelsma made changes -
          Assignee: Julien Nioche [ jnioche ] -> Markus Jelsma [ markus17 ]
          Fix Version/s 1.4 [ 12316519 ]
          Fix Version/s 2.0 [ 12314893 ]
          Affects Version/s 1.4 [ 12316519 ]
          Markus Jelsma added a comment -

          Alright! I can include the changes in a new patch and add it as a new command. Also agreed on renaming the guy.

          Julien Nioche added a comment -

          Why not. Let's rename it to IndexingFiltersChecker. Markus, do you want to look after this one or shall I do it?

          Markus Jelsma added a comment - - edited

          Yes, this code is not compatible with the Nutch API in 1.4:

          +      List<String> values = doc.getFieldValues(fname);
          +      if (values != null) {
          +        for (String value : values){
          +          int minText = Math.min(100, value.length());
          +          System.out.println(fname + " :\t" + value.substring(0, minText));
          +        }
          +      }
          

          changed to

                List<Object> values = Arrays.asList(doc.getFieldValue(fname));
                if (values != null) {
                  for (Object value : values) {
                    String str = value.toString();
                    int minText = Math.min(100, str.length());
                    System.out.println(fname + " :\t" + str.substring(0, minText));
                  }
                }
          

          It works now. I think it's nice to have in 1.4 and 2.0.
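
For readers following the change above, here is a minimal standalone sketch of the same "print each field with the first 100 characters of its value" loop. It deliberately does not use the Nutch API: the field map is a hypothetical stand-in for the document, and the class and method names (FieldPreview, preview, formatFields) are illustrative only. It also skips null values. That guard is an assumption about what could happen if a field has no value, since a list built with Arrays.asList around a single null reference is itself non-null but contains a null element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldPreview {
    // Same limit as in the snippet above: show at most the first 100 characters.
    static final int MAX_CHARS = 100;

    // Convert a value to text and cut it to MAX_CHARS characters.
    static String preview(Object value) {
        String str = String.valueOf(value);
        return str.substring(0, Math.min(MAX_CHARS, str.length()));
    }

    // Build one "name :<TAB>value" line per non-null value of each field.
    // The map is a hypothetical stand-in for a parsed document's fields.
    static List<String> formatFields(Map<String, List<Object>> fields) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Object>> entry : fields.entrySet()) {
            List<Object> values = entry.getValue();
            if (values == null) continue;       // field without a value list
            for (Object value : values) {
                if (value == null) continue;    // avoid calling toString() on null
                lines.add(entry.getKey() + " :\t" + preview(value));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> fields = new LinkedHashMap<>();
        fields.put("title", Arrays.<Object>asList("Le Monde"));
        fields.put("anchor", Arrays.<Object>asList("first", null, "second"));
        fields.put("empty", null);
        for (String line : formatFields(fields)) {
            System.out.println(line);
        }
    }
}
```

Collecting the lines into a list before printing keeps the formatting logic testable without capturing standard output.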

          Julien Nioche made changes -
          Fix Version/s 2.0 [ 12314893 ]
          Julien Nioche added a comment -

          Marked for 2.0 - patch might need adapting

          Markus Jelsma added a comment -

          You're right. Shouldn't it be marked for a version then?

          Julien Nioche added a comment -

          Why should it be closed? As said in the description and later comments it is used to test indexing plugins, so it is not directly bound to Lucene or SOLR

          Markus Jelsma added a comment -

          What's this? Shouldn't it be closed?

          Julien Nioche made changes -
          Fix Version/s 1.1 [ 12313609 ]
          Julien Nioche added a comment -

          Removed tag 1.1
          Will rename to IndexingPluginsChecker later

          Julien Nioche made changes -
          Assignee Julien Nioche [ jnioche ]
          Julien Nioche made changes -
          Field Original Value New Value
          Attachment NUTCH-783.patch [ 12434377 ]
          Julien Nioche created issue -

            People

            • Assignee:
              Markus Jelsma
              Reporter:
              Julien Nioche
            • Votes:
              1
              Watchers:
              1
