Solr
  1. Solr
  2. SOLR-663

Allow multiple files for stopwords, protwords and synonyms

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: search
    • Labels:
      None

      Description

      Allow multiple files separated by comma (escaped by backslash) for StopFilterFactory, EnglishPorterFilterFactory, KeepWordFilterFactory and SynonymFilterFactory

      1. SOLR-663.patch
        10 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Hide
          Shalin Shekhar Mangar added a comment -

          Added StrUtils#splitFileNames for the split logic. I chose to add another method because the escaping backslash had to be removed from the file names which the existing splitSmart methods didn't provide. Added test for this method in TestUtils.java

          Added logic to load multiple files in

          • EnglishPorterFilterFactory
          • SynonymFilterFactory
          • KeepWordFilterFactory
          • StopFilterFactory
          Show
          Shalin Shekhar Mangar added a comment - Added StrUtils#splitFileNames for the split logic. I chose to add another method because the escaping backslash had to be removed from the file names which the existing splitSmart methods didn't provide. Added test for this method in TestUtils.java Added logic to load multiple files in EnglishPorterFilterFactory SynonymFilterFactory KeepWordFilterFactory StopFilterFactory
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 680935.

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 680935.
          Hide
          Grant Ingersoll added a comment -

          Shalin,

          I'm not sure I am following the logic in this patch, specifically around:

          
                  java.io.File keepWordsFile = new File(wordFiles);
                  if (keepWordsFile.exists()) {
                    List<String> wlist = loader.getLines(wordFiles);
                    words = StopFilter.makeStopSet(
                        (String[])wlist.toArray(new String[0]), ignoreCase);
                  } else  {
                    List<String> files = StrUtils.splitFileNames(wordFiles);
                    for (String file : files) {
                      List<String> wlist = loader.getLines(file.trim());
                      words.addAll(StopFilter.makeStopSet((String[])wlist.toArray(new String[0]), ignoreCase));
                    }
          

          When is the if clause above executed? Seems like the most likely case is that people are just using a file in solr/conf and it is relative, so the exists() method call will never be true, since the current working directory is likely to be two levels up?

          Also, why the need for separate cases anyway? Isn't the single file case just a degenerate case of the multiple files version?

          See also SOLR-1095, where I am working on some fixes to this, but I want to make sure I capture your understanding first.

          Show
          Grant Ingersoll added a comment - Shalin, I'm not sure I am following the logic in this patch, specifically around: java.io.File keepWordsFile = new File(wordFiles); if (keepWordsFile.exists()) { List< String > wlist = loader.getLines(wordFiles); words = StopFilter.makeStopSet( ( String [])wlist.toArray( new String [0]), ignoreCase); } else { List< String > files = StrUtils.splitFileNames(wordFiles); for ( String file : files) { List< String > wlist = loader.getLines(file.trim()); words.addAll(StopFilter.makeStopSet(( String [])wlist.toArray( new String [0]), ignoreCase)); } When is the if clause above executed? Seems like the most likely case is that people are just using a file in solr/conf and it is relative, so the exists() method call will never be true, since the current working directory is likely to be two levels up? Also, why the need for separate cases anyway? Isn't the single file case just a degenerate case of the multiple files version? See also SOLR-1095 , where I am working on some fixes to this, but I want to make sure I capture your understanding first.
          Hide
          Shalin Shekhar Mangar added a comment -

          Seems like the most likely case is that people are just using a file in solr/conf and it is relative, so the exists() method call will never be true, since the current working directory is likely to be two levels up?

          You are right Grant. It is a bug. It will never be executed unless an absolute path is specified. The reason it worked is because the splitSmart took care of the single file relative path case. Do you want me to re-open this issue?

          Show
          Shalin Shekhar Mangar added a comment - Seems like the most likely case is that people are just using a file in solr/conf and it is relative, so the exists() method call will never be true, since the current working directory is likely to be two levels up? You are right Grant. It is a bug. It will never be executed unless an absolute path is specified. The reason it worked is because the splitSmart took care of the single file relative path case. Do you want me to re-open this issue?
          Hide
          Grant Ingersoll added a comment -

          I have a fix for it along w/ the changes to SOLR-1095, so let's just note it here and on that issue.

          Show
          Grant Ingersoll added a comment - I have a fix for it along w/ the changes to SOLR-1095 , so let's just note it here and on that issue.

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Shalin Shekhar Mangar
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development