Solr
  1. Solr
  2. SOLR-1366

UnsupportedOperationException may be thrown when using custom IndexReader

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: replication (java), search
    • Labels:
      None

      Description

      If a custom IndexReaderFactory is specifiedd in solrconfig.xml, and IndexReader-s that it produces don't support IndexReader.directory() (such as is the case with ParallelReader or MultiReader) then an uncaught UnsupportedOperationException is thrown.

      This call is used only to retrieve the full path of the directory for informational purpose, so it shouldn't lead to a crash. Instead we could supply other available information about the reader (e.g. from its toString() method).

      1. SOLR-1366.patch
        1 kB
        Shalin Shekhar Mangar
      2. searcher.patch
        0.9 kB
        Andrzej Bialecki

        Issue Links

          Activity

          Hide
          Grant Ingersoll added a comment -

          Bulk close Solr 1.4 issues

          Show
          Grant Ingersoll added a comment - Bulk close Solr 1.4 issues
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 817801.

          Thanks everybody!

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 817801. Thanks everybody!
          Hide
          Andrzej Bialecki added a comment -

          Looks good to me, +1.

          Show
          Andrzej Bialecki added a comment - Looks good to me, +1.
          Hide
          Shalin Shekhar Mangar added a comment -

          How about this?

          ** Experimental Feature **
          Please note - Using a custom IndexReaderFactory may prevent certain other features
          from working. The API to IndexReaderFactory may change without warning or may even
          be removed from future releases if the problems cannot be resolved.
          
          ** Features that may not work with custom IndexReaderFactory **
          The ReplicationHandler assumes a disk-resident index. Using a custom
          IndexReader implementation may cause incompatibility between ReplicationHandler and
          may cause replication to not work correctly. See SOLR-1366 for details.
          
          Show
          Shalin Shekhar Mangar added a comment - How about this? ** Experimental Feature ** Please note - Using a custom IndexReaderFactory may prevent certain other features from working. The API to IndexReaderFactory may change without warning or may even be removed from future releases if the problems cannot be resolved. ** Features that may not work with custom IndexReaderFactory ** The ReplicationHandler assumes a disk-resident index. Using a custom IndexReader implementation may cause incompatibility between ReplicationHandler and may cause replication to not work correctly. See SOLR-1366 for details.
          Hide
          Andrzej Bialecki added a comment -

          +1 for adding a big red flag. My application depends on this functionality, and it's working well once I overrode a bunch of additional methods in IndexReader that deal with Directory, IndexCommit, index version, etc.

          (A few details on this, and why my solution is not applicable in general case: I'm using ParallelReader, and the other indexes that I add are throwaways, i.e. I recreate them on each index refresh from external shared resources. So I basically short-circuited those methods that deal with directory and commits so that they return information from the main index. This way the file-based replication works as before for the main index).

          Show
          Andrzej Bialecki added a comment - +1 for adding a big red flag. My application depends on this functionality, and it's working well once I overrode a bunch of additional methods in IndexReader that deal with Directory, IndexCommit, index version, etc. (A few details on this, and why my solution is not applicable in general case: I'm using ParallelReader, and the other indexes that I add are throwaways, i.e. I recreate them on each index refresh from external shared resources. So I basically short-circuited those methods that deal with directory and commits so that they return information from the main index. This way the file-based replication works as before for the main index).
          Hide
          Hoss Man added a comment -

          The core issue here seems to stem from incompatible assumptions between various pieces of code in Solr, and the new IndexReaderFactory functionality introduced in 1.4 by SOLR-243.

          If we don't have a silver bullet for solving all the discrepancies before 1.4, then the safest approach seems to be making sure the documentation (and example configs) for IndexReaderFactory have big flashing red "Experimental!" disclaimers on them warning that using an IndexReaderFactory prevents certain other features from working (list all the ones we know about) and that the API for IndexReaderFactory may change at anytime w/o warning or be removed from future release if the problems can't be resolved.

          I'm tempted to suggest we remove SOLR-243 altogether, but there are some use cases of it that can work well for people, so it would be a shame to gut it completely.

          Show
          Hoss Man added a comment - The core issue here seems to stem from incompatible assumptions between various pieces of code in Solr, and the new IndexReaderFactory functionality introduced in 1.4 by SOLR-243 . If we don't have a silver bullet for solving all the discrepancies before 1.4, then the safest approach seems to be making sure the documentation (and example configs) for IndexReaderFactory have big flashing red "Experimental!" disclaimers on them warning that using an IndexReaderFactory prevents certain other features from working (list all the ones we know about) and that the API for IndexReaderFactory may change at anytime w/o warning or be removed from future release if the problems can't be resolved. I'm tempted to suggest we remove SOLR-243 altogether, but there are some use cases of it that can work well for people, so it would be a shame to gut it completely.
          Hide
          Andrzej Bialecki added a comment -

          I didn't make myself clear .. I fixed this for my application, where I control the implementation of IndexReader, but I wouldn't recommend this fix for general use. So this was just FYI.

          Show
          Andrzej Bialecki added a comment - I didn't make myself clear .. I fixed this for my application, where I control the implementation of IndexReader, but I wouldn't recommend this fix for general use. So this was just FYI.
          Hide
          Yonik Seeley added a comment -

          So, it seems for 1.4 we could fix this problem

          What's the fix/workaround though? Solr currently assumes a disk resident index, and the patch here makes up a directory... that doesn't seem like a good idea. It also seems too late in the release cycle to try and abstract away from a disk index... potentially complex and invasive.

          Show
          Yonik Seeley added a comment - So, it seems for 1.4 we could fix this problem What's the fix/workaround though? Solr currently assumes a disk resident index, and the patch here makes up a directory... that doesn't seem like a good idea. It also seems too late in the release cycle to try and abstract away from a disk index... potentially complex and invasive.
          Hide
          Grant Ingersoll added a comment -

          So, it seems for 1.4 we could fix this problem, add the documentation, and then open a new issue to deal with the replication problem in 1.5?

          Show
          Grant Ingersoll added a comment - So, it seems for 1.4 we could fix this problem, add the documentation, and then open a new issue to deal with the replication problem in 1.5?
          Hide
          Andrzej Bialecki added a comment -

          FYI, for now I solved this by extending my IndexReader to support this call and return the original directory that lists all index files plus a few resources that I care about. However, this is just glossing over the deeper problem - replication handler shouldn't assume the directory is file-based.

          Show
          Andrzej Bialecki added a comment - FYI, for now I solved this by extending my IndexReader to support this call and return the original directory that lists all index files plus a few resources that I care about. However, this is just glossing over the deeper problem - replication handler shouldn't assume the directory is file-based.
          Hide
          Shalin Shekhar Mangar added a comment -

          Yeah, ReplicationHandler assumes a file based IndexReader. We can document this clearly.

          .. I haven't looked into it yet, but perhaps this could be solved by extending the replication handler to support multiple dirs, and for those IndexReader that don't support directory() try asking for getSubReaders() and use their directory() ...

          Yes but that becomes complicated very fast. You'd need to know the correct directory to which a newly downloaded file has to be written on the slave. You'd also need to re-open readers selectively and you'd need to change their directories (for cases where an existing file is to be deleted/replaced).

          We anyway need to take a look at replication again after 1.4 with a focus on Lucene's NRT capabilities.

          Show
          Shalin Shekhar Mangar added a comment - Yeah, ReplicationHandler assumes a file based IndexReader. We can document this clearly. .. I haven't looked into it yet, but perhaps this could be solved by extending the replication handler to support multiple dirs, and for those IndexReader that don't support directory() try asking for getSubReaders() and use their directory() ... Yes but that becomes complicated very fast. You'd need to know the correct directory to which a newly downloaded file has to be written on the slave. You'd also need to re-open readers selectively and you'd need to change their directories (for cases where an existing file is to be deleted/replaced). We anyway need to take a look at replication again after 1.4 with a focus on Lucene's NRT capabilities.
          Hide
          Andrzej Bialecki added a comment -

          .. I haven't looked into it yet, but perhaps this could be solved by extending the replication handler to support multiple dirs, and for those IndexReader that don't support directory() try asking for getSubReaders() and use their directory() ...

          Show
          Andrzej Bialecki added a comment - .. I haven't looked into it yet, but perhaps this could be solved by extending the replication handler to support multiple dirs, and for those IndexReader that don't support directory() try asking for getSubReaders() and use their directory() ...
          Hide
          Andrzej Bialecki added a comment -

          Indeed, that complicates the matter ... It looks like using a non-file based IndexReader breaks replication. This is not a regression from 1.3, but the functionality to specify custom IndexReaders will be available in 1.4, so it should be clearly stated in docs that it prevents replication from working properly, until we rectify this issue.

          Show
          Andrzej Bialecki added a comment - Indeed, that complicates the matter ... It looks like using a non-file based IndexReader breaks replication. This is not a regression from 1.3, but the functionality to specify custom IndexReaders will be available in 1.4, so it should be clearly stated in docs that it prevents replication from working properly, until we rectify this issue.
          Hide
          Mark Miller added a comment -

          This call is used only to retrieve the full path of the directory for informational purpose, so it shouldn't lead to a crash.

          The replication handler uses it a lot (assuming its a file) - I almost feel we should add some javadoc about how using non FSDirectory based IndexReaders is supported. I'm not even fully sure myself.

          Also, are we positive it behaves here correctly in SolrCore ?

                File indexDirFile = new File(getIndexDir()).getCanonicalFile();
                File newIndexDirFile = new File(newIndexDir).getCanonicalFile();
                
                if (newestSearcher != null && solrConfig.reopenReaders
                    && indexDirFile.equals(newIndexDirFile)) {
          
          Show
          Mark Miller added a comment - This call is used only to retrieve the full path of the directory for informational purpose, so it shouldn't lead to a crash. The replication handler uses it a lot (assuming its a file) - I almost feel we should add some javadoc about how using non FSDirectory based IndexReaders is supported. I'm not even fully sure myself. Also, are we positive it behaves here correctly in SolrCore ? File indexDirFile = new File(getIndexDir()).getCanonicalFile(); File newIndexDirFile = new File(newIndexDir).getCanonicalFile(); if (newestSearcher != null && solrConfig.reopenReaders && indexDirFile.equals(newIndexDirFile)) {
          Hide
          Andrzej Bialecki added a comment -

          Patch that catches the exception and supplies IndexReader.toString() instead.

          Show
          Andrzej Bialecki added a comment - Patch that catches the exception and supplies IndexReader.toString() instead.

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Andrzej Bialecki
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development