Solr / SOLR-16010

langid should include all required Tika dependencies


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib - LangId
    • Labels: None

    Description

      Currently, the langid module requires the extraction module to be loaded in order to work. It isn't clear whether what the extraction module includes even meets langid's needs (for example, tika-langdetect isn't included in the extraction module):

      ➜  solr git:(SOLR-15989) find solr/packaging/build/solr-10.0.0-SNAPSHOT/ -name '*tika*.jar'
      solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/langid/lib/tika-core-1.27.jar
      solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-parsers-1.27.jar
      solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-java7-1.27.jar
      solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-xmp-1.27.jar
      solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/vorbis-java-tika-0.8.jar
      solr/packaging/build/solr-10.0.0-SNAPSHOT/modules/extraction/lib/tika-core-1.27.jar
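
The same check can be made repeatable with a small script. This is only a sketch, under a few assumptions: `DIST` is a hypothetical variable pointing at an unpacked distribution (defaulting to the path in the listing above), the helper names `list_tika_jars` and `has_langdetect` are made up for illustration, and the jar is assumed to be named `tika-langdetect*.jar`. It groups the Tika jars by module and reports whether langid ships tika-langdetect on its own.

```shell
#!/bin/sh
# Sketch: group Tika jars by Solr module and check one module for
# tika-langdetect. Helper names and DIST are illustrative assumptions.

# Assumption: DIST is an unpacked Solr distribution directory.
DIST="${DIST:-solr/packaging/build/solr-10.0.0-SNAPSHOT}"

# Print "module<TAB>jar" for every jar under modules/*/lib whose
# file name contains "tika", sorted by module.
list_tika_jars() {
  find "$1/modules" -path '*/lib/*' -name '*tika*.jar' 2>/dev/null |
    while IFS= read -r jar; do
      module=${jar#"$1"/modules/}   # strip the distribution prefix
      module=${module%%/*}          # keep only the module directory name
      printf '%s\t%s\n' "$module" "${jar##*/}"
    done | sort
}

# Succeed if module $2 under distribution $1 has a jar matching
# tika-langdetect*.jar (assumed naming) in its lib directory.
has_langdetect() {
  find "$1/modules/$2/lib" -name 'tika-langdetect*.jar' 2>/dev/null | grep -q .
}

list_tika_jars "$DIST"
if has_langdetect "$DIST" langid; then
  echo "langid: tika-langdetect present"
else
  echo "langid: tika-langdetect missing"
fi
```

Run against the build shown above, the first part should reproduce the listing grouped by module; the final line only says whether a matching jar exists, not whether langid actually needs it.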
      

      This came out of a discussion in SOLR-15989 - https://github.com/apache/solr/pull/621#discussion_r806083202

            People

              Assignee: Unassigned
              Reporter: Kevin Risden (krisden)
              Votes: 0
              Watchers: 2
