Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-477

AnalysisRequestHandler

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • None
    • None
    • None

    Description

      Being able to programmatically access tokenization information can be quite useful not only in Solr, but in other NLP applications where token vectors are necessary.

      The patch to follow creates an AnalysisRequestHandler which processes a document through the analysis process and returns a response filled with tokens, their offsets, position inc., type and value.

      Patch also adds some character array processing to Xml and adds Token handling to XMLWriter.

      I only implemented Xml output, as I don't know JSON or the other types. If someone else is so motivated, they can add those.

      Attachments

        1. SOLR-477.patch
          21 kB
          Grant Ingersoll
        2. SOLR-477.patch
          22 kB
          Grant Ingersoll

        Activity

          People

            gsingers Grant Ingersoll
            gsingers Grant Ingersoll
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: