Tika / TIKA-2720

A parser to output universal sentence encodings to text

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-BETA
    • Component/s: tika-dl
    • Labels: None

    Description

      This parser encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks. The model is trained and optimized for text longer than a single word, such as sentences, phrases, or short paragraphs. It is trained on a variety of data sources and tasks, with the aim of dynamically accommodating a wide range of natural language understanding tasks. The input is variable-length English text, and the output is a 512-dimensional vector.
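
To make the semantic similarity use case concrete, here is a minimal sketch of how two 512-dimensional output vectors might be compared with cosine similarity. It does not call the proposed parser or any Tika API. The class name `SentenceEmbeddingSimilarity`, the method `cosine`, and the placeholder vectors are all hypothetical and exist only for illustration. Real vectors would come from the encoder.

```java
import java.util.Arrays;

/**
 * Illustrative only: compares two sentence embeddings by cosine similarity.
 * The class and method names are hypothetical and are not part of Tika.
 */
final class SentenceEmbeddingSimilarity {

    /** Output size stated in the issue description. */
    static final int EMBEDDING_SIZE = 512;

    private SentenceEmbeddingSimilarity() {}

    /** Returns the cosine of the angle between two embeddings, in [-1, 1]. */
    static double cosine(double[] a, double[] b) {
        if (a.length != EMBEDDING_SIZE || b.length != EMBEDDING_SIZE) {
            throw new IllegalArgumentException(
                    "expected " + EMBEDDING_SIZE + "-dimensional vectors");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < EMBEDDING_SIZE; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("a zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Placeholder vectors stand in for real encoder output (assumption).
        double[] first = new double[EMBEDDING_SIZE];
        double[] second = new double[EMBEDDING_SIZE];
        Arrays.fill(first, 1.0);
        Arrays.fill(second, 1.0);
        second[0] = -1.0; // differ in a single component
        System.out.println("similarity = " + cosine(first, second));
    }
}
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is how embedding similarity is commonly interpreted. Whether the parser would expose raw vectors or a similarity score is not specified in this issue.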

            People

              Assignee: Unassigned
              Reporter: Thejan Wijesinghe
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated: