[TIKA-2720] A parser to output universal sentence encodings to text - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 2.0.0-BETA
Component/s: tika-dl
Labels: None

Description

This parser encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks. The underlying model is trained and optimized for text longer than a single word, such as sentences, phrases, or short paragraphs. It is trained on a variety of data sources and tasks so that it can accommodate a wide range of natural language understanding tasks. The input is variable-length English text and the output is a 512-dimensional vector.
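As a rough illustration of the semantic similarity use case mentioned above, the sketch below compares two 512-dimensional vectors with cosine similarity. It is not taken from the pull request: the helper names `parse_embedding` and `cosine_similarity` are hypothetical, and the assumption that the parser writes each vector as whitespace- or comma-separated numbers is ours, since the issue does not specify the output format.

```python
import math

EMBEDDING_DIM = 512  # vector size stated in the issue description


def parse_embedding(text):
    """Parse one vector from text.

    Assumption: values are separated by whitespace and/or commas.
    The issue does not specify the parser's actual output format.
    """
    values = [float(tok) for tok in text.replace(",", " ").split()]
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(values)}")
    return values


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 suggest the two texts point in a similar direction in the embedding space, while values near 0 suggest they are unrelated. A clustering or classification step would consume the same parsed vectors.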

Issue Links

links to

GitHub Pull Request #248

People

Assignee: Unassigned

Reporter: Thejan Wijesinghe

Votes: 0

Watchers: 2

Dates

Created: 02/Sep/18 20:51

Updated: 21/Jul/21 22:13