Lucene - Core / LUCENE-1888

Provide Option to Store Payloads on the Term Vector

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It would be nice to have the option to access payloads in a document-centric way by adding them to the Term Vectors. Naturally, this makes the Term Vectors bigger, but it may be just what one needs.

          Activity

          Peter Wilkins added a comment -

          As someone new to Lucene, with a specific problem to solve, it is difficult to identify the appropriate Lucene feature to use. Reading various online posts, I see I'm not alone. I have a use case that I think this JIRA issue addresses; perhaps it will help refine what the issue resolution would do.

          I'm indexing a lecture video transcript. I want to store the text of the transcript and timecodes of when each word occurs. I want to search the text of the transcript and return the timecode so I can display the lecture video from that spot.
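
          To make the use case concrete, here is a minimal sketch in plain Java of what "document-centric" access to payloads could look like for a single transcript. It is not Lucene code: the class TranscriptTermVector, its methods, and the choice of a millisecond timecode as the payload are all assumptions made for illustration. It only models the idea of keeping, per document, each term's positions together with a payload.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of ONE document's term vector with payloads.
// This is not Lucene code; every name here is made up for illustration.
final class TranscriptTermVector {

    // One occurrence of a term: its token position plus a payload (a timecode in ms).
    static final class Occurrence {
        final int position;
        final long timecodeMillis;

        Occurrence(int position, long timecodeMillis) {
            this.position = position;
            this.timecodeMillis = timecodeMillis;
        }
    }

    // Term -> occurrences of that term within this single document.
    private final Map<String, List<Occurrence>> occurrencesByTerm = new LinkedHashMap<>();
    private int nextPosition = 0;

    // Records the next transcript word together with the time it was spoken.
    void addWord(String word, long timecodeMillis) {
        String term = word.toLowerCase(Locale.ROOT);
        occurrencesByTerm
            .computeIfAbsent(term, unused -> new ArrayList<>())
            .add(new Occurrence(nextPosition++, timecodeMillis));
    }

    // Returns the timecode of every occurrence of the word, in transcript order.
    List<Long> timecodesFor(String word) {
        List<Occurrence> occurrences = occurrencesByTerm.getOrDefault(
            word.toLowerCase(Locale.ROOT), Collections.emptyList());
        List<Long> timecodes = new ArrayList<>();
        for (Occurrence occurrence : occurrences) {
            timecodes.add(occurrence.timecodeMillis);
        }
        return timecodes;
    }

    public static void main(String[] args) {
        TranscriptTermVector vector = new TranscriptTermVector();
        vector.addWord("welcome", 0L);
        vector.addWord("to", 400L);
        vector.addWord("the", 550L);
        vector.addWord("lecture", 700L);
        vector.addWord("the", 1900L);
        System.out.println(vector.timecodesFor("the")); // prints [550, 1900]
    }
}
```

          In this model, once a search has identified the matching document, the player can seek to any of the returned timecodes. Whether the real feature exposes payloads in this shape is not something this sketch claims.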

          Michal Fapso added a comment - - edited

          Hi Peter,

          I work on the same thing. You can get my code from here: http://speech.fit.vutbr.cz/en/software/speech-search (Lucene extension for bin sequences); there is also some test data. It actually indexes word confusion networks with scores of hypotheses, but of course it will also work for 1-best string transcripts.

          That code runs behind this website: http://www.superlectures.com/odyssey/

          It is a few months old, so if you are interested, I can send you our current version.

          Best regards,
          Michal Fapso

          Robert Muir added a comment -

          If we want to implement something like LUCENE-4272, we would need this option. I don't think it would be too bad, especially
          now that term vectors use the same codec APIs as the postings lists, e.g. hasPayload and getPayload are already there;
          it's just that today it always returns false.

          Robert Muir added a comment -

          See http://mail-archives.apache.org/mod_mbox/lucene-dev/201208.mbox/%3CCAOdYfZV_r6Ov8WFpAOyPid2r6MqR-_zx7kDmVvjucN1wcxD8Yw%40mail.gmail.com%3E

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              1
              Watchers:
              1
