Lucene - Core
  LUCENE-988

Benchmarker tasks for the TPB data collection

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Trivial
    • Resolution: Not a Problem
    • Affects Version/s: 2.3
    • Fix Version/s: None
    • Component/s: modules/benchmark
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Very simple DocMaker and QueryMaker for the TPB (The Pirate Bay) data collection (~150,000 content items, ~500,000 comments on those items, and ~3,700,000 user queries).

      URL to dataset:
      http://thepiratebay.org/tor/3783572/db_dump_and_query_log_from_piratebay.org__summer_of_2006

      1. LUCENE-988.txt
        5 kB
        Karl Wettin

        Activity

        Shai Erera added a comment -

        Closing because I'm not sure what the license of "The Pirate Bay" DB is, and I'm also not sure that we want such a DB in Lucene. Benchmark's API allows anyone to write a ContentSource that reads whatever source they want and converts it to DocData, which is later fed to DocMaker and indexed.
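
The pattern described in the comment (a source that reads raw input, a data holder, and a maker that turns the holder into fields) can be sketched roughly as follows. This is a minimal, self-contained illustration only: `ItemData`, `TabSeparatedSource`, and `toFields` are hypothetical stand-ins, not Lucene's `DocData`, `ContentSource`, or `DocMaker`, and the tab-separated `id<TAB>title<TAB>body` input format is an assumption, not the layout of the TPB dump.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins for the source -> data holder -> maker pipeline. */
public class ContentSourceSketch {

    /** Plain holder for one parsed item (the role DocData plays in the comment). */
    public static final class ItemData {
        final String id;
        final String title;
        final String body;

        ItemData(String id, String title, String body) {
            this.id = id;
            this.title = title;
            this.body = body;
        }
    }

    /** Reads lines of the assumed form "id<TAB>title<TAB>body". */
    public static final class TabSeparatedSource {
        private final BufferedReader reader;

        TabSeparatedSource(BufferedReader reader) {
            this.reader = reader;
        }

        /** Returns the next well-formed item, or null at end of input. */
        ItemData next() throws IOException {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 3);
                if (parts.length == 3) {
                    return new ItemData(parts[0], parts[1], parts[2]);
                }
                // Lines without exactly three columns are skipped.
            }
            return null;
        }
    }

    /** Maker step: turns one item into named fields, ready for an indexer. */
    public static Map<String, String> toFields(ItemData item) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", item.id);
        fields.put("title", item.title);
        fields.put("body", item.body);
        return fields;
    }

    /** Drains the source and converts every item to its field map. */
    public static List<Map<String, String>> readAll(String input) {
        List<Map<String, String>> docs = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            TabSeparatedSource source = new TabSeparatedSource(reader);
            ItemData item;
            while ((item = source.next()) != null) {
                docs.add(toFields(item));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docs;
    }

    public static void main(String[] args) {
        String sample = "1\tFirst title\tfirst body\nbroken line\n2\tSecond title\tsecond body\n";
        for (Map<String, String> doc : readAll(sample)) {
            System.out.println(doc);
        }
    }
}
```

Keeping the reading step separate from the field-building step means a different data set only needs a new source, while the maker and everything downstream stay unchanged.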


          People

          • Assignee:
            Unassigned
            Reporter:
            Karl Wettin
          • Votes:
            0
            Watchers:
            0
