  1. Solr
  2. SOLR-1614

Search in Hadoop


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.4
    • Fix Version/s: 3.2
    • Component/s: search
    • Labels: None

    Description

      What's the use case? Sometimes queries are expensive (such as
      regex queries), or the indexes to be searched are stored in
      HDFS. By leveraging Hadoop, these non-time-sensitive queries
      can be executed without dynamically deploying the indexes to
      new Solr servers.

      We'll download each index shard out of HDFS (assuming the
      shards are zipped), run the queries against it as a batch,
      then merge the results either with a Solr query-results
      priority queue or simply with Hadoop's built-in merge sort.
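
The priority-queue merge might look roughly like the sketch below. It is not taken from the issue or from Solr's code: the `(score, doc_id)` hit shape and the `merge_top_k` name are hypothetical, and it assumes scores from different shards are comparable.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); hypothetical shape for illustration


def merge_top_k(shard_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-shard hit lists into the global top k, best score first."""
    if k <= 0:
        return []
    heap: List[Hit] = []  # min-heap holding the best k hits seen so far
    for hits in shard_hits:
        for hit in hits:
            if len(heap) < k:
                heapq.heappush(heap, hit)
            elif hit > heap[0]:
                # The new hit beats the weakest kept hit, so swap it in.
                heapq.heapreplace(heap, hit)
    return sorted(heap, reverse=True)
```

A bounded heap keeps memory at k entries per query regardless of how many shards report back, which is the main reason to prefer it over sorting the concatenated results.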

      The query file will be encoded in JSON, with each entry
      carrying (ID, query, numresults, fields). The shards file will
      simply contain newline-delimited paths (HDFS or otherwise).
      The output can be one Solr-encoded results file per query.
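
As an illustration of the two input files, here is a minimal parsing sketch. The issue does not pin down the exact JSON layout, so this assumes one JSON object per line with the keys `id`, `query`, `numresults`, and `fields`; the function names and the default of 10 results are likewise assumptions.

```python
import json
from typing import Any, Dict, List


def parse_queries(text: str) -> List[Dict[str, Any]]:
    """Parse a query file, assuming one JSON object per non-empty line."""
    queries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        raw = json.loads(line)
        queries.append({
            "id": raw["id"],
            "query": raw["query"],
            # Default of 10 is an assumption, not something the issue specifies.
            "numresults": int(raw.get("numresults", 10)),
            "fields": raw.get("fields", []),
        })
    return queries


def parse_shards(text: str) -> List[str]:
    """Parse a shards file: newline-delimited paths, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, a shards file containing `hdfs://nn/indexes/shard0.zip` and `/local/shard1.zip` on separate lines would yield a two-element list of those paths (both paths are made up for illustration).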

      I'm hoping to add an actual Hadoop unit test.


          People

            Assignee: Unassigned
            Reporter: Jason Rutherglen (jasonrutherglen)
            Votes: 0
            Watchers: 3
