Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Inspired by Sahin Buyrukbilen's question here:

      http://www.lucidimagination.com/search/document/b68846e383824653/how_to_export_lucene_index_to_a_simple_text_file#b68846e383824653

      I made a simple read/write codec that stores all postings data into a
      single text file (_X.pst), looking like this:

      field contents
        term file
          doc 0
            pos 5
        term is
          doc 0
            pos 1
        term second
          doc 0
            pos 3
        term test
          doc 0
            pos 4
        term the
          doc 0
            pos 2
        term this
          doc 0
            pos 0
      END
      

      The codec is fully functional – all Lucene & Solr tests pass with
      -Dtests.codec=SimpleText – but its performance is obviously poor.

      However, it should be useful for debugging, transparency,
      understanding just what Lucene stores in its index, etc. And it's a
      quick way to gain some understanding of how a codec works...
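
      Because the postings file is plain text, it can be read back with a
      few lines of scripting. The following is a minimal sketch in Python,
      not part of the patch. It assumes the file contains only the four
      line kinds shown in the sample above (field, term, doc, pos) plus
      the closing END marker; the function name parse_postings and the
      nested-dict result shape are hypothetical choices made for
      illustration.

```python
# Sketch only: parses the sample layout from this issue, not the codec's
# full on-disk format. Any other line kind is rejected with an error.

SAMPLE = """\
field contents
  term file
    doc 0
      pos 5
  term is
    doc 0
      pos 1
  term second
    doc 0
      pos 3
  term test
    doc 0
      pos 4
  term the
    doc 0
      pos 2
  term this
    doc 0
      pos 0
END
"""


def parse_postings(text):
    """Return {field: {term: {doc_id: [positions]}}} for the given listing."""
    postings = {}
    field = term = doc = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "END":
            break
        # The first word is the line kind; the rest of the line is its value.
        key, _, value = line.partition(" ")
        if key == "field":
            field, term, doc = value, None, None
            postings[field] = {}
        elif key == "term":
            term, doc = value, None
            postings[field][term] = {}
        elif key == "doc":
            doc = int(value)
            postings[field][term][doc] = []
        elif key == "pos":
            postings[field][term][doc].append(int(value))
        else:
            raise ValueError("unexpected line: %r" % raw)
    return postings


if __name__ == "__main__":
    index = parse_postings(SAMPLE)
    print(index["contents"]["second"])  # {0: [3]}
```

      Indentation is ignored here and nesting is tracked through the most
      recent field/term/doc line, which is enough for the sample but would
      need revisiting for terms that begin or end with whitespace.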

      1. LUCENE-2664.patch
        33 kB
        Michael McCandless

        Activity

        Yonik Seeley added a comment -

        heh - cool!

        Michael McCandless added a comment -

        Committed, but I had to leave SimpleText out of the nightly rotation: some tests run incredibly slowly, due to heavy reliance on the terms dict cache (which SimpleText doesn't have). I'd like to fix that separately and then hopefully put SimpleText into the rotation, so I'll leave this issue open for that.


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            0
