  1. Lucene - Core
  2. LUCENE-10035

SimpleText codec: add multi-level skip list data

Details

    • New

    Description

      Add skip list data (including impacts) to the SimpleText codec to help understand the index format. This is for debugging, curiosity, and transparency only! When a term's docFreq is greater than or equal to SimpleTextSkipWriter.BLOCK_SIZE (default value 8), the SimpleText codec writes a skip list, and the .pst file (the simple text term dictionary file) looks like this:

      field title
        term args
          doc 2
            freq 2
            pos 7
            pos 10
          ## we omit docs for better view ......
          doc 98
            freq 2
            pos 2
            pos 6
          skipList 
      ?
            level 1
              skipDoc 65
              skipDocFP 949
              impacts 
                impact 
                  freq 1
                  norm 2
                impact 
                  freq 2
                  norm 12
                impact 
                  freq 3
                  norm 13
              impacts_end 
      ?
            level 0
              skipDoc 17
              skipDocFP 284
              impacts 
                impact 
                  freq 1
                  norm 2
                impact 
                  freq 2
                  norm 12
              impacts_end         
              skipDoc 34
              skipDocFP 624
              impacts 
                impact 
                  freq 1
                  norm 2
                impact 
                  freq 2
                  norm 12
                impact 
                  freq 3
                  norm 14
              impacts_end         
              skipDoc 65
              skipDocFP 949
              impacts 
                impact 
                  freq 1
                  norm 2
                impact 
                  freq 2
                  norm 12
                impact 
                  freq 3
                  norm 13
              impacts_end         
              skipDoc 90
              skipDocFP 1311
              impacts 
                impact 
                  freq 1
                  norm 2
                impact 
                  freq 2
                  norm 10
                impact 
                  freq 3
                  norm 13
                impact 
                  freq 4
                  norm 14
              impacts_end 
      END
      checksum 00000000000829315543
      
      

      Compared with the previous format, this adds the skipList, level, skipDoc, skipDocFP, impacts, impact, freq, and norm nodes. With them, the SimpleText codec can also support advanceShallow at search time.
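
The nodes listed above can be read back with a few lines of code. The following is a minimal sketch, not the codec's reader: it parses the first two level-0 entries copied from the sample and mimics a shallow advance by returning the first entry whose skipDoc is at or beyond a target. The helper names `parse_skip_entries` and `shallow_advance` are hypothetical, and the sketch assumes each skipDoc marks the last document covered by its entry.

```python
# First two level-0 entries, copied from the sample file above.
SAMPLE = """\
level 0
  skipDoc 17
  skipDocFP 284
  impacts
    impact
      freq 1
      norm 2
    impact
      freq 2
      norm 12
  impacts_end
  skipDoc 34
  skipDocFP 624
  impacts
    impact
      freq 1
      norm 2
    impact
      freq 2
      norm 12
    impact
      freq 3
      norm 14
  impacts_end
"""


def parse_skip_entries(text):
    """Parse skip entries into dicts with skipDoc, skipDocFP and impacts."""
    entries = []
    freq = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        key = parts[0]
        if key == "skipDoc":
            entries.append({"skipDoc": int(parts[1]), "skipDocFP": None, "impacts": []})
        elif key == "skipDocFP":
            entries[-1]["skipDocFP"] = int(parts[1])
        elif key == "freq":
            freq = int(parts[1])
        elif key == "norm":
            # A norm line closes the (freq, norm) pair of the current impact.
            entries[-1]["impacts"].append((freq, int(parts[1])))
    return entries


def shallow_advance(entries, target):
    """Return the first entry whose skipDoc >= target, or None (assumed semantics)."""
    for entry in entries:
        if entry["skipDoc"] >= target:
            return entry
    return None


entries = parse_skip_entries(SAMPLE)
block = shallow_advance(entries, 20)
print(block["skipDoc"], block["skipDocFP"], block["impacts"])
```

For target 20 the sketch selects the entry with skipDoc 34, whose impacts bound the freq and norm values a scorer would need to consider for that block.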

       

      Why are there question marks in the file?

      Because MultiLevelSkipListWriter writes "length" and "childPointer" as VLong values, which are binary and do not render as readable text.
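
To see why such values show up as unreadable characters, here is a minimal sketch of a variable-length encoding in the VLong style: 7 bits per byte, low-order group first, with the high bit set on every byte except the last. The function name `write_vlong` is hypothetical and the value 949 is an arbitrary example, not the actual length written in the file above.

```python
def write_vlong(value):
    """Encode a non-negative int as 7-bit groups, low-order group first.

    Every byte except the last has its high bit (0x80) set as a
    continuation marker.
    """
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


encoded = write_vlong(949)  # arbitrary example value
print(encoded.hex())

# Neither byte is a printable ASCII character, so a text viewer shows
# a placeholder instead of digits.
print(all(b < 0x20 or b > 0x7E for b in encoded))
```

Values below 128 fit in one byte, but that byte is still a raw binary value, not a decimal digit string, so it rarely lines up with readable text.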

      Does this speed up searching?

      No. The codec can now support advanceShallow at search time, but searching is not faster yet. The skip list data is written after the docs (see the file above), so the reader must iterate over all docs before it reaches the skip list data. There is no "skipOffset" that would let it read the skip list data directly. As mentioned above, this is for debugging, curiosity, and transparency only! If this turns out to be a problem, I can add a "skipOffset" in a follow-up so that the skip list data can be read directly.
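
The cost difference described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the codec's reader: the term's data is modeled as a list of lines, and `build_term_block`, `find_skip_by_scanning`, and `find_skip_with_offset` are hypothetical helpers. The stored offset stands in for the proposed "skipOffset".

```python
def build_term_block(doc_ids, skip_lines):
    """Lay out doc lines first, then the skip list, as in the sample file."""
    lines = [f"doc {d}" for d in doc_ids]
    skip_offset = len(lines)  # index of the 'skipList' line
    lines.append("skipList")
    lines.extend(skip_lines)
    return lines, skip_offset


def find_skip_by_scanning(lines):
    """Without an offset: read every line until 'skipList' is reached.

    Returns the number of lines consumed, or None if there is no skip list.
    """
    for consumed, line in enumerate(lines, start=1):
        if line == "skipList":
            return consumed
    return None


def find_skip_with_offset(lines, skip_offset):
    """With a stored offset: jump straight to the 'skipList' line."""
    if lines[skip_offset] != "skipList":
        raise ValueError("offset does not point at the skip list")
    return 1  # only the 'skipList' line itself is read


lines, offset = build_term_block(range(0, 100, 2), ["level 0", "skipDoc 17"])
print(find_skip_by_scanning(lines))          # grows with the number of docs
print(find_skip_with_offset(lines, offset))  # constant
```

In this model the scanning reader has already paid for every doc by the time it sees the skip data, which is why the skip list cannot save any work without a direct pointer to it.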

       

       

          People

            Assignee: Unassigned
            Reporter: wuda (wuda0112)
              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 6h 40m