Nutch / NUTCH-1980

Jexl expressions for CrawlDbReader

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: crawldb
    • Labels: None

    Description

      This adds Jexl expression support to the CrawlDbReader, so you can select records from the database by their metadata using flexible boolean logic. Some examples:

      * Get all English pages
      -expr "lang=en"
      
      * Get all English pages whose response time (_rs_) is above 5000
      -expr "lang=en && _rs_ > 5000"
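
      To make the selection semantics concrete, here is a minimal plain-Java sketch of what the second expression asks for: keep a record only if its lang metadata is "en" and its _rs_ value is above 5000. It does not use Jexl or any Nutch API. The class name, the helper methods and the sample URLs are all hypothetical, and treating a missing or non-numeric _rs_ as "no match" is an assumption, not documented behaviour of the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the filter; not part of Nutch or Jexl.
class CrawlDbFilterSketch {

    // Builds a metadata map with the two keys used in the examples.
    // A null argument means the key is absent from the record.
    static Map<String, String> meta(String lang, String rs) {
        Map<String, String> m = new HashMap<>();
        if (lang != null) m.put("lang", lang);
        if (rs != null) m.put("_rs_", rs);
        return m;
    }

    // Mirrors the second example: lang is "en" AND _rs_ is above 5000.
    // Assumption: records without a parseable _rs_ value do not match.
    static boolean matches(Map<String, String> metadata) {
        if (!"en".equals(metadata.get("lang"))) {
            return false;
        }
        String rs = metadata.get("_rs_");
        if (rs == null) {
            return false;
        }
        try {
            return Long.parseLong(rs.trim()) > 5000L;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns the keys (URLs) of all records whose metadata matches,
    // in the iteration order of the input map.
    static List<String> select(Map<String, Map<String, String>> records) {
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : records.entrySet()) {
            if (matches(e.getValue())) {
                urls.add(e.getKey());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        records.put("http://example.org/a", meta("en", "6200"));
        records.put("http://example.org/b", meta("en", "120"));
        records.put("http://example.org/c", meta("nl", "9000"));
        records.put("http://example.org/d", meta("en", null));
        System.out.println(select(records)); // prints [http://example.org/a]
    }
}
```

      With an expression language the condition is supplied as a string at run time through -expr instead of being compiled in, which is what lets the reader combine arbitrary metadata fields without code changes.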
      

      Attachments

        1. NUTCH-1980.patch
          7 kB
          Markus Jelsma
        2. NUTCH-1980-1.9.patch
          10 kB
          Markus Jelsma
        3. NUTCH-1980-1.9.patch
          10 kB
          Markus Jelsma
        4. NUTCH-1980-1.9.patch
          9 kB
          Markus Jelsma

            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 4
