  1. Nutch
  2. NUTCH-3028

WARCExporter to support filtering by JEXL


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.19
    • Fix Version/s: 1.21
    • Component/s: None
    • Labels: None

    Description

      WARCExporter can now filter the segment data it exports to WARC using a JEXL expression passed with -expr. In the following example, only records with SOME_KEY=SOME_VALUE in their parseData metadata are exported to WARC.

      -expr 'parseData.getParseMeta().get("SOME_KEY").equals("SOME_VALUE")'

      or, to filter on the content metadata instead:

      -expr 'content.getMetadata().get("SOME_KEY").equals("SOME_VALUE")'
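
Both expressions amount to a per-record predicate over a key/value metadata map. The following is a minimal plain-Java sketch of that predicate, not the code in the patch: it uses a java.util.Map as a stand-in for Nutch's metadata object, and the names MetadataFilterSketch and keyEquals are hypothetical. The sketch puts the constant on the left of equals so that a record without the key is simply rejected; whether the JEXL expressions above need a similar guard for missing keys is not stated in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative only: a plain-Java analogue of the -expr filter, using a Map
// in place of Nutch's metadata classes (an assumption made for this sketch).
final class MetadataFilterSketch {

    // Hypothetical helper: true only when the map holds `expected` under `key`.
    static Predicate<Map<String, String>> keyEquals(String key, String expected) {
        // equals is invoked on the non-null constant, so a missing key
        // (meta.get returns null) yields false instead of throwing.
        return meta -> expected.equals(meta.get(key));
    }

    public static void main(String[] args) {
        Map<String, String> matching = new HashMap<>();
        matching.put("SOME_KEY", "SOME_VALUE");

        Map<String, String> differentValue = new HashMap<>();
        differentValue.put("SOME_KEY", "OTHER_VALUE");

        Map<String, String> missingKey = new HashMap<>();

        Predicate<Map<String, String>> filter = keyEquals("SOME_KEY", "SOME_VALUE");

        // Only the first record would be kept by the filter.
        System.out.println(filter.test(matching));       // true
        System.out.println(filter.test(differentValue)); // false
        System.out.println(filter.test(missingKey));     // false
    }
}
```
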

      Attachments

        1. NUTCH-3028.patch
          5 kB
          Markus Jelsma
        2. NUTCH-3028-1.patch
          5 kB
          Markus Jelsma
        3. NUTCH-3028-2.patch
          6 kB
          Markus Jelsma

        Activity

          People

            Assignee: Markus Jelsma (markus17)
            Reporter: Markus Jelsma (markus17)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: