NUTCH-1602: Improve the readability of metadata in the readdb dump (normal format)

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.7
    • Fix Version/s: 1.8
    • Component/s: crawldb
    • Labels: None

    Description

      The metadata in the dump output is hard to read: the entries are printed back to back with no separator between them.

      $ bin/nutch readdb crawldb/ -dump dir
      http://www.baidu.com/	Version: 7
      Status: 3 (db_gone)
      Fetch time: Sat Aug 17 22:35:37 CST 2013
      Modified time: Thu Jan 01 08:00:00 CST 1970
      Retries since fetch: 0
      Retry interval: 3888000 seconds (45 days)
      Score: 1.0
      Signature: null
      Metadata: m1: v22m3: v3m2: v2m5: v5m4: m4_pst_: robots_denied(18), lastModified=0m6: v6
      

      So I changed the Metadata format to this:

      Metadata: m1=v22;m3=v3;m2=v2;m5=v5;m4=m4;_pst_=robots_denied(18), lastModified=0;m6=v6;
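
      The change in format can be sketched as follows. This is only an illustration, not the attached patch: the class name MetadataFormat and the helper formatMetadata are hypothetical, and a plain java.util.Map<String, String> stands in for whatever metadata container the crawldb record actually uses. Each entry is written as key=value followed by a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the code from NUTCH-1602.patch.
class MetadataFormat {

    // Joins every entry as "key=value;" so that adjacent entries
    // can be told apart in the dump output.
    static String formatMetadata(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder("Metadata: ");
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            sb.append(e.getKey())
              .append('=')
              .append(e.getValue())
              .append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, which makes the output predictable.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("m1", "v22");
        metadata.put("_pst_", "robots_denied(18), lastModified=0");
        System.out.println(formatMetadata(metadata));
    }
}
```

      Note that a value may itself contain "=" or ", " (as the _pst_ value does), so this format is meant for human reading rather than for parsing the dump back.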
      

      Attachments

        1. NUTCH-1602.patch
          0.5 kB
          lufeng
        2. NUTCH-1602-2.patch
          0.9 kB
          lufeng

          People

            Assignee: lufeng (amuseme.lu)
            Reporter: lufeng (amuseme.lu)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: