  1. Nutch
  2. NUTCH-2694

HostDB to aggregate by long instead of integer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.16
    • Component/s: hostdb
    • Labels: None

    Description

      Last week we got Pinterest in our database. It has a neat set of sitemaps and a lot of entries, over 2 billion. When first writing HostDatum I foolishly used ints instead of longs, so the count overflowed the signed 32-bit range and now shows up as -1.9 billion records for Pinterest.
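A minimal sketch of the wraparound, not taken from the patch. The class name, method names, and the batch sizes are hypothetical; 24 batches of 100 million were picked only because the total (2.4 billion) is "over 2 billion" and wraps to roughly the -1.9 billion reported above.

```java
import java.util.Arrays;

public class HostCountOverflow {

    // Accumulates in a 32-bit int, the way an int counter field would.
    // Past Integer.MAX_VALUE (2,147,483,647) the sum wraps silently.
    static int sumAsInt(int[] batches) {
        int total = 0;
        for (int b : batches) {
            total += b;
        }
        return total;
    }

    // Same accumulation in a 64-bit long, which has ample headroom.
    static long sumAsLong(int[] batches) {
        long total = 0L;
        for (int b : batches) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical input: 24 batches of 100 million URLs each.
        int[] batches = new int[24];
        Arrays.fill(batches, 100_000_000);

        System.out.println(sumAsInt(batches));   // -1894967296
        System.out.println(sumAsLong(batches));  // 2400000000
    }
}
```

The int result equals 2,400,000,000 minus 2^32, which is why a large positive count surfaces as a large negative one.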

      I propose a simple move from int to long, with an upgrade note mentioning that existing databases are not compatible and suggesting that users delete any existing HostDB. Agreed?
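A rough illustration of why the databases would not be compatible, assuming the counters are serialized with DataOutput-style writeInt/writeLong calls (an assumption about the record format, not a quote from HostDatum). It uses only java.io, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class HostDbFormatSketch {

    // Old layout (assumed): the counter is written as a 4-byte int.
    static byte[] writeOld(int count) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // New layout (assumed): the same counter is written as an 8-byte long.
    static byte[] writeNew(long count) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // A reader expecting the new layout asks for 8 bytes per counter.
    // Given an old 4-byte record it runs out of data and fails.
    static boolean newReaderCanRead(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            in.readLong();
            return true;
        } catch (EOFException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeOld(42).length);                // 4
        System.out.println(writeNew(42L).length);               // 8
        System.out.println(newReaderCanRead(writeOld(42)));     // false
        System.out.println(newReaderCanRead(writeNew(42L)));    // true
    }
}
```

In a record with several fields the failure may be quieter than an EOFException: the reader would consume bytes belonging to the next field and produce garbage values, which is one reason deleting and rebuilding the HostDB is the simpler upgrade path.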

      Attachments

        1. NUTCH-2694-2.patch
          15 kB
          Sebastian Nagel
        2. NUTCH-2694.patch
          13 kB
          Markus Jelsma

        Activity

          People

            Assignee: Unassigned
            Reporter: Markus Jelsma (markus17)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: