[NUTCH-2950] UpdateHostDb: performance improvements

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 1.18
    • Fix Version/s: 1.19
    • Component/s: hostdb
    • Labels: None
    • Patch Info: Patch Available

    Description

      This issue covers several performance improvements for creating the HostDb:

      • avoid needless conversions between hostname and URL
      • improve HostDb serialization (write and read)
      • use parameterized logging and log less at level INFO
      • do not create DNS resolver threads if DNS look-ups are not requested by command-line options
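
      The last item above can be sketched as follows. This is a minimal illustration, not the code from the patch: the class name LazyResolverPool, the checkDns flag and the numThreads parameter are all assumptions made for the example. The idea is that the thread pool is only created when a look-up is actually requested, so a run without DNS checks never starts resolver threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is not the Nutch UpdateHostDb code.
final class LazyResolverPool implements AutoCloseable {
  private final boolean checkDns;   // assumed: true only if a DNS check option was given
  private final int numThreads;     // assumed: configured number of resolver threads
  private ExecutorService executor; // remains null until the first look-up is submitted

  LazyResolverPool(boolean checkDns, int numThreads) {
    this.checkDns = checkDns;
    this.numThreads = numThreads;
  }

  /** Runs the look-up on the pool; returns false and creates no threads if DNS checks are off. */
  synchronized boolean submit(Runnable lookup) {
    if (!checkDns) {
      return false;
    }
    if (executor == null) {
      // The pool is created on first use, never eagerly.
      executor = Executors.newFixedThreadPool(numThreads);
    }
    executor.execute(lookup);
    return true;
  }

  /** True once a thread pool has been created. */
  synchronized boolean isStarted() {
    return executor != null;
  }

  /** Waits for pending look-ups; a no-op if no pool was ever created. */
  @Override
  public synchronized void close() {
    if (executor == null) {
      return;
    }
    executor.shutdown();
    try {
      executor.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

      With checkDns set to false, submit() returns immediately and close() has nothing to shut down.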

      A patch/PR is ready. Depending on the chosen command-line options, a 10-20% speed-up should be visible if DNS look-ups, normalization and filtering are off.
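
      One plausible reading of why the gain shows up when normalization and filtering are off is sketched below. It assumes, without confirmation from the issue text, that a host name only needs to be wrapped into a URL when a URL-based normalizer or filter has to see it. HostNameProcessor and its urlProcessor field are hypothetical names, and the UnaryOperator merely stands in for whatever normalizer and filter chain is configured.

```java
import java.net.URI;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; this is not the Nutch UpdateHostDb code.
final class HostNameProcessor {
  // Assumed stand-in for URL normalizers/filters: returns the processed URL,
  // or null to reject it. Null here means normalization and filtering are off.
  private final UnaryOperator<String> urlProcessor;

  HostNameProcessor(UnaryOperator<String> urlProcessor) {
    this.urlProcessor = urlProcessor;
  }

  /** Returns the host name to keep, or null if it was rejected. */
  String process(String host) {
    if (urlProcessor == null) {
      // Fast path: no host -> URL -> host round trip, no extra allocations.
      return host;
    }
    // Slow path: URL-based processing needs a URL, so build one and convert back.
    String url = urlProcessor.apply("http://" + host + "/");
    if (url == null) {
      return null;
    }
    // Note: URI.getHost() can return null for host names that are not valid
    // in a URI; this sketch does not handle that case specially.
    return URI.create(url).getHost();
  }
}
```

      On the fast path the input string is returned unchanged, so the cost per host is a single null check.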

            People

              Assignee: Sebastian Nagel (snagel)
              Reporter: Sebastian Nagel (snagel)