Droids / DROIDS-109 Several defects in robots exclusion protocol (robots.txt) implementation / DROIDS-110

droids-norobots shouldn't depend on the protocol implementation; it should be an abstract Rules Engine

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.2.0
    • Fix Version/s: None
    • Component/s: core, norobots
    • Labels: None

    Description

      This follows naturally from the DROIDS-109 requirements.

      1. Move NoRobotsClient.java from droids-norobots into droids-core
      2. Move ContentLoader.java from droids-norobots into droids-core
      3. Refactor ContentLoader, ContentEntity, ManagedContentEntity, and AdvancedManagedContentEntity.
      Exposing content as an InputStream instead of a byte[] doesn't seem right, and we need proper metadata.
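
      The decoupling asked for in the title and in steps 1 and 2 could look roughly like the sketch below. All names here (RobotsSource, RobotsRules) are hypothetical and are not existing Droids classes; the parser handles only the "User-agent: *" group and plain prefix matching, which is a deliberate simplification. The point is only that the rules engine sees raw bytes and never touches a protocol class.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical seam: the protocol side (e.g. in droids-core) supplies the bytes. */
interface RobotsSource {
    byte[] fetch(URI robotsUri) throws IOException;
}

/** Hypothetical protocol-free rules engine: it only sees raw robots.txt bytes. */
final class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Parses the "User-agent: *" group only; a simplification for illustration. */
    static RobotsRules parse(byte[] robotsTxt) {
        RobotsRules rules = new RobotsRules();
        String text = new String(robotsTxt, StandardCharsets.UTF_8);
        boolean inWildcardGroup = false;
        for (String rawLine : text.split("\\r?\\n")) {
            // Drop trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                rules.disallowed.add(value);
            }
        }
        return rules;
    }

    /** Returns true unless the path starts with a disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

      With this split, a crawler would call something like RobotsRules.parse(source.fetch(uri)) and the norobots module would need no HTTP or file protocol dependency at all.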

      Even for FileProtocol.FileContentEntity, why should we plan for unbounded terabytes of data and use an InputStream instead of a byte array with proper encoding for text? Most "robots" exist because of "search", and most simply limit fetched data to 64 KB to 128 KB (although Amazon.com raw web pages average about 300 KB).
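
      A minimal sketch of the byte-array approach argued for above, assuming a hypothetical BoundedContent class (not part of Droids) and a caller-chosen cap such as 128 KB. It drains an InputStream into a byte[] up to the cap and keeps the charset and a truncation flag as metadata.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

/** Hypothetical in-memory entity: capped bytes plus the metadata needed to decode them. */
final class BoundedContent {
    final byte[] bytes;
    final Charset charset;
    final boolean truncated;

    private BoundedContent(byte[] bytes, Charset charset, boolean truncated) {
        this.bytes = bytes;
        this.charset = charset;
        this.truncated = truncated;
    }

    /** Reads at most maxBytes from the stream; the caller still owns and closes the stream. */
    static BoundedContent read(InputStream in, int maxBytes, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.min(maxBytes, 8192));
        byte[] buffer = new byte[8192];
        int remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n < 0) {
                break; // end of stream before the cap was reached
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        // One extra read tells us whether the source had more data than the cap.
        boolean truncated = remaining == 0 && in.read() >= 0;
        return new BoundedContent(out.toByteArray(), charset, truncated);
    }

    /** Decodes the captured bytes using the recorded charset. */
    String asText() {
        return new String(bytes, charset);
    }
}
```

      For example, BoundedContent.read(stream, 128 * 1024, StandardCharsets.UTF_8) would keep at most 128 KB in memory. Note that cutting at a byte boundary can split a multi-byte character, so the last character of truncated text may decode as a replacement character.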

          People

            Assignee: Unassigned
            Reporter: Fuad Efendi (funtick)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:

              Time Tracking

                Original Estimate: 672h
                Remaining Estimate: 672h
                Time Spent: Not Specified