Project: Apache Any23 (Retired)
Issue: ANY23-412

HTTPClient API does not allow parallelism

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3
    • Fix Version/s: 2.8
    • Component/s: core
    • Labels: None

    Description

      Although our DefaultHTTPClient uses a PoolingHttpClientConnectionManager, we cannot take advantage of it through parallelism, because the getActualDocumentIRI(), getContentType(), and getContentLength() methods are defined on the HTTP client itself rather than on a response object. By the time a caller invokes them, their values may already have been overwritten by a different request issued through the same client. As a result, there is no way to execute requests in parallel using a single HTTP client.
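
      A minimal sketch, assuming a redesign in the direction the description implies: per-request metadata is returned in an immutable response object instead of being read back from the client. The names FetchResult, ResponseScopedClient, FakeClient and fetchAll are hypothetical and are not part of Any23's API; FakeClient fabricates responses rather than performing HTTP requests, so the example runs offline.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ResponseScopedClientSketch {

    // Hypothetical immutable result: the metadata travels with the body,
    // so no other request can overwrite it.
    record FetchResult(String requestedIRI, String actualDocumentIRI,
                       String contentType, long contentLength, byte[] body) {}

    // Hypothetical client contract: fetch() returns everything the caller
    // needs, and the client keeps no per-request state between calls.
    interface ResponseScopedClient {
        FetchResult fetch(String iri) throws Exception;
    }

    // Stand-in transport that fabricates a response instead of doing HTTP.
    static final class FakeClient implements ResponseScopedClient {
        @Override
        public FetchResult fetch(String iri) {
            byte[] body = ("<html><body>" + iri + "</body></html>")
                    .getBytes(StandardCharsets.UTF_8);
            // A real client would report the IRI reached after redirects here.
            return new FetchResult(iri, iri, "text/html", body.length, body);
        }
    }

    // Fetches all IRIs in parallel through ONE shared client instance.
    static List<FetchResult> fetchAll(ResponseScopedClient client, List<String> iris) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<FetchResult>> futures = new ArrayList<>();
            for (String iri : iris) {
                futures.add(pool.submit(() -> client.fetch(iri)));
            }
            List<FetchResult> results = new ArrayList<>();
            for (Future<FetchResult> future : futures) {
                results.add(future.get()); // results stay in request order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> iris = List.of("http://example.org/a",
                "http://example.org/b", "http://example.org/c");
        for (FetchResult r : fetchAll(new FakeClient(), iris)) {
            System.out.println(r.requestedIRI() + " -> " + r.actualDocumentIRI()
                    + " (" + r.contentType() + ", " + r.contentLength() + " bytes)");
        }
    }
}
```

      Because each FetchResult is created inside the call that produced it, the document IRI, content type and length cannot be replaced by a concurrent request, which is what would let a single pooled client be shared safely across threads.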

      Background: I ran into this problem while trying to parallelize the online microdata tests (cf. ANY23-67) for speed, using a single Any23 instance to extract from multiple pages simultaneously. Usually the tests passed, but they sporadically failed because the document IRI did not match the page the triples were extracted from. I had to work around this by using a different Any23 instance (and thus a different HTTP client) for every single request.
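
      The failure mode and the workaround can be sketched as follows, under stated assumptions. StatefulClient is a hypothetical class that only mimics the shape of the problem (metadata about the last request is stored on the client and read back afterwards); it is not Any23's HTTPClient and performs no network I/O. interleavedRead() replays on one thread the interleaving that makes such tests fail sporadically, and fetchWithFreshClients() shows the one-client-per-request workaround.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class PerRequestClientWorkaround {

    // Hypothetical stand-in that mimics the problem's shape: metadata about
    // the last request is stored on the client and read back afterwards.
    static final class StatefulClient {
        private String actualDocumentIRI;

        String open(String iri) {
            actualDocumentIRI = iri;           // overwritten by every call
            return "<html>" + iri + "</html>"; // fake document body
        }

        String getActualDocumentIRI() {
            return actualDocumentIRI;
        }
    }

    // Replays the failing interleaving deterministically on one thread.
    static String interleavedRead() {
        StatefulClient shared = new StatefulClient();
        shared.open("http://example.org/a");  // task A opens its page
        shared.open("http://example.org/b");  // task B opens before A reads metadata
        return shared.getActualDocumentIRI(); // A now sees B's IRI
    }

    // Workaround: every task builds its own client, so nothing is shared.
    static List<String> fetchWithFreshClients(Supplier<StatefulClient> factory,
                                              List<String> iris) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String iri : iris) {
                futures.add(pool.submit(() -> {
                    StatefulClient client = factory.get(); // one client per request
                    client.open(iri);
                    return client.getActualDocumentIRI();
                }));
            }
            List<String> seen = new ArrayList<>();
            for (Future<String> future : futures) {
                seen.add(future.get());
            }
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared client, task A sees: " + interleavedRead());
        List<String> iris = List.of("http://example.org/a", "http://example.org/b");
        System.out.println("fresh clients see: "
                + fetchWithFreshClients(StatefulClient::new, iris));
    }
}
```

      The workaround is correct but likely costly in a real setup: if each client owns its own connection pool, creating one per request gives up the connection reuse that a shared PoolingHttpClientConnectionManager is meant to provide.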

          People

            Assignee: Unassigned
            Reporter: Hans Brende (hansbrende)
            Votes: 0
            Watchers: 1