  1. Apache Any23 (Retired)
  2. ANY23-336

Parsing json-ld content takes a prohibitively long time

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: core, extractors
    • Labels: None

    Description

      Using the page https://www.guthriegreen.com as a benchmark, fetching the page took about 100 ms, while simply parsing the json-ld content on that page took a staggering 27,400 ms. For reference, I'm using Java 8, build 162, on a MacBook Pro (early 2015).

      The bad news is that this is not our fault.

      I've profiled this behavior down to the com.github.jsonldjava.utils.JsonUtils.fromURL(URL, CloseableHttpClient) method, where 94% of the parsing time is spent. This method is called whenever a remote json-ld context needs to be loaded.

      In order to avoid loading remote contexts repeatedly, this function tries to cache them by using a CachingHttpClient from the httpclient-osgi library.

      Unfortunately, that strategy is not working: I have recorded exactly zero cache hits, so every retrieval is a cache miss and each remote context is re-fetched over HTTP every single time it's accessed.
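
      To illustrate the behavior a working cache would show, here is a minimal sketch of memoizing remote contexts by URL while counting hits and misses. It is not the jsonld-java implementation and not the fix applied to this issue: the class MemoizingContextLoader and its fetcher parameter are hypothetical names, and the fetcher stands in for whatever performs the real HTTP retrieval.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper (not part of jsonld-java): memoizes context documents by URL. */
final class MemoizingContextLoader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // assumed to perform the slow remote retrieval
    private final AtomicInteger hits = new AtomicInteger();
    private final AtomicInteger misses = new AtomicInteger();

    MemoizingContextLoader(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the context document for the URL, fetching it only if it is not cached yet. */
    String load(String url) {
        String cached = cache.get(url);
        if (cached != null) {
            hits.incrementAndGet();
            return cached;
        }
        misses.incrementAndGet();
        // ConcurrentHashMap.computeIfAbsent invokes the fetcher at most once per absent key.
        return cache.computeIfAbsent(url, fetcher);
    }

    int hits() { return hits.get(); }

    int misses() { return misses.get(); }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        MemoizingContextLoader loader = new MemoizingContextLoader(url -> {
            remoteCalls.incrementAndGet();   // counts simulated network round trips
            return "{\"@context\": {}}";     // stand-in for an HTTP response body
        });
        for (int i = 0; i < 5; i++) {
            loader.load("http://schema.org");
        }
        System.out.println("remote fetches=" + remoteCalls.get()
                + " hits=" + loader.hits() + " misses=" + loader.misses());
    }
}
```

      With a cache that works, five loads of the same context URL trigger a single remote fetch. The zero-hit behavior reported above corresponds to the fetcher running on every one of those loads.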

       

      Attachments

        1. Screen Shot 2018-03-27 at 2.52.15 PM.png
          257 kB
          Hans Brende
        2. Screen Shot 2018-03-27 at 2.54.43 PM.png
          231 kB
          Hans Brende

            People

              Assignee: Hans Brende
              Reporter: Hans Brende
              Votes: 0
              Watchers: 5