PredictionIO (Retired) / PIO-102

ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0-incubating
    • Fix Version/s: 0.12.0-incubating
    • Component/s: Core
    • Labels: None

    Description

      Using the new Elasticsearch 5.x REST storage client as the meta storage source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` set in conf/pio-env.sh), I found that once an engine has been trained a certain number of times, the most recent engine instance is no longer retrieved. I tracked down where those Elasticsearch queries originate.

      In the original Elasticsearch 1.x storage client, the "scroll" pagination responses are collected by appending them to one another.

      In the new Elasticsearch 5.x client, the "scroll" responses are collected by prepending them to one another.

      This out-of-order concatenation breaks ESEngineInstances `getLatestCompleted` by erroneously replacing the head of the results with an older engine instance, when there are enough engine instances to overflow a single page of Elasticsearch hits.

      I've observed this buggy behavior after ten trainings, when enough engine instances are stored to trigger Elasticsearch's scroll feature.
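
      To make the ordering problem concrete, here is a minimal sketch in Python (not the actual PredictionIO Scala code). The page contents, the page size of 2, and the function names `collect_appending` and `collect_prepending` are hypothetical, chosen only to illustrate how prepending scroll pages changes the head of the combined results.

```python
from typing import List

# Hypothetical scroll pages as Elasticsearch might return them,
# sorted newest first, with an assumed page size of 2 for brevity.
pages: List[List[str]] = [
    ["instance-5", "instance-4"],
    ["instance-3", "instance-2"],
    ["instance-1"],
]


def collect_appending(pages: List[List[str]]) -> List[str]:
    """Append each page after the results so far (order is preserved)."""
    results: List[str] = []
    for page in pages:
        results = results + page
    return results


def collect_prepending(pages: List[List[str]]) -> List[str]:
    """Put each page in front of the results so far (page order is reversed)."""
    results: List[str] = []
    for page in pages:
        results = page + results
    return results


appended = collect_appending(pages)
prepended = collect_prepending(pages)

print(appended[0])   # head when pages are appended
print(prepended[0])  # head when pages are prepended
```

      With a single page both approaches return the same list, which would explain why the problem only shows up once there are enough engine instances to need a second page. With several pages, the head of the prepended list comes from the last page fetched, so a caller that takes the first result as the latest instance gets an older one.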

      Pull request: https://github.com/apache/incubator-predictionio/pull/406

            People

              Assignee: Mars Hall (marsikai)
              Reporter: Mars Hall (marsikai)
              Votes: 0
              Watchers: 2
