Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 0.11.0-incubating
- None
Description
Using the new Elasticsearch 5.x REST storage client as the metadata storage source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` set in conf/pio-env.sh), I found that once an engine has been trained a certain number of times, the most recent engine instance is no longer retrieved. I tracked down where those Elasticsearch queries originate.
In the original Elasticsearch 1.x storage client, the "scroll" pagination responses are collected by appending them to one another.
In the new Elasticsearch 5.x client, the "scroll" responses are collected by prepending them to one another.
This out-of-order concatenation breaks `getLatestCompleted` in ESEngineInstances: when there are enough engine instances to overflow a single page of Elasticsearch hits, the head of the results is erroneously replaced with an older engine instance.
I've observed this buggy behavior after ten trainings, when enough engine instances are stored to trigger Elasticsearch's scroll feature.
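To illustrate the difference between the two collection strategies, here is a minimal, self-contained sketch in Python. It is not the PredictionIO code: the function names, the integer stand-ins for engine instances, and the page size of 10 are all assumptions made for illustration. It only models pages arriving in sorted order and being joined either by appending or by prepending.

```python
from typing import List, Optional

PAGE_SIZE = 10  # assumed page size, chosen to match the ten-training threshold


def fetch_pages(hits: List[int], page_size: int = PAGE_SIZE) -> List[List[int]]:
    """Hypothetical stand-in for scroll: split sorted hits into pages."""
    return [hits[i:i + page_size] for i in range(0, len(hits), page_size)]


def collect_appending(pages: List[List[int]]) -> List[int]:
    """Join pages the way the 1.x client is described: each page goes at the end."""
    results: List[int] = []
    for page in pages:
        results = results + page
    return results


def collect_prepending(pages: List[List[int]]) -> List[int]:
    """Join pages the way the 5.x client is described: each page goes at the front."""
    results: List[int] = []
    for page in pages:
        results = page + results
    return results


def latest_completed(results: List[int]) -> Optional[int]:
    """Hypothetical model of taking the head of newest-first results."""
    return results[0] if results else None


# Eleven engine instances, newest first, identified by training number.
instances = list(range(11, 0, -1))
pages = fetch_pages(instances)  # two pages: ten hits, then one hit

print(latest_completed(collect_appending(pages)))   # newest instance
print(latest_completed(collect_prepending(pages)))  # an older instance
```

With a single page both strategies return the same head, which is why the problem only shows up once the results spill onto a second page.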
Pull request: https://github.com/apache/incubator-predictionio/pull/406