[PIO-102] ESEngineInstances `getAll` results out of order (Elasticsearch 5.x) - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.11.0-incubating
Fix Version/s: 0.12.0-incubating
Component/s: Core
Labels: None
Target Version/s: 0.12.0-incubating
External issue URL: https://github.com/apache/incubator-predictionio/pull/406

Description

Using the new Elasticsearch 5.x REST storage client as the meta storage source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` set in conf/pio-env.sh), I found that once an engine has been trained a certain number of times, the most recent engine instance is no longer retrieved. I tracked down where those Elasticsearch queries originate.

In the original Elasticsearch 1.x storage client, the "scroll" pagination responses are collected by appending them to one another.

In the new Elasticsearch 5.x client, the "scroll" responses are collected by prepending them to one another.

This out-of-order concatenation breaks ESEngineInstances `getLatestCompleted`: when there are enough engine instances to overflow a single page of Elasticsearch hits, the head of the results is erroneously replaced with an older engine instance.

I've observed this buggy behavior after ten trainings, when enough engine instances are stored to trigger Elasticsearch's scroll feature.
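To illustrate the failure mode, here is a minimal, language-neutral sketch in Python (not the project's actual Scala code). It assumes a page size of 10 hits, that the query returns instances sorted newest first, and that the "latest" instance is taken from the head of the collected results. All function names are hypothetical, and instances are represented by plain integers where a larger number means a more recent training.

```python
from typing import Iterable, Iterator, List, Optional

PAGE_SIZE = 10  # assumption: hits returned per scroll page


def scroll_pages(hits: List[int], page_size: int = PAGE_SIZE) -> Iterator[List[int]]:
    """Yield hits page by page, in the order a scroll would return them."""
    for start in range(0, len(hits), page_size):
        yield hits[start:start + page_size]


def collect_appending(pages: Iterable[List[int]]) -> List[int]:
    """Accumulate pages by appending: preserves the sort order."""
    results: List[int] = []
    for page in pages:
        results = results + page
    return results


def collect_prepending(pages: Iterable[List[int]]) -> List[int]:
    """Accumulate pages by prepending: later pages land in front."""
    results: List[int] = []
    for page in pages:
        results = page + results
    return results


def latest(results: List[int]) -> Optional[int]:
    """Hypothetical stand-in for taking the head of the sorted results."""
    return results[0] if results else None


def newest_first(count: int) -> List[int]:
    """Instances numbered 1..count, sorted with the most recent first."""
    return list(range(count, 0, -1))


# Ten instances fit in one page, so the order of concatenation does not matter.
print(latest(collect_prepending(scroll_pages(newest_first(10)))))  # 10

# Eleven instances spill onto a second page; prepending puts the oldest first.
print(latest(collect_prepending(scroll_pages(newest_first(11)))))  # 1
print(latest(collect_appending(scroll_pages(newest_first(11)))))   # 11
```

With a single page both strategies agree, which is consistent with the problem only appearing once the results overflow one page of hits.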

Pull request: https://github.com/apache/incubator-predictionio/pull/406

Issue Links

links to GitHub Pull Request #406

People

Assignee: Mars Hall

Reporter: Mars Hall

Votes: 0

Watchers: 2

Dates

Created: 08/Jul/17 21:00

Updated: 05/Dec/17 22:44

Resolved: 28/Jul/17 18:56