Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Description
- The `FsStateStore` creates and updates a `current.jst` file to track the most recent version of the job state
- For AWS users, the state-store typically has to be put on S3
- However, S3 only provides eventual consistency when an existing object is overwritten
- This can cause problems because a Gobblin job may read a stale `current.jst` and therefore an old version of the job state
A simple solution to this problem would be to:
- Remove the concept of `current.jst` and let each state-store entry be named `job_id.jst`
- Gobblin then does an `ls` on the state-store directory, sorts the contents by file name, and picks the most recent one
- Listing and sorting the directory shouldn't take long, but to keep the listing small, the state-store retention job should be run as part of the core Gobblin job, either in the `ApplicationLauncher` or the `JobLauncher`
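The proposed lookup and retention steps can be sketched in Java as follows. This is a minimal illustration, not Gobblin code: the class and method names are hypothetical, and the file names are passed in as plain strings rather than listed from S3 (in Gobblin they would come from listing the state-store directory, e.g. via Hadoop's `FileSystem#listStatus`). It assumes the directory holds entries for a single job and that job IDs sort chronologically by name, e.g. `job_MyJob_1458878066000` ending in a fixed-width millisecond timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: pick the newest <job_id>.jst by file name instead of reading current.jst.
 * Assumes all entries belong to one job and that job IDs sort lexicographically in time order.
 */
public final class JobStateListing {
    static final String SUFFIX = ".jst";

    private JobStateListing() {}

    /** Returns only the .jst entries, sorted oldest-first by name. */
    static List<String> sortedEntries(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> name.endsWith(SUFFIX))
                .sorted()
                .collect(Collectors.toList());
    }

    /** Returns the most recent entry, or empty if there are no state files. */
    static Optional<String> latest(List<String> fileNames) {
        List<String> sorted = sortedEntries(fileNames);
        return sorted.isEmpty() ? Optional.empty() : Optional.of(sorted.get(sorted.size() - 1));
    }

    /**
     * Retention: returns the entries to delete so that only the newest {@code keep} remain.
     * Requiring keep >= 1 guarantees the newest entry is never deleted.
     */
    static List<String> toDelete(List<String> fileNames, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        List<String> sorted = sortedEntries(fileNames);
        int excess = Math.max(0, sorted.size() - keep);
        return new ArrayList<>(sorted.subList(0, excess));
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "job_MyJob_1458878066000.jst",
                "job_MyJob_1458878166000.jst",
                "job_MyJob_1458878266000.jst",
                "_SUCCESS");
        System.out.println(latest(listing).orElse("<none>")); // prints job_MyJob_1458878266000.jst
        System.out.println(toDelete(listing, 2));             // prints [job_MyJob_1458878066000.jst]
    }
}
```

Because every run writes a new key instead of overwriting `current.jst`, the read no longer relies on S3's behaviour for overwrites; running `toDelete` from the launcher keeps the directory, and therefore the listing, bounded.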
Github Url : https://github.com/linkedin/gobblin/issues/882
Github Reporter : stakiar
Github Created At : 2016-03-25T03:54:26Z
Github Updated At : 2017-01-12T04:50:48Z
Comments
stakiar wrote on 2016-03-25T03:55:30Z : @zliu41 I believe we discussed this briefly while working on #741, any comments on the above approach?
Github Url : https://github.com/linkedin/gobblin/issues/882#issuecomment-201126060
zliu41 wrote on 2016-03-25T15:35:35Z : LGTM, except that if a job has multiple datasets there will be multiple `current.jst`s, so you'll need to find the most recent one for each dataset URN.
Github Url : https://github.com/linkedin/gobblin/issues/882#issuecomment-201334649
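For the multiple-dataset case, the same idea can be applied per dataset URN. The sketch below assumes a hypothetical naming scheme `<datasetUrn>-<jobId>.jst` in which the job ID contains no `-` (the URN may) and sorts chronologically; neither the scheme nor the class name comes from Gobblin.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: newest state file per dataset URN.
 * Assumed naming: <datasetUrn>-<jobId>.jst, where jobId has no '-' and sorts in time order.
 */
public final class DatasetStateListing {
    static final String SUFFIX = ".jst";

    private DatasetStateListing() {}

    /** Maps each dataset URN to its newest state file name. */
    static Map<String, String> latestPerDataset(List<String> fileNames) {
        Map<String, String> latest = new TreeMap<>();
        for (String name : fileNames) {
            if (!name.endsWith(SUFFIX)) {
                continue; // skip non-state files such as _SUCCESS
            }
            String base = name.substring(0, name.length() - SUFFIX.length());
            int sep = base.lastIndexOf('-');
            if (sep <= 0) {
                continue; // not in the assumed <datasetUrn>-<jobId> form
            }
            String urn = base.substring(0, sep);
            // Names for the same URN share the prefix "<urn>-", so comparing whole names
            // compares job IDs; keep the greater (newer) one.
            latest.merge(urn, name, (current, candidate) ->
                    current.compareTo(candidate) >= 0 ? current : candidate);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "events-job_MyJob_1458878066000.jst",
                "events-job_MyJob_1458878166000.jst",
                "page-views-job_MyJob_1458878066000.jst");
        System.out.println(latestPerDataset(listing));
    }
}
```

Splitting on the last `-` lets URNs such as `page-views` contain dashes, at the cost of forbidding them in job IDs under this assumed scheme.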
jbaranick wrote on 2016-04-12T02:58:38Z : I've started working on this.
Github Url : https://github.com/linkedin/gobblin/issues/882#issuecomment-208682515
lakshmanantokbox wrote on 2016-04-22T01:23:37Z : If consistency is turned on in EMR (“consistent view” for EMRFS, https://blogs.aws.amazon.com/bigdata/post/Tx1WL4KR7SE37YY/Ensuring-Consistency-When-Using-Amazon-S3-and-Amazon-Elastic-MapReduce-for-ETL-W), this problem can be avoided.
Github Url : https://github.com/linkedin/gobblin/issues/882#issuecomment-213198627
jbaranick wrote on 2016-04-22T01:48:32Z : Correct, but for those who use Qubole, this is not the case.
Github Url : https://github.com/linkedin/gobblin/issues/882#issuecomment-213207376