IMPALA-4829

Change default Kudu read behavior for "RYW"

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      For 2.9 we want to change the default Kudu read mode, which was previously exposed via an Impala gflag.

      Currently the default read mode is "READ_LATEST", which provides essentially no guarantees: each read returns the latest value that the target replica happens to have, which is not necessarily as recent as a previous write in the same session. By changing the read mode to the misleadingly named "READ_AT_SNAPSHOT", we can ensure that all Kudu reads are performed at a timestamp at or after the latest "observed" timestamp (which Impala already sets on the client). Note that this does not mean all reads are performed at the same timestamp (i.e. a snapshot read). That would require setting an explicit snapshot timestamp, which needs more work in both Impala (IMPALA-4685) and Kudu (which needs to make some client changes and also fix how it garbage-collects historical values).

      This means that, after this change, values written within a session will always be visible to subsequent reads in that session. Before this change, this was usually the case but not guaranteed. The Kudu team calls this guarantee "Read Your Writes" (RYW).
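
The difference between the two read modes can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Kudu client API: `Replica`, `Session`, `apply_until` and the integer timestamps are hypothetical names invented for illustration, and "waiting for the replica to catch up" is simulated by simply applying the pending writes.

```python
from enum import Enum


class ReadMode(Enum):
    READ_LATEST = "READ_LATEST"
    READ_AT_SNAPSHOT = "READ_AT_SNAPSHOT"


class Replica:
    """Toy replica that applies replicated writes lazily (hypothetical)."""

    def __init__(self):
        self.data = {}       # key -> value, as of applied_ts
        self.applied_ts = 0  # highest write timestamp applied so far
        self.pending = []    # (ts, key, value) received but not yet applied

    def receive(self, ts, key, value):
        self.pending.append((ts, key, value))

    def apply_until(self, ts):
        """Stand-in for 'wait until this replica has caught up to ts'."""
        for w_ts, key, value in sorted(self.pending, key=lambda w: w[0]):
            if w_ts <= ts:
                self.data[key] = value
                self.applied_ts = max(self.applied_ts, w_ts)
        self.pending = [w for w in self.pending if w[0] > ts]

    def read(self, key, mode, observed_ts):
        if mode is ReadMode.READ_AT_SNAPSHOT:
            # No explicit snapshot timestamp is set, so the read time is
            # only bounded below by what the client has already observed.
            self.apply_until(observed_ts)
        # READ_LATEST returns whatever the replica currently has.
        return self.data.get(key), self.applied_ts


class Session:
    """Toy client session that tracks the latest observed timestamp."""

    def __init__(self, replica):
        self.replica = replica
        self.clock = 0
        self.latest_observed_ts = 0

    def write(self, key, value):
        self.clock += 1
        self.replica.receive(self.clock, key, value)
        self.latest_observed_ts = self.clock

    def read(self, key, mode):
        return self.replica.read(key, mode, self.latest_observed_ts)


if __name__ == "__main__":
    session = Session(Replica())
    session.write("k", "v1")
    print(session.read("k", ReadMode.READ_LATEST))       # (None, 0)
    print(session.read("k", ReadMode.READ_AT_SNAPSHOT))  # ('v1', 1)
    session.write("k", "v2")
    print(session.read("k", ReadMode.READ_AT_SNAPSHOT))  # ('v2', 2)
```

In this model the "READ_LATEST" read misses the session's own write because the replica has not applied it yet, while both "READ_AT_SNAPSHOT" reads see it. The two "READ_AT_SNAPSHOT" reads also run at different timestamps (1 and 2), which is why the mode gives Read Your Writes but not a true snapshot read.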

        Activity

        mjacobs Matthew Jacobs added a comment -

        commit 32ff959814646458a34278500bd01fc7741951ce
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Thu Jan 26 12:56:19 2017 -0800

        IMPALA-4829: Change default Kudu read behavior for "RYW"

        Currently the default Kudu read mode is "READ_LATEST", which
        provides essentially no guarantees: each read returns the
        latest value that the target replica happens to have, which
        is not necessarily as recent as a previous write in the same
        session. By changing the read mode to the misleadingly named
        "READ_AT_SNAPSHOT", we can ensure that all Kudu reads are
        performed at a timestamp at or after the latest "observed"
        timestamp (which Impala already sets on the client). Note
        that this does not mean all reads are performed at the same
        timestamp (i.e. a snapshot read). That would require setting
        an explicit snapshot timestamp, which needs more work in
        both Impala and (mostly) Kudu.

        This means that, after this change, values written within a
        session will always be visible to subsequent reads. Before
        this change, this was usually the case but not guaranteed.
        The Kudu team calls this "Read Your Writes".

        Testing: Private test run, running an exhaustive job now.
        This is otherwise difficult to validate in new tests. This
        has plenty of time to bake for 2.9 in case we discover
        performance or functional issues.

        Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e
        Reviewed-on: http://gerrit.cloudera.org:8080/5802
        Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
        Reviewed-by: Dan Hecht <dhecht@cloudera.com>
        Tested-by: Impala Public Jenkins


          People

          • Assignee: Matthew Jacobs (mjacobs)
          • Reporter: Matthew Jacobs (mjacobs)
          • Votes: 0
          • Watchers: 3
