IMPALA-4907

Unable to open scanner: Timed out errors when running COMPUTE STATS on Kudu-related tables

    Details

      Description

      I have noticed that when loading our test data onto a cdh5-trunk cluster, there are frequent errors when we run compute stats on Kudu-related tables, but these errors don't appear on earlier versions (e.g., with CDH 5.10). They also don't appear in mini-cluster tests.

      compute-table-stats.log from cdh5-trunk test run:

      Executing: compute stats functional_kudu.alltypestiny
        -> Error: ImpalaBeeswaxException:
       Query aborted:
      Unable to open scanner: Timed out: unable to retry before timeout: Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 465216 to be safe (mode: NON-LEADER). Current safe time: L: 390220 Physical time difference: None (Logical clock): Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 431554 to be safe (mode: NON-LEADER). Current safe time: L: 367307 Physical time difference: None (Logical clock)
      
      Executing: compute stats functional_kudu.jointbl
        -> Updated 1 partition(s) and 4 column(s).
      
      Executing: compute stats functional_kudu.emptytable
        -> Error: ImpalaBeeswaxException:
       Query aborted:
      Unable to open scanner: Timed out: Timed out waiting for ts: L: 501168 to be safe (mode: NON-LEADER). Current safe time: L: 449249 Physical time difference: None (Logical clock)
      
      Executing: compute stats functional_kudu.nulltable
        -> Error: ImpalaBeeswaxException:
       Query aborted:
      Unable to open scanner: Timed out: unable to retry before timeout: Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 535495 to be safe (mode: NON-LEADER). Current safe time: L: 438046 Physical time difference: None (Logical clock): Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 501998 to be safe (mode: NON-LEADER). Current safe time: L: 415262 Physical time difference: None (Logical clock)
      
      Executing: compute stats functional_kudu.dimtbl
        -> Error: ImpalaBeeswaxException:
       Query aborted:
      Unable to open scanner: Timed out: unable to retry before timeout: Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 570160 to be safe (mode: NON-LEADER). Current safe time: L: 461598 Physical time difference: None (Logical clock): Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 536646 to be safe (mode: NON-LEADER). Current safe time: L: 438814 Physical time difference: None (Logical clock)
      
      Executing: compute stats functional_kudu.alltypessmall
        -> Error: ImpalaBeeswaxException:
       Query aborted:
      Unable to open scanner: Timed out: unable to retry before timeout: Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 603727 to be safe (mode: NON-LEADER). Current safe time: L: 484424 Physical time difference: None (Logical clock): Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: L: 570916 to be safe (mode: NON-LEADER). Current safe time: L: 462086 Physical time difference: None (Logical clock)
      
      Executing: compute stats functional_kudu.alltypesagg_idx
        -> Updated 1 partition(s) and 15 column(s).
      
      Executing: compute stats functional_kudu.alltypesaggnonulls
        -> Updated 1 partition(s) and 14 column(s).
      
      Executing: compute stats functional_kudu.tinytable
        -> Updated 1 partition(s) and 2 column(s).
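
The failures above share one pattern: the scan asks a non-leader replica for a snapshot timestamp that is ahead of that replica's current safe time. As a rough illustration, the sketch below pulls those two logical-clock values out of each error line and reports the gap between them. The function name `summarize_failures` and both regular expressions are assumptions written against the log text quoted here; they are not part of any Impala or Kudu tooling.

```python
import re

# Matches the "Executing: compute stats <table>" lines in the log excerpt.
STMT_RE = re.compile(r"Executing: compute stats (\S+)")
# Matches each "waiting for ts ... Current safe time ..." pair in an error line.
WAIT_RE = re.compile(
    r"Timed out waiting for ts: L: (\d+) to be safe \(mode: ([A-Z-]+)\)\. "
    r"Current safe time: L: (\d+)"
)


def summarize_failures(log_text):
    """Return {table: [(requested_ts, safe_ts, gap), ...]} for failed tables."""
    failures = {}
    table = None
    for line in log_text.splitlines():
        stmt = STMT_RE.search(line)
        if stmt:
            # Remember which table the following error lines belong to.
            table = stmt.group(1)
            continue
        # One error line can contain several wait/safe-time pairs.
        for requested, _mode, safe in WAIT_RE.findall(line):
            gap = int(requested) - int(safe)
            failures.setdefault(table, []).append((int(requested), int(safe), gap))
    return failures


if __name__ == "__main__":
    sample = (
        "Executing: compute stats functional_kudu.emptytable\n"
        "  -> Error: ImpalaBeeswaxException:\n"
        " Query aborted:\n"
        "Unable to open scanner: Timed out: Timed out waiting for ts: L: 501168 "
        "to be safe (mode: NON-LEADER). Current safe time: L: 449249 "
        "Physical time difference: None (Logical clock)\n"
        "\n"
        "Executing: compute stats functional_kudu.jointbl\n"
        "  -> Updated 1 partition(s) and 4 column(s).\n"
    )
    for name, waits in summarize_failures(sample).items():
        for requested, safe, gap in waits:
            print(f"{name}: requested={requested} safe={safe} gap={gap}")
```

Tables that completed successfully do not appear in the result, so the output lists only the scans that timed out and how far behind the replica was in each case.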
      

      compute-table-stats.log from CDH 5.10 test run:

      Executing: compute stats functional_kudu.alltypestiny
        -> Updated 1 partition(s) and 13 column(s).
      
      Executing: compute stats functional_kudu.jointbl
        -> Updated 1 partition(s) and 4 column(s).
      
      Executing: compute stats functional_kudu.emptytable
        -> Updated 1 partition(s) and 2 column(s).
      
      Executing: compute stats functional_kudu.nulltable
        -> Updated 1 partition(s) and 7 column(s).
      
      Executing: compute stats functional_kudu.dimtbl
        -> Updated 1 partition(s) and 3 column(s).
      
      Executing: compute stats functional_kudu.alltypessmall
        -> Updated 1 partition(s) and 13 column(s).
      
      Executing: compute stats functional_kudu.alltypesagg_idx
        -> Updated 1 partition(s) and 15 column(s).
      
      Executing: compute stats functional_kudu.alltypesaggnonulls
        -> Updated 1 partition(s) and 14 column(s).
      
      Executing: compute stats functional_kudu.tinytable
        -> Updated 1 partition(s) and 2 column(s).
      

        Activity

        dknupp David Knupp added a comment -

        Matthew Jacobs, assigned to you for the obvious reason. Please reassign as appropriate.

        jbapple Jim Apple added a comment -

        Can you clarify which Apache Impala git hashes correspond to cdh5-trunk and CDH5.10?

        dknupp David Knupp added a comment -

        Good question. These are the two specific commits I looked at.

        CDH5.10 = https://github.com/apache/incubator-impala/commit/b3636c97d4b872e1640955974409c57459d655e0
        CDH5-trunk = https://github.com/apache/incubator-impala/commit/933f2ce7fd17ebcee8c150f93ff4488f746232dd

        jyu@cloudera.com Juan Yu added a comment -

        This seems to be caused by the default value of kudu_read_mode changing from READ_LATEST to READ_AT_SNAPSHOT.
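
To make the difference between the two read modes concrete, here is a toy model in Python. It is not Kudu's implementation: the `Replica` class, the `open_scanner` function, and the tick-based timeout are hypothetical names invented for illustration. It only mirrors what the error text suggests, namely that a READ_AT_SNAPSHOT scan on a replica waits until the replica's safe time reaches the requested timestamp, while a READ_LATEST scan does not wait.

```python
class ScanTimedOut(Exception):
    """Raised when the replica's safe time never reaches the snapshot timestamp."""


class Replica:
    """Toy replica whose safe time advances by a fixed step per tick (hypothetical)."""

    def __init__(self, safe_time, advance_per_tick):
        self.safe_time = safe_time
        self.advance_per_tick = advance_per_tick

    def tick(self):
        self.safe_time += self.advance_per_tick


def open_scanner(replica, read_mode, snapshot_ts=None, timeout_ticks=10):
    """Return the timestamp the toy scan reads at, or raise ScanTimedOut."""
    if read_mode == "READ_LATEST":
        # No snapshot guarantee: read whatever the replica currently has.
        return replica.safe_time
    if read_mode != "READ_AT_SNAPSHOT":
        raise ValueError(f"unsupported read mode: {read_mode}")
    # Snapshot read: wait (up to timeout_ticks) for the replica to catch up.
    for _ in range(timeout_ticks):
        if replica.safe_time >= snapshot_ts:
            return snapshot_ts
        replica.tick()
    if replica.safe_time >= snapshot_ts:
        return snapshot_ts
    raise ScanTimedOut(
        f"Timed out waiting for ts: {snapshot_ts} to be safe. "
        f"Current safe time: {replica.safe_time}"
    )


if __name__ == "__main__":
    # A replica that never catches up, using values from the emptytable failure.
    lagging = Replica(safe_time=449249, advance_per_tick=0)
    print(open_scanner(lagging, "READ_LATEST"))
    try:
        open_scanner(lagging, "READ_AT_SNAPSHOT", snapshot_ts=501168)
    except ScanTimedOut as exc:
        print(exc)
```

In this model the same lagging replica serves the READ_LATEST scan immediately but times out the READ_AT_SNAPSHOT scan, which matches the observation that the errors appeared only after the default changed.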

        mmokhtar Mostafa Mokhtar added a comment -

        Bumping priority as this blocks basic functionality.

        dknupp David Knupp added a comment -

        Note that this issue is also being discussed in KUDU-1869.

        After talking to Matthew Jacobs, my understanding is that there is a fix on the Kudu side that is currently in review. In the meantime, there's some consideration of disabling SNAPSHOT_READ, since it is causing failures in Kudu-related tests.

        Matt, please correct me if I'm wrong.

        mjacobs Matthew Jacobs added a comment -

        That is correct. I have a patch out to revert the SNAPSHOT_READ behavior, waiting for a +2: https://gerrit.cloudera.org/#/c/5970/

        jbapple Jim Apple added a comment -

        https://gerrit.cloudera.org/#/c/5970/ is in. Should we Resolve this issue?

        mjacobs Matthew Jacobs added a comment -

        Commit bd1d445b37f3cfc56ff868a678caf161b29a9d92 reverts the RYW (read-your-writes) default in Impala, but let's keep this open to track getting a better fix from Kudu. We'd like to re-revert this before 2.9 if possible.

        jbapple Jim Apple added a comment -

        SG. If this is an "if possible", do you think it still deserves the "blocker" priority?

        mjacobs Matthew Jacobs added a comment -

        We have RYW disabled by default for now. We will turn it back on when we get a better solution for KUDU-1869.

        mjacobs Matthew Jacobs added a comment -

        Filed https://issues.cloudera.org/browse/IMPALA-4972 to track re-enabling RYW


          People

          • Assignee:
            mjacobs Matthew Jacobs
            Reporter:
            dknupp David Knupp