  1. Apache Trafodion (Retired)
  2. TRAFODION-2455

Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from estimator, fails with timeouts by doing select count (*)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1-incubating
    • Fix Version/s: 2.1-incubating
    • Component/s: sql-cmp
    • Labels: None
    • Environment: A cluster large enough to host a 22 billion row table

    Description

      When loading a scale factor 73728 Order Entry database, if UPDATE STATISTICS is done soon after the load on one particular table (the largest table, having 22 billion rows), we get the following failure:

      SQLEXCEPTION on Statement, Error Code = -9200
      update statistics for table trafodion.javabench.oe_orderline_73728 on every column, (OL_W_ID, OL_I_ID), (OL_D_ID, OL_W_ID), (OL_D_ID, OL_I_ID) sample

          • ERROR[9200] UPDATE STATISTICS for table TRAFODION.JAVABENCH.OE_ORDERLINE_73728 encountered an error (8448) from statement getRow(). [2017-01-09 02:07:22]
          • ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::coProcAggr returned error HBASE_ACCESS_ERROR(-706). Cause: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=3, exceptions:
            Mon Jan 09 01:47:21 PST 2017, RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=73, waitTime=600001, operationTimeout=600000 expired.
            Mon Jan 09 01:57:21 PST 2017, RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=185, waitTime=600001, operationTimeout=600000 expired.
            Mon Jan 09 02:07:22 PST 2017, RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=310, waitTime=600001, operationTimeout=600000 expired.

      A subsequent update statistics command succeeds, but these failures take a half hour or more.
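
      The "half hour or more" lines up with the retry settings visible in the error output: three attempts, each running into the 600000 ms operationTimeout. The sketch below is only a back-of-the-envelope check using values copied from the log. It ignores the retry pause, and it assumes the log timestamps are PST (UTC-8), as printed.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the RpcRetryingCaller lines in the error output.
GLOBAL_START_MS = 1483954641419   # globalStartTime (epoch milliseconds)
OPERATION_TIMEOUT_MS = 600000     # operationTimeout per call
ATTEMPTS = 3                      # "Failed after attempts=3"

# The log prints timestamps in PST, which is UTC-8.
PST = timezone(timedelta(hours=-8), "PST")


def attempt_deadlines(start_ms, timeout_ms, attempts):
    """Return when each attempt would time out, ignoring pauses between retries."""
    start = datetime.fromtimestamp(start_ms / 1000, tz=PST)
    return [
        start + timedelta(milliseconds=timeout_ms * (i + 1))
        for i in range(attempts)
    ]


deadlines = attempt_deadlines(GLOBAL_START_MS, OPERATION_TIMEOUT_MS, ATTEMPTS)
total_minutes = ATTEMPTS * OPERATION_TIMEOUT_MS / 60000

for d in deadlines:
    print(d.strftime("%a %b %d %H:%M:%S"))
print(f"minimum time lost to timeouts: {total_minutes:.0f} minutes")
```

      The computed deadlines fall within a second of the three timestamps in the error output, so essentially all of the elapsed time is spent waiting for the coprocessor call to time out.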

      Enabling logging for update stats shows that getrowcount returns 0, so update stats assumes the table is small enough to do a select count(*). The plan for this select count(*) (perhaps suffering from the same issue that causes getrowcount to return no estimate) chooses the HBase aggregate coprocessor. The table in question has 22 billion rows, so the coprocessor isn't a good choice, and the query times out. But the real issue is: why can't the table get a rowcount estimate?
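
      The failure mode described above can be pictured with a small sketch. This is not Trafodion's code: the function names, the threshold value, and the strategy labels are all hypothetical, chosen only to show why a returned estimate of 0 is ambiguous (it can mean "empty table" or "no estimate available") and how a check that treats it as a genuine count ends up on the exact count(*) path.

```python
# Hypothetical cutoff below which an exact count(*) is considered cheap.
# The real threshold used by UPDATE STATISTICS is not given in this issue.
SMALL_TABLE_ROWS = 1_000_000


def naive_strategy(estimated_rows: int) -> str:
    """Treat the estimate at face value: a low number means a small table."""
    if estimated_rows < SMALL_TABLE_ROWS:
        return "exact_count"   # would issue select count(*)
    return "sample"


def guarded_strategy(estimated_rows: int) -> str:
    """Illustrative variant: a non-positive estimate is treated as unknown."""
    if estimated_rows <= 0:
        return "estimate_unavailable"   # caller must not assume a small table
    if estimated_rows < SMALL_TABLE_ROWS:
        return "exact_count"
    return "sample"


# An estimator result of 0 for a table that really holds 22 billion rows:
print(naive_strategy(0))
print(guarded_strategy(0))
# A working estimate for the same table:
print(naive_strategy(22_000_000_000))
```

      With the naive check, an estimate of 0 is indistinguishable from a small table, which matches the behavior seen in the log. The guarded variant is only one possible way to separate the two cases and is shown for illustration.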

      Rerunning UPDATE STATS on this table a few hours later succeeds.

            People

              Assignee: Dave Birdsall (dbirdsall)
              Reporter: Dave Birdsall (dbirdsall)