Details

    Description

      Apache Ignite 3, rev. 8d6264af0edb9752e2239f9b686abe5580056862 (Nov 16 2023)

      Benchmark: https://github.com/cmu-db/benchbase/tree/main/src/main/java/com/oltpbenchmark/benchmarks/tpch 

      Setup

      • A single Ignite 3 server node with raft.fsync = false
      • TPC-H with scale factor = 0.1

      Steps

      1. Start an Ignite 3 node
      2. Run benchbase with -s 1 --create=true --load=true --execute=false to create the schema and preload the data
      3. Check the benchbase log to confirm that the data was loaded successfully
      4. Run benchbase with -s 1 --create=false --load=false --execute=true to run the benchmark

      Expected result

      The benchmark finishes after the configured warmup + duration time.

      Actual result

      The benchmark hangs on query Q21:


      SELECT
         cntrycode,
         COUNT(*) AS numcust,
         SUM(c_acctbal) AS totacctbal
      FROM
         (
            SELECT
               SUBSTRING(c_phone FROM 1 FOR 2) AS cntrycode,
               c_acctbal
            FROM
               customer
            WHERE
               SUBSTRING(c_phone FROM 1 FOR 2) IN (?, ?, ?, ?, ?, ?, ?)
               AND c_acctbal >
               (
                   SELECT
                      AVG(c_acctbal)
                   FROM
                      customer
                   WHERE
                      c_acctbal > 0.00
                      AND SUBSTRING(c_phone FROM 1 FOR 2) IN (?, ?, ?, ?, ?, ?, ?)
               )
               AND NOT EXISTS
               (
                   SELECT
                      *
                   FROM
                      orders
                   WHERE
                      o_custkey = c_custkey
               )
         )
         AS custsale
      GROUP BY
         cntrycode
      ORDER BY
         cntrycode
      
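To make the semantics of the hanging query explicit, here is a minimal in-memory sketch in Java of what it computes. It only illustrates the result, not Ignite's execution plan: the CntryCodeQuerySketch class, the Customer/Order/Row records and the sample data are hypothetical, AVG is computed with BigDecimal at an assumed scale of 10, and SUBSTRING(c_phone FROM 1 FOR 2) is modelled with String.substring(0, 2), assuming every phone number has at least two characters. The scalar AVG subquery is uncorrelated and evaluated once, while NOT EXISTS is a correlated anti-join against orders on the customer key.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CntryCodeQuerySketch {
    // Hypothetical in-memory stand-ins for the columns the query reads.
    record Customer(long custKey, String phone, BigDecimal acctBal) {}
    record Order(long orderKey, long custKey) {}
    record Row(String cntryCode, long numCust, BigDecimal totAcctBal) {}

    // SUBSTRING(c_phone FROM 1 FOR 2); assumes the phone has at least two characters.
    static String cntryCode(Customer c) {
        return c.phone().substring(0, 2);
    }

    static Map<String, Row> run(List<Customer> customers, List<Order> orders, Set<String> codes) {
        // TreeMap iterates keys in ascending order, mirroring ORDER BY cntrycode.
        Map<String, Row> result = new TreeMap<>();

        // Uncorrelated scalar subquery: AVG(c_acctbal) over customers with a positive
        // balance and a matching prefix. It is evaluated once for the whole query.
        List<BigDecimal> positive = customers.stream()
                .filter(c -> c.acctBal().signum() > 0 && codes.contains(cntryCode(c)))
                .map(Customer::acctBal)
                .toList();
        if (positive.isEmpty()) {
            return result; // AVG over no rows is NULL, so "c_acctbal > NULL" keeps no rows
        }
        BigDecimal avg = positive.stream()
                .reduce(BigDecimal.ZERO, BigDecimal::add)
                .divide(BigDecimal.valueOf(positive.size()), 10, RoundingMode.HALF_UP);

        // Correlated NOT EXISTS on o_custkey = c_custkey, i.e. an anti-join with orders.
        Set<Long> withOrders = orders.stream().map(Order::custKey).collect(Collectors.toSet());

        for (Customer c : customers) {
            String code = cntryCode(c);
            if (codes.contains(code) && c.acctBal().compareTo(avg) > 0 && !withOrders.contains(c.custKey())) {
                // GROUP BY cntrycode with COUNT(*) and SUM(c_acctbal).
                result.merge(code, new Row(code, 1, c.acctBal()),
                        (a, b) -> new Row(code, a.numCust() + b.numCust(), a.totAcctBal().add(b.totAcctBal())));
            }
        }
        return result;
    }

    // Hypothetical sample data; the codes set plays the role of the seven bind parameters.
    static Map<String, Row> example() {
        List<Customer> customers = List.of(
                new Customer(1, "13-111-111-1111", new BigDecimal("100.00")),
                new Customer(2, "13-222-222-2222", new BigDecimal("400.00")),
                new Customer(3, "31-333-333-3333", new BigDecimal("500.00")),
                new Customer(4, "99-444-444-4444", new BigDecimal("900.00")),
                new Customer(5, "13-555-555-5555", new BigDecimal("-50.00")),
                new Customer(6, "13-666-666-6666", new BigDecimal("350.00")));
        List<Order> orders = List.of(new Order(100, 2));
        return run(customers, orders, Set.of("13", "31"));
    }

    public static void main(String[] args) {
        example().values().forEach(r ->
                System.out.println(r.cntryCode() + " " + r.numCust() + " " + r.totAcctBal()));
    }
}
```

With this sample data the average positive balance over prefixes 13 and 31 is 337.5; customer 2 is above it but has an order, so only customer 6 (prefix 13) and customer 3 (prefix 31) reach the grouping step.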


      Stack trace of the hung client thread:


      "TPCHWorker<000>" #61 prio=5 os_prio=0 cpu=13.83ms elapsed=3064.60s allocated=2668K defined_classes=46 tid=0x00007f9c4d2d73b0 nid=0x2703 waiting on condition  [0x00007f9b662fd000]
         java.lang.Thread.State: WAITING (parking)
      	at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
      	- parking to wait for  <0x00000006347cb998> (a java.util.concurrent.CompletableFuture$Signaller)
      	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211)
      	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864)
      	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.7/ForkJoinPool.java:3463)
      	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3434)
      	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898)
      	at java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072)
      	at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:139)
      	at org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeWithArguments(JdbcPreparedStatement.java:765)
      	at org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:116)
      	at com.oltpbenchmark.benchmarks.tpch.procedures.GenericQuery.run(GenericQuery.java:39)
      	at com.oltpbenchmark.benchmarks.tpch.TPCHWorker.executeWork(TPCHWorker.java:44)
      	at com.oltpbenchmark.api.Worker.doWork(Worker.java:416)
      	at com.oltpbenchmark.api.Worker.run(Worker.java:282)
      	at java.lang.Thread.run(java.base@17.0.7/Thread.java:833)
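The trace shows the worker parked in the no-argument CompletableFuture.get(), called from JdbcStatement.execute0, so the client thread waits for as long as no response arrives. The general Java sketch below contrasts that with the bounded CompletableFuture.get(long, TimeUnit), which throws TimeoutException instead of parking indefinitely. BoundedWaitSketch and its timesOut helper are hypothetical and not part of the Ignite JDBC driver. At the JDBC level the standard way to request such a bound is Statement.setQueryTimeout; whether this Ignite 3 revision honours it is not confirmed here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {
    // Returns true if the future produced no result within timeoutMs milliseconds.
    static boolean timesOut(CompletableFuture<?> future, long timeoutMs) {
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // no result within the bound: the caller can cancel or report instead of hanging
        } catch (ExecutionException e) {
            return false; // the future did complete, just exceptionally
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // Never completed: stands in for a query response that never arrives.
        CompletableFuture<String> pending = new CompletableFuture<>();
        System.out.println("pending future timed out: " + timesOut(pending, 200));

        // Already completed: stands in for a normal query response.
        CompletableFuture<String> done = CompletableFuture.completedFuture("rows");
        System.out.println("completed future timed out: " + timesOut(done, 200));
    }
}
```

Running it reports true for the pending future and false for the completed one. A bounded wait only makes the hang visible to the client; it does not address whatever keeps the query from completing on the server.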

      Logs and configs:


      logs.zip

      Attachments

        1. logs.zip (741 kB, attached by Ivan Artiukhov)


          Activity

            Konstantin Orlov (korlov) added a comment: jooger, mzhuravkov, please review.

            Konstantin Orlov (korlov) added a comment: Merged to main.

            The current state is as follows:

            • The system is not yet ready to handle TPC-H at scale factor 0.1, so I analysed query performance at scale factor 0.01 on a 3-node lab cluster and at scale factor 0.001 locally on a laptop (results of the local runs are attached to the PRs).
            • On the lab cluster (scale factor 0.01), the average query time is under 1 second, with 2 exceptions: the average time of q21 exceeds 1 second (but the query no longer hangs), and q2 hangs completely (I filed a separate ticket for it).

            People

              korlov Konstantin Orlov
              Artukhov Ivan Artiukhov


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m