CASSANDRA-3758

parallel compaction hang (on large rows?)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: Core

      Description

      It is observed that:

      nodetool -h 127.0.0.1 -p 8080 compactionstats
      pending tasks: 1
      compaction type keyspace column family bytes compacted bytes total progress
      Compaction SyncCoreComputedContactNetworks 119739938 0 n/a

      and it is not moving (i.e., the bytes compacted never increase and the bytes total stays 0).

      This is probably going to be difficult to reproduce, as the problem was observed while compacting 15 large sstables (~300 GB total).

      Attaching the thread dumps (along with logs) taken when this happens.
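Whether the counter is really stuck can be checked by sampling the figure twice and comparing; a minimal shell sketch, using the sample output above in place of a live `nodetool` call (the column layout is assumed to match the paste above and may differ between versions):

```shell
# Sample text stands in for: nodetool -h 127.0.0.1 -p 8080 compactionstats
sample='pending tasks: 1
compaction type keyspace column family bytes compacted bytes total progress
Compaction SyncCoreComputedContactNetworks 119739938 0 n/a'

# Pull the "bytes compacted" figure from the data row; two samples taken
# some time apart that show the same value suggest a hung compaction.
bytes_compacted=$(printf '%s\n' "$sample" | awk '/^Compaction/ {print $3}')
echo "bytes compacted: $bytes_compacted"
```

In practice one would run the `nodetool` command itself twice with a sleep in between and diff the extracted values.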

        Activity

        Sylvain Lebresne added a comment -

        Resolving as a dupe of CASSANDRA-3711. Please reopen if you run into this with a version that has the CASSANDRA-3711 fix.

        ruslan.usifov added a comment -

        We use 0.8.10.

        And

        nodetool -h localhost compactionstats
        pending tasks: 1

        The pending task disappears only after a Cassandra restart. This happened only once, without any warnings or exceptions in the logs.

        Jonathan Ellis added a comment -

        Parallel compaction does not exist in 0.8, so you must be seeing something different. Are you on the latest 0.8 release?

        ruslan.usifov added a comment - edited

        It looks like we have the same problem on 0.8. It happens when we run nodetool cleanup after adding capacity to the cluster.

        Sylvain Lebresne added a comment -

        > I assume they are using a version from before CASSANDRA-3711.

        I remembered we had something like that but wasn't able to find the issue. You're probably right that they ran into this, and maybe it has something to do with the hanging too (even though the lack of any error is kind of weird).

        I've committed the total bytes count fix. I wonder if we should close this one on the theory that it is a consequence of CASSANDRA-3711, and just let someone reopen it if they can reproduce with 1.0.7+?

        Jonathan Ellis added a comment -

        > there seems to be tons of CompactionReducer threads, coming for lots of different ParallelCompactionIterable, which would suggest we don't shut down the executor of CompactionReducer correctly. But I don't see why that would happen.

        I assume they are using a version from before CASSANDRA-3711.

        > attaching a patch to fix that problem

        +1

        Sylvain Lebresne added a comment -

        I'm not sure what's going on here. I went back over the parallel compaction code and didn't see any obvious problem. I'm not sure it'll be easy to fix without being able to repro.

        I'm also not completely sure what to make of the provided thread dump. Is that only one giant thread dump? If so, there seems to be tons of CompactionReducer threads, coming for lots of different ParallelCompactionIterable, which would suggest we don't shut down the executor of CompactionReducer correctly. But I don't see why that would happen.

        At the least, what is annoying is that the reporting of the total bytes to compact is buggy for parallel compactions (if it weren't, we could tell more precisely when during the compaction the hanging occurred). So attaching a patch to fix that problem.

        Jackson Chung added a comment -

        Confirmed that disabling multithreaded_compaction allows compaction to finish.
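For reference, the workaround described above amounts to a one-line change in cassandra.yaml (option name as it appeared in configs of that era; the node typically needs a restart for the change to take effect):

```yaml
# cassandra.yaml — disable parallel (multithreaded) compaction as a workaround
multithreaded_compaction: false
```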

        Jackson Chung added a comment -

        PS: I had a problem unzipping it originally, but got around it with gunzip -c [file] > /path/to/file. Just in case.
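The workaround is simply stream-decompressing the archive instead of unpacking it in place; a small self-contained sketch (file paths are made up for illustration):

```shell
# Create a small gzip file standing in for the attached dump,
# then extract it by streaming with gunzip -c rather than unzipping in place.
printf 'thread dump contents\n' | gzip > /tmp/dump.gz
gunzip -c /tmp/dump.gz > /tmp/dump.txt
cat /tmp/dump.txt
```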


          People

          • Assignee: Unassigned
          • Reporter: Jackson Chung
          • Votes: 0
          • Watchers: 0
