[CASSANDRA-12796] Heap exhaustion when rebuilding secondary index over a table with wide partitions - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Urgent
Resolution: Fixed
Fix Version/s: 2.2.9, 3.0.11, 3.10
Component/s: Feature/2i Index, Legacy/Core
Labels: None
Severity: Critical
Since Version: 2.2.7

Description

We have a table with rather wide partitions and a secondary index defined over it. As soon as we try to rebuild the index, we observe exhaustion of the Java heap and an eventual OOM error. After a lengthy investigation we found the culprit, which appears to be the wrong granularity of barrier issuance in method org.apache.cassandra.db.Keyspace.indexRow:

        try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
        {
            Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);

            Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
            while (pager.hasNext())
            {
                ColumnFamily cf = pager.next();
                ColumnFamily cf2 = cf.cloneMeShallow();
                for (Cell cell : cf)
                {
                    if (cfs.indexManager.indexes(cell.name(), indexes))
                        cf2.addColumn(cell);
                }
                cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
            }
        }

Please note that the operation group's granularity is a whole partition of the source table. This poses a problem for tables with wide partitions, because the flush runnable (org.apache.cassandra.db.ColumnFamilyStore.Flush.run()) won't proceed with flushing the secondary index memtable until all operations started before the most recently issued barrier have completed. In our situation the flush runnable waits until the whole wide partition has been indexed into the secondary index memtable before flushing it. This exhausts the heap and eventually causes an OOM error.

After we changed the granularity of barrier issuance in method org.apache.cassandra.db.Keyspace.indexRow to a query page, as opposed to a whole table partition (see https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified), the secondary index rebuild started to work without heap exhaustion.
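
To illustrate why the granularity matters, here is a toy model in plain Java. It is not Cassandra code: ToyMemtable, its flush threshold, and the two indexing helpers are hypothetical names invented for this sketch, and the real OpOrder barrier and Flush logic are considerably more involved. The only assumption carried over from the description above is that a flush cannot proceed while an operation group opened before the barrier is still open.

```java
public class BarrierGranularityDemo {

    /** Toy memtable: buffers cells and may only flush when no operation group is open. */
    static final class ToyMemtable {
        private final int flushThreshold; // hypothetical flush trigger, in cells
        private int buffered;             // cells currently held in memory
        private int peak;                 // highest value 'buffered' ever reached
        private int openGroups;           // operation groups not yet closed

        ToyMemtable(int flushThreshold) {
            this.flushThreshold = flushThreshold;
        }

        void openGroup() {
            openGroups++;
        }

        void closeGroup() {
            openGroups--;
            maybeFlush();
        }

        void write(int cells) {
            buffered += cells;
            peak = Math.max(peak, buffered);
            maybeFlush();
        }

        // A flush is only allowed once every open group has completed.
        private void maybeFlush() {
            if (openGroups == 0 && buffered >= flushThreshold) {
                buffered = 0;
            }
        }

        int peak() {
            return peak;
        }
    }

    /** One operation group spans the whole partition (the behaviour reported above). */
    static int indexPerPartition(int pages, int pageSize, int flushThreshold) {
        ToyMemtable memtable = new ToyMemtable(flushThreshold);
        memtable.openGroup();
        for (int page = 0; page < pages; page++) {
            memtable.write(pageSize);
        }
        memtable.closeGroup();
        return memtable.peak();
    }

    /** One operation group per query page (the granularity of the proposed change). */
    static int indexPerPage(int pages, int pageSize, int flushThreshold) {
        ToyMemtable memtable = new ToyMemtable(flushThreshold);
        for (int page = 0; page < pages; page++) {
            memtable.openGroup();
            memtable.write(pageSize);
            memtable.closeGroup();
        }
        return memtable.peak();
    }

    public static void main(String[] args) {
        // 1000 pages of 100 cells each, flush threshold of 500 cells.
        System.out.println("per-partition peak: " + indexPerPartition(1000, 100, 500)); // 100000
        System.out.println("per-page peak:      " + indexPerPage(1000, 100, 500));      // 500
    }
}
```

In this model the per-partition group forces the whole partition to accumulate in memory before any flush is possible, while the per-page group bounds the peak near the flush threshold regardless of partition width.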

Issue Links

links to

GitHub Pull Request #83

People

Assignee: Sam Tunnicliffe

Reporter: Milan Majercik

Authors: Sam Tunnicliffe

Reviewers: Jeremiah Jordan

Votes: 2

Watchers: 8

Dates

Created: 17/Oct/16 09:10

Updated: 16/Apr/19 09:30

Resolved: 13/Dec/16 10:41