Apache Cassandra / CASSANDRA-9881

Rows with negative-sized keys can't be skipped by sstablescrub

Details

    • Type: Bug
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • Fix Version/s: 2.1.x
    • Component/s: None
    • Labels: None
    • Severity: Low

    Description

      It is possible to have corruption in such a way that scrub (online or offline) can't skip the row, so you end up in a loop where this just keeps repeating:

      WARNING: Row starting at position 2087453 is unreadable; skipping to next 
      Reading row at 2087453 
      row (unreadable key) is -1 bytes
      

      The workaround is just to delete the problem sstable, since you were going to have to repair anyway, but it would still be nice to salvage the rest of the sstable.
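      For reference, the delete-and-repair workaround might look roughly like the sketch below. This is not from the ticket itself: the service name, data path, keyspace/table names, and the sstable generation (`mc-42-big`) are all placeholder assumptions, and the actual corrupt sstable has to be identified first (e.g. from the scrub/compaction log output).

```shell
# Hypothetical sketch of the workaround above; all paths and names are placeholders.
nodetool drain                        # flush memtables and stop accepting writes
sudo systemctl stop cassandra         # service name varies by install
# All component files of one sstable (Data.db, Index.db, Summary.db, ...)
# share the same generation prefix; remove them together.
rm /var/lib/cassandra/data/my_keyspace/my_table-*/mc-42-big-*
sudo systemctl start cassandra
nodetool repair my_keyspace my_table  # re-stream the deleted data from replicas
```

      Note that deleting an sstable out from under a running node is unsafe, hence stopping the node first; the repair afterwards is what makes the data loss recoverable from replicas.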

      Attachments

        Activity

          jbellis Jonathan Ellis added a comment -

          Is this still a problem post 8099? /cc slebresne


          slebresne Sylvain Lebresne added a comment -

          Is this still a problem post 8099?

          I lack information to be certain but I have no reason to think 8099 helps here in any way.

          In fact, my best guess as to why this could happen is that the file is broken in a way that throws while reading the partition key, but without making any progress in the data file (the file pointer isn't moved), and there is no available index file (it is either not present at all, or we've reached the end of it). For instance, maybe the data file has actually been read entirely (and hence the index file is at EOF) but there is something wrong with the file system at the end of the file. In any case, if that happens, we fail reading the key, write the "row (unreadable key) is -1 bytes" message (the negative size is a red herring; it's -1 only because there is no index key to use when calculating the size), and then start again (since we don't have an index entry, we won't seek to a new row but just start reading from the same place).

          So it sounds like a scrub bug really, and we should give up if we can't read the partition and have no index entry to skip it (which, for some reason, we don't do). The open question is what to do in that case: currently, when scrub gives up, it aborts any writing and so doesn't "salvage the rest of the sstable". I suppose the assumption was that since we can't log which keys we skip, we prefer not to skip any, but we probably want to revisit that.
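          The failure mode described above can be illustrated with a minimal sketch. This is a hypothetical stand-in, not the actual org.apache.cassandra.db.compaction.Scrubber code: reading the key throws before the data-file pointer advances, and with no index entry there is nothing to seek to, so every retry restarts at the same offset. The `scrub` method below shows the proposed give-up check.

```java
public class ScrubLoopSketch {
    public static final long BAD_OFFSET = 2087453; // offset from the report above
    public static long position = BAD_OFFSET;      // simulated data-file pointer

    // Reading a key at the corrupt offset throws without moving the pointer.
    public static void readKey() throws java.io.IOException {
        if (position == BAD_OFFSET)
            throw new java.io.IOException("Unable to read partition key from data file");
        position += 100; // a healthy key advances the pointer
    }

    // Returns "read ok" on success, "aborted" when the give-up check fires.
    public static String scrub(Long nextIndexEntry) {
        while (true) {
            long before = position;
            try {
                readKey();
                return "read ok";
            } catch (java.io.IOException e) {
                if (nextIndexEntry != null) {
                    position = nextIndexEntry; // seek past the bad row: real progress
                    nextIndexEntry = null;
                } else if (position == before) {
                    // No index entry and no progress: the pointer will never
                    // move, so retrying loops forever. Give up instead.
                    return "aborted";
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(scrub(null));             // no index entry: must abort
        position = BAD_OFFSET;
        System.out.println(scrub(BAD_OFFSET + 100)); // index entry lets us skip
    }
}
```

          The point of the sketch is only the termination condition: when the catch block can neither seek forward nor observe the pointer moving, continuing the loop is guaranteed to repeat the same read.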


          matkor Mateusz Korniak added a comment -

          I have a similar issue with 2.0.17; scrub loops endlessly:

          WARN [CompactionExecutor:2980] 2015-10-14 10:40:18,214 OutputHandler.java (line 52) Row starting at position 17869559593 is unreadable; skipping to next
          WARN [CompactionExecutor:2980] 2015-10-14 10:40:18,218 OutputHandler.java (line 57) Error reading row (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read row key from data file
          at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:204)
          at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:447)
          at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:436)
          at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:63)
          at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:240)
          at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:222)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)
          Caused by: java.io.IOException: Unable to read row key from data file
          ... 10 more
          WARN [CompactionExecutor:2980] 2015-10-14 10:40:18,218 OutputHandler.java (line 52) Row starting at position 17869559593 is unreadable; skipping to next

          This is close to the end of the sstable:
          Scrub Foo 17869559593 17875049522 bytes 99.97%

          BTW, what does "Resolution: Unresolved / Fix Version/s: 2.1.x" mean? Is it fixed in 2.1? Will upgrading help?

          bevans88 Brent Evans added a comment -

          I'm also experiencing the same issue when performing a scrub, with nodetool scrub looping infinitely over the unreadable row.

          Is there any plan to investigate this further, or is the only way to resolve it to find and delete the affected SSTable?

          Cassandra version: 2.1.11

          Logs:

          WARN  [CompactionExecutor:3653] 2016-05-05 07:04:18,180 OutputHandler.java:52 - Row starting at position 809762816 is unreadable; skipping to next
          WARN  [CompactionExecutor:3653] 2016-05-05 07:04:18,184 OutputHandler.java:57 - Error reading row (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read row key from data file
          	at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:195) ~[apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:654) [apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:94) [apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager$3.execute(CompactionManager.java:321) [apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:277) [apache-cassandra-2.1.11.jar:2.1.11]
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_31]
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_31]
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_31]
          	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]
          Caused by: java.io.IOException: Unable to read row key from data file
          	... 9 common frames omitted
          

          Thanks


          laxmikant99 Laxmikant Upadhyay added a comment -

          I am also observing this issue on 3.11.2. The sstables were corrupted when the disk filled up due to snapshots, and when I run nodetool scrub after deleting the snapshots I get the WARN below in an infinite loop.


          WARN  [CompactionExecutor:2677] 2019-01-22 10:08:16,907 OutputHandler.java:52 - Row starting at position 24772608 is unreadable; skipping to next
          WARN  [CompactionExecutor:2677] 2019-01-22 10:08:16,908 OutputHandler.java:57 - Error reading row (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read row key from data file
                  at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:216) ~
          


          tsteinmaurer Thomas Steinmaurer added a comment - edited

          Interesting, I'm also getting this with 3.11.14 while trying to scrub a single Cassandra table, where a single physical SSTable on disk seems to be broken. Scrub appears to be in an infinite loop with the same log line as shown above, with no progress according to "nodetool compactionstats" etc.

          WARN  [CompactionExecutor:3252] 2022-12-14 19:29:32,206 UTC OutputHandler.java:57 - Error reading partition (unreadable key) (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read partition key from data file
                  at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:222)
                  at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:1052)
                  at org.apache.cassandra.db.compaction.CompactionManager.access$200(CompactionManager.java:86)
                  at org.apache.cassandra.db.compaction.CompactionManager$3.execute(CompactionManager.java:399)
                  at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:319)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
                  at java.lang.Thread.run(Thread.java:750)
          Caused by: java.io.IOException: Unable to read partition key from data file
          

          People

            Assignee: Unassigned
            Reporter: Brandon Williams
            Votes: 1
            Watchers: 7
