Apache Cassandra / CASSANDRA-9881

Rows with negative-sized keys can't be skipped by sstablescrub

Details

    • Type: Bug
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • Fix Version/s: 2.1.x
    • Component/s: None
    • Labels: None
    • Severity: Low

    Description

      It is possible to have corruption in such a way that scrub (online or offline) can't skip the row, so you end up in a loop where this just keeps repeating:

      WARNING: Row starting at position 2087453 is unreadable; skipping to next 
      Reading row at 2087453 
      row (unreadable key) is -1 bytes
      

      The workaround is just to delete the problem sstable, since you were going to have to repair anyway, but it would still be nice to salvage the rest of the sstable.
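      For reference, the delete-and-repair workaround might look roughly like the sketch below. This is not from the ticket itself: the service name, data path, keyspace/table names, and the sstable generation (`mc-42-big`) are all placeholder assumptions, and the actual corrupt sstable has to be identified first (e.g. from the scrub/compaction log output).

```shell
# Hypothetical sketch of the workaround above; all paths and names are placeholders.
nodetool drain                        # flush memtables and stop accepting writes
sudo systemctl stop cassandra         # service name varies by install
# All component files of one sstable (Data.db, Index.db, Summary.db, ...)
# share the same generation prefix; remove them together.
rm /var/lib/cassandra/data/my_keyspace/my_table-*/mc-42-big-*
sudo systemctl start cassandra
nodetool repair my_keyspace my_table  # re-stream the deleted data from replicas
```

      Note that deleting an sstable out from under a running node is unsafe, hence stopping the node first; the repair afterwards is what makes the data loss recoverable from replicas.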

      Attachments

        Activity

          jbellis Jonathan Ellis added a comment -

          Is this still a problem post 8099? /cc slebresne


          slebresne Sylvain Lebresne added a comment -

          Is this still a problem post 8099?

          I lack information to be certain but I have no reason to think 8099 helps here in any way.

          In fact, my best guess as to why this could happen is that the file is broken in a way that throws while reading the partition key, but without making any progress in the data file (the file pointer isn't moved), and there is no available index file (it is either not present at all, or we've reached the end of it). For instance, maybe the data file has actually been read entirely (and hence the index file is at EOF) but there is something wrong with the file system at the end of the file. In any case, if that happens, we fail reading the key, write the "row (unreadable key) is -1 bytes" message (the negative size is a red herring; it's -1 only because there is no index key to use when calculating the size), and then start again (since we don't have an index entry, we won't seek to a new row but just start reading from the same place).

          So it sounds like a scrub bug really, and we should give up if we can't read the partition and have no index entry to skip it (which, for some reason, we don't do). The open question is what to do in that case: currently, when scrub gives up, it aborts any writing and so doesn't "salvage the rest of the sstable". I suppose the assumption was that since we can't log which keys we skip, we prefer not to skip any, but we probably want to revisit that.
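          The failure mode described above can be illustrated with a minimal sketch. This is a hypothetical stand-in, not the actual org.apache.cassandra.db.compaction.Scrubber code: reading the key throws before the data-file pointer advances, and with no index entry there is nothing to seek to, so every retry restarts at the same offset. The `scrub` method below shows the proposed give-up check.

```java
public class ScrubLoopSketch {
    public static final long BAD_OFFSET = 2087453; // offset from the report above
    public static long position = BAD_OFFSET;      // simulated data-file pointer

    // Reading a key at the corrupt offset throws without moving the pointer.
    public static void readKey() throws java.io.IOException {
        if (position == BAD_OFFSET)
            throw new java.io.IOException("Unable to read partition key from data file");
        position += 100; // a healthy key advances the pointer
    }

    // Returns "read ok" on success, "aborted" when the give-up check fires.
    public static String scrub(Long nextIndexEntry) {
        while (true) {
            long before = position;
            try {
                readKey();
                return "read ok";
            } catch (java.io.IOException e) {
                if (nextIndexEntry != null) {
                    position = nextIndexEntry; // seek past the bad row: real progress
                    nextIndexEntry = null;
                } else if (position == before) {
                    // No index entry and no progress: the pointer will never
                    // move, so retrying loops forever. Give up instead.
                    return "aborted";
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(scrub(null));             // no index entry: must abort
        position = BAD_OFFSET;
        System.out.println(scrub(BAD_OFFSET + 100)); // index entry lets us skip
    }
}
```

          The point of the sketch is only the termination condition: when the catch block can neither seek forward nor observe the pointer moving, continuing the loop is guaranteed to repeat the same read.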


          matkor Mateusz Korniak added a comment -

          I have a similar issue with 2.0.17; scrub loops endlessly:

          WARN [CompactionExecutor:2980] 2015-10-14 10:40:18,214 OutputHandler.java (line 52) Row starting at position 17869559593 is unreadable; skipping to next
          WARN [CompactionExecutor:2980] 2015-10-14 10:40:18,218 OutputHandler.java (line 57) Error reading row (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read row key from data file
          at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:204)
          at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:447)
          at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:436)
          at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:63)
          at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:240)
          at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:222)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)
          Caused by: java.io.IOException: Unable to read row key from data file
          ... 10 more
          WARN [CompactionExecutor:2980] 2015-10-14 10:40:18,218 OutputHandler.java (line 52) Row starting at position 17869559593 is unreadable; skipping to next

          This is close to the end of the sstable:
          Scrub Foo 17869559593 17875049522 bytes 99.97%

          BTW, what does "Resolution: Unresolved / Fix Version/s: 2.1.x" mean? Is it fixed in 2.1? Will upgrading help?

          bevans88 Brent Evans added a comment -

          I'm also experiencing the same issue when performing a scrub, with nodetool scrub looping infinitely over the unreadable row.

          Is there any plan to investigate this further, or is the only way to resolve it to find and delete the affected SSTable?

          Cassandra version: 2.1.11

          Logs:

          WARN  [CompactionExecutor:3653] 2016-05-05 07:04:18,180 OutputHandler.java:52 - Row starting at position 809762816 is unreadable; skipping to next
          WARN  [CompactionExecutor:3653] 2016-05-05 07:04:18,184 OutputHandler.java:57 - Error reading row (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read row key from data file
          	at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:195) ~[apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:654) [apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:94) [apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager$3.execute(CompactionManager.java:321) [apache-cassandra-2.1.11.jar:2.1.11]
          	at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:277) [apache-cassandra-2.1.11.jar:2.1.11]
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_31]
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_31]
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_31]
          	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]
          Caused by: java.io.IOException: Unable to read row key from data file
          	... 9 common frames omitted
          

          Thanks


          laxmikant99 Laxmikant Upadhyay added a comment -

          I am also observing this issue on 3.11.2. The sstables were corrupted when the disk filled up due to snapshots, and when I run nodetool scrub after deleting the snapshots I get the WARN below in an infinite loop.


          WARN  [CompactionExecutor:2677] 2019-01-22 10:08:16,907 OutputHandler.java:52 - Row starting at position 24772608 is unreadable; skipping to next
          WARN  [CompactionExecutor:2677] 2019-01-22 10:08:16,908 OutputHandler.java:57 - Error reading row (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read row key from data file
                  at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:216) ~
          


          tsteinmaurer Thomas Steinmaurer added a comment - edited

          Interesting, I'm also getting this with 3.11.14 while trying to scrub a single Cassandra table, where a single physical SSTable on disk seems to be broken. Scrub appears to be in an infinite loop with the same log line as shown above, with no progress according to "nodetool compactionstats" etc.

          WARN  [CompactionExecutor:3252] 2022-12-14 19:29:32,206 UTC OutputHandler.java:57 - Error reading partition (unreadable key) (stacktrace follows):
          java.io.IOError: java.io.IOException: Unable to read partition key from data file
                  at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:222)
                  at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:1052)
                  at org.apache.cassandra.db.compaction.CompactionManager.access$200(CompactionManager.java:86)
                  at org.apache.cassandra.db.compaction.CompactionManager$3.execute(CompactionManager.java:399)
                  at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:319)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
                  at java.lang.Thread.run(Thread.java:750)
          Caused by: java.io.IOException: Unable to read partition key from data file
          

          People

            Assignee: Unassigned
            Reporter: Brandon Williams
            Votes: 1
            Watchers: 7
