
DERBY-7034: Derby's sync() handling can lead to database corruption (at least on Linux)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.14.2.0
    • Fix Version/s: None
    • Component/s: Store
    • Labels: None
    • Urgency: Normal

    Description

      I recently read about the "fsyncgate 2018" issue that the Postgres team raised: https://wiki.postgresql.org/wiki/Fsync_Errors. https://lwn.net/Articles/752063/ has a good overview of the fsync() behaviour on Linux. The short summary: on some versions of Linux, when fsync() fails the kernel marks the dirty pages clean anyway, so a retried fsync() reports success even though the data was never written, and you end up with corrupted data on disk.
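
      To make the failure mode concrete, here is a hypothetical illustration (the method name and shape are mine, not Derby's) of why retrying sync() is unsafe on the affected kernels:

          import java.io.FileDescriptor;
          import java.io.RandomAccessFile;
          import java.io.SyncFailedException;

          // On an affected Linux kernel the sequence is roughly:
          //   1. write() queues dirty pages in the page cache and returns OK.
          //   2. Background writeback hits an I/O error; the kernel marks the
          //      pages clean anyway and latches the error on the file.
          //   3. The first fsync() consumes the latched error and fails.
          //   4. A retried fsync() sees no dirty pages and no pending error,
          //      so it returns success -- but the data never reached disk.
          static void retryIsNotSafe(RandomAccessFile raf) throws Exception
          {
              FileDescriptor fd = raf.getFD();
              try
              {
                  fd.sync();      // step 3: reports the write-back error
              }
              catch (SyncFailedException sfe)
              {
                  fd.sync();      // step 4: "succeeds", data already lost
              }
          }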

      From a quick glance at the Derby code, I have already seen two places where sync() is retried in a loop, which is exactly this dangerous pattern. There could be other areas too.

      In LogAccessFile:

          /**
           * Guarantee all writes up to the last call to flushLogAccessFile on disk.
           * <p>
           * A call for clients of LogAccessFile to insure that all data written
           * up to the last call to flushLogAccessFile() are written to disk.
           * This call will not return until those writes have hit disk.
           * <p>
           * Note that this routine may block waiting for I/O to complete so 
           * callers should limit the number of resource held locked while this
           * operation is called.  It is expected that the caller
           * Note that this routine only "writes" the data to the file, this does not
           * mean that the data has been synced to disk.  The only way to insure that
           * is to first call switchLogBuffer() and then follow by a call of sync().
           *
           **/
          public void syncLogAccessFile() 
              throws IOException, StandardException
          {
              for( int i=0; ; )
              {
                  // 3311: JVM sync call sometimes fails under high load against NFS 
                  // mounted disk.  We re-try to do this 20 times.
                  try
                  {
                      synchronized( this)
                      {
                          log.sync();
                      }
      
                      // the sync succeed, so return
                      break;
                  }
                  catch( SyncFailedException sfe )
                  {
                      i++;
                      try
                      {
                          // wait for .2 of a second, hopefully I/O is done by now
                          // we wait a max of 4 seconds before we give up
                          Thread.sleep( 200 ); 
                      }
                      catch( InterruptedException ie )
                      {
                          InterruptStatus.setInterrupted();
                      }
      
                      if( i > 20 )
                          throw StandardException.newException(
                              SQLState.LOG_FULL, sfe);
                  }
              }
          }
      

      And LogToFile has a similar retry loop, except it catches the broader IOException rather than SyncFailedException:

          /**
           * Utility routine to call sync() on the input file descriptor.
           * <p> 
          */
          private void syncFile( StorageRandomAccessFile raf) 
              throws StandardException
          {
              for( int i=0; ; )
              {
                  // 3311: JVM sync call sometimes fails under high load against NFS 
                  // mounted disk.  We re-try to do this 20 times.
                  try
                  {
                      raf.sync();
      
                      // the sync succeed, so return
                      break;
                  }
                  catch (IOException ioe)
                  {
                      i++;
                      try
                      {
                          // wait for .2 of a second, hopefully I/O is done by now
                          // we wait a max of 4 seconds before we give up
                          Thread.sleep(200);
                      }
                      catch( InterruptedException ie )
                      {   
                          InterruptStatus.setInterrupted();
                      }
      
                      if( i > 20 )
                      {
                          throw StandardException.newException(
                                      SQLState.LOG_FULL, ioe);
                      }
                  }
              }
          }
      

      It seems PostgreSQL, MySQL and MongoDB have already changed their code to "panic" (abort rather than retry) if an error comes back from an fsync() call.
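
      A fail-fast variant of the syncFile() routine above might look something like the sketch below. It assumes LogToFile's existing markCorrupt() helper is an appropriate way to take the database offline, and it reuses SQLState.LOG_FULL only because the current code does; the real fix would probably want a dedicated SQLState:

          /**
           * Sketch only: treat the first sync() failure as fatal instead
           * of retrying, since a retried fsync() can falsely succeed.
           */
          private void syncFile( StorageRandomAccessFile raf) 
              throws StandardException
          {
              try
              {
                  raf.sync();
              }
              catch (IOException ioe)
              {
                  // Do NOT retry: on affected kernels a second sync() would
                  // report success even though the dirty pages were dropped.
                  throw markCorrupt(
                      StandardException.newException(SQLState.LOG_FULL, ioe));
              }
          }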

      There is a lot more complexity in how fsync() reports errors (if it reports them at all). It is worth digging into this further, as I am not familiar with Derby's internals and how badly it could be affected.

      Interestingly, people have indicated this issue is more likely to occur on network filesystems, since write failures are more common there (for example when the network goes down). In the past it was easy to just say "NFS is broken", but in fact the problem in some of those cases was fsync() and how it was being called in a loop.

      I've been trying to find out whether Windows has similar issues, without much luck. But given the mysterious corruption issues I have seen in the past with Windows/CIFS, I do wonder if this is somehow related.


          People

            Assignee: Unassigned
            Reporter: David Sitsky (sits)
            Votes: 0
            Watchers: 3
