Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, Trunk
    • Component/s: SolrCloud
    • Labels: None

Description

      A transaction log is needed for durability of updates, for a more performant realtime-get, and for replaying updates to recovering peers.

Attachments

    1. SOLR-2700.patch (3 kB, Noble Paul)
    2. SOLR-2700.patch (60 kB, Yonik Seeley)
    3. SOLR-2700.patch (51 kB, Yonik Seeley)
    4. SOLR-2700.patch (48 kB, Yonik Seeley)
    5. SOLR-2700.patch (49 kB, Yonik Seeley)
    6. SOLR-2700.patch (48 kB, Yonik Seeley)
    7. SOLR-2700.patch (38 kB, Yonik Seeley)
    8. SOLR-2700.patch (32 kB, Yonik Seeley)

Activity

          Uwe Schindler added a comment -

          Closed after release.

          Otis Gospodnetic added a comment -

          Mark Miller & Yonik Seeley - I think this is closable?

          Robert Muir added a comment -

          Unassigned issues -> 4.1

          Mark Miller added a comment -

          I think this one just needs a changes entry to be resolved.

          Noble Paul added a comment -

          Serialize the strings to a meta file (.tlm = transaction log meta) before the add is complete

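          A minimal sketch of that suggestion, with made-up names (TransactionLogMeta, writeMeta) rather than code from any attached patch: the shared string table is flushed to a small ".tlm" side file before the add call returns.

              import java.io.*;
              import java.util.List;

              // Hypothetical sketch only: persist the global string table to a ".tlm" side file
              // before the add is acknowledged, so the log records stay readable after a crash.
              class TransactionLogMeta {
                static void writeMeta(File tlmFile, List<String> globalStrings) throws IOException {
                  try (DataOutputStream out = new DataOutputStream(
                      new BufferedOutputStream(new FileOutputStream(tlmFile)))) {
                    out.writeInt(globalStrings.size());
                    for (String s : globalStrings) {
                      out.writeUTF(s);   // field names / shared strings referenced by log records
                    }
                    out.flush();         // push the bytes to the OS before the add completes
                  }
                }
              }
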
          Mark Miller added a comment -

          Now it looks like realtime get returns docs even if they are not going to end up being indexed (rolled back) until next commit is issued.

          Yup - NRT does the same.

          Yonik Seeley added a comment -

          I meant this: http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22

          Right - that's what I mean too (since all that really does is call Lucene's IW.rollback).
          It's a low-level operation I'm not particularly fond of - to most clients, when a commit happened should be relatively arbitrary (esp when going into the world of NRT).

          Sami Siren added a comment -

          I meant this: http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 I also thought things might start to get hairy if rollbacks are to be supported when multiple nodes are involved. Now it looks like realtime get returns docs even if they are not going to end up being indexed (rolled back) until next commit is issued.

          Yonik Seeley added a comment -

          What about rollbacks? Are they going to be supported too?

          via Lucene's IW.rollback? I don't think so... Rolling back assumes one client has complete control over the index and the commits.

          Sami Siren added a comment -

          What about rollbacks? Are they going to be supported too?

          Yonik Seeley added a comment -

          The transaction logging as it is done currently does not provide durability.

          Correct - that's up next (I actually have it in a local copy).

          I don't see the headers being written (globalStrings) during add().

          Thanks! That would make the read-side difficult.

          Noble Paul added a comment -

          The transaction logging as it is done currently does not provide durability.
          If the server crashed in between, will it be able to recover from the transaction log?
          I don't see the headers being written (globalStrings) during add().

          Mike Anderson added a comment -

          Will the transaction log be available via API? It would be very useful for application debugging if it were possible to query a record's transaction log and see a history of updates.

          Jason Rutherglen added a comment -

          This is going to be amazing; I wonder if other projects have already implemented these features years ago?

          Yonik Seeley added a comment -

          I'm not sure how this feature makes any sense, the documents are already being serialized to disk, eg, to the docstore by StoredFieldsWriter.

          If the only use for a transaction log were realtime-get, I would agree. But we have many more uses planned, so this is just a stepping stone (and a nice little feature along the way) to where we are going.

          Jason Rutherglen added a comment -

          I'm not sure how this feature makes any sense, the documents are already being serialized to disk, eg, to the docstore by StoredFieldsWriter. Now the system will be serializing the exact same documents twice, that is extremely redundant.

          Yonik Seeley added a comment -

          OK, I think we're getting close to committing now.

          Urggg - scratch that. At some point in the past, some of the asserts were commented out to aid in debugging and I never re-enabled them. The realtime-get test now fails, so I need to dig into that again.

          Yonik Seeley added a comment -

          OK, I think we're getting close to committing now.
          Among other things, this latest version adds the abstract UpdateLog class with NullUpdateLog and FSUpdateLog subclasses, adds an updateHandler/updateLog section to solrconfig.xml, and allows one to specify the log directory.

          Currently the default is NullUpdateLog.

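          A rough sketch of what that class split might look like; the method names below are assumptions, not the committed UpdateLog API, and the log directory is assumed to come from an updateLog element inside updateHandler in solrconfig.xml.

              import org.apache.solr.common.SolrInputDocument;

              // Sketch only: illustrative shape of an abstract update log with a no-op default.
              abstract class UpdateLog {
                abstract void init(String logDir);            // log directory from solrconfig.xml
                abstract long add(SolrInputDocument doc);     // returns a pointer into the log
                abstract void delete(String id);
                abstract void commit();                       // finish the current log file
                abstract SolrInputDocument lookup(String id); // realtime-get path
              }

              // No-op default, so configurations without an update log pay no cost.
              class NullUpdateLog extends UpdateLog {
                void init(String logDir) {}
                long add(SolrInputDocument doc) { return -1; }
                void delete(String id) {}
                void commit() {}
                SolrInputDocument lookup(String id) { return null; }
              }
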
          Yonik Seeley added a comment -

          Here's an update that uses a fixed set of external strings in the javabin codec to avoid repeating all of the field names in the logs. This drops the indexing penalty to 28% slower in this specific test, and decreases the transaction log size to 974M.

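          A standalone sketch of the fixed-external-strings idea, not JavaBinCodec itself (FieldNameDictDemo and EXTERNAL are invented names): well-known field names are written as a small index into a shared table instead of being repeated as full strings in every log record.

              import java.io.*;
              import java.util.*;

              class FieldNameDictDemo {
                // Fixed table agreed on by writer and reader; these names are never repeated in records.
                static final List<String> EXTERNAL = Arrays.asList("id", "name", "price", "inStock");

                static void writeFieldName(DataOutput out, String field) throws IOException {
                  int idx = EXTERNAL.indexOf(field);
                  if (idx >= 0) {
                    out.writeByte(1);    // tag: dictionary reference
                    out.writeByte(idx);  // index into the shared table
                  } else {
                    out.writeByte(2);    // tag: literal string follows
                    out.writeUTF(field);
                  }
                }

                static String readFieldName(DataInput in) throws IOException {
                  int tag = in.readByte();
                  return tag == 1 ? EXTERNAL.get(in.readByte()) : in.readUTF();
                }
              }
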
          Yonik Seeley added a comment -

          it seems you are doing blocking writes which might not be ideal here at all.

          Yeah, I had eventually planned on concurrent writes... we're already doing concurrent reads, which seemed more important.

          What we need to store in main memory is the offset and the length to do the realtime get here.

          Right, that's what we're currently storing (just the offset).

          If not I think we should use a faster hand written serialization instead of java serialization which is proven to be freaking slow.

          "javabin" is misleading - it does not use Java serialization (and is both much faster and more compact).

          Another totally different idea for the RT get is to spend more time on a RAM Reader that is capable of doing exactSeeks on the anyway used BytesRefHash.

          Yeah, but it seems like we'd still need the transaction log stuff anyway (for durability and to help a peer recover).

          Simon Willnauer added a comment -

          Just to get a rough idea of performance, I uploaded one of my CSV test files (765MB, 100M docs, 7 small string fields per doc).
          Time to complete indexing was 42% longer, and the transaction log grew to 1.8GB. The lucene index was 1.2GB. The log was on the same device, so the main impact may have been disk IO.

          I think this is far from what we can really do here. I didn't look too closely at the code yet, but it seems you are doing blocking writes, which might not be ideal here at all. I think what you can do here is allocate the space you need per record and write concurrently on a Channel (see FileChannel#write(ByteBuffer src, long position)); the same is true for reads (FileChannel#read(ByteBuffer dst, long position)). What we need to store in main memory is the offset and the length to do the realtime get here.

          To take that one step further, it might be good to keep the already serialized data around if possible; if a binary update is used, can we piggyback the bytes in the SolrInputDocument somehow? If not, I think we should use a faster hand-written serialization instead of Java serialization, which is proven to be freaking slow.

          Another totally different idea for the RT get is to spend more time on a RAM Reader that is capable of doing exactSeeks on the BytesRefHash that is already used anyway. I don't think this would be too far away, since the biggest problem here is to provide an efficiently sorted dictionary. Maybe this should be a long-term goal for the RT Get feature.

          Since we are already doing write-behind here, we could also try to use some compression, especially if the source data is large; not sure whether that will pay off though, since we are not keeping the logs around forever.

          Eventually I think this should be a feature that lives outside of Solr, since many Lucene applications could make use of it. ElasticSearch, for instance, uses pretty similar features, which could be adapted into something like a DurableIndexWriter wrapper.

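          A small sketch of the positional-I/O approach described above, not the patch's actual TransactionLog class: FileChannel#write(ByteBuffer, long) and FileChannel#read(ByteBuffer, long) address the file by absolute offset, so lookups and appends need not synchronize on a shared file pointer.

              import java.io.IOException;
              import java.nio.ByteBuffer;
              import java.nio.channels.FileChannel;
              import java.nio.file.*;

              class PositionalLog {
                private final FileChannel ch;

                PositionalLog(Path file) throws IOException {
                  ch = FileChannel.open(file, StandardOpenOption.CREATE,
                      StandardOpenOption.READ, StandardOpenOption.WRITE);
                }

                /** Append a record at a pre-allocated position; the offset goes into the in-memory map. */
                long write(byte[] record, long position) throws IOException {
                  ch.write(ByteBuffer.wrap(record), position); // positional write, no shared file pointer
                  return position;
                }

                /** Positional read: many lookups can proceed concurrently on the same channel. */
                byte[] read(long offset, int length) throws IOException {
                  ByteBuffer buf = ByteBuffer.allocate(length);
                  ch.read(buf, offset);                        // does not move the channel's position
                  return buf.array();
                }
              }
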
          Jason Rutherglen added a comment -

          Typically a transaction log is configured to be written to a different hard drive than the indexes / database.

          Yonik Seeley added a comment -

          Just to get a rough idea of performance, I uploaded one of my CSV test files (765MB, 100M docs, 7 small string fields per doc).
          Time to complete indexing was 42% longer, and the transaction log grew to 1.8GB. The lucene index was 1.2GB. The log was on the same device, so the main impact may have been disk IO.

          Yonik Seeley added a comment -

          Patch that updates to trunk and comments out the prints (those were actually causing test failures for some reason...)

              [junit] Testsuite: org.apache.solr.update.Batch-With-Multiple-Tests
              [junit] Testcase: org.apache.solr.update.Batch-With-Multiple-Tests:testDistribSearch:       Caused an ERROR
              [junit] Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
              [junit] junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
              [junit]     at java.lang.Thread.run(Thread.java:680)
          
          Yonik Seeley added a comment -

          Here's an update that among other things uses the "tlog" directory under the data directory.

          Mark Miller added a comment -

          Realtime get can be a separate issue altogether.

          Realtime get is a separate JIRA issue - see SOLR-2656. However, it obviously goes very much with this issue if you do a little back reading - you don't want to have to reopen a reader each time for realtime get - you want to pull from the transaction log when you can instead.

          Noble Paul added a comment -

          Realtime get can be a separate issue altogether. These are two distinct features

          Andrzej Bialecki added a comment -

          +1 for abstract class. E.g. one interesting mechanism for tlog persistence and fanout to slaves could be Kafka (http://incubator.apache.org/projects/kafka.html).

          Yonik Seeley added a comment -

          Should we not have a base class/interface for Transaction Log ? What if I wish to have an alternate implementation

          Yeah, eventually. I started that way, but it was premature. Before committing we should have a no-op implementation.

          What is the application of realtime get ?

          It's a step in the direction that allows an application to use Solr as a data store and not just an index.

          Noble Paul added a comment -

          A couple of comments

          • Should we not have a base class/interface for the Transaction Log? What if I wish to have an alternate implementation?
          • What is the application of realtime get? Is it for the cloud?
          Yonik Seeley added a comment -

          Ah, silly mistake. When I moved the commitLock.lock() further up in the file, I failed to remove the original lock() and thus locked it twice.

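          A toy illustration of that mistake (not Solr code): a ReentrantLock acquired twice by the same thread but released only once in the finally block keeps a hold count of one, so the lock is never freed and every other caller blocks on lock(), which matches the hang described in the next comment.

              import java.util.concurrent.locks.ReentrantLock;

              class DoubleLockDemo {
                private final ReentrantLock commitLock = new ReentrantLock();

                void commit() {
                  commitLock.lock();     // the relocated lock() ...
                  commitLock.lock();     // ... plus the original that was never removed
                  try {
                    // do the commit work
                  } finally {
                    commitLock.unlock(); // releases one hold; getHoldCount() is still 1 afterwards
                  }
                }
              }
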
          Yonik Seeley added a comment -

          Here's the latest prototype patch. I've hit a bit of an oddity with locking though that causes TestRealTimeGet to hang.

          I put a ReentrantLock around the commit in the update handler. The test hangs with one or more of the writer threads blocked on the .lock().

          • .unlock is called in a finally block - so it should always get called
          • I added a counter that is incremented after the lock and decremented after the unlock. it shows "0" in the debugger after the hang, meaning that we unlocked as many times as we locked.
          • the only place that touches that lock is DUH2.commit()
          • if I look into the Sync object inside the ReentrantLock, the state is 1 (meaning locked I think). The exclusiveOwnerThread is "main" for some reason.
          • I think what I am seeing is that unlock() seems to fail to take effect. The normal course is that cleanIndex() causes the main thread to do a deleteByQuery + commit, and even though the print says the lock was released, main retains it and no one else can ever acquire it.

          I can see the output via IntelliJ, but not from the command line (since output seems to be buffered until the end of the test).

          Yonik Seeley added a comment -

          Here's an update that handles delete-by-id and also makes lookups concurrent (no synchronization on the file reads so multiple can proceed at once).

          Yonik Seeley added a comment -

          Here's a draft patch.
          There is a tlog.<number> file created for each commit. The javabin format is used to serialize SolrInputDocuments.
          An in-memory map of pointers into the log is kept for documents not yet soft-committed, and the realtime-get component checks that first before using SolrCore.getNewestSearcher().

          Seems to work for getting documents not in the newest searcher so far.

          Tons of stuff left to do

          • the tlog files are currently in the CWD
          • need to handle deletes
          • need to handle flushes in a performant way
          • need to implement optional fsync for durability on power-failure
          • would be nice to make some of this multi-threaded for better performance
          • need to implement durability (apply updates from logs on startup)
          • need to implement some form of cleanup for transaction logs
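
          A simplified sketch of the lookup path described above; the class and field names are illustrative only. The in-memory map holds pointers into the current tlog file for documents not yet visible to a searcher, and realtime-get consults it before falling back to the newest searcher.

              import java.util.Map;
              import java.util.concurrent.ConcurrentHashMap;
              import org.apache.solr.common.SolrInputDocument;

              class RealtimeGetSketch {
                /** uniqueKey -> offset of the add record in the current log file. */
                private final Map<String, Long> uncommitted = new ConcurrentHashMap<>();
                private final TransactionLogReader tlog;   // hypothetical reader over tlog.<number> files

                RealtimeGetSketch(TransactionLogReader tlog) { this.tlog = tlog; }

                SolrInputDocument get(String id) throws Exception {
                  Long pointer = uncommitted.get(id);
                  if (pointer != null) {
                    return tlog.readDocument(pointer);     // serve straight from the transaction log
                  }
                  return searchNewestSearcher(id);         // else use SolrCore.getNewestSearcher()
                }

                // Placeholders so the sketch is self-contained.
                interface TransactionLogReader { SolrInputDocument readDocument(long pointer) throws Exception; }
                SolrInputDocument searchNewestSearcher(String id) { return null; }
              }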

People

    • Assignee: Yonik Seeley
    • Reporter: Yonik Seeley
    • Votes: 1
    • Watchers: 8
