SOLR-6393

Improve transaction log replay speed on HDFS.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels: None

      Description

      Replay speed is pretty slow on HDFS because we currently reopen a reader between reading each update.

        Activity

        Mark Miller added a comment -

        Patch that only reopens a reader when we move beyond the length of the file being read rather than on every next call.
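
        As an illustration only (this is not the Solr patch, which is Java), the difference between the two reopen strategies can be sketched with a toy model. `SnapshotLog` and both replay functions are hypothetical names; the assumption is that a reader sees only the data that existed when it was opened, and that opening a reader is the expensive step.

```python
class SnapshotLog:
    """Hypothetical stand-in for a log file whose readers see only the
    length the file had at the moment they were opened."""

    def __init__(self, records):
        self.records = list(records)  # everything the writer has appended
        self.opens = 0                # number of readers opened so far

    def open_reader(self):
        self.opens += 1
        return tuple(self.records)    # frozen view: later appends are invisible


def replay_reopen_every_call(log):
    """Behaviour described in the issue: reopen before reading each update."""
    out, pos = [], 0
    while True:
        view = log.open_reader()
        if pos >= len(view):
            return out
        out.append(view[pos])
        pos += 1


def replay_reopen_at_end(log):
    """Behaviour described for the patch: reopen only past the end of the view."""
    out, pos = [], 0
    view = log.open_reader()
    while True:
        if pos >= len(view):
            view = log.open_reader()  # check whether anything new arrived
            if pos >= len(view):
                return out            # nothing new, replay is finished
        out.append(view[pos])
        pos += 1


updates = [f"update-{i}" for i in range(1000)]
old_log, new_log = SnapshotLog(updates), SnapshotLog(updates)
assert replay_reopen_every_call(old_log) == replay_reopen_at_end(new_log) == updates
print("reopen on every call:", old_log.opens, "reader opens")
print("reopen only at end:  ", new_log.opens, "reader opens")
```

        In this toy model both strategies return the same updates, but the first opens one reader per update plus a final check, while the second opens two readers in total when no writes arrive during replay.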

        Mark Miller added a comment -

        This brings straight read speed from abysmal to something on par with the local filesystem implementation. That will be significant in the case where the transaction log needs to be replayed on startup and updates are not coming in.

        Concurrent read/write speed will still hit ugly reader reopens, but should also be much faster, as in most cases updates will be read in batches larger than one.

        Erick Erickson added a comment -

        Assuming that
        "needs to be replayed on startup and updates are not coming in"
        should read
        "needs to be replayed on startup and updates are coming in"

        Mark Miller added a comment -

        No, it's written right. The second paragraph addresses when updates are also coming in.

        Erick Erickson added a comment -

        Ah, missed that. Thanks!

        ASF subversion and git services added a comment -

        Commit 1619200 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1619200 ]

        SOLR-6393: TransactionLog replay performance on HDFS is very poor.

        ASF subversion and git services added a comment -

        Commit 1619218 from Mark Miller in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1619218 ]

        SOLR-6393: TransactionLog replay performance on HDFS is very poor.

        Mark Miller added a comment -

        Ah, missed that. Thanks!

        The straight read speed should be really fast now. But when we are updating while reading and we reach the end of the file we are reading, we have to try to open it again to see if there is anything new. The local fs impl uses channels and doesn't have to do this to see the latest data from the writer. So even when updates are also coming in, this should be a huge improvement, because it was previously reopening needlessly on every update it read. We do still have to take the hit of opening a new reader every time we read up to the end of the last reader's view.
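
        The remaining cost when updates arrive during reading can be sketched with a small simulation. This is a toy model, not Solr code: `SnapshotLog`, `TailingReader`, and `run` are hypothetical names, and the model assumes a reader sees only the data present when it was opened.

```python
class SnapshotLog:
    """Hypothetical log whose readers see only the data present at open time."""

    def __init__(self):
        self.records = []
        self.opens = 0  # number of readers opened so far

    def append(self, batch):
        self.records.extend(batch)

    def open_reader(self):
        self.opens += 1
        return tuple(self.records)  # frozen view of the log


class TailingReader:
    """Reads updates in order, reopening only at the end of the current view."""

    def __init__(self, log):
        self.log = log
        self.view = log.open_reader()
        self.pos = 0

    def next_update(self):
        if self.pos >= len(self.view):
            # End of the last reader's view: pay for a reopen to look for new data.
            self.view = self.log.open_reader()
            if self.pos >= len(self.view):
                return None  # the writer has not added anything yet
        update = self.view[self.pos]
        self.pos += 1
        return update


def run(batches, batch_size):
    """Interleave a writer appending batches with a reader draining them."""
    log = SnapshotLog()
    reader = TailingReader(log)
    seen = []
    for b in range(batches):
        log.append(range(b * batch_size, (b + 1) * batch_size))
        update = reader.next_update()
        while update is not None:
            seen.append(update)
            update = reader.next_update()
    return seen, log.opens


seen, opens = run(batches=10, batch_size=100)
print(len(seen), "updates read with", opens, "reader opens")
```

        In this model the number of reader opens grows with the number of times the reader catches up to the writer, not with the number of updates read, which is the improvement the comment describes.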


          People

          • Assignee: Mark Miller
          • Reporter: Mark Miller
          • Votes: 0
          • Watchers: 3
