Accumulo
  ACCUMULO-2613

Take advantage of HDFS caching to improve MTTR

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels:

      Description

      Hadoop 2.3.0 added HDFS caching.

      We should use this for small internal tables (like !METADATA), and we should probably have a configurable option to use it for other tables, with a stern warning that it should only be enabled on small tables that will be frequently used.
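      As a rough illustration of the per-table option proposed above, here is a minimal sketch of how the caching decision could be expressed. Everything in it is hypothetical: `HdfsCachePolicy`, `decide`, the opt-in flag, and the 1 GiB warning threshold are not part of Accumulo, and the internal table names assume the 1.6-era `accumulo.root` and `accumulo.metadata` (the latter is what `!METADATA` was renamed to). It only models the policy; actually pinning the files would go through HDFS cache directives.

```java
import java.util.Set;

/**
 * Hypothetical sketch (not an Accumulo API): decide whether a table's files
 * should be pinned in the HDFS cache, following the proposed policy.
 */
public final class HdfsCachePolicy {

    /** Internal tables that are always cached (assumes 1.6-era table names). */
    static final Set<String> INTERNAL_TABLES = Set.of("accumulo.root", "accumulo.metadata");

    /** Illustrative size above which opting a user table in triggers a warning. */
    static final long WARN_BYTES = 1L << 30; // 1 GiB, an assumed value

    /** Outcome of a caching decision; warning is null when there is nothing to warn about. */
    static final class Decision {
        final boolean cache;
        final String warning;

        Decision(boolean cache, String warning) {
            this.cache = cache;
            this.warning = warning;
        }
    }

    /**
     * @param table          table name
     * @param totalFileBytes combined size of the table's files in HDFS
     * @param userOptIn      value of a hypothetical per-table "use HDFS caching" option
     */
    static Decision decide(String table, long totalFileBytes, boolean userOptIn) {
        if (INTERNAL_TABLES.contains(table)) {
            return new Decision(true, null); // small internal tables: always cache
        }
        if (!userOptIn) {
            return new Decision(false, null); // user tables: off unless explicitly enabled
        }
        // Honor the opt-in, but warn loudly when the table is not small.
        String warning = totalFileBytes > WARN_BYTES
                ? "HDFS caching enabled on " + table + " (" + totalFileBytes
                  + " bytes); only small, frequently read tables should be cached"
                : null;
        return new Decision(true, warning);
    }

    public static void main(String[] args) {
        Decision meta = decide("accumulo.metadata", 200L << 20, false);
        System.out.println(meta.cache + " " + meta.warning); // prints "true null"

        Decision big = decide("events", 50L << 30, true);
        System.out.println(big.cache + " " + (big.warning != null)); // prints "true true"
    }
}
```

      Whether a large opted-in table should still be cached or refused outright is a design choice; the sketch follows the "stern warning" wording and caches it while emitting a warning.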

        Activity

        Sean Busbey created issue -
        Sean Busbey added a comment -

        Actually, to really drive down recovery time, would we also need to ask to pin WALs for the !METADATA table? I think we would.

        Mike Drob added a comment -

        Previous discussion here: http://markmail.org/message/egw2pp6du3xgdkqb

        Eric Newton added a comment -

        To pin WALs to the metadata table, we would first have to separate the WAL for the metadata tablets (table-specific WALs).

        Sean Busbey added a comment -

        That would be preferable, but couldn't a tablet server just recognize when it was adding a !METADATA related entry to the WAL and then request that it be pinned?

        John Vines added a comment -

        How much gain would we get from caching metadata table files when we're already caching them in the tserver?

        Sean Busbey added a comment -

        Caching them in the tserver doesn't help when the tserver goes down.

        Sean Busbey added a comment -

        Edited the title to make my intentions clearer.

        Sean Busbey made changes -
        Field: Summary
        Original Value: Take advantage of HDFS caching
        New Value: Take advantage of HDFS caching to improve MTTR
        Eric Newton added a comment -

        It is not unusual to have a metadata tablet on half the nodes of a cluster.

        Sean Busbey added a comment -

        Even then, say we pinned all the WALs. How much space is that likely to be? Less than 10G per node? That's not too bad on a hardware cluster.

        Sean Busbey made changes -
        Field: Labels
        New Value: recovery
        Eric Newton added a comment -

        I don't want to seem argumentative, because I really don't know if using this cache for the WAL is a good idea, or not. But I can think of some issues:

        • hopefully, in your clusters, recovery is an unusual operation
        • WAL has to write to disk to survive power loss, making it a bad candidate for RAM-only storage
        • Others have purposefully turned off caching of WAL data to make memory available for other things, since reading them at all is unusual

        We already know we can improve recovery time by reducing the largest WAL size, parallelizing read/sort, and computing a more optimal leaseRecovery timeout. I would strongly suggest a more in-depth look into recovery before even experimenting with HDFS caching.


          People

          • Assignee:
            Unassigned
            Reporter:
            Sean Busbey
          • Votes:
            0
            Watchers:
            4
