HBase / HBASE-18872

Backup scaling for multiple tables and millions of rows

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I ran a simple experiment: I loaded ~200 million rows into table 1 and nothing into table 2. The test was done on a local cluster with approximately 3-4 containers running in parallel. The focus of the test was not on how long the backup takes overall but on the time spent on the table where no data had changed.

      Table without Data -->
      Elapsed: 44mins, 52sec
      Average Map Time 3sec
      Average Shuffle Time 2mins, 35sec
      Average Merge Time 0sec
      Average Reduce Time 0sec
      Map : 2052
      Reduce : 1

      Table with Data -->
      Elapsed: 1hrs, 44mins, 10sec
      Average Map Time 4sec
      Average Shuffle Time 37sec
      Average Merge Time 3sec
      Average Reduce Time 47sec
      Map : 2052
      Reduce : 64

      All of the above numbers are from a single-node cluster, so not many mappers run in parallel. But let's extrapolate this to a 20-node cluster with ~100 tables of varying data size and approximately 2000 WALs to be backed up. Say each of the 20 nodes can run 3 containers, i.e. 60 WALs are processed in parallel. Assume 3 sec is spent on each WAL: 2000 * 3 = 6000 sec, and 6000 / 60 --> 100 sec per table --> 10000 sec for all tables.
      ~166 mins --> ~2.8 hrs only for filtering. This does not seem to scale (these are just rough numbers from a basic test), since all parsing is O(m * n) for m WALs and n tables.
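      The rough estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function and parameter names are made up for illustration, and the inputs (20 nodes, 3 containers per node, 2000 WALs, 3 sec per WAL, 100 tables) are the assumed figures from the extrapolation, not measurements.

```python
def filtering_time_seconds(num_wals, secs_per_wal, nodes, containers_per_node, num_tables):
    """Estimate total WAL-filtering time when every table re-reads every WAL."""
    # Number of WALs that can be processed at the same time (one per container).
    parallel_wals = nodes * containers_per_node
    # Wall-clock time to filter all WALs for a single table.
    per_table = num_wals * secs_per_wal / parallel_wals
    # The filtering is repeated for each table, so the cost grows as m * n.
    return per_table * num_tables


total = filtering_time_seconds(
    num_wals=2000, secs_per_wal=3, nodes=20, containers_per_node=3, num_tables=100
)
print(f"{total:.0f} sec = {total / 60:.0f} min = {total / 3600:.1f} hrs")
```

      Doubling either the number of tables or the number of WALs doubles the result, which is the O(m * n) behaviour in question.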

      The main intent of this test is to show that even the backup of a table with very little churn may take a significant amount of time just for filtering the data. As the number of tables or the amount of data increases, this does not seem scalable.

      Even on our current cluster I can easily see numbers close to 100 tables, 200 million rows, and 200-300 GB.

      I would suggest that the filtering parse the WALs once and segregate the entries into separate per-table WALs, then generate HFiles from the per-table WALs (just a rough idea).
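      To illustrate the difference, here is a toy model in Python. It does not use any HBase API: WALs are plain lists of hypothetical (table, edit) tuples, and both function names are invented for this sketch. The first function mimics the per-table filtering described in this issue (every table scans every WAL); the second scans each WAL once and routes entries into per-table buckets.

```python
def filter_per_table(wals, tables):
    """One full scan of all WALs per table: m * n WAL reads."""
    wal_reads = 0
    result = {}
    for table in tables:
        edits = []
        for wal in wals:
            wal_reads += 1
            # Keep only the entries that belong to the table being backed up.
            edits.extend(entry for entry in wal if entry[0] == table)
        result[table] = edits
    return result, wal_reads


def split_once(wals, tables):
    """Single scan of each WAL, routing entries to per-table buckets: m WAL reads."""
    wal_reads = 0
    buckets = {table: [] for table in tables}
    for wal in wals:
        wal_reads += 1
        for entry in wal:
            if entry[0] in buckets:
                buckets[entry[0]].append(entry)
    return buckets, wal_reads


# Hypothetical data: three WALs, three tables, one of which (t3) has no edits.
wals = [
    [("t1", "put r1"), ("t2", "put r9")],
    [("t1", "put r2")],
    [("t1", "delete r1")],
]
tables = ["t1", "t2", "t3"]

per_table, reads_per_table = filter_per_table(wals, tables)
single_pass, reads_single_pass = split_once(wals, tables)
print(reads_per_table, reads_single_pass)
```

      Both approaches yield the same per-table edits, but the single-pass version reads each WAL once regardless of how many tables are backed up. The per-table buckets would then be the input for HFile generation.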


              People

              • Assignee:
                vrodionov Vladimir Rodionov
                Reporter:
                vishk Vishal Khandelwal
              • Votes:
                0
                Watchers:
                3
