  1. HBase
  2. HBASE-8980

Assistant Store: An Index Store of HRegion

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • None
    • None
    • regionserver
    • None

    Description

      Background
      a.In general, we want several organizations of the same data, e.g. a secondary index sorts the data by a non-primary key.
      b.Currently, scanning HBase data with a condition such as a ValueFilter is inefficient.
      c.We could create an Assistant Store that holds the data of an HRegion in another organization.

      Assistant Store
      a.It is a store of an HRegion, like HStore, and the user creates it by adding a ColumnFamily.

      b.Data in the Assistant Store is a copy of the data in the HRegion, but organized differently. The exception is that its row key may fall outside the HRegion's key range, and its value is the row of the original KeyValue.
      For example,
      The region(Range:'row001'~'row999') includes the following KVs in the Store cf:
      row001/cf:q1/val001
      row002/cf:q1/val002
      row003/cf:q1/val003
      We could create an Assistant Store (named as) for the region that includes the following KVs:
      val001/cf:q1/row001
      val002/cf:q1/row002
      val003/cf:q1/row003
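
The row/value swap in the example above can be sketched in plain Java. This is only an illustration with an in-memory sorted map, not the patch's implementation; the class and method names (AssistantStoreSketch, buildAssistant) are hypothetical, and it assumes values are unique, as they are in this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: map keys are "row/cf:qualifier", map values are cell values.
class AssistantStoreSketch {
    // Build assistant entries by swapping row and value: row/cf:q/val -> val/cf:q/row.
    static TreeMap<String, String> buildAssistant(TreeMap<String, String> store) {
        TreeMap<String, String> assistant = new TreeMap<>();
        for (Map.Entry<String, String> kv : store.entrySet()) {
            String[] parts = kv.getKey().split("/", 2); // parts[0] = row, parts[1] = "cf:q1"
            assistant.put(kv.getValue() + "/" + parts[1], parts[0]);
        }
        return assistant;
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("row001/cf:q1", "val001");
        store.put("row002/cf:q1", "val002");
        store.put("row003/cf:q1", "val003");
        // The assistant map is sorted by value, so a lookup by value becomes a key lookup.
        System.out.println(buildAssistant(store));
    }
}
```

Because the assistant entries are keyed by value, their keys ('val001' and so on) need not lie inside the region's row range 'row001'~'row999'.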

      c.We could use the local region transaction to ensure atomicity and consistency.

      e.The regionserver puts data into the Assistant Store automatically, but users must read the data from the Assistant Store themselves.

      Example of Using Assistant Store
      a.Suppose an empty table named t1 exists with a column family named c1, and it has only one region (the region's range is from EMPTY_START_ROW to EMPTY_END_ROW).

      b.Add an Assistant Store for the table by adding a new column family named c2.

      c.The user puts the following data into the table:
      r1/c1:q1/v1
      r2/c1:q1/v2
      r3/c1:q1/v1
      r4/c1:q1/v2
      r5/c1:q1/v1
      r6/c1:q1/v2

      d.The region will then hold the following data:
      r1/c1:q1/v1
      r2/c1:q1/v2
      r3/c1:q1/v1
      r4/c1:q1/v2
      r5/c1:q1/v1
      r6/c1:q1/v2

      v1/c2:q1/r1
      v1/c2:q1/r3
      v1/c2:q1/r5
      v2/c2:q1/r2
      v2/c2:q1/r4
      v2/c2:q1/r6
      (The six c2 KVs are generated by the Assistant and stored in the Assistant Store.)

      e.Split the region into daughter_a and daughter_b at the split point 'r4'.

      daughter_a then has the following data:
      r1/c1:q1/v1
      r2/c1:q1/v2
      r3/c1:q1/v1

      v1/c2:q1/r1
      v1/c2:q1/r3
      v2/c2:q1/r2
      (The three c2 KVs are the data in the Assistant Store.)

      daughter_b has the following data:

      r4/c1:q1/v2
      r5/c1:q1/v1
      r6/c1:q1/v2

      v1/c2:q1/r5
      v2/c2:q1/r4
      v2/c2:q1/r6
      (The three c2 KVs are the data in the Assistant Store.)
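
The split in step e can be sketched as follows: each Assistant Store entry goes to the daughter that owns its original row (held in the entry's value), not the daughter its own key would sort into. This is a minimal sketch under that reading; AssistantSplitSketch, daughterEntries and the String[] cell layout are hypothetical and do not come from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: each cell is {key, column, value}.
class AssistantSplitSketch {
    // Select the assistant entries for one daughter by comparing the ORIGINAL row,
    // which an assistant entry stores in its value field (kv[2]).
    static List<String[]> daughterEntries(List<String[]> assistant, String splitRow, boolean upperHalf) {
        List<String[]> out = new ArrayList<>();
        for (String[] kv : assistant) {
            boolean inUpper = kv[2].compareTo(splitRow) >= 0;
            if (inUpper == upperHalf) {
                out.add(kv);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> assistant = Arrays.asList(
            new String[] {"v1", "c2:q1", "r1"}, new String[] {"v1", "c2:q1", "r3"},
            new String[] {"v1", "c2:q1", "r5"}, new String[] {"v2", "c2:q1", "r2"},
            new String[] {"v2", "c2:q1", "r4"}, new String[] {"v2", "c2:q1", "r6"});
        for (String[] kv : daughterEntries(assistant, "r4", false)) {
            System.out.println(String.join("/", kv)); // assistant entries of daughter_a
        }
    }
}
```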

      f.From the above, we can see that the data in the Assistant Store always corresponds to the original data in the region, and that it is maintained by the regionserver.

      g.How do we use the data in the Assistant Store?
      Suppose we want to scan from 'r1' to 'r7' with a ValueFilter on value = 'v2'.
      Without an Assistant Store, we must scan the whole table.
      With it, we can speed up the scan:
      Scan the Assistant Store from 'v2' to 'v2+' and get the following result:
      v2/c2:q1/r2
      v2/c2:q1/r4
      v2/c2:q1/r6

      Unfortunately, the scan result is not guaranteed to be ordered by row or by value, although it could be made ordered by value.

      From the code's point of view, I designed the scan on the Assistant Store as follows:

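      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.ResultScanner;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;

      // Limit the scan to the row range ['r1', 'r7')
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("r1"));
      scan.setStopRow(Bytes.toBytes("r7"));

      // Scan the Assistant Store for the value 'v2', i.e. the key range ['v2', 'v2' + 0x00)
      Scan assistantScan = new Scan()
          .setStartRow(Bytes.toBytes("v2"))
          .setStopRow(Bytes.add(Bytes.toBytes("v2"), new byte[] { 0x00 }));
      // setAssistantScan is the method proposed by this patch;
      // after it is set, the region runs the scan with the assistant scan
      scan.setAssistantScan(assistantScan);

      ResultScanner scanner = htable.getScanner(scan);

      for (Result result : scanner) {
        // output:
        // v2/c2:q1/r2
        // v2/c2:q1/r4
        // v2/c2:q1/r6
      }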
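
One way to picture what the region does with the two scans is sketched below: a range lookup on the value-sorted Assistant Store, followed by a filter on the original row range. This is a guess at the behaviour for illustration only, using an in-memory map instead of HBase; AssistantScanSketch and its scan method are hypothetical names, and grouping rows per value is an assumption about how duplicate values are kept apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only: the assistant index maps a value to the rows holding it.
class AssistantScanSketch {
    // Return rows whose value lies in [startValue, stopValue) and whose row key lies in [startRow, stopRow).
    static List<String> scan(TreeMap<String, List<String>> assistant,
                             String startValue, String stopValue,
                             String startRow, String stopRow) {
        List<String> rows = new ArrayList<>();
        // Range lookup on the value-sorted index instead of a full scan of the primary data.
        for (List<String> candidates : assistant.subMap(startValue, true, stopValue, false).values()) {
            for (String row : candidates) {
                if (row.compareTo(startRow) >= 0 && row.compareTo(stopRow) < 0) {
                    rows.add(row);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, List<String>> assistant = new TreeMap<>();
        assistant.put("v1", Arrays.asList("r1", "r3", "r5"));
        assistant.put("v2", Arrays.asList("r2", "r4", "r6"));
        // Same ranges as the example: value 'v2' only, rows from 'r1' up to 'r7'.
        System.out.println(scan(assistant, "v2", "v2\0", "r1", "r7"));
    }
}
```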
      

      Implementation Dependency
      a.Split the StoreFile by value. (Currently, files are split only by row.)
      b.Support multi-row transactions in a region (already implemented).

      An initial patch against the 0.94 version is attached.
      What do you think about such a store?

      Attachments

        1. 8980-94.patch
          81 kB
          Chunhui Shen

          People

            Unassigned Unassigned
            zjushch Chunhui Shen