Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Won't Fix
Description
Background
a. Generally, we would like to keep several organizations of the same data, e.g. a secondary index sorts the data by a non-primary key.
b. Currently, scanning data in HBase with a condition such as ValueFilter is inefficient.
c. We could create an Assistant Store that stores the data of an HRegion in another organization.
Assistant Store
a. It is a store of an HRegion, like HStore, and can be created by the user by adding a ColumnFamily.
b. Data in the Assistant Store is a copy of the data in the HRegion, but in another organization. The exception is that its row may fall outside the range of the HRegion, and its value is the row of the original KeyValue.
For example,
The region(Range:'row001'~'row999') includes the following KVs in the Store cf:
row001/cf:q1/val001
row002/cf:q1/val002
row003/cf:q1/val003
We could create an Assistant Store (named as) for the region, which includes the following KVs:
val001/cf:q1/row001
val002/cf:q1/row002
val003/cf:q1/row003
c. We could use a local region transaction to ensure atomicity and consistency.
d. The regionserver puts data into the Assistant Store automatically, but users must read the data from the Assistant Store themselves.
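The mapping from an original KeyValue to its Assistant Store copy can be sketched as follows. This is only an illustration in plain Java, not code from the patch: the `KV` class and the `toAssistant` helper are hypothetical names, and keys are modelled as strings rather than byte arrays.

```java
public class AssistantStoreSketch {
    /** Simplified key-value in the form row/family:qualifier/value (hypothetical type). */
    public static final class KV {
        public final String row;
        public final String family;
        public final String qualifier;
        public final String value;

        public KV(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + value;
        }
    }

    /**
     * Hypothetical helper: derive the Assistant Store copy of a KV by
     * swapping row and value and placing it in the assistant family.
     */
    public static KV toAssistant(KV original, String assistantFamily) {
        return new KV(original.value, assistantFamily, original.qualifier, original.row);
    }

    public static void main(String[] args) {
        KV original = new KV("row001", "cf", "q1", "val001");
        // The assistant copy is keyed by the original value.
        System.out.println(toAssistant(original, "cf"));
    }
}
```

Because the new row is the original value, it can lie outside the region's row range, which is the exception noted in item b.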
Example of Using Assistant Store
a. Suppose there is an empty table named t1 with a column family named c1, and it has only one region (the region's range is from EMPTY_START_ROW to EMPTY_END_ROW).
b. Add an Assistant Store to the table by adding a new column family named c2.
c. The user puts the following data into the table:
r1/c1:q1/v1
r2/c1:q1/v2
r3/c1:q1/v1
r4/c1:q1/v2
r5/c1:q1/v1
r6/c1:q1/v2
d.Then, the region will have the following data:
r1/c1:q1/v1
r2/c1:q1/v2
r3/c1:q1/v1
r4/c1:q1/v2
r5/c1:q1/v1
r6/c1:q1/v2
v1/c2:q1/r1
v1/c2:q1/r3
v1/c2:q1/r5
v2/c2:q1/r2
v2/c2:q1/r4
v2/c2:q1/r6
(The six c2 KVs above are generated by the Assistant and stored in the Assistant Store.)
e. Split the region into daughter_a and daughter_b at the split point 'r4';
daughter_a then has the following data:
r1/c1:q1/v1
r2/c1:q1/v2
r3/c1:q1/v1
v1/c2:q1/r1
v1/c2:q1/r3
v2/c2:q1/r2
(The three c2 KVs above are the data in the Assistant Store.)
daughter_b has the following data:
r4/c1:q1/v2
r5/c1:q1/v1
r6/c1:q1/v2
v1/c2:q1/r5
v2/c2:q1/r4
v2/c2:q1/r6
(The three c2 KVs above are the data in the Assistant Store.)
f. From the above, we can see that the data in the Assistant Store always corresponds to the original data in the region, and that it is maintained by the regionserver.
g. How to use the data in the Assistant Store?
Suppose we want to scan from 'r1' to 'r7' with a ValueFilter of value = 'v2'.
Without the Assistant Store, we must scan the whole table.
With the Assistant Store, we can speed up the scan:
Scan the Assistant Store from 'v2' to 'v2+', and get the following result:
v2/c2:q1/r2
v2/c2:q1/r4
v2/c2:q1/r6
Unfortunately, the scan result is not guaranteed to be ordered by row or by value, although it can be made ordered by value.
From the code point of view, I designed the scan on the Assistant Store as follows:
// Limit the scan to the row range from 'r1' to 'r7'
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("r1"));
scan.setStopRow(Bytes.toBytes("r7"));
// Describe the scan on the Assistant Store: from 'v2' up to 'v2' followed by a 0x00 byte
Scan assistantScan = new Scan();
assistantScan.setStartRow(Bytes.toBytes("v2"));
assistantScan.setStopRow(Bytes.add(Bytes.toBytes("v2"), new byte[] { 0x00 }));
// setAssistantScan is the API proposed by this issue; after it is set, the region runs the scan with the assistant scan
scan.setAssistantScan(assistantScan);
ResultScanner scanner = htable.getScanner(scan);
for (Result result : scanner) {
  // Output:
  // v2/c2:q1/r2
  // v2/c2:q1/r4
  // v2/c2:q1/r6
}
scanner.close();
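The whole walkthrough (put, split, scan by value) can be simulated with two sorted maps. This is a rough sketch in plain Java under stated assumptions, not the patch itself: the class `AssistantRegionSketch` and its methods are hypothetical, keys are strings instead of byte arrays, and the assistant map is keyed by value plus original row so that several rows sharing one value are all kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AssistantRegionSketch {
    // Original data: row -> value (stands in for family c1).
    private final TreeMap<String, String> primary = new TreeMap<>();
    // Assistant data: value + '\0' + row -> row (stands in for family c2).
    private final TreeMap<String, String> assistant = new TreeMap<>();

    /** Writes the original KV and maintains its assistant copy in the same call. */
    public void put(String row, String value) {
        String old = primary.put(row, value);
        if (old != null) {
            assistant.remove(old + '\0' + row); // drop the stale assistant entry
        }
        assistant.put(value + '\0' + row, row);
    }

    /**
     * Splits at splitRow. Assistant entries follow their original row,
     * i.e. the assistant data is split by its value, not by its key.
     */
    public AssistantRegionSketch[] split(String splitRow) {
        AssistantRegionSketch a = new AssistantRegionSketch();
        AssistantRegionSketch b = new AssistantRegionSketch();
        for (Map.Entry<String, String> e : primary.entrySet()) {
            (e.getKey().compareTo(splitRow) < 0 ? a : b).put(e.getKey(), e.getValue());
        }
        return new AssistantRegionSketch[] { a, b };
    }

    /** Returns rows in [startRow, stopRow) whose value equals the given value. */
    public List<String> scanByValue(String value, String startRow, String stopRow) {
        List<String> rows = new ArrayList<>();
        // All assistant keys for this value lie in [value\0, value\u0001).
        for (String row : assistant.subMap(value + '\0', value + '\u0001').values()) {
            if (row.compareTo(startRow) >= 0 && row.compareTo(stopRow) < 0) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        AssistantRegionSketch region = new AssistantRegionSketch();
        String[][] data = { {"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"},
                            {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"} };
        for (String[] kv : data) {
            region.put(kv[0], kv[1]);
        }
        System.out.println(region.scanByValue("v2", "r1", "r7")); // [r2, r4, r6]

        AssistantRegionSketch[] daughters = region.split("r4");
        System.out.println(daughters[0].scanByValue("v2", "r1", "r7")); // [r2]
        System.out.println(daughters[1].scanByValue("v2", "r1", "r7")); // [r4, r6]
    }
}
```

The range lookup touches only the entries for 'v2' instead of every row, which is the speed-up the proposal is after.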
Implementation Dependency
a. Split the StoreFile by value. (Currently, files are split only by row.)
b. Support multi-row transactions in a region (already implemented).
An initial patch against version 0.94 is provided.
What do you think about such a Store?