HBase / HBASE-5229

Provide basic building blocks for "multi-row" local transactions.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: Client, regionserver
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In the final iteration, this issue provides a generalized, public mutateRowsWithLocks method on HRegion that can be used by coprocessors to implement atomic operations efficiently.
      Coprocessors are already region aware, which makes this a good pairing of APIs. This feature is by design not available to the client via the HTable API.
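
      As a rough illustration (a sketch, not code from the attached patches), a region coprocessor endpoint could build an atomic multi-row operation on top of this method. The protocol and class names below are hypothetical; only HRegion.mutateRowsWithLocks is the building block added here.

          import java.io.IOException;
          import java.util.List;
          import java.util.SortedSet;
          import java.util.TreeSet;
          import org.apache.hadoop.hbase.client.Mutation;
          import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
          import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
          import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
          import org.apache.hadoop.hbase.regionserver.HRegion;
          import org.apache.hadoop.hbase.util.Bytes;

          // AtomicMultiRowProtocol.java (hypothetical endpoint protocol)
          public interface AtomicMultiRowProtocol extends CoprocessorProtocol {
            void mutateRows(List<Mutation> mutations) throws IOException;
          }

          // AtomicMultiRowEndpoint.java (hypothetical endpoint implementation)
          public class AtomicMultiRowEndpoint extends BaseEndpointCoprocessor
              implements AtomicMultiRowProtocol {
            @Override
            public void mutateRows(List<Mutation> mutations) throws IOException {
              // Collect the rows to lock; co-location is the caller's responsibility
              // (pre-splitting or a custom RegionSplitPolicy).
              SortedSet<byte[]> rowsToLock = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
              for (Mutation m : mutations) {
                rowsToLock.add(m.getRow());
              }
              HRegion region =
                  ((RegionCoprocessorEnvironment) getEnvironment()).getRegion();
              // Locks all rows, then applies the Puts/Deletes as one atomic unit.
              region.mutateRowsWithLocks(mutations, rowsToLock);
            }
          }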

      It took a long time to arrive at this and I apologize for the public exposure of my (erratic in retrospect) thought processes.

      Was:
      HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here.

      After a bit of discussion two solutions have emerged:
      1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows.
      2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix.

      #1 is true to the current storage paradigm of HBase.
      #2 is true to the current client side API.

      I will explore these two with sample patches here.

      --------------------
      Was:
      As discussed (at length) on the dev mailing list, with HBASE-3584 and HBASE-5203 committed, supporting atomic cross-row transactions within a region becomes simple.
      I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere.

      Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete.

      1. 5229.txt
        27 kB
        Lars Hofhansl
      2. 5229-seekto.txt
        7 kB
        Lars Hofhansl
      3. 5229-seekto-v2.txt
        8 kB
        Lars Hofhansl
      4. 5229-multiRow.txt
        29 kB
        Lars Hofhansl
      5. 5229-multiRow-v2.txt
        25 kB
        Lars Hofhansl
      6. 5229-endpoint.txt
        20 kB
        Lars Hofhansl
      7. 5229-final.txt
        21 kB
        Lars Hofhansl

        Issue Links

          Activity

          Lars Hofhansl added a comment -

          Initial patch. It works as advertised. I couldn't think of good names for the classes (RegionMutation is not a good name).

          Lars Hofhansl added a comment -

          Todd Lipcon made a good point: Currently regions are an implementation detail. With this patch they would practically become part of the API.

          Lars Hofhansl added a comment -

          That argument pretty much sinks this approach for me.

          At the same time we should not put a full transactional API into HBase, but rather provide enough building blocks so that an outside client could implement transactions. I do not see how we can do that without exposing knowledge about some internals such as regions.

          Another approach is to give more control over which set of rows can participate in a transaction.
           Right now that is all KVs with the same row key (internally we achieve this by collocating all those KVs, but that is an implementation detail).
           What if we allow a prefix of the row key instead? We could even formalize that and give the row key some internal (optional) structure, which allows the application to specify transaction groups.

          stack added a comment -

          Another approach is to give more control over which set of rows can participate in a transaction.

           If we did row prefix instead, it'd have to be an input to the table splitting function so we didn't split in the middle of a transaction's row set. Doesn't sound hard. Would be part of table schema.

          Todd Lipcon added a comment -

          What if we allow a prefix of the row key instead?

          That's essentially the approach that both DynamoDB and Oracle's NoSQL db take. But, the way I see it, we already have that – the row key is the "prefix of the row key", and the column key is the "rest of the row key". What would an extra element of key structure give us?

          Lars Hofhansl added a comment -

          That is true when it comes to storage.
           However our (current) API is mostly row based. There is no way to start or stop a scan at a column, there are many assumptions about rows baked into the scanner, etc. I don't think ColumnRangeFilter would be good enough here.

          Declaring a prefix and honoring it during splitting seems simpler and more in line with our current API and (probably?) what a user would expect.

           It is another avenue, though. For example we could add Scan.set{Start|Stop}Key (where we can pass prefixes of the full key, rather than just the row key) and handle it accordingly at the server. We would also need a nextKeyValue (or nextColumn or something) method on ResultScanner, along with the server code that does this efficiently.

          Philip Zeyliger added a comment -

          Just to kibitz a little bit:

           The approach you're proposing will only ever let you do "local" transactions: transactions clearly related to a prefix-friendly set of rows. For example, if, say, inserting a tweet is transactional, but there's a global hash tag index, you might want to make deleting a tweet transactional in the sense that it'll clean up the index entry for you. You're never going to get the index to be on the same region. It's possible to implement multi-row transactions on top of single-row transactions (by putting a write-ahead log there; see Megastore at http://research.google.com/pubs/pub36971.html).

           On the other hand, if you're cool with that, even just "local" transactions could be very useful to many users and are probably quite efficient. As a user, I'd much prefer dealing with single rows, or row prefixes, than hoping that my rows are on the same region server.

          Lars Hofhansl added a comment -

          Thanks Philip. Yep, that's the idea with this. Global transactions would be handled differently and be far more heavyweight (and must be used judiciously or your load will not scale).

           The transactions I have in mind with this would be just as fast as current Puts/Deletes (with the drawback that related data would have to be collocated).

          Matt Corgan added a comment -

           I think the approach of expanding and improving support for gigantic rows would be cleaner than adding another consistency guarantee to HBase's feature list. Combining very wide rows with many column families, each with separate settings and compactions, provides a good framework for all sorts of different data models on top. You could let a row become so big that it's the only row on a machine if you want.

          From what I can tell from papers and presentations, BigTable actually supports many gigabyte EntityGroups. They mention it at 6:18 in this video: http://www.youtube.com/watch?v=xO015C3R6dw . I'm not sure how "Entity Groups can span machines and still enforce transactionality". I had thought that an entity group was confined to a single BigTable row, maybe that means they do span BigTable regions. Anyone know how that works?

          Daniel Gómez Ferro added a comment -

          Currently regions are an implementation detail. With this patch they would practically become part of the API.

          Didn't regions already become part of the API with Coprocessors?

          Todd Lipcon added a comment -

          Sort of - but coprocessors are really an ultra-advanced API. I see them more like kernel modules in Linux - we don't purport to keep them 100% compatible between versions, and you're likely to crash your database if you mess up. Here, though, we were talking about a publicly accessible transactionality feature which users are going to depend on, and which breaks the abstractions everywhere else.

          Lars Hofhansl added a comment -

           @Daniel: Except for RegionCoprocessorEnvironment there are not too many points that directly expose regions. So I think Todd's point still holds.

           J-D brought to my attention that via scanner batching we can already control how many columns a scanner.next() call returns. So what I am exploring now is better control over where to start a scan (allow a column prefix to be specified along with the startRow). The only sticky point is that family delete markers would not be honored in that case, as we'd want to seek directly to the column and not seek to the family delete marker first, since that would defeat the purpose completely.

          Lars Hofhansl added a comment -

          Here's a patch demonstrating that approach.
          A scanner can be passed a start KeyValue. It will seek to that KeyValue.
           The client is responsible for creating a suitable KeyValue for seeking (see the included test).
          Together with scanner batching this allows for pretty flexible intra row scanning.

           For simplicity the stopRow is also enforced, allowing only intra-row scanning (that is just so that the change in ClientScanner is small; otherwise it would need to remember the seekTo for retries but still allow scanning to the next region by setting the startRow).

          If folks like this approach, I'll make a nicer patch and change the title of this jira.

          Lars Hofhansl added a comment -

          Oops... Attached the wrong file. This is the one.

          Lars Hofhansl added a comment -

          Arghhh... This one... Really.

          Ted Yu added a comment -

          Interesting.

          +      int length = in.readInt();
          

           It would be nice if we could utilize vint.

          I suggest changing the title. We're pretty far from the original plan.

          Lars Hofhansl added a comment -

           I was going by the code in Put.java. Agreed, vint is better here, especially because seekTo will typically only contain the key portion (i.e. be small in size). Maybe we should go through Put/Get/Delete/etc. and also use vints there.
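
           For illustration, this is roughly what the vint change would look like using Hadoop's WritableUtils (the SeekToHolder class and its seekTo field are hypothetical; the point is only the readInt vs. vint contrast):

             import java.io.DataInput;
             import java.io.DataOutput;
             import java.io.IOException;
             import org.apache.hadoop.hbase.KeyValue;
             import org.apache.hadoop.io.Writable;
             import org.apache.hadoop.io.WritableUtils;

             class SeekToHolder implements Writable {
               private KeyValue seekTo;   // hypothetical field holding the start KeyValue

               public void write(DataOutput out) throws IOException {
                 // out.writeInt(seekTo.getLength());              // fixed 4 bytes
                 WritableUtils.writeVInt(out, seekTo.getLength()); // 1-5 bytes, typically 1-2
                 out.write(seekTo.getBuffer(), seekTo.getOffset(), seekTo.getLength());
               }

               public void readFields(DataInput in) throws IOException {
                 int length = WritableUtils.readVInt(in);          // must match the write side
                 byte[] buf = new byte[length];
                 in.readFully(buf);
                 seekTo = new KeyValue(buf);
               }
             }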

           I'll do some performance tests with 1M columns or so. I also have to wrap my head around the implications for bloom filters.

          Lars Hofhansl added a comment -

          A real patch is a bit more complicated, as there are multiple regions and stores. The ClientScanner still needs to address the correct region, and all unneeded stores need to be skipped.

          Lars Hofhansl added a comment - - edited

          Here's a patch that works. Regions are correctly addressed using the row key. Intra-row scans are limited to a single row.
          The region scanner will correctly skip all stores for families before the seekTo KV.
          The first matching store will use the seekTo KV, the others will seek to the beginning of the row (since their family sorts after the seekTo key).

           The model possible here would be that a "transactional" table ends up being a row in HBase (i.e. the row key is the table name), a prefix of the columns defines the "rows" inside that table, and the rest of the columns define the actual "columns" of that row.

          With this patch that is possible (set seekTo in Scan.java to seek inside a row, and enable batching).

           This patch provides no way to set a stop point inside a row. The client would need to set batching to a reasonable amount (to avoid too many round trips and at the same time not return too many unnecessary KVs).
           Also with a stop point, we could prune all stores whose family is past the stop point (just like this patch prunes all stores with families before the stop point).
           Because RegionScannerImpl and StoreScanner are inherently row based, the refactoring would be non-trivial and risky.

           I will now explore what it would take to define a grouping prefix in HTableDescriptor.

          Lars Hofhansl added a comment -

           Yet another option is to have a ColumnFamilyRangeFilter (similar to ColumnRangeFilter, but also taking column families into account). Such a filter could be optimized to do just one seek in the beginning to the appropriate KeyValue (using SEEK_NEXT_USING_HINT, and hinting the right KV), and it could easily seek to the next row at the end. The app would then create a scan for a single row and pass a ColumnFamilyRangeFilter.

          Anyway, maybe I should stop my public stream of consciousness here.

          Lars Hofhansl added a comment -

          It turns out that there is no reliable ordering between KeyValues in different families (for the same row key).
          Neither should there be, really, since stores should (in theory - not done, yet) be able to retrieve KVs in parallel. So starting at (row1, familyX, columnA) and ending at (row1, familyY, columnB) has no meaning.
          Maybe that should have been obvious... See comment in KeyValue.KeyComparator(...):

          // TODO the family and qualifier should be compared separately
          compare = Bytes.compareTo(left, lcolumnoffset, lcolumnlength, right,
             rcolumnoffset, rcolumnlength);
          ...
          

           Not sure if that is a separate bug. Indeed, ("row1", "family", "12") is equal to ("row1", "family1", "2") by this comparator.
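
           A quick way to see that quirk (Bytes is org.apache.hadoop.hbase.util.Bytes; the values are just for illustration): because family and qualifier are compared as one concatenated byte range, these two distinct column coordinates produce identical key bytes.

             byte[] a = Bytes.add(Bytes.toBytes("family"), Bytes.toBytes("12"));  // "family12"
             byte[] b = Bytes.add(Bytes.toBytes("family1"), Bytes.toBytes("2"));  // "family12"
             System.out.println(Bytes.compareTo(a, b)); // prints 0: the two columns compare as equal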

          After all this, it looks like the best bet is to use the existing ColumnRangeFilter for intra-row scanning. If only one (or a few) families are used, that will be pretty efficient too, as ColumnRangeFilter seeks to the appropriate KV.
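
           For reference, a minimal sketch of such an intra-row scan with the existing client API (table is an assumed HTable; row, family, and column names are made up):

             // Scan a single row, restricted to columns ["colA", "colB") of family "f".
             Scan scan = new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row1\0"));
             scan.addFamily(Bytes.toBytes("f"));
             scan.setFilter(new ColumnRangeFilter(
                 Bytes.toBytes("colA"), true,     // start column, inclusive
                 Bytes.toBytes("colB"), false));  // stop column, exclusive
             scan.setBatch(100);                  // bound the number of KVs returned per next()
             ResultScanner scanner = table.getScanner(scan);
             try {
               for (Result r : scanner) {
                 // each Result carries at most 100 KVs from the column range within row1
               }
             } finally {
               scanner.close();
             }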

          I'll close this issue.

          Lars Hofhansl added a comment -

           That is to say, HBASE-3584 and HBASE-5203 along with ColumnRangeFilter are all that is needed.

          stack added a comment -

           Is this the second exercise where we arrive at the same conclusion, that ColumnRangeFilter is what we should use for intra-row scans? If so, should we do something to make ColumnRangeFilter more palatable to use (add methods to the Scan API that use ColumnRangeFilter underneath)?

           What about the comparator; should we fix that?

           Is it a problem that there is no ordering between CFs, Lars?

           Anything else we need to fix after your explorations, Lars?

           Should we write up something in the book or elsewhere on how 'transactions' within a single row are the way to go (as per Matt's supposition that making the single-row case work properly should be good for a variety of modelings on HBase)?

          Good stuff Lars.

          Lars Hofhansl added a comment -

           Yes, it is the 2nd time.

           This is the way to make large rows (a la Megastore) possible, as Matt suggests, because a scan can be done efficiently inside a row (well, actually inside a column family).

           Re: KV ordering. Looking around at the code... It is causing no problems, because we never have to compare KVs between families. It makes sense: RegionScannerImpl just deals with rows, and StoreScanners just deal with StoreFileScanners, which are all of the same family... Just something I somehow had not grokked before today (I had assumed there was a consistent global ordering between all KeyValues).

           Instead of adding cruft to the Scan API that does not provide any new features, I think ColumnRangeFilter should just be documented more prominently. I'll add something to the book (it's only tersely mentioned). And I'll definitely blog about it.

           Might still be worth exploring some outside grouping of rows (like the prefix I mention above), because that would be more in line with the API that a client expects. Implementing that would actually be simple: we'd just calculate the midKey as we do now, and then take a prefix of the midKey to do the actual split (if the table declares, say, a four-byte prefix, then we always split on the first four bytes of the midKey instead of the full row key).

           Using wide rows with ColumnRangeFilter forces the application to reinvent many concepts (like selecting a prefix of the column to declare the "inner grouping" and then using the remaining suffix to identify the "inner columns").
           Additionally there might be work needed to make HBase perform well with very wide rows (although ColumnRangeFilter can efficiently retrieve a subset of columns).

           It looks like ColumnRangeFilter might be ineffective if there are many versions of the cells, as version elimination during scanning happens after filters are evaluated (see ScanQueryMatcher).

           Maybe that's one thing to change: allow a filter to declare whether it is evaluated pre or post version elimination... There are use cases for both. Kannan faces a similar problem in HBASE-5104, which I think could be solved if filters could optionally be evaluated after the version handling. If that's something we want to do, I'll create a jira for that and work out a patch.

          stack added a comment -

          In the above when you talk of column prefix, couldn't that be column family?

          For sure we need to talk up ColumnRangeFilter and exercise it more (e.g. your observation on filters and versions needs looking into...)

          Good stuff.

          Lars Hofhansl added a comment -

          In the above when you talk of column prefix, couldn't that be column family?

           Depends on how one wants to slice it. One idea is to map a transactional grouping of rows into a single HBase row, so you need a way to identify the inner rows. If you did that by column family you'd have a CF for each row. In that case it'd be better to take a prefix of the HBase columns to identify the (inner) row and the rest to be the column.

           There are probably many other ways to slice this.

          I'll file a jira for the Filter versioning interaction.

          Lars Hofhansl added a comment -

          The other thing that comes to mind is deletion.
           We would need a delete marker that can flag a set of columns by prefix. It would be almost identical to the column delete marker but affect all columns with a particular prefix.

          stack added a comment -

          I see now what the other delete issue is about.

           Regards CF as prefix: yeah, it's a problem now where the logical CF and the physical CF are the same thing. If we had locality groups, as per the BigTable paper, we could have lots of CFs.

          Lars Hofhansl added a comment -

          Is anybody interested in me exploring the split prefix idea described above?

           Basically a table would declare a prefix of N bytes, and during splitting we make sure not to split inside a group of rows sharing the same prefix (which essentially just means that we calculate the midKey as we do now and just take its first N bytes to perform the actual split, hence the actual split point would always be aligned with the prefixes).
          That way we have defined a grouping of rows that could participate in local transactions.
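
           To make that concrete, a sketch of what such a policy could look like, assuming split-point determination can be overridden per table (the pluggable-policy route discussed below, which became HBASE-5304). The class name and the "prefix_length" table attribute are made up.

             import java.util.Arrays;
             import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;
             import org.apache.hadoop.hbase.regionserver.HRegion;

             public class PrefixAlignedSplitPolicy extends ConstantSizeRegionSplitPolicy {
               private int prefixLength;

               @Override
               protected void configureForRegion(HRegion region) {
                 super.configureForRegion(region);
                 // hypothetical table attribute declaring the N-byte grouping prefix
                 prefixLength = Integer.parseInt(region.getTableDesc().getValue("prefix_length"));
               }

               @Override
               protected byte[] getSplitPoint() {
                 byte[] midKey = super.getSplitPoint();  // the usual midKey
                 if (midKey == null || midKey.length <= prefixLength) {
                   return midKey;
                 }
                 // split exactly on the prefix so a whole group stays in one region
                 return Arrays.copyOf(midKey, prefixLength);
               }
             }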

          Todd Lipcon added a comment -

          I'd rather see some ability to make the split row determination a pluggable interface, so this could be implemented without changing core. Do you think that sounds feasible?

          Pie-in-the-sky wise, it would be great if HBase had a data model that was just a hierarchy with a configurable number of tables. Right now we have (row, col, ts) keys, where row is both the unit of splittability and the unit of atomicity. It would be cool if instead you could have a table with (foo, bar, baz, ts) where "foo" was the unit of split and "baz" was the unit of atomicity – or even do away with timestamps on a row. But that would be a rather giant change.

          stack added a comment -

          @Lars I'm 'intellectually' interested. I have no practical need. I'm more interested in our being able to support large rows (intra-row scanning, etc.); i.e. one row in a region and that region is huge and it works.

          Lars Hofhansl added a comment -

           @Todd: Making split row determination pluggable is a good idea. I'm not sure how this would interact with the possibility of choosing explicit split points through the HBase shell; we would need to disable that (unless the split policy supports it).

          Would also need to work on a per table basis.

           The pie-in-the-sky sounds cool and also like a rewrite of HBase. (You mean "bar" being the unit of atomicity, not "baz", right?)

          @Stack: I hear you. Means that the client and data modeling would have to change completely if it needs transactions.

          stack added a comment -

          I hear you. Means that the client and data modeling would have to change completely if it needs transactions.

          Could this be done as a facade atop our current api?

          Lars Hofhansl added a comment -

          Potentially. I also filed HBASE-5304. Turns out that it is not that hard to make the split key policy pluggable per table.
          With that one can define a policy to keep ranges of rows together (regions still cannot overlap or have holes of course). After that we can extend HBASE-5203 to allow multiple rows (and fail cold at the region server if the rows are not colocated).

          Lars Hofhansl added a comment -

           Now that HBASE-5304 is committed, I would like to rekindle the discussion here.

           We need to be able to co-locate some data for our tenants (for example all data of a specific user within a specific tenant space) in order to provide fast atomic operations.

           With HBASE-5304 it is now possible to provide a split policy that (within the HBase constraints, of course) allows guiding the split process to co-locate parts of the data. What is missing is a small change on top of HBASE-3584 to allow atomic cross-row operations.
           See the first patch I have attached here - 5229.txt.
           I would change RegionMutation to MultiRowMutation, but in principle it would be the same. A very small, safe, incremental change.
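
           For illustration, wiring this up per table might look roughly like the following, reusing a prefix-aligned policy such as the one sketched earlier (admin is an assumed HBaseAdmin; the table, family, and "prefix_length" attribute names are made up; the SPLIT_POLICY table attribute is the per-table hook from HBASE-5304):

             HTableDescriptor htd = new HTableDescriptor("tenant_data");
             htd.addFamily(new HColumnDescriptor("f"));
             // per-table split policy (HBASE-5304)
             htd.setValue(HTableDescriptor.SPLIT_POLICY, PrefixAlignedSplitPolicy.class.getName());
             htd.setValue("prefix_length", "4");  // hypothetical attribute read by the custom policy
             admin.createTable(htd);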

          Lars Hofhansl added a comment -

          Patch that works for me. Looks bigger than it is. Just refactors some code to make it usable for MultiRow ops.

          "Advanced use only" is prominently mentioned in the Javadoc, and it also advises to either disable splitting or use a custom RegionSplitPolicy.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/
          -----------------------------------------------------------

          Review request for hbase.

          Summary
          -------

          This builds on HBASE-3584, HBASE-5203, and HBASE-5304.

          Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).
          At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.

           Obviously this is an advanced feature, and this is prominently called out in the Javadoc.

          This addresses bug HBASE-5229.
          https://issues.apache.org/jira/browse/HBASE-5229

          Diffs


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953

          Diff: https://reviews.apache.org/r/3748/diff

          Testing
          -------

          Tests added to TestFromClientSide and TestAtomicOperation

          Thanks,

          Lars

          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4788
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          <https://reviews.apache.org/r/3748/#comment10525>

          Should be ' to mutateRows'

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java
          <https://reviews.apache.org/r/3748/#comment10526>

          Should read ' the mutations'

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
          <https://reviews.apache.org/r/3748/#comment10529>

          Can the two add() methods be combined into one which accepts Mutation ?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
          <https://reviews.apache.org/r/3748/#comment10527>

          Is this method thread-safe ?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
          <https://reviews.apache.org/r/3748/#comment10528>

          This comment can be removed, right ?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java
          <https://reviews.apache.org/r/3748/#comment10531>

          From its name, RowMutation seems to refer to single row. It is a little confusing RowMutation extends MultiRowMutation.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java
          <https://reviews.apache.org/r/3748/#comment10530>

          version would be read / written twice, right ?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10532>

          Should be 'within the region', right ?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10533>

          rowsToLock.size() could be smaller than mutations.size(), right ?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <https://reviews.apache.org/r/3748/#comment10535>

          Can we refer regionName from rm (the MultiRowMutation) ?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <https://reviews.apache.org/r/3748/#comment10534>

          Should be mutateRows

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
          <https://reviews.apache.org/r/3748/#comment10536>

          Should read "atomic mutation"

          • Ted
          jiraposter@reviews.apache.org added a comment -

          On 2012-02-03 04:45:54, Ted Yu wrote:

          >

           Thanks for the thorough review, as usual.
          I'll have a new patch tomorrow.

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 60

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line60>

          >

          > Can the two add() methods be combined into one which accepts Mutation ?

          That is because there are other mutations that I do not support (Increment/Append).

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 65

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line65>

          >

          > This comment can be removed, right ?

          Yes

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java, line 39

          > <https://reviews.apache.org/r/3748/diff/1/?file=72040#file72040line39>

          >

          > From its name, RowMutation seems to refer to single row. It is a little confusing RowMutation extends MultiRowMutation.

          Yeah. Maybe I'll have a common super class instead.

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java, line 96

          > <https://reviews.apache.org/r/3748/diff/1/?file=72040#file72040line96>

          >

          > version would be read / written twice, right ?

          Yes. Need to fix that.

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4211

          > <https://reviews.apache.org/r/3748/diff/1/?file=72044#file72044line4211>

          >

          > rowsToLock.size() could be smaller than mutations.size(), right ?

          Oh yes... Good point.

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java, line 3160

          > <https://reviews.apache.org/r/3748/diff/1/?file=72045#file72045line3160>

          >

          > Can we refer regionName from rm (the MultiRowMutation) ?

Yes, because all rows in the MultiRowMutation need to be in the same region. HTable just uses the first Mutation to find the region to be used.

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 64

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line64>

          >

          > Is this method thread-safe ?

          Should be. Only uses local state or protected data structures (like put, get, etc)

          • Lars

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4788
          -----------------------------------------------------------

          On 2012-02-03 01:29:58, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3748/

          -----------------------------------------------------------

          (Updated 2012-02-03 01:29:58)

          Review request for hbase.

          Summary

          -------

          This builds on HBASE-3584, HBASE-5203, and HBASE-5304.

          Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).

          At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.

Obviously this is an advanced feature, and this is prominently called out in the Javadoc.
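
For illustration, a minimal usage sketch of the client API proposed in this version of the patch (HTable.mutateRows plus MultiRowMutation from the diff above). This reflects the patch under review, not necessarily the final committed API; the table name, column family, qualifier, and row keys are made-up placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.MultiRowMutation;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiRowMutationExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "tenant_data");  // hypothetical table

        // Both rows share the prefix "tenantA-" and are assumed to be kept in
        // the same region by presplitting or a custom RegionSplitPolicy.
        Put put = new Put(Bytes.toBytes("tenantA-row1"));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        Delete delete = new Delete(Bytes.toBytes("tenantA-row2"));

        MultiRowMutation mrm = new MultiRowMutation();
        mrm.add(put);     // per the patch, add() accepts Put and Delete only
        mrm.add(delete);  // (Increment/Append are not supported)

        table.mutateRows(mrm);  // applied atomically; rows outside the region fail
      }
    }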

          This addresses bug HBASE-5229.

          https://issues.apache.org/jira/browse/HBASE-5229

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953

          Diff: https://reviews.apache.org/r/3748/diff

          Testing

          -------

          Tests added to TestFromClientSide and TestAtomicOperation

          Thanks,

          Lars

          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4789
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
          <https://reviews.apache.org/r/3748/#comment10537>

          Should we check they are for different rows here?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java
          <https://reviews.apache.org/r/3748/#comment10543>

          This is called by default. No need to call it explicitly.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java
          <https://reviews.apache.org/r/3748/#comment10545>

          This is called by default. No need to call it explicitly.

          • Jimmy
          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4791
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <https://reviews.apache.org/r/3748/#comment10547>

My point was we shouldn't throw an exception in this case.
          MultiRowMutation should be delivered to the correct region.

          I think we agree on the above.

          • Ted
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-03 05:01:34, Jimmy Xiang wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 67

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line67>

          >

          > Should we check they are for different rows here?

          It is still valid to pass the same rows here.

          On 2012-02-03 05:01:34, Jimmy Xiang wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java, line 46

          > <https://reviews.apache.org/r/3748/diff/1/?file=72040#file72040line46>

          >

          > This is called by default. No need to call it explicitly.

Right... will fix.

          • Lars
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-03 05:07:24, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java, line 3160

          > <https://reviews.apache.org/r/3748/diff/1/?file=72045#file72045line3160>

          >

          > My point was we shouldn't throw exception in this case.

          > MultiRowMutation should be delivered to the correct region.

          >

          > I think we agree on the above.

Oh... MultiRowMutation does not know anything about regions. So we have to use the one passed.

          • Lars
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java, line 39

          > <https://reviews.apache.org/r/3748/diff/1/?file=72040#file72040line39>

          >

          > From its name, RowMutation seems to refer to single row. It is a little confusing RowMutation extends MultiRowMutation.

          Lars Hofhansl wrote:

          Yeah. Maybe I'll have a common super class instead.

On second thought: RowMutation is a specialisation of MultiRowMutation, one that only allows a single row.

          • Lars
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java, line 96

          > <https://reviews.apache.org/r/3748/diff/1/?file=72040#file72040line96>

          >

          > version would be read / written twice, right ?

          Lars Hofhansl wrote:

          Yes. Need to fix that.

And actually this is what I had in mind too. Both MultiRowMutation and RowMutation write their version.
MultiRowMutation knows how to serialize mutations; that process needs to be versioned.
RowMutation adds some fields on top of that; those also need to be versioned.

          Maybe the two should be entirely independent (at the expense of a few duplicated lines of code) so that they can evolve independently.
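
To make the concern concrete, a generic Java sketch (not the patch's actual code; the class names are made up) of how a subclass Writable can end up putting the version on the wire twice when it both writes a version byte itself and calls super.write(), which writes it again:

    import java.io.DataOutput;
    import java.io.IOException;

    class MultiRowWritableSketch {
      static final byte VERSION = 1;

      void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);
        // ... serialize the list of mutations here ...
      }
    }

    class RowWritableSketch extends MultiRowWritableSketch {
      @Override
      void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);  // the issue under discussion: version written here...
        super.write(out);        // ...and again inside the superclass's write()
        // ... serialize the extra single-row fields here ...
      }
    }

Keeping the two classes' serialization independent, as proposed, sidesteps this, since each class then owns exactly one version field.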

          • Lars
          Ted Yu added a comment -

          Maybe the two should be entirely independent (at the expense of a few duplicated lines of code) so that they can evolve independently.

          That is better.

          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-03 05:01:34, Jimmy Xiang wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 67

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line67>

          >

          > Should we check they are for different rows here?

          Lars Hofhansl wrote:

          It is still valid to pass the same rows here.

That's right. Should the server side share the same function to do MultiRowMutation and RowMutation? It looks like the server side doesn't differ much between mutateRow() and mutateRows().

          • Jimmy
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/
          -----------------------------------------------------------

          (Updated 2012-02-03 19:59:55.515243)

          Review request for hbase.

          Changes
          -------

          Addressed Ted's and Jimmy's comments.
          Also:

          • RowMutation and MultiRowMutation are evolved independently (at the expense of some replicated code)
          • MultiRowMutation cannot participate itself in a Multi operation (because it does not implement Row). It could be made to work, but would need some refactoring in a different jira.

          Summary
          -------

          This builds on HBASE-3584, HBASE-5203, and HBASE-5304.

          Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).
          At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.

Obviously this is an advanced feature, and this is prominently called out in the Javadoc.

          This addresses bug HBASE-5229.
          https://issues.apache.org/jira/browse/HBASE-5229

          Diffs (updated)


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953

          Diff: https://reviews.apache.org/r/3748/diff

          Testing
          -------

          Tests added to TestFromClientSide and TestAtomicOperation

          Thanks,

          Lars

          Ted Yu added a comment -

          Latest patch looks much cleaner.
          Shall we let Hadoop QA run over the patch ?

          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-03 05:01:34, Jimmy Xiang wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 67

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line67>

          >

          > Should we check they are for different rows here?

          Lars Hofhansl wrote:

          It is still valid to pass the same rows here.

          Jimmy Xiang wrote:

          That's right. Should the server side share the same function to do MultiRowMutation and RowMutation? Looks the server side doesn't have much difference for mutateRow() and mutateRows()

          Oh... I misunderstood what you were saying.
On the server it is mostly the same code. On the client I would like to keep them as separate features, as RowMutation (hopefully) is a pretty standard feature, whereas MultiRowMutation is pretty advanced (you need to manage your splits, etc.) and really treads new HBase ground.

          • Lars
          Lars Hofhansl added a comment -

@Ted... Getting a test run in.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12513190/5229-multiRow-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.io.hfile.TestHFileBlock
          org.apache.hadoop.hbase.mapreduce.TestImportTsv

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/902//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/902//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/902//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          Test run looks good.

          Lars Hofhansl added a comment -

          Is there still general trepidation around this?
The patch is very lightweight and does not change any existing functionality. With HBASE-5304 it can be quite helpful for providing local transactions in multi-tenant settings.

          We can possibly call out the "advancedness" of this feature, maybe by having an "advanced htable" (or something like this). Not sure that is worth splitting the API, though.

          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4822
          -----------------------------------------------------------

          A couple of nits and small implementation details, but overall looks pretty good.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          <https://reviews.apache.org/r/3748/#comment10588>

I think this is unnecessary; javadoc should handle inheriting the docs.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java
          <https://reviews.apache.org/r/3748/#comment10587>

or presplitting, as is described in other documentation.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
          <https://reviews.apache.org/r/3748/#comment10586>

          Probably want to wrap NOTE in <b> tags to call it out.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
          <https://reviews.apache.org/r/3748/#comment10590>

          A javadoc here might be nice to indicate that the nullary constructor is actually completely ok to use (as opposed to the more common state of being reserved for readFields).

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
          <https://reviews.apache.org/r/3748/#comment10585>

Even though it uses protected structures doesn't mean that it's necessarily thread-safe. In fact, because it is using a standard ArrayList, there is no guarantee of safety. Either the class should be marked as not thread-safe OR the mutations should be wrapped in a concurrent list.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java
          <https://reviews.apache.org/r/3748/#comment10591>

You really don't need to keep the row around anymore either, because you can get that from the mutations, as you already do in mutateRows with MultiRowMutation. It's nice to store it, but it is only going to be checked infrequently, and dropping it saves you a little bit over the wire (which could add up, depending on row size).

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10592>

Surprised this isn't a utility method in HRegion - it seems really useful. Maybe worth pulling out for general use.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10593>

          This isn't actually true, right? With multirow, you are actually going to lock more than one row (and the lockId null seems kind of a hack around that as it is always null, so far).

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10594>

nit: lockID rather than just lid would be slightly more descriptive.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <https://reviews.apache.org/r/3748/#comment10595>

          But in the comments on the MultiRowMutation you push that checking off onto the RS, so no checking really happens then (except, I guess when you try to mutate rows on the region and it fails b/c those rows aren't there, but that seems kinda late for the check).

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <https://reviews.apache.org/r/3748/#comment10596>

          Wow, this is ugly. Maybe we should consider some refactoring of this later?

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          <https://reviews.apache.org/r/3748/#comment10597>

          This class can get easily bloated as we add more types. Might be worth considering refactoring this out into its own test.

          • Jesse
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > A couple of nits and small implementation details, but overall looks pretty good.

          You're looking at an old version of the patch.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java, line 3160

          > <https://reviews.apache.org/r/3748/diff/1/?file=72045#file72045line3160>

          >

          > But in the comments on the MultiRowMutation you push that checking off onto the RS, so no checking really happens then (except, I guess when you try to mutate rows on the region and it fails b/c those rows aren't there, but that seems kinda late for the check).

Checking is happening in region.internalMutate.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 786

          > <https://reviews.apache.org/r/3748/diff/1/?file=72037#file72037line786>

          >

          > I think is this unnecessary, javadoc should handle inheriting the docs.

It's done elsewhere; it is good to call out that no doc was added here because the interface has the doc.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java, line 284

          > <https://reviews.apache.org/r/3748/diff/1/?file=72038#file72038line284>

          >

          > or presplitting as is described in other documenttation.

          Yes, should add this.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 35

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line35>

          >

          > Probably want to wrap NOTE in <b> tags to call it out.

          Sure.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 45

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line45>

          >

          > A javadoc here might be nice to indicate that the nullary constructor is actually completely ok to use (as opposed to the more common state of being reserved for readFields).

Good point. Although unless it is called out that you cannot use a constructor, there should be no reason why you couldn't.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 64

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line64>

          >

          > Even though it uses protected structures doesn't mean that its necessarily thread safe. In fact, because it is using the standard ArrayList, there is no guarantee of safety. Either the class should be marked as not thread safe OR the mutations should be wrapped as a concurrent list.

          I disagree.
This is a client side object and none of the client side objects are threadsafe, nor should they be (see Put.java/Delete.java/Increment.java/Append.java/etc); that's the task of the client application.

          I misread Ted's comment before, of course this method is not threadsafe.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java, line 95

          > <https://reviews.apache.org/r/3748/diff/1/?file=72040#file72040line95>

          >

          > You really don't need to keep around the row anymore either because you can get that from the mutations as you already do mutateRows with MultiRowMutation. Its nice to store it, but is only going to be checked infrequently and saves you a little bit over the wire (which could add up, depending on row size).

          Same as Put and Delete (where every KV already has the row).
There is room for optimization, but this is not the jira to do that.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4161

          > <https://reviews.apache.org/r/3748/diff/1/?file=72044#file72044line4161>

          >

          > Suprised this isn't a utility method in HRegion - it seems really useful. Maybe worth pulling out for general use.

          internalMutate?

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4181

          > <https://reviews.apache.org/r/3748/diff/1/?file=72044#file72044line4181>

          >

          > This isn't actually true, right? With multirow, you are actually going to lock more than one row (and the lockId null seems kind of a hack around that as it is always null, so far).

          lockId could be passed to use one lock to lock all rows. Not used, yet, but still useful.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4214

          > <https://reviews.apache.org/r/3748/diff/1/?file=72044#file72044line4214>

          >

          > nit: lockID rather than just lid would be slightly descriptive.

          getLock is pretty clear, so is rowsToLock.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java, line 3346

          > <https://reviews.apache.org/r/3748/diff/1/?file=72045#file72045line3346>

          >

          > Wow, this is ugly. Maybe we should consider some refactoring of this later?

          Not only ugly, but also wrong (see 2nd version of the patch). MultiRowMutation does not implement Row so it cannot be part of a Multi action.

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java, line 272

          > <https://reviews.apache.org/r/3748/diff/1/?file=72048#file72048line272>

          >

          > This class can get easily bloated as we add more types. Might be worth considering refactoring this out into its own test.

          See 2nd version of patch.

          • Lars
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-03 04:45:54, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java, line 64

          > <https://reviews.apache.org/r/3748/diff/1/?file=72039#file72039line64>

          >

          > Is this method thread-safe ?

          Lars Hofhansl wrote:

          Should be. Only uses local state or protected data structures (like put, get, etc)

Misread your comment, Ted. This method is indeed not thread-safe, nor should it be. This is a client side object and it is up to the application to provide thread safety (which is identical to Put/Delete/Append/Increment, and indeed HTable).

          • Lars
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4161

          > <https://reviews.apache.org/r/3748/diff/1/?file=72044#file72044line4161>

          >

          > Suprised this isn't a utility method in HRegion - it seems really useful. Maybe worth pulling out for general use.

          Lars Hofhansl wrote:

          internalMutate?

Ah, misread again. There is a utility method, checkRow, which I should indeed use. The first incarnation of this did something different; now checkRow should be used.

          • Lars
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-05 07:26:08, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4161

          > <https://reviews.apache.org/r/3748/diff/1/?file=72044#file72044line4161>

          >

          > Surprised this isn't a utility method in HRegion - it seems really useful. Maybe worth pulling out for general use.

          Lars Hofhansl wrote:

          internalMutate?

          Lars Hofhansl wrote:

          Ah misread again. There is a utility method: checkRow, which I should indeed use. The first incarnation of this did something different, now checkRow should be used.

          I remember now: checkRow throws WrongRegionException, which means the client will retry. Here I throw DoNotRetryException, because retrying would be wrong.

          • Lars
          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4833
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10616>

          Do we need to be sorting rowsToLock?

          I'm thinking of multiple concurrent mutateRows operations trying to lock the same set of rows.

          Perhaps throwing an IOException will prevent us from ending up in a deadlock. But we still might want to sort to ensure (better) progress (no livelock).

          • Amitanand
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-06 15:52:43, Amitanand Aiyer wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4212

          > <https://reviews.apache.org/r/3748/diff/2/?file=72266#file72266line4212>

          >

          > Do we need to be sorting rowsToLock?

          >

          > I'm thinking of multiple concurrent mutateRows operation, trying to lock the same set of rows.

          >

          > Perhaps, throwing IOException is going to prevent us from a situation where we end up with a deadlock. But, we still might want to sort it to ensure (better) progress (no livelock).

          mutateRows sorts them (by using a TreeSet with Bytes.BYTES_COMPARATOR) for exactly this reason.
          Maybe this should be called out here by making the argument a SortedSet.
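To illustrate why the ordering matters (a sketch only, not the HRegion implementation): if every caller acquires its row locks in Bytes.BYTES_COMPARATOR order, two concurrent multi-row mutations over overlapping row sets cannot deadlock.

          import java.util.List;
          import java.util.TreeMap;
          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.locks.ReentrantLock;

          import org.apache.hadoop.hbase.util.Bytes;

          // Illustrative sketch: acquire per-row locks in a single global order.
          public class SortedRowLocking {
            private final ConcurrentHashMap<String, ReentrantLock> locks =
                new ConcurrentHashMap<String, ReentrantLock>();

            public void withRowsLocked(List<byte[]> rows, Runnable body) {
              // Sorting with the same comparator for every caller means all callers
              // take the locks in the same order, so no cycle (deadlock) can form.
              TreeMap<byte[], ReentrantLock> ordered =
                  new TreeMap<byte[], ReentrantLock>(Bytes.BYTES_COMPARATOR);
              for (byte[] row : rows) {
                String key = Bytes.toStringBinary(row);
                locks.putIfAbsent(key, new ReentrantLock());
                ordered.put(row, locks.get(key));
              }
              for (ReentrantLock lock : ordered.values()) {
                lock.lock();
              }
              try {
                body.run();
              } finally {
                for (ReentrantLock lock : ordered.values()) {
                  lock.unlock();
                }
              }
            }
          }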

          • Lars
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/
          -----------------------------------------------------------

          (Updated 2012-02-06 19:51:58.341235)

          Review request for hbase.

          Changes
          -------

          Addressed a few comments.
          In addition the client can retry a MultiRowMutation if the first row is not available in the region (as that might indicate that the region moved).

          Summary
          -------

          This builds on HBASE-3584, HBASE-5203, and HBASE-5304.

          Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).
          At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.
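As a concrete illustration of the co-location requirement (a sketch only; the table, family, and tenant-prefix keys are invented for this example), an application could pre-split the table on tenant-prefix boundaries so rows sharing a prefix never straddle a region:

          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.HColumnDescriptor;
          import org.apache.hadoop.hbase.HTableDescriptor;
          import org.apache.hadoop.hbase.client.HBaseAdmin;
          import org.apache.hadoop.hbase.util.Bytes;

          public class PresplitByTenantPrefix {
            public static void main(String[] args) throws Exception {
              HTableDescriptor desc = new HTableDescriptor("accounts");   // hypothetical table
              desc.addFamily(new HColumnDescriptor("d"));
              // Split points sit on tenant-prefix boundaries, so rows "T001...",
              // "T002...", etc. stay region-local and can be mutated together.
              byte[][] splits = new byte[][] {
                  Bytes.toBytes("T002"), Bytes.toBytes("T003"), Bytes.toBytes("T004") };
              HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
              admin.createTable(desc, splits);
            }
          }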

          Obviously this is an advanced feature, and this is prominently called out in the Javadoc.

          This addresses bug HBASE-5229.
          https://issues.apache.org/jira/browse/HBASE-5229

          Diffs (updated)


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120

          Diff: https://reviews.apache.org/r/3748/diff

          Testing
          -------

          Tests added to TestFromClientSide and TestAtomicOperation

          Thanks,

          Lars

          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4844
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10643>

          What if rm contains more than one Mutation?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <https://reviews.apache.org/r/3748/#comment10642>

          else is not needed considering an exception is thrown on line 4170.

          • Ted
          Ted Yu added a comment -

          For my first comment, RowMutation maintains a single row in internalAdd().
          So it should be fine passing the row directly to internalMutate().

          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-06 20:24:17, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4152

          > <https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152>

          >

          > What if rm contains more than one Mutation ?

          Hopefully rm does contain more than one Mutation; otherwise using this API is pointless.
          It is guaranteed, though, that all Mutations are for this single row.

          Do you see a concern?

          On 2012-02-06 20:24:17, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4171

          > <https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171>

          >

          > else is not needed considering exception is thrown on line 4170.

          Right. But this makes the flow clear. Personally I am not a big fan of having to piece together the control flow by tracking exceptions and return statements.
          I don't mind changing it, though.

          • Lars
          Lars Hofhansl added a comment -

          Another option I thought of to make this interface less public is to not add mutateRows to HRegionInterface and HTableInterface/HTable, but rather to RegionServerServices, and hence allow access from coprocessor endpoints only. HRegionServer would still have a public mutateRows method (or maybe a more generalized version of it), but it would only be accessible to coprocessors.

          If that sounds like a good idea, I'll update the patch to do that.

          Lars Hofhansl added a comment -

          It's even simpler. A coprocessor endpoint has access to its region. If I rename internalMutate from my patch to mutateRowsWithLocks(List<Mutations> mutations, Set<String> rowsToLock) and make it public in HRegion, it can be called from a coprocessor endpoint.
          It would not be exposed in HRegionInterface, HRegionServer, RegionServerServices, or the HTableInterfaces.
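For illustration, a sketch of what such an endpoint could look like. The protocol and endpoint names, and the exact parameter types of mutateRowsWithLocks, are assumptions based on this comment, not the committed code:

          import java.io.IOException;
          import java.util.List;
          import java.util.SortedSet;
          import java.util.TreeSet;

          import org.apache.hadoop.hbase.client.Mutation;
          import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
          import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
          import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
          import org.apache.hadoop.hbase.regionserver.HRegion;
          import org.apache.hadoop.hbase.util.Bytes;

          // Hypothetical client-visible protocol.
          interface MultiRowMutationProtocol extends CoprocessorProtocol {
            void mutateRows(List<Mutation> mutations) throws IOException;
          }

          // Sketch of an endpoint that delegates to the public HRegion hook.
          public class MultiRowMutationEndpoint extends BaseEndpointCoprocessor
              implements MultiRowMutationProtocol {

            @Override
            public void mutateRows(List<Mutation> mutations) throws IOException {
              HRegion region =
                  ((RegionCoprocessorEnvironment) getEnvironment()).getRegion();
              // Collect the rows so they can all be held locked for the duration of the call.
              SortedSet<byte[]> rowsToLock = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
              for (Mutation m : mutations) {
                rowsToLock.add(m.getRow());
              }
              region.mutateRowsWithLocks(mutations, rowsToLock);
            }
          }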

          Lars Hofhansl added a comment -

          Here's what I had in mind.
          The patch is essentially the same (a slight refactoring of the mutateRow code), but without changes to HTable[Interface], HRegionServer, HRegionInterface, etc.

          mutateRowsWithLocks is now public on HRegion so that coprocessor endpoints have access to it.

          The test classes MultiRowMutation{Protocol|Endpoint} illustrate how one could write a coprocessor using this feature.
          (I could move these two to HBase proper, into org.apache.hadoop.hbase.coprocessor.)

          TestFromClientSide.testMultiRowMutation illustrates how this would look from a client.
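Roughly, the client side would look like the sketch below (it assumes the hypothetical protocol from the endpoint sketch above is installed on the table, and that both rows live in the same region; the table, family, and row keys are made up):

          import java.util.ArrayList;
          import java.util.List;

          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Mutation;
          import org.apache.hadoop.hbase.client.Put;
          import org.apache.hadoop.hbase.util.Bytes;

          public class MultiRowMutationClientSketch {
            public static void main(String[] args) throws Exception {
              HTable table = new HTable(HBaseConfiguration.create(), "accounts");

              byte[] rowA = Bytes.toBytes("T001-alice");
              byte[] rowB = Bytes.toBytes("T001-bob");

              List<Mutation> mutations = new ArrayList<Mutation>();
              mutations.add(new Put(rowA).add(Bytes.toBytes("d"), Bytes.toBytes("balance"), Bytes.toBytes("90")));
              mutations.add(new Put(rowB).add(Bytes.toBytes("d"), Bytes.toBytes("balance"), Bytes.toBytes("110")));

              // The proxy is bound to the region that hosts rowA; rowB must be in
              // that same region or the endpoint will reject the call.
              MultiRowMutationProtocol proxy =
                  table.coprocessorProxy(MultiRowMutationProtocol.class, rowA);
              proxy.mutateRows(mutations);

              table.close();
            }
          }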

          Since coprocessors are already region aware, this is a nice match of APIs.

          I think this addresses all objections I have noted so far.
          Please have a look and let me know what you think. I would like to commit this version (or one very similar to it).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12513647/5229-endpoint.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/912//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/912//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/912//console

          This message is automatically generated.

          Ted Yu added a comment -

          Latest patch is of decent size.
          Can you update the patch on review board?

          Thanks

          Todd Lipcon added a comment -

          Also, since this extends the semantics exposed by HBase we should probably point it out widely on the dev list to make sure everyone agrees with the direction.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/
          -----------------------------------------------------------

          (Updated 2012-02-07 19:41:16.600336)

          Review request for hbase.

          Changes
          -------

          This is essentially the same patch with the following differences:

          • all changes to HTable, HTableInterface, HRegionServer, HRegionInterface removed
          • internalMutate made public and renamed to mutateRowsWithLocks

          The idea is that mutateRowsWithLocks is only available to coprocessors, and not to the client HTableInterface API.
          Coprocessors already need to be region aware, so this is a nice match of APIs. The test classes MultiRowMutationProtocol and MultiRowMutationEndpoint serve as an example of how to do that.

          TestFromClientSide.testMultiRowMutation illustrates how the client API would look (once a coprocessor is in place).

          Summary
          -------

          This builds on HBASE-3584, HBASE-5203, and HBASE-5304.

          Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).
          At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.

          Obviously this is an advanced feature, and this is prominently called out in the Javadoc.

          This addresses bug HBASE-5229.
          https://issues.apache.org/jira/browse/HBASE-5229

          Diffs (updated)


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241350
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241350
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/MultiRowMutationEndpoint.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/MultiRowMutationProtocol.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241350
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241350

          Diff: https://reviews.apache.org/r/3748/diff

          Testing
          -------

          Tests added to TestFromClientSide and TestAtomicOperation

          Thanks,

          Lars

          Lars Hofhansl added a comment -

          @Todd: The last patch only provides a hook for a coprocessor endpoint to call (and the hook itself is just a more flexible way to call an existing method).

          So in order to use this, a user would need to write and install a coprocessor endpoint.

          Coprocessors can already do whatever they want, and that includes modifying multiple rows (but without this hook it cannot be done efficiently or correctly, hence the small core change).

          Only HRegion is changed (20 lines without comments); the rest is test code.

          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4876
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/MultiRowMutationProtocol.java
          <https://reviews.apache.org/r/3748/#comment10746>

          Please add javadoc.

          • Ted
          Lars Hofhansl added a comment -

          @Ted: Will do. Any other comments/concerns?

          Ted Yu added a comment -

          The patch looks clean.

          Other people's comments are important, too.

          Todd Lipcon added a comment -

          Gotcha - I hadn't looked at the most recent patch. Seems reasonable enough.

          Lars Hofhansl added a comment -

          Maybe I should file a new jira for the latest change and let this one rest... It seems like it went through too many iterations already.

          Lars Hofhansl added a comment -

          Oh and also, I could move the test coprocessor into org.apache.hadoop.hbase.coprocessor, so the endpoint for multirow mutations would be there as part of the HBase install, and a user would just need to load it (similar to what we do with the Aggregation Coprocessor).
          Would that make it too official? (I'm fine leaving it just in test, too)
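For reference, one way an application could load such an endpoint on a single table once it ships with HBase (a sketch; the table and family names are made up, and loading the class cluster-wide via hbase.coprocessor.region.classes would work as well):

          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.HColumnDescriptor;
          import org.apache.hadoop.hbase.HTableDescriptor;
          import org.apache.hadoop.hbase.client.HBaseAdmin;

          public class LoadMultiRowMutationEndpoint {
            public static void main(String[] args) throws Exception {
              HTableDescriptor desc = new HTableDescriptor("accounts");   // hypothetical table
              desc.addFamily(new HColumnDescriptor("d"));
              // Attach the endpoint to this table, similar to how the Aggregation
              // coprocessor is loaded.
              desc.addCoprocessor("org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint");
              new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);
            }
          }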

          Ted Yu added a comment -

          I am fine with moving MultiRowMutationEndpoint into org.apache.hadoop.hbase.coprocessor

          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4885
          -----------------------------------------------------------

          Ship it!

          +1

          • Michael
          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4886
          -----------------------------------------------------------

          Ship it!

          Small nits depending on what you want to do here. Looks good though.

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/MultiRowMutationEndpoint.java
          <https://reviews.apache.org/r/3748/#comment10769>

          This could be good to put into client.coprocessor, rather than client (when you make the move over).

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          <https://reviews.apache.org/r/3748/#comment10775>

          It would be nice if this stuff was in its own test, IF you are going to pull the endpoint out to src/main. It will help make the functionality a bit more obvious.

          • Jesse
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-07 23:49:19, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/MultiRowMutationEndpoint.java, line 39

          > <https://reviews.apache.org/r/3748/diff/4/?file=72911#file72911line39>

          >

          > This could be good to put into client.coprocessor, rather than client (when you make the move over).

          This is a good point.

          • Michael
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-07 23:49:19, Jesse Yates wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/MultiRowMutationEndpoint.java, line 39

          > <https://reviews.apache.org/r/3748/diff/4/?file=72911#file72911line39>

          >

          > This could be good to put into client.coprocessor, rather than client (when you make the move over).

          Michael Stack wrote:

          This is a good point.

          Heh, I am way ahead of you guys.
          I suggested this already on the jira.
          Will have a new patch soon, hopefully the last revision.
          I would like to keep the test in TestFromClientSide, though.

          • Lars
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/
          -----------------------------------------------------------

          (Updated 2012-02-08 00:15:28.554140)

          Review request for hbase.

          Changes
          -------

          Hopefully the last iteration.

          • Added Javadoc
          • moved coprocessor endpoint to org.apache.hadoop.hbase.coprocessor (not ...hbase.client.coprocessor)

          Summary
          -------

          This builds on HBASE-3584, HBASE-5203, and HBASE-5304.

          Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).
          At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.

          Obviously this is an advanced feature, and this is prominently called out in the Javadoc.

          This addresses bug HBASE-5229.
          https://issues.apache.org/jira/browse/HBASE-5229

          Diffs (updated)


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationEndpoint.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationProtocol.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241536
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241536
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241536
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241536

          Diff: https://reviews.apache.org/r/3748/diff

          Testing
          -------

          Tests added to TestFromClientSide and TestAtomicOperation

          Thanks,

          Lars

          Lars Hofhansl added a comment -

          If there are no further concerns, I will commit this change tomorrow.

          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3748/#review4889
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationProtocol.java
          <https://reviews.apache.org/r/3748/#comment10786>

          Replace operations with transactions.

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          <https://reviews.apache.org/r/3748/#comment10787>

          Please plug in testMultiRowMutation

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          <https://reviews.apache.org/r/3748/#comment10788>

          testMultiRowMutation here as well.

          • Ted
          jiraposter@reviews.apache.org added a comment - - edited

          On 2012-02-08 01:06:21, Ted Yu wrote:

          >

          Ah yes, good catch as usual. If there're no further comments, I'll do that on commit.
          Thanks Ted.

          On 2012-02-08 01:06:21, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationProtocol.java, line 28

          > <https://reviews.apache.org/r/3748/diff/5/?file=72962#file72962line28>

          >

          > Replace operations with transactions.

          And presumably remove "atomic" as it is implied by transaction, right?

          • Lars
          Lars Hofhansl added a comment -

          Committed to trunk. Thanks everyone for bearing with me.

          Lars Hofhansl added a comment -

          Final patch for reference

          Hudson added a comment -

          Integrated in HBase-TRUNK-security #102 (See https://builds.apache.org/job/HBase-TRUNK-security/102/)
          HBASE-5229 Provide basic building blocks for 'multi-row' local transactions.

          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationEndpoint.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationProtocol.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #2656 (See https://builds.apache.org/job/HBase-TRUNK/2656/)
          HBASE-5229 Provide basic building blocks for 'multi-row' local transactions.

          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationEndpoint.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationProtocol.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
          Lars Hofhansl added a comment -

          Note that Cassandra adds something similar in 1.1: http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 (check towards the end of that blog post).


            People

            • Assignee:
              Lars Hofhansl
              Reporter:
              Lars Hofhansl
            • Votes:
              0
              Watchers:
              16
