HBASE-10854: [VisibilityController] Apply MAX_VERSIONS from schema or request when scanning

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.0
    • Fix Version/s: 0.99.0, 0.98.2
    • Component/s: security
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      If we update a row multiple times with different visibility labels,
      we are able to get the "old version" of the row until it is flushed.

      $ sudo -u hbase hbase shell
      hbase> add_labels 'A'
      hbase> add_labels 'B'
      hbase> create 'tb', 'f1'
      hbase> put 'tb', 'row', 'f1:q', 'v1', {VISIBILITY=>'A'}
      hbase> put 'tb', 'row', 'f1:q', 'v1all'
      hbase> put 'tb', 'row', 'f1:q', 'v1aOrB', {VISIBILITY=>'A|B'}
      hbase> put 'tb', 'row', 'f1:q', 'v1aAndB', {VISIBILITY=>'A&B'}
      hbase> scan 'tb'
      row column=f1:q, timestamp=1395948168154, value=v1aAndB
      1 row
      
      $ sudo -u testuser hbase shell
      hbase> scan 'tb'
      row column=f1:q, timestamp=1395948168102, value=v1all
      1 row
      

      When we flush the memstore, only a single version of the row is kept (the last one inserted),
      so testuser now gets 0 rows.

      $ sudo -u hbase hbase shell
      hbase> flush 'tb'
      hbase> scan 'tb'
      row column=f1:q, timestamp=1395948168154, value=v1aAndB
      1 row
      
      $ sudo -u testuser hbase shell
      hbase> scan 'tb'
      0 row
      
      1. HBASE-10854.patch
        12 kB
        Anoop Sam John

        Activity

        Transition                     Time In Source Status   Execution Times   Last Executer    Last Execution Date
        Open -> Patch Available        6d 9h 36m               1                 Anoop Sam John   03/Apr/14 07:15
        Patch Available -> Resolved    3h 16m                  1                 Anoop Sam John   03/Apr/14 10:32
        Resolved -> Closed             324d 13h 55m            1                 Enis Soztutar    21/Feb/15 23:28
        Enis Soztutar made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Enis Soztutar added a comment -

        Closing this issue after 0.99.0 release.

        Anoop Sam John made changes -
        Fix Version/s 0.98.2 [ 12326505 ]
        Fix Version/s 0.98.1 [ 12325664 ]
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #246 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/246/)
        HBASE-10854 [VisibilityController] Apply MAX_VERSIONS from schema or request when scanning. (Anoop) (anoopsamjohn: rev 1584328)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityLabelFilter.java
        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98 #262 (See https://builds.apache.org/job/HBase-0.98/262/)
        HBASE-10854 [VisibilityController] Apply MAX_VERSIONS from schema or request when scanning. (Anoop) (anoopsamjohn: rev 1584328)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityLabelFilter.java
        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #5060 (See https://builds.apache.org/job/HBase-TRUNK/5060/)
        HBASE-10854 [VisibilityController] Apply MAX_VERSIONS from schema or request when scanning. (Anoop) (anoopsamjohn: rev 1584327)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityLabelFilter.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java
        Anoop Sam John made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.98.1 [ 12325664 ]
        Fix Version/s 0.98.2 [ 12326505 ]
        Resolution Fixed [ 1 ]
        Anoop Sam John added a comment - - edited

        Committed to trunk and 0.98. Thanks for the reviews, Ram and Andy.
        Thanks for reporting the issue, Matteo.

        Andrew Purtell added a comment -

        I updated the jira title.

        Andrew Purtell made changes -
        Summary: changed from "Multiple Row/VisibilityLabels visible while in the memstore" to "[VisibilityController] Apply MAX_VERSIONS from schema or request when scanning"
        ramkrishna.s.vasudevan added a comment -

        +1

        ramkrishna.s.vasudevan added a comment -

        Ignore my previous comment. We need this for maxVersions == 1 as well, since the memstore can hold more than one version of a KV.

        Andrew Purtell added a comment -

        Tested all the above cases and the results are as expected.

        Great, +1 for commit

        ramkrishna.s.vasudevan added a comment -

        Just a small nit: maybe we can make the new code run only if maxVersions > 1. If it is 1, we need not do those comparisons.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12638429/HBASE-10854.patch
        against trunk revision .
        ATTACHMENT ID: 12638429

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot

        -1 core zombie tests. There are 1 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestImportExport.testImport94Table(TestImportExport.java:230)

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9185//console

        This message is automatically generated.

        Anoop Sam John added a comment -

        Andrew Purtell
        Tested all the above cases and the results are as expected.

        Andrew Purtell added a comment -

        Patch seems reasonable to me. Please try out the below scenarios with the patch applied. The shell commands aren't real, just illustrations.

        Table 't1' has MAX_VERSIONS=10

        hbase> put 't1', 'row', 'f1:q', 'v1all'
        hbase> put 't1', 'row', 'f1:q', 'v1aOrB', {VISIBILITY=>'A|B'}
        hbase> put 't1', 'row', 'f1:q', 'v1aAndB', {VISIBILITY=>'A&B'}
        
        scan 't1', 'row', 'f1:q', { MAX_VERSIONS => 1, AUTHORIZATIONS => 'A' }
        -> [ 'v1aOrB' ]
        
        scan 't1', 'row', 'f1:q', { MAX_VERSIONS => 10, AUTHORIZATIONS => 'A' }
        -> [ 'v1aOrB', 'v1all' ]
        
        scan 't1', 'row', 'f1:q', { MAX_VERSIONS => 1, AUTHORIZATIONS => 'B' }
        -> [ 'v1aOrB' ]
        
        scan 't1', 'row', 'f1:q', { MAX_VERSIONS => 10, AUTHORIZATIONS => 'B' }
        -> [ 'v1aOrB', 'v1all' ]
        
        scan 't1', 'row', 'f1:q', { MAX_VERSIONS => 1, AUTHORIZATIONS => ['A', 'B'] }
        -> [ 'v1aAndB' ]
        
        scan 't1', 'row', 'f1:q', { MAX_VERSIONS => 10, AUTHORIZATIONS => ['A', 'B'] }
        -> [ 'v1aAndB', 'v1aOrB', 'v1all' ]
        

        Table 't2' has MAX_VERSIONS=1

        hbase> put 't2', 'row', 'f1:q', 'v1all'
        hbase> put 't2', 'row', 'f1:q', 'v1aOrB', {VISIBILITY=>'A|B'}
        hbase> put 't2', 'row', 'f1:q', 'v1aAndB', {VISIBILITY=>'A&B'}
        
        scan 't2', 'row', 'f1:q', { AUTHORIZATIONS => 'A' }
        -> []
        
        scan 't2', 'row', 'f1:q', { AUTHORIZATIONS => 'B' }
        -> []
        
        scan 't2', 'row', 'f1:q', { AUTHORIZATIONS => ['A', 'B'] }
        -> [ 'v1aAndB' ]
        

        Yes? Then I am +1.

        Does everyone else find the behavior illustrated by the above shell commands reasonable?
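
The scenarios above can be modelled with a small, self-contained Java sketch. It is not HBase code and not the patch itself: the class VisibilityVersionsSketch, its Cell type, and the scan method are hypothetical names, and plain predicates stand in for visibility expressions. It assumes, as the expected results above suggest, that the schema MAX_VERSIONS limits which versions are considered before labels are evaluated, while the MAX_VERSIONS of the request limits how many visible versions are returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class VisibilityVersionsSketch {

  /** One version of a cell; the predicate stands in for its visibility expression. */
  static final class Cell {
    final String value;
    final Predicate<Set<String>> visibleTo;

    Cell(String value, Predicate<Set<String>> visibleTo) {
      this.value = value;
      this.visibleTo = visibleTo;
    }
  }

  /** The three puts from the scenarios above, newest version first. */
  static List<Cell> sampleCells() {
    return List.of(
        new Cell("v1aAndB", auths -> auths.contains("A") && auths.contains("B")),
        new Cell("v1aOrB", auths -> auths.contains("A") || auths.contains("B")),
        new Cell("v1all", auths -> true));
  }

  /** Simplified model: schema limit first, then visibility, then request limit. */
  static List<String> scan(List<Cell> newestFirst, Set<String> auths,
      int schemaMaxVersions, int scanMaxVersions) {
    List<String> visible = new ArrayList<>();
    int versionsSeen = 0;
    for (Cell cell : newestFirst) {
      // Versions beyond the schema limit are treated as gone, whatever their label.
      if (++versionsSeen > schemaMaxVersions) {
        break;
      }
      // Skip versions the user's authorizations do not allow.
      if (!cell.visibleTo.test(auths)) {
        continue;
      }
      // The request limit counts only versions the user may see.
      if (visible.size() == scanMaxVersions) {
        break;
      }
      visible.add(cell.value);
    }
    return visible;
  }

  public static void main(String[] args) {
    // Table 't1' (schema MAX_VERSIONS=10), AUTHORIZATIONS => 'A'
    System.out.println(scan(sampleCells(), Set.of("A"), 10, 1));   // [v1aOrB]
    System.out.println(scan(sampleCells(), Set.of("A"), 10, 10));  // [v1aOrB, v1all]
    // Table 't2' (schema MAX_VERSIONS=1), AUTHORIZATIONS => 'A'
    System.out.println(scan(sampleCells(), Set.of("A"), 1, 1));    // []
  }
}
```

In this model the 't2' case returns nothing for a user holding only 'A' because the single retained version carries the 'A&B' label, which matches the expectation listed above.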

        ramkrishna.s.vasudevan added a comment -

        I already saw the patch internally, and this is what we had in mind. +1. I will update my patch for HBASE-10899 based on this.
        NEXT_COL is not used here because we don't know how many versions there may be (and that may in turn lead to a reseek()).

        Anoop Sam John made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Fix Version/s 0.99.0 [ 12325675 ]
        Fix Version/s 0.98.2 [ 12326505 ]
        Anoop Sam John made changes -
        Attachment HBASE-10854.patch [ 12638429 ]
        Anoop Sam John added a comment -

        The patch considers the max versions for every CF. The change makes sure that we do not evaluate the visibility label on old versions that are already gone.

        Anoop Sam John made changes -
        Assignee Anoop Sam John [ anoop.hbase ]
        Anoop Sam John made changes -
        Affects Version/s 0.98.0 [ 12323143 ]
        Affects Version/s 0.98.1 [ 12325664 ]
        Anoop Sam John added a comment -

        What I am trying to do is take into account the MAX_VERSIONS specified for a CF in the table descriptor. When this is 1 (the default), we will consider only the label of the cell's latest version in the VisibilityLabelFilter.

        Anoop Sam John added a comment -

        Let me try something out, Ram. I will try to come up with a patch.

        ramkrishna.s.vasudevan added a comment -
          @Test
          public void testMultipleVersions() throws Exception {
            // Create a table whose column family keeps only one version per cell.
            TableName tableName = TableName.valueOf(TEST_NAME.getMethodName());
            HColumnDescriptor col = new HColumnDescriptor(fam);
            col.setMaxVersions(1);
            HTableDescriptor desc = new HTableDescriptor(tableName);
            desc.addFamily(col);
            TEST_UTIL.getHBaseAdmin().createTable(desc);
            // Write two versions of the same cell with different visibility labels.
            List<Put> puts = new ArrayList<Put>();
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(fam, qual, 3L, Bytes.toBytes("100"));
            put.setCellVisibility(new CellVisibility(SECRET));
            puts.add(put);
            put = new Put(Bytes.toBytes("row1"));
            put.add(fam, qual, 4L, Bytes.toBytes("101"));
            put.setCellVisibility(new CellVisibility(PRIVATE));
            puts.add(put);
            HTable table = new HTable(TEST_UTIL.getConfiguration(), tableName);
            table.put(puts);
            // Uncomment to flush the memstore before scanning.
            //TEST_UTIL.getHBaseAdmin().flush(tableName.getNameAsString());
            // Scan as a user holding only the SECRET authorization.
            Scan s = new Scan();
            s.setMaxVersions(1);
            s.setAuthorizations(new Authorizations(SECRET));
            ResultScanner scanner = table.getScanner(s);
            Result[] next = scanner.next(4);
            // Without the flush, the older SECRET version is still returned.
            assertEquals(1, next.length);
            scanner.close();
            table.close();
          }
        

        Now uncommenting the flush line will give you no result. If the max versions for a CF is 1, can we have the Visibility Filter behave like SCVF and always return only the latest version unless the user specifies otherwise, and carry that parameter to the Visibility Filter?
        That would at least bring things in line with the SCVF behaviour.
        I also agree with Andy that we could document this behaviour, since how filters interact with versions is always a debatable point. But because visibility is related to security, this understanding is even more important.

        Anoop Sam John added a comment -

        I can see that SCVF has latestVersionOnly, which defaults to true.
        Ideally, for a table CF with MAX_VERSIONS of 1, any Filter acting on the cells in that CF must act on the latest version only. Please correct me if I am wrong. Once a major compaction has happened and there is no data in the memstore, there will be only one version, and a scan will act on that single-version cell. So we have to make sure the same happens in the other cases as well.

        Andrew Purtell added a comment -

        For the common use case where multiple data sets with different visibility labels are combined into a single large table, the user will use a schema with MAX_VERSIONS > 1. Then users with differing authorizations will read from the table, and for some the latest version(s) will be what we are supposed to return, while for others "older" version(s) are what we are supposed to return. "Inconsistent" views over the multiple versions, depending on user authorizations, are the desired behavior.

        As Anoop says:

        The visibility based evaluation and cell filtering will happen in Filter level while on a top layer (after this filtering) the filtering based on the number of max versions will happen. (In SQM)

        HBase's internal handling of multiple cell versions can produce surprising behavior when using visibility labels. The code is functioning correctly: as long as multiple versions with different labels are accessible to the scanner, the scanner will filter out what is not visible and return what is.

        If I can suggest a way to proceed, it would be:
        1. Try out the visibility labels feature.
        2. Where you find the behavior surprising in some way, describe your observations on this issue or others.
        3. In some cases we can document the behavior you are observing in the online manual as expected, with some advice.
        4. In other cases, we can look at changing how HBase internally handles multiple versions of cells, to avoid surprising behavior that we agree by consensus should be considered incorrect or ugly.

        Anoop Sam John added a comment -

        How do we proceed with this issue? Ping Matteo Bertozzi

        ramkrishna.s.vasudevan added a comment -

        Filtering happens first, and versions are checked later by SQM using the column trackers. Similar issues were raised some time back; I don't remember which one.
        I missed this issue during my daytime here.

        Anoop Sam John added a comment -

        A similar inconsistency can occur with SCVF. Set latestVersionOnly to false. Based on the values of the older versions, we might get back a row when the older versions are in the memstore or in different HFiles (no compaction).

        Ping Lars Hofhansl

        Anoop Sam John added a comment -

        This is not the case with memstore items alone. Consider a cell (with a label) being written, followed by a flush, so there is one cell in that HFile. A different version of the same cell is then written (with a different label) and flushed. Now there are 2 cells in 2 HFiles, assuming no compaction happens. A scenario similar to the one described here can occur now. After a compaction the behaviour will change.
        By default the max versions for a CF is 1, so flushes and compactions will make sure to write only 1 cell version in these cases.
        During a scan, even if we specify some max versions count in the Scan, what we take is the min of the two values, which will be 1 here.
        The visibility based evaluation and cell filtering will happen in Filter level while on a top layer (after this filtering) the filtering based on the number of max versions will happen. (In SQM)
        So to fix this problem, we have to consider the min version number used in SQM at the lower layers (readers) as well.

        "Second, we should agree on what is the correct behavior for schemas supporting multiple versions, with multiple cell versions with differing visibility expressions among the versions"

        IMO in this case we have to consider all cells and return whichever versions' visibility expressions allow viewing by the user.
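
The ordering described in this comment (visibility filtering first, version limiting afterwards) can be illustrated with a minimal Java sketch. It is a toy model, not HBase code: PrePatchScanSketch, Cell, flush, and scan are hypothetical names, predicates stand in for visibility expressions, and a flush is assumed to keep only the newest versions up to the CF's max versions. It uses the four puts from the issue description and a user with no authorizations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class PrePatchScanSketch {

  /** One version of a cell; the predicate stands in for its visibility expression. */
  static final class Cell {
    final String value;
    final Predicate<Set<String>> visibleTo;

    Cell(String value, Predicate<Set<String>> visibleTo) {
      this.value = value;
      this.visibleTo = visibleTo;
    }
  }

  /** Versions of 'row', 'f1:q' from the issue description, newest first. */
  static List<Cell> memstoreCells() {
    return List.of(
        new Cell("v1aAndB", auths -> auths.contains("A") && auths.contains("B")),
        new Cell("v1aOrB", auths -> auths.contains("A") || auths.contains("B")),
        new Cell("v1all", auths -> true),
        new Cell("v1", auths -> auths.contains("A")));
  }

  /** Assumed flush behaviour: only the newest schemaMaxVersions versions survive. */
  static List<Cell> flush(List<Cell> newestFirst, int schemaMaxVersions) {
    return newestFirst.subList(0, Math.min(schemaMaxVersions, newestFirst.size()));
  }

  /** Pre-patch ordering: filter on visibility first, apply the version limit afterwards. */
  static List<String> scan(List<Cell> stored, Set<String> auths, int maxVersions) {
    List<String> out = new ArrayList<>();
    for (Cell cell : stored) {
      if (!cell.visibleTo.test(auths)) {
        continue; // dropped by the visibility filter, not counted as a version
      }
      if (out.size() == maxVersions) {
        break; // version limit applied only to cells that passed the filter
      }
      out.add(cell.value);
    }
    return out;
  }

  public static void main(String[] args) {
    Set<String> testuser = Set.of(); // no authorizations
    System.out.println(scan(memstoreCells(), testuser, 1));            // [v1all]
    System.out.println(scan(flush(memstoreCells(), 1), testuser, 1));  // []
  }
}
```

The two printed results correspond to what testuser sees in the description before and after the flush: an older unlabeled version while all versions are still in the memstore, and nothing once only the newest version remains.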

        Andrew Purtell added a comment -

        I think there are two issues here. First, the behavior for multiple versions with multiple visibility expressions should be consistent between memstore and store scanning. Second, we should agree on what is the correct behavior for schemas supporting multiple versions, with multiple cell versions with differing visibility expressions among the versions.

        Matteo Bertozzi made changes -
        Description: the second and fourth shell sessions in the examples were changed from "$ sudo -u hbase hbase shell" to "$ sudo -u testuser hbase shell"; the rest of the description is unchanged.
        Matteo Bertozzi created issue -

          People

          • Assignee: Anoop Sam John
          • Reporter: Matteo Bertozzi
          • Votes: 0
          • Watchers: 9
