Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There's sometimes a need to add custom attribute to Scan object so that it can be accessed on server side.
      Example of the case where it is needed discussed here: http://search-hadoop.com/m/v3Jtb2GkiO. There might be other cases where it is useful, which are mostly about logging/gathering stats on server side.

      Alternative to allowing adding any custom attributes to scan could be adding some fixed field, like "type" to the class.

      1. HBASE-3811.patch
        8 kB
        Alex Baranau
      2. HBASE-3811.patch
        9 kB
        Alex Baranau
      3. HBASE-3811.patch
        8 kB
        Alex Baranau
      4. HBASE-3811.patch
        8 kB
        Alex Baranau

        Issue Links

          Activity

          Hide
          stack added a comment -

          You could change Scan so it passes an optional Map. Add an attributes method to Scan. If attributes > 0, instantiate a Map and then serialize it out. On other end, have check for null or non-null Map when deserializing (We don't want to carry the Map for every Scan I'd say – just carry it when attributes present... but perhaps I'm doing premature optimization here).

          This could become more important now we have CPs.

          Show
          stack added a comment - You could change Scan so it passes an optional Map. Add an attributes method to Scan. If attributes > 0, instantiate a Map and then serialize it out. On other end, have check for null or non-null Map when deserializing (We don't want to carry the Map for every Scan I'd say – just carry it when attributes present... but perhaps I'm doing premature optimization here). This could become more important now we have CPs.
          Hide
          Ted Yu added a comment -

          We can follow the example of HTableDescriptor.write():

              out.writeInt(families.size());
              for(Iterator<HColumnDescriptor> it = families.values().iterator();
          
          Show
          Ted Yu added a comment - We can follow the example of HTableDescriptor.write(): out.writeInt(families.size()); for (Iterator<HColumnDescriptor> it = families.values().iterator();
          Hide
          Alex Baranau added a comment -

          Attached patch. There's small Q about it (you can see it in comments): since we changing the data format of serialized scan, should we change scan implementation version and/or should we support previous format (this way we will support "older" clients)?

          Show
          Alex Baranau added a comment - Attached patch. There's small Q about it (you can see it in comments): since we changing the data format of serialized scan, should we change scan implementation version and/or should we support previous format (this way we will support "older" clients)?
          Hide
          Alex Baranau added a comment -

          I was going to submit patch for review on review board, but had problems accessing it at https://reviews.apache.org/account/login/ (looks like my existing account doesn't work, nor password recovery/register functionality)

          Show
          Alex Baranau added a comment - I was going to submit patch for review on review board, but had problems accessing it at https://reviews.apache.org/account/login/ (looks like my existing account doesn't work, nor password recovery/register functionality)
          Hide
          Ted Yu added a comment -

          Should we generalize the type of value in attributes map to be of byte [] ?

          Show
          Ted Yu added a comment - Should we generalize the type of value in attributes map to be of byte [] ?
          Hide
          Alex Baranau added a comment -

          Hm. I thought about scan attributes as usual attributes/settings map, which is comparable to e.g. org.apache.hadoop.conf.Configuration or similar.

          Does anyone see we are going to use byte[] values? Are we going to put in values e.g. serialized objects, or smth related?

          Show
          Alex Baranau added a comment - Hm. I thought about scan attributes as usual attributes/settings map, which is comparable to e.g. org.apache.hadoop.conf.Configuration or similar. Does anyone see we are going to use byte[] values? Are we going to put in values e.g. serialized objects, or smth related?
          Hide
          Ted Yu added a comment -

          If we use "sourceScan" attribute to store the unique ID for distributed scan, a natural choice would be byte array representing the start/end row(s) of the original Scan.

          Show
          Ted Yu added a comment - If we use "sourceScan" attribute to store the unique ID for distributed scan, a natural choice would be byte array representing the start/end row(s) of the original Scan.
          Hide
          Alex Baranau added a comment -

          Attached new patch. Major change:

          • attributes map is changed from Map<String, String> to Map<String, byte[]>.
          Show
          Alex Baranau added a comment - Attached new patch. Major change: attributes map is changed from Map<String, String> to Map<String, byte[]>.
          Hide
          Ted Yu added a comment -

          I ran TestScan, TestScannerTimeout and TestFromClientSide based on today's patch and they passed.
          I then commented out the persistence of attributes map in write(final DataOutput out), TestScannerTimeout passed too.

          Show
          Ted Yu added a comment - I ran TestScan, TestScannerTimeout and TestFromClientSide based on today's patch and they passed. I then commented out the persistence of attributes map in write(final DataOutput out), TestScannerTimeout passed too.
          Hide
          Alex Baranau added a comment -

          > I then commented out the persistence of attributes map in
          > write(final DataOutput out), TestScannerTimeout passed too.

          Any specific reason for that? Just wondering...

          Show
          Alex Baranau added a comment - > I then commented out the persistence of attributes map in > write(final DataOutput out), TestScannerTimeout passed too. Any specific reason for that? Just wondering...
          Hide
          Ted Yu added a comment -

          I wanted to test backward compatibility of the read() method.
          The result means we don't have to increase the version number of Scan.

          Show
          Ted Yu added a comment - I wanted to test backward compatibility of the read() method. The result means we don't have to increase the version number of Scan.
          Hide
          Alex Baranau added a comment -

          That was expected by me since I added the code below to handle this case:

          int numAttributes = 0;
          try

          { numAttributes = in.readInt(); }

          catch (EOFException e)

          { // indicates that serialized scan doesn't support attributes // DO NOTHING // TODO: should we really care about it? or should we increase scan version number and handle/not handle // attributes depending on version? }

          Still one may think that using Scan version is cleaner variant. Not sure what are the plans for Scan version and when it is ok to change it by initial design.

          Show
          Alex Baranau added a comment - That was expected by me since I added the code below to handle this case: int numAttributes = 0; try { numAttributes = in.readInt(); } catch (EOFException e) { // indicates that serialized scan doesn't support attributes // DO NOTHING // TODO: should we really care about it? or should we increase scan version number and handle/not handle // attributes depending on version? } Still one may think that using Scan version is cleaner variant. Not sure what are the plans for Scan version and when it is ok to change it by initial design .
          Hide
          stack added a comment -

          Why do you extend TestCase Alex? Its a new test. Junit4 doesn't use TestCase? Or does it?

          Do isEmpty rather than '+ if (attributes.size() == 0) {'

          It seems strange we would catch an EOFE on deserialization when we have the version available to us. Can we use the SCAN_VERSION? Up it to 2 and have version 2 able to deserialize both versions?

          Otherwise patch looks great Alex.

          Show
          stack added a comment - Why do you extend TestCase Alex? Its a new test. Junit4 doesn't use TestCase? Or does it? Do isEmpty rather than '+ if (attributes.size() == 0) {' It seems strange we would catch an EOFE on deserialization when we have the version available to us. Can we use the SCAN_VERSION? Up it to 2 and have version 2 able to deserialize both versions? Otherwise patch looks great Alex.
          Hide
          Alex Baranau added a comment -

          qt. Why do you extend TestCase Alex? Its a new test. Junit4 doesn't use TestCase? Or does it?

          Just tried to follow the "coding style" here, as I saw other test cases use this design. Cleaning it.

          qt. It seems strange we would catch an EOFE on deserialization when we have the version available to us. Can we use the SCAN_VERSION? Up it to 2 and have version 2 able to deserialize both versions?

          Exactly. That is what I meant by saying/asking "Still one may think that using Scan version is cleaner variant. Not sure what are the plans for Scan version and when it is ok to change it by initial design."

          Show
          Alex Baranau added a comment - qt. Why do you extend TestCase Alex? Its a new test. Junit4 doesn't use TestCase? Or does it? Just tried to follow the "coding style" here, as I saw other test cases use this design. Cleaning it. qt. It seems strange we would catch an EOFE on deserialization when we have the version available to us. Can we use the SCAN_VERSION? Up it to 2 and have version 2 able to deserialize both versions? Exactly. That is what I meant by saying/asking "Still one may think that using Scan version is cleaner variant. Not sure what are the plans for Scan version and when it is ok to change it by initial design."
          Hide
          Alex Baranau added a comment -

          Submitted updated patch (fixed the nits mentioned by Stack).

          Show
          Alex Baranau added a comment - Submitted updated patch (fixed the nits mentioned by Stack).
          Hide
          Alex Baranau added a comment -

          Sorry, attached new one: my IDE shuffled the imports in the previous one

          Show
          Alex Baranau added a comment - Sorry, attached new one: my IDE shuffled the imports in the previous one
          Hide
          stack added a comment -

          Committed to TRUNK. Thanks for the patch Alex (I made you a contributor and so in future you should be able to assign yourself issues).

          Show
          stack added a comment - Committed to TRUNK. Thanks for the patch Alex (I made you a contributor and so in future you should be able to assign yourself issues).
          Hide
          Alex Baranau added a comment -

          Thank you!

          Show
          Alex Baranau added a comment - Thank you!
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #1939 (See https://builds.apache.org/hudson/job/HBase-TRUNK/1939/)
          HBASE-3811 Allow adding attributes to Scan
          HBASE-3811 Allow adding attributes to Scan

          stack :
          Files :

          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestScan.java

          stack :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #1939 (See https://builds.apache.org/hudson/job/HBase-TRUNK/1939/ ) HBASE-3811 Allow adding attributes to Scan HBASE-3811 Allow adding attributes to Scan stack : Files : /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestScan.java stack : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java

            People

            • Assignee:
              Alex Baranau
              Reporter:
              Alex Baranau
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development