Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0
    • Component/s: mapreduce, snapshots
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added TableSnapshotInputFormat and TableSnapshotScanner for performing scans over HBase table snapshots from the client side, bypassing the HBase servers. The former configures a MapReduce job, while the latter performs a single client-side scan over the snapshot files. Both can also be used with an offline HBase, against in-place or exported snapshot files.

      WARNING: This feature bypasses HBase-level security completely, since the files are read directly from HDFS. The user running the scan / job must have read permissions on the data files and snapshot files.
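      As a rough illustration of the non-MR path described above (a sketch only: the snapshot name and restore path are placeholders, and the exact constructor signature may differ between versions), TableSnapshotScanner is used much like an ordinary ResultScanner:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.TableSnapshotScanner;

public class SnapshotScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Placeholder: any writable directory on the same filesystem as hbase.rootdir;
    // the snapshot is "restored" here via HFile links, not by copying data.
    Path restoreDir = new Path("/tmp/snapshot-restore");
    Scan scan = new Scan();
    TableSnapshotScanner scanner =
        new TableSnapshotScanner(conf, restoreDir, "my_snapshot", scan);
    try {
      for (Result r = scanner.next(); r != null; r = scanner.next()) {
        // Process r. Reads HFiles directly from HDFS, so the submitting user
        // needs read permission on the data and snapshot files (see WARNING above).
      }
    } finally {
      scanner.close();
    }
  }
}
```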


      Description

      The idea is to add an InputFormat which can run a MapReduce job over snapshot files directly, bypassing the HBase server layer. The input format is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running against an online table, it runs against a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits.

      Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok:

      • Take snapshots periodically, and run MR jobs only on snapshots.
      • Export snapshots to remote hdfs cluster, run the MR jobs at that cluster without HBase cluster.
      • (Future use case) Combine snapshot data with online hbase data: Scan from yesterday's snapshot, but read today's data from online hbase cluster.
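      A minimal job setup along these lines might look like the following sketch (snapshot name, restore path, and mapper are placeholders; the initTableSnapshotMapperJob helper matches the 0.98 API, but details may vary by version):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotJobSketch {
  // Placeholder mapper: receives (row key, Result) pairs exactly as with TableInputFormat.
  static class MyMapper extends TableMapper<NullWritable, NullWritable> {
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-over-snapshot");
    Scan scan = new Scan(); // the same Scan object you would hand to TableInputFormat
    // Placeholder: temp dir (same filesystem as hbase.rootdir) where the snapshot
    // is restored as HFile links before the job reads it.
    Path restoreDir = new Path("/tmp/snapshot-restore");
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "my_snapshot", scan, MyMapper.class,
        NullWritable.class, NullWritable.class, job,
        true /* addDependencyJars */, restoreDir);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```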
      Attachments

      1. HBASE-8369-trunk_v3.patch (24 kB, Bryan Keller)
      2. HBASE-8369-trunk_v2.patch (24 kB, Bryan Keller)
      3. HBASE-8369-trunk_v1.patch (24 kB, Bryan Keller)
      4. HBASE-8369-0.94.patch (23 kB, Bryan Keller)
      5. HBASE-8369-0.94_v5.patch (24 kB, Bryan Keller)
      6. HBASE-8369-0.94_v4.patch (24 kB, Bryan Keller)
      7. HBASE-8369-0.94_v3.patch (24 kB, Bryan Keller)
      8. HBASE-8369-0.94_v2.patch (24 kB, Bryan Keller)
      9. hbase-8369_v9.patch (150 kB, Enis Soztutar)
      10. hbase-8369_v8.patch (151 kB, Enis Soztutar)
      11. hbase-8369_v7.patch (151 kB, Enis Soztutar)
      12. hbase-8369_v6.patch (148 kB, Enis Soztutar)
      13. hbase-8369_v5.patch (160 kB, Enis Soztutar)
      14. hbase-8369_v11.patch (152 kB, Enis Soztutar)
      15. hbase-8369_v0.patch (73 kB, Enis Soztutar)

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Transition                      Time In Source Status   Execution Times   Last Executer    Last Execution Date
          Open → Patch Available          74d 22h 56m             1                 Bryan Keller     01/Jul/13 22:24
          Patch Available → Resolved      140d 55m                1                 Enis Soztutar    18/Nov/13 22:20
          deepankar added a comment -

          created the jira HBASE-11335 and attached the patches

          deepankar added a comment -

          sorry did not create yet, but I am working on the patch,
          will create the jira with the patches uploaded by end of the day

          Enis Soztutar added a comment -

          deepankar did you end up creating an issue? I might have missed it.

          deepankar added a comment -

          will create a jira and upload a patch

          Enis Soztutar added a comment -

          It seems you are right. I think we changed the purpose of TABLE_DIR_KEY some time ago in one of the patch iterations. I think we should still pass the root dir for restore, but send the actual table dir to computeHDFSBlocksDistribution. Can you please open a new issue. You can also attach a patch if you don't mind working on it. Thanks.

          deepankar added a comment -

          Sorry for the confusion, in my comment I meant that

          conf.set(TABLE_DIR_KEY, FSUtils.getTableDir(restoreDir, tableDesc.getTableName()).toString());
          

          This similar statement in the 0.94 is

          Path tableDir = new Path(restoreDir, htd.getNameAsString());
           conf.set(TABLE_DIR_KEY, tableDir.toString());
          

          My concern is that instead of setting TABLE_DIR_KEY to the tableDir under restoreDir, you are directly setting it to restoreDir.
          The tableDir will contain the table's regions and symlinks for the HFiles, whereas the restoreDir will contain /data/<namespace>/tableName/.

          As far as I understand, setting this parameter wrong may not cause any real problems, but the weight calculation of HDFS blocks
          will go wrong, leading to non-local tasks.

          The reason for above is because of the following lines in the record reader

                Path tmpRootDir = new Path(conf.get(TABLE_DIR_KEY)); // This is the user specified root
                // directory where snapshot was restored
          

          So while creating the record reader we are using this in the right way, i.e. as the root directory.

          But when using it for the calculation of weights in getInputSplits:

              Path tableDir = new Path(conf.get(TABLE_DIR_KEY));
          
              List<InputSplit> splits = new ArrayList<InputSplit>();
              for (SnapshotRegionManifest regionManifest : regionManifests) {
                // load region descriptor
                HRegionInfo hri = HRegionInfo.convert(regionManifest.getRegionInfo());
          
                if (CellUtil.overlappingKeys(scan.getStartRow(), scan.getStopRow(),
                  hri.getStartKey(), hri.getEndKey())) {
                  // compute HDFS locations from snapshot files (which will get the locations for
                  // referred hfiles)
                  List<String> hosts = getBestLocations(conf,
                    HRegion.computeHDFSBlocksDistribution(conf, htd, hri, tableDir));
          
          

          we are sending the rootDir from conf.get(TABLE_DIR_KEY) as the tableDir in HRegion.computeHDFSBlocksDistribution(conf, htd, hri, tableDir). This will lead to returning an empty HDFSBlocksDistribution,
          which will lead to scheduling of non-local tasks.

          Again there might be some mistake in my understanding, if so please correct me

          Enis Soztutar added a comment -

          I think the restoreDir here will be the root directory of the restored snapshot and hence the conf.set should not set restoreDir as the tableDir, rather it should be doing the following

          We are restoring the snapshot to the restoreDir, which would create the sym-link files for the table. We are setting the table dir to the restore directory, so that we can open the table's regions from the restore directory. We cannot use the root directory since that would mean we are reading from the actual table's regions rather than snapshot's regions.
          Did you run into a problem with this patch? Why do you think this is a problem?

          deepankar added a comment -

          Sorry, forgot to mention the class in the above comment: it is TableSnapshotInputFormat

          deepankar added a comment -

          In the patch that is merged into trunk (hbase-8369_v9.patch, also at the head on GitHub), I think there is a very minor bug (or a mistake in my understanding). In the below function

          public static void setInput(Job job, String snapshotName, Path restoreDir) throws IOException {
          

          to restore the snapshot we do this

          restoreDir = new Path(restoreDir, UUID.randomUUID().toString());
          
          // TODO: restore from record readers to parallelize.
          RestoreSnapshotHelper.restoreSnapshotForScanner(conf, fs, rootDir, restoreDir, snapshotName);
          conf.set(TABLE_DIR_KEY, restoreDir.toString());
          

          I think the restoreDir here will be the root directory of the restored snapshot and hence the conf.set should not set restoreDir as the
          tableDir, rather it should be doing the following

          conf.set(TABLE_DIR_KEY, FSUtils.getTableDir(rootDir, tableDesc.getTableName()).toString());
          

          Can somebody correct me if I am wrong ?

          Lars Hofhansl added a comment -

          These seem important:

          TableMapreduceUtil changes (other than the new method): needed in case security is enabled. We should not talk with the HBase cluster at all.

          HDFSBlocksDistribution: in v3, we are providing 3 servers with highest locality to the input split. In v11, we are using all the servers with 80% of the locality for the top locality server. This ensures better locality.

          This seems nice to have:

          ClientScanner / AbstractClientScanner / TableRecordReaderImpl changes: the ClientSideRegion scanner keeps track of ScanMetrics, and exports those via MR job counters or Scan.

          To track what/how the mappers are doing.
          The rest we can do without in 0.94. Let's also move the discussion to HBASE-10076.

          Enis Soztutar added a comment -

          Maybe Enis Soztutar can mention the logic on why for some of these kinds of things?

          This is the list of high-level things in the final version (v11) of the patch that differ from Bryan's version (trunk-v3):

          • ClientScanner / AbstractClientScanner / TableRecordReaderImpl changes: the ClientSideRegion scanner keeps track of ScanMetrics, and exports those via MR job counters or Scan.
          • CellUtil changes : these are at a different place in Bryan's patch.
          • PB of MR data
          • HDFSBlocksDistribution: in v3, we are providing 3 servers with highest locality to the input split. In v11, we are using all the servers with 80% of the locality for the top locality server. This ensures better locality.
          • ClientSideRegionScanner / TableSnapshotScanner: not present in v3. ClientSideRegionScanner is an internal class to do the scanning. Both TableSnapshotScanner and TableSnapshotInputFormat use it. TableSnapshotScanner is a client API, to scan snapshots without MR.
          • TableMapreduceUtil changes (other than the new method): needed in case security is enabled. We should not talk with the HBase cluster at all.
          • HRegion changes: the v3 patch sends the parent dir for the region snapshot, assuming that the table dir is the parent dir of the region dir. We do not want to make that assumption in trunk.
          • RestoreSnapshotHelper / ModifyRegionUtils : code organization
          • Other than these, general test, integration test, or performance evaluation tools.

          For 0.94, we can do a less intrusive patch which combines some of the changes above (like RestoreSnapshotHelper changes going into the new classes), and get rid of some of the changes like HRegion changes.
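          The 80% locality cutoff mentioned in the HDFSBlocksDistribution bullet can be sketched as follows (a standalone illustration of the heuristic, not HBase's actual getBestLocations code; class and method names here are made up):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class LocalitySketch {
  // Keep every host whose block weight is at least 80% of the top host's weight,
  // instead of a fixed top-3 list as in the v3 patch.
  static final double CUTOFF = 0.8;

  // weights: host -> total bytes of the region's HFiles stored on that host
  public static List<String> bestLocations(Map<String, Long> weights) {
    List<Map.Entry<String, Long>> sorted = new ArrayList<>(weights.entrySet());
    sorted.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
    List<String> hosts = new ArrayList<>();
    if (sorted.isEmpty()) {
      // An empty distribution yields no preferred hosts, which is exactly the
      // non-local-tasks symptom discussed in the comments below.
      return hosts;
    }
    long top = sorted.get(0).getValue();
    for (Map.Entry<String, Long> e : sorted) {
      if (e.getValue() >= CUTOFF * top) {
        hosts.add(e.getKey());
      }
    }
    return hosts;
  }
}
```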

          stack added a comment -

          Thinking about this more, it is a good exercise to keep the M/R code out of HBase anyway...

          That'd be coolio. Elliott Clark looked at doing the first step, making a new mr module, but apparently hit a web of circular dependencies (client and server)

          Bryan Keller added a comment -

          Btw my patch is based off of Enis's original patch. The functionality should be the same AFAIK, i.e. a new table directory is assembled from the snapshot.

          Bryan Keller added a comment -

          One reason it is much smaller is I'm not using any of the Protobuf code, which is pretty sizable.

          Lars Hofhansl added a comment -

          Thanks Bryan Keller. I too am interested why your patch is smaller. You say you run it in production, which is great. I'll try the patch and see if it works without any additional hooks - that would be awesome.

          Jesse Yates added a comment -

          Bryan Keller wouldn't be against a no-hook version going into 0.94, but it's significantly smaller than what Enis originally implemented. Maybe you can shed some light on why you didn't need to do things like have a custom ClientScanner or copy over the snapshot files into a new directory (1)

          Maybe Enis Soztutar can mention the logic on why for some of these kinds of things?

          (1) now that I think about it, maybe trying avoid the snapshot being deleted out from under you. I'll let the authors comment more though

          Sergey Shelukhin added a comment -

          If there are no hooks, would it also work in 96 by adding jars? Perhaps we should put it into the release notes.
          Although if it's so non-intrusive I'd be +1 on putting it into both 96 and 94

          Bryan Keller added a comment -

          The hooks are not strictly required (e.g. the patch I submitted uses the stock 0.94 distribution without hooks). In case the hooks are deemed "risky", it should be possible to modify the current patch not to use the hooks.

          Lars Hofhansl added a comment -

          I'll look forward to the flowers

          Thinking about this more, it is a good exercise to keep the M/R code out of HBase anyway, to then see what we need to add to core HBase in order to allow for that. Hopefully that would enable other M/R use cases without any core patches needed.

          Note, though, that the only risky bits (if any) of this are exactly these hooks. Everything else is just additional classes.

          stack added a comment -

          Something like that could go into 0.96 surely. stack ?

          Could do that (looking at the patch it doesn't look needed in 0.96).

          No objection from me adding in hooks into 0.94 as long as... you know what I'm going to say.

          I'll bring flowers next time we meet Lars Hofhansl

          Lars Hofhansl added a comment - - edited

          The only changes to existing HBase classes are exactly these hooks, though. Without them it cannot be done with outside code. When those are in place anyway, might as well add some new classes for M/R stuff; but it's fine to keep these outside, they just become part of the M/R job then.

          To explain my comment above:
          Adding a few classes is not a fork of course, but it starts a slippery slope. Once you've started, it's easy to pile on top of that. And there are some HBase changes needed, so it is an actual patch we need to maintain.
          We have so far completely avoided that (except for some hopefully temporary security-related changes to HDFS), and I have been a strong advocate for that in our organization. We have also always forward-ported any changes we made to 0.96+. So it is frustrating having to start this even (or especially) for such a small change.

          So please pardon my frustration.
          I do not understand the reluctance with this, as it is almost no risk and some folks will be using 0.94 for a while.
          Whether it's a new "feature" or not is not relevant (IMHO). HBase's slow M/R performance could be considered a bug too, and then this would be a bug fix.

          We're not breaking up over this

          So it seems a good compromise would be to get the required hooks into HBase...?
          Jesse Yates, FYI.

          Andrew Purtell added a comment -

          Like an appropriate HRegion constructor, etc.

          Something like that could go into 0.96 surely. stack ?

          We are going to have to start to go down the Facebook path of forking HBase for things like this then and our contribution will become less useful over time. So be it.

          To put this politely (I have strong opinions) the FB fork was a matter of a tight internal deployment schedule as opposed to any unwillingness of the community to work with their contributions.

          The addition of a couple of extra classes to a private build does not make a fork, just like the enhancements to reduce byte copies for "smart clients" that Jesse was working that went mainly into Phoenix didn't produce a fork. If and when the time comes that truly an incompatible change must be introduced that constitutes a real break, we should definitely look hard at that.

          stack added a comment -

          We are going to have to start to go down the Facebook path of forking HBase for things like this then and our contribution will become less useful over time.

          Why not add in the changes to core you need to support this feature into 0.94?

          Let's not break up over whether a couple of mapreduce classes are in core or not.

          Lars Hofhansl added a comment -

          anyone who wants to do this in 0.96 can just take the two mapreduce classes and include them in their mapreduce job

          That could work as long as we put the necessary hooks into HBase itself - the small changes to the existing classes. Like an appropriate HRegion constructor, etc.

          Lars Hofhansl added a comment -

          We will use this in our setup, which is based on 0.94. Like us some folks are stuck at a certain release of HBase, which does not mean they are not interested in new features.

          We are going to have to start to go down the Facebook path of forking HBase for things like this then and our contribution will become less useful over time. So be it.

          Elliott Clark added a comment -

          I'm -0.5 on getting this into older releases. To me it seems like the community drew a line in the sand on what features made it into 0.96. We held that line for some things (security, etc.). So it seems weird that we would pull this into older releases without extraordinary circumstances when we were strict in other places.

          We've started faster release trains and 0.98 is on the way. I would vote for holding a very strict no new features into old releases now that we have addressed concerns over release timings.

          stack added a comment -

          I want to hold to no new features post release of a branch unless extraordinary circumstances. This one is very nice but doesn't seem to qualify as 'extraordinary' given anyone who wants to do this in 0.96 can just take the two mapreduce classes and include them in their mapreduce job – or go get the 0.98 jar (won't that work?).

          Andrew Purtell added a comment -

          No, I think the incongruity of having something in 0.94, missing in 0.96, and back again in 0.98 is bad practice. If we are blocked getting it into 0.96, then it has to start > 0.96.

          Jesse Yates added a comment -

          It would also be odd to have it deprecated in 0.94, gone in 0.96, and then back again in 0.98 - makes users wonder what happened in the middle there (since they probably won't read this jira to get context).

          Guess if we are explicit in the javadocs then it would be alright, just funky

          Enis Soztutar added a comment -

          We can always just maintain it locally in our own repo.

          BTW, I am not against committing this to 0.94, although 0.96 may or may not have it. Just that we should be very explicit about it (with javadoc, or maybe a @deprecated notification?)

          Lars Hofhansl added a comment -

          We have a backport patch already in HBASE-10076. We can always just maintain it locally in our own repo.

          This is a very low-risk change, though... Almost no existing classes changed. And it would be a good story for HBase's bad scan performance (even with M/R). What do you say, stack?

          Bryan Keller added a comment -

          FWIW, you can use the 0.94 patch I submitted without modifying the 0.94 release if a couple of minor changes are implemented (mostly naming). I have been using it in production for a while now with 0.94. Perhaps it could be tweaked and offered separately in a "contrib" directory or something in 0.94, along with the caveat about file permissions.

          Enis Soztutar added a comment -

          It will be super confusing for users if it comes in 0.94, but not in 0.96. Either 0.94 version should come with a very visible warning that this feature won't be available in 0.96, or it should not come at all.

          Lars Hofhansl added a comment -

          You OK with this in 0.94, but not in 0.96?
          For Salesforce this would be very useful in 0.94, since a lot of our backup logic is implemented as M/R.

          stack added a comment -

          You know my story on 'features' in 0.96 boss.

          Lars Hofhansl added a comment -

          I filed HBASE-10076 for the 0.94 backport. stack, how about 0.96? (mostly new code and incredibly helpful in some cases)

          Lars Hofhansl made changes -
          Link This issue relates to HBASE-10076 [ HBASE-10076 ]
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4687 (See https://builds.apache.org/job/HBase-TRUNK/4687/)
          HBASE-8369 MapReduce over snapshot files (enis: rev 1543195)

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractClientScanner.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
          • /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java
          • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestTableSnapshotInputFormat.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MapReduceProtos.java
          • /hbase/trunk/hbase-protocol/src/main/protobuf/MapReduce.proto
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HDFSBlocksDistribution.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/TableSnapshotScanner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/ModifyRegionUtils.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ScanPerformanceEvaluation.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestTableSnapshotScanner.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSnapshotInputFormat.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java
          Enis Soztutar made changes -
          Attachment hbase-8369_v11.patch [ 12614550 ]
          Enis Soztutar added a comment -

          Forgot to attach the latest committed version from RB. The only difference between v10 from RB and v11 is fixed javadoc.

          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #843 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/843/)
          HBASE-8369 MapReduce over snapshot files (enis: rev 1543195)

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractClientScanner.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java
          • /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java
          • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestTableSnapshotInputFormat.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MapReduceProtos.java
          • /hbase/trunk/hbase-protocol/src/main/protobuf/MapReduce.proto
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HDFSBlocksDistribution.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/TableSnapshotScanner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/ModifyRegionUtils.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ScanPerformanceEvaluation.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestTableSnapshotScanner.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSnapshotInputFormat.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java
          Enis Soztutar made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Enis Soztutar added a comment -

          I've finally committed this to trunk. Thanks everyone for reviews and discussions.

          Unfortunately, I did not have any time for the 0.94 backport. Not sure about the 0.96 branch either. If we are doing the 0.94 backport, it seems that we should have it in 0.96 as well.

          Devaraj Das added a comment -

          LGTM. The javadoc warnings from hadoopqa are valid though. They should be fixed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613728/hbase-8369_v9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 16 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 3 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7851//console

          This message is automatically generated.

          Enis Soztutar made changes -
          Attachment hbase-8369_v9.patch [ 12613728 ]
          Enis Soztutar added a comment -

          Update patch from RB.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613507/hbase-8369_v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 16 new or modified tests.

          -1 hadoop1.0. The patch failed to compile against the hadoop 1.0 profile.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7830//console

          This message is automatically generated.

          Enis Soztutar made changes -
          Attachment hbase-8369_v8.patch [ 12613507 ]
          Enis Soztutar added a comment -

          Patch from RB.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12612097/hbase-8369_v7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 16 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:

          -1 core zombie tests. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//console

          This message is automatically generated.

          Enis Soztutar made changes -
          Attachment hbase-8369_v7.patch [ 12612097 ]
          Enis Soztutar added a comment -

          Patch from RB.

          Enis Soztutar made changes -
          Release Note Added TableSnapshotInputFormat and TableSnapshotScanner for performing scans over hbase table snapshots from the client side, bypassing the hbase servers. The former configures a mapreduce job, while the latter does single client side scan over snapshot files. Can also be used with offline HBase with in-place or exported snapshot files.

          WARNING: This feature bypasses HBase-level security completely since the files are read from the hdfs directly. The user who is running the scan / job has to have read permissions to the data files and snapshot files.
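          For reference, the MapReduce path described in the release note is wired up through TableMapReduceUtil.initTableSnapshotMapperJob. The following is a minimal sketch, not a definitive implementation: the snapshot name, restore directory, and row-counting mapper are hypothetical placeholders, and running it assumes an HBase 0.98+ client on the classpath, an existing snapshot, and HDFS read permission on the snapshot and data files as the warning above notes.

          ```java
          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.Result;
          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
          import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
          import org.apache.hadoop.hbase.mapreduce.TableMapper;
          import org.apache.hadoop.io.NullWritable;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

          public class SnapshotScanJob {

            // Hypothetical mapper: just counts the rows read from the snapshot.
            static class RowCounter extends TableMapper<NullWritable, NullWritable> {
              @Override
              protected void map(ImmutableBytesWritable key, Result value, Context context)
                  throws IOException, InterruptedException {
                context.getCounter("snapshot", "rows").increment(1);
              }
            }

            public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              Job job = Job.getInstance(conf, "scan-over-snapshot");
              job.setJarByClass(SnapshotScanJob.class);

              // Same Scan object a user would hand to TableInputFormat.
              Scan scan = new Scan();

              // "my_snapshot" and the restore dir are placeholders; the restore dir
              // receives the restored region references and must be writable by
              // the user running the job.
              TableMapReduceUtil.initTableSnapshotMapperJob(
                  "my_snapshot", scan, RowCounter.class,
                  NullWritable.class, NullWritable.class, job,
                  true, new Path("/tmp/snapshot-restore"));

              job.setOutputFormatClass(NullOutputFormat.class);
              System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
          }
          ```

          The single-client path is analogous: construct a TableSnapshotScanner with the configuration, a restore directory, the snapshot name, and a Scan, then iterate it like any ResultScanner.
          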

          Enis Soztutar added a comment -

          If you make a 0.94 patch I'll offer to try on a real cluster

          Will do a backport soon. Thanks for the offer.

          Andrew Purtell Let me write up something. Please feel free to add / edit.

          Andrew Purtell added a comment -

          This is a fine feature for performance but needs a fat release note indicating it is incompatible with security features we have now and planned.

          ACLs: Since the AccessController is bypassed, only file level permissions can provide protection. Perhaps someday, if HDFS has file ACLs, we can push CF ACLs down; that would be a useful improvement and maybe the best we can do here.

          Cell ACLs and visibility labels: The code that does access or visibility checking lives in the regionserver and cannot be trusted to filter cells if running in a mapreduce task with untrustable user code.

          Encryption: Unless the reader has access to the cluster master key the files won't be readable. That's the desired outcome. Shipping the master key to a MR job where untrustable user code can steal it would not be advisable.

          Lars Hofhansl added a comment -

          If you make a 0.94 patch I'll offer to try on a real cluster

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12611171/hbase-8369_v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 16 new or modified tests.

          -1 hadoop1.0. The patch failed to compile against the hadoop 1.0 profile.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7680//console

          This message is automatically generated.

          Enis Soztutar made changes -
          Attachment hbase-8369_v6.patch [ 12611171 ]
          Enis Soztutar added a comment -

          Rebased patch

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12610963/hbase-8369_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 16 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7673//console

          This message is automatically generated.

          Enis Soztutar made changes -
          Attachment hbase-8369_v5.patch [ 12610963 ]
          Enis Soztutar added a comment -

          latest patch from RB.

          Lars Hofhansl added a comment -

          I want that in 0.94.

          Enis Soztutar added a comment - edited

          Uploaded the slides about this issue here: http://www.slideshare.net/enissoz/mapreduce-over-snapshots. It also contains some numbers for performance comparison.
          Here are the raw numbers (in MB/s):

          Data size                   6.6 G   13.2 G   19.8 G   26.4 G
          StoreFileCount per region       3        6        9       12
          Scan (MB/s)                   8.2      7.6     11.2      7.2
          SnapshotScan (MB/s)          60.8     59.5     55.3     46.7
          ScanMR (MB/s)                75.9     80.4     82.3    140.7
          SnapshotScanMR (MB/s)       198.6    275.6    311.6    329.4

          Main takeaway: single scanner speeds improve 5-6x, from 11 MB/s to 55 MB/s. That is also about half of raw disk speed (for a single disk). Do not read much into the MR test speed improvements as store file count increases; that is due to job launch costs taking a relatively smaller percentage as data sizes increase.
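          As a quick sanity check, the per-column speedups implied by the table can be derived directly from the reported MB/s figures (a small sketch; the numbers below are just the values from the table above):

```python
# Speedup of the snapshot-reading paths over the regionserver paths,
# computed from the MB/s table above (columns: 3, 6, 9, 12 store files).
scan = [8.2, 7.6, 11.2, 7.2]                     # Scan (MB/s)
snapshot_scan = [60.8, 59.5, 55.3, 46.7]         # SnapshotScan (MB/s)
scan_mr = [75.9, 80.4, 82.3, 140.7]              # ScanMR (MB/s)
snapshot_scan_mr = [198.6, 275.6, 311.6, 329.4]  # SnapshotScanMR (MB/s)

def speedups(fast, slow):
    """Element-wise ratio fast/slow, rounded to one decimal place."""
    return [round(f / s, 1) for f, s in zip(fast, slow)]

print(speedups(snapshot_scan, scan))        # single-scanner speedup per column
print(speedups(snapshot_scan_mr, scan_mr))  # MR speedup per column
```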

          stack made changes -
          Fix Version/s 0.96.0 [ 12324822 ]
          Ted Yu added a comment -

          Can you attach the patch for QA run ?

          Thanks

          Enis Soztutar added a comment -

          Ok, finally, I have uploaded the first candidate patch for this issue. I have tested this with a real cluster of 10 nodes, and it seems ok. Have not done any performance testing yet.

          Let's get this to trunk first, then we can think about 0.96.1 and 0.94.x.

          Reviews welcome: https://reviews.apache.org/r/13958/

          stack made changes -
          Fix Version/s 0.96.0 [ 12324822 ]
          Fix Version/s 0.95.2 [ 12320040 ]
          Bryan Keller added a comment -

          FWIW, I am using this (slightly tweaked) in our production system now, and it has been a huge improvement to scan performance, up to 2.5x. The bottleneck is now in HRegion rather than region server. Setting up permissions in HDFS to allow these scans is the tricky part.

          Lars Hofhansl added a comment -

          We're very interested in this (in the medium term), I'll see if we have cycles to work on this.

          Enis Soztutar added a comment -

          Sorry, no update just yet. My focus was elsewhere. I have to merge the two patches, since Bryan's patch does not include all the parts from my patch. I also have to add some more tests to this.

          I don't think this is a requirement for 0.96.0 though. We can get this in later releases of 0.96.x.

          Matteo Bertozzi added a comment -

          Enis Soztutar any update on this? Or is the latest patch still trunk-v3 / 0.94-v5?

          Enis Soztutar added a comment -

          The tests are new and based on the TableInputFormat tests, which seemed a logical set of tests to mimic

          Sounds good.

          Most TODOs should be addressed except for how to deal with file permissions.

          Ok, let me work on that this week.

          The PB stuff would be better left to you I imagine, as are any additional tests you feel are required

          Sounds good.

          Bryan Keller added a comment -

          My main goal was to back port it to 0.94, but then I migrated that 0.94 patch to trunk as requested, which is how I ended up with my trunk patch. Most TODOs should be addressed except for how to deal with file permissions. I also added data locality to split assignment. The tests are new and based on the TableInputFormat tests, which seemed a logical set of tests to mimic. The PB stuff would be better left to you I imagine, as are any additional tests you feel are required. I prefer the name SnapshotInputFormat, for brevity and to differentiate from TableInputFormat, but that's a personal preference.

          Enis Soztutar added a comment -

          Bryan thanks for working on the patch. A couple of comments:

          • I see you renamed TableSnapshotIF to SnapshotIF. I would prefer TableSnapshot, since it is explicit that this is a snapshot of a table (although we only have table snapshots currently)
          • What happened to the tests in TestTableSnapshotInputFormat? I see you have some unit tests, but I failed to see any verification there.
          • In 0.94 writables are fine, but for trunk we would need to use PB for encoding the data for the wire. My first patch contains PB classes for SnapshotRegionSplits.
          • There were still TODO's to be addressed, especially on the testing side.

          Do you want to continue working on this? I can help with the testing and PB stuff next week.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590700/HBASE-8369-trunk_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6209//console

          This message is automatically generated.

          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v5.patch [ 12590699 ]
          Attachment HBASE-8369-trunk_v3.patch [ 12590700 ]
          Bryan Keller added a comment -

          New patch only uses top node when assigning splits for better locality
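          The "top node" assignment can be sketched as follows. This is an illustrative reconstruction, not the patch's actual code (the function name and input shape are assumptions): sum the bytes each host holds across a split's HFile blocks, then hand the split to the single best host.

```python
from collections import defaultdict

def top_locality_host(block_locations):
    """Return the single host holding the most bytes of a split's blocks.

    block_locations: (host, num_bytes) pairs, one per HDFS block replica,
    e.g. as derived from FileSystem.getFileBlockLocations(). Assigning the
    split only to this "top node" maximizes the fraction of data read locally.
    """
    weights = defaultdict(int)
    for host, num_bytes in block_locations:
        weights[host] += num_bytes
    return max(weights, key=weights.get)

# host2 holds 256 + 128 = 384 of the 576 bytes below, so it is the top node.
blocks = [("host1", 128), ("host2", 256), ("host2", 128), ("host3", 64)]
```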

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590450/HBASE-8369-trunk_v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6189//console

          This message is automatically generated.

          Bryan Keller made changes -
          Attachment HBASE-8369-trunk_v2.patch [ 12590450 ]
          Bryan Keller added a comment -

          Applied feedback

          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v4.patch [ 12590449 ]
          Bryan Keller added a comment -

          Applied feedback

          Bryan Keller added a comment -

          The test is copied from TestTableInputFormatScanBase. I will upload a new patch tomorrow that fixes some of the issues mentioned.

          stack added a comment -

          Nice patch Bryan.

          Yeah, license missing. The test is copied from elsewhere? It works. It's good.

          Camelcase is a little unorthodox for hadoop configs ("hbase.SnapshotInputFormat.snapshot.name"). We can change it on commit.

          This is probably not needed: "...implements Writable" (just say no to Writables). Maybe Enis can fix that in the next version.

          Yeah, would be grand to do away w/ Writables.

          This patch is great. Nice 'trick'.

          Ted Yu added a comment -

          TestSnapshotInputFormatScan.java needs license header.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590328/HBASE-8369-trunk_v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590328/HBASE-8369-trunk_v1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified tests. +1 hadoop1.0 . The patch compiles against the hadoop 1.0 profile. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. +1 javadoc . The javadoc tool did not generate any warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. -1 release audit . The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). -1 lineLengths . The patch introduces lines longer than 100 +1 site . The mvn site goal succeeds with this patch. +1 core tests . The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6183//console This message is automatically generated.
          Bryan Keller added a comment -

          There is no straightforward solution regarding the file permissions TODO, as far as I know. The file link creation seems to increment a reference counter on the source file, thus you need write permission on the hbase data files. I tried scanning the snapshot without creating a temp table directory but was unsuccessful. If that worked, no file links would need to be created and in theory no write access would be required.

          Ted Yu added a comment -

          w.r.t. latest trunk patch:

          There are 3 TODO's.
          Indentation is off: use two spaces instead of tabs.
          The first item in comment below wasn't addressed:
          https://issues.apache.org/jira/browse/HBASE-8369?focusedCommentId=13636931&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13636931

          Please put the next patch on review board.

          Thanks Bryan.

          Bryan Keller made changes -
          Attachment HBASE-8369-trunk_v1.patch [ 12590328 ]
          Bryan Keller added a comment -

          Patch for trunk

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590324/HBASE-8369-0.94_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6181//console

          This message is automatically generated.

          Bryan Keller added a comment -

          Sure, should only take a minute.

          stack added a comment -

          Bryan Keller Any chance of your making this work on trunk? Thanks sir.

          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v3.patch [ 12590324 ]
          Bryan Keller added a comment -

          Test improvement

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590321/HBASE-8369-0.94_v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6180//console

          This message is automatically generated.

          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v2.patch [ 12590321 ]
          Bryan Keller added a comment -

          The unit tests were broken in my first patch, this should fix it.

          Enis Soztutar added a comment -

          No worries. The pre-commit builds only work with trunk patches right now. We did not set it up for older branches.

          Bryan Keller made changes -
          Affects Version/s 0.94.8 [ 12324145 ]
          Bryan Keller made changes -
          Release Note Snapshot map-reduce scan functionality for 0.94. This is new functionality only and should not affect other
          Bryan Keller made changes -
          Labels newbie
          Bryan Keller added a comment -

          Sorry about that, my patch only works with 0.94, not trunk. I'm not sure how that works. Maybe I need to create a different issue?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590305/HBASE-8369-0.94.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6177//console

          This message is automatically generated.

          Bryan Keller made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Snapshot map-reduce scan functionality for 0.94. This is new functionality only and should not affect other
          Affects Version/s 0.94.8 [ 12324145 ]
          Labels newbie
          Bryan Keller added a comment -

          This patch is based on Enis's patch and backported to 0.94. I added a couple of things, like factoring in data locality when assigning splits, and making the temp directory a parameter. The big downside so far to these patches is that the map-reduce job needs to be run by a user that has write access to files in the HBase data directory.
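          The locality handling Bryan describes can be sketched without HBase types (all names below are illustrative; the real patch derives host weights from the HDFS block locations of the region's files): for each region, rank the hosts that store its HFile blocks by bytes held, and report the top few as the split's preferred locations so the MR scheduler favors them.

          ```java
          import java.util.ArrayList;
          import java.util.Comparator;
          import java.util.List;
          import java.util.Map;

          // Sketch only: hostBytes maps hostname -> bytes of the region's HFile
          // blocks stored on that host (what HDFS block-location info provides).
          public class SplitLocalitySketch {
              public static List<String> bestLocations(Map<String, Long> hostBytes, int limit) {
                  List<Map.Entry<String, Long>> entries = new ArrayList<>(hostBytes.entrySet());
                  // Hosts holding the most of the region's data come first.
                  entries.sort(Comparator
                      .comparingLong((Map.Entry<String, Long> e) -> e.getValue())
                      .reversed());
                  List<String> hosts = new ArrayList<>();
                  for (int i = 0; i < entries.size() && i < limit; i++) {
                      hosts.add(entries.get(i).getKey());
                  }
                  return hosts;
              }
          }
          ```

          The returned list would feed the split's getLocations(), the same way TableInputFormat reports region server hosts for an online table.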

          Bryan Keller made changes -
          Attachment HBASE-8369-0.94.patch [ 12590305 ]
          Bryan Keller added a comment -

          Snapshot MR scan functionality for 0.94

          Enis Soztutar added a comment -

          Thanks for looking. This will correspond to the skeleton of the final patch. I want to add more testing and address the TODO's in the patch. But reviews are always welcome in any phase.

          Sergey Shelukhin added a comment -

          Skimmed the patch... should it be reviewed now or is just an example?

          Nick Dimiduk added a comment -

          I like the looks of this patch, Enis Soztutar. I noticed this:

          +    Path tableDir = new Path(new Path("/users/enis"), UUID.randomUUID().toString()); //TODO change this, delete afterwards
          

          With establishing the working directory in mind, MR jobs working with HFiles should have something similar. I.e., the partitions file used by HFileOutputFormat really shouldn't assume tmp (see this comment), and I'll need a working directory to implement HBASE-8073. It would be nice to consolidate this concept into a single user-configured place. Separate JIRA?

          Ted Yu added a comment -

          Quickly went through the code.
          Will revisit when TODO's are addressed in next patch.

          +    public void close() throws IOException {
          +      if (this.scanner != null) {
          +        this.scanner.close();
          +      }
          +      if (region != null) {
          +        region.closeRegionOperation();
          +        region.close(true);
          

          I think region.closeRegionOperation() should be placed inside finally block because this.scanner.close() may throw IOException.

          +   * @param rootDir Root directory for HBase instance
          +   * @param conf
          ...
          +  public static HRegion createHRegion(final HRegionInfo info, final Path rootDir, final Path tableDir,
          +                                      final Configuration conf,
          

          Param tableDir missing in javadoc.

          -  private void restoreWALs() throws IOException {
          +  public void restoreWALs() throws IOException {
          

          I don't see the above method referenced in the patch. Why making it public ?
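          Ted's first point about the close() ordering can be sketched with plain JDK types (the class and method names here are illustrative stand-ins, not code from the patch): putting the region cleanup in a finally block guarantees it runs even when the scanner's close() throws.

          ```java
          import java.io.Closeable;
          import java.io.IOException;

          // Stand-in for the RecordReader in the patch: "scanner" and "region"
          // are plain Closeable stubs so the sketch compiles without HBase.
          public class CloseOrderSketch implements Closeable {
              private final Closeable scanner; // may throw from close()
              private final Closeable region;  // must be closed regardless

              public CloseOrderSketch(Closeable scanner, Closeable region) {
                  this.scanner = scanner;
                  this.region = region;
              }

              @Override
              public void close() throws IOException {
                  try {
                      if (scanner != null) {
                          scanner.close();
                      }
                  } finally {
                      // Runs even when scanner.close() throws, so the region cleanup
                      // (closeRegionOperation()/close(true) in the real patch) is not skipped.
                      if (region != null) {
                          region.close();
                      }
                  }
              }

              static String runScenario() {
                  StringBuilder log = new StringBuilder();
                  Closeable failingScanner = () -> {
                      log.append("scanner;");
                      throw new IOException("scanner close failed");
                  };
                  Closeable region = () -> log.append("region;");
                  try {
                      new CloseOrderSketch(failingScanner, region).close();
                  } catch (IOException expected) {
                      // the scanner's exception still propagates after cleanup
                  }
                  return log.toString();
              }

              public static void main(String[] args) {
                  System.out.println(runScenario()); // prints "scanner;region;"
              }
          }
          ```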

          Nick Dimiduk added a comment -

          (1) seems like the best approach to me as well. We already ask the user to struggle with permissions issues between creation of and loading of HFiles for bulk loading. This is simplest to implement and consistent with existing tools.

          Gary Helmling added a comment -

          Yes, agree (1) is the right approach for now. If new capabilities in HDFS show up, then we can reassess.

          Enis Soztutar added a comment -

          Agreed that we need proper ACL's from HDFS to correctly manage this, but I am not holding my breath on it. Let's go with (1) for now to see the actual usage of this feature. We can improve later.

          Gary Helmling added a comment -

          For the short term, #1 (reading user has to be in the same group), seems fine as a restriction. In fact, given that the whole point is to read directly from HDFS, I don't see any other straightforward solution than relying on what controls HDFS gives us. In that vein, I'm really not in favor of #2. I just wanted to be clear on the restrictions.

          I don't see #3 as really adding anything that couldn't be coordinated through group perms and membership anyway.

          If HDFS ever implemented full ACLs for files, then I guess we could maintain separate ACLs for snapshot files, allowing read access by designated users.

          Enis Soztutar added a comment -

          So would this completely bypass security?

          Underlying HFiles are owned by the hbase user. For reading the files from the MR job, a couple of options come to mind:
          (1) open the files directly from hdfs, in which case the user has to be in the same group and have group permissions to read the files, or the user has to be the hbase user. Similar to current SSR.
          (2) have HBase servers open the file, and pass the file handles to the MR job, similar to the approach in HDFS-347. This is obviously more involved and requires a live HBase cluster.
          (3) Copy snapshot files as a different user. This will only be applicable to exported snapshots. Copying data for in-place snapshots would be costly.
          Any other ideas?

          Gary Helmling added a comment -

          Yes, we do not need to initCredentials, since we are not talking to any hbase server.

          So would this completely bypass security? I also want this functionality for certain use cases, we should just be clear on this caveat.

          Enis Soztutar added a comment -

          in general I'm against having another way to direct access the data, since it means that you're giving up on optimizing the main one.

          Conceptually, this is similar to the short circuit reads for HDFS. I agree that we should not need these kinds of optimizations, since in the long term, it will be impossible to implement QoS for IO if you give direct access to local files (for SSR) / hdfs files (for snapshot).

          if the final implementation will be like this one using the HRegion object, I'll be +1.

          Yes, that is the plan.

          Are the initCredentials modifications in TableMapReduceUtil required for the scope of this patch?

          Yes, we do not need to initCredentials, since we are not talking to any hbase server.

          Jean-Marc Spaggiari added a comment -

          I really like the idea.

          Are the initCredentials modifications in TableMapReduceUtil required for the scope of this patch? Or are they coming from another scope?

          Matteo Bertozzi added a comment -

          in general I'm against having another way to direct access the data, since it means that you're giving up on optimizing the main one.

          but if the final implementation will be like this one using the HRegion object, I'll be +1.

          Enis Soztutar added a comment -

          Yes, this can go into 0.94 once finished. No fundamental changes to the core path.

          Enis Soztutar made changes -
          Field Original Value New Value
          Attachment hbase-8369_v0.patch [ 12579216 ]
          Enis Soztutar added a comment -

          Attaching a prototype patch which seems to work for the test case. There are a lot of TODO's in the patch, and needs some polishing. Also no test done on actual clusters. But it should be enough to give an idea to the approach.

          Lars Hofhansl added a comment -

          That sounds awesome. All new code, I assume, so I'd be OK with this in 0.94 as well (but I don't have a strong preference).

          Enis Soztutar created issue -

            People

            • Assignee: Enis Soztutar
            • Reporter: Enis Soztutar
            • Votes: 2
            • Watchers: 37