Hadoop HDFS: HDFS-3672

Expose disk-location information for blocks to enable better scheduling

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis.

      This API would likely look similar to FileSystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids.
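
      For illustration only, here is a minimal sketch of the shape such an API might take (the interface and method names are hypothetical, not part of HDFS):

        import java.io.IOException;
        import org.apache.hadoop.fs.BlockLocation;
        import org.apache.hadoop.fs.Path;

        // Hypothetical sketch: same inputs as FileSystem#getFileBlockLocations(Path, long, long),
        // but the returned locations would additionally carry an identifier for the disk holding
        // each replica, filled in by querying the responsible datanodes.
        public interface DiskLocationAware {
          BlockLocation[] getDiskBlockLocations(Path p, long start, long len)
              throws IOException;
        }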

      1. design-doc-v1.pdf
        73 kB
        Andrew Wang
      2. design-doc-v2.pdf
        73 kB
        Andrew Wang
      3. hdfs-3672-1.patch
        37 kB
        Andrew Wang
      4. hdfs-3672-10.patch
        68 kB
        Andrew Wang
      5. hdfs-3672-11.patch
        68 kB
        Andrew Wang
      6. hdfs-3672-12.patch
        69 kB
        Andrew Wang
      7. hdfs-3672-2.patch
        48 kB
        Andrew Wang
      8. hdfs-3672-3.patch
        49 kB
        Andrew Wang
      9. hdfs-3672-4.patch
        52 kB
        Andrew Wang
      10. hdfs-3672-5.patch
        52 kB
        Andrew Wang
      11. hdfs-3672-6.patch
        60 kB
        Andrew Wang
      12. hdfs-3672-7.patch
        61 kB
        Andrew Wang
      13. hdfs-3672-8.patch
        61 kB
        Andrew Wang
      14. hdfs-3672-9.patch
        68 kB
        Andrew Wang

        Issue Links

          Activity

          Andy Isaacson added a comment -

          also involve a series of RPCs to the responsible datanodes to determine disk ids.

          Keep in mind that this should be one RPC per DN rather than one RPC per block. If you have a dozen blocks on a single DN, it's a big and easy performance win to pass a vector of blocks and return a vector of locations, rather than asking for each block location in series.
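
          As a rough illustration of the batched shape described above (a hypothetical sketch, not the actual datanode protocol; the interface name and the use of numeric block IDs are simplifications):

            import java.io.IOException;
            import java.util.List;

            // Hypothetical sketch: one RPC per datanode carrying all blocks of interest,
            // returning one opaque disk ID per block (in the same order), rather than
            // one RPC per block.
            interface PerDatanodeDiskQuery {
              List<byte[]> getDiskIdsForBlocks(List<Long> blockIds) throws IOException;
            }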

          Andrew Wang added a comment -

          First hack at this. I still want to add some more tests, but I think the design is about right.

          This essentially provides the same API as DFS#getFileBlockLocations, except it returns a subclass of BlockLocation, HdfsBlockLocation, which carries an additional byte array: an opaque identifier that specifies on which disk on a datanode the block resides.

          Currently, this ID is mapped to the index of the HDFS data directory containing the block file (e.g. /data/1, /data/2). This can thus change across reboots/config changes, and clients need to be prepared to requery anyway since blocks do move around as part of normal operation.

          I'd like to perhaps split the new DFS#getFileHdfsBlockLocations function into a call to DFS#getFileBlockLocations to do the NN query to get block locations, and then pass these to some other call (DFS#getDiskIds?), since this would let you do multiple calls to DFS#getFileBlockLocations and then do one series of RPCs to the datanodes. But, I need to figure out how to change the BlockLocation[] back into a LocatedBlock[].

          It might also be nice to do the DN RPCs in parallel, since right now it's serial setup, query, teardown for each DN.

          Suresh Srinivas added a comment -

          Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis.

          I am not sure I understand your motivation. I can see the Namenode understanding disks/storages in the Datanode to improve scheduling, but I am not sure why clients should be exposed to this information. Can you describe the use cases more clearly? Also, please attach a short writeup/design that summarizes the motivation and captures this discussion.

          As regards NN knowing about this information, that is one of the motivations of HDFS-2832. If each storage volume that corresponds to a disk on Datanode has a separate storage ID, NN gets block reports and other stats per disk.

          Todd Lipcon added a comment -

          Hey Suresh. I'll try to answer a few of your questions above from the perspective of HBase and MR.

          The information is useful for clients when they have several tasks to complete which involve reading blocks on a given DataNode, but the order of the tasks doesn't matter. One example is in HBase: we currently have several compaction threads running inside the region server, and those compaction threads do a lot of IO. HBase could do a better job of scheduling the compactions if it knew which blocks were actually on the same underlying disk. If two blocks are on separate disks, you can get 2x the throughput by reading them at the same time, whereas if they're on the same disk, it would be better to schedule them one after the other.

          You can imagine this feature also being used at some point by MapReduce. Consider a map-only job which reads hundreds of blocks located on the same DN. When the associated NodeManager asks for a task to run, the application master can look at the already-running tasks on that node, understand which disks are currently not being read, and schedule a task which accesses an idle disk. Another MR use case is to keep track of which local disks the various tasks are reading from, and de-prioritize those disks when choosing a local disk to spill map output to, in order to avoid read-write contention.

          The other motivation is to eventually correlate these disk IDs with statistics/metrics within advanced clients. In HBase, for example, we currently always read from the local replica if it is available. If, however, one of the local disks is going bad, this can really impact latency, and we'd rather read a remote replica instead - the network latency is much less than the cost of accessing failing media. But we need to be able to look at a block and know which disk it's on in order to track these statistics.

          The overall guiding motivation is that we looked at heavily loaded clusters with 12 disks and found that we were suffering from pretty significant "hotspotting" of disk access. During any given second, about two thirds of the disks tend to be at 100% utilization while the others are basically idle. Using lsof to look at the number of open blocks on each data volume showed the same hotspotting: some disks had multiple tasks reading data whereas others had none. With a bit more client visibility into block<->disk correspondence, we can try to improve this.

          As regards NN knowing about this information, that is one of the motivations of HDFS-2832. If each storage volume that corresponds to a disk on Datanode has a separate storage ID, NN gets block reports and other stats per disk.

          I agree HDFS-2832 will really be useful for this. But it's a larger restructuring with much bigger implications. This JIRA is just about adding a new API which exposes some information that's already available. We explicitly chose to make the "disk ID" opaque in the proposed API – that way when HDFS-2832 arrives, we can really easily switch over the implementation to be based on the storage IDs without breaking users of the API.

          Todd Lipcon added a comment -

          A few comments on the initial patch:

          • I definitely think we need to separate the API for getting disk locations so that you can pass a list of LocatedBlocks. For some of the above-mentioned use cases (eg MR scheduler), you need to get the locations for many files, and you don't want to have to do a fan-out round for each of the files separately.
          • Per above, I agree that we should make the disk IDs opaque. But a single byte seems short-sighted. Let's expose them as an interface "DiskId" which can be entirely devoid of getters for now – its only contract would be that it properly implements comparison, equals, and hashcode, so users can use them to aggregate stats by disk, etc. Internally we can implement it with a wrapper around a byte[] (see the sketch after this list).
          • In the protobuf response, given the above, I think we should do something like:
            message Response {
              repeated bytes diskIds = 1;
              // For each block, an index into the diskIds array above, or MAX_INT
              // to indicate the block was not found on this DN.
              repeated uint32 diskIndexes = 2;
            }
            
          • Per above, need to figure out what you're doing for blocks that aren't found on a given DN. We also need to specify in the JavaDoc what happens in the response for DNs which don't respond. I think it's OK that the result would have some "unknown" - it's likely if any of the DNs are down.
          • Doing the fan-out RPC does seem important. Unfortunately it might be tricky, so I agree we should do it in a separate follow-up optimization.
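
          A minimal sketch of the opaque DiskId idea from the second bullet, assuming a byte[]-backed implementation (illustrative names, not the committed classes):

            import java.util.Arrays;

            // Hypothetical: an opaque disk identifier whose only contract is value-based
            // comparison, equality, and hashing, so callers can aggregate stats by disk
            // without seeing what the bytes mean.
            public interface DiskId extends Comparable<DiskId> { }

            class ByteArrayDiskId implements DiskId {
              private final byte[] id;

              ByteArrayDiskId(byte[] id) {
                this.id = id.clone();
              }

              @Override
              public int compareTo(DiskId other) {
                // Lexicographic comparison of the underlying bytes; assumes the other
                // DiskId is also byte[]-backed.
                byte[] o = ((ByteArrayDiskId) other).id;
                int len = Math.min(id.length, o.length);
                for (int i = 0; i < len; i++) {
                  int c = Byte.compare(id[i], o[i]);
                  if (c != 0) {
                    return c;
                  }
                }
                return Integer.compare(id.length, o.length);
              }

              @Override
              public boolean equals(Object other) {
                return other instanceof ByteArrayDiskId
                    && Arrays.equals(id, ((ByteArrayDiskId) other).id);
              }

              @Override
              public int hashCode() {
                return Arrays.hashCode(id);
              }
            }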
          Arun C Murthy added a comment -

          Todd - the possibilities are intriguing. However, it still seems we have a lot of work ahead of us to realize the potential.

          So, before we jump in and add new public APIs, can you/Andrew provide a study or, better, a prototype which proves that this API would actually be useful in some way downstream?

          Maybe we can look at some performance numbers if you have them? Or, maybe we can work on a branch where we have a clearly defined goal (say: improve HBase or improve MR scheduling) and we achieve that before we commit to this?

          Todd Lipcon added a comment -

          I understand the reticence to add new APIs without "proof" that they're useful. But it's a bit of a chicken-egg situation here. It's difficult for downstream projects to build against a branch or an uncommitted patch.

          One experiment I ran that I can report on is as follows (you may remember this from the HDFS Performance talk I gave prior to Hadoop Summit):

          • Test setup: 12x2T disks on a pseudo-distributed HDFS. Write 24 files, each ~10GB, to the local HDFS cluster.
          • Read throughput test (no scheduling): Start a "hadoop fs -cat /fileN > /dev/null" for all 24 files. Got ~700M/sec
          • Read throughput test (simulated "scheduling"): Run 12 threads, one per data directory: find /data/N -name blk* -exec cat {} \;. Got ~900M/sec (30% improvement)

          In each case, I ran "iostat -dxm 1" to collect disk stats on a 1-second interval. In the "unscheduled" test, each sample showed about 8 disks at 100% utilization and 4 disks at 0% utilization. In the "scheduled" test, all disks remain at 100% utilization.

          While the above experiment is obviously more tightly controlled than a real workload, it does show that you need to have scheduling to use all of the disks to their full potential.

          Would a fair compromise be to mark the new API as @InterfaceAudience.Unstable so that people understand it's experimental and may change or disappear in future releases? Given that the use cases for it are performance enhancement only, it seems like people could simply wrap in a try/catch so that, if the API ends up throwing an UnsupportedOperationException in a future version, it would just fall back to the slower un-scheduled path.
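
          For illustration, the advisory-only usage pattern could look roughly like this on the client side (a sketch; queryDiskLocations and the schedule* hooks are placeholder names, not the committed API):

            import java.io.IOException;
            import org.apache.hadoop.fs.BlockLocation;
            import org.apache.hadoop.fs.FileStatus;
            import org.apache.hadoop.fs.FileSystem;

            // Sketch only: queryDiskLocations stands in for the experimental HDFS call,
            // and the schedule* hooks stand in for application scheduling logic.
            abstract class DiskAwareReadPlanner {

              void planReads(FileSystem fs, FileStatus file) throws IOException {
                BlockLocation[] locs = fs.getFileBlockLocations(file, 0, file.getLen());
                try {
                  // Advisory only: if the unstable API changes or starts throwing
                  // UnsupportedOperationException, we lose the per-disk optimization,
                  // not correctness.
                  scheduleByDisk(queryDiskLocations(locs));
                } catch (UnsupportedOperationException e) {
                  scheduleUnordered(locs);  // fall back to the slower, unscheduled path
                }
              }

              // Would delegate to the experimental DistributedFileSystem API.
              abstract BlockLocation[] queryDiskLocations(BlockLocation[] locs)
                  throws IOException;

              abstract void scheduleByDisk(BlockLocation[] diskAwareLocs);

              abstract void scheduleUnordered(BlockLocation[] locs);
            }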

          Suresh Srinivas added a comment -

          Todd, thanks for describing the intent and use cases in detail. These APIs sort of make sense for experimentation.

          However, I want to highlight the following:
          There are multiple daemons reading from/writing to disk in Hadoop: Datanodes, MapReduce shuffle, and possibly HBase short-circuit reads. Given this, a view from the Datanode alone would not reflect the complete reality. Also, given that there are many applications on HDFS reading from/writing to disks as well, the view of a single application (in this case HBase or MapReduce) is also incomplete. While an application can make locally optimized scheduling decisions, it still may not result in better scheduling. The improvements one sees are going to be best-effort and unpredictable.

          Todd Lipcon added a comment -

          Hey Suresh. I agree with all your points above.

          One thing that's been talked about in the past is to consider using a local-only block pool for MR temp storage. That would at least get one of the other major disk users going through the same code paths.

          The other idea we're thinking about is to expose disk statistics such as current queue length and utilization for each local disk, as reported by the OS. We're still running some experiments locally, but our assumption is that, within short time-scales (~0.5 seconds), the lagging 0.5-second usage is a reasonably good predictor of the next 0.5 seconds, given that most Hadoop-style access is of 100MB+ chunks of data.

          So, are you OK with introducing these as Unstable-annotated APIs, perhaps with an extra JavaDoc warning that they are explicitly experimental and may cease to exist in the future?

          Arun C Murthy added a comment -

          It's difficult for downstream projects to build against a branch or an uncommitted patch.

          Umm? We could just write up the patches and test them either ad-hoc or via a dev-branch and then commit?

          Suresh Srinivas added a comment -

          are you OK with introducing these as Unstable-annotated APIs

          My concern is, if this is used in MapReduce it might be okay. But once it starts getting used in other downstream projects removing this would be a challenge.

          Todd Lipcon added a comment -

          My concern is, if this is used in MapReduce it might be okay. But once it starts getting used in other downstream projects removing this would be a challenge

          That's the whole point of the Unstable API annotation, isn't it? We can change the API and downstream projects should accept that.

          What if we also explicitly mark it as throws UnsupportedOperationException, so users of the API would be encouraged to catch this exception?

          Since it's a performance API, it's always going to be used in an "advisory" role anyway – any use of it could safely fall back to the non-optimized code path.

          I'd be OK compromising and calling it LimitedPrivate(MapReduce), but I know that at least one of our customers is interested in using an API like this as well. Unfortunately I can't give too many details on their use case due to NDA (lame, I know), but I just wanted to provide a data point that there is demand for this "in the wild".

          We're still running some experiments locally, but our assumption is that, within short time-scales (~0.5 seconds), the lagging 0.5 second usage is a reasonably good predictor of the next 0.5 seconds, given most Hadoop-style access is of 100MB+ chunks of data

          I ran a simple experiment yesterday on one of our test clusters. The cluster is doing a mix of workloads - I think at the time it was running a Hive benchmark suite on ~100 nodes. So, it was under load, but not 100% utilization.

          On all of the nodes, I collected /proc/diskstats once a second for an hour. I then removed all disk samples where there was 0 load on the disk, since that was just periods of inactivity between test runs. Then, I took the disk utilization at each sample, and appended it as a column to the data from the previous second. I loaded the data into R and constructed a few simple models for each second's disk utilization on a given disk based on the previous second's disk statistics.

          Linear model using only the current utilization to predict the next second's utilization:

          > m.linear.on.only.util <- lm(next_sec_util ~ ms_doing_io, data=d)
          

          (this would correspond to a trivial model like "assume that if a disk is busy now, it will still be busy in the next second")

          Linear model using all of the current statistics (queue length, read/write mix, etc) to predict next second's util:

          > m.linear <- lm(next_sec_util ~ ., data=d)
          

          Quadratic model using all of the current statistics, and their interaction terms, to predict next second's util:

          > d.sample.200k <- d[sample(nrow(d), size=200000),]
          > m.quadratic <- lm(next_sec_util ~ .:., data=d.sample.200k)
          

          Random forest (a decision-tree based model, trained using only 1% of the data, since it's slow):

          > d.sample.10k <- d[sample(nrow(d), size=10000),]
          > m.rf <- randomForest(next_sec_util~., data=d.sample.10k)
          

          The models fared as follows:

          Model                        Percent variance explained
          Linear on only utilization   58.4%
          Linear                       70.6%
          Quadratic                    73.9%
          Random forest                76.9%

          Certainly the above analysis is just one workload, and one in which the disks are not being particularly slammed. But, it does show that looking at a disk's current status is a reasonable predictor of status over the next second on a typical MR cluster.

          Tom White added a comment -

          A small comment on the patch - how about "DiskBlockLocation" instead of "HdfsBlockLocation" (and similarly for the method name) since other FileSystem implementations could implement this too.

          Todd Lipcon added a comment -

          A small comment on the patch - how about "DiskBlockLocation" instead of "HdfsBlockLocation" (and similarly for the method name) since other FileSystem implementations could implement this too.

          Given it's an experimental API which we might want to change a few times until we get it right, I'd rather keep the scope to DistributedFileSystem at this point, rather than trying to add a generic API. If people come up with a way to generalize it to other file systems, let's add that as a future enhancement. Does that sound reasonable?

          Andrew Wang added a comment -

          Newer version of the patch, addressing Todd and Tom's comments.

          One unfortunate bit is that, to split the NN and DN RPCs, I needed to add a subclass of BlockLocation that hides a corresponding LocatedBlock. An array of these HdfsBlockLocations is now returned by DFS#getFileBlockLocations and downcast in DFSClient#getDiskBlockLocations to retrieve the LocatedBlock.

          I took Todd's advice about Integer.MAX_VALUE for denoting invalid blocks, but turn it into a boolean accessible via DiskId#isValid before it's shown to consumers of FileSystem.

          Finally, I already renamed things to DiskBlockLocation based on Tom's comment, and reused the name HdfsBlockLocation for my LocatedBlock wrapper class. I can re-rename both of these if we don't like them.
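
          Structurally, the wrapper described above is something like the following sketch (the class name, fields, and constructor shape are illustrative, not the patch's exact code):

            import org.apache.hadoop.fs.BlockLocation;
            import org.apache.hadoop.hdfs.protocol.LocatedBlock;

            // Sketch: a BlockLocation subclass that remembers the LocatedBlock it was
            // derived from, so a later client-side call can recover it by downcasting
            // instead of asking the NameNode again.
            class LocatedBlockWrapper extends BlockLocation {
              private final LocatedBlock block;

              LocatedBlockWrapper(String[] names, String[] hosts, long offset,
                  long length, LocatedBlock block) {
                super(names, hosts, offset, length);
                this.block = block;
              }

              LocatedBlock getLocatedBlock() {
                return block;
              }
            }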

          Tom White added a comment -

          Todd, if the feature is scoped to DistributedFileSystem then an alternative would be not to expose it in FileSystem at all at this point, so users have to cast to DistributedFileSystem. That or rename it as I originally suggested. The point is to avoid introducing HDFS-isms in the FileSystem interface.

          Todd Lipcon added a comment -

          Yep, I agree – the user should have to downcast to DistributedFileSystem to get at this API. I didn't notice that the proposed patch changed FileSystem itself.

          Andrew Purtell added a comment -

          This can also be viewed as the query side of an API for device-aware block placement. With that in mind, consider renaming things like DiskBlockLocation to BlockDeviceLocation. One attribute of BlockDeviceLocation should be an enum Type. Follow-up work could then add an API for specifying block placement according to BlockDeviceLocation.Type.

          This would enable use cases like creating certain files on BlockDeviceLocation.Type.FLASH where they might be frequently accessed, especially for random reads; and others (and by default) on BlockDeviceLocation.Type.DISK for selecting spinning media.

          Please pardon the interruption.
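
          As a hypothetical sketch of the device-type attribute suggested above (illustrative only; nothing like this exists in the current patch):

            // Hypothetical: a device-type attribute that a query/placement API could
            // expose, and later accept as a placement hint.
            public enum BlockDeviceType {
              DISK,   // spinning media, the default
              FLASH   // SSD/flash, for frequently or randomly read files
            }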

          Andrew Wang added a comment -

          Nuked the method from FileSystem, and did some small cleanups. I also added InterfaceStability.Unstable annotations on the new classes in hadoop-common.

          Still open to renaming suggestions, if desired. I'd like to keep them named .*BlockLocation for consistency, because they are subclasses of BlockLocation.

          Perhaps HdfsBlockLocation for the DistributedFileSystem API, and LocatedBlockLocation for the internal LocatedBlock wrapper?

          Andrew Wang added a comment -

          I think at this point, the patch addresses everyone's comments thus far. Does someone mind marking it Patch Available so Jenkins runs?

          @Suresh @Arun: I'd be happy to put up a short design doc if you'd like. I think Todd's above posts make a pretty compelling case for potential I/O improvements, and the implementation of the new DistributedFileSystem api would definitely get refactored when HDFS-2832 is finished.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538086/hdfs-3672-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.http.TestHttpServer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2921//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2921//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2921//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          I'll ask again since I didn't get a response - wouldn't it make sense to commit this patch to a dev-branch, use that to prototype changes to either MapReduce or HBase, and then merge it in?

          Todd Lipcon added a comment -

          I'll ask again since I didn't get a response - wouldn't it make sense to commit this patch to a dev-branch. Use that to prototype changes to either MapReduce or HBase and then merge it in?

          There are projects outside of just HBase and MapReduce that would like to run against this, some of which are not Apache projects. As I mentioned above, we have at least one customer who would like to use this feature in their code to get better disk efficiency. They need to run against an actual release, not a dev branch build. This is the primary use case we're targeting right now. I want to be perfectly honest: the HBase/MR examples I gave above are not on our immediate roadmap; they just serve as proof that this isn't a one-off/niche improvement.

          The other downside with a dev branch is that it's difficult for downstream OSS projects to integrate against something that's not in a release. HBase already has to build against several different Maven profiles to support 1.0, 0.23, and 2.0. Adding another profile against a dev branch not available in maven is not feasible.

          This isn't the first time an API has been added to the trunk code before downstream users exist. For example, FileContext was in Hadoop for somewhere around a year before MR2 started to migrate to it. The "New MR API" is still barely used based on my discussions with users. If there is sufficient motivation (plus customer demand) for an API, and the API is explicitly marked Unstable, what's the problem with including it? It's entirely new code and has no risk of destabilizing the existing feature set.

          I fear that blocking APIs like this from Apache will only serve to fracture the Hadoop user base, pushing us back towards the 0.20-era nightmare of distinct distros with distinct non-overlapping capabilities.

          Do you have a technical objection to the new code: for example, a reason why it will destabilize the existing feature set?

          Andrew Wang added a comment -

          Fix findbugs. I also parallelized the DN RPCs with Callables and a threadpool.
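
          The fan-out pattern described here looks roughly like the following sketch (not the patch's actual code; the byte[][] result type and the pool size are placeholders):

            import java.util.ArrayList;
            import java.util.List;
            import java.util.concurrent.Callable;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.Future;

            // Sketch: submit one query per datanode to a pool so the DN RPCs run in
            // parallel instead of serially, then gather the results in order.
            class ParallelDnQueries {
              static List<byte[][]> queryAll(List<Callable<byte[][]>> perDnQueries)
                  throws Exception {
                ExecutorService pool = Executors.newFixedThreadPool(
                    Math.max(1, Math.min(10, perDnQueries.size())));
                try {
                  List<Future<byte[][]>> futures = new ArrayList<Future<byte[][]>>();
                  for (Callable<byte[][]> query : perDnQueries) {
                    futures.add(pool.submit(query));   // fan out
                  }
                  List<byte[][]> results = new ArrayList<byte[][]>();
                  for (Future<byte[][]> f : futures) {
                    results.add(f.get());              // gather, preserving order
                  }
                  return results;
                } finally {
                  pool.shutdown();
                }
              }
            }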

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538617/hdfs-3672-4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2931//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2931//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          Todd - first of all, no one is blocking anything.

          Hey Suresh. I'll try to answer a few of your questions above from the perspective of HBase and MR.

          This jira was started with the premise that this new feature was useful to MapReduce and HBase (http://s.apache.org/NJY). So, I assumed there would be some work in that direction.

          If that is the case, I don't see how the suggestion to do the work in a dev-branch before merging to mainline is blocking anything. It is something we have done many times over for YARN, HDFS HA, etc.

          Personally, if anyone was doing this work on MR, I'd be very interested in collaborating, heck - learning.

          However, given my experience on MR, I'd classify it as high-risk but very, very interesting research, since on mid-sized clusters (a few hundred nodes) and beyond the scheduling overhead might more than negate the I/O gains. Hence, again, doing that in a dev-branch is absolutely the right thing to do from a project and risk-management perspective.

          This isn't the first time an API has been added to the trunk code before downstream users exist.

          Yes, this wouldn't be the first time we made that mistake.

          Clearly, we have been dealing with the consequences of our previous mistakes for a while now. Arguing that that is a good reason to do the same, again, is not cogent.

          As I mentioned above, we have at least one customer who would like to use this feature in their code to get better disk efficiency. They need to run against an actual release, not a dev branch build. This is the primary use case we're targeting right now. I want to be perfectly honest: the HBase/MR examples I gave above are not on our immediate roadmap; they just serve as proof that this isn't a one-off/niche improvement.

          Now, clearly, you don't plan to do any work on either HBase or MR anytime soon and you have a different roadmap for a client.

          If you had made that clear sooner, the conversation would be different.

          Essentially, for the foreseeable future this will be dead code which is not going to be beneficial to anyone in the community... yet, the burden of maintenance etc. will remain.

          No, that is not a big deal since this particular change has a fairly small cross-section - it might be harder to make the argument for a future, more extensive change of this kind. Clearly, if it's a plugin etc., its easier to digest.

          IAC, I don't wish to debate this further.


          Importantly, we should switch this feature off by default so that people who use this understand that this isn't necessarily supported - at least until we have a real use case for this in the community.

          Andrew Wang added a comment -

          Attaching a design doc detailing the use cases and trying to plot out the future direction. Happy to expand on anything unclear.

          Overall, I feel like there's strong interest in the API from multiple parties (the unnamed Cloudera customer, HBase, MR), and fairly clear potential performance improvements. I'd appreciate any advice on making it crystal clear to downstream users that this is an unstable API. We've already got the appropriate annotations, and I could also make it require a config option before doing anything useful (which I think satisfies "default off"). Any other suggestions?
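
          For the "default off" idea, a minimal sketch of config gating (the property name below is made up for illustration, not an actual Hadoop key):

            import org.apache.hadoop.conf.Configuration;

            // Hypothetical gating: refuse to issue the extra datanode RPCs unless the
            // cluster/client explicitly enabled the experimental feature.
            class ExperimentalApiGate {
              static void checkEnabled(Configuration conf) {
                if (!conf.getBoolean("dfs.client.disk-block-locations.enabled", false)) {
                  throw new UnsupportedOperationException(
                      "Disk block location queries are disabled by configuration");
                }
              }
            }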

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538854/design-doc-v1.pdf
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2937//console

          This message is automatically generated.

          Andrew Wang added a comment -

          Another patch rev, basically just doing stylistic cleanups. I haven't heard any code-related feedback in a while, so I haven't changed any classnames or added any conf options.

          I've tried to satisfy all the comments thus far, and I would really like to get this in soon if possible. Happy to listen to any further feedback about what I can do to make this happen.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12539102/hdfs-3672-5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
          org.apache.hadoop.hdfs.TestPersistBlocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2948//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2948//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Patch looks pretty good to me. A few comments:

          1. In DFSClient#getDiskBlockLocations, I recommend you add an instanceof check before the BlockLocation downcast to HdfsBlockLocation. Much better to throw a helpful RTE than some opaque ClassCastException.
          2. The DFSClient#getDiskBlockLocations method is huge, and has a few very distinct phases. I recommend you break this up into a few separate helper methods, e.g. one or two to initialize the data structures, one or two to perform the RPCs, one to re-associate the DN results with the correct block, etc.
          3. Unless I'm missing something, seems like you could easily make DiskBlockLocationCallable a static inner class.
          4. The javadoc parameter comment "@param blocks a List<LocatedBlock>" is not very helpful, since when the javadocs are generated the type of the parameter will automatically be included.
          5. The javadoc for DFSClient#getDiskBlockLocations should be a proper javadoc, i.e. with @param and @returns tags. I also recommend having this javadoc reference DistributedFileSystem#getFileDiskBlockLocations.
          6. In the new javadoc in DistributedFileSystem, you incorrectly say that this interface exists in the FileSystem class as well, and say "this is more helpful with DFS", which is the only implementation.
          7. I think you should change the LimitedPrivate InterfaceAudience annotations to Public, but keep the Unstable InterfaceStability annotations.
          8. Put a single space around your operators, e.g. "for (int i=0; i<blocks.size(); i++)"
          9. Unless I'm missing something, I don't think I see the ability to disable this feature, let alone it being off by default, as Arun requested.
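
          A minimal sketch of the guard suggested in comment 1, assuming the entry point receives BlockLocation[] and the wrapper class exposes its LocatedBlock through a getLocatedBlock() accessor (the helper name and package layout here are illustrative, not copied from the patch):

            import java.util.ArrayList;
            import java.util.List;

            import org.apache.hadoop.fs.BlockLocation;
            import org.apache.hadoop.fs.HdfsBlockLocation;
            import org.apache.hadoop.hdfs.protocol.LocatedBlock;

            public class DowncastGuardExample {
              /** Fail fast with a descriptive message instead of a bare ClassCastException. */
              static List<LocatedBlock> toLocatedBlocks(BlockLocation[] locations) {
                List<LocatedBlock> blocks = new ArrayList<LocatedBlock>();
                for (BlockLocation loc : locations) {
                  if (!(loc instanceof HdfsBlockLocation)) {
                    throw new ClassCastException("DFSClient#getDiskBlockLocations "
                        + "expected to be given instances of HdfsBlockLocation");
                  }
                  blocks.add(((HdfsBlockLocation) loc).getLocatedBlock());
                }
                return blocks;
              }
            }
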
          Andrew Wang added a comment -

          Thanks for the detailed review ATM, I tried to address all your comments.

          I broke out the huge DFSClient method into a few smaller ones, which are still a bit large but logically sound. I can try to go further with this, but it'll mean passing more stuff in parameters.

          The config option I added ("dfs.client.file-block-locations.enabled") is default off, and checked client-side only. I could add this to the DN side too if we want to be really sure.
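
          For anyone trying this out, the opt-in would look roughly like the following; only the config key is taken from the comment above, the rest is an illustrative sketch:

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;

            public class OptInExample {
              public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Off by default; the client has to opt in explicitly before the
                // experimental disk-location calls do anything useful.
                conf.setBoolean("dfs.client.file-block-locations.enabled", true);
                FileSystem fs = FileSystem.get(conf);
                // ... cast to DistributedFileSystem and call the new API here ...
                fs.close();
              }
            }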

          Aaron T. Myers added a comment -

          Breaking up DFSClient#getDiskBlockLocations makes the code a lot more readable IMO. Thanks for doing that.

          A few more comments:

          1. This exception message shouldn't include "getDiskBlockLocations". I recommend you just say "DFSClient#getDiskBlockLocations expected to be given instances of HdfsBlockLocation"
          2. In the "re-group the locatedblocks to be grouped by datanodes..." loop, it seems like instead of the if (...) check, you could just put the initialization of the LocatedBlock list inside the outer loop, before the inner loop.
          3. Rather than using a hard-coded 10 threads for the ThreadPoolExecutor, please make this configurable. I think it's reasonable to not document it in a *-default.xml file, since most users will never want to change this value, but if someone does find the need to do it it'd be nice to not have to recompile.
          4. Rather than reusing the socket read timeout as the timeout for the RPCs to the DNs, I think this should be separately configurable. That conf value is used as the timeout for reading block data from a DN, and defaults to 60s. I think it's entirely reasonable that callers of this API will want a much lower timeout. For that matter, you might consider calling the version of ScheduledThreadPoolExecutor#invokeAll that takes a timeout as a parameter.
          5. You should add a comment explaining the reasoning for having this loop. (I see why it is, but it's not obvious, so should be explained.)
            +    for (int i = 0; i < futures.size(); i++) {
            +      metadatas.add(null);
            +    }
            
          6. In the final loop in DFSClient#queryDatanodesForHdfsBlocksMetadata, I recommend you move the fetching of the callable and the datanode objects to the catch clause, since that's the only place those variables are used.
          7. In the same catch clause mentioned above, I recommend you log the full exception stack trace if LOG.isDebugEnabled().
          8. "did not" should be two words:
            +            LOG.debug("Datanode responded with a block disk id we did" +
            +                "not request, omitting.");
            
          9. I think we should make it clear in the HdfsDiskId javadoc that it only uniquely identifies a data directory on a DN when paired with that DN. i.e. it is not the case that DiskId is unique between DNs.
          10. You shouldn't be using protobuf ByteString outside of the protobuf translator code - just use a byte[]. For that matter, it's only necessary that the final result to clients of the API be an opaque identifier. In the DN-side implementation of the RPC, and even the DFSClient code, you could reasonably use a meaningful value that's not opaque.
          11. How could this possibly happen?
            +        // Oddly, we got a blockpath that didn't match any dataDir.
            +        if (diskIndex == dataDirs.size()) {
            +          LOG.warn("Could not determine the data dir of block " 
            +              + block.toString() + " with path " + blockPath);
            +        }
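
          On points 3 and 4, a sketch of what a bounded pool plus an overall RPC timeout could look like using ExecutorService#invokeAll; the pool size and timeout are placeholders, and the callable type stands in for the per-datanode query:

            import java.util.List;
            import java.util.concurrent.Callable;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.Future;
            import java.util.concurrent.TimeUnit;

            public class FanOutExample {
              static <T> List<Future<T>> queryDatanodes(List<Callable<T>> callables,
                  int poolSize, long timeoutMs) throws InterruptedException {
                ExecutorService pool = Executors.newFixedThreadPool(poolSize);
                try {
                  // Callables that do not complete in time come back cancelled, so the
                  // caller can skip those datanodes instead of blocking on them.
                  return pool.invokeAll(callables, timeoutMs, TimeUnit.MILLISECONDS);
                } finally {
                  pool.shutdown();
                }
              }
            }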
            
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12539391/hdfs-3672-6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestFileConcurrentReader

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2961//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2961//console

          This message is automatically generated.

          Andrew Wang added a comment -

          Thanks for the (very thorough) reviews. Addressed as recommended, except as follows:

          In the "re-group the locatedblocks to be grouped by datanodes..." loop, it seems like instead of the if (...) check, you could just put the initialization of the LocatedBlock list inside the outer loop, before the inner loop.

          I think it's right as is. Potentially, you need to add a new list for every datanode replica of every LocatedBlock, which is why the initialization happens inside the doubly nested loop.
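
          For concreteness, a sketch of that regrouping, assuming LocatedBlock#getLocations() is the replica list (the helper itself is illustrative, not code from the patch):

            import java.util.ArrayList;
            import java.util.HashMap;
            import java.util.List;
            import java.util.Map;

            import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
            import org.apache.hadoop.hdfs.protocol.LocatedBlock;

            public class RegroupExample {
              /** Group blocks by each datanode that holds a replica of them. */
              static Map<DatanodeInfo, List<LocatedBlock>> groupByDatanode(
                  List<LocatedBlock> blocks) {
                Map<DatanodeInfo, List<LocatedBlock>> perDatanode =
                    new HashMap<DatanodeInfo, List<LocatedBlock>>();
                for (LocatedBlock block : blocks) {
                  for (DatanodeInfo dn : block.getLocations()) {
                    // A new list may be needed for any replica of any block, which
                    // is why the initialization sits inside the inner loop.
                    List<LocatedBlock> list = perDatanode.get(dn);
                    if (list == null) {
                      list = new ArrayList<LocatedBlock>();
                      perDatanode.put(dn, list);
                    }
                    list.add(block);
                  }
                }
                return perDatanode;
              }
            }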

          Rather than using a hard-coded 10 threads for the ThreadPoolExecutor, please make this configurable. I think it's reasonable to not document it in a *-default.xml file, since most users will never want to change this value, but if someone does find the need to do it it'd be nice to not have to recompile.

          Since I already had hdfs-default.xml open to add the timeout config option, I documented this one too.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12539673/hdfs-3672-7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2964//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2964//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Thanks for addressing all my feedback, Andrew. The latest patch looks good to me, modulo one nit: looks like there's an unintended import of com.google.protobuf.ByteString in DFSClient.

          +1 once this is addressed.

          Andrew Wang added a comment -

          Nit addressed.

          Suresh Srinivas added a comment -

          I need a day or two to review the doc and patch.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12539712/hdfs-3672-8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2966//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2966//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Hi Suresh, sounds good, and thanks for volunteering to help with a review. If you don't find time for a review in the next day or two I'd like to go ahead and commit this patch as-is. If that happens, we could of course always address any feedback you have in follow-up JIRAs.

          Suresh Srinivas added a comment -

          The more I think about this, the less convinced I am about this API. Hotspots are better avoided by distributing the data, or maybe even by increasing the replica count (a variant of the balancer), instead of building complicated scheduling logic into the applications. I have already commented on how limited the application's view is. So really, for MapReduce I am not convinced this is how one should go about solving the hotspot issues. I am not sure the HBase cases would use this either (is there at least a jira in HBase on this?).

          That leaves me to conclude the only real use case is that of "unknown customer". For that we are adding this non-trivial code in the core parts of HDFS! I am not convinced. However I will not block it and encourage you to think about doing it in a less hacky way.

          Is there a timeline where someone will work on HBase or MapReduce enhancements to use this capability?

          Document comments:

          1. "achieve disk-locality", do you mean "achieve datanode-locality"?
          2. Introduction: Not sure why the document uses "disk topology". It is just disk location right?
          3. Can we remove mentioning "Cloudera's unknown customer", it serves no purpose.

          Code review comments:

          1. DiskBlockLocation#diskIds, HdfsBlockLocation#block, HdfsDiskId#id and HdfsDiskId#isValid - make them final. The same goes for the members of HdfsBlocksMetadata.
          2. Why is this API marked @InterfaceAudience.Public. I think we should remove it and just leave InterfaceStability.Unstable
          3. minor "that can identifies"
          4. I think DiskBlockLocation, DiskId are not generic names. What if the underlying is not local disks at all, but mounted directory from another storage?
          5. Configuration to turn off this functionality should be on the server side also. Otherwise a client can just enable this functionality without the admin having control over it.
          6. Is this functionality expected to be used for a single file or for blocks belonging to many files? The current approach of using the diskID as an index into a data structure that could change (with disk failures etc.) may not provide stable diskIds.
          7. "Get a HdfsBlocksMetadata" - make HdfsBlockLocation a link
          8. getHdfsBlocksMetadata may have a bug. Let's say the block path is /foo/b/dir1 and there is another valid storage directory /foo/b/dir; is the correct index returned? Instead of getBlockFile, you may be better off using getReplicaInfo and using the volume from it to get the directory.
          9. I am surprised the API returns a list of diskIds and the diskIds set for each block. What is the use of the first diskId list?
          10. dfs.datanode.handler.count is 3. This could push the handler count limit.
          11. I like that DiskId is opaque. It would be good not to expose any members such as getId, and to keep it opaque even in HdfsDiskId.
          12. Please leave DFSClient#getBlockLocation() unchanged.
          13. I prefer moving the experimental code out of DFSClient to a separate helper class. Please consider breaking some of the long methods into smaller methods.
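
          Item 8 is easiest to see with a concrete example. What follows is a hypothetical illustration (not code from the patch) of why naive prefix matching on the block path resolves the wrong data directory when one configured directory is a string prefix of another:

            public class PrefixMatchBugExample {
              public static void main(String[] args) {
                // Hypothetical paths: "/foo/b/dir" is a string prefix of
                // "/foo/b/dir1", so prefix matching picks the wrong index.
                String[] dataDirs = { "/foo/b/dir", "/foo/b/dir1" };
                String blockPath = "/foo/b/dir1/current/blk_123";
                int diskIndex = -1;
                for (int i = 0; i < dataDirs.length; i++) {
                  if (blockPath.startsWith(dataDirs[i])) {
                    diskIndex = i; // matches index 0 first, the wrong directory
                    break;
                  }
                }
                System.out.println("Resolved index: " + diskIndex); // prints 0, not 1
                // Comparing volume objects instead of path strings avoids the ambiguity.
              }
            }
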
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12539964/design-doc-v2.pdf
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2977//console

          This message is automatically generated.

          Andrew Wang added a comment -

          Thanks for the design doc + code review Suresh. Doc comments I fixed. Code comments, I'd like to check a few things with you before cutting another patch. Everything else I'll fix as recommended.

          I think DiskBlockLocation, DiskId are not generic names. What if the underlying is not local disks at all, but mounted directory from another storage?

          Perhaps Storage(BlockLocation|Id)? Volume(BlockLocation|Id)? I'm not entirely sure of the end-user terminology here.

          Instead of getBLockFile, you may be better of using getReplicaInfo and use the volume from it to get the directory.

          This was a really good point; it was bugged. I changed it to do comparisons on volumes, which should work as expected (no path comparisons).

          Is this functionality expected to be used for a single file or for blocks belonging to many files? Because the current way of diskID as index in a datastructure that could change (with disk failures etc.) may not provide stable diskIds.

          I am surprised the API that returns list of diskIds and diskIds set for each block. What is the use of first diskId list.

          This could have been more clearly documented in the code, I'll beef it up. I did this based on Todd's earlier comment; basically we pass a list of the DiskIds on the datanode (one per volume), and then a list of indexes into this list (one per block). Since the DiskId of a volume should be the same for the life of a datanode, I think this is fairly stable.
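
          A sketch of that response shape and how a client resolves it; the class and method names here are illustrative, not the actual protocol classes:

            import java.util.List;

            /** Illustrative shape of the per-datanode response described above. */
            public class ExampleBlocksMetadata {
              // One entry per volume on the datanode; each entry is an opaque id.
              private final List<byte[]> volumeIds;
              // One entry per requested block: an index into volumeIds, or -1 if unknown.
              private final List<Integer> volumeIndexes;

              public ExampleBlocksMetadata(List<byte[]> volumeIds,
                  List<Integer> volumeIndexes) {
                this.volumeIds = volumeIds;
                this.volumeIndexes = volumeIndexes;
              }

              /** Resolve the opaque volume id for the i-th requested block. */
              public byte[] volumeIdForBlock(int blockIndex) {
                int idx = volumeIndexes.get(blockIndex);
                return (idx < 0) ? null : volumeIds.get(idx);
              }
            }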

          Clients are going to have to deal with staleness in this disk location info, since as you noted, a block's volume can change based on configuration, failures, and normal HDFS operation. Clients that peek inside HDFS for BlockLocations via #getBlockLocations() need to be fairly sophisticated anyway, since they already deal with this kind of thing at the DN level.

          dfs.datanode.handler.count is 3. This could push the handler count limit.

          Should I just bump the default (say, to 10)? I haven't done any performance testing, so I don't know if it's a problem.

          Please leave DFSClient#getBlockLocation() unchanged.

          I'd love to do this, but there's currently a hacky thing going on here. The new #getDiskBlockLocations call needs ExtendedBlocks to query the datanodes. The current #getBlockLocations returns BlockLocations, which don't have this information. That's why I changed it to return HdfsBlockLocations instead, which also include the required ExtendedBlock along with the BlockLocation data.
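
          A minimal sketch of that wrapper idea, under the assumption that the BlockLocation accessors used below exist as shown (the class in the patch may differ in details):

            import java.io.IOException;

            import org.apache.hadoop.fs.BlockLocation;
            import org.apache.hadoop.hdfs.protocol.LocatedBlock;

            /**
             * Illustrative wrapper: a BlockLocation that also carries the LocatedBlock,
             * so the client can later query datanodes using its ExtendedBlock.
             */
            public class ExampleHdfsBlockLocation extends BlockLocation {
              private final LocatedBlock block;

              public ExampleHdfsBlockLocation(BlockLocation loc, LocatedBlock block)
                  throws IOException {
                super(loc.getNames(), loc.getHosts(), loc.getTopologyPaths(),
                    loc.getOffset(), loc.getLength());
                this.block = block;
              }

              public LocatedBlock getLocatedBlock() {
                return block;
              }
            }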

          Aaron T. Myers added a comment -

          Why is this API marked @InterfaceAudience.Public. I think we should remove it and just leave InterfaceStability.Unstable

          I was under the impression that all public classes needed to have an @InterfaceAudience annotation, and an @InterfaceStability annotation unless they're marked @InterfaceAudience.Private. Am I wrong about that?

          Configuration to turn off this functionlity should be on the server side also. Otherwise a client can just enable this functionlality without the admin having control over it.

          I thought about this a fair bit while reviewing the code. The conclusion that I came to is that the stated reason that Arun wanted this feature disabled by default was "so that people who use this understand that this isn't necessarily supported." A client-side-only config seems to serve that purpose. Making this config server side as well only serves to require the admin enable the config and restart their cluster before some client that wants to try to use this functionality can give it a shot. That seems to me to be a strictly unnecessary pain for both the admin and user that doesn't seem to further Arun's stated goal. For that matter, why would an admin want to prevent clients from calling this API? If you insist on having a server side config for this, I'd like to suggest having two separate configs: a server-side one that defaults to enabled, but so that an admin may consciously disable it, and a client-side config that defaults to disabled so that users of this API must consciously configure their client, to support Arun's stated goal of making sure people are aware that it's an experimental API.

          Arun C Murthy added a comment -

          I'd really encourage you to put this into the DataNode and throw an UnsupportedOperationException rather than merely do this via a client-side config.

          Aaron T. Myers added a comment -

          I'd really encourage you to put this into the DataNode and throw an UnsupportedOperationException rather than merely do this via a client-side config.

          That's fine by me. I don't feel super strongly about this, so if this is your preference Arun, let's go with that.

          Suresh Srinivas added a comment -

          Perhaps Storage(BlockLocation|Id)? Volume(BlockLocation|Id)? I'm not entirely sure of the end-user terminology here.

          DiskBlockLocation could be BlockStorageLocation or just StorageLocation.
          DiskId - StorageId seems appropriate here. However, it is used for other things in HDFS. As you suggested, perhaps VolumeId may be okay.

          Should I just bump the default (say, to 10)? I haven't done any performance testing, so I don't know if it's a problem.

          With this feature there will be more RPC calls to datanodes, and hence we may need more handlers. A handler is just a thread, so increasing the count to 10 should be fine.

          @aaron - we need the server-side config as well. That is the only way an admin can control access to the feature. On the client side, one could rely on an exception (or a check for whether the required method is supported) to figure out whether the server supports the functionality, instead of a config.

          Please address my previous comment:

          Is there a timeline where someone will work on HBase or MapReduce enhancements to use this capability?

          Andrew Purtell added a comment -

          Is there a timeline where someone will work on HBase or MapReduce enhancements to use this capability?

          I put up some ramblings on HBASE-6572. The scope is much larger and there's no timeline; it's a brainstorming issue. However, if you'd like, this issue can be linked to it.

          Andrew Wang added a comment -

          Thanks everyone for all your input! Here's another spin of the patch. Big things:

          • I renamed the Disk* classes to BlockStorageLocation and VolumeId, and tried to update all the javadoc/comments.
          • I split out most of the DFSClient code into a new BlockStorageLocationUtil class, which is ~300 lines of static methods. I pulled apart one of the long methods. Doing this for the other long method would arguably be messier, so I left it.
          • Added the DN-side config option. If any of the DNs throws an UnsupportedOperationException, it's bubbled up to the client (thus failing the entire call). The client-side code also checks for the same DN config option, so you need to enable it in both the client and DN for this to do anything.
          • Bumped the DN handler count to 10.

          I think Suresh's other more minor comments are also addressed.
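
          A rough sketch of the DN-side gating described in the third bullet; the config key and method shape here are illustrative, not the names used in the patch:

            import java.util.Collections;
            import java.util.List;

            import org.apache.hadoop.conf.Configuration;

            public class MetadataGateExample {
              private final boolean enabled;

              public MetadataGateExample(Configuration conf) {
                // Hypothetical key; disabled unless the admin turns it on.
                this.enabled =
                    conf.getBoolean("dfs.datanode.block-metadata.enabled", false);
              }

              public List<byte[]> getHdfsBlocksMetadata(List<String> blockIds) {
                if (!enabled) {
                  // Propagated back to the client, which fails the whole call.
                  throw new UnsupportedOperationException(
                      "Datanode block-metadata queries are disabled by configuration");
                }
                // ... map each requested block to the opaque id of its volume ...
                return Collections.emptyList();
              }
            }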

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12540815/hdfs-3672-9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestFsck

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3001//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3001//console

          This message is automatically generated.

          Andrew Wang added a comment -

          Ran TestFsck locally and it passed; I think the test failures are unrelated.

          Aaron T. Myers added a comment -

          The latest patch looks good to me, and I believe it addresses all of Suresh's feedback.

          Suresh, do you have any more comments on the latest patch?

          Suresh Srinivas added a comment -

          I put up some ramblings on HBASE-6572. The scope is much larger and there's no timeline, it's a brainstorming issue. However, if you'd like this issue can be linked to it.

          Andrew, I am not sure how this jira and the solution it is providing help HBASE-6572. Some of the intent of HBASE-6572 is why I think the current temporary hack is the wrong way to go about the solution. See my comments above here and here

          Andrew Purtell added a comment -

          @Suresh, thanks for linking HBASE-6572 to HDFS-2832, I missed that issue. That's a better issue linkage.

          If HDFS is to support heterogeneous/tiered storage, then somehow the NNs and DNs must negotiate block placement by policy. For example, suppose the NN is doing some kind of path based mapping of files->blocks->device type. Say the default is disk. Now the user updates the policy for a subtree of the namespace to solid state. For any new file in that subtree the NN would presumably pass a hint to the DFSClient and the DFSClient would in turn pass the hint to the DNs: place block on the desired media type or fail. For any existing file in the subtree, the NN would need to migrate blocks from one storage tier to another. Presumably the DN must include in block reports the "disk location" including the media type so the NN has the necessary information to accomplish that. Simply exposing that "disk location" information via an API is the intent of this issue, right? Scratching one itch here can be leveraged as incremental development toward a larger goal? Happy to take this discussion to HDFS-2832 or offline or simply drop it if a distraction or in error.
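
          None of the following exists in HDFS; purely as a thought experiment, the hint flow described above might look something like this, with every name invented for illustration:

            public class PlacementHintSketch {
              enum MediaType { DISK, SSD }

              /** Hint the NN could hand to the DFSClient, and the client to each DN. */
              static final class PlacementHint {
                final MediaType desired;
                PlacementHint(MediaType desired) { this.desired = desired; }
              }

              /** A DN would accept the block only if it can satisfy the hint, else fail. */
              static boolean canPlace(PlacementHint hint, MediaType availableOnVolume) {
                return hint == null || hint.desired == availableOnVolume;
              }
            }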

          Suresh Srinivas added a comment -

          Thanks Andrew. HDFS-2832 can benefit a lot from the requirements from HBase. Planning to start working on it. Will ping you to get use cases and requirements.

          Suresh Srinivas added a comment -

          @Aaron, want to make sure we are on the same page. This is a temporary solution. I am not going to block committing this change. When we make enough progress in HDFS-2832, I plan to remove this functionality.

          Aaron T. Myers added a comment -

          @Aaron, want to make sure we are on the same page. This is a temporary solution. I am not going to block committing this change. When we make enough progress in HDFS-2832, I plan to remove this functionality.

          To be clear, you mean that you're going to implement this differently, right? The thing that I care about is that clients that wish to have access to this information can get to it. I agree that once HDFS-2832 is implemented, this information will be available in the NN, so we won't need to do the RPC fanout to the DNs, but the user API will remain.

          Suresh Srinivas added a comment -

          but the user API will remain.

          There will be equivalent functionality. The user API is marked unstable and may be removed depending on how HDFS-2832 implements this functionality. Not clear right now.

          Aaron T. Myers added a comment -

          Cool. Sounds like we're on the same page then.

          If there are no more comments, I'm going to go ahead and commit this later today.

          Andrew Wang added a comment -

          Rebase patch on trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541273/hdfs-3672-10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3031//console

          This message is automatically generated.

          Andrew Wang added a comment -

          Rebase try 2. A lesson about compile testing after a rebase has been learned.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541275/hdfs-3672-11.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestFsck
          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3032//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3032//console

          This message is automatically generated.

          Andrew Wang added a comment -

          Ran the failed tests locally, and they passed. I think the failures are unrelated.

          Aaron T. Myers added a comment -

          The latest patch looks pretty good to me, and I agree that the test failures seem unrelated. One small comment:

          It seems reasonable to hard-code "false" for the useHostname parameter in calls to DatanodeInfo#getIpcAddr where the return value is only used in a log message. In the call to DFSUtil#createClientDatanodeProtocolProxy, however, you should use the value of DFSClient.Conf.connectToDnViaHostname, so that this patch stays in keeping with the changes introduced by HDFS-3150.

          +1 once this is addressed.

          Andrew Wang added a comment -

          My bad; I should have followed that hostname change more closely. The patch now passes the conf parameter down so the RPC threads obey it properly.
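
          For reference, the knob being honoured is the client-side hostname switch from HDFS-3150. A minimal sketch of opting in on the client, assuming the dfs.client.use.datanode.hostname key (the exact constant in DFSConfigKeys may differ):

          {code:java}
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;

          public class HostnameOptIn {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Key from HDFS-3150: when true, the client (including the
              // parallel datanode RPCs issued for volume ids) connects to
              // datanodes by hostname rather than IP.
              conf.setBoolean("dfs.client.use.datanode.hostname", true);
              FileSystem fs = FileSystem.get(conf);
              System.out.println("Using filesystem at " + fs.getUri());
              fs.close();
            }
          }
          {code}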

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541314/hdfs-3672-12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3036//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3036//console

          This message is automatically generated.

          Aaron T. Myers added a comment -

          Thanks for making that change, Andrew.

          +1, the latest patch looks good to me. I'm going to commit this momentarily.

          Aaron T. Myers added a comment -

          I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Andrew, and thanks to Suresh, Arun, et al. for the discussion.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2598 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2598/)
          HDFS-3672. Expose disk-location information for blocks to enable better scheduling. Contributed by Andrew Wang. (Revision 1374355)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374355
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/BlockStorageLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsBlockLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsVolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/VolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockStorageLocationUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsBlocksMetadata.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2662 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2662/)
          HDFS-3672. Expose disk-location information for blocks to enable better scheduling. Contributed by Andrew Wang. (Revision 1374355)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374355
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/BlockStorageLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsBlockLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsVolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/VolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockStorageLocationUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsBlocksMetadata.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2627 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2627/)
          HDFS-3672. Expose disk-location information for blocks to enable better scheduling. Contributed by Andrew Wang. (Revision 1374355)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374355
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/BlockStorageLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsBlockLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsVolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/VolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockStorageLocationUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsBlocksMetadata.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1138 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1138/)
          HDFS-3672. Expose disk-location information for blocks to enable better scheduling. Contributed by Andrew Wang. (Revision 1374355)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374355
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/BlockStorageLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsBlockLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsVolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/VolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockStorageLocationUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsBlocksMetadata.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1170 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1170/)
          HDFS-3672. Expose disk-location information for blocks to enable better scheduling. Contributed by Andrew Wang. (Revision 1374355)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374355
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/BlockStorageLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsBlockLocation.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsVolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/VolumeId.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockStorageLocationUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsBlocksMetadata.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientDatanodeProtocol.proto
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2688 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2688/)
          MAPREDUCE-4577. HDFS-3672 broke TestCombineFileInputFormat.testMissingBlocks() test. Contributed by Aaron T. Myers. (Revision 1376297)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376297
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2624 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2624/)
          MAPREDUCE-4577. HDFS-3672 broke TestCombineFileInputFormat.testMissingBlocks() test. Contributed by Aaron T. Myers. (Revision 1376297)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376297
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2652 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2652/)
          MAPREDUCE-4577. HDFS-3672 broke TestCombineFileInputFormat.testMissingBlocks() test. Contributed by Aaron T. Myers. (Revision 1376297)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376297
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1143 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1143/)
          MAPREDUCE-4577. HDFS-3672 broke TestCombineFileInputFormat.testMissingBlocks() test. Contributed by Aaron T. Myers. (Revision 1376297)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376297
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1175 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1175/)
          MAPREDUCE-4577. HDFS-3672 broke TestCombineFileInputFormat.testMissingBlocks() test. Contributed by Aaron T. Myers. (Revision 1376297)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376297
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java

            People

            • Assignee: Andrew Wang
            • Reporter: Andrew Wang
            • Votes: 0
            • Watchers: 33
