[HADOOP-13453] S3Guard: Instrument new functionality with Hadoop metrics. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: HADOOP-13345
Component/s: fs/s3
Labels:
None

Target Version/s:

HADOOP-13345

Description

Provide Hadoop metrics showing operational details of the S3Guard implementation.

The following metrics will be implemented in this ticket:

● S3GuardRechecksNthPercentileLatency (MutableQuantiles) Percentile time spent
in rechecks attempting to achieve consistency. Repeated for multiple percentile values
of N. This metric is an indicator of the additional latency cost of running S3A with
S3Guard.
● S3GuardRechecksNumOps (MutableQuantiles) Number of times a consistency
recheck was required while attempting to achieve consistency.
● S3GuardStoreNthPercentileLatency (MutableQuantiles) Percentile time spent in
operations against the consistent store, including both write operations during file system
mutations and read operations during file system consistency checks. Repeated for
multiple percentile values of N. This metric is an indicator of latency to the consistent
store implementation.
● S3GuardConsistencyStoreNumOps (MutableQuantiles) Number of operations
against the consistent store, including both write operations during file system mutations
and read operations during file system consistency checks.
● S3GuardConsistencyStoreFailures (MutableCounterLong) Number of failures
during operations against the consistent store implementation.
● S3GuardConsistencyStoreTimeouts (MutableCounterLong) Number of timeouts
during operations against the consistent store implementation.
● S3GuardInconsistencies (MutableCounterLong) Count of times S3Guard failed to
achieve consistency, even after exhausting all rechecks. A high count may indicate
unexpected out-of-band modification of the S3 bucket contents, such as by an external
tool that does not make corresponding updates to the consistent store.

Attachments

File	Date	Size	Uploaded by
HADOOP-13453-HADOOP-13345-001.patch	25/Feb/17 21:18	6 kB	Ai Deng
HADOOP-13453-HADOOP-13345-002.patch	25/Feb/17 21:29	13 kB	Ai Deng
HADOOP-13453-HADOOP-13345-003.patch	19/Mar/17 08:07	18 kB	Ai Deng
HADOOP-13453-HADOOP-13345-004.patch	13/Apr/17 15:55	11 kB	Steve Loughran
HADOOP-13453-HADOOP-13345-005.patch	13/Apr/17 17:25	12 kB	Steve Loughran

Activity

Steve Loughran added a comment - 06/Jan/17 19:37

these should all go into org.apache.hadoop.fs.s3a.Statistic & Instrumentation
tracking total & ongoing dynamoDB request rates could be useful, as it will help identify when you've over/under provisioned your DDB.
include stats on detected inconsistencies
include in S3AFileSystem.toString.

Ai Deng added a comment - 17/Jan/17 00:01

Hello stevel@apache.org, actually, since all the metrics in this story are emitted by the new S3Guard implementation, I think it may be better to separate the new metrics code from S3AInstrumentation. S3AInstrumentation is already 800 lines long, and keeping the two separate would make it easier to disable the S3Guard metrics. I'm not sure, though, how much of the code in S3AInstrumentation is reusable for the new metrics.

Also, I can't find any tests for S3AInstrumentation. How do we test the metrics system in Hadoop?

Sorry for the basic questions; I'm really new to working on the Hadoop code base.

Steve Loughran added a comment - 17/Jan/17 11:50

They're going to have to go into that file because those are the metrics published by the S3A filesystem when deployed, returned by S3AStorageStatistics in a call to S3AFileSystem.getStorageStatistics(), and printed in S3AFileSystem.toString(). We could choose whether to add the specific metrics to every S3a FS instance; that's something to consider. Listing the values but returning 0 for all gauges and counters is the most consistent.

Don't worry about the class length: if you look at it in detail, there are two nested classes + support methods explicitly for output/input streams...you don't need to go there. The rest of the code is fairly simple.

add new values to org.apache.hadoop.fs.s3a.Statistic; prefix s3guard_
In S3AInstrumentation, add counters to the array COUNTERS_TO_CREATE; gauges to GAUGES_TO_CREATE
Pass in an instance of the instrumentation down to S3Guard
have the code call incrementCounter and increment/decrementGauge as appropriate
I'd like a simple counter of s3guard_enabled and s3guard_authoritative, which will be 0 when there's no s3guard running, 1 when the respective booleans are up. Why? Remote visibility
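
As a rough illustration of the steps listed above, here is a minimal, self-contained sketch. Every class name, enum entry and symbol in it (SketchStatistic, SketchInstrumentation, SketchS3Guard, s3guard_put_path_request) is a hypothetical stand-in, not the real org.apache.hadoop.fs.s3a.Statistic or S3AInstrumentation API; it only shows the shape: new enum values with an s3guard_ prefix, counters created from an array, and the instrumentation passed down to the code that raises the events.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the Statistic enum; symbols are illustrative only.
enum SketchStatistic {
  S3GUARD_ENABLED("s3guard_enabled"),
  S3GUARD_AUTHORITATIVE("s3guard_authoritative"),
  S3GUARD_PUT_PATH("s3guard_put_path_request");

  private final String symbol;

  SketchStatistic(String symbol) {
    this.symbol = symbol;
  }

  String getSymbol() {
    return symbol;
  }
}

// Simplified stand-in for the instrumentation class: counters are created
// from an array rather than declared as individual fields.
class SketchInstrumentation {
  static final SketchStatistic[] COUNTERS_TO_CREATE = {
      SketchStatistic.S3GUARD_ENABLED,
      SketchStatistic.S3GUARD_AUTHORITATIVE,
      SketchStatistic.S3GUARD_PUT_PATH
  };

  private final Map<SketchStatistic, AtomicLong> counters =
      new EnumMap<>(SketchStatistic.class);

  SketchInstrumentation() {
    for (SketchStatistic stat : COUNTERS_TO_CREATE) {
      counters.put(stat, new AtomicLong());
    }
  }

  // Generic increment: one map lookup per call.
  void incrementCounter(SketchStatistic stat, long count) {
    AtomicLong counter = counters.get(stat);
    if (counter != null) {
      counter.addAndGet(count);
    }
  }

  // Unregistered statistics read as 0, so listings stay consistent.
  long getCounterValue(SketchStatistic stat) {
    AtomicLong counter = counters.get(stat);
    return counter == null ? 0 : counter.get();
  }
}

// Stand-in for the S3Guard-side code that receives the instrumentation.
class SketchS3Guard {
  private final SketchInstrumentation instrumentation;

  SketchS3Guard(SketchInstrumentation instrumentation, boolean authoritative) {
    this.instrumentation = instrumentation;
    // 0 when S3Guard is not running, 1 when the respective boolean is set.
    instrumentation.incrementCounter(SketchStatistic.S3GUARD_ENABLED, 1);
    if (authoritative) {
      instrumentation.incrementCounter(SketchStatistic.S3GUARD_AUTHORITATIVE, 1);
    }
  }

  void putPath(String path) {
    // ... the metadata store write would happen here ...
    instrumentation.incrementCounter(SketchStatistic.S3GUARD_PUT_PATH, 1);
  }
}
```

A filesystem instance without S3Guard would simply never construct SketchS3Guard, so all three counters would stay at 0.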

You make a good point, "where are the tests?". The answer is: the metrics can be used to test the internal state of the S3 classes, therefore become implicitly tested there.

Take a look at ITestS3ADirectoryPerformance for a key example of this: our test cases use the counters of the various HTTP operations as the means to verify that API calls work as expected. (note that s3guard, by reducing these, has complicated the tests)

That is, you verify the counters work by asserting that they change as you make operations to the DFS. see: http://steveloughran.blogspot.co.uk/2016/04/distributed-testing-making-use-of.html for more of my thinking here
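
To make the "assert that the counters change" idea concrete, here is a small sketch of a counter-diff helper. It is an assumption-laden illustration, not the helper used in the Hadoop test suite: FakeObjectStore, CounterDiff and their methods are hypothetical names invented for this example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical store that counts the list requests it receives.
class FakeObjectStore {
  private final AtomicLong listRequests = new AtomicLong();

  void listObjects(String path) {
    // ... a real store would issue an HTTP request here ...
    listRequests.incrementAndGet();
  }

  long listRequestCount() {
    return listRequests.get();
  }
}

// Snapshots a counter so a test can assert on how much it changed.
class CounterDiff {
  private final LongSupplier counter;
  private long startValue;

  CounterDiff(LongSupplier counter) {
    this.counter = counter;
    reset();
  }

  // Re-baseline the diff at the counter's current value.
  void reset() {
    startValue = counter.getAsLong();
  }

  long diff() {
    return counter.getAsLong() - startValue;
  }

  void assertDiffEquals(String message, long expected) {
    long actual = diff();
    if (actual != expected) {
      throw new AssertionError(
          message + ": expected " + expected + " but got " + actual);
    }
  }
}
```

A test takes the snapshot, performs the filesystem operation, then asserts on the difference; if the counter was wired up wrongly, or the operation issued more requests than intended, the assertion fails.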

"Sorry for the basic question, i'm really new for work on Hadoop code base."

Happy to explain my reasoning. We've all started off staring at a vast amount of code that we don't understand; there are still big bits of Hadoop that I don't go near.

Ai Deng added a comment - 19/Jan/17 00:34 - edited

steve_l Thank you very much for the explanation, that's very helpful.

I have two questions for the moment; I'm sure there will be more to come.

I see two patterns for changing a counter value in S3AInstrumentation: a dedicated method such as fileCreated(), or passing a Statistic to the generic method incrementCounter(). Is there a reason we keep both? It looks like you suggest using the second approach.
I can't find any usage of S3AFileSystem.getStorageStatistics() in the project. What is the main purpose of these statistics? Are they for use outside of Hadoop? Don't I need to pass an instance of storageStatistics to S3Guard? In S3AFileSystem, we always increment both:
```
  protected void incrementStatistic(Statistic statistic, long count) {
    instrumentation.incrementCounter(statistic, count);
    storageStatistics.incrementCounter(statistic, count);
  }
```

Steve Loughran added a comment - 23/Jan/17 12:13

Hi, don't worry about asking questions, we'll do our best to get you contributing code; it benefits all of us if you are adding code to Hadoop.

On the split between the low-level "increment a named counter" calls and the more elegant "event with internal counters" methods: the event ones are cleaner, as they stop the rest of the code having to know exactly which counters/gauges to use. Consider the elegant ones the best approach, and the direct invocation us being lazy.

The S3aInstrumentation class also has a set of explicit named counters (such as "filesDeleted") as well as lots of ones that are only listed in the arrays GAUGES_TO_CREATE and COUNTERS_TO_CREATE. That's evolution over time; I got bored of having to name and register lots of fields, and realised I could do it from the arrays, at the cost of a hash lookup on every increment.

Outside the S3a class itself, I've tried to have separate inner classes do the counting, with the results merged in at the end (example: the input and output streams), with the inner classes using simple long values rather than atomics. Why? It eliminates any delays during increments, and lets us override the toString() values for input/output streams with dumps of the values (go on, try it!). We can have many input/output streams per FS instance, so the risk of contention for atomic int/long values is potentially quite high.
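
The pattern described above (plain long fields per stream, merged into shared atomics at the end) might look roughly like the sketch below. FsCounters and StreamStats are hypothetical names chosen for this example, and the sketch assumes each stream's statistics are only touched by the thread using that stream.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical filesystem-wide counters, shared by all streams.
class FsCounters {
  private final AtomicLong bytesRead = new AtomicLong();
  private final AtomicLong readOperations = new AtomicLong();

  StreamStats newStreamStats() {
    return new StreamStats();
  }

  long getBytesRead() {
    return bytesRead.get();
  }

  long getReadOperations() {
    return readOperations.get();
  }

  // Per-stream statistics: plain longs, so increments never contend
  // on the shared atomics while the stream is in use.
  final class StreamStats implements AutoCloseable {
    private long bytes;
    private long operations;
    private boolean merged;

    void readCompleted(long byteCount) {
      operations++;
      bytes += byteCount;
    }

    // Merge into the shared counters exactly once, when the stream closes.
    @Override
    public void close() {
      if (!merged) {
        merged = true;
        bytesRead.addAndGet(bytes);
        readOperations.addAndGet(operations);
      }
    }

    // Lets a stream's toString() include a dump of its own values.
    @Override
    public String toString() {
      return "StreamStats{operations=" + operations + ", bytes=" + bytes + "}";
    }
  }
}
```

The trade-off is that the filesystem-level totals lag until a stream is closed, which is usually acceptable for aggregate statistics.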

I think for s3guard we could add a new inner class passed in to each s3guard instance; it would export the various methods for events that s3guard could raise, such as tableCreated(), tableDeleted(). These can directly increment the atomic counters in the instrumentation, as we'd only have a 1:1 map of S3aFS instance and a s3guard store instance.

Regarding access to the statistics: that's hooked up to FileSystem.getStorageStatistics(), which is intended to provide the storage stats for any FS; s3a and HDFS share common statistic names for the common statistics. The latest versions of Tez do collect the statistics of jobs, and so give you the aggregate statistics across your entire query. Until now, only FileSystem.getStatistics() has been used, which returns a fixed set of values (bytes read/written, etc). Spark still only collects those; it'd take some migration to hadoop 2.8+ to pick up the new data. Until then, it's something we can use in tests.

Ai Deng added a comment - 26/Jan/17 11:04

Steve, thank you for sharing this knowledge and these thoughts. Having an inner class for the S3Guard metrics is a good idea.

I have made a small start with all your help, but I will be on holiday for the next two weeks (back to China for the new year). I really hope I can resolve this ticket (I could work on it more quickly after the holiday), but if the timing does not match the plan for HADOOP-13345, please assign this ticket to someone else so we can finish in time.

I will try to catch up with you in China.

Steve Loughran added a comment - 26/Jan/17 11:40

Don't worry about being on holiday for the next few weeks; a fair few people have gone off to enjoy themselves. Take a break from your emails and enjoy yourself!

Aaron Fabbri added a comment - 20/Feb/17 19:02

I will comment as I see new places in the code that could use metrics:

~~HADOOP-13904~~ (if it gets committed) in DynamoDBMetadataStore#retryBackoff()

Ai Deng added a comment - 21/Feb/17 00:21

fabbri Thanks.
stevel@apache.org I just made a simple change. Could you please check it (WIP patch)? I just want to make sure I'm going about things the right way. Thanks.

Steve Loughran added a comment - 22/Feb/17 18:04

thanks, I've hit the submit patch button, but jenkins will fail as the patch isn't going to apply

the way Hadoop builds work is you need to include the branch name if it's not trunk, here

HADOOP-13453-HADOOP-13345-001.patch

Try that with the existing code and you should have the machines review the patch (which we rely on to do the basic checks).

For object store tests we also require the submitter to declare which s3 infrastructure they tested against, because Jenkins doesn't run those tests. Here are the details: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md

Patch-wise, base design looks good: you've added a new statistic and are passing the instrumentation down to add it.

We just need to think of all the statistics to collect.

Thanks for doing this.

Hadoop QA added a comment - 22/Feb/17 18:08

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 0s	Docker mode activated.
-1	patch	0m 7s	~~HADOOP-13453~~ does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.

Subsystem	Report/Notes
JIRA Issue	~~HADOOP-13453~~
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12853617/HADOOP-13453.wip-01.patch
Console output	https://builds.apache.org/job/PreCommit-HADOOP-Build/11689/console
Powered by	Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

This message was automatically generated.

Hadoop QA added a comment - 25/Feb/17 21:48

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 17s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
-1	test4tests	0m 0s	The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1	mvninstall	13m 5s	~~HADOOP-13345~~ passed
+1	compile	0m 24s	~~HADOOP-13345~~ passed
+1	checkstyle	0m 16s	~~HADOOP-13345~~ passed
+1	mvnsite	0m 27s	~~HADOOP-13345~~ passed
+1	mvneclipse	0m 22s	~~HADOOP-13345~~ passed
+1	findbugs	0m 33s	~~HADOOP-13345~~ passed
+1	javadoc	0m 16s	~~HADOOP-13345~~ passed
+1	mvninstall	0m 23s	the patch passed
+1	compile	0m 19s	the patch passed
+1	javac	0m 19s	the patch passed
+1	checkstyle	0m 12s	the patch passed
+1	mvnsite	0m 22s	the patch passed
+1	mvneclipse	0m 11s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	findbugs	0m 36s	the patch passed
+1	javadoc	0m 12s	the patch passed
+1	unit	0m 36s	hadoop-aws in the patch passed.
+1	asflicense	0m 16s	The patch does not generate ASF License warnings.
		20m 9s

Subsystem	Report/Notes
Docker	Image:yetus/hadoop:a9ad5d6
JIRA Issue	~~HADOOP-13453~~
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12854675/HADOOP-13453-HADOOP-13345-001.patch
Optional Tests	asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
uname	Linux 929c7a818845 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/testptch/hadoop/patchprocess/precommit/personality/provided.sh
git revision	~~HADOOP-13345~~ / 95e0143
Default Java	1.8.0_121
findbugs	v3.0.0
Test Results	https://builds.apache.org/job/PreCommit-HADOOP-Build/11717/testReport/
modules	C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output	https://builds.apache.org/job/PreCommit-HADOOP-Build/11717/console
Powered by	Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

This message was automatically generated.

Hadoop QA added a comment - 25/Feb/17 21:58

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 17s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
-1	test4tests	0m 0s	The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1	mvninstall	14m 1s	~~HADOOP-13345~~ passed
+1	compile	0m 19s	~~HADOOP-13345~~ passed
+1	checkstyle	0m 15s	~~HADOOP-13345~~ passed
+1	mvnsite	0m 21s	~~HADOOP-13345~~ passed
+1	mvneclipse	0m 22s	~~HADOOP-13345~~ passed
+1	findbugs	0m 28s	~~HADOOP-13345~~ passed
+1	javadoc	0m 14s	~~HADOOP-13345~~ passed
+1	mvninstall	0m 17s	the patch passed
+1	compile	0m 17s	the patch passed
+1	javac	0m 17s	the patch passed
+1	checkstyle	0m 11s	the patch passed
+1	mvnsite	0m 18s	the patch passed
+1	mvneclipse	0m 11s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	findbugs	0m 32s	the patch passed
+1	javadoc	0m 11s	the patch passed
+1	unit	0m 35s	hadoop-aws in the patch passed.
+1	asflicense	0m 17s	The patch does not generate ASF License warnings.
		20m 25s

Subsystem	Report/Notes
Docker	Image:yetus/hadoop:a9ad5d6
JIRA Issue	~~HADOOP-13453~~
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12854676/HADOOP-13453-HADOOP-13345-002.patch
Optional Tests	asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
uname	Linux 72388b8af098 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/testptch/hadoop/patchprocess/precommit/personality/provided.sh
git revision	~~HADOOP-13345~~ / 95e0143
Default Java	1.8.0_121
findbugs	v3.0.0
Test Results	https://builds.apache.org/job/PreCommit-HADOOP-Build/11718/testReport/
modules	C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output	https://builds.apache.org/job/PreCommit-HADOOP-Build/11718/console
Powered by	Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

This message was automatically generated.

Ai Deng added a comment - 25/Feb/17 22:43

Hi stevel@apache.org, thank you for the information. I will keep adding more metrics (still at the S3Guard class level).

Regarding the metrics mentioned in the document (I have copied them to the Jira ticket): since we consider the metadata in the store to be "fresher" and use it first, we don't do any rechecks for inconsistency between S3 and the metadata store, right? So the metrics "S3GuardRechecksNthPercentileLatency", "S3GuardRechecksNumOps" and "S3GuardInconsistencies" will no longer be needed.

For the Jenkins build, I can't find the "Submit" button in Jira. Is that because of my user permissions, or am I missing something?

Ai Deng added a comment - 25/Feb/17 22:50

It looks like Jenkins runs automatically for the patch. I will modify the existing S3Guard test scenarios to test the added metrics.

Steve Loughran added a comment - 10/Mar/17 15:13

I'm afraid ~~HADOOP-13914~~ has just broken the patch, which means, sadly, you get to do the merge. Let's get this in before anything else traumatic comes in, so other patches get to suffer next time.

I like what you've done measuring latency as well as counts. I think we could actually do this more broadly. I think the timing counting should be in a finally clause though, so timings for failures get included too. (side issue: count success and failures separately? with different timings?)
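
A sketch of what timing in a finally clause could look like, with successes and failures counted and timed separately. TimedOperations and its fields are hypothetical names for illustration only; the patch under review may structure this differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: times an operation and records the duration
// whether the operation returns normally or throws.
class TimedOperations {
  final AtomicLong successCount = new AtomicLong();
  final AtomicLong failureCount = new AtomicLong();
  final AtomicLong successNanos = new AtomicLong();
  final AtomicLong failureNanos = new AtomicLong();

  <T> T time(Callable<T> operation) throws Exception {
    long start = System.nanoTime();
    boolean succeeded = false;
    try {
      T result = operation.call();
      succeeded = true;
      return result;
    } finally {
      // Runs on both the normal and the exceptional path, so the
      // latency of failed calls is not silently dropped.
      long elapsed = System.nanoTime() - start;
      if (succeeded) {
        successCount.incrementAndGet();
        successNanos.addAndGet(elapsed);
      } else {
        failureCount.incrementAndGet();
        failureNanos.addAndGet(elapsed);
      }
    }
  }
}
```

Keeping the two sets of timings apart matters because failures often have a very different latency profile (fast rejections or long timeouts) that would otherwise skew the success figures.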

I would like to think about how we could avoid having to pass the instrumentation around all the time. Ideally, we could just pass it in as a constructor argument to the metadata store. Alternatively, that store could collect metrics and we could wire it up, but I don't see an easy way to do that in Hadoop metrics (compared to Coda Hale's). The easiest would be just to pass in the S3AInstrumentation (or an inner class) down, but currently the metastore interface is not specific to S3A only.

If we add an interface for metadata store instrumentation, then S3AInstrumentation can implement it in an inner class, and S3AFS can pass it down during initialization. This would let the metastore do all it wants, with well defined strings, of course.
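
One possible shape for that proposal, as a self-contained sketch. All names here (MetastoreInstrumentation, FsInstrumentation, InMemoryMetastore and their methods) are hypothetical and chosen only to illustrate the idea; they are not taken from any patch on this ticket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical store-agnostic callback interface: the metadata store
// reports events without knowing which counters sit behind them.
interface MetastoreInstrumentation {
  void initialized();
  void recordsRead(int count);
  void recordsWritten(int count);
}

// Stand-in for the filesystem instrumentation; it owns the counters and
// exposes the interface through an inner class.
class FsInstrumentation {
  private final AtomicLong initializations = new AtomicLong();
  private final AtomicLong reads = new AtomicLong();
  private final AtomicLong writes = new AtomicLong();
  private final MetastoreInstrumentation metastoreEvents = new MetastoreEvents();

  MetastoreInstrumentation getMetastoreInstrumentation() {
    return metastoreEvents;
  }

  long getInitializations() { return initializations.get(); }
  long getReads() { return reads.get(); }
  long getWrites() { return writes.get(); }

  private final class MetastoreEvents implements MetastoreInstrumentation {
    @Override
    public void initialized() { initializations.incrementAndGet(); }

    @Override
    public void recordsRead(int count) { reads.addAndGet(count); }

    @Override
    public void recordsWritten(int count) { writes.addAndGet(count); }
  }
}

// Toy metadata store that only depends on the interface.
class InMemoryMetastore {
  private final Map<String, String> entries = new HashMap<>();
  private MetastoreInstrumentation instrumentation;

  // The filesystem passes the instrumentation down once, at initialization.
  void initialize(MetastoreInstrumentation instrumentation) {
    this.instrumentation = instrumentation;
    instrumentation.initialized();
  }

  void put(String path, String metadata) {
    entries.put(path, metadata);
    instrumentation.recordsWritten(1);
  }

  String get(String path) {
    instrumentation.recordsRead(1);
    return entries.get(path);
  }
}
```

Because the store sees only the interface, it stays independent of S3A, while the filesystem keeps control over which counters the events map to.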

What do people think?

Ai Deng added a comment - 19/Mar/17 08:10

Hi stevel@apache.org, I have added a new patch following your suggestion. If it is OK, can we discuss the metrics we want to add?

I came up with this list of operation and latency metrics for this ticket; can you check whether I missed anything? Thank you.

Status:
S3GUARD_METADATASTORE_ENABLED
S3GUARD_METADATASTORE_IS_AUTHORITATIVE
Operations:
S3GUARD_METADATASTORE_INITIALIZATION
S3GUARD_METADATASTORE_DELETE_PATH
S3GUARD_METADATASTORE_DELETE_PATH_LATENCY
S3GUARD_METADATASTORE_DELETE_SUBTREE_PATCH
S3GUARD_METADATASTORE_GET_PATH
S3GUARD_METADATASTORE_GET_PATH_LATENCY
S3GUARD_METADATASTORE_GET_CHILDREN_PATH
S3GUARD_METADATASTORE_GET_CHILDREN_PATH_LATENCY
S3GUARD_METADATASTORE_MOVE_PATH
S3GUARD_METADATASTORE_PUT_PATH
S3GUARD_METADATASTORE_PUT_PATH_LATENCY
S3GUARD_METADATASTORE_CLOSE
S3GUARD_METADATASTORE_DESTORY
From S3Guard:
S3GUARD_METADATASTORE_MERGE_DIRECTORY
For the failures:
S3GUARD_METADATASTORE_DELETE_FAILURE
S3GUARD_METADATASTORE_GET_FAILURE
S3GUARD_METADATASTORE_PUT_FAILURE
Etc:
S3GUARD_METADATASTORE_PUT_RETRY_TIMES
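
One way such a list could be kept as well-defined strings is an enum that pairs each name with a kind. The sketch below is illustrative only: `ProposedMetric` and `MetricKind` are hypothetical names, only a handful of the proposed entries are included, and the kind assigned to each entry (gauge, counter, latency) is an assumption rather than something stated in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Assumed classification; the comment itself only groups names by heading. */
enum MetricKind { GAUGE, COUNTER, LATENCY }

/** A subset of the proposed names, each tagged with an assumed kind. */
enum ProposedMetric {
  S3GUARD_METADATASTORE_ENABLED(MetricKind.GAUGE),
  S3GUARD_METADATASTORE_GET_PATH(MetricKind.COUNTER),
  S3GUARD_METADATASTORE_GET_PATH_LATENCY(MetricKind.LATENCY),
  S3GUARD_METADATASTORE_PUT_PATH(MetricKind.COUNTER),
  S3GUARD_METADATASTORE_PUT_PATH_LATENCY(MetricKind.LATENCY),
  S3GUARD_METADATASTORE_PUT_FAILURE(MetricKind.COUNTER);

  private final MetricKind kind;

  ProposedMetric(MetricKind kind) {
    this.kind = kind;
  }

  MetricKind kind() {
    return kind;
  }

  /** Lower-case form of the constant name, usable as a metric key. */
  String symbol() {
    return name().toLowerCase(Locale.ROOT);
  }

  /** All proposed metrics of the given kind, in declaration order. */
  static List<ProposedMetric> ofKind(MetricKind kind) {
    List<ProposedMetric> result = new ArrayList<>();
    for (ProposedMetric metric : values()) {
      if (metric.kind == kind) {
        result.add(metric);
      }
    }
    return result;
  }
}

class ProposedMetricDemo {
  public static void main(String[] args) {
    for (ProposedMetric metric : ProposedMetric.ofKind(MetricKind.LATENCY)) {
      System.out.println(metric.symbol());
    }
  }
}
```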

Ai Deng added a comment - 19/Mar/17 08:12

I think measuring the number of paths that have been operated on (put, get … ) in MetaStore could be interesting. The end user could then see how much of their S3 file system is managed by S3Guard.

Hadoop QA added a comment - 19/Mar/17 08:27

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 21s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
-1	test4tests	0m 0s	The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1	mvninstall	12m 34s	HADOOP-13345 passed
+1	compile	0m 20s	HADOOP-13345 passed
+1	checkstyle	0m 15s	HADOOP-13345 passed
+1	mvnsite	0m 22s	HADOOP-13345 passed
+1	mvneclipse	0m 14s	HADOOP-13345 passed
+1	findbugs	0m 27s	HADOOP-13345 passed
+1	javadoc	0m 15s	HADOOP-13345 passed
+1	mvninstall	0m 18s	the patch passed
+1	compile	0m 18s	the patch passed
+1	javac	0m 18s	the patch passed
-0	checkstyle	0m 12s	hadoop-tools/hadoop-aws: The patch generated 19 new + 25 unchanged - 0 fixed = 44 total (was 25)
+1	mvnsite	0m 20s	the patch passed
+1	mvneclipse	0m 11s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	findbugs	0m 32s	the patch passed
+1	javadoc	0m 11s	the patch passed
+1	unit	0m 36s	hadoop-aws in the patch passed.
+1	asflicense	0m 16s	The patch does not generate ASF License warnings.
		18m 59s

Subsystem	Report/Notes
Docker	Image:yetus/hadoop:a9ad5d6
JIRA Issue	HADOOP-13453
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12859449/HADOOP-13453-HADOOP-13345-003.patch
Optional Tests	asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
uname	Linux bd705d5c15d7 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/testptch/hadoop/patchprocess/precommit/personality/provided.sh
git revision	HADOOP-13345 / b54e1b2
Default Java	1.8.0_121
findbugs	v3.0.0
checkstyle	https://builds.apache.org/job/PreCommit-HADOOP-Build/11851/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results	https://builds.apache.org/job/PreCommit-HADOOP-Build/11851/testReport/
modules	C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output	https://builds.apache.org/job/PreCommit-HADOOP-Build/11851/console
Powered by	Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

This message was automatically generated.

Steve Loughran added a comment - 12/Apr/17 21:06

This is on my review todo list, don't think I've forgotten!

Steve Loughran added a comment - 13/Apr/17 15:53

I see your point about quantile config: don't know what to do there.

Be careful with those imports: S3AInstrumentation had all its imports expanded and moved around. Import sections are one of the main merge-conflict areas, so it's critical to keep changes there to a minimum. We normally turn off any IDE automatic features to avoid this.
I've moved away from the split into a separate interface and impl for the instrumentation; everything is closely coupled enough that we don't need to abstract things away.

I'm doing a patch with my changes; if everything is happy and I can run a full integration test suite (HADOOP-14216 has broken this), then I'll +1 it; once it is in we can expand the metrics.

Steve Loughran added a comment - 13/Apr/17 15:55

Patch 003 with some tuning.

Something is wrong with my IDE (IntelliJ IDEA 2017.1) and it is reorganising imports without warning. I think I've fixed it here.

Tested: none. Someone has gone and broken XInclude

Steve Loughran added a comment - 13/Apr/17 17:25

Patch 005; accidentally lost an import while cleaning up IDE import games.

Tested: s3a ireland with the opts -Dparallel-tests -DtestsThreadCount=8 -Ddynamo. All well. Best test run for ages. Hopefully that means that DDB is fixing that intermittent root contract test failure.

Hadoop QA added a comment - 13/Apr/17 17:54

-1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	0m 18s	Docker mode activated.
+1	@author	0m 0s	The patch does not contain any @author tags.
-1	test4tests	0m 0s	The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1	mvninstall	16m 0s	HADOOP-13345 passed
+1	compile	0m 24s	HADOOP-13345 passed
+1	checkstyle	0m 19s	HADOOP-13345 passed
+1	mvnsite	0m 27s	HADOOP-13345 passed
+1	mvneclipse	0m 30s	HADOOP-13345 passed
+1	findbugs	0m 36s	HADOOP-13345 passed
+1	javadoc	0m 16s	HADOOP-13345 passed
+1	mvninstall	0m 24s	the patch passed
+1	compile	0m 22s	the patch passed
+1	javac	0m 22s	the patch passed
-0	checkstyle	0m 13s	hadoop-tools/hadoop-aws: The patch generated 1 new + 38 unchanged - 0 fixed = 39 total (was 38)
+1	mvnsite	0m 23s	the patch passed
+1	mvneclipse	0m 14s	the patch passed
+1	whitespace	0m 0s	The patch has no whitespace issues.
+1	findbugs	0m 40s	the patch passed
+1	javadoc	0m 13s	the patch passed
+1	unit	0m 40s	hadoop-aws in the patch passed.
+1	asflicense	0m 30s	The patch does not generate ASF License warnings.
		23m 57s

Subsystem	Report/Notes
Docker	Image:yetus/hadoop:612578f
JIRA Issue	HADOOP-13453
JIRA Patch URL	https://issues.apache.org/jira/secure/attachment/12863312/HADOOP-13453-HADOOP-13345-005.patch
Optional Tests	asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
uname	Linux 40fb9ba1fa78 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/testptch/hadoop/patchprocess/precommit/personality/provided.sh
git revision	HADOOP-13345 / af8250a
Default Java	1.8.0_121
findbugs	v3.0.0
checkstyle	https://builds.apache.org/job/PreCommit-HADOOP-Build/12097/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
Test Results	https://builds.apache.org/job/PreCommit-HADOOP-Build/12097/testReport/
modules	C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output	https://builds.apache.org/job/PreCommit-HADOOP-Build/12097/console
Powered by	Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

This message was automatically generated.

Ai Deng added a comment - 14/Apr/17 22:04

OK, I will test patch 005 when I am back from holiday. Happy Easter, everyone!

Ai Deng added a comment - 03/May/17 12:17

Hi steve_l, I'm happy with patch 005. Should we push it to the branch and start adding more metrics?

Steve Loughran added a comment - 03/May/17 18:40

+1
yeah. I'll do it now, one last test run to see all is well

Steve Loughran added a comment - 03/May/17 19:53

OK, it's in: thanks for this

now we have to think about what extra things to measure....

Ai Deng added a comment - 03/May/17 22:14

Cool. I listed my suggestions for the metrics in a previous comment; what are your thoughts? Let's decide on the list first.

Steve Loughran added a comment - 04/May/17 10:39

Why not take that list, create a new JIRA off HADOOP-13345, "add more s3guard metrics", and suggest those as the start?

One interesting thing to try to detect would be mismatches between s3guard and the underlying object store: if we can observe inconsistencies (how?) then that should be measured. The S3mper blog post looks at how Netflix detected consistency issues in S3 that way.
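
One possible answer to the "how?" is to diff a listing taken from the metadata store against a listing taken from the object store and count the paths that appear on only one side. The sketch below is an assumption-laden illustration, not how S3Guard or S3mper is implemented: `ListingMismatchCounter` is a hypothetical name, paths are plain strings, and `AtomicLong` counters stand in for Hadoop metrics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that counts disagreements between two listings. */
class ListingMismatchCounter {
  /** Paths known to the metadata store but absent from the object store listing. */
  final AtomicLong missingFromObjectStore = new AtomicLong();
  /** Paths in the object store listing that the metadata store does not know. */
  final AtomicLong missingFromMetastore = new AtomicLong();

  /** Compares one pair of listings and returns the mismatches found in it. */
  long compare(Set<String> metastorePaths, Set<String> objectStorePaths) {
    Set<String> onlyInMetastore = new HashSet<>(metastorePaths);
    onlyInMetastore.removeAll(objectStorePaths);

    Set<String> onlyInObjectStore = new HashSet<>(objectStorePaths);
    onlyInObjectStore.removeAll(metastorePaths);

    missingFromObjectStore.addAndGet(onlyInMetastore.size());
    missingFromMetastore.addAndGet(onlyInObjectStore.size());
    return onlyInMetastore.size() + onlyInObjectStore.size();
  }
}

class ListingMismatchDemo {
  public static void main(String[] args) {
    ListingMismatchCounter counter = new ListingMismatchCounter();
    Set<String> metastore = new HashSet<>(Arrays.asList("/a", "/b", "/c"));
    Set<String> objectStore = new HashSet<>(Arrays.asList("/b", "/c", "/d"));
    long mismatches = counter.compare(metastore, objectStore);
    System.out.println("mismatches in this listing: " + mismatches);
  }
}
```

Keeping the two directions in separate counters matters because they may mean different things: a path missing from the object store listing could be listing lag, while a path missing from the metadata store could be an out-of-band write.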

People

Assignee: Ai Deng

Reporter: Chris Nauroth

Votes: 0

Watchers: 7

Dates

Created: 02/Aug/16 00:09

Updated: 04/May/17 10:39

Resolved: 03/May/17 19:53
