[HIVE-14828] Cloud/S3: Stats publishing should be on HDFS instead of S3 - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.3.0
Component/s: Statistics
Labels: None

Description

Currently, stats files are created on S3. Later, FSStatsAggregator reads these files back and populates the metastore (MS) again.

2016-09-23 05:57:46,772 INFO  [main]: fs.FSStatsPublisher (FSStatsPublisher.java:init(49)) - created : s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
2016-09-23 05:57:46,773 DEBUG [main]: fs.FSStatsAggregator (FSStatsAggregator.java:connect(53)) - About to read stats from : s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001

Instead, stats could be written directly to HDFS and read back from there rather than from S3, which would save a couple of calls to S3.
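
The idea can be illustrated with a small, self-contained sketch. This is not the code from the attached patches: the class `StatsDirChooser`, its methods, the list of blob-store schemes, and the HDFS scratch location are all hypothetical names chosen for illustration. It only shows the decision being proposed: if the staging directory lives on a blob store such as S3, place the stats directory under an HDFS scratch directory instead.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Hive: picks where temporary stats files go.
final class StatsDirChooser {

    // Assumption: these URI schemes are the ones treated as blob stores.
    private static final Set<String> BLOB_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // Returns true when the path's scheme is one of the assumed blob-store schemes.
    static boolean isBlobStore(URI path) {
        String scheme = path.getScheme();
        return scheme != null
                && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Keeps the original stats directory unless it is on a blob store,
    // in which case a directory under the HDFS scratch dir is used instead.
    static URI chooseStatsDir(URI stagingStatsDir, URI hdfsScratchDir, String subDir) {
        if (!isBlobStore(stagingStatsDir)) {
            return stagingStatsDir;
        }
        String base = hdfsScratchDir.toString();
        if (!base.endsWith("/")) {
            base += "/";
        }
        return URI.create(base + subDir);
    }

    public static void main(String[] args) {
        // Hypothetical locations; BUCKET is the placeholder used in the log above.
        URI onS3 = URI.create("s3a://BUCKET/test/.hive-staging/-ext-10001");
        URI scratch = URI.create("hdfs://namenode:8020/tmp/hive");
        System.out.println(chooseStatsDir(onS3, scratch, "stats-tmp"));
    }
}
```

With a choice like this, both the publisher and the aggregator would have to be given the same resolved directory, so that the stats written on HDFS are the ones read back.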

Attachments

HIVE-14828.1.patch (1 kB), 20/Oct/16 00:18, Rajesh Balamohan
HIVE-14828.branch-1.2.001.patch (2 kB), 23/Sep/16 09:19, Rajesh Balamohan
HIVE-14828.branch-2.0.001.patch (1 kB), 19/Oct/16 00:58, Rajesh Balamohan

Issue Links

is related to: HIVE-13925 ETL optimizations (Open)

People

Assignee: Rajesh Balamohan

Reporter: Rajesh Balamohan

Votes: 0

Watchers: 1

Dates

Created: 23/Sep/16 07:12

Updated: 20/Oct/16 00:18

Resolved: 11/Oct/16 21:38