Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.1.0
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
Working with tables that reside on Amazon S3 (or any other object store) has several performance impacts when reading or writing data, as well as consistency issues.
This JIRA is an umbrella task to monitor all the performance improvements that can be done in Hive to work better with S3 data.
Issue Links
- depends upon
  - HADOOP-11694 Über-jira: S3a phase II: robustness, scale and performance (Resolved)
  - HIVE-14323 Reduce number of FS permissions and redundant FS operations (Closed)
- is depended upon by
  - HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries (Resolved)
- is related to
  - HIVE-16277 Exchange Partition between filesystems throws "IllegalArgumentException Wrong FS" (Open)
  - HIVE-1620 Patch to write directly to S3 from Hive (Open)
- relates to
  - HADOOP-13204 Über-jira: S3a phase III: scale and tuning (Resolved)
  - HIVE-14920 S3: Optimize SimpleFetchOptimizer::checkThreshold() (Closed)
1. Skip 'distcp' call when copying data from HDFS to S3 | Patch Available | Sergio Peña
2. FileSinkOperator should not rename files to final paths when S3 is the default destination | Reopened | Sergio Peña
3. Last MR job in Hive should be able to write to a different scratch directory | Reopened | Sahil Takiar
4. Investigate if staging data on S3 can always go under the scratch dir | Open | Unassigned
5. Add support for using Hadoop's S3A OutputCommitter | Patch Available | Unassigned
6. Dynamic Partitioning Integration with Hadoop's S3A OutputCommitter | Open | Unassigned
7. Ability to selectively run tests in TestBlobstoreCliDriver | Open | Sahil Takiar