[ORC-508] Add a reader/writer that does not depend on Hadoop FileSystem - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Java
Labels: None

Description

It seems that ORC's default implementation classes currently depend on Hadoop FileSystem objects to write. This is not ideal for APIs that do not rely on Hadoop. For context, I was looking at adding ORC support to Apache Beam, but Beam's API supports multiple filesystems through a more generic abstraction that relies on Java's Channels and Streams APIs and delegates directly to distributed filesystems, e.g. Google Cloud Storage, Amazon S3, etc. It would be really nice to have such support in the core implementation, and perhaps to split the Hadoop-dependent implementation into its own module in the future.
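
To illustrate the kind of abstraction meant here, below is a minimal sketch that uses only java.nio channels and has no Hadoop dependency. The names ByteStore, LocalByteStore and ChannelRoundTrip are hypothetical and exist only for this example; they are not part of ORC or Beam. A backend for Google Cloud Storage or Amazon S3 would, under this assumption, implement the same interface.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical storage abstraction (not an ORC or Beam type): it exposes
// plain java.nio channels instead of Hadoop FileSystem/Path objects.
interface ByteStore {
    WritableByteChannel create(String name) throws IOException;
    ReadableByteChannel open(String name) throws IOException;
}

// One possible backend: the local filesystem via java.nio.file.
final class LocalByteStore implements ByteStore {
    private final Path root;

    LocalByteStore(Path root) {
        this.root = root;
    }

    @Override
    public WritableByteChannel create(String name) throws IOException {
        return Files.newByteChannel(root.resolve(name),
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    @Override
    public ReadableByteChannel open(String name) throws IOException {
        return Files.newByteChannel(root.resolve(name), StandardOpenOption.READ);
    }
}

// Stand-in for a reader/writer that only sees channels, never a filesystem API.
final class ChannelRoundTrip {
    static void write(ByteStore store, String name, byte[] payload) throws IOException {
        try (WritableByteChannel ch = store.create(name)) {
            ByteBuffer buf = ByteBuffer.wrap(payload);
            // A channel may accept fewer bytes than offered, so loop until drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    static byte[] read(ByteStore store, String name) throws IOException {
        try (ReadableByteChannel ch = store.open(name)) {
            // Bridge from the Channels API to the Streams API.
            return Channels.newInputStream(ch).readAllBytes();
        }
    }
}
```

With this shape, the writer and reader code depends only on ByteStore, so any Hadoop-backed implementation could live in a separate module.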

Issue Links

blocks: BEAM-1861 ORC support (Open)

links to: GitHub Pull Request #641

People

Assignee: Unassigned

Reporter: Ismaël Mejía

Votes: 0

Watchers: 8

Dates

Created: 27/May/19 11:48

Updated: 10/May/22 00:05