[SPARK-22666] Spark datasource for image format - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.4.0
Component/s: ML
Labels: None
Target Version/s: 2.4.0

Description

The current API for the new image format is implemented as a standalone feature so that it can reside within the mllib package. As discussed in SPARK-21866, users should be able to load images through the more common Spark source reader interface.

This ticket is concerned with adding image reading support to the Spark source API, through either of the following interfaces:

spark.read.format("image")...
spark.read.image....

The output is a DataFrame that contains the images (and, for example, the file names), following the semantics already discussed in SPARK-21866.
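
For illustration, the first interface could be used from PySpark roughly as sketched below. This is a minimal sketch, not the ticket's implementation: the helper name `load_images` and the example directory are hypothetical, and the `dropInvalid` option and the nested `image.*` field names are assumed from the image data source as documented for Spark 2.4.

```python
def load_images(spark, path, drop_invalid=True):
    """Read images under `path` through the generic DataFrame reader.

    `spark` is an existing SparkSession. `dropInvalid` is assumed to be the
    option that skips files which cannot be decoded as images.
    """
    return (
        spark.read.format("image")
        .option("dropInvalid", drop_invalid)
        .load(path)
    )


# Usage with a live session (requires pyspark; the directory is hypothetical):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("image-source-sketch").getOrCreate()
#   df = load_images(spark, "path/to/images")
#   df.select("image.origin", "image.width", "image.height").show(truncate=False)
```

The helper only chains calls on the reader, so it can be imported and checked without a running Spark session.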

A few technical notes:

- Since the functionality is implemented in mllib, calling this function may fail at runtime if users have not added the spark-mllib dependency.
- How should very flat directories be handled? It is common to have millions of files in a single "directory" (as in S3), which seems to have caused issues for some users. If this issue is too complex to handle in this ticket, it can be dealt with separately.
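
The ticket leaves the flat-directory question open. As one possible illustration only, the sketch below enumerates a flat directory lazily and hands out file paths in fixed-size batches instead of building one huge listing. The helper name `iter_file_batches` and the batch size are hypothetical, and the sketch covers a local filesystem only, not object stores such as S3.

```python
import os
from itertools import islice


def iter_file_batches(directory, batch_size=1000):
    """Yield lists of at most `batch_size` file paths from one flat directory.

    os.scandir streams directory entries, so the full listing is never held
    in memory at once. Subdirectories are skipped.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    with os.scandir(directory) as entries:
        files = (entry.path for entry in entries if entry.is_file())
        while True:
            batch = list(islice(files, batch_size))
            if not batch:
                return
            yield batch
```

Each batch is a plain list of paths, so a caller could in principle pass it to a reader that accepts multiple paths; whether that helps in practice depends on the storage backend.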

Issue Links

blocks

SPARK-25345 Deprecate readImages APIs from ImageSchema

Resolved

SPARK-25524 Spark datasource for image/libsvm user guide

Resolved

is duplicated by

SPARK-25157 Streaming of image files from directory

Resolved

is related to

SPARK-25349 Support sample pushdown in Data Source V2

Open

SPARK-25348 Data source for binary files

Resolved

relates to

SPARK-21866 SPIP: Image support in Spark

Resolved

links to

GitHub Pull Request #22328 (WeichenXu123)

GitHub Pull Request #24362

People

Assignee: Weichen Xu

Reporter: Timothy Hunter

Shepherd: Xiangrui Meng

Votes: 0

Watchers: 10

Dates

Created: 30/Nov/17 22:07

Updated: 11/Sep/19 01:11

Resolved: 05/Sep/18 18:59