[SPARK-28153] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF) - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.3, 2.4.3, 3.0.0
Fix Version/s: 2.4.4, 3.0.0
Component/s: PySpark
Labels: None

Description

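# Assumes an active SparkSession named `spark`, as in the pyspark shell.
from pyspark.sql.functions import udf, input_file_name
# Write a small Parquet dataset, then read it back with a Python UDF next to input_file_name().
spark.range(10).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").select(udf(lambda x: x, "long")("id"), input_file_name()).show()

The input_file_name() column comes back empty for every row:
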
+------------+-----------------+
|<lambda>(id)|input_file_name()|
+------------+-----------------+
|           8|                 |
|           5|                 |
|           0|                 |
|           9|                 |
|           6|                 |
|           2|                 |
|           3|                 |
|           4|                 |
|           7|                 |
|           1|                 |
+------------+-----------------+
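
The issue text does not spell out the cause, but the title suggests the file name is held per thread and then read from a different thread when a Python UDF is involved. The following is only a plain-Python analogy of that reading, not Spark's implementation: the helper names (set_input_file, get_input_file, SharedRef) and the part-file path are hypothetical. It contrasts a thread-local holder with a shared, lock-protected reference that plays a role loosely similar to an AtomicReference.

```python
import threading

# Hypothetical stand-in for a per-thread holder of the current input file name.
_holder = threading.local()


def set_input_file(name):
    """Record the file name in the calling thread's own storage."""
    _holder.name = name


def get_input_file():
    """Return the calling thread's file name, or "" if it was never set here."""
    return getattr(_holder, "name", "")


class SharedRef:
    """Hypothetical shared reference; the same object is visible to every thread."""

    def __init__(self, value=""):
        self._lock = threading.Lock()
        self._value = value

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


path = "/tmp/foo/part-00000.parquet"  # hypothetical file name, for illustration only

# Case 1: a separate "reader" thread sets the thread-local value.
reader = threading.Thread(target=set_input_file, args=(path,))
reader.start()
reader.join()
thread_local_result = get_input_file()  # the main thread never set it, so this is ""

# Case 2: the reader thread updates a reference that both threads share.
shared = SharedRef()
reader = threading.Thread(target=shared.set, args=(path,))
reader.start()
reader.join()
shared_result = shared.get()  # the main thread sees the reader thread's update

print(repr(thread_local_result), repr(shared_result))
```

In the first case the main thread reads an empty string, which mirrors the empty column in the output above; in the second case the update made by the other thread is visible.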

Issue Links

relates to: SPARK-27966 input_file_name empty when listing files in parallel (Resolved)

links to: GitHub Pull Request #24958, GitHub Pull Request #25321

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Dates

Created: 25/Jun/19 01:01

Updated: 12/Dec/22 18:10

Resolved: 01/Aug/19 05:20