Description
from pyspark.sql.functions import udf, input_file_name
spark.range(10).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").select(udf(lambda x: x, "long")("id"), input_file_name()).show()
+------------+-----------------+
|<lambda>(id)|input_file_name()|
+------------+-----------------+
|           8|                 |
|           5|                 |
|           0|                 |
|           9|                 |
|           6|                 |
|           2|                 |
|           3|                 |
|           4|                 |
|           7|                 |
|           1|                 |
+------------+-----------------+
Issue Links
- relates to SPARK-27966 input_file_name empty when listing files in parallel (Resolved)