Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.1
Fix Version/s: None
Description
Currently, the JSON and CSV parsers are invoked even when the required schema is empty. Invoking the parser on every line has a non-zero cost, and when no columns are needed that step can be skipped. Such an optimization should speed up operations such as count().
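A minimal Python sketch of the idea, under stated assumptions: `read_lines`, `parse_json_line`, and the `skip_parsing_for_empty_schema` flag are hypothetical names, not Spark's API, and a row is modeled as a plain tuple. When the required schema is empty, the reader emits one empty row per record without calling the parser. The sketch assumes the parser yields no record for a blank line, so the shortcut filters blank lines too; the linked SPARK-26745 reports that skipping the parser led to inconsistent results for JSON inputs with empty lines.

```python
import json
from typing import Iterable, Iterator, List, Tuple

Row = Tuple  # hypothetical: a row is modeled as a plain tuple of field values


def parse_json_line(line: str, required: List[str]) -> Iterator[Row]:
    """Hypothetical per-line parser: yields at most one row per line."""
    if not line.strip():
        return  # assumption: blank lines produce no record
    obj = json.loads(line)
    yield tuple(obj.get(name) for name in required)


def read_lines(
    lines: Iterable[str],
    required: List[str],
    parse=parse_json_line,
    skip_parsing_for_empty_schema: bool = True,  # hypothetical flag
) -> Iterator[Row]:
    if not required and skip_parsing_for_empty_schema:
        # Shortcut: no columns are needed, so emit an empty row per line
        # without invoking the parser. Blank lines are dropped here as well,
        # so the row count matches the parsing path.
        return (() for line in lines if line.strip())
    # Regular path: call the parser on every line.
    return (row for line in lines for row in parse(line, required))
```

Usage: with `lines = ['{"a": 1}', '', '{"a": 2}']`, counting the rows from `read_lines(lines, [])` gives 2 on both paths. Dropping the blank-line filter from the shortcut would count 3 instead, which is the kind of mismatch SPARK-26745 describes.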
Issue Links
- causes
  - SPARK-26745 Non-parsing Dataset.count() optimization causes inconsistent results for JSON inputs with empty lines (Resolved)