Details
Bug
Status: Resolved
Minor
Resolution: Fixed
Reviewed
Description
OrcStorage needs to detect the schema of the input data paths. If any of those paths contains no ORC files (for example, only a _SUCCESS marker is present), schema detection fails even when the other paths contain data.
For example:
A = LOAD '/path/to/20230101,/path/to/20230102' USING OrcStorage();
If /path/to/20230101 contains only a _SUCCESS marker and /path/to/20230102 contains data, OrcStorage fails to detect the schema and Pig exits with an unhelpful error, something like "Cannot find any ORC files from <locations>. Probably multiple load/store statements in script."
The code is meant to search all input paths recursively for a data file (via Utils.depthFirstSearchForFile), but the search is implemented incorrectly: in this scenario it returns early, before it has looked at the remaining paths.
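To illustrate the intended behaviour, here is a minimal sketch of a depth-first search that only gives up after every input path has been exhausted. It is not the Pig implementation and not the attached patch: it uses java.nio.file on the local filesystem instead of Hadoop's FileSystem API so that it runs standalone, and the class name FirstDataFileSearch, the method names, and the rule of skipping names that start with "_" or "." are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class FirstDataFileSearch {

    // Assumed filter: names starting with '_' or '.' are markers or hidden files.
    static boolean isVisible(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Returns the first visible regular file under any root, or null if none exists.
    public static Path findFirstDataFile(List<Path> roots) throws IOException {
        for (Path root : roots) {
            Path found = search(root);
            if (found != null) {
                return found;
            }
            // This root had no data file: continue with the next root
            // instead of returning null here.
        }
        return null;
    }

    // Depth-first search below a single path.
    private static Path search(Path p) throws IOException {
        if (Files.isRegularFile(p)) {
            return p;
        }
        if (!Files.isDirectory(p)) {
            return null;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(p)) {
            for (Path child : children) {
                if (!isVisible(child)) {
                    continue; // skip _SUCCESS and similar markers
                }
                Path found = search(child);
                if (found != null) {
                    return found;
                }
                // An empty subtree must not end the search; try the next sibling.
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Recreate the scenario from the description in a temporary directory.
        Path base = Files.createTempDirectory("orc-search");
        Path empty = Files.createDirectories(base.resolve("20230101"));
        Path full = Files.createDirectories(base.resolve("20230102"));
        Files.createFile(empty.resolve("_SUCCESS"));
        Files.createFile(full.resolve("part-00000.orc"));

        // Prints the path of part-00000.orc even though the first root has no data.
        System.out.println(findFirstDataFile(Arrays.asList(empty, full)));
    }
}
```

The key point is that a null result from one directory is treated as "keep looking", both across the top-level input paths and across siblings inside a directory. A version that returns the result of the first directory it descends into, whether or not that result is null, would show the early return described above.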
I'll attach a patch.