Description
Tez does not explicitly set the permissions of the intermediate output files used for shuffle. In a secure cluster the shuffle service runs as a different user than the task, so the output files must be group-readable for the shuffle service to serve the data during the shuffle phase. If the umask is too restrictive (e.g., 077), the task's file.out and file.out.index files can end up with permissions too restrictive for the shuffle handler to read them.
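To make the umask interaction concrete, here is a minimal sketch of the permission arithmetic, assuming the output files are created with a requested mode of 0666 (a common default for regular files; the actual mode Tez or Hadoop requests is not stated here). The effective mode is `requested & ~umask`, so a 077 umask strips the group-read bit the shuffle handler needs. The class and the helper `setShuffleReadable` are hypothetical illustrations, not Tez or Hadoop APIs; the helper uses `java.nio` as a stand-in for the general remedy of setting permissions explicitly instead of relying on the umask.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical illustration; not part of Tez or Hadoop.
class ShuffleFilePermissions {
    // Assumed mode requested when the output file is created (rw-rw-rw-).
    static final int DEFAULT_FILE_MODE = 0666;

    // Effective permission bits: the umask clears any bit it has set.
    static int effectiveMode(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    // Group-read bit (0040): needed when the shuffle handler runs as a
    // different user that shares the file's group.
    static boolean groupReadable(int mode) {
        return (mode & 0040) != 0;
    }

    // Render the nine permission bits as an rwx string, e.g. 0640 -> "rw-r-----".
    static String toRwx(int mode) {
        String letters = "rwx";
        StringBuilder sb = new StringBuilder();
        for (int bit = 8; bit >= 0; bit--) {
            sb.append((mode & (1 << bit)) != 0 ? letters.charAt((8 - bit) % 3) : '-');
        }
        return sb.toString();
    }

    // Sketch of the remedy: set owner rw and group r explicitly after writing,
    // so the result no longer depends on the process umask. POSIX filesystems only.
    static void setShuffleReadable(Path file) throws IOException {
        Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("rw-r-----"));
    }

    public static void main(String[] args) {
        for (int umask : new int[] {0022, 0027, 0077}) {
            int mode = effectiveMode(DEFAULT_FILE_MODE, umask);
            System.out.printf("umask %03o -> %s (group-readable: %b)%n",
                    umask, toRwx(mode), groupReadable(mode));
        }
    }
}
```

Running `main` prints `rw-r--r--` for umask 022 and `rw-r-----` for 027 (both group-readable), but `rw-------` for 077, which is the case the description warns about. Calling something like `setShuffleReadable` on file.out and file.out.index after they are written would make the outcome independent of the umask.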
Attachments
Issue Links
- is duplicated by
  - TEZ-4071 shuffle throws exceptions with an external table with multiple hdfs files (Closed)
- relates to
  - MAPREDUCE-7033 Map outputs implicitly rely on permissive umask for shuffle (Resolved)
  - HIVE-23518 Tez may skip file permission update on intermediate output (Resolved)
  - TEZ-4185 Tez may skip file permission update on intermediate output (Resolved)