Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- None
- None
- Reviewed
HIVE-318. Fix union all queries. (Namit Jain via zshao)
Description
1. Map-only job : same input
Hangs because the mapper tries to open the same file twice, and the Hadoop filesystem complains.
Fix: initialize only once by keeping the state at the Operator level. Close should be handled the same way.
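
The fix for case 1 can be sketched as an idempotent initialize/close guard. This is only an illustration: `SharedSinkOperator` and its counters are hypothetical stand-ins, not Hive's actual Operator class, and the counters merely represent opening and closing the output file.

```java
// Hypothetical stand-in for an operator reached from both branches of a
// UNION ALL over the same input. Not Hive's real Operator class.
class SharedSinkOperator {
    private boolean initialized = false;
    private boolean closed = false;
    private int openCount = 0;   // how many times the output was really opened
    private int closeCount = 0;  // how many times the output was really closed

    // Each parent branch calls initialize(); only the first call does work.
    void initialize() {
        if (initialized) {
            return; // already opened by another parent
        }
        initialized = true;
        openCount++; // stands in for opening the output file
    }

    // Close gets the same guard, as the ticket suggests.
    void close() {
        if (!initialized || closed) {
            return;
        }
        closed = true;
        closeCount++; // stands in for closing the output file
    }

    int openCount() {
        return openCount;
    }

    int closeCount() {
        return closeCount;
    }
}
```

With the state kept on the operator itself, a second parent calling `initialize()` or `close()` becomes a no-op instead of a second open on the same file.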
2. Map-only job : different inputs
Data is lost because of the rename.
Fix: instead of renaming, move the files into the directory.
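
A minimal sketch of the fix for case 2, assuming the data loss comes from each sub-query's output directory being renamed onto the same destination. It uses `java.nio.file` on a local filesystem rather than the Hadoop FileSystem API, and `UnionOutputMover` and `moveFilesInto` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: moves files one by one instead of renaming a directory.
class UnionOutputMover {
    // Moves every entry of srcDir into destDir, keeping what is already there.
    static void moveFilesInto(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);

        // Collect the entries first so the listing is not modified mid-iteration.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
            for (Path p : stream) {
                files.add(p);
            }
        }

        for (Path file : files) {
            // Without REPLACE_EXISTING this throws FileAlreadyExistsException
            // on a name clash, so the inputs must produce distinct file names.
            Files.move(file, destDir.resolve(file.getFileName()));
        }
    }
}
```

Calling `moveFilesInto` once per sub-query output leaves the files of both inputs side by side in the destination, provided their names do not collide.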
3. Map-only job in subquery + ReduceSink (RS): currently works
4. Two variables, so 4 sub-cases:
Number of sub-queries having map-reduce jobs (1 or 2)
Operator after Union (RS or FS)
a. Number of sub-queries having map-reduce jobs: 1
Operator after Union: RS
Can be done in 2 MR jobs, but that is really difficult with the current infrastructure.
Should do with 3 MR jobs for now: break on top of UNION.
Future optimization: move the operators between Union and RS to before the Union.
b. Number of sub-queries having map-reduce jobs: 2
Operator after Union: RS
Needs 3 MR jobs, so do it with 3 MR jobs: break on top of UNION.
Future optimization: move the operators between Union and RS to before the Union.
c. Number of sub-queries having map-reduce jobs: 1
Operator after Union: FS
Can be done in 1 MR job, but that is really difficult with the current infrastructure.
Can easily be done with 2 MR jobs by removing the UNION and cloning the operators between Union and FS.
Should do with 3 MR jobs for now: break on top of UNION.
Follow-up optimization: this case should be handled with 2 MR jobs.
d. Number of sub-queries having map-reduce jobs: 2
Operator after Union: FS
Can easily be done with 2 MR jobs by removing the UNION and cloning the operators between Union and FS.
Should do with 3 MR jobs for now: break on top of UNION.
Follow-up optimization: this case should be handled with 2 MR jobs.
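
The four sub-cases of case 4 can be summarized as a small lookup. This is just a restatement of the numbers above, not planner code; `UnionPlan`, `currentJobs`, and `bestKnownJobs` are hypothetical names.

```java
// Hypothetical summary of sub-cases a-d; the job counts are taken from the
// description above.
class UnionPlan {
    enum Sink { RS, FS } // operator after the Union

    // Jobs proposed for now: every sub-case breaks on top of UNION.
    static int currentJobs(int mrSubqueries, Sink sink) {
        check(mrSubqueries);
        return 3;
    }

    // Fewest jobs the description mentions as achievable for each sub-case.
    static int bestKnownJobs(int mrSubqueries, Sink sink) {
        check(mrSubqueries);
        if (sink == Sink.RS) {
            return mrSubqueries == 1 ? 2 : 3; // sub-cases a and b
        }
        return mrSubqueries == 1 ? 1 : 2;     // sub-cases c and d
    }

    private static void check(int mrSubqueries) {
        if (mrSubqueries != 1 && mrSubqueries != 2) {
            throw new IllegalArgumentException("expected 1 or 2 map-reduce sub-queries");
        }
    }
}
```

Only sub-case b already matches its best known plan; the other three leave room for the optimizations listed above.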
Attachments
Issue Links
- is duplicated by HIVE-205 file sink operator needs unique file names (Resolved)