Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Users most often set default_parallel or PARALLEL to a multiple of 10. This can cause data skew and makes poor use of the reducers. For example, one user specified 1000 and all records went to a single reducer until the value was changed to 999. In some cases, though, a user wants an exact number of output files, so a non-prime number is intentional. We should log a warning when the value is not a prime number, so that the user at least takes a second look and changes the configuration if the value was not intentional and was chosen only for convenience.
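The skew and the proposed check can be illustrated with a small sketch. It is not Pig code. It assumes integer keys whose hash is the key itself and a partitioner that takes the non-negative hash modulo the reducer count, which is how Hadoop's default HashPartitioner behaves. The key set (multiples of 1000) is a hypothetical stand-in for whatever caused the reported skew, and the names `is_prime`, `warn_if_not_prime` and `partition_counts` are made up for illustration.

```python
import logging
from collections import Counter

log = logging.getLogger("parallelism_check")  # hypothetical logger name


def is_prime(n: int) -> bool:
    """Trial division; cheap enough for realistic reducer counts."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def warn_if_not_prime(parallel: int) -> bool:
    """Log a warning for a non-prime parallelism; return True if warned."""
    if is_prime(parallel):
        return False
    log.warning(
        "Parallelism %d is not a prime number; this may cause data skew. "
        "Ignore this if an exact number of output files is intended.",
        parallel,
    )
    return True


def partition_counts(keys, num_reducers: int) -> Counter:
    """Count records per reducer, assuming hash(key) == key for ints."""
    return Counter((k & 0x7FFFFFFF) % num_reducers for k in keys)


# Hypothetical skewed input: every key is a multiple of 1000.
keys = [i * 1000 for i in range(10_000)]

used_1000 = len(partition_counts(keys, 1000))  # reducers that get any data
used_999 = len(partition_counts(keys, 999))
used_997 = len(partition_counts(keys, 997))
```

Under these assumptions, 1000 reducers leave all records on one reducer, while 999 and 997 use every reducer. Note that 999 is not itself prime (it is 27 × 37); it helps in this sketch only because it shares no factor with the key spacing, whereas a prime count avoids a shared factor with any spacing that is not a multiple of it.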