Description
Generator writes the temporary output of the Selector job/step twice (see line 516). This is not a big issue when generating small fetch lists, but it may become one when working on large data, because the redundant copy costs extra disk space and I/O. The temporary output looks like this:
    % tree -h generate-temp-fc27fe85-9ddc-4926-b6ba-dcd0066d5007/
    generate-temp-fc27fe85-9ddc-4926-b6ba-dcd0066d5007/
    |-- [4.0K] fetchlist-1
    |   `-- [ 25M] part-r-00000
    `-- [ 77M] part-r-00000

    1 directory, 2 files
    % file generate-temp-fc27fe85-9ddc-4926-b6ba-dcd0066d5007/part-r-00000
    generate-temp-fc27fe85-9ddc-4926-b6ba-dcd0066d5007/part-r-00000: ASCII text
    % file generate-temp-fc27fe85-9ddc-4926-b6ba-dcd0066d5007/fetchlist-1/part-r-00000
    generate-temp-fc27fe85-9ddc-4926-b6ba-dcd0066d5007/fetchlist-1/part-r-00000: Apache Hadoop Sequence file version 6
The unneeded output is plain text, which explains why it is larger (77M) than the Hadoop Sequence file in fetchlist-1 (25M).
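To check whether a given generate run is affected, the following Python sketch scans a generate-temp-* directory and flags part files written at its top level (outside any fetchlist-N subdirectory) that do not carry the SequenceFile header. It relies only on the fact that Hadoop SequenceFiles begin with the three bytes `SEQ` followed by a version byte, which is what `file` reports as "Apache Hadoop Sequence file version 6". The directory layout is taken from the listing above; the function names `is_sequence_file`, `find_redundant_parts` and `summarize` are hypothetical helpers, not part of Nutch.

```python
from pathlib import Path

# Hadoop SequenceFiles start with b"SEQ" followed by a single version byte.
SEQ_MAGIC = b"SEQ"


def is_sequence_file(path: Path) -> bool:
    """Return True if the file starts with the SequenceFile magic plus a version byte."""
    with path.open("rb") as f:
        header = f.read(4)
    return len(header) == 4 and header[:3] == SEQ_MAGIC


def find_redundant_parts(temp_dir):
    """Hypothetical helper: list (name, size) of top-level part-r-* files that are
    not SequenceFiles, i.e. the extra plain-text copy of the Selector output."""
    temp = Path(temp_dir)
    redundant = []
    # glob() is non-recursive here, so files inside fetchlist-N/ are not matched.
    for part in sorted(temp.glob("part-r-*")):
        if part.is_file() and not is_sequence_file(part):
            redundant.append((part.name, part.stat().st_size))
    return redundant


def summarize(temp_dir):
    """Hypothetical helper: return (bytes in fetchlist-N/part-r-* files,
    bytes in redundant top-level part files)."""
    temp = Path(temp_dir)
    fetchlist_bytes = sum(
        p.stat().st_size for p in temp.glob("fetchlist-*/part-r-*") if p.is_file()
    )
    redundant_bytes = sum(size for _, size in find_redundant_parts(temp))
    return fetchlist_bytes, redundant_bytes
```

On the directory shown above, this should flag only the 77M top-level part-r-00000 and count the 25M fetchlist-1/part-r-00000 as the real fetch list.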