Could ts++ or ts-- be an option?
ts++ or ts-- will not solve this problem. Each mapper spawns a new JVM, so ts is reset to its initial value in every map task. There is therefore still a chance of ts collisions.
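To illustrate, here is a minimal sketch in plain Java with no HBase classes. The names `runMapper` and `countCollisions` and the starting value are made up for illustration, and it assumes two map tasks begin from the same initial ts (for example, because they were set up in the same millisecond). Under that assumption a per-task counter hands out the same values in both tasks.

```java
import java.util.HashSet;
import java.util.Set;

class TsCollisionSketch {
    // Hypothetical stand-in for one map task. Each task runs in its own JVM,
    // so its counter starts again from the initial value it was given.
    static long[] runMapper(long initialTs, int cells) {
        long ts = initialTs;
        long[] stamps = new long[cells];
        for (int i = 0; i < cells; i++) {
            stamps[i] = ts++; // the "ts++" idea: bump the timestamp per cell
        }
        return stamps;
    }

    // Counts how many timestamps from task b were also issued by task a.
    static int countCollisions(long[] a, long[] b) {
        Set<Long> seen = new HashSet<>();
        for (long t : a) {
            seen.add(t);
        }
        int collisions = 0;
        for (long t : b) {
            if (seen.contains(t)) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        long base = 1000L; // assumed shared starting value, for illustration only
        long[] mapper1 = runMapper(base, 3);
        long[] mapper2 = runMapper(base, 3);
        System.out.println("colliding timestamps: " + countCollisions(mapper1, mapper2));
    }
}
```

A collision only matters when both tasks write the same row and column, but the counter by itself does nothing to rule that out.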
that the timestamps are all identical. The whole point is that, in a bulk-load-only workflow, you can identify each bulk load exactly, and correlate it to the MR job that inserted it.
No, Todd. At least the current implementation is buggy and does not match this expected behavior.
A new timestamp is generated for each map task (i.e., for each split) in TsvImporterMapper.doSetup.
Please check my previous comments.
So this is only about ImportTsv? Should change the title in that case.
I'm not aware of what other tools come under bulk load. The bulk load documentation talks only about importtsv.
But if you feel we should change the title, feel free to modify it.
If you want to use custom timestamps, you should specify a timestamp column in your data, or write your own MR job (ImportTsv is just an example which is useful for some cases, but for anything advanced I would expect users to write their own code).
I think we can provide an option to specify the timestamp column (like the ROWKEY column) as an argument.
Example: importtsv.columns='HBASE_ROW_KEY, HBASE_TS_KEY, emp:name,emp:sal,dept:code'
This makes importtsv more usable. Otherwise, the user has to copy and paste the entire importtsv code just to make this minor modification.
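A rough sketch of how the column spec could be handled, in plain Java with no HBase dependencies. HBASE_TS_KEY is only the marker proposed here, not an existing option, and the class and method names (`TsColumnSketch`, `timestampFor`) are hypothetical. It assumes the timestamp field holds a long value in milliseconds and falls back to a caller-supplied default when no timestamp column is configured.

```java
class TsColumnSketch {
    static final String ROWKEY_SPEC = "HBASE_ROW_KEY";
    static final String TS_SPEC = "HBASE_TS_KEY"; // proposed marker, hypothetical

    final int rowKeyIndex;
    final int tsIndex; // -1 when the spec has no timestamp column

    TsColumnSketch(String columnsSpec) {
        String[] cols = columnsSpec.split(",");
        int rk = -1;
        int ts = -1;
        for (int i = 0; i < cols.length; i++) {
            String col = cols[i].trim();
            if (ROWKEY_SPEC.equals(col)) {
                rk = i;
            } else if (TS_SPEC.equals(col)) {
                ts = i;
            }
        }
        if (rk < 0) {
            throw new IllegalArgumentException("No " + ROWKEY_SPEC + " column in spec");
        }
        rowKeyIndex = rk;
        tsIndex = ts;
    }

    // Returns the per-line timestamp if a timestamp column is configured,
    // otherwise the default (e.g. the one chosen when the task was set up).
    long timestampFor(String[] fields, long defaultTs) {
        if (tsIndex < 0) {
            return defaultTs;
        }
        return Long.parseLong(fields[tsIndex].trim());
    }

    public static void main(String[] args) {
        TsColumnSketch spec = new TsColumnSketch(
            "HBASE_ROW_KEY, HBASE_TS_KEY, emp:name,emp:sal,dept:code");
        String[] line = "row1\t1325376000000\tAlice\t5000\tD1".split("\t");
        System.out.println(spec.timestampFor(line, System.currentTimeMillis()));
    }
}
```

With this, existing column specs that have no timestamp marker would behave as they do today, since the default timestamp is used whenever the marker is absent.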
Please let me know your suggestions on this.