Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
With HBase there's no difference between inserting and updating a row. Updating is done by inserting on a row with the same rowKey. HBase will then overwrite the existing cells if the column family and qualifier match.
With HBase, a cell is simply not inserted if its value is NULL.
What do we do if a cell has a value (e.g. address:housnr = 1) and it gets updated by an input column that has NULL as its value? In the current implementation, we don't execute the insertion. That leaves the cell with its old value (e.g. address:housnr = 1) and an older timestamp than the other cells in the row, if those do get updated. This will probably result in unexpected behaviour for someone reading the table after the insertion: if you set something to NULL, you don't expect a value to still exist afterwards.
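To make the problem concrete, here is a minimal sketch that models the table in memory. It is not the HBase client API: `table`, `put_row` and `read_row` are hypothetical names, and the street values and timestamps are made up for illustration. It only mimics the behaviour described above, where NULL columns are skipped on write.

```python
from collections import defaultdict

# Toy stand-in for an HBase table (hypothetical, not the real client API):
# (row_key, "family:qualifier") -> list of (timestamp, value) versions.
table = defaultdict(list)


def put_row(row_key, columns, timestamp):
    """Write each non-NULL column as a new cell version; NULL columns are skipped."""
    for column, value in columns.items():
        if value is None:
            continue  # current behaviour: no insertion is executed for NULL
        table[(row_key, column)].append((timestamp, value))


def read_row(row_key):
    """Return the newest (timestamp, value) pair per column of the row."""
    return {
        column: max(versions, key=lambda version: version[0])
        for (key, column), versions in table.items()
        if key == row_key and versions
    }


# Initial insert, then an "update" in which housnr arrives as NULL.
put_row("row1", {"address:street": "Main St", "address:housnr": 1}, timestamp=100)
put_row("row1", {"address:street": "Side St", "address:housnr": None}, timestamp=200)

result = read_row("row1")
print(result)
# {'address:street': (200, 'Side St'), 'address:housnr': (100, 1)}
```

The reader still sees address:housnr = 1, carrying the older timestamp 100, next to a street cell written at timestamp 200.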
I see 4 possible options:
1. We could delete the cell from HBase, to match how inserting a row works. However, the qualifier then no longer exists either, so you probably can't go back to an older version of the value if the columnFamily uses versions.
2. We could also set the value to an empty byte array.
3. We could also change the Read functionality to only show the cells of a row that have the latest timestamp.
4. We keep it this way.
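The four options can be compared side by side on a toy in-memory model. This is only a sketch under assumptions: `put_row`, `read_row`, `null_policy` and `latest_only` are hypothetical names and not part of the HBase client API, and the "delete" policy here simply drops every version of the cell, which may not match what a real HBase delete does with older versions.

```python
from collections import defaultdict


def new_table():
    # (row_key, "family:qualifier") -> list of (timestamp, value) versions
    return defaultdict(list)


def put_row(table, row_key, columns, timestamp, null_policy="skip"):
    """Write a row; null_policy decides what happens to NULL columns."""
    for column, value in columns.items():
        cell = (row_key, column)
        if value is not None:
            table[cell].append((timestamp, value))
        elif null_policy == "delete":
            table.pop(cell, None)  # option 1: remove the cell (all versions in this model)
        elif null_policy == "empty":
            table[cell].append((timestamp, b""))  # option 2: store an empty byte array
        # null_policy == "skip": option 4, leave the old value untouched


def read_row(table, row_key, latest_only=False):
    """Newest version per column; with latest_only, hide cells older than the row's newest write."""
    newest = {
        column: max(versions, key=lambda version: version[0])
        for (key, column), versions in table.items()
        if key == row_key and versions
    }
    if latest_only and newest:
        row_timestamp = max(ts for ts, _ in newest.values())
        # option 3: keep only cells written at the row's latest timestamp
        newest = {c: cell for c, cell in newest.items() if cell[0] == row_timestamp}
    return newest


def demo(null_policy, latest_only=False):
    table = new_table()
    put_row(table, "row1", {"address:street": "Main St", "address:housnr": 1}, 100)
    put_row(table, "row1", {"address:street": "Side St", "address:housnr": None}, 200, null_policy)
    return read_row(table, "row1", latest_only)


option1 = demo("delete")                  # housnr is gone
option2 = demo("empty")                   # housnr is b"" at timestamp 200
option3 = demo("skip", latest_only=True)  # stale housnr is hidden on read
option4 = demo("skip")                    # stale housnr = 1 is still visible
```

In this model, options 1 and 3 give the reader the same view, but option 3 leaves the old value stored. Option 3 would also hide any column that a writer left out of a partial update, which may or may not be desirable.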
What's the best solution here?