INSERT OVERWRITE INTO, which removes all existing table data and inserts new data, is already implemented in the current Tajo, so the grammar and many of the supporting parts are in place. However, the INSERT INTO statement, which preserves existing data and adds new data, is not implemented yet. This feature is necessary. It would be very nice if someone took this issue.
As you asked, here is a more detailed description.
Many parts are already implemented in the current Tajo. The key to this issue is to determine the file name pattern used for newly written data files and to make each task write its output under the determined file name. Currently, each worker writes files named part-<execution block id>-<queryunit id>, where a query unit corresponds to a Task in MR.
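To make the current naming scheme concrete, here is a minimal sketch in Java. The class PartFileName, its method, and the zero-padding widths are hypothetical and only for illustration; they are not taken from the Tajo code base.

```java
import java.util.Locale;

// Hypothetical helper, not Tajo's actual implementation.
public final class PartFileName {
    private PartFileName() {}

    // Builds "part-<execution block id>-<queryunit id>".
    // The padding widths (2 and 6 digits) are an assumption for illustration.
    public static String of(int executionBlockId, int queryUnitId) {
        return String.format(Locale.ROOT, "part-%02d-%06d", executionBlockId, queryUnitId);
    }
}
```

Because the name depends only on the execution block id and the query unit id, a later query writing into the same directory could produce names that collide with existing files, which is why INSERT INTO needs a different rule.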
If possible, it would be nice if newly written file names continued from the last written file name. However, this approach may require non-trivial changes.
We can get the last file name in GlobalEngine in TajoMaster, and we can convey the filename prefix and the last number via the QueryContext object, which is propagated throughout all paths of a query. As I mentioned above, each query unit generates its output filename according to the query unit id (i.e., task id). To continue from the last number of the most recently written file, we need to modify the file name only if the filename prefix and last number are given.
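The idea above could be sketched roughly as follows. This is only an illustration under several assumptions: the class InsertIntoFileNames and its methods are hypothetical, existing files are assumed to be named "<prefix>-<number>", query unit ids are assumed to start at 0, and the prefix and last number are passed as plain parameters instead of being read from QueryContext.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Tajo's actual implementation.
public final class InsertIntoFileNames {
    // Assumed layout of existing file names: "<prefix>-<number>".
    private static final Pattern TRAILING_NUMBER = Pattern.compile("^(.*)-(\\d{1,9})$");

    private InsertIntoFileNames() {}

    // Master side: returns the highest trailing number among the existing
    // files that share the given prefix, or -1 if there is none.
    public static int lastNumber(List<String> existingNames, String prefix) {
        int last = -1;
        for (String name : existingNames) {
            Matcher m = TRAILING_NUMBER.matcher(name);
            if (m.matches() && m.group(1).equals(prefix)) {
                last = Math.max(last, Integer.parseInt(m.group(2)));
            }
        }
        return last;
    }

    // Task side: if a prefix and a last number were given, continue the
    // numbering after the last number; otherwise keep the default
    // part-<execution block id>-<queryunit id> pattern.
    public static String outputName(int executionBlockId, int queryUnitId,
                                    String prefix, Integer lastNumber) {
        if (prefix == null || lastNumber == null) {
            return String.format(Locale.ROOT, "part-%02d-%06d", executionBlockId, queryUnitId);
        }
        // Assumes query unit ids start at 0, so the first task gets lastNumber + 1.
        return String.format(Locale.ROOT, "%s-%06d", prefix, lastNumber + 1 + queryUnitId);
    }
}
```

With existing files part-01-000000 through part-01-000004, the master would compute a last number of 4, and the task with query unit id 0 would write part-01-000005. A plain INSERT OVERWRITE INTO would pass no prefix or last number and keep the current names.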
This description is just my idea. Feel free to suggest your own.