[IMPALA-4353] random query generation for INSERTs - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: Impala 2.8.0
Fix Version/s: Impala 2.9.0
Component/s: Infrastructure
Labels: None
Target Version: Kudu_Impala

Description

Generate random INSERT queries for Impala/Kudu tables. The syntax is roughly:

[with_statement]
INSERT IGNORE INTO <KUDU_TBL> SELECT <Statement>
INSERT IGNORE INTO <KUDU_TBL> <column list> VALUES <values list>

The WITH statement is optional.
IGNORE is required. It means that rows with duplicate primary keys are ignored.
Both forms are possible: INSERT IGNORE ... SELECT and INSERT IGNORE ... VALUES.
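
As a rough illustration of the syntax above, here is a minimal Python sketch of an object that holds one generated statement and renders it as SQL. The class `InsertStatement` and its fields are hypothetical names chosen for this example; they are not the query generator's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class InsertStatement:
    """Hypothetical in-memory form of one generated INSERT IGNORE query."""
    table: str
    columns: Sequence[str] = ()
    select_sql: Optional[str] = None            # SQL text for the SELECT form
    values_rows: Sequence[Sequence[str]] = ()   # SQL literals for the VALUES form
    with_sql: Optional[str] = None              # optional WITH clause text

    def to_sql(self) -> str:
        # Exactly one of the two bodies (SELECT or VALUES) must be present.
        if (self.select_sql is None) == (not self.values_rows):
            raise ValueError("need exactly one of select_sql or values_rows")
        parts = []
        if self.with_sql:
            parts.append(self.with_sql)
        head = f"INSERT IGNORE INTO {self.table}"
        if self.columns:
            head += " (" + ", ".join(self.columns) + ")"
        parts.append(head)
        if self.select_sql is not None:
            parts.append(self.select_sql)
        else:
            rows = ", ".join("(" + ", ".join(r) + ")" for r in self.values_rows)
            parts.append("VALUES " + rows)
        return "\n".join(parts)


stmt = InsertStatement(
    table="kudu_tbl",
    columns=("id", "name"),
    values_rows=[("1", "'a'"), ("2", "'b'")],
)
print(stmt.to_sql())
```

Keeping the statement as data and rendering it late leaves room to emit a different dialect for the reference database from the same object.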

Verifying IGNORE behavior requires comparing results against PostgreSQL 9.5 or higher (see IMPALA-4340).
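
PostgreSQL 9.5 is the release that introduced `INSERT ... ON CONFLICT`, which is presumably the reason for the version requirement; this issue does not spell that out. A minimal sketch of rendering the same logical insert for both databases follows. The function `render_insert` and the dialect names are assumptions made for illustration only.

```python
def render_insert(table, columns, body, dialect):
    """Render an insert that skips duplicate primary keys.

    `body` is SQL text: either a SELECT statement or a VALUES clause.
    `dialect` is "impala" or "postgresql" (hypothetical labels).
    """
    head = f"INTO {table} ({', '.join(columns)})"
    if dialect == "impala":
        return f"INSERT IGNORE {head} {body}"
    if dialect == "postgresql":
        # ON CONFLICT DO NOTHING is available from PostgreSQL 9.5 onward.
        return f"INSERT {head} {body} ON CONFLICT DO NOTHING"
    raise ValueError(f"unknown dialect: {dialect}")


body = "VALUES (1, 'a'), (2, 'b')"
print(render_insert("kudu_tbl", ["id", "name"], body, "impala"))
print(render_insert("kudu_tbl", ["id", "name"], body, "postgresql"))
```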

The scope of this Jira is to take advantage of dependent work (IMPALA-4340, IMPALA-4338, IMPALA-4343, IMPALA-4351, IMPALA-4352) and add methods to the QueryGenerator to generate Pythonic representations of queries.

The primary key considerations are important:

Primary keys can't be NULL.
Primary keys must be unique.
With IGNORE, rows in the same INSERT that share a PK race to win: only one of them is inserted, and which one is not deterministic. That nondeterminism will be difficult to manage.
With IGNORE, if a row with a given PK already exists, any new row inserted with the same PK is ignored as well.
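
The considerations above can be modeled with a small in-memory sketch. This is a simplified model, not how Impala or Kudu implement it: the function `insert_ignore` is hypothetical, it rejects NULL keys with an exception as a modeling choice, and it lets the first duplicate win, whereas the real winner of the race is not deterministic.

```python
def insert_ignore(table, rows, pk="id"):
    """Apply INSERT IGNORE-like semantics to `table`, a dict keyed by PK.

    Returns the number of rows actually inserted.
    """
    inserted = 0
    for row in rows:
        key = row[pk]
        if key is None:
            # Modeling choice: a NULL primary key is never accepted.
            raise ValueError("primary key can't be NULL")
        if key in table:
            # Duplicate PK, either pre-existing or from an earlier row
            # in this batch: the row is silently skipped.
            continue
        table[key] = row
        inserted += 1
    return inserted


table = {1: {"id": 1, "v": "old"}}
new_rows = [
    {"id": 1, "v": "x"},  # PK already exists in the table: ignored
    {"id": 2, "v": "y"},  # new PK: inserted
    {"id": 2, "v": "z"},  # duplicate within the batch: ignored in this model
]
count = insert_ignore(table, new_rows)
print(count, sorted(table))
```

Because the real winner among in-batch duplicates is unpredictable, a comparison against a reference database is only reliable when the generated rows avoid in-batch duplicates or when duplicate rows are otherwise identical.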

This means the query generator needs to be smarter than before about the queries it generates. For example, it shouldn't generate a query in which the expression for the inserted rows' PK column evaluates to a constant, because at most one of the rows would actually be inserted. One option (for example, in the case of a numerical PK) would be to employ a special expression that applies an offset from the MAX() value in the column.
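
The difference between a constant PK expression and an offset from MAX() can be shown with plain Python lists standing in for the PK column. The helper names are hypothetical, and treating MAX() of an empty column as 0 is an assumption of this sketch (in SQL, MAX() over no rows yields NULL and would need separate handling).

```python
def constant_pks(n_new, value=0):
    """PKs produced when the PK expression evaluates to a constant."""
    return [value] * n_new


def offset_pks(existing_pks, n_new):
    """PKs produced by adding offsets 1..n_new to MAX(existing_pks)."""
    base = max(existing_pks, default=0)  # assumption: empty column -> 0
    return [base + i for i in range(1, n_new + 1)]


existing = [3, 7, 5]

const = constant_pks(4)
# Distinct new keys bound how many rows INSERT IGNORE can accept.
print(len(set(const)))

fresh = offset_pks(existing, 4)
print(fresh, len(set(fresh) - set(existing)))
```

With the constant expression the four candidate rows collapse to a single key, so at most one can be inserted; with the offset expression every candidate key is new and distinct.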

People

Assignee: Michael Brown

Reporter: Michael Brown

Dates

Created: 24/Oct/16 18:47

Updated: 13/Jan/17 16:43

Resolved: 13/Jan/17 16:43