Details
Bug
Status: Closed
Major
Resolution: Not A Problem
1.12.2
None
None
Environment
[nix-shell:~/streaming-consistency/flink]$ java -version
openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-ga)
OpenJDK 64-Bit Server VM (build 25.265-bga, mixed mode)
[nix-shell:~/streaming-consistency/flink]$ flink --version
Version: 1.12.2, Commit ID: 4dedee0
[nix-shell:~/streaming-consistency/flink]$ nix-info
system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.3.10, channels(jamie): "", channels(root): "nixos-20.09.3554.f8929dce13e", nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos
Description
I'm running this simple query:
CREATE VIEW credits AS
  SELECT to_account AS account, sum(amount) AS credits
  FROM transactions
  GROUP BY to_account;

CREATE VIEW debits AS
  SELECT from_account AS account, sum(amount) AS debits
  FROM transactions
  GROUP BY from_account;

CREATE VIEW balance AS
  SELECT credits.account AS account, credits - debits AS balance
  FROM credits, debits
  WHERE credits.account = debits.account;

CREATE VIEW total AS
  SELECT sum(balance)
  FROM balance;
The `total` view is a sanity check: its value should always be 0, because money is only moved from one account to another and is never created or destroyed.
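The invariant can be checked outside Flink by running the same four views over a small, fully loaded table. The sketch below uses Python's built-in `sqlite3` as a stand-in batch engine. It is not the reporter's harness. The `transactions` schema (`id`, `from_account`, `to_account`, `amount`), the sample rows, and the helper name `total_after_batch` are all hypothetical. Because `balance` is an inner join, an account that has only sent or only received money is dropped from it, so the sample data is chosen so that every account does both.

```python
import sqlite3

# Hypothetical sample data: (id, from_account, to_account, amount).
# Every account appears as both a sender and a receiver, because the inner
# join in the `balance` view drops accounts that appear on only one side.
TRANSACTIONS = [
    (1, 0, 1, 10),
    (2, 1, 2, 5),
    (3, 2, 0, 7),
    (4, 0, 2, 3),
]

# The views from the report, unchanged.
VIEWS = """
CREATE VIEW credits AS
  SELECT to_account AS account, sum(amount) AS credits
  FROM transactions GROUP BY to_account;
CREATE VIEW debits AS
  SELECT from_account AS account, sum(amount) AS debits
  FROM transactions GROUP BY from_account;
CREATE VIEW balance AS
  SELECT credits.account AS account, credits - debits AS balance
  FROM credits, debits WHERE credits.account = debits.account;
CREATE VIEW total AS SELECT sum(balance) FROM balance;
"""


def total_after_batch(rows):
    """Load all rows into an in-memory SQLite database and read `total` once."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions ("
            "id INTEGER, from_account INTEGER, to_account INTEGER, amount INTEGER)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
        conn.executescript(VIEWS)
        (total,) = conn.execute("SELECT * FROM total").fetchone()
        return total
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_after_batch(TRANSACTIONS))
```

For the sample rows, the balances are -6 (account 0), 5 (account 1), and 1 (account 2), which sum to 0. That is the value a consistent evaluation of the full input should produce.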
In streaming mode (code here), only about 0.04% of the output values are 0. The absolute error in the outputs grows roughly linearly with the number of input transactions. However, once all inputs have been processed, the total does return to 0.
In batch mode (code here), it outputs 0 for a while but then jumps to large incorrect values and never returns to 0. In this run, the first ~44% of the outputs are correct, but the final answer is -48811, which amounts to miscounting ~5% of the inputs.
I also ran a variant of the query that joins on event time. In streaming mode, it produces results similar to the original. In batch mode, only 2 out of 1718375 outputs were correct, and the final error was similar to that of the original query.
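The report does not include the code used to compute these figures. Below is a minimal, hypothetical helper that derives comparable numbers from a captured list of `total` outputs. It assumes the outputs have been collected as integers, in emission order, and that 0 is the only correct value. It computes the share of outputs equal to 0, the fraction of outputs before the first incorrect one, the largest absolute error, and the final value. The names `summarize` and `OutputStats` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class OutputStats:
    share_zero: float      # fraction of emitted totals equal to 0 (the correct value)
    correct_prefix: float  # fraction of outputs emitted before the first non-zero total
    max_abs_error: int     # largest |total| observed; since 0 is correct, this is the error
    final_value: int       # last emitted total


def summarize(totals):
    """Summarize a list of emitted `total` values, in emission order."""
    if not totals:
        raise ValueError("no outputs to summarize")
    n = len(totals)
    zeros = sum(1 for t in totals if t == 0)
    # Index of the first incorrect output, or n if every output is 0.
    first_bad = next((i for i, t in enumerate(totals) if t != 0), n)
    return OutputStats(
        share_zero=zeros / n,
        correct_prefix=first_bad / n,
        max_abs_error=max(abs(t) for t in totals),
        final_value=totals[-1],
    )


if __name__ == "__main__":
    print(summarize([0, 0, 5, -3, 0]))
```

For the toy input `[0, 0, 5, -3, 0]`, three of five outputs are 0 (share 0.6), the first two precede the first error (prefix 0.4), the largest error is 5, and the final value is 0.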