Details
- Bug
- Status: Closed
- Minor
- Resolution: Fixed
Description
In MADlib 1.13, if I run the following query:
DROP TABLE IF EXISTS vertex, "EDGE";
CREATE TABLE vertex(
    id INTEGER
);
CREATE TABLE "EDGE"(
    src INTEGER,
    dest INTEGER,
    user_id INTEGER
);
INSERT INTO vertex VALUES (0), (1), (2);
INSERT INTO "EDGE" VALUES
    (0, 1, 1),
    (0, 2, 1),
    (1, 2, 1),
    (2, 1, 1),
    (0, 1, 2);
DROP TABLE IF EXISTS pagerank_ppr_grp_out;
DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
SELECT pagerank(
    'vertex',                -- Vertex table
    'id',                    -- Vertex id column
    '"EDGE"',                -- "EDGE" table
    'src=src, dest=dest',    -- "EDGE" args
    'pagerank_ppr_grp_out',  -- Output table of PageRank
    NULL,                    -- Default damping factor (0.85)
    NULL,                    -- Default max iters (100)
    NULL,                    -- Default Threshold
    'user_id');
I get the following result:
madlib=# select * from pagerank_ppr_grp_out order by user_id, id;
 user_id | id |     pagerank
---------+----+-------------------
       1 |  0 |              0.05
       1 |  0 |              0.05
       1 |  1 | 0.614906399170753
       1 |  2 | 0.614906399170753
       2 |  0 |             0.075
       2 |  1 |           0.13875
(6 rows)
where the row user_id=1, id=0, pagerank=0.05 appears twice.
We should fix this so that each (user_id, id) pair appears only once in the output.
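A check for this symptom could be sketched as follows. This is only an illustration in Python, not part of MADlib: the rows are copied from the output reported above, and `duplicate_keys` is a hypothetical helper name.

```python
from collections import Counter

# (user_id, id, pagerank) rows copied from the reported output.
ROWS = [
    (1, 0, 0.05),
    (1, 0, 0.05),
    (1, 1, 0.614906399170753),
    (1, 2, 0.614906399170753),
    (2, 0, 0.075),
    (2, 1, 0.13875),
]


def duplicate_keys(rows):
    """Return the (user_id, id) pairs that occur in more than one row."""
    counts = Counter((user_id, vertex_id) for user_id, vertex_id, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical usage: an empty list would mean the output is distinct.
duplicates = duplicate_keys(ROWS)
print(duplicates)
```

For the rows above, the only pair flagged is user_id=1, id=0.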
Besides, for user_id=1, all PageRank scores should sum to 1, but the returned scores sum to about 1.28 even when the duplicate row is counted once. The score for user_id=1, id=1 should be 0.475, and the score for user_id=1, id=2 should be 0.475. We should correct this calculation too.
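The expected values can be cross-checked with a small standalone power iteration. This is a minimal sketch in Python, not MADlib's implementation. It assumes that each group's vertex set is the set of vertices touched by that group's edges, that ranks start at 1/N, and that rank from vertices without outgoing edges is not redistributed. The name `grouped_pagerank` and the `tol` and `max_iter` values are illustrative choices.

```python
from collections import defaultdict

# (src, dest, user_id) rows from the "EDGE" table in the report.
EDGES = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 1, 1), (0, 1, 2)]


def grouped_pagerank(edges, damping=0.85, max_iter=1000, tol=1e-12):
    """Return {group: {vertex: score}} using plain power iteration per group."""
    by_group = defaultdict(list)
    for src, dest, group in edges:
        by_group[group].append((src, dest))

    result = {}
    for group, group_edges in by_group.items():
        # Assumption: only vertices that appear in the group's edges count.
        vertices = sorted({v for edge in group_edges for v in edge})
        n = len(vertices)
        out_degree = defaultdict(int)
        for src, _ in group_edges:
            out_degree[src] += 1

        rank = {v: 1.0 / n for v in vertices}
        for _ in range(max_iter):
            # Every vertex receives the random-jump share (1 - d) / N ...
            new_rank = {v: (1.0 - damping) / n for v in vertices}
            # ... plus a damped, equal share of each in-neighbour's rank.
            for src, dest in group_edges:
                new_rank[dest] += damping * rank[src] / out_degree[src]
            converged = max(abs(new_rank[v] - rank[v]) for v in vertices) < tol
            rank = new_rank
            if converged:
                break
        result[group] = rank
    return result


scores = grouped_pagerank(EDGES)
for group, ranks in sorted(scores.items()):
    print(group, {v: round(s, 6) for v, s in ranks.items()})
```

Under these assumptions the scores for user_id=1 come out at approximately 0.05, 0.475 and 0.475 for ids 0, 1 and 2, which sum to 1 and agree with the expected values stated above.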