Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The mailglomper script does not take account of renamed mailing lists.
This can result in double counting the activity for a project.
For example, commits@libcloud was renamed to notifications@libcloud in March 2014.
However, the data in the maildata_extended.json file includes weekly epoch entries for commits:
1507161600 2017-10-05 00:00:00 UTC
to
1524096000 2018-04-19 00:00:00 UTC
whereas notifications has:
1515024000 2018-01-04 00:00:00 UTC
to
1531958400 2018-07-19 00:00:00 UTC
The weekly counts agree for the overlap period (2018-01-04 to 2018-04-19), so the same activity is recorded under both list names.
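As a rough check of the overlap, the following Python sketch rebuilds the two epoch ranges quoted above and intersects them. It assumes the entries are spaced exactly 7 days apart; the names WEEK, weekly_range and fmt are illustrative and are not taken from mailglomper.

```python
from datetime import datetime, timezone

WEEK = 7 * 24 * 3600  # assumed spacing between weekly entries, in seconds


def weekly_range(first, last):
    """Return the set of weekly epochs from first to last, inclusive."""
    return set(range(first, last + 1, WEEK))


def fmt(epoch):
    """Format an epoch as a UTC date string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")


# First and last epochs as reported in this issue.
commits = weekly_range(1507161600, 1524096000)
notifications = weekly_range(1515024000, 1531958400)

overlap = sorted(commits & notifications)
print(len(overlap), fmt(overlap[0]), fmt(overlap[-1]))
# prints: 16 2018-01-04 2018-04-19
```

Under that assumption, 16 weekly entries would be counted twice for the project.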
If the commits mbox files had still been present up to April 2018, there would have been an index entry for the list; if a redirect was also in place, the code would have seen the redirected files.
I think the code should probably ignore redirects if that's possible.
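One possible way to ignore redirects, sketched with the Python standard library only. This assumes the mbox files are fetched over HTTP with urllib, which may not match how mailglomper actually retrieves them; NoRedirect and fetch_if_not_redirected are hypothetical names.

```python
import urllib.error
import urllib.request

REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect, so the
        # 3xx response is raised as an HTTPError instead.
        return None


# Opener that uses NoRedirect in place of the default redirect handler.
opener = urllib.request.build_opener(NoRedirect)


def fetch_if_not_redirected(url):
    """Return the response body, or None if the URL answers with a redirect."""
    try:
        with opener.open(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code in REDIRECT_CODES:
            return None  # treat a redirected (renamed) list as absent
        raise
```

With something like this, a list whose archive URL redirects to its new name would be skipped rather than counted a second time.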
When a list is renamed, the old data ought to be dropped, otherwise it may be double-counted.
Also, the obsolete entries will gradually accumulate.
This applies to both the maildata_weekly.json and maildata_extended.json files.
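A minimal sketch of dropping the old data, assuming each JSON file loads as a dict keyed by list name and that a rename table is maintained somewhere. The key format ("libcloud-commits"), the RENAMED table and the sample counts are made up for illustration; the real file layout may differ.

```python
# Hypothetical rename table: old list key -> new list key.
RENAMED = {"libcloud-commits": "libcloud-notifications"}


def drop_renamed(maildata, renamed=RENAMED):
    """Return a copy of maildata without the entries for renamed lists.

    An old entry is only dropped when the new list is present, so no
    activity is lost if the new list has not been indexed yet.
    """
    return {
        name: data
        for name, data in maildata.items()
        if not (name in renamed and renamed[name] in maildata)
    }


# Illustrative data only: one overlapping week with matching counts.
sample = {
    "libcloud-commits": {"1515024000": 5},
    "libcloud-notifications": {"1515024000": 5},
    "libcloud-dev": {"1515024000": 2},
}

cleaned = drop_renamed(sample)
print(sorted(cleaned))
# prints: ['libcloud-dev', 'libcloud-notifications']
```

The same function could be applied to the data for both maildata_weekly.json and maildata_extended.json before they are written out.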