Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Description
It seems that the speed of filtered replication with native Erlang filters is bound more by the byte length of the filter code than by the logic that actually executes. This could (hopefully) be fixed by caching the result of evaluating the code instead of repeating the eval for every document.
We have an Erlang filter used for replication that determines whether a document should replicate to a user based on some logic. It was originally written in JS [1] but was converted to Erlang [2] for performance. For reference, the JavaScript took [3] ~54 seconds, while the Erlang took ~28.
In an attempt to make the Erlang even faster I 'compiled' the filter into a static list of users / metausers, so that the filter could effectively just be "is this user's id in this list of allowed users" instead of running recursive logic. To be safe, I kept the old code in the filter as a fallback in case the document didn't have the compiled permission list [4].
Confusingly, even though this filter would have executed far less code, filter performance remained the same. It was only when I removed the fallback [5] that the filter got any faster: roughly 3 times faster, at ~10 seconds.
To me this says that a large share of the time is spent evaling the string that represents the Erlang code, and that this parse-and-eval happens over and over again, once for each document.
To prove this I created a variant of the compiled-with-backup version [4] with a bunch of extra Erlang in it that I knew would never actually get executed [6], as well as a version with all the comments etc. removed to make it as small as possible [7]. As expected, the long version was much slower (~80 seconds) and the smaller version was slightly faster (~9.5 seconds), even though their code execution paths hadn't changed.
To me it seems like Couch should cache the process of loading this code as a string from the ddoc and injecting it into the Erlang process, so that you only take the hit once. This hit seems very major (i.e. it matters more than actually writing a simple filter).
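A minimal sketch of the caching idea, under stated assumptions: the module name `filter_cache_sketch`, the functions `compile/1` and `cached_fun/2`, and the map-based cache are all hypothetical and are not taken from CouchDB's source. It only illustrates the general technique of scanning, parsing and evaluating the filter source once with the standard `erl_scan`, `erl_parse` and `erl_eval` modules, then reusing the resulting fun for every later document.

```erlang
-module(filter_cache_sketch).
-export([compile/1, cached_fun/2, demo/0]).

%% Turn filter source text into a callable fun. This is the step whose cost
%% grows with the size of the source: tokenise, parse, then evaluate.
%% The source must be a single fun expression terminated by a full stop.
compile(Source) when is_list(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    {value, Fun, _Bindings} = erl_eval:exprs(Exprs, erl_eval:new_bindings()),
    Fun.

%% Return the fun for Source, compiling only on a cache miss.
%% The cache is a plain map keyed by the source text itself and threaded
%% through by the caller (a hypothetical design; a digest of the source
%% would make a smaller key).
cached_fun(Source, Cache) ->
    case maps:find(Source, Cache) of
        {ok, Fun} ->
            {Fun, Cache};
        error ->
            Fun = compile(Source),
            {Fun, maps:put(Source, Fun, Cache)}
    end.

%% Compile once, then reuse the cached fun for a second "document".
demo() ->
    Source = "fun(UserId, Allowed) -> lists:member(UserId, Allowed) end.",
    {Fun1, Cache1} = cached_fun(Source, #{}),
    {Fun2, _Cache2} = cached_fun(Source, Cache1),
    true = Fun1(<<"user-a">>, [<<"user-a">>, <<"user-b">>]),
    false = Fun2(<<"user-c">>, [<<"user-a">>]),
    ok.
```

With a cache like this, the per-document cost would be a map lookup plus the filter's own logic, so dead code in the filter source would only be paid for on the first document. Whether this matches where CouchDB actually spends its time is exactly the open question in this report.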
[1] https://github.com/medic/medic-webapp/blob/filter_experiments_wip/lib/filters.js#L16 (it is not necessary to understand what this and the subsequent code links do, only how they differ with respect to performance)
[2] https://github.com/medic/medic-webapp/blob/filter_experiments_wip/ddocs/erlang_filters/filters/doc_by_place_live.erl
[3] All performance measurements are over the same 10k documents
[4] https://github.com/medic/medic-webapp/blob/filter_experiments_wip/ddocs/erlang_filters/filters/doc_by_place_backup.erl
[5] https://github.com/medic/medic-webapp/blob/filter_experiments_wip/ddocs/erlang_filters/filters/doc_by_place.erl
[6] https://github.com/medic/medic-webapp/blob/filter_experiments_wip/ddocs/erlang_filters/filters/doc_by_place_backup_extra_long.erl
[7] https://github.com/medic/medic-webapp/blob/filter_experiments_wip/ddocs/erlang_filters/filters/doc_by_place_fast.erl