Currently, each parfor worker reads the shared inputs on its own. This causes redundant reads and is unnecessarily memory-inefficient.
This task aims to read shared inputs once per process and reuse them across threads. The most elegant way to handle this is to reuse the initially parsed symbol table entries (instances of matrix objects) for all variables except result variables. Sharing then happens automatically over the shared per-process buffer pool, similar to local parfor.
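
The idea can be illustrated with a minimal sketch. It is not the actual implementation: the class SharedInputCache, the method bind, the jobId key, and the parser callback are hypothetical names introduced here, and plain Object stands in for a matrix object. The sketch assumes the parser never returns null and that entries are cleared when the job ends.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-process cache of parsed symbol table entries. */
final class SharedInputCache {
    // One static map per JVM process, visible to all worker threads in it.
    private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    private SharedInputCache() {}

    /**
     * Builds the symbol table for one worker.
     * Shared inputs are parsed at most once per process and then reused;
     * result variables always get a fresh, worker-private instance.
     */
    static Map<String, Object> bind(String jobId,
                                    Map<String, String> serializedVars,
                                    Set<String> resultVars,
                                    Function<String, Object> parser) {
        Map<String, Object> symbols = new HashMap<>();
        for (Map.Entry<String, String> e : serializedVars.entrySet()) {
            String name = e.getKey();
            if (resultVars.contains(name)) {
                // Each worker writes its own result, so do not share it.
                symbols.put(name, parser.apply(e.getValue()));
            } else {
                // computeIfAbsent runs the parser at most once per key,
                // even when several threads ask for it concurrently.
                symbols.put(name, CACHE.computeIfAbsent(
                        jobId + "/" + name, k -> parser.apply(e.getValue())));
            }
        }
        return symbols;
    }

    /** Drops all cached entries, e.g. after the job has finished. */
    static void clear() {
        CACHE.clear();
    }
}
```

Because all workers in a process end up holding the same object for a shared input, a buffer pool keyed on that object would load the underlying data only once, which is the effect described above.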