Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The Setup function of a data layer opens the database (e.g., DataShard or LMDB) and reads a sample record, which is needed to set the data shape of the upper layers. Every data layer's Setup function is called when SINGA creates the NeuralNet object. If the group size is 128 and partitioning is on dimension 0, then 128 data layers are created, each holding an open database. Memory can be exhausted if each database object has a large cache (prefetch) size.
Although every process has the full NeuralNet object, i.e., all layers, each process has only a subset of workers, which run over a subset of the (data) layers. Consequently, in one process, only a small number of data layers call ComputeFeature to read data records.
To fix the bug, we close the database after reading one sample record in the Setup function and re-open it in the ComputeFeature function. In this way, only a small number of database instances are open in each process.
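
The pattern can be illustrated with a minimal sketch. It is written in Python for brevity and is not SINGA's actual code: `FakeStore`, `DataLayer`, `setup`, `compute_feature`, the path `"train_shard"`, the record shape, and the choice of 4 local layers are all hypothetical stand-ins chosen for illustration. Only the 128-layer count comes from the description above.

```python
class FakeStore:
    """Hypothetical stand-in for a DataShard/LMDB handle; counts open handles."""
    open_count = 0

    def __init__(self, path):
        self.path = path
        self.closed = False
        FakeStore.open_count += 1

    def read_next(self):
        # A real store would return the next record; a fixed one is enough here.
        return {"shape": (3, 32, 32)}

    def close(self):
        if not self.closed:
            self.closed = True
            FakeStore.open_count -= 1


class DataLayer:
    """Hypothetical data layer that opens its store lazily."""

    def __init__(self, path):
        self.path = path
        self.store = None          # no handle is kept after setup
        self.sample_shape = None

    def setup(self):
        # Open, read one sample record for the shape, then close immediately.
        store = FakeStore(self.path)
        try:
            self.sample_shape = store.read_next()["shape"]
        finally:
            store.close()

    def compute_feature(self):
        # Re-open on first use; only layers that actually run hold a handle.
        if self.store is None:
            self.store = FakeStore(self.path)
        return self.store.read_next()


# Every process builds all 128 data layers and calls setup on each of them.
layers = [DataLayer("train_shard") for _ in range(128)]
for layer in layers:
    layer.setup()
open_after_setup = FakeStore.open_count

# Assumption: the workers in this process run only 4 of the 128 layers.
local_layers = layers[:4]
for layer in local_layers:
    layer.compute_feature()
    layer.compute_feature()   # a second call reuses the already open handle
open_after_compute = FakeStore.open_count
```

In this sketch no store is open after setup, and only the layers that the local workers actually run hold an open store afterwards, while every layer still knows its sample shape.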