  1. Singa
  2. SINGA-47

Fix a bug in data layers that leads to out-of-memory when group size is too large


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None

    Description

      The Setup function of a data layer opens the database (e.g., DataShard or LMDB) and reads a sample record, which is needed to set the data shape of the upper layers. Every data layer's Setup function is called when SINGA creates the NeuralNet object. If the group size is 128 and partitioning is on dimension 0, then 128 data layers are created, each holding an open database. If the database object has a large cache (prefetch) size, these instances together can use up the memory.

      Every process has the full NeuralNet object, i.e., all layers, but each process has only a subset of the workers, which run over a subset of the (data) layers. Consequently, in one process, only a small number of data layers call ComputeFeature to read data records.

      To fix the bug, we close the database after reading one sample record in the Setup function and re-open it in the ComputeFeature function. In this way, only a small number of database instances are open in each process.
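
      The close-then-reopen pattern described above could look roughly like the following. This is a minimal sketch, not SINGA's actual code: FakeDb, DataLayerSketch, open_count, and sample_size are hypothetical names introduced for illustration, and FakeDb merely stands in for a DataShard or LMDB handle by counting how many instances are currently open.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a database handle such as DataShard or LMDB.
// It counts live instances so the effect of the fix can be observed.
class FakeDb {
 public:
  explicit FakeDb(const std::string& path) : path_(path) { ++open_count; }
  ~FakeDb() { --open_count; }
  FakeDb(const FakeDb&) = delete;
  FakeDb& operator=(const FakeDb&) = delete;

  // Returns a dummy record; a real store would read from disk.
  std::vector<float> Next() { return std::vector<float>(4, 0.0f); }

  static int open_count;  // number of FakeDb objects currently alive

 private:
  std::string path_;
};
int FakeDb::open_count = 0;

// Hypothetical data layer illustrating the fix (not the SINGA class).
class DataLayerSketch {
 public:
  explicit DataLayerSketch(std::string path) : path_(std::move(path)) {}

  // Called for every data layer when the net is created.
  // The database is opened only long enough to read one sample record.
  void Setup() {
    FakeDb db(path_);
    sample_size_ = db.Next().size();
  }  // db goes out of scope here, so the handle is closed

  // Called only by the workers that actually run this layer.
  // The database is re-opened lazily on the first call.
  void ComputeFeature() {
    if (!db_) db_.reset(new FakeDb(path_));
    assert(db_ != nullptr);
    record_ = db_->Next();
  }

  std::size_t sample_size() const { return sample_size_; }

 private:
  std::string path_;
  std::size_t sample_size_ = 0;
  std::unique_ptr<FakeDb> db_;  // stays empty until ComputeFeature runs
  std::vector<float> record_;
};
```

      With this structure, calling Setup on all 128 layers leaves no database open; a handle exists only for the layers on which a worker in the process calls ComputeFeature.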

          People

            wangwei.cs wangwei