  1. Apache MADlib
  2. MADLIB-1337

DL: Better warning and default for GPU memory fraction when number of GPUs < number of segments


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.16
    • Component/s: Deep Learning
    • Labels: None

    Description

      We support the use case where the number of GPUs is less than the number of segments. However, we noticed that this sometimes causes GPDB failures such as:

      could not connect to segment: initialization of segworker group failed
      
      1. We should give the user a meaningful warning that this configuration may or may not work, and include a recommendation.
      2. We should also come up with a better heuristic for the GPU memory fraction value. Currently we default to using 90% of the available memory and distribute it evenly among the segments.
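
      To make the current default concrete, here is a minimal sketch of the arithmetic, not MADlib's actual implementation. The function name, its parameters, and the warning text are hypothetical. It also assumes that segments are spread as evenly as possible across the GPUs, which the issue does not state.

```python
import math
import warnings

# The 90% default described in this issue.
TOTAL_GPU_MEMORY_FRACTION = 0.9


def per_segment_memory_fraction(num_gpus, num_segments,
                                total_fraction=TOTAL_GPU_MEMORY_FRACTION):
    """Hypothetical helper: share of one GPU's memory given to each segment."""
    if num_gpus < 1 or num_segments < 1:
        raise ValueError("num_gpus and num_segments must be positive")

    if num_gpus >= num_segments:
        # Every segment can have a GPU to itself, so no sharing is needed.
        return total_fraction

    # Assumption: segments are placed as evenly as possible, so the busiest
    # GPU hosts ceil(num_segments / num_gpus) segments.
    segments_per_gpu = math.ceil(num_segments / num_gpus)
    fraction = total_fraction / segments_per_gpu

    warnings.warn(
        "Number of GPUs (%d) is less than number of segments (%d). "
        "Up to %d segments will share one GPU, each limited to %.2f of its "
        "memory; this configuration may or may not work. Consider using as "
        "many GPUs as segments." % (num_gpus, num_segments,
                                    segments_per_gpu, fraction)
    )
    return fraction
```

      Under these assumptions, 2 GPUs and 4 segments give each segment 0.45 of a GPU's memory, and 1 GPU with 3 segments gives each roughly 0.30.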

      Possible recommendations:
      1. Use as many GPUs as segments (this may not be practical).
      2. A smaller buffer size may help: use the minibatch preprocessor for DL to pack fewer images per buffer (we need to test this before we recommend it).


          People

            Assignee: Unassigned
            Reporter: Nikhil (nikhilkak)