Apache MADlib / MADLIB-1368

Identify potential performance issues in modules using distributed by clause

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: v3.0.0
    • Component/s: Module: Graph
    • Labels: None

    Description

      Based on our findings in this JIRA, other modules may also suffer performance hits because of the way we currently use the distributed by clause. After going through the code, we noticed the following issues that we may want to explore:
      Graph modules:
      1. apsp.py_in: Does not misuse the distributed by clause, but we noticed that it creates an index for Postgres.
      2. sssp.py_in: Does not misuse the distributed by clause, but we noticed that it creates an index for Postgres. The JIRA tracking this and the previous issue is https://issues.apache.org/jira/browse/MADLIB-1369
      3. hits.py_in: Uses distributed by with grouping; must be changed.
      4. pagerank.py_in: Uses distributed by with grouping; must be changed.
      5. wcc.py_in: Uses distributed by with grouping; must be changed. The JIRA tracking this is https://issues.apache.org/jira/browse/MADLIB-1367
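
      The ticket does not spell out why distributing by the grouping columns is a problem. One common concern, assumed here rather than taken from the ticket, is data skew: with only a few distinct groups, most segments receive no rows. The sketch below simulates that effect in plain Python. The crc32-modulo placement, the segment count, and the row layout are all illustrative stand-ins and not Greenplum's actual hashing.

```python
import zlib
from collections import Counter


def rows_per_segment(keys, num_segments):
    """Count rows per segment when each row is placed by hash(key) % num_segments.

    zlib.crc32 is only a stand-in hash for illustration; the database's real
    distribution hash is different.
    """
    counts = Counter()
    for key in keys:
        segment = zlib.crc32(str(key).encode("utf-8")) % num_segments
        counts[segment] += 1
    return [counts[segment] for segment in range(num_segments)]


# Hypothetical setup: 8 segments, 10,000 rows, only 2 distinct groups.
num_segments = 8
rows = [(row_id, "g%d" % (row_id % 2)) for row_id in range(10000)]

# Placement by the grouping column: at most 2 segments can hold data.
by_group = rows_per_segment([grp for _, grp in rows], num_segments)

# Placement by a high-cardinality column (the row id) spreads rows out.
by_id = rows_per_segment([row_id for row_id, _ in rows], num_segments)

print("by group:", by_group)
print("by id:   ", by_id)
```

      If this assumption holds for hits, pagerank and wcc, the number of busy segments would be bounded by the number of groups rather than by the cluster size.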

      Non-Graph modules that use distributed by:
      1. logistic.py_in: This is the only module that uses the group iteration controller from group_control.py_in, which distributes the rel_state table based on the grouping columns. A possible fix is to remove the distributed by clause in group_control.py_in.
      2. path.py_in: A temporary table created in path is distributed on multiple columns; we must check whether that was intentional.
      3. encode_categorical.py_in: The output table creation query has a distributed by clause that uses the distribution key supplied by the user as an input parameter. What was the intention behind that optional parameter, or rather, what is the expected behavior for a given parameter value?
      4. bayes.py_in: There are a couple of distributed by clauses; check whether they were intentional.
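
      An audit like the one above could be partly automated. The following is a minimal sketch, not an existing MADlib tool: it scans source text for distributed by clauses and reports the line number and column list of each. The function name, the regular expression, and the sample query are hypothetical, and the pattern assumes the column list contains no nested parentheses.

```python
import re

# Matches "DISTRIBUTED BY (col1, col2)" in any letter case.
# Assumption: the column list has no nested parentheses.
DISTRIBUTED_BY = re.compile(r"distributed\s+by\s*\(([^)]*)\)", re.IGNORECASE)


def find_distributed_by(source):
    """Return a list of (line_number, [columns]) for each clause found."""
    hits = []
    for match in DISTRIBUTED_BY.finditer(source):
        # Line number is 1 plus the newlines that precede the match.
        line_number = source.count("\n", 0, match.start()) + 1
        columns = [col.strip() for col in match.group(1).split(",") if col.strip()]
        hits.append((line_number, columns))
    return hits


# Hypothetical query template in the style of a templated SQL string.
sample = '''
CREATE TABLE {out} AS
SELECT * FROM {src}
DISTRIBUTED BY ({grp}, id);
'''

for line_number, columns in find_distributed_by(sample):
    print(line_number, columns)
```

      Clauses that distribute on more than one column, or on a templated grouping placeholder, would be the first candidates for manual review.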

       

          People

            Assignee: Unassigned
            Reporter: Nandish Jayaram (njayaram)