  1. SystemDS
  2. SYSTEMDS-3430

Query interface over lineage traces

Details

    • Type: Story
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • None
    • SystemDS 3.3
    • None

    Description

      Build a query interface over a collection of text-based lineage traces from various ML workloads. A lineage trace is a serialized DAG of operations that contains no control flow. The task is to deserialize the traces into an in-memory format and answer queries about workload characteristics.

      The in-memory format could be tabular or semi-structured. The internal representation should preserve the structure of each DAG and the properties of its operators, so that a wide range of queries can be answered. One option is to represent each DAG by multiple tables: one for each operator with the corresponding attributes, and one that preserves the structure, with attributes that include the output nodes of each operator. These tables can be joined by IDs. Existing libraries (e.g. Pandas) can be used to define the query interface.
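
      A minimal sketch of the tabular idea, using only the Python standard library. The trace line format shown here, "(id) (type) opcode (input ids...)", is a simplified assumption for illustration and not necessarily the exact SystemDS serialization; the names parse_trace and outputs_of, the opcodes, and the column names are hypothetical. For brevity it keeps a single operator table rather than one table per operator, plus an edge table that preserves the DAG structure.

```python
import re

# Assumed, simplified trace line: "(id) (type) opcode (input_id) (input_id) ..."
LINE_RE = re.compile(r"^\((\d+)\)\s+\((\w)\)\s+(\S+)((?:\s+\(\d+\))*)\s*$")


def parse_trace(dag_id, text):
    """Deserialize one trace into two row lists: operators and edges."""
    operators, edges = [], []
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            raise ValueError(f"unrecognized trace line: {line!r}")
        op_id, op_type, opcode, inputs = m.groups()
        operators.append(
            {"dag_id": dag_id, "op_id": int(op_id), "type": op_type, "opcode": opcode}
        )
        # One edge row per input; "pos" keeps the argument order.
        for pos, src in enumerate(re.findall(r"\((\d+)\)", inputs)):
            edges.append(
                {"dag_id": dag_id, "src": int(src), "dst": int(op_id), "pos": pos}
            )
    return operators, edges


def outputs_of(edges, dag_id, op_id):
    """Output nodes of an operator, i.e. the operators that consume its result."""
    return sorted(
        e["dst"] for e in edges if e["dag_id"] == dag_id and e["src"] == op_id
    )


# Hypothetical four-node trace: two inputs, a matrix multiply, a non-linearity.
trace = """
(1) (C) X
(2) (C) W
(3) (I) ba+* (1) (2)
(4) (I) relu (3)
"""
ops, edges = parse_trace("dag_a", trace)
consumers_of_matmul = outputs_of(edges, "dag_a", 3)
```

      Both row lists carry dag_id and operator IDs, so they can be loaded into DataFrames or database tables and joined by ID as described above.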

      Example queries include:

      • Find all DAGs with a convolution operation that takes more than 20 ms to execute.
      • Compare the total number of operations between two DAGs.
      • Group DAGs by the type of non-linear operation used and calculate the average execution time for each group.
      • Compare the memory usage of the matrix multiplication between two DAGs.
      • Find similar DAGs on different datasets.
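
      Three of these queries could look roughly as follows. This is a sketch under stated assumptions: it uses an in-memory sqlite3 table as a stand-in for Pandas so that it runs with the standard library alone, the exec_ms column is a hypothetical per-operator runtime attribute that the traces would have to provide, and the opcodes and sample rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical operator table; exec_ms is an assumed attribute.
con.execute("CREATE TABLE ops (dag_id TEXT, op_id INTEGER, opcode TEXT, exec_ms REAL)")
rows = [
    ("dag_a", 1, "conv2d", 35.0),
    ("dag_a", 2, "relu", 2.0),
    ("dag_a", 3, "ba+*", 10.0),
    ("dag_b", 1, "conv2d", 12.0),
    ("dag_b", 2, "sigmoid", 4.0),
    ("dag_c", 1, "ba+*", 8.0),
    ("dag_c", 2, "relu", 3.0),
]
con.executemany("INSERT INTO ops VALUES (?, ?, ?, ?)", rows)

# Query 1: DAGs with a convolution that takes more than 20 ms.
slow_conv = [
    r[0]
    for r in con.execute(
        "SELECT DISTINCT dag_id FROM ops "
        "WHERE opcode = 'conv2d' AND exec_ms > 20 ORDER BY dag_id"
    )
]

# Query 2: total number of operations per DAG, for pairwise comparison.
op_counts = dict(con.execute("SELECT dag_id, COUNT(*) FROM ops GROUP BY dag_id"))

# Query 3: group DAGs by the non-linear operation they use and average
# the total execution time of the DAGs in each group.
avg_by_nonlinear = dict(
    con.execute(
        """
        SELECT n.opcode, AVG(t.total_ms)
        FROM (SELECT DISTINCT dag_id, opcode FROM ops
              WHERE opcode IN ('relu', 'sigmoid')) AS n
        JOIN (SELECT dag_id, SUM(exec_ms) AS total_ms
              FROM ops GROUP BY dag_id) AS t
          ON n.dag_id = t.dag_id
        GROUP BY n.opcode
        """
    )
)
```

      In Pandas, the same queries would map to boolean filtering, groupby with an aggregate, and merge on dag_id.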

          People

            Assignee: Unassigned
            Reporter: Arnab Phani