  Atlas / ATLAS-1207

Dataset exists query in lineage APIs takes longer


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8-incubating
    • Component/s: None
    • Labels: None

    Description

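      The hive_column type now extends DataSet. To check that a dataset exists, the lineage service uses the DSL query Dataset where _guid = <id>, which maps to the Gremlin query g.V().has(supertype, Dataset).has(_guid, <id>). The first filter is on the type, and since every hive_column vertex now matches it, it returns a large number of vertices before the GUID filter is applied, so the query is slow. Supertypes is a list (multi-valued) property, so it is not clear whether adding a composite index on it would help. Instead, the check can be done directly as a graph query that filters on the GUID first: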

      // Filter on the GUID first, then check that the vertex has the DataSet supertype
      titanGraph.query().has(Constants.GUID_PROPERTY_KEY, guid)
                                .has(Constants.SUPER_TYPES_PROPERTY_KEY, AtlasClient.DATA_SET_SUPER_TYPE)
                                .vertices()
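      Why the filter order matters can be sketched with a small in-memory model. Everything in it is hypothetical: DataSetLookupSketch, Graph and Vertex are not Atlas or Titan classes, a HashMap stands in for a unique index on the GUID property, and it assumes every hive_column vertex carries DataSet in its supertypes list. The lastCandidates counter records how many vertices the second filter has to examine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model, not Atlas or Titan code: compares the two filter orders.
public class DataSetLookupSketch {

    // Stand-in for a graph vertex: a unique GUID plus a multi-valued supertypes list.
    static final class Vertex {
        final String guid;
        final List<String> superTypes;

        Vertex(String guid, List<String> superTypes) {
            this.guid = guid;
            this.superTypes = superTypes;
        }
    }

    static final class Graph {
        private final List<Vertex> vertices = new ArrayList<>();
        // Stand-in for a unique index on the GUID property.
        private final Map<String, Vertex> guidIndex = new HashMap<>();
        // Number of vertices the second filter had to examine in the last lookup.
        int lastCandidates;

        void add(Vertex v) {
            vertices.add(v);
            guidIndex.put(v.guid, v);
        }

        // Supertype filter first: every DataSet vertex becomes a candidate for the GUID check.
        Optional<Vertex> findBySuperTypeThenGuid(String superType, String guid) {
            lastCandidates = 0;
            Vertex match = null;
            for (Vertex v : vertices) {
                if (!v.superTypes.contains(superType)) {
                    continue;
                }
                lastCandidates++;
                if (v.guid.equals(guid)) {
                    match = v;
                }
            }
            return Optional.ofNullable(match);
        }

        // GUID filter first: the index yields at most one candidate for the supertype check.
        Optional<Vertex> findByGuidThenSuperType(String guid, String superType) {
            Vertex v = guidIndex.get(guid);
            lastCandidates = (v == null) ? 0 : 1;
            if (v != null && v.superTypes.contains(superType)) {
                return Optional.of(v);
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Graph graph = new Graph();
        // Hypothetical data: one table and many columns, all carrying the DataSet supertype.
        graph.add(new Vertex("table-1", Arrays.asList("DataSet")));
        for (int i = 0; i < 10_000; i++) {
            graph.add(new Vertex("column-" + i, Arrays.asList("DataSet")));
        }
        graph.add(new Vertex("process-1", Arrays.asList("Process")));

        boolean found = graph.findBySuperTypeThenGuid("DataSet", "table-1").isPresent();
        System.out.println("supertype first: found=" + found + ", candidates=" + graph.lastCandidates);
        found = graph.findByGuidThenSuperType("table-1", "DataSet").isPresent();
        System.out.println("GUID first: found=" + found + ", candidates=" + graph.lastCandidates);
    }
}
```

      In this model both orders find the table, but the supertype-first order examines all 10001 DataSet vertices while the GUID-first order examines one, so reordering the filters avoids the scan without needing an index on the supertypes list.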
      

      Thanks to ssainath for helping to test this.

      Attachments

        1. ATLAS-1207.patch (7 kB, Shwetha GS)
        2. ATLAS-1207-v2.patch (18 kB, Shwetha GS)


          People

            Shwetha GS (shwethags)
            Sharmadha S (sharmadhas)
            Votes: 0
            Watchers: 2
