  1. Kylin
  2. KYLIN-1506

Refactor the resource interface for time-series data such as jobs to get much better performance

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Resolved
    • v1.3.0, v1.4.0, v1.5.0
    • None
    • None

    Description

      Problem

      Currently, operations such as getJobOutputs and getJobs scan the store twice to build a response. Each call always does the following:
      1. List the keys, sort them, and take the first and last key (which in effect is just a prefix filter) with "store.listResources(resourcePath)"
      2. Re-scan that key range with a timestamp filter: "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"

      public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
          try {
              // Scan 1: list every key under the root just to find the first and last one.
              NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
              if (resources == null || resources.isEmpty()) {
                  return Collections.emptyList();
              }
              // Collections.sort(resources);
              String rangeStart = resources.first();
              String rangeEnd = resources.last();
              // Scan 2: read the same key range again, this time with the timestamp filter.
              return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
          } catch (IOException e) {
              logger.error("error get all Jobs:", e);
              throw new PersistentException(e);
          }
      }
      

      Solution

      In fact, the two scans can simply be combined into one that takes the resource path directly:

      store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
      store.getAllResources(resourcePath, Class, Serializer)
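
To illustrate how a single pass can apply both the path prefix and the time range, here is a minimal sketch over an in-memory sorted map. It is not Kylin's ResourceStore: the class `SinglePassScanSketch`, the `Resource` record, and the half-open time range (start inclusive, end exclusive) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of a key-sorted resource store (hypothetical, not Kylin code). */
final class SinglePassScanSketch {

    /** Hypothetical stored record: a payload plus its last-modified time in millis. */
    static final class Resource {
        final String payload;
        final long lastModified;

        Resource(String payload, long lastModified) {
            this.payload = payload;
            this.lastModified = lastModified;
        }
    }

    // Keys are kept in lexicographic order, so all keys sharing a prefix are contiguous.
    private final TreeMap<String, Resource> rows = new TreeMap<>();

    void put(String path, String payload, long lastModified) {
        rows.put(path, new Resource(payload, lastModified));
    }

    /**
     * One pass: start at the folder prefix, stop at the first key outside it,
     * and keep only entries whose timestamp falls in [timeStart, timeEndExclusive).
     */
    List<String> getAllResources(String folderPath, long timeStart, long timeEndExclusive) {
        String prefix = folderPath.endsWith("/") ? folderPath : folderPath + "/";
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Resource> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the folder; no later key can match the prefix
            }
            long ts = e.getValue().lastModified;
            if (ts >= timeStart && ts < timeEndExclusive) {
                result.add(e.getValue().payload);
            }
        }
        return result;
    }

    /** Variant without a time filter (a timestamp of exactly Long.MAX_VALUE would be skipped). */
    List<String> getAllResources(String folderPath) {
        return getAllResources(folderPath, Long.MIN_VALUE, Long.MAX_VALUE);
    }
}
```

Because the keys are sorted, the prefix match and the time filter are both evaluated while walking the folder once, so no separate key listing is needed to find the range boundaries.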
      

      For example, "List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis)" can be refactored as follows:

      public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
          try {
              return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
          } catch (IOException e) {
              logger.error("error get all Jobs:", e);
              throw new PersistentException(e);
          }
      }
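
The difference between the two versions can be sketched with a toy store that counts how many rows each approach touches. Everything here is hypothetical (the class `ScanCountSketch`, the `rowsVisited` counter, the half-open time range), and a real backing store may push the time filter to the server, so the counts only illustrate the shape of the saving and are not a benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy store that counts visited rows (hypothetical, not Kylin code). */
final class ScanCountSketch {
    // key -> last-modified millis; stands in for the backing table
    private final TreeMap<String, Long> rows = new TreeMap<>();
    int rowsVisited = 0;

    void put(String key, long lastModified) {
        rows.put(key, lastModified);
    }

    /** Old step 1: list every key under the folder. */
    NavigableSet<String> listResources(String folder) {
        String prefix = folder + "/";
        TreeSet<String> keys = new TreeSet<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break;
            rowsVisited++;
            keys.add(key);
        }
        return keys;
    }

    /** Old step 2: re-scan [startKey, endKey] with the time filter. */
    List<String> getAllResources(String startKey, String endKey, long timeStart, long timeEndExclusive) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : rows.subMap(startKey, true, endKey, true).entrySet()) {
            rowsVisited++;
            if (e.getValue() >= timeStart && e.getValue() < timeEndExclusive) out.add(e.getKey());
        }
        return out;
    }

    /** The two-step pattern from the original method. */
    List<String> twoPass(String folder, long timeStart, long timeEndExclusive) {
        NavigableSet<String> keys = listResources(folder);
        if (keys.isEmpty()) return Collections.emptyList();
        return getAllResources(keys.first(), keys.last(), timeStart, timeEndExclusive);
    }

    /** Combined version: prefix match and time filter in one walk. */
    List<String> getAllResources(String folder, long timeStart, long timeEndExclusive) {
        String prefix = folder + "/";
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            rowsVisited++;
            if (e.getValue() >= timeStart && e.getValue() < timeEndExclusive) out.add(e.getKey());
        }
        return out;
    }
}
```

In this model both approaches return the same keys, but the two-step version visits every row in the folder twice, while the combined version visits each row once.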
      

            People

              haoch Hao Chen
              haoch Hao Chen