  1. Apache Gearpump
  2. GEARPUMP-63

Gearpump Storage framework


Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      Imported from https://github.com/gearpump/gearpump/issues/1197 on behalf of whjiang. His original proposal:

      In general, a Gearpump application requires the following storage support:

      1. Jar-file storage to store the application jar file(s).
      2. Application logs. Currently logs are stored on each node, which makes application log analysis difficult.
      3. Application metrics.
      4. Application configuration.
      5. Data source offset store (for at-least-once semantics of streaming applications).
      6. Application state checkpoint store (for transactional semantics).

      The general idea is:

      1. Provide a storage system that satisfies the above requirements.
      2. Assume this storage is highly available; that is, it is the user's responsibility to provide such a storage system. For testing, a non-HA storage system is acceptable, but production deployments should use an HA one.
      3. Isolate usage from implementation. That is, Gearpump does not depend on Hadoop-common, HDFS, or any other specific implementation to provide this storage. Users are free to supply their own storage implementation.
      4. This functionality is provided by the daemon and can be used by every Gearpump application.
      5. The storage shall provide data retention and access control.
      6. The storage provides a set of purpose-specific APIs that meet the above requirements, rather than a single low-level API.
      7. Users can override the system setting to provide a dedicated implementation for a particular sub-storage system, e.g. the checkpoint store.
      8. Akka replication shall store only minimal information for an application and leave the majority to this storage system. In other words, Akka replication acts more like a seed for this storage system.
      9. In a release, each storage implementation (e.g. storage-hdfs) is a standalone module/artifact.
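
      Points 3 and 7 above could be realized along the following lines. This is only a minimal sketch under stated assumptions: every name in it (SketchCheckpointStore, InMemoryCheckpointStore, StorageRegistry, the "memory" key) is hypothetical and is not part of the proposal or of any existing Gearpump API. It shows callers depending on a trait only, with the concrete implementation chosen by a system-wide default name that an application may override.

```scala
import scala.collection.mutable

// Hypothetical sub-storage interface; callers depend on this trait only.
trait SketchCheckpointStore {
  def save(timestamp: Long, state: Array[Byte]): Unit
  def load(timestamp: Long): Option[Array[Byte]]
}

// Non-HA, in-memory implementation; acceptable for tests only.
final class InMemoryCheckpointStore extends SketchCheckpointStore {
  private val data = mutable.Map.empty[Long, Array[Byte]]

  def save(timestamp: Long, state: Array[Byte]): Unit =
    synchronized { data.update(timestamp, state) }

  def load(timestamp: Long): Option[Array[Byte]] =
    synchronized { data.get(timestamp) }
}

// Hypothetical registry mapping an implementation name to a factory.
// A separate module (e.g. an HDFS-backed one) would register itself here.
object StorageRegistry {
  private val factories = mutable.Map[String, () => SketchCheckpointStore](
    "memory" -> (() => new InMemoryCheckpointStore)
  )

  def register(name: String, factory: () => SketchCheckpointStore): Unit =
    synchronized { factories.update(name, factory) }

  // The system default applies unless the application supplies an override.
  def checkpointStore(systemDefault: String,
                      appOverride: Option[String] = None): SketchCheckpointStore = {
    val name = appOverride.getOrElse(systemDefault)
    val factory = synchronized {
      factories.getOrElse(
        name,
        throw new IllegalArgumentException(s"Unknown storage implementation: $name"))
    }
    factory()
  }
}
```

      With such a registry, an application would call StorageRegistry.checkpointStore("memory") to get the system default, or pass Some("custom") as the override after a custom factory has been registered under that name.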

      A draft of the storage API looks like this (very preliminary and subject to change):

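      import scala.util.Try

      // AppName, AppId, ProcessorId, UserConfig, AppMetricsStore and LogAppender
      // are referenced by this draft but not defined in it.
      trait Storage {
          // Creates the storage for an application.
          def createAppStorage(appName: AppName, appId: AppId): AppStorage
          // Returns the storage of an application, or None if there is none.
          def getAppStorage(appId: AppId): Option[AppStorage]
      }

      trait AppStorage {
          def open(): Unit
          def close(): Unit
          def getJarStore: JarStore
          def getMetricsStore: AppMetricsStore
          // Type parameters added so that the result matches KVStore[K, V] below.
          def getKVStore[K, V]: KVStore[K, V]
          def getLogAppender: LogAppender
          def getConfiguration(processorId: ProcessorId): UserConfig
          def setConfiguration(processorId: ProcessorId, config: UserConfig): Unit
      }

      trait JarStore {
          // Paths are shown as String for illustration; the draft leaves the type open.
          def copyFromLocal(localPath: String, remotePath: String): Unit
          def copyToLocal(remotePath: String, localPath: String): Unit
      }

      // Assumes K is sortable.
      trait KVStore[K, V] {
          def append(key: K, value: V): Unit
          def read(key: K): Try[Option[V]]
      }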
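
      To make the KVStore draft concrete, here is a minimal sketch of a non-HA, in-memory implementation of the kind the proposal allows for testing. The class name InMemoryKVStore and the floor method are assumptions added for illustration and are not part of the draft; floor is one possible way to exploit the "K is sortable" assumption, e.g. finding the latest stored source offset at or before a given timestamp.

```scala
import scala.util.Try

// The draft trait, with parameter types filled in.
trait KVStore[K, V] {
  def append(key: K, value: V): Unit
  def read(key: K): Try[Option[V]]
}

// Hypothetical in-memory store; data is lost on restart, so it is for tests only.
// The implicit Ordering[K] expresses the "K is sortable" assumption.
final class InMemoryKVStore[K, V](implicit ord: Ordering[K]) extends KVStore[K, V] {
  // scala.math.Ordering extends java.util.Comparator, so it can order the map.
  private val entries = new java.util.TreeMap[K, V](ord)

  def append(key: K, value: V): Unit = synchronized {
    entries.put(key, value)
    ()
  }

  // Exact-key lookup; a missing key yields Success(None).
  def read(key: K): Try[Option[V]] =
    Try(synchronized { Option(entries.get(key)) })

  // Assumed extension, not in the draft: the entry with the greatest key <= `key`.
  def floor(key: K): Option[(K, V)] = synchronized {
    Option(entries.floorEntry(key)).map(e => (e.getKey, e.getValue))
  }
}
```

      Used as an offset store keyed by timestamp, after append(1000L, 42L) and append(2000L, 97L), read(1000L) returns Success(Some(42)) and floor(1500L) returns Some((1000, 42)).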
      


            People

              whjiang Weihua Jiang
              mauzhang Manu Zhang