REEF (Retired) / REEF-580

Add a Block Management Service to REEF

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: REEF-IO, REEF.NET IO
    • Labels: None

    Description

      We propose the addition of a data Block Management service to REEF. The Block Manager manages the transient data of a Big Data application. The Block Manager assumes that transient data can be managed in the following hierarchy:

      • Data Set: A data set consists of a set of (physical) partitions. For instance, a folder on HDFS could be considered a data set, while its files constitute the partitions.
      • Partition: A physical partition of a data set. In the example above, it would be a file. Partitions consist of Blocks.
      • Block: The atomic unit of data management. Each Block belongs to exactly one Partition. Blocks are immutable. Blocks can be stored in Evaluator memory, on local disk, or in stable, distributed storage. Blocks can have replicas across these storage tiers. Blocks contain data of arbitrary format. From the perspective of this Block Management service, they are large, fixed-size byte arrays.
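      The hierarchy above could be modeled roughly as follows. This is a minimal sketch, not a proposed API: the class names, field names, and the tiny `BLOCK_SIZE` value are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict

BLOCK_SIZE = 8  # hypothetical fixed block size in bytes, kept tiny for illustration


@dataclass(frozen=True)
class Block:
    """Immutable, fixed-size byte array that belongs to exactly one partition."""
    block_id: str
    partition_id: str
    data: bytes  # bytes objects are themselves immutable

    def __post_init__(self) -> None:
        if len(self.data) != BLOCK_SIZE:
            raise ValueError(f"a block must hold exactly {BLOCK_SIZE} bytes")


@dataclass
class Partition:
    """Physical partition of a data set, e.g. one file in an HDFS folder."""
    partition_id: str
    data_set_id: str
    blocks: Dict[str, Block] = field(default_factory=dict)

    def add_block(self, block: Block) -> None:
        if block.partition_id != self.partition_id:
            raise ValueError("block belongs to a different partition")
        self.blocks[block.block_id] = block


@dataclass
class DataSet:
    """Set of partitions, e.g. an HDFS folder."""
    data_set_id: str
    partitions: Dict[str, Partition] = field(default_factory=dict)

    def add_partition(self, partition: Partition) -> None:
        if partition.data_set_id != self.data_set_id:
            raise ValueError("partition belongs to a different data set")
        self.partitions[partition.partition_id] = partition


# Usage: one data set with one partition holding one zero-filled block.
data_set = DataSet("ds-0")
partition = Partition("part-0", data_set.data_set_id)
data_set.add_partition(partition)
partition.add_block(Block("blk-0", "part-0", bytes(BLOCK_SIZE)))
```

      Making `Block` a frozen dataclass over `bytes` is one way to express immutability; a real implementation would more likely wrap off-heap or pooled buffers.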

      The purpose of the Block Manager is to manage the metadata and movement of data sets organized in this hierarchy. To facilitate that, each Block, Partition and Data Set has a unique ID.

      On the Task side, the Block Manager facilitates the retrieval of and access to any Block or Partition by their ID. Specific access methods are yet to be designed (e.g. whether or not there is an order to the blocks). Also, new Blocks can be created on the Task side for a given Partition. Special consideration shall be given to the memory allocation efficiency of this operation.
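      A Task-side facade along these lines might look like the sketch below. Everything here is hypothetical: the class `TaskBlockManager`, its method names, the zero-padding of short payloads, and returning blocks in creation order are assumptions, since the actual access methods are still to be designed.

```python
import uuid
from typing import Dict, List


class TaskBlockManager:
    """Hypothetical Task-side facade; names and behavior are illustrative."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self._blocks: Dict[str, memoryview] = {}     # block ID -> read-only view
        self._partitions: Dict[str, List[str]] = {}  # partition ID -> block IDs

    def create_block(self, partition_id: str, payload: bytes) -> str:
        """Create a new immutable block in the given partition and return its ID."""
        if len(payload) > self.block_size:
            raise ValueError("payload exceeds the fixed block size")
        # Allocate the fixed-size buffer once and copy the payload in place;
        # unused trailing bytes stay zero (an assumption of this sketch).
        buffer = bytearray(self.block_size)
        buffer[:len(payload)] = payload
        block_id = str(uuid.uuid4())
        # Hand out only a read-only view so callers cannot mutate the block.
        self._blocks[block_id] = memoryview(buffer).toreadonly()
        self._partitions.setdefault(partition_id, []).append(block_id)
        return block_id

    def get_block(self, block_id: str) -> memoryview:
        """Return a read-only view of the block with the given ID."""
        return self._blocks[block_id]

    def get_partition(self, partition_id: str) -> List[memoryview]:
        """Return all locally known blocks of a partition, in creation order."""
        return [self._blocks[b] for b in self._partitions[partition_id]]


# Usage: create a block, then read it back by ID and by partition.
manager = TaskBlockManager(block_size=16)
new_id = manager.create_block("part-0", b"abc")
view = manager.get_block(new_id)
```

      Allocating one buffer per block and exposing it through `memoryview` avoids extra copies on reads, which is one possible answer to the allocation-efficiency concern; buffer pooling would be another.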

      On the Driver side, the Block Manager keeps track of the metadata of all Blocks. It provides a network protocol used by the Task side components to retrieve and update metadata records. Metadata can be kept in memory or, in a later version, in stable storage such as a SQL database.
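      As a rough illustration of the in-memory variant, the Driver could hold one metadata record per Block that lists where its replicas live. The names `Tier`, `BlockMetadata`, and `DriverMetadataStore`, and the idea of identifying a replica by a location string plus tier, are assumptions of this sketch and not part of the proposal.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Tier(Enum):
    """Storage tiers named in the description."""
    MEMORY = "memory"
    LOCAL_DISK = "local_disk"
    STABLE_STORAGE = "stable_storage"


@dataclass
class BlockMetadata:
    block_id: str
    partition_id: str
    # Each replica is a (location, tier) pair, e.g. ("evaluator-1", Tier.MEMORY).
    replicas: Set[Tuple[str, Tier]] = field(default_factory=set)


class DriverMetadataStore:
    """Hypothetical in-memory metadata store kept by the Driver."""

    def __init__(self) -> None:
        self._records: Dict[str, BlockMetadata] = {}

    def add_replica(self, block_id: str, partition_id: str,
                    location: str, tier: Tier) -> BlockMetadata:
        """Record that a replica of the block exists at the given location."""
        record = self._records.setdefault(
            block_id, BlockMetadata(block_id, partition_id))
        if record.partition_id != partition_id:
            raise ValueError("a block belongs to exactly one partition")
        record.replicas.add((location, tier))
        return record

    def remove_replica(self, block_id: str, location: str, tier: Tier) -> None:
        """Forget a replica, e.g. after an Evaluator evicts it from memory."""
        record = self._records.get(block_id)
        if record is not None:
            record.replicas.discard((location, tier))

    def lookup(self, block_id: str) -> Optional[BlockMetadata]:
        """Return the metadata record for a block, or None if unknown."""
        return self._records.get(block_id)


# Usage: the same block is replicated in memory and on stable storage.
store = DriverMetadataStore()
store.add_replica("blk-0", "part-0", "evaluator-1", Tier.MEMORY)
store.add_replica("blk-0", "part-0", "hdfs", Tier.STABLE_STORAGE)
```

      Swapping the dictionary for a database-backed store later would only require keeping the three methods' contracts.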

      The Block Management service shall be built in a language- and platform-agnostic manner. At the very least, the Driver side network protocol needs to be accessible by both JVM and CLR implementations of the Task side. REST could be an appropriate approach.
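      If a REST-style protocol were chosen, metadata records could travel as plain JSON documents, which both JVM and CLR clients can parse with standard libraries. The sketch below shows one possible encoding; the field names (`blockId`, `partitionId`, `replicas`) and both function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

REQUIRED_FIELDS = {"blockId", "partitionId", "replicas"}  # assumed schema


def encode_record(block_id: str, partition_id: str,
                  replicas: List[Tuple[str, str]]) -> str:
    """Serialize a block metadata record to a JSON string."""
    document = {
        "blockId": block_id,
        "partitionId": partition_id,
        "replicas": [{"location": loc, "tier": tier} for loc, tier in replicas],
    }
    # Sorted keys give a stable representation across implementations.
    return json.dumps(document, sort_keys=True)


def decode_record(text: str) -> Dict[str, Any]:
    """Parse a JSON metadata record and check that required fields exist."""
    document = json.loads(text)
    missing = REQUIRED_FIELDS - document.keys()
    if missing:
        raise ValueError(f"metadata record lacks fields: {sorted(missing)}")
    return document


# Usage: round-trip a record as it might be sent between Task and Driver.
wire = encode_record("blk-0", "part-0", [("evaluator-1", "memory")])
record = decode_record(wire)
```

      A text format like this trades some efficiency for interoperability; only metadata would use it, while the Block payloads themselves stay opaque byte arrays.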

            People

              Assignee: Unassigned
              Reporter: Markus Weimer (markus.weimer)
              Votes: 0
              Watchers: 5