GEODE-10

HDFS Integration


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None

    Description

      The ability to persist data on HDFS had been under development for GemFire and was part of the latest code drop, GEODE-8. As part of this feature, we are proposing some changes to the HdfsStore management API (see the attached doc for details).

      1. The current API has nested configuration for compaction and the async queue. This nested structure forces the user to execute multiple steps to manage a store. It also does not seem to be consistent with other management APIs.
      2. Some member names in the current API are confusing.
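
      The first point can be illustrated with a minimal sketch. Every type and member name below (`NestedStoreConfig`, `FlatStoreConfig`, `setCompactionMaxThreads`, and so on) is hypothetical and is not taken from the HdfsStore API or the attached doc; the sketch also assumes that the proposal moves toward a flat set of attributes, which the description implies but does not state.

```java
// Hypothetical types for illustration only; these are NOT the Geode HdfsStore API.

// Nested style: compaction and queue settings live in separate sub-objects.
final class CompactionConfig {
    int maxThreads = 10;
}

final class QueueConfig {
    int batchSizeMb = 32;
}

final class NestedStoreConfig {
    final CompactionConfig compaction = new CompactionConfig();
    final QueueConfig queue = new QueueConfig();
}

// Flat style: every attribute is set directly on one object.
final class FlatStoreConfig {
    private int compactionMaxThreads = 10;
    private int queueBatchSizeMb = 32;

    FlatStoreConfig setCompactionMaxThreads(int n) {
        this.compactionMaxThreads = n;
        return this;
    }

    FlatStoreConfig setQueueBatchSizeMb(int mb) {
        this.queueBatchSizeMb = mb;
        return this;
    }

    int getCompactionMaxThreads() { return compactionMaxThreads; }

    int getQueueBatchSizeMb() { return queueBatchSizeMb; }
}

final class StoreConfigDemo {
    public static void main(String[] args) {
        // Nested: the caller has to reach into each sub-object in a separate step.
        NestedStoreConfig nested = new NestedStoreConfig();
        nested.compaction.maxThreads = 4;
        nested.queue.batchSizeMb = 64;

        // Flat: a single chained expression on one object.
        FlatStoreConfig flat = new FlatStoreConfig()
                .setCompactionMaxThreads(4)
                .setQueueBatchSizeMb(64);

        System.out.println(nested.compaction.maxThreads + " " + nested.queue.batchSizeMb);
        System.out.println(flat.getCompactionMaxThreads() + " " + flat.getQueueBatchSizeMb());
    }
}
```

      In the flat variant, a management command could map each option to one setter, which is one plausible reading of "consistent with other management APIs".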

      HDFS Integration: Geode as a transactional layer that microbatches data out to Hadoop. This capability makes Geode a NoSQL store that can sit on top of Hadoop and parallelize the process of moving data from the in-memory tier into Hadoop, making it very useful for capturing and processing fast data while making it available to Hadoop jobs relatively quickly. The key requirements being met here are:

      1. Ingest data into HDFS in parallel
      2. Cache bloom filters and allow fast lookups of individual elements
      3. Have programmable policies for deciding what stays in memory
      4. Roll files in HDFS
      5. Index data that is in memory
      6. Have expiration policies that allow the transactional set to decay out older data
      7. Support replicated and partitioned regions
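
      The micro-batching idea described above can be sketched in a few lines. This is an illustration only, not the Geode implementation: `MicroBatcher`, its `batchSize` parameter, and the `sink` callback are hypothetical names, and the sink stands in for whatever component would actually write a batch to HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical micro-batcher: buffers in-memory events and hands them
// to a sink in fixed-size batches. Not part of any Geode API.
final class MicroBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    MicroBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffer one event; flush automatically once the batch is full.
    synchronized void add(T event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hand a copy of the buffered events to the sink and clear the buffer.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}

final class MicroBatcherDemo {
    public static void main(String[] args) {
        // The sink here just prints; a real sink would append the batch to a file.
        MicroBatcher<String> batcher =
                new MicroBatcher<>(2, batch -> System.out.println("flushing " + batch));
        batcher.add("k1=v1");
        batcher.add("k2=v2"); // second event fills the batch and triggers a flush
        batcher.add("k3=v3");
        batcher.flush();      // flush the partial batch explicitly
    }
}
```

      Parallel ingest (requirement 1) could then be approximated by running one such batcher per partition, though how the feature actually distributes the work is not described in this issue.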

      Attachments

        Issue Links

          1. Re-introduce the HDFS unit tests that were disabled before code-drop (GEODE-8) due to failures (Sub-task; Resolved; Unassigned)
          2. RegionWithHDFSPersistenceBasicDUnitTest.testPUTDMLBulkSupport fails intermittently with suspect string (Sub-task; Resolved; Unassigned)
          3. HDFSStore API should be more consistent (Sub-task; Resolved; Unassigned)
          4. RegionWithHDFSPersistenceBasicDUnitTest.testGlobalDestroyFromAccessor fails intermittently (Sub-task; Resolved; Unassigned)
          5. Hadoop integration tests fail under Windows (Sub-task; Resolved; Unassigned)
          6. Failure from RegionWithHDFSOffHeapBasicDUnitTest.testWObasicClose in nightly build (Sub-task; Resolved; Unassigned)
          7. CI failure: RegionWithHDFSBasicDUnitTest.testPUTDMLBulkSupport (Sub-task; Resolved; Unassigned)
          8. CI failure: ColocatedRegionWithHDFSDUnitTest.testGetFromHDFS (Sub-task; Resolved; Unassigned)
          9. CI failure: RegionWithHDFSBasicDUnitTest.testPUTDMLSupport (Sub-task; Resolved; Unassigned)
          10. CI failure: RegionWithHDFSOffHeapBasicDUnitTest.testGetFromHDFS (Sub-task; Resolved; Unassigned)
          11. CI failure: RegionWithHDFSOffHeapBasicDUnitTest.testByteArrays (Sub-task; Resolved; Unassigned)
          12. CI failure: RegionWithHDFSBasicDUnitTest.testGetFromHDFS (Sub-task; Resolved; Unassigned)
          13. CI failure: RegionWithHDFSBasicDUnitTest.testByteArrays (Sub-task; Resolved; Unassigned)
          14. CI failure: RegionWithHDFSOffHeapBasicDUnitTest.testGlobalDestroyWithHDFSData (Sub-task; Resolved; Unassigned)
          15. CI failure: RegionWithHDFSPersistenceBasicDUnitTest.testGetFromHDFS (Sub-task; Resolved; Unassigned)
          16. Failure from RegionWithHDFSPersistenceBasicDUnitTest.testGlobalDestroyWithQueueData (Sub-task; Resolved; Unassigned)
          17. HDFSConfigJUnitTest testHdfsStoreInvalidCompactionConf expected exception logic is incorrect (Sub-task; Closed; Unassigned)

            People

              Assignee: Unassigned
              Reporter: Dan Smith (upthewaterspout)
              Votes: 0
              Watchers: 8
