Hadoop HDFS / HDFS-8707

Implement an async pure C++ HDFS client

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0
    • Component/s: hdfs-client
    • Labels: None

      Description

      As part of working on the C++ ORC reader at ORC-3, we need a pure C++ HDFS client that lets us do async I/O to HDFS. We want to start from the code that Haohui has been working on at https://github.com/haohui/libhdfspp.

          Sub-Tasks

          1.
          Integrate the build infrastructure with hdfs-client Sub-task Resolved Haohui Mai
          2.
          Import third_party libraries into the repository Sub-task Resolved Haohui Mai
          3.
          Use std::chrono to implement the timer in the asio library Sub-task Resolved Haohui Mai
          4.
          Initial implementation of a Hadoop RPC v9 client Sub-task Resolved Haohui Mai
          5.
          Use Doxygen to generate documents for libhdfspp Sub-task Resolved Haohui Mai
          6.
          Implement the continuation library for libhdfspp Sub-task Resolved Haohui Mai
          7.
          Implement remote block reader in libhdfspp Sub-task Resolved Haohui Mai
          8.
          Generate Hadoop RPC stubs from protobuf definitions Sub-task Resolved Haohui Mai
          9.
          Implement a libhdfs(3) compatible API Sub-task Resolved James Clampffer
          10.
          Implement FileSystem and InputStream API for libhdfspp Sub-task Resolved Haohui Mai
          11.
          SASL support for data transfer protocol in libhdfspp Sub-task Resolved Haohui Mai
          12.
          Implement unit tests for remote block reader in libhdfspp Sub-task Resolved Haohui Mai
          13.
          InputStream.PositionRead() should be aware of available DNs Sub-task Resolved Haohui Mai
          14.
          Fix compilation issues on Arch Linux Sub-task Resolved Owen O'Malley
          15.
          Initialize protobuf fields in RemoteBlockReaderTest Sub-task Resolved Haohui Mai
          16.
          RPC client should fail gracefully when the connection is timed out or reset Sub-task Resolved Haohui Mai
          17.
          Retry reads on DN failure Sub-task Resolved James Clampffer
          18.
          InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers Sub-task Resolved Haohui Mai
          19.
          Suppress false positives from Valgrind on uninitialized variables in tests Sub-task Resolved Haohui Mai
          20.
          Config file reader / options classes for libhdfs++ Sub-task Resolved Bob Hansen
          21.
          Add logging system for libhdfs++ Sub-task Resolved James Clampffer
          22.
          Move the implementation to the hdfs-native-client module Sub-task Resolved Haohui Mai
          23.
          Refactor libhdfs into stateful/ephemeral objects Sub-task Resolved Bob Hansen
          24.
          Simplify embedding libhdfspp into other projects Sub-task Resolved James Clampffer
          25.
          Add valgrind suppression for statically initialized library objects Sub-task Resolved James Clampffer
          26.
          libhdfs++ should respect NN retry configuration settings Sub-task Resolved Bob Hansen
          27.
          InputStreamImpl should hold a shared_ptr of the BlockReader Sub-task Resolved James Clampffer
          28.
          Implement basic NN operations Sub-task Resolved Anatoli Shein
          29.
          Implement a unix-like cat utility Sub-task Resolved James Clampffer
          30.
          Import RapidXML 1.13 for libhdfspp Sub-task Resolved Bob Hansen
          31.
          libhdfspp should use sizeof(int32_t) instead of sizeof(int) when parsing data Sub-task Resolved James Clampffer
          32.
          Allow the location of hadoop source tree resources to be passed to CMake during a build. Sub-task Resolved Bob Hansen
          33.
          Create a generic function to synchronize async functions and methods. Sub-task Resolved James Clampffer
          34.
          libhdfspp fails to compile after HDFS-9207 Sub-task Resolved Haohui Mai
          35.
          Test libhdfs++ with existing libhdfs tests Sub-task Resolved Stephen
          36.
          Get libhdfs++ gmock tests running with CI Sub-task Resolved Haohui Mai
          37.
          Implement reads with implicit offset state in libhdfs++ Sub-task Resolved James Clampffer
          38.
          HDFS-8707 builds are failing with protobuf directories as undef Sub-task Resolved Haohui Mai
          39.
          Build both static and dynamic libraries for libhdfspp Sub-task Resolved Stephen
          40.
          Clean up the RAT warnings in the HDFS-8707 branch. Sub-task Resolved Bob Hansen
          41.
          Import the optional library into libhdfs++ Sub-task Resolved Bob Hansen
          42.
          Be able to build/test libhdfspp without Maven Sub-task Resolved Unassigned
          43.
          Enable valgrind for libhdfspp unit tests Sub-task Resolved Bob Hansen
          44.
          libhdfs++ Fix memory stomp in OpenFileForRead. Sub-task Resolved James Clampffer
          45.
          Fix protobuf runtime warnings Sub-task Resolved Unassigned
          46.
          libhdfs++: suppress warnings from third-party libraries Sub-task Resolved Bob Hansen
          47.
          Documentation needs to be exposed Sub-task Resolved Unassigned
          48.
          No header files in mvn package Sub-task Resolved Unassigned
          49.
          libhdfs++ Fix valgrind failures when using more than 1 io_service worker thread. Sub-task Resolved James Clampffer
          50.
          libhdfs++ Enable builds with no compiler optimizations Sub-task Resolved Bob Hansen
          51.
          Enable CI infrastructure to use libhdfs++ hdfsRead Sub-task Resolved Stephen
          52.
          libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an ASF license Sub-task Resolved Bob Hansen
          53.
          libhdfs++ Initialize BadNodeTracker in FileSystemImpl constructor Sub-task Resolved James Clampffer
          54.
          libhdfs++: failure to connect to ipv6 host causes CI unit tests to fail Sub-task Resolved Bob Hansen
          55.
          libhdfs++ deadlocks in Filesystem::New if NN connection fails Sub-task Resolved Bob Hansen
          56.
          libhdfs++: implement HDFSConfiguration class Sub-task Resolved Bob Hansen
          57.
          libhdfs++: load configuration from files Sub-task Resolved Bob Hansen
          58.
          libhdfs++: pull Options from default configs by default Sub-task Resolved Bob Hansen
          59.
          libhdfs++: Allow seek to EOF Sub-task Resolved Bob Hansen
          60.
          libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests. Sub-task Resolved Bob Hansen
          61.
          libhdfs++: Add a mechanism to retrieve human readable error messages through the C API Sub-task Resolved James Clampffer
          62.
          libhdfs++: Implement builder apis from C bindings Sub-task Resolved Bob Hansen
          63.
          libhdfs++: Add additional type-safe getters to the Configuration class Sub-task Resolved Unassigned
          64.
          libhdfs++: for consistency, include files should be in hdfspp Sub-task Resolved Bob Hansen
          65.
          libhdfs++: Support async cancellation of read operations Sub-task Resolved James Clampffer
          66.
          libhdfs++: Fix inconsistencies with libhdfs C API Sub-task Resolved James Clampffer
          67.
          libhdfs++: potential segfault after teardown Sub-task Resolved Bob Hansen
          68.
          libhdfs++: Add appropriate catch blocks for ASIO operations that throw Sub-task Resolved Bob Hansen
          69.
          libhdfs++: Reimplement Status object as a normal struct Sub-task Resolved James Clampffer
          70.
          libhdfs++: Create examples of consuming libhdfs++ Sub-task Resolved Unassigned
          71.
          libhdfs++: Implement simple authentication Sub-task Resolved Bob Hansen
          72.
          libhdfs++: GetLastError not returning meaningful messages after some failures Sub-task Resolved Bob Hansen
          73.
          libhdfs++: RPC engine will attempt to close an asio socket before it's been opened Sub-task Resolved James Clampffer
          74.
          libhdfs++: Client fails to pass TokenProto from LocatedBlockProto to server when reading a block Sub-task Resolved James Clampffer
          75.
          libhdfs++: ConfigurationLoader throws parse_exception on invalid input Sub-task Resolved Bob Hansen
          76.
          libhdfs++: EACCES not setting errno correctly Sub-task Resolved Bob Hansen
          77.
          libhdfs++: Cancel outstanding operations without calling shutdown Sub-task Resolved Bob Hansen
          78.
          libhdfs++: C++ exceptions should never escape the C API Sub-task Resolved Unassigned
          79.
          libhdfs++: Add test suite to simulate network issues Sub-task Resolved Xiaowei Zhu
          80.
          libhdfs++: add hooks to facilitate fault testing Sub-task Resolved Bob Hansen
          81.
          libhdfs++: find a URI parsing library Sub-task Resolved Bob Hansen
          82.
          libhdfs++: Implement debug allocators Sub-task Resolved Xiaowei Zhu
          83.
          libhdfs++: Integrate logging with the C API Sub-task Resolved Unassigned
          84.
          libhdfs++: Shutdown sockets to avoid "Connection reset by peer" Sub-task Resolved James Clampffer
          85.
          libhdfs++: Fix race conditions in RPC layer Sub-task Resolved Bob Hansen
          86.
          libhdfs++: Datanode protocol version mismatch Sub-task Resolved James Clampffer
          87.
          libhdfs++: File length doesn't always count the last block if it's being written to Sub-task Resolved Xiaowei Zhu
          88.
          libhdfs++: hdfsConnect hangs when given bad host or port Sub-task Resolved James Clampffer
          89.
          libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket Sub-task Resolved James Clampffer
          90.
          hdfs-native-client fails to build with CMake 2.8.11 or earlier Sub-task Resolved Tibor Kiss
          91.
          libhdfs++: Add SASL authentication Sub-task Resolved Bob Hansen
          92.
          libhdfs++: Get rid of lock in RpcConnectionImpl destructor Sub-task Resolved James Clampffer
          93.
          libhdfs++: HA namenode support Sub-task Resolved James Clampffer
          94.
          libhdfs++: Implement Cyrus SASL implementation in sasl_engine.cc Sub-task Resolved Bob Hansen
          95.
          libhdfspp: Move NameNodeOp to a separate file Sub-task Resolved Anatoli Shein
          96.
          libhdfs++: Implement GetPathInfo and ListDirectory Sub-task Resolved Bob Hansen
          97.
          libhdfs++: Implement GetBlockLocations Sub-task Resolved Bob Hansen
          98.
          libhdfs++: Implement GetFsStats Sub-task Resolved Anatoli Shein
          99.
          libhdfs++: Implement snapshot operations and GetFsStats Sub-task Resolved Anatoli Shein
          100.
          libhdfs++: if HA is available, authentication doesn't get parsed in configs Sub-task Resolved Bob Hansen
          101.
          libhdfs++: make error returning mechanism consistent across all hdfs operations Sub-task Resolved Anatoli Shein
          102.
          libhdfs++: Implement mkdirs, rmdir, rename, and remove Sub-task Resolved Anatoli Shein
          103.
          libhdfs++: Implement chmod and chown Sub-task Resolved Anatoli Shein
          104.
          libhdfs++: Add connect timeouts to async_connect calls Sub-task Resolved Bob Hansen
          105.
          libhdfs++: hdfsGetBlockLocations doesn't null terminate ip address strings Sub-task Resolved James Clampffer
          106.
          libhdfs++: Silence compile warnings from URI parser Sub-task Resolved James Clampffer
          107.
          libhdfs++: Client Name Protobuf Error Sub-task Resolved Bob Hansen
          108.
          libhdfs++: reorder directories in src/main/libhdfspp/examples, and add C++ version of cat tool Sub-task Resolved Anatoli Shein
          109.
          libhdfs++: Implement parallel find with wildcards tool Sub-task Resolved Anatoli Shein
          110.
          libhdfs++: return explicit error when non-secured client connects to secured server Sub-task Resolved Kai Jiang
          111.
          libhdfs++: FileSystem should have a convenience no-args ctor Sub-task Resolved James Clampffer
          112.
          libhdfs++: Expose an InputStream interface for the apache ORC project Sub-task Resolved James Clampffer
          113.
          libhdfs++: In RPC engine replace vector with deque for pending requests Sub-task Resolved Anatoli Shein
          114.
          libhdfs++: Implement recursive directory generator Sub-task Resolved Anatoli Shein
          115.
          libhdfs++: synchronize access to working_directory and bytes_read_. Sub-task Resolved Anatoli Shein
          116.
          libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find. Sub-task Resolved Anatoli Shein
          117.
          libhdfs++: Fix broken logic in HA retry policy Sub-task Resolved James Clampffer
          118.
          libhdfs++: Implement the rest of the tools Sub-task Resolved Anatoli Shein
          119.
          libhdfs++: Public API should expose configuration parser Sub-task Resolved James Clampffer
          120.
          libhdfs++: rationalize ioservice interactions Sub-task Resolved James Clampffer
          121.
          libhdfs++: Public API headers should not depend on internal implementation Sub-task Resolved James Clampffer
          122.
          libhdfs++: Make log levels consistent Sub-task Resolved James Clampffer
          123.
          libhdfs++: Fix object lifecycle issues in the BlockReader Sub-task Resolved James Clampffer
          124.
          libhdfs++: Make connection to HA clusters faster Sub-task Resolved James Clampffer
          125.
          libhdfs++: Don't retry if there is an authentication failure Sub-task Resolved James Clampffer
          126.
          libhdfs++: FileSystem needs to be able to cancel pending connections Sub-task Resolved James Clampffer
          127.
          Expose rack id in hdfsDNInfo Sub-task Resolved Xiaowei Zhu
          128.
          libhdfs++: Some refactoring to better organize files Sub-task Resolved James Clampffer
          129.
          libhdfs++: Segfault in HA failover if DNS lookup for both Namenodes fails Sub-task Resolved James Clampffer
          130.
          libhdfs++: Log Datanode information when reading an HDFS block Sub-task Resolved Xiaowei Zhu
          131.
          libhdfs++: Fix race condition in ScopedResolver Sub-task Resolved James Clampffer
          132.
          libhdfs++: Log Datanode read size when reading an HDFS block Sub-task Resolved Xiaowei Zhu
          133.
          libhdfs++: Add a build option to skip building examples, tests, and tools Sub-task Resolved Anatoli Shein
          134.
          libhdfs++: RPC connection should handle authorization error call id Sub-task Resolved James Clampffer
          135.
          libhdfs++: Catch exceptions thrown by runtime hooks Sub-task Resolved James Clampffer
          136.
          libhdfs++: SASL events should be scoped closer to usage Sub-task Resolved James Clampffer
          137.
          libhdfs++: Get minidfscluster tests running under valgrind Sub-task Resolved Anatoli Shein
          138.
          libhdfs++: Authentication failure when first NN of kerberized HA cluster is standby Sub-task Resolved James Clampffer
          139.
          libhdfs++: Docker script fails while trying to download JDK 7. Sub-task Resolved Anatoli Shein
          140.
          libhdfs++: A few portability issues Sub-task Resolved Anatoli Shein
          141.
          libhdfs++: read with offset at EOF should return 0 bytes instead of error Sub-task Resolved Xiaowei Zhu
          142.
          libhdfs++: Fix compilation errors and warnings when compiling with Clang Sub-task Resolved Anatoli Shein
          143.
          libhdfs++: add Clang build and tests to the CI system Sub-task Resolved Anatoli Shein
          144.
          libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved Sub-task Resolved James Clampffer
          145.
          libhdfs++: Make sure all steps in SaslProtocol end up calling AuthComplete Sub-task Resolved James Clampffer
          146.
          libhdfs++: Rebase 8707 branch onto an up to date version of trunk Sub-task Resolved Deepak Majeti
          147.
          libhdfs++: Add a synchronization interface for the GSSAPI Sub-task Resolved James Clampffer
          148.
          libhdfs++: PROTOC_IS_COMPATIBLE check fails if protobuf library is built from source Sub-task Resolved Anatoli Shein
          149.
          libhdfs++: Prevent Requests from holding dangling pointer to RpcEngine Sub-task Resolved James Clampffer

              People

              • Assignee: James Clampffer
              • Reporter: Owen O'Malley
              • Votes: 1
              • Watchers: 49