Flink / FLINK-13963

Consolidate Hadoop file systems usage and Hadoop integration docs

    Details

      Description

      We currently have Hadoop-related docs in several places:

      • dev/batch/connectors.md (Hadoop FS implementations and setup)
      • dev/batch/hadoop_compatibility.md (its claim that Flink always has Hadoop types out of the box is no longer valid, as we do not build and provide Flink with Hadoop by default)
      • ops/filesystems/index.md (plugins, Hadoop FS implementations and setup revisited)
      • ops/deployment/hadoop.md (Hadoop classpath)
      • ops/config.md (deprecated way to provide Hadoop configuration in Flink conf)

      We could consolidate these pieces into a consistent structure, so that users can navigate to a well-defined spot in the docs depending on which feature they are trying to use.

      The places in the docs that should contain information about Hadoop:

      • dev/batch/hadoop_compatibility.md (only DataSet API-specific material about integration with Hadoop)
      • ops/filesystems/index.md (Flink FS plugins and Hadoop FS implementations)
      • ops/deployment/hadoop.md (Hadoop configuration and classpath)

      Instructions for setting up Hadoop itself should live only in ops/deployment/hadoop.md. All other places dealing with Hadoop/HDFS should contain only their own specifics and reference that page for how to configure Hadoop. Likewise, all chapters about writing to file systems (batch connectors and streaming file sinks) should simply reference the file systems docs.

              People

              • Assignee:
                azagrebin Andrey Zagrebin
                Reporter:
                azagrebin Andrey Zagrebin
              • Votes:
                0
                Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  20m