Flink / FLINK-6660

expand the streaming connectors overview page

    Details

      Description

      The overview page for streaming connectors is too lean – it should provide more context and also guide the reader toward related topics.

      Note that FLINK-6038 will add links to the Bahir connectors.


          Activity

          tzulitai Tzu-Li (Gordon) Tai added a comment -

          Thanks for the contribution David!
          Resolved for master via 557540a51cf8a1d6fef1e2e80ad0db4c148b3302.
          Resolved for release-1.3 via ce685dbdae011b6220934836339b0a0130929ba4.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3964

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3964

          "3rd party connectors" doesn't seem suitable. There are also some self-hosted connectors, AFAIK.
          Perhaps simply "Connectors in Apache Bahir"? Either way, I'll proceed to merge this. The naming could perhaps be refined when the Bahir connectors are added to the docs with https://issues.apache.org/jira/browse/FLINK-6038.

          githubbot ASF GitHub Bot added a comment -

          Github user alpinegizmo commented on the issue:

          https://github.com/apache/flink/pull/3964

          @greghogan Yeah, I'm not entirely satisfied with "predefined" and "bundled", but those were the terms already used in the documentation, and I didn't want to introduce new terminology unless it's decisively better. Any suggestions? For the connectors coming from Bahir (and possibly elsewhere) I'm thinking about either something like "Other Connectors" or "3rd Party Connectors", or simply "Connectors in Apache Bahir" (assuming there aren't any non-bahir 3rd party connectors worth mentioning).

          githubbot ASF GitHub Bot added a comment -

          Github user greghogan commented on the issue:

          https://github.com/apache/flink/pull/3964

          I have a question on the names "predefined" and "bundled". There is also the 3rd category of connectors from Bahir. Perhaps this can be reconsidered in the following update.

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3964

          Thanks for the update, I think it's an improvement.
          +1, merging this.

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117963365

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          +## Bundled Connectors

          +Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:

          -To run an application using one of these connectors, additional third party
          -components are usually required to be installed and launched, e.g. the servers
          -for the message queues. Further instructions for these can be found in the
          -corresponding subsections.
          + * [Apache Kafka](kafka.html) (sink/source)
          + * [Apache Cassandra](cassandra.html) (sink)
          + * [Amazon Kinesis Streams](kinesis.html) (sink/source)
          + * [Elasticsearch](elasticsearch.html) (sink)
          + * [Hadoop FileSystem](filesystem_sink.html) (sink)
          + * [RabbitMQ](rabbitmq.html) (sink/source)
          + * [Apache NiFi](nifi.html) (sink/source)
          + * [Twitter Streaming API](twitter.html) (source)
          +
          +Keep in mind that to use one of these connectors in an application, additional third party
          +components are usually required, e.g. servers for the data stores or message queues.
          +Note also that while the streaming connectors listed in this section are part of the
          +Flink project and are included in source releases, they are not included in the binary distributions.
          +Further instructions can be found in the corresponding subsections.
          — End diff –

          That makes sense!

          githubbot ASF GitHub Bot added a comment -

          Github user alpinegizmo commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117953990

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          +## Bundled Connectors

          +Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:

          -To run an application using one of these connectors, additional third party
          -components are usually required to be installed and launched, e.g. the servers
          -for the message queues. Further instructions for these can be found in the
          -corresponding subsections.
          + * [Apache Kafka](kafka.html) (sink/source)
          + * [Apache Cassandra](cassandra.html) (sink)
          + * [Amazon Kinesis Streams](kinesis.html) (sink/source)
          + * [Elasticsearch](elasticsearch.html) (sink)
          + * [Hadoop FileSystem](filesystem_sink.html) (sink)
          + * [RabbitMQ](rabbitmq.html) (sink/source)
          + * [Apache NiFi](nifi.html) (sink/source)
          + * [Twitter Streaming API](twitter.html) (source)
          +
          +Keep in mind that to use one of these connectors in an application, additional third party
          +components are usually required, e.g. servers for the data stores or message queues.
          +Note also that while the streaming connectors listed in this section are part of the
          +Flink project and are included in source releases, they are not included in the binary distributions.
          +Further instructions can be found in the corresponding subsections.
          +
          +## Related Topics
          — End diff –

          I'm thinking that this connectors overview page is currently the best place to put together a summary of everything in Flink relating to connections to external systems. I'll rework things a bit – see if it helps.

          githubbot ASF GitHub Bot added a comment -

          Github user alpinegizmo commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117942372

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          +## Bundled Connectors

          +Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:

          -To run an application using one of these connectors, additional third party
          -components are usually required to be installed and launched, e.g. the servers
          -for the message queues. Further instructions for these can be found in the
          -corresponding subsections.
          + * [Apache Kafka](kafka.html) (sink/source)
          + * [Apache Cassandra](cassandra.html) (sink)
          + * [Amazon Kinesis Streams](kinesis.html) (sink/source)
          + * [Elasticsearch](elasticsearch.html) (sink)
          + * [Hadoop FileSystem](filesystem_sink.html) (sink)
          + * [RabbitMQ](rabbitmq.html) (sink/source)
          + * [Apache NiFi](nifi.html) (sink/source)
          + * [Twitter Streaming API](twitter.html) (source)
          +
          +Keep in mind that to use one of these connectors in an application, additional third party
          +components are usually required, e.g. servers for the data stores or message queues.
          +Note also that while the streaming connectors listed in this section are part of the
          +Flink project and are included in source releases, they are not included in the binary distributions.
          +Further instructions can be found in the corresponding subsections.
          — End diff –

          We can't count on people reading (or remembering) this overview page, so I think it's important for each connector page to continue to link to those instructions.

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117914005

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          +## Bundled Connectors

          +Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:

          -To run an application using one of these connectors, additional third party
          -components are usually required to be installed and launched, e.g. the servers
          -for the message queues. Further instructions for these can be found in the
          -corresponding subsections.
          + * [Apache Kafka](kafka.html) (sink/source)
          + * [Apache Cassandra](cassandra.html) (sink)
          + * [Amazon Kinesis Streams](kinesis.html) (sink/source)
          + * [Elasticsearch](elasticsearch.html) (sink)
          + * [Hadoop FileSystem](filesystem_sink.html) (sink)
          + * [RabbitMQ](rabbitmq.html) (sink/source)
          + * [Apache NiFi](nifi.html) (sink/source)
          + * [Twitter Streaming API](twitter.html) (source)
          +
          +Keep in mind that to use one of these connectors in an application, additional third party
          +components are usually required, e.g. servers for the data stores or message queues.
          +Note also that while the streaming connectors listed in this section are part of the
          +Flink project and are included in source releases, they are not included in the binary distributions.
          +Further instructions can be found in the corresponding subsections.
          +
          +## Related Topics
          — End diff –

          I somehow find the related topics listed here a bit out-of-place. I can understand the intention, but was also wondering whether a separate "Common usage patterns" page is a better placement for these topics?

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117914510

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          +## Bundled Connectors

          +Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:

          -To run an application using one of these connectors, additional third party
          -components are usually required to be installed and launched, e.g. the servers
          -for the message queues. Further instructions for these can be found in the
          -corresponding subsections.
          + * [Apache Kafka](kafka.html) (sink/source)
          + * [Apache Cassandra](cassandra.html) (sink)
          + * [Amazon Kinesis Streams](kinesis.html) (sink/source)
          + * [Elasticsearch](elasticsearch.html) (sink)
          + * [Hadoop FileSystem](filesystem_sink.html) (sink)
          + * [RabbitMQ](rabbitmq.html) (sink/source)
          + * [Apache NiFi](nifi.html) (sink/source)
          + * [Twitter Streaming API](twitter.html) (source)
          +
          +Keep in mind that to use one of these connectors in an application, additional third party
          +components are usually required, e.g. servers for the data stores or message queues.
          +Note also that while the streaming connectors listed in this section are part of the
          +Flink project and are included in source releases, they are not included in the binary distributions.
          +Further instructions can be found in the corresponding subsections.
          — End diff –

          The "further instructions" in all the connectors, I think, just link to https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/linking.html. Perhaps we can do that here, instead of repeating the same instruction in the connector pages.

          Either that, or we actually put some effort in adding more per-connector-specific detail (ex. exactly which dependencies to bundle with uber jar) in each respective page.
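          The per-connector detail suggested here would essentially be a dependency declaration plus a reminder to shade it. A hedged sketch of what such a snippet could look like (the artifactId and version below are illustrative, following the Flink 1.3-era naming scheme, and should be checked against the actual release):

          ```xml
          <!-- Illustrative only: artifactId and version follow the Flink 1.3-era
               naming scheme and must be verified against the release in use. -->
          <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
            <version>1.3.0</version>
          </dependency>
          ```

          Since the connectors are not part of the binary distribution, this dependency (and its transitive dependencies) would then need to be bundled into the application's uber jar.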

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117913544

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          +## Bundled Connectors

          +Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:

          -To run an application using one of these connectors, additional third party
          -components are usually required to be installed and launched, e.g. the servers
          -for the message queues. Further instructions for these can be found in the
          -corresponding subsections.
          + * [Apache Kafka](kafka.html) (sink/source)
          + * [Apache Cassandra](cassandra.html) (sink)
          + * [Amazon Kinesis Streams](kinesis.html) (sink/source)
          + * [Elasticsearch](elasticsearch.html) (sink)
          + * [Hadoop FileSystem](filesystem_sink.html) (sink)
          + * [RabbitMQ](rabbitmq.html) (sink/source)
          + * [Apache NiFi](nifi.html) (sink/source)
          + * [Twitter Streaming API](twitter.html) (source)
          +
          +Keep in mind that to use one of these connectors in an application, additional third party
          +components are usually required, e.g. servers for the data stores or message queues.
          +Note also that while the streaming connectors listed in this section are part of the
          +Flink project and are included in source releases, they are not included in the binary distributions.
          +Further instructions can be found in the corresponding subsections.
          +
          +## Related Topics
          +
          +### Data Enrichment via Async I/O
          +
          +Streaming applications sometimes need to pull in data from external services and databases
          +in order to enrich their event streams.
          +Flink offers an API for [Asynchronous I/O]({{ site.baseurl }}/dev/stream/asyncio.html)
          +to make it easier to do this efficiently and robustly.
          +
          +### Side Outputs
          +
          +You can always connect an input stream to as many sinks as you like, but sometimes it is
          +useful to emit additional result streams "on the side," as it were.
          +[Side Outputs]({{ site.baseurl }}/dev/stream/side_output.html) allow you to flexibily
          +split and filter your datastream in a typesafe way.
          +
          +### Queryable State
          +
          +Rather than always pushing data to external data stores, it is also possible for external applications to query Flink,
          +and read from the partitioned state it manages on demand.
          +In some cases this [Queryable State]({{ site.baseurl }}/dev/stream/queryable_state.html) interface can
          +eliminate what would otherwise be a bottleneck.
          — End diff –

          Not sure if it's just me, but at first glance it was a bit unclear to me what "would otherwise be a bottleneck" refers to. Perhaps move the "bottleneck" aspect directly after "always pushing data to external data stores" to make this clearer?

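
          The Async I/O passage quoted in the diff above describes issuing external lookups concurrently rather than blocking on one request per record. Flink's actual operator is a Java/Scala API; the following is only a minimal, language-neutral sketch of that pattern, with the event fields and the lookup function invented for illustration:

          ```python
          import asyncio

          # Hypothetical lookup standing in for a call to an external service;
          # the field names ("user_id", "user_name") are made up for this sketch.
          async def lookup_user_name(user_id):
              await asyncio.sleep(0)  # pretend network latency
              return f"user-{user_id}"

          async def enrich_stream(events, max_in_flight=10):
              # Issue many lookups concurrently instead of waiting for each one
              # in turn -- the core idea behind asynchronous enrichment.
              sem = asyncio.Semaphore(max_in_flight)

              async def enrich(event):
                  async with sem:
                      name = await lookup_user_name(event["user_id"])
                  return {**event, "user_name": name}

              # gather preserves input order, so results line up with events
              return await asyncio.gather(*(enrich(e) for e in events))

          events = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
          enriched = asyncio.run(enrich_stream(events))
          ```

          The semaphore caps the number of in-flight requests, which mirrors the capacity parameter a real async operator would expose.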
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117913269

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          +## Bundled Connectors

          +Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:

          -To run an application using one of these connectors, additional third party
          -components are usually required to be installed and launched, e.g. the servers
          -for the message queues. Further instructions for these can be found in the
          -corresponding subsections.
          + * [Apache Kafka](kafka.html) (sink/source)
          + * [Apache Cassandra](cassandra.html) (sink)
          + * [Amazon Kinesis Streams](kinesis.html) (sink/source)
          + * [Elasticsearch](elasticsearch.html) (sink)
          + * [Hadoop FileSystem](filesystem_sink.html) (sink)
          + * [RabbitMQ](rabbitmq.html) (sink/source)
          + * [Apache NiFi](nifi.html) (sink/source)
          + * [Twitter Streaming API](twitter.html) (source)
          +
          +Keep in mind that to use one of these connectors in an application, additional third party
          +components are usually required, e.g. servers for the data stores or message queues.
          +Note also that while the streaming connectors listed in this section are part of the
          +Flink project and are included in source releases, they are not included in the binary distributions.
          +Further instructions can be found in the corresponding subsections.
          +
          +## Related Topics
          +
          +### Data Enrichment via Async I/O
          +
          +Streaming applications sometimes need to pull in data from external services and databases
          +in order to enrich their event streams.
          +Flink offers an API for [Asynchronous I/O]({{ site.baseurl }}/dev/stream/asyncio.html)
          +to make it easier to do this efficiently and robustly.
          +
          +### Side Outputs
          +
          +You can always connect an input stream to as many sinks as you like, but sometimes it is
          +useful to emit additional result streams "on the side," as it were.
          +[Side Outputs]({{ site.baseurl }}/dev/stream/side_output.html) allow you to flexibily
          +split and filter your datastream in a typesafe way.
          +
          +### Queryable State
          +
          +Rather than always pushing data to external data stores, it is also possible for external applications to query Flink,
          +and read from the partitioned state it manages on demand.
          +In some cases this [Queryable State]({{ site.baseurl }}/dev/stream/queryable_state.html) interface can
          — End diff –

          "this" --> "the" somehow feels more natural to me?

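
          The Queryable State sentence discussed just above is easier to picture with a toy example: instead of pushing every update to an external store, the job keeps partitioned state that external readers query on demand. This sketch is illustrative only; the class and method names are invented and do not mirror Flink's API:

          ```python
          # Toy keyed state that can be read on demand, instead of pushing
          # every update out to an external data store.
          class KeyedCountState:
              def __init__(self):
                  self._counts = {}

              def update(self, key):
                  # the streaming job updates partitioned state as events arrive
                  self._counts[key] = self._counts.get(key, 0) + 1

              def query(self, key):
                  # an external application reads the managed state on demand
                  return self._counts.get(key, 0)

          state = KeyedCountState()
          for event_key in ["a", "b", "a", "a"]:
              state.update(event_key)
          ```

          The point of the quoted doc text is exactly this inversion: readers pull from the state the job already maintains, rather than the job writing every update to a sink first.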
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user greghogan commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3964#discussion_r117838617

          — Diff: docs/dev/connectors/index.md —
          @@ -25,22 +25,54 @@ specific language governing permissions and limitations
          under the License.
          -->

          -Connectors provide code for interfacing with various third-party systems.
          +* toc
          +{:toc}

          -Currently these systems are supported: (Please select the respective documentation page from the navigation on the left.)
          +## Predefined Sources and Sinks

          Should there be a comma between "sockets" and "and"?

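
          The "Side Outputs" section quoted in the diffs above amounts to routing each record either to the main stream or to a named side stream. A minimal sketch of that routing idea follows; the tags and the routing predicate are invented for illustration and do not use Flink's actual OutputTag API:

          ```python
          # Route each event to the main output or to a named side output,
          # e.g. diverting late data and malformed records "on the side".
          def process_with_side_outputs(events):
              main, side = [], {"late": [], "errors": []}
              for e in events:
                  if not isinstance(e, int):
                      side["errors"].append(e)  # malformed record
                  elif e < 0:
                      side["late"].append(e)    # stand-in for late data
                  else:
                      main.append(e)            # normal path
              return main, side

          main, side = process_with_side_outputs([1, -2, "oops", 3])
          ```

          Each side stream can then be wired to its own sink, which is what makes side outputs more flexible than running several independent filters over the input.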
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user alpinegizmo opened a pull request:

          https://github.com/apache/flink/pull/3964

          [FLINK-6660] [docs] expand the connectors overview page

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [x] General
          • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/alpinegizmo/flink 6660-connectors-docs

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3964.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3964


          commit d02b1da7ca570e1fc369595d91d7df1b2230918d
          Author: David Anderson <david@alpinegizmo.com>
          Date: 2017-05-22T15:39:34Z

          [FLINK-6660] [docs] expand the connectors overview page



            People

            • Assignee:
              alpinegizmo David Anderson
              Reporter:
              alpinegizmo David Anderson
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development