Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Streaming Connectors
    • Labels: None

      Description

        Issue Links

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user mikedias opened a pull request:

          https://github.com/apache/flink/pull/2767

          FLINK-4988 Elasticsearch 5.x support

           Provides Elasticsearch 5.x support based on the previous 2.x codebase.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/mikedias/flink FLINK-4988

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2767.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2767



          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/2767

           Is Elasticsearch backwards compatible? Is there a way to support multiple Elasticsearch versions in one connector?
           It would be nice not to have to maintain three different versions of the Elasticsearch connector.

          @rmetzger What is your take on this?

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/2767

          From taking a quick look at it, I would suggest that this connector shades netty away. That way we can avoid conflicts whenever we adjust Flink's internal netty version.

          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on the issue:

          https://github.com/apache/flink/pull/2767

           No, ES is not backward compatible... But we can reuse some classes or interfaces between versions. I have plans to do this in another PR, just to isolate possible issues.

          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on the issue:

          https://github.com/apache/flink/pull/2767

           Not sure if I can exclude the netty4 dependency, but I'll take a look.

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/2767

          If you cannot exclude the netty dependency, you could try to shade it away: https://maven.apache.org/plugins/maven-shade-plugin/

          We do this all the time for conflicting dependencies.
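
           For reference, shading with the maven-shade-plugin amounts to relocating the netty packages inside the connector jar. A minimal sketch of such a relocation in the connector's pom.xml follows; the <shadedPattern> package name is an assumption for illustration, not Flink's actual configuration.

           <plugin>
               <groupId>org.apache.maven.plugins</groupId>
               <artifactId>maven-shade-plugin</artifactId>
               <executions>
                   <execution>
                       <phase>package</phase>
                       <goals>
                           <goal>shade</goal>
                       </goals>
                       <configuration>
                           <relocations>
                               <!-- io.netty is the netty 4 package; relocate it to avoid clashing with Flink's own netty -->
                               <relocation>
                                   <pattern>io.netty</pattern>
                                   <shadedPattern>org.apache.flink.streaming.connectors.elasticsearch5.shaded.io.netty</shadedPattern>
                               </relocation>
                           </relocations>
                       </configuration>
                   </execution>
               </executions>
           </plugin>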

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/2767

          @mikedias @rmetzger How do we proceed with this pull request?
           Should it become part of Flink, Bahir, or should it be hosted in the contributor's repository?

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2767#discussion_r93052912

          — Diff: flink-streaming-connectors/flink-connector-elasticsearch5/pom.xml —
          @@ -0,0 +1,93 @@
          +<?xml version="1.0" encoding="UTF-8"?>
          +<!--
          +Licensed to the Apache Software Foundation (ASF) under one
          +or more contributor license agreements. See the NOTICE file
          +distributed with this work for additional information
          +regarding copyright ownership. The ASF licenses this file
          +to you under the Apache License, Version 2.0 (the
          +"License"); you may not use this file except in compliance
          +with the License. You may obtain a copy of the License at
          +
          + http://www.apache.org/licenses/LICENSE-2.0
          +
          +Unless required by applicable law or agreed to in writing,
          +software distributed under the License is distributed on an
          +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
          +KIND, either express or implied. See the License for the
          +specific language governing permissions and limitations
          +under the License.
          +-->
          +<project xmlns="http://maven.apache.org/POM/4.0.0"
          + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
          +
          + <modelVersion>4.0.0</modelVersion>
          +
          + <parent>
          + <groupId>org.apache.flink</groupId>
          + <artifactId>flink-streaming-connectors</artifactId>
          + <version>1.2-SNAPSHOT</version>
          + <relativePath>..</relativePath>
          + </parent>
          +
          + <artifactId>flink-connector-elasticsearch5_2.10</artifactId>
          + <name>flink-connector-elasticsearch5</name>
          +
          + <packaging>jar</packaging>
          +
          + <!-- Allow users to pass custom connector versions -->
          + <properties>
          + <elasticsearch.version>5.0.0</elasticsearch.version>
          + </properties>
          +
          + <dependencies>
          +
          + <!-- core dependencies -->
          +
          + <dependency>
          + <groupId>org.apache.flink</groupId>
          + <artifactId>flink-streaming-java_2.10</artifactId>
           + <version>${project.version}</version>
          + <scope>provided</scope>
          + </dependency>
          +
          + <dependency>
          + <groupId>org.elasticsearch.client</groupId>
          + <artifactId>transport</artifactId>
           + <version>${elasticsearch.version}</version>
          + </dependency>
          +
          + <dependency>
          + <groupId>org.apache.logging.log4j</groupId>
          + <artifactId>log4j-api</artifactId>
          + <version>2.7</version>
          + </dependency>
          + <dependency>
          + <groupId>org.apache.logging.log4j</groupId>
          + <artifactId>log4j-core</artifactId>
          + <version>2.7</version>
          — End diff –

          Why did you add log4j2 dependencies to the project?

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2767#discussion_r93053073

          — Diff: flink-streaming-connectors/flink-connector-elasticsearch5/pom.xml —
          @@ -0,0 +1,93 @@
          +<?xml version="1.0" encoding="UTF-8"?>
          +<!--
          +Licensed to the Apache Software Foundation (ASF) under one
          +or more contributor license agreements. See the NOTICE file
          +distributed with this work for additional information
          +regarding copyright ownership. The ASF licenses this file
          +to you under the Apache License, Version 2.0 (the
          +"License"); you may not use this file except in compliance
          +with the License. You may obtain a copy of the License at
          +
          + http://www.apache.org/licenses/LICENSE-2.0
          +
          +Unless required by applicable law or agreed to in writing,
          +software distributed under the License is distributed on an
          +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
          +KIND, either express or implied. See the License for the
          +specific language governing permissions and limitations
          +under the License.
          +-->
          +<project xmlns="http://maven.apache.org/POM/4.0.0"
          + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
          +
          + <modelVersion>4.0.0</modelVersion>
          +
          + <parent>
          + <groupId>org.apache.flink</groupId>
          + <artifactId>flink-streaming-connectors</artifactId>
          — End diff –

           The parent's name has changed in the meantime (we've refactored our Maven structure a bit for the connectors).

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/2767

          @StephanEwen I'll start a discussion on the mailing list to decide how we want to proceed.

          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on the issue:

          https://github.com/apache/flink/pull/2767

           I have no problem hosting the connector in my GitHub account.

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/2767

          @mikedias @rmetzger @StephanEwen
          I've also responded to the discussion in the ML started by Robert with a +1.

          A recap of the proposed approach on how we proceed:
           Since the ES connectors share a lot of code, we'll refactor them by introducing a `flink-connector-elasticsearch-base` module and let the version-specific connectors stem from it; this is basically how we currently maintain the Kafka connectors. There are two +1 votes for this approach and no other objections, so I think we can agree to proceed.

           @mikedias, are you ok with me using your PR as a base to refactor the ES connectors? I'll open a new PR with your changes and mine together, and I'll let you know when it is opened.

          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on the issue:

          https://github.com/apache/flink/pull/2767

          @tzulitai sure, no problem!

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2767#discussion_r95287899

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/main/java/org/apache/flink/streaming/connectors/elasticsearch5/ElasticsearchSink.java —
          @@ -0,0 +1,259 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.streaming.connectors.elasticsearch5;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.Preconditions;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.action.index.IndexRequest;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.client.transport.TransportClient;
          +import org.elasticsearch.common.network.NetworkModule;
          +import org.elasticsearch.common.settings.Settings;
          +import org.elasticsearch.common.transport.InetSocketTransportAddress;
          +import org.elasticsearch.common.transport.TransportAddress;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.elasticsearch.transport.Netty3Plugin;
          +import org.elasticsearch.transport.client.PreBuiltTransportClient;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.net.InetSocketAddress;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +/**
          + * Sink that emits its input elements in bulk to an Elasticsearch cluster.
          + * <p>
          + * <p>
           + * The first {@link Map} passed to the constructor is forwarded to Elasticsearch when creating
           + * {@link TransportClient}. The config keys can be found in the Elasticsearch
           + * documentation. An important setting is {@code cluster.name}, this should be set to the name
           + * of the cluster that the sink should emit to.
           + * <p>
           + * <b>Attention: </b> When using the {@code TransportClient} the sink will fail if no cluster
           + * can be connected to.
           + * <p>
           + * The second {@link Map} is used to configure a {@link BulkProcessor} to send {@link IndexRequest IndexRequests}.
           + * This will buffer elements before sending a request to the cluster. The behaviour of the
           + * {@code BulkProcessor} can be configured using these config keys:
           + * <ul>
           + * <li> {@code bulk.flush.max.actions}: Maximum amount of elements to buffer
           + * <li> {@code bulk.flush.max.size.mb}: Maximum amount of data (in megabytes) to buffer
           + * <li> {@code bulk.flush.interval.ms}: Interval at which to flush data regardless of the other two
           + * settings in milliseconds
           + * </ul>
           + * <p>
           + * <p>
           + * You also have to provide an {@link RequestIndexer}. This is used to create an
           + * {@link IndexRequest} from an element that needs to be added to Elasticsearch. See
           + * {@link RequestIndexer} for an example.
          + *
          + * @param <T> Type of the elements emitted by this sink
          + */
          +public class ElasticsearchSink<T> extends RichSinkFunction<T> {
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSink.class);
          +
          + /**
          + * The user specified config map that we forward to Elasticsearch when we create the Client.
          + */
          + private final Map<String, String> esConfig;
          +
          + /**
          + * The user specified config map that we use to configure BulkProcessor.
          + */
          + private final Map<String, String> sinkConfig;
          +
          + /**
          + * The list of nodes that the TransportClient should connect to. This is null if we are using
          + * an embedded Node to get a Client.
          + */
          + private final List<InetSocketAddress> transportAddresses;
          +
          + /**
           + * The builder that is used to construct an {@link IndexRequest} from the incoming element.
          + */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /**
          + * The Client that was either retrieved from a Node or is a TransportClient.
          + */
          + private transient Client client;
          +
          + /**
          + * Bulk processor that was created using the client
          + */
          + private transient BulkProcessor bulkProcessor;
          +
          + /**
           + * Bulk {@link org.elasticsearch.action.ActionRequest} indexer
          + */
          + private transient RequestIndexer requestIndexer;
          +
          + /**
          + * This is set from inside the BulkProcessor listener if there where failures in processing.
          + */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /**
          + * This is set from inside the BulkProcessor listener if a Throwable was thrown during processing.
          + */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + /**
          + * Creates a new ElasticsearchSink that connects to the cluster using a TransportClient.
          + *
          + * @param esConfig The map of user settings that are passed when constructing the TransportClient
          + * @param sinkConfig The map of user settings that are passed when constructing the BulkProcessor
           + * @param transportAddresses The Elasticsearch Nodes to which to connect using a {@code TransportClient}
          + * @param elasticsearchSinkFunction This is used to generate the ActionRequest from the incoming element
          + */
          + public ElasticsearchSink(Map<String, String> esConfig, Map<String, String> sinkConfig, List<InetSocketAddress> transportAddresses, ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          — End diff –

          Hi @mikedias,
          While refactoring the ES connectors based on your PR, I noticed that you changed the constructor's arguments to take the config for the client & bulk processor as separate `Map`s.

          Is there an absolute need for this? Let me know if I'm missing anything, otherwise I'd prefer to keep this as consistent as possible across the different ES versions (I'll change this as part of the new PR, if you agree).

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/2767

          Sounds good. Thank you for taking care of this @tzulitai

          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2767#discussion_r95480042

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/main/java/org/apache/flink/streaming/connectors/elasticsearch5/ElasticsearchSink.java —
          @@ -0,0 +1,259 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.streaming.connectors.elasticsearch5;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.Preconditions;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.action.index.IndexRequest;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.client.transport.TransportClient;
          +import org.elasticsearch.common.network.NetworkModule;
          +import org.elasticsearch.common.settings.Settings;
          +import org.elasticsearch.common.transport.InetSocketTransportAddress;
          +import org.elasticsearch.common.transport.TransportAddress;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.elasticsearch.transport.Netty3Plugin;
          +import org.elasticsearch.transport.client.PreBuiltTransportClient;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.net.InetSocketAddress;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +/**
          + * Sink that emits its input elements in bulk to an Elasticsearch cluster.
          + * <p>
          + * <p>
           + * The first {@link Map} passed to the constructor is forwarded to Elasticsearch when creating
           + * {@link TransportClient}. The config keys can be found in the Elasticsearch
           + * documentation. An important setting is {@code cluster.name}, this should be set to the name
           + * of the cluster that the sink should emit to.
           + * <p>
           + * <b>Attention: </b> When using the {@code TransportClient} the sink will fail if no cluster
           + * can be connected to.
           + * <p>
           + * The second {@link Map} is used to configure a {@link BulkProcessor} to send {@link IndexRequest IndexRequests}.
           + * This will buffer elements before sending a request to the cluster. The behaviour of the
           + * {@code BulkProcessor} can be configured using these config keys:
           + * <ul>
           + * <li> {@code bulk.flush.max.actions}: Maximum amount of elements to buffer
           + * <li> {@code bulk.flush.max.size.mb}: Maximum amount of data (in megabytes) to buffer
           + * <li> {@code bulk.flush.interval.ms}: Interval at which to flush data regardless of the other two
           + * settings in milliseconds
           + * </ul>
           + * <p>
           + * <p>
           + * You also have to provide an {@link RequestIndexer}. This is used to create an
           + * {@link IndexRequest} from an element that needs to be added to Elasticsearch. See
           + * {@link RequestIndexer} for an example.
          + *
          + * @param <T> Type of the elements emitted by this sink
          + */
          +public class ElasticsearchSink<T> extends RichSinkFunction<T> {
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSink.class);
          +
          + /**
          + * The user specified config map that we forward to Elasticsearch when we create the Client.
          + */
          + private final Map<String, String> esConfig;
          +
          + /**
          + * The user specified config map that we use to configure BulkProcessor.
          + */
          + private final Map<String, String> sinkConfig;
          +
          + /**
          + * The list of nodes that the TransportClient should connect to. This is null if we are using
          + * an embedded Node to get a Client.
          + */
          + private final List<InetSocketAddress> transportAddresses;
          +
          + /**
           + * The builder that is used to construct an {@link IndexRequest} from the incoming element.
          + */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /**
          + * The Client that was either retrieved from a Node or is a TransportClient.
          + */
          + private transient Client client;
          +
          + /**
          + * Bulk processor that was created using the client
          + */
          + private transient BulkProcessor bulkProcessor;
          +
          + /**
           + * Bulk {@link org.elasticsearch.action.ActionRequest} indexer
          + */
          + private transient RequestIndexer requestIndexer;
          +
          + /**
          + * This is set from inside the BulkProcessor listener if there where failures in processing.
          + */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /**
          + * This is set from inside the BulkProcessor listener if a Throwable was thrown during processing.
          + */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + /**
          + * Creates a new ElasticsearchSink that connects to the cluster using a TransportClient.
          + *
          + * @param esConfig The map of user settings that are passed when constructing the TransportClient
          + * @param sinkConfig The map of user settings that are passed when constructing the BulkProcessor
           + * @param transportAddresses The Elasticsearch Nodes to which to connect using a {@code TransportClient}
          + * @param elasticsearchSinkFunction This is used to generate the ActionRequest from the incoming element
          + */
          + public ElasticsearchSink(Map<String, String> esConfig, Map<String, String> sinkConfig, List<InetSocketAddress> transportAddresses, ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          — End diff –

           Hi @tzulitai, it is necessary because ES 5.x is strict about configuration and does not accept any extra settings it does not recognize. See https://www.elastic.co/guide/en/elasticsearch/reference/5.x/breaking_50_settings_changes.html

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/2767#discussion_r95601743

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/main/java/org/apache/flink/streaming/connectors/elasticsearch5/ElasticsearchSink.java —
          @@ -0,0 +1,259 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.streaming.connectors.elasticsearch5;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.Preconditions;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.action.index.IndexRequest;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.client.transport.TransportClient;
          +import org.elasticsearch.common.network.NetworkModule;
          +import org.elasticsearch.common.settings.Settings;
          +import org.elasticsearch.common.transport.InetSocketTransportAddress;
          +import org.elasticsearch.common.transport.TransportAddress;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.elasticsearch.transport.Netty3Plugin;
          +import org.elasticsearch.transport.client.PreBuiltTransportClient;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.net.InetSocketAddress;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +/**
          + * Sink that emits its input elements in bulk to an Elasticsearch cluster.
          + * <p>
          + * <p>
           + * The first {@link Map} passed to the constructor is forwarded to Elasticsearch when creating
           + * {@link TransportClient}. The config keys can be found in the Elasticsearch
           + * documentation. An important setting is {@code cluster.name}, this should be set to the name
           + * of the cluster that the sink should emit to.
           + * <p>
           + * <b>Attention: </b> When using the {@code TransportClient} the sink will fail if no cluster
           + * can be connected to.
           + * <p>
           + * The second {@link Map} is used to configure a {@link BulkProcessor} to send {@link IndexRequest IndexRequests}.
           + * This will buffer elements before sending a request to the cluster. The behaviour of the
           + * {@code BulkProcessor} can be configured using these config keys:
           + * <ul>
           + * <li> {@code bulk.flush.max.actions}: Maximum amount of elements to buffer
           + * <li> {@code bulk.flush.max.size.mb}: Maximum amount of data (in megabytes) to buffer
           + * <li> {@code bulk.flush.interval.ms}: Interval at which to flush data regardless of the other two
           + * settings in milliseconds
           + * </ul>
           + * <p>
           + * <p>
           + * You also have to provide an {@link RequestIndexer}. This is used to create an
           + * {@link IndexRequest} from an element that needs to be added to Elasticsearch. See
           + * {@link RequestIndexer} for an example.
          + *
          + * @param <T> Type of the elements emitted by this sink
          + */
          +public class ElasticsearchSink<T> extends RichSinkFunction<T> {
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSink.class);
          +
          + /**
          + * The user specified config map that we forward to Elasticsearch when we create the Client.
          + */
          + private final Map<String, String> esConfig;
          +
          + /**
          + * The user specified config map that we use to configure BulkProcessor.
          + */
          + private final Map<String, String> sinkConfig;
          +
          + /**
          + * The list of nodes that the TransportClient should connect to. This is null if we are using
          + * an embedded Node to get a Client.
          + */
          + private final List<InetSocketAddress> transportAddresses;
          +
          + /**
           + * The builder that is used to construct an {@link IndexRequest} from the incoming element.
          + */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /**
          + * The Client that was either retrieved from a Node or is a TransportClient.
          + */
          + private transient Client client;
          +
          + /**
          + * Bulk processor that was created using the client
          + */
          + private transient BulkProcessor bulkProcessor;
          +
          + /**
          + * Bulk {@link org.elasticsearch.action.ActionRequest} indexer
          + */
          + private transient RequestIndexer requestIndexer;
          +
          + /**
          + * This is set from inside the BulkProcessor listener if there where failures in processing.
          + */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /**
          + * This is set from inside the BulkProcessor listener if a Throwable was thrown during processing.
          + */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + /**
          + * Creates a new ElasticsearchSink that connects to the cluster using a TransportClient.
          + *
          + * @param esConfig The map of user settings that are passed when constructing the TransportClient
          + * @param sinkConfig The map of user settings that are passed when constructing the BulkProcessor
          + * @param transportAddresses The Elasticsearch Nodes to which to connect using a {@code TransportClient}
          + * @param elasticsearchSinkFunction This is used to generate the ActionRequest from the incoming element
          + */
          + public ElasticsearchSink(Map<String, String> esConfig, Map<String, String> sinkConfig, List<InetSocketAddress> transportAddresses, ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          — End diff –

          I see, thanks for the explanation! I think we can resolve this by keeping a single Map for user configuration at the API level and separating out the bulk processor settings internally.
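          For illustration, this is roughly how the constructor above would be used from a job, assuming the two-map signature shown in this diff (cluster name, index and element type are placeholders, and the final API may change if the maps are merged as suggested):

              // given: DataStream<String> input
              Map<String, String> esConfig = new HashMap<>();
              esConfig.put("cluster.name", "my-cluster-name");

              Map<String, String> sinkConfig = new HashMap<>();
              // flush after every element; otherwise the BulkProcessor buffers requests
              sinkConfig.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");

              List<InetSocketAddress> transports = new ArrayList<>();
              transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));

              input.addSink(new ElasticsearchSink<>(esConfig, sinkConfig, transports,
                  new ElasticsearchSinkFunction<String>() {
                      @Override
                      public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
                          Map<String, String> json = new HashMap<>();
                          json.put("data", element);
                          indexer.add(Requests.indexRequest().index("my-index").type("my-type").source(json));
                      }
                  }));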

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user tzulitai opened a pull request:

          https://github.com/apache/flink/pull/3112

          FLINK-4988 [elasticsearch] Add Elasticsearch 5.x Connector

          This PR is based on @mikedias's work in #2767 (1st commit), with additional work to restructure the Elasticsearch connectors (2nd commit). Basically, we now have a `flink-connector-elasticsearch-base` containing common behaviour and test code.

              1. Deprecated Constructors for Elasticsearch 1.x / 2.x
                As part of the restructuring, all connector versions now take an `ElasticsearchSinkFunction` (previously, 1.x took an `IndexRequestBuilder`, which was limited to indexing actions on an Elasticsearch index) for fully functional Elasticsearch support.

          The `ElasticsearchSinkFunction` was also relocated from package `o.a.f.s.c.elasticsearch2` in `flink-connector-elasticsearch2` to `o.a.f.s.c.elasticsearch` in `flink-connector-elasticsearch-base`.

          This resulted in deprecation of the following:

          1. In Elasticsearch 1.x: All original `IndexRequestBuilder` constructors as well as the interface itself have been deprecated.
          2. In Elasticsearch 2.x: Due to the package relocation of `ElasticsearchSinkFunction`, all original constructors are also deprecated in favor of the new package path for the class.

          R: @rmetzger @mikedias please feel free to review, thank you!
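
          To make the difference concrete: an `ElasticsearchSinkFunction` receives each record together with a `RequestIndexer` and may emit any kind of action request, not just index requests. A rough sketch (index, type and id values are purely illustrative):

              ElasticsearchSinkFunction<Tuple2<String, Boolean>> sinkFunction =
                  new ElasticsearchSinkFunction<Tuple2<String, Boolean>>() {
                      @Override
                      public void process(Tuple2<String, Boolean> record, RuntimeContext ctx, RequestIndexer indexer) {
                          if (record.f1) {
                              // index (or re-index) the record
                              Map<String, String> json = new HashMap<>();
                              json.put("data", record.f0);
                              indexer.add(Requests.indexRequest().index("my-index").type("my-type").id(record.f0).source(json));
                          } else {
                              // deletions are possible too, something IndexRequestBuilder could not express
                              indexer.add(Requests.deleteRequest("my-index").type("my-type").id(record.f0));
                          }
                      }
                  };

          Because the function, rather than the sink, decides which requests to emit, deletes and updates become possible without changing the connector itself.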

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tzulitai/flink FLINK-4988

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3112.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3112


          commit 746cbb4dd029837d9955cd3138444f70305ac542
          Author: Mike Dias <mike.rodrigues.dias@gmail.com>
          Date: 2016-11-07T20:09:48Z

          FLINK-4988 Elasticsearch 5.x support

          commit 86482962b250899e9ac076768ff98bf8fbee58f8
          Author: Tzu-Li (Gordon) Tai <tzulitai@apache.org>
          Date: 2017-01-12T13:21:56Z

          FLINK-4988 [elasticsearch] Restructure Elasticsearch connectors


          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3112

          Thanks a lot for opening a pull request for this.
          It looks like some of the tests are failing on Travis. Does ES5 support Java 7?

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          @rmetzger Ah ... seems like ES 5.x requires at least Java 8.
          https://www.elastic.co/guide/en/elasticsearch/reference/master/_installation.html#_installation
          https://github.com/elastic/elasticsearch/issues/17584
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3112

          Using Maven build profiles, you can probably include the ES5 connector in Java 8 builds only.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          Thanks for the pointers. We'll include the profile `include-elasticsearch5` for Java 8 builds only.
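
          For reference, a JDK-activated profile along these lines should work (the module name matches this PR; the exact placement in the parent pom is an assumption):

              <profile>
                  <id>include-elasticsearch5</id>
                  <activation>
                      <!-- only build the ES 5 connector on Java 8 or newer -->
                      <jdk>[1.8,)</jdk>
                  </activation>
                  <modules>
                      <module>flink-connector-elasticsearch5</module>
                  </modules>
              </profile>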

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95969031

          — Diff: docs/dev/connectors/elasticsearch2.md —
          @@ -1,173 +0,0 @@


          — End diff –

          Can you replace the es2 page with a redirect to the new page?
          This way existing links are not broken.
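
          A possible shape for the redirect stub, assuming the docs build picks up the `jekyll-redirect-from` plugin (the target path is a guess at the new page's location):

              ---
              title: "Elasticsearch 2.x Connector"
              redirect_to: /dev/connectors/elasticsearch.html
              ---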

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95973995

          — Diff: flink-connectors/flink-connector-elasticsearch5/pom.xml —
          @@ -0,0 +1,121 @@
          +<?xml version="1.0" encoding="UTF-8"?>
          +<!--
          +Licensed to the Apache Software Foundation (ASF) under one
          +or more contributor license agreements. See the NOTICE file
          +distributed with this work for additional information
          +regarding copyright ownership. The ASF licenses this file
          +to you under the Apache License, Version 2.0 (the
          +"License"); you may not use this file except in compliance
          +with the License. You may obtain a copy of the License at
          +
          + http://www.apache.org/licenses/LICENSE-2.0
          +
          +Unless required by applicable law or agreed to in writing,
          +software distributed under the License is distributed on an
          +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
          +KIND, either express or implied. See the License for the
          +specific language governing permissions and limitations
          +under the License.
          +-->
          +<project xmlns="http://maven.apache.org/POM/4.0.0"
          + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
          +
          + <modelVersion>4.0.0</modelVersion>
          +
          + <parent>
          + <groupId>org.apache.flink</groupId>
          + <artifactId>flink-connectors</artifactId>
          + <version>1.3-SNAPSHOT</version>
          + <relativePath>..</relativePath>
          + </parent>
          +
          + <artifactId>flink-connector-elasticsearch5_2.10</artifactId>
          + <name>flink-connector-elasticsearch5</name>
          +
          + <packaging>jar</packaging>
          +
          + <!-- Allow users to pass custom connector versions -->
          + <properties>
          + <elasticsearch.version>5.0.0</elasticsearch.version>
          + </properties>
          +
          + <dependencies>
          +
          + <!-- core dependencies -->
          +
          + <dependency>
          + <groupId>org.apache.flink</groupId>
          + <artifactId>flink-streaming-java_2.10</artifactId>
          + <version>${project.version}</version>
          + <scope>provided</scope>
          + </dependency>
          +
          + <dependency>
          + <groupId>org.apache.flink</groupId>
          + <artifactId>flink-connector-elasticsearch-base_2.10</artifactId>
          + <version>${project.version}</version>
          + <exclusions>
          + <exclusion>
          + <groupId>org.elasticsearch</groupId>
          + <artifactId>elasticsearch</artifactId>
          + </exclusion>
          + </exclusions>
          + </dependency>
          +
          + <dependency>
          + <groupId>org.elasticsearch.client</groupId>
          + <artifactId>transport</artifactId>
          + <version>${elasticsearch.version}</version>
          + </dependency>
          +
          + <dependency>
          + <groupId>org.apache.logging.log4j</groupId>
          + <artifactId>log4j-api</artifactId>
          + <version>2.7</version>
          + </dependency>
          + <dependency>
          + <groupId>org.apache.logging.log4j</groupId>
          + <artifactId>log4j-core</artifactId>
          + <version>2.7</version>
          + </dependency>
          — End diff –

          How does ES5 work when executed in a Flink program? Does it write the logs correctly into the taskmanager.log file using log4j2?
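
          For what it's worth, one common way to route a Log4j 2-based dependency through Flink's slf4j/log4j setup is the official `log4j-to-slf4j` adapter; whether that is the right fix here (instead of shipping log4j-core) is an open question:

              <dependency>
                  <groupId>org.apache.logging.log4j</groupId>
                  <artifactId>log4j-to-slf4j</artifactId>
                  <version>2.7</version>
              </dependency>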

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95972131

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/test/java/org/apache/flink/streaming/connectors/elasticsearch/EmbeddedElasticsearchNodeEnvironment.java —
          @@ -0,0 +1,37 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          — End diff –

          This license header seems to be different from the other files.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95972955

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/test/java/org/apache/flink/streaming/connectors/elasticsearch5/ElasticsearchSinkITCase.java —
          @@ -0,0 +1,68 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.streaming.connectors.elasticsearch5;
          +
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase;
          +import org.junit.Test;
          +
          +import java.net.InetAddress;
          +import java.net.InetSocketAddress;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +
          +public class ElasticsearchSinkITCase extends ElasticsearchSinkTestBase {
          +
          + @Test
          + public void testTransportClient() throws Exception {
          + runTransportClientTest();
          + }
          +
          + @Test
          + public void testNullTransportClient() throws Exception {
          + runNullTransportClientTest();
          + }
          +
          + @Test
          + public void testEmptyTransportClient() throws Exception {
          + runEmptyTransportClientTest();
          + }
          +
          + @Test
          + public void testTransportClientFails() throws Exception {
          + runTransportClientFailsTest();
          + }
          +
          + @Override
          + protected <T> ElasticsearchSinkBase<T> createElasticsearchSink(Map<String, String> userConfig,
          + List<InetSocketAddress> transportAddresses,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          — End diff –

          Whitespace

          (In general, this PR contains a lot of empty lines / vertical whitespace.)

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95969879

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,235 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
          + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
          + * sending the requests to the cluster, as well as passing input records to the user provided
          + * {@link ElasticsearchSinkFunction} for processing.
          + *
          + * <p>
          + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
          + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
          + * constructor of this class.
          + *
          + * @param <T> Type of the elements emitted by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          — End diff –

          Why are you using boxed types here?
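
          One plausible reason for the boxed types is that `null` can stand for "not configured", so the `BulkProcessor` defaults are only overridden for keys the user actually set. A sketch of that pattern (field names taken from the diff above; the exact structure of the builder code is an assumption):

              // client and bulkProcessorListener are created earlier in open()
              BulkProcessor.Builder bulkProcessorBuilder = BulkProcessor.builder(client, bulkProcessorListener);

              if (bulkProcessorFlushMaxActions != null) {
                  bulkProcessorBuilder.setBulkActions(bulkProcessorFlushMaxActions);
              }
              if (bulkProcessorFlushMaxSizeMb != null) {
                  bulkProcessorBuilder.setBulkSize(new ByteSizeValue(bulkProcessorFlushMaxSizeMb, ByteSizeUnit.MB));
              }
              if (bulkProcessorFlushIntervalMillis != null) {
                  bulkProcessorBuilder.setFlushInterval(TimeValue.timeValueMillis(bulkProcessorFlushIntervalMillis));
              }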

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95969123

          — Diff: docs/dev/connectors/elasticsearch2.md —
          @@ -1,173 +0,0 @@


          -title: "Elasticsearch 2.x Connector"
          -nav-title: Elasticsearch 2.x
          -nav-parent_id: connectors
          -nav-pos: 5
          -
          -<!--
          -Licensed to the Apache Software Foundation (ASF) under one
          -or more contributor license agreements. See the NOTICE file
          -distributed with this work for additional information
          -regarding copyright ownership. The ASF licenses this file
          -to you under the Apache License, Version 2.0 (the
          -"License"); you may not use this file except in compliance
          -with the License. You may obtain a copy of the License at
          -
          - http://www.apache.org/licenses/LICENSE-2.0
          -
          -Unless required by applicable law or agreed to in writing,
          -software distributed under the License is distributed on an
          -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
          -KIND, either express or implied. See the License for the
          -specific language governing permissions and limitations
          -under the License.
          --->
          -
          -This connector provides a Sink that can write to an
          -[Elasticsearch 2.x](https://elastic.co/) Index. To use this connector, add the
          -following dependency to your project:
          -
          -{% highlight xml %}
          -<dependency>
          -  <groupId>org.apache.flink</groupId>
          -  <artifactId>flink-connector-elasticsearch2{{ site.scala_version_suffix }}</artifactId>
          -  <version>{{site.version }}</version>
          -</dependency>
          -{% endhighlight %}
          -
          -Note that the streaming connectors are currently not part of the binary
          -distribution. See
          -[here]({{ site.baseurl }}/dev/linking)
          -for information about how to package the program with the libraries for
          -cluster execution.
          -
          -#### Installing Elasticsearch 2.x
          -
          -Instructions for setting up an Elasticsearch cluster can be found
          -[here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
          -Make sure to set and remember a cluster name. This must be set when
          -creating a Sink for writing to your cluster
          -
          -#### Elasticsearch 2.x Sink
          -The connector provides a Sink that can send data to an Elasticsearch 2.x Index.
          -
          -The sink communicates with Elasticsearch via Transport Client
          -
          -See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/transport-client.html)
          -for information about the Transport Client.
          -
          -The code below shows how to create a sink that uses a `TransportClient` for communication:
          -
          -<div class="codetabs" markdown="1">
          -<div data-lang="java" markdown="1">
          -{% highlight java %}
          -File dataDir = ....;
          -
          -DataStream<String> input = ...;
          -
          -Map<String, String> config = new HashMap<>();
          -// This instructs the sink to emit after every element, otherwise they would be buffered
          -config.put("bulk.flush.max.actions", "1");
          -config.put("cluster.name", "my-cluster-name");
          -
          -List<InetSocketAddress> transports = new ArrayList<>();
          -transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
          -transports.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300));
          -
          -input.addSink(new ElasticsearchSink(config, transports, new ElasticsearchSinkFunction<String>() {
          -  public IndexRequest createIndexRequest(String element) {
          -    Map<String, String> json = new HashMap<>();
          -    json.put("data", element);
          -
          -    return Requests.indexRequest()
          -            .index("my-index")
          -            .type("my-type")
          -            .source(json);
          -  }
          -
          -  @Override
          -  public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
          -    indexer.add(createIndexRequest(element));
          -  }
          -}));
          -{% endhighlight %}
          -</div>
          -<div data-lang="scala" markdown="1">
          -{% highlight scala %}
          -val dataDir = ....;
          -
          -val input: DataStream[String] = ...
          -
          -val config = new util.HashMap[String, String]
          -config.put("bulk.flush.max.actions", "1")
          -config.put("cluster.name", "my-cluster-name")
          -
          -val transports = new ArrayList[String]
          -transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300))
          -transports.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300));
          -
          -input.addSink(new ElasticsearchSink(config, transports, new ElasticsearchSinkFunction[String] {
          -  def createIndexRequest(element: String): IndexRequest = {
          -    val json = new util.HashMap[String, AnyRef]
          -    json.put("data", element)
          -    Requests.indexRequest.index("my-index").`type`("my-type").source(json)
          -  }
          -
          -  override def process(element: String, ctx: RuntimeContext, indexer: RequestIndexer) {
          -    indexer.add(createIndexRequest(element))
          -  }
          -}))
          -{% endhighlight %}
          -</div>
          -</div>
          -
          -A Map of Strings is used to configure the Sink. The configuration keys
          -are documented in the Elasticsearch documentation
          -[here](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html).
          -Especially important is the `cluster.name`. parameter that must correspond to
          -the name of your cluster and with ElasticSearch 2x you also need to specify `path.home`.
          -
          -Internally, the sink uses a `BulkProcessor` to send Action requests to the cluster.
          -This will buffer elements and Action Requests before sending to the cluster. The behaviour of the
          -`BulkProcessor` can be configured using these config keys:
          - * *bulk.flush.max.actions*: Maximum amount of elements to buffer
          - * *bulk.flush.max.size.mb*: Maximum amount of data (in megabytes) to buffer
          - * *bulk.flush.interval.ms*: Interval at which to flush data regardless of the other two
          -   settings in milliseconds
          -
          -This now provides a list of Elasticsearch Nodes
          -to which the sink should connect via a `TransportClient`.
          -
          -More information about Elasticsearch can be found [here](https://elastic.co).
          -
          -#### Packaging the Elasticsearch Connector into an Uber-jar
          -
          -For the execution of your Flink program,
          -it is recommended to build a so-called uber-jar (executable jar) containing all your dependencies
          -(see [here]({{ site.baseurl }}/dev/linking) for further information).
          -
          -However,
          -when an uber-jar containing an Elasticsearch sink is executed,
          -an `IllegalArgumentException` may occur,
          -which is caused by conflicting files of Elasticsearch and it's dependencies
          -in `META-INF/services`:
          -
          -```
          -IllegalArgumentException[An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [es090, completion090, XBloomFilter]]
          -```
          -
          -If the uber-jar is build by means of maven,
          -this issue can be avoided by adding the following bits to the pom file:
          -
          -```
          -<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          -  <resource>META-INF/services/org.apache.lucene.codecs.Codec</resource>
          -</transformer>
          -<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          -  <resource>META-INF/services/org.apache.lucene.codecs.DocValuesFormat</resource>
          -</transformer>
          -<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          -  <resource>META-INF/services/org.apache.lucene.codecs.PostingsFormat</resource>
          -</transformer>

          — End diff –

          Ah, I see. This has been there before. Still, it would be great if you could fix it.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95974130

          — Diff: flink-connectors/flink-connector-elasticsearch2/src/test/java/org/apache/flink/streaming/connectors/elasticsearch2/ElasticsearchSinkITCase.java —
          @@ -17,217 +17,51 @@
          */
          package org.apache.flink.streaming.connectors.elasticsearch2;

          -import org.apache.flink.api.common.functions.RuntimeContext;
          -import org.apache.flink.api.java.tuple.Tuple2;
          -import org.apache.flink.runtime.client.JobExecutionException;
          -import org.apache.flink.streaming.api.datastream.DataStreamSource;
          -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
          -import org.apache.flink.streaming.api.functions.source.SourceFunction;
          -import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase;
          -import org.elasticsearch.action.get.GetRequest;
          -import org.elasticsearch.action.get.GetResponse;
          -import org.elasticsearch.action.index.IndexRequest;
          -import org.elasticsearch.client.Client;
          -import org.elasticsearch.client.Requests;
          -import org.elasticsearch.common.settings.Settings;
          -import org.elasticsearch.node.Node;
          -import org.elasticsearch.node.NodeBuilder;
          -import org.junit.Assert;
          -import org.junit.ClassRule;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase;
          import org.junit.Test;
          -import org.junit.rules.TemporaryFolder;

          -import java.io.File;
          import java.net.InetAddress;
          import java.net.InetSocketAddress;
          import java.util.ArrayList;
          -import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          -public class ElasticsearchSinkITCase extends StreamingMultipleProgramsTestBase {
          -

          • private static final int NUM_ELEMENTS = 20;
            -
          • @ClassRule
          • public static TemporaryFolder tempFolder = new TemporaryFolder();
            +public class ElasticsearchSinkITCase extends ElasticsearchSinkTestBase {

          @Test
          public void testTransportClient() throws Exception {
          -

          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • // Can't use {@link TransportAddress} as its not Serializable in Elasticsearch 2.x

          • List<InetSocketAddress> transports = new ArrayList<>();
          • transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
            -
          • source.addSink(new ElasticsearchSink<>(config, transports, new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) { - GetResponse response = client.get(new GetRequest("my-index", - "my-type", Integer.toString(i))).actionGet(); - Assert.assertEquals("message #" + i, response.getSource().get("data")); - }

            -

          • node.close();
            + runTransportClientTest();
            }
          • @Test(expected = IllegalArgumentException.class)
          • public void testNullTransportClient() throws Exception {
            -
          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • source.addSink(new ElasticsearchSink<>(config, null, new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) {
          • GetResponse response = client.get(new GetRequest("my-index",
          • "my-type", Integer.toString)).actionGet();
          • Assert.assertEquals("message #" + i, response.getSource().get("data"));
            + @Test
            + public void testNullTransportClient() throws Exception { + runNullTransportClientTest(); }
          • node.close();
          • }
            -
          • @Test(expected = IllegalArgumentException.class)
          • public void testEmptyTransportClient() throws Exception {
            -
          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • source.addSink(new ElasticsearchSink<>(config, new ArrayList<InetSocketAddress>(), new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) {
          • GetResponse response = client.get(new GetRequest("my-index",
          • "my-type", Integer.toString)).actionGet();
          • Assert.assertEquals("message #" + i, response.getSource().get("data"));
            + @Test
            + public void testEmptyTransportClient() throws Exception { + runEmptyTransportClientTest(); }
          • node.close();
          • }
            -
          • @Test(expected = JobExecutionException.class)
            + @Test
            public void testTransportClientFails() throws Exception { - // this checks whether the TransportClient fails early when there is no cluster to - // connect to. There isn't a similar test for the Node Client version since that - // one will block and wait for a cluster to come online - - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction()); - - Map<String, String> config = new HashMap<>(); - // This instructs the sink to emit after every element, otherwise they would be buffered - config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1"); - config.put("cluster.name", "my-node-client-cluster"); - - List<InetSocketAddress> transports = new ArrayList<>(); - transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300)); - - source.addSink(new ElasticsearchSink<>(config, transports, new TestElasticsearchSinkFunction())); - - env.execute("Elasticsearch Node Client Test"); + runTransportClientFailsTest(); }
          • private static class TestSourceFunction implements SourceFunction<Tuple2<Integer, String>> {
          • private static final long serialVersionUID = 1L;
            -
          • private volatile boolean running = true;
            -
          • @Override
          • public void run(SourceContext<Tuple2<Integer, String>> ctx) throws Exception {
          • for (int i = 0; i < NUM_ELEMENTS && running; i++) { - ctx.collect(Tuple2.of(i, "message #" + i)); - }
          • }
            -
          • @Override
          • public void cancel() { - running = false; - }

            + @Override
            + protected <T> ElasticsearchSinkBase<T> createElasticsearchSink(Map<String, String> userConfig,
            + List<InetSocketAddress> transportAddresses,
            + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
            + return new ElasticsearchSink<>(userConfig, transportAddresses, elasticsearchSinkFunction);

          — End diff –

          Whitespace

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95968855

          — Diff: docs/dev/connectors/elasticsearch.md —
          @@ -23,158 +23,284 @@ specific language governing permissions and limitations
          under the License.
          -->

          -This connector provides a Sink that can write to an
          -[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
          -following dependency to your project:
          -
          -

          {% highlight xml %}

          -<dependency>

          • <groupId>org.apache.flink</groupId>
          • <artifactId>flink-connector-elasticsearch{{ site.scala_version_suffix }}</artifactId>
          • <version>{{site.version }}</version>
            -</dependency>
            - {% endhighlight %}
            +This connector provides sinks that can request document actions to an
            +[Elasticsearch](https://elastic.co/) Index. To use this connector, add one
            +of the following dependencies to your project, depending on the version
            +of the Elasticsearch installation:
            +
            +<table class="table table-bordered">
            + <thead>
            + <tr>
            + <th class="text-left">Maven Dependency</th>
            + <th class="text-left">Supported since</th>
            + <th class="text-left">Elasticsearch version</th>
            + </tr>
            + </thead>
            + <tbody>
            + <tr>
            + <td>flink-connector-elasticsearch{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>1.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch2{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>2.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch5{{ site.scala_version_suffix }}</td>
            + <td>1.2.0</td>
            + <td>5.x</td>
            + </tr>
            + </tbody>
            +</table>

            Note that the streaming connectors are currently not part of the binary
            -distribution. See
            -[here](site.baseurl/dev/linking)
            -for information about how to package the program with the libraries for
            -cluster execution.
            +distribution. See [here](site.baseurl/dev/linking) for information
            +about how to package the program with the libraries for cluster execution.

            #### Installing Elasticsearch

            Instructions for setting up an Elasticsearch cluster can be found
            [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
            Make sure to set and remember a cluster name. This must be set when
            -creating a Sink for writing to your cluster
            +creating an `ElasticsearchSink` for requesting document actions against your cluster.

            #### Elasticsearch Sink
            -The connector provides a Sink that can send data to an Elasticsearch Index.
            -
            -The sink can use two different methods for communicating with Elasticsearch:
            -
            -1. An embedded Node
            -2. The TransportClient

            -See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
            -for information about the differences between the two modes.
            +The `ElasticsearchSink` uses a `TransportClient` to communicate with an
            +Elasticsearch cluster.

            -This code shows how to create a sink that uses an embedded Node for
            -communication:
            +The example below shows how to configure and create a sink:

            <div class="codetabs" markdown="1">
            -<div data-lang="java" markdown="1">
            +<div data-lang="java, Elasticsearch 1.x" markdown="1">
            {% highlight java %}
            DataStream<String> input = ...;

            -Map<String, String> config = Maps.newHashMap();
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            // This instructs the sink to emit after every element, otherwise they would be buffered
            config.put("bulk.flush.max.actions", "1");
            -config.put("cluster.name", "my-cluster-name");

            -input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
            - @Override
            - public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
            - Map<String, Object> json = new HashMap<>();
            - json.put("data", element);
             +List<TransportAddress> transportAddresses = new ArrayList<>();
            +transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300));
            +transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300));

            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            + public IndexRequest createIndexRequest(String element) { + Map<String, String> json = new HashMap<>(); + json.put("data", element); + return Requests.indexRequest() .index("my-index") .type("my-type") .source(json); }
            +
            + @Override
            + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) { + indexer.add(createIndexRequest(element)); + }
            }));
            {% endhighlight %}

            </div>
            -<div data-lang="scala" markdown="1">
            +<div data-lang="java, Elasticsearch 2.x / 5.x" markdown="1">
            +

            {% highlight java %}
            +DataStream<String> input = ...;
            +
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            +config.put("bulk.flush.max.actions", "1");
            +
            +List<InetSocketAddress> transportAddresses = new ArrayList<>();
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300));
            +
            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            + public IndexRequest createIndexRequest(String element) { + Map<String, String> json = new HashMap<>(); + json.put("data", element); + + return Requests.indexRequest() + .index("my-index") + .type("my-type") + .source(json); + }
            +
            + @Override
            + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) { + indexer.add(createIndexRequest(element)); + }
            +}));{% endhighlight %}
            +</div>
            +<div data-lang="scala, Elasticsearch 1.x" markdown="1">
            {% highlight scala %}
            val input: DataStream[String] = ...

            -val config = new util.HashMap[String, String]
            +val config = new java.util.HashMap[String, String]
            +config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            config.put("bulk.flush.max.actions", "1")
            +
            +val transportAddresses = new java.util.ArrayList[TransportAddress]
            +transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300))
            +transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300))
            +
            +input.addSink(new ElasticsearchSink(config, transportAddresses, new ElasticsearchSinkFunction[String] {
            + def createIndexRequest(element: String): IndexRequest = { + val json = new java.util.HashMap[String, String] + json.put("data", element) + + return Requests.indexRequest() + .index("my-index") + .type("my-type") + .source(json); + }
            +}))
            +{% endhighlight %}
            +</div>
            +<div data-lang="scala, Elasticsearch 2.x / 5.x" markdown="1">
            +{% highlight scala %}
            +val input: DataStream[String] = ...
            +
            +val config = new java.util.HashMap[String, String]
            config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            +config.put("bulk.flush.max.actions", "1")
            +
            +val transportAddresses = new java.util.ArrayList[InetSocketAddress]
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300))
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300))

            -text.addSink(new ElasticsearchSink(config, new IndexRequestBuilder[String] {
            - override def createIndexRequest(element: String, ctx: RuntimeContext): IndexRequest = {
            - val json = new util.HashMap[String, AnyRef]
            +input.addSink(new ElasticsearchSink(config, transportAddresses, new ElasticsearchSinkFunction[String] {
            + def createIndexRequest(element: String): IndexRequest = { + val json = new java.util.HashMap[String, String] json.put("data", element) - println("SENDING: " + element) - Requests.indexRequest.index("my-index").`type`("my-type").source(json) + + return Requests.indexRequest() + .index("my-index") + .type("my-type") + .source(json); }
            }))
            {% endhighlight %}
            </div>
            </div>

            -Note how a Map of Strings is used to configure the Sink. The configuration keys
            -are documented in the Elasticsearch documentation
            +Note how a `Map` of `String`s is used to configure the `ElasticsearchSink`.
            +The configuration keys are documented in the Elasticsearch documentation
            [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html).
            Especially important is the `cluster.name` parameter that must correspond to
            the name of your cluster.

            -Internally, the sink uses a `BulkProcessor` to send index requests to the cluster.
            -This will buffer elements before sending a request to the cluster. The behaviour of the
            -`BulkProcessor` can be configured using these config keys:
            +Also note that the example only demonstrates performing a single index
            +request for each incoming element. Generally, the `ElasticsearchSinkFunction`
            +can be used to perform multiple requests of different types (ex.,
            +`DeleteRequest`, `UpdateRequest`, etc.).
            +
             +Internally, the sink uses a `BulkProcessor` to send action requests to the cluster.
            +This will buffer elements before sending them in bulk to the cluster. The behaviour of the
            +`BulkProcessor` can be set using these config keys in the provided `Map` configuration:
            * *bulk.flush.max.actions*: Maximum amount of elements to buffer
            * *bulk.flush.max.size.mb*: Maximum amount of data (in megabytes) to buffer
            * *bulk.flush.interval.ms*: Interval at which to flush data regardless of the other two
            settings in milliseconds

            -This example code does the same, but with a `TransportClient`:
            +#### Communication using Embedded Node (only for Elasticsearch 1.x)
            +
            +For Elasticsearch versions 1.x, communication using an embedded node is
            +also supported. See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
            +for information about the differences between communicating with Elasticsearch
            +with an embedded node and a `TransportClient`.
            +
             +Below is an example of how to create an `ElasticsearchSink` that uses an
            +embedded node instead of a `TransportClient`:

            <div class="codetabs" markdown="1">
            <div data-lang="java" markdown="1">
            {% highlight java %}

            DataStream<String> input = ...;

          -Map<String, String> config = Maps.newHashMap();
           +Map<String, String> config = new HashMap<>();
          // This instructs the sink to emit after every element, otherwise they would be buffered
          config.put("bulk.flush.max.actions", "1");
          config.put("cluster.name", "my-cluster-name");

          -List<TransportAddress> transports = new ArrayList<String>();
          -transports.add(new InetSocketTransportAddress("node-1", 9300));
          -transports.add(new InetSocketTransportAddress("node-2", 9300));
          -
          -input.addSink(new ElasticsearchSink<>(config, transports, new IndexRequestBuilder<String>() {

          • @Override
          • public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
          • Map<String, Object> json = new HashMap<>();
            +input.addSink(new ElasticsearchSink<>(config, new ElasticsearchSinkFunction<String>()
             { + public IndexRequest createIndexRequest(String element) { + Map<String, String> json = new HashMap<>(); json.put("data", element); - + return Requests.indexRequest() .index("my-index") .type("my-type") .source(json); } + + @Override + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) { + indexer.add(createIndexRequest(element)); + } }

            ));

            {% endhighlight %}
            </div>
            <div data-lang="scala" markdown="1">
            {% highlight scala %}
            val input: DataStream[String] = ...

            -val config = new util.HashMap[String, String]
            +val config = new java.util.HashMap[String, String]
            config.put("bulk.flush.max.actions", "1")
            config.put("cluster.name", "my-cluster-name")

            -val transports = new ArrayList[String]
            -transports.add(new InetSocketTransportAddress("node-1", 9300))
            -transports.add(new InetSocketTransportAddress("node-2", 9300))
            -
            -text.addSink(new ElasticsearchSink(config, transports, new IndexRequestBuilder[String] {
            - override def createIndexRequest(element: String, ctx: RuntimeContext): IndexRequest = {
            - val json = new util.HashMap[String, AnyRef]
            +input.addSink(new ElasticsearchSink(config, new ElasticsearchSinkFunction[String] {
            + def createIndexRequest(element: String): IndexRequest = { + val json = new java.util.HashMap[String, String] json.put("data", element) - println("SENDING: " + element) - Requests.indexRequest.index("my-index").`type`("my-type").source(json) + + return Requests.indexRequest() + .index("my-index") + .type("my-type") + .source(json); }
            }))
            {% endhighlight %}

            </div>
            </div>

          -The difference is that we now need to provide a list of Elasticsearch Nodes
          -to which the sink should connect using a `TransportClient`.
          +The difference is that now we do not need to provide a list of addresses
          +of Elasticsearch nodes.

          More information about Elasticsearch can be found [here](https://elastic.co).
          +
          +#### Packaging the Elasticsearch Connector into an Uber-jar
          +
          +For the execution of your Flink program,
          +it is recommended to build a so-called uber-jar (executable jar) containing all your dependencies
          +(see [here](site.baseurl/dev/linking) for further information).
          +
          +However,
          +when an uber-jar containing an Elasticsearch sink is executed,
          +an `IllegalArgumentException` may occur,
           +which is caused by conflicting files of Elasticsearch and its dependencies
          +in `META-INF/services`:
          +
          +```
          +IllegalArgumentException[An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [es090, completion090, XBloomFilter]]
          +```
          +
           +If the uber-jar is built by means of Maven,
          +this issue can be avoided by adding the following bits to the pom file:
          +
          +```
          +<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          + <resource>META-INF/services/org.apache.lucene.codecs.Codec</resource>
          +</transformer>
          +<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          + <resource>META-INF/services/org.apache.lucene.codecs.DocValuesFormat</resource>
          +</transformer>
          +<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          + <resource>META-INF/services/org.apache.lucene.codecs.PostingsFormat</resource>
          +</transformer>
          +```
          — End diff –

          +1 for explaining how to resolve this issue!
          It would be good if you could mention in the docs where to add the transformers exactly.
          Also, I'm wondering if you could use the ServicesTransformer instead: http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
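          For context, a minimal sketch of what that could look like in a user's pom is shown below (illustrative only: the plugin version and other shade settings are omitted, and this setup is an assumption, not taken from the PR). The transformers sit inside the maven-shade-plugin execution configuration; using the ServicesResourceTransformer would merge all META-INF/services files contributed by the dependencies without listing each resource individually:

          ```
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
                <configuration>
                  <transformers>
                    <!-- merges every META-INF/services file from all dependencies -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                  </transformers>
                </configuration>
              </execution>
            </executions>
          </plugin>
          ```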

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95972691

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/test/resources/log4j2.properties —
          @@ -0,0 +1,27 @@
          +################################################################################
          +# Licensed to the Apache Software Foundation (ASF) under one
          +# or more contributor license agreements. See the NOTICE file
          +# distributed with this work for additional information
          +# regarding copyright ownership. The ASF licenses this file
          +# to you under the Apache License, Version 2.0 (the
          +# "License"); you may not use this file except in compliance
          +# with the License. You may obtain a copy of the License at
          +#
          +# http://www.apache.org/licenses/LICENSE-2.0
          +#
          +# Unless required by applicable law or agreed to in writing, software
          +# distributed under the License is distributed on an "AS IS" BASIS,
          +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          +# See the License for the specific language governing permissions and
          +# limitations under the License.
          +################################################################################
          +
          +log4j.rootLogger=OFF, testlogger
          — End diff –

          Are you sure this is a valid log4j2 example?

           Log4j2 files seem to look different: http://howtodoinjava.com/log4j2/log4j-2-properties-file-configuration-example/

          Is the ES5 connector pulling log4j2 as a dependency? Can we avoid that?
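
           For comparison, a minimal sketch of a Log4j 2 style properties file is shown below (assuming the Log4j 2.6+ property syntax; the appender name and pattern are illustrative and not taken from the PR). It is indeed structured quite differently from the Log4j 1.x keys in the file under review:

           ```
           status = error

           appender.console.type = Console
           appender.console.name = STDOUT
           appender.console.layout.type = PatternLayout
           appender.console.layout.pattern = %d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n

           # keep test output quiet, analogous to rootLogger=OFF in the log4j 1.x file
           rootLogger.level = off
           rootLogger.appenderRef.stdout.ref = STDOUT
           ```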

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95973065

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/main/java/org/apache/flink/streaming/connectors/elasticsearch5/ElasticsearchClientFactoryImpl.java —
          @@ -0,0 +1,88 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch5;
          +
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchClientFactory;
          +import org.apache.flink.streaming.connectors.elasticsearch.util.ElasticsearchUtils;
          +import org.apache.flink.util.Preconditions;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.client.transport.TransportClient;
          +import org.elasticsearch.common.network.NetworkModule;
          +import org.elasticsearch.common.settings.Settings;
          +import org.elasticsearch.common.transport.TransportAddress;
          +import org.elasticsearch.transport.Netty3Plugin;
          +import org.elasticsearch.transport.client.PreBuiltTransportClient;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.net.InetSocketAddress;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
          + * Concrete implementation of {@link ElasticsearchClientFactory} for Elasticsearch version 5.x.
          + *
          + * Flink Elasticsearch Sink for versions 5.x uses a {@link TransportClient} for communication with an Elasticsearch cluster.
          + */
          +class ElasticsearchClientFactoryImpl implements ElasticsearchClientFactory {
          +
          + private static final long serialVersionUID = -7185607275081428567L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchClientFactoryImpl.class);
          +
          + /**
          + * User-provided transport addresses.
          + *
          + * We are using {@link InetSocketAddress} because {@link TransportAddress} is not serializable in Elasticsearch 5.x.
          — End diff –

          Tab indentation
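
          A minimal sketch of the pattern described in the Javadoc above: keep the user-provided addresses as serializable java.net.InetSocketAddress instances and convert them to Elasticsearch TransportAddress objects only when the client is built on the worker. The class and method names below are illustrative, not the connector's actual API, and the InetSocketTransportAddress constructor is assumed from the ES 5.x client.

          ```
          import java.io.Serializable;
          import java.net.InetSocketAddress;
          import java.util.ArrayList;
          import java.util.List;

          import org.elasticsearch.common.transport.InetSocketTransportAddress;
          import org.elasticsearch.common.transport.TransportAddress;

          // Illustrative sketch: store serializable InetSocketAddress instances in the sink and
          // convert them to TransportAddress only when the TransportClient is created at runtime.
          class TransportAddressHolder implements Serializable {

              private static final long serialVersionUID = 1L;

              // serializable, so it can be shipped to the task managers with the sink
              private final List<InetSocketAddress> transportAddresses;

              TransportAddressHolder(List<InetSocketAddress> transportAddresses) {
                  this.transportAddresses = transportAddresses;
              }

              List<TransportAddress> toTransportAddresses() {
                  // done on the worker, after the sink has been deserialized
                  List<TransportAddress> converted = new ArrayList<>(transportAddresses.size());
                  for (InetSocketAddress address : transportAddresses) {
                      converted.add(new InetSocketTransportAddress(address));
                  }
                  return converted;
              }
          }
          ```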

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95977306

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,235 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
          + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
          + * sending the requests to the cluster, as well as passing input records to the user provided
          + * {@link ElasticsearchSinkFunction} for processing.
          + *
          + * <p>
          + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
          + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
          + * constructor of this class.
          + *
          + * @param <T> Type of the elements emitted by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          — End diff –

          I'll change this to `int` and use special values to represent that the user hasn't set a value.
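
          As a side note, a minimal sketch of that sentinel-value approach, assuming the `bulk.flush.*` keys shown in the diff and a user config passed as a `Map<String, String>`; the `NOT_SET` constant and helper method are hypothetical, not the connector's actual code.

          ```
          import java.util.HashMap;
          import java.util.Map;

          import org.apache.flink.api.java.utils.ParameterTool;

          class BulkFlushConfigSketch {

              static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";

              // sentinel meaning "the user did not configure this option"
              static final int NOT_SET = -1;

              static int parseBulkFlushMaxActions(Map<String, String> userConfig) {
                  ParameterTool params = ParameterTool.fromMap(userConfig);
                  return params.has(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS)
                          ? params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS)
                          : NOT_SET;
              }

              public static void main(String[] args) {
                  Map<String, String> userConfig = new HashMap<>();
                  userConfig.put(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
                  System.out.println(parseBulkFlushMaxActions(userConfig));      // prints 1
                  System.out.println(parseBulkFlushMaxActions(new HashMap<>())); // prints -1 (NOT_SET)
              }
          }
          ```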

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95977656

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/test/java/org/apache/flink/streaming/connectors/elasticsearch5/ElasticsearchSinkITCase.java —
          @@ -0,0 +1,68 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.streaming.connectors.elasticsearch5;
          +
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase;
          +import org.junit.Test;
          +
          +import java.net.InetAddress;
          +import java.net.InetSocketAddress;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +
          +public class ElasticsearchSinkITCase extends ElasticsearchSinkTestBase {
          +
          + @Test
          + public void testTransportClient() throws Exception {
          + runTransportClientTest();
          + }
          +
          + @Test
          + public void testNullTransportClient() throws Exception {
          + runNullTransportClientTest();
          + }
          +
          + @Test
          + public void testEmptyTransportClient() throws Exception {
          + runEmptyTransportClientTest();
          + }
          +
          + @Test
          + public void testTransportClientFails() throws Exception {
          + runTransportClientFailsTest();
          + }

          +
          + @Override
          + protected <T> ElasticsearchSinkBase<T> createElasticsearchSink(Map<String, String> userConfig,
          + List<InetSocketAddress> transportAddresses,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          — End diff –

          Do you mean an empty line? I don't think there is any trailing whitespace.
          It isn't actually an empty line; the method parameters are off to the right.
          You need to drag the horizontal scroll bar to see them, otherwise in the IDE they appear to be aligned.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95977827

          — Diff: flink-connectors/flink-connector-elasticsearch5/src/test/java/org/apache/flink/streaming/connectors/elasticsearch5/ElasticsearchSinkITCase.java —
          @@ -0,0 +1,68 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.streaming.connectors.elasticsearch5;
          +
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase;
          +import org.junit.Test;
          +
          +import java.net.InetAddress;
          +import java.net.InetSocketAddress;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +
          +public class ElasticsearchSinkITCase extends ElasticsearchSinkTestBase {
          +
          + @Test
          + public void testTransportClient() throws Exception {
          + runTransportClientTest();
          + }
          +
          + @Test
          + public void testNullTransportClient() throws Exception {
          + runNullTransportClientTest();
          + }
          +
          + @Test
          + public void testEmptyTransportClient() throws Exception {
          + runEmptyTransportClientTest();
          + }
          +
          + @Test
          + public void testTransportClientFails() throws Exception {
          + runTransportClientFailsTest();
          + }

          +
          + @Override
          + protected <T> ElasticsearchSinkBase<T> createElasticsearchSink(Map<String, String> userConfig,
          + List<InetSocketAddress> transportAddresses,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          — End diff –

          I'll double-check the styling in this PR nevertheless.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95978090

          — Diff: flink-connectors/flink-connector-elasticsearch2/src/test/java/org/apache/flink/streaming/connectors/elasticsearch2/ElasticsearchSinkITCase.java —
          @@ -17,217 +17,51 @@
          */
          package org.apache.flink.streaming.connectors.elasticsearch2;

          -import org.apache.flink.api.common.functions.RuntimeContext;
          -import org.apache.flink.api.java.tuple.Tuple2;
          -import org.apache.flink.runtime.client.JobExecutionException;
          -import org.apache.flink.streaming.api.datastream.DataStreamSource;
          -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
          -import org.apache.flink.streaming.api.functions.source.SourceFunction;
          -import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase;
          -import org.elasticsearch.action.get.GetRequest;
          -import org.elasticsearch.action.get.GetResponse;
          -import org.elasticsearch.action.index.IndexRequest;
          -import org.elasticsearch.client.Client;
          -import org.elasticsearch.client.Requests;
          -import org.elasticsearch.common.settings.Settings;
          -import org.elasticsearch.node.Node;
          -import org.elasticsearch.node.NodeBuilder;
          -import org.junit.Assert;
          -import org.junit.ClassRule;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase;
          import org.junit.Test;
          -import org.junit.rules.TemporaryFolder;

          -import java.io.File;
          import java.net.InetAddress;
          import java.net.InetSocketAddress;
          import java.util.ArrayList;
          -import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          -public class ElasticsearchSinkITCase extends StreamingMultipleProgramsTestBase {
          -

          • private static final int NUM_ELEMENTS = 20;
            -
          • @ClassRule
          • public static TemporaryFolder tempFolder = new TemporaryFolder();
            +public class ElasticsearchSinkITCase extends ElasticsearchSinkTestBase {

          @Test
          public void testTransportClient() throws Exception {
          -

          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • // Can't use {@link TransportAddress}

            as its not Serializable in Elasticsearch 2.x

          • List<InetSocketAddress> transports = new ArrayList<>();
          • transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
            -
          • source.addSink(new ElasticsearchSink<>(config, transports, new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) { - GetResponse response = client.get(new GetRequest("my-index", - "my-type", Integer.toString(i))).actionGet(); - Assert.assertEquals("message #" + i, response.getSource().get("data")); - }

            -

          • node.close();
            + runTransportClientTest();
            }
          • @Test(expected = IllegalArgumentException.class)
          • public void testNullTransportClient() throws Exception {
            -
          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • source.addSink(new ElasticsearchSink<>(config, null, new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) {
          • GetResponse response = client.get(new GetRequest("my-index",
          • "my-type", Integer.toString)).actionGet();
          • Assert.assertEquals("message #" + i, response.getSource().get("data"));
            + @Test
            + public void testNullTransportClient() throws Exception { + runNullTransportClientTest(); }
          • node.close();
          • }
            -
          • @Test(expected = IllegalArgumentException.class)
          • public void testEmptyTransportClient() throws Exception {
            -
          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • source.addSink(new ElasticsearchSink<>(config, new ArrayList<InetSocketAddress>(), new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) {
          • GetResponse response = client.get(new GetRequest("my-index",
          • "my-type", Integer.toString)).actionGet();
          • Assert.assertEquals("message #" + i, response.getSource().get("data"));
            + @Test
            + public void testEmptyTransportClient() throws Exception { + runEmptyTransportClientTest(); }
          • node.close();
          • }
            -
          • @Test(expected = JobExecutionException.class)
            + @Test
            public void testTransportClientFails() throws Exception { - // this checks whether the TransportClient fails early when there is no cluster to - // connect to. There isn't a similar test for the Node Client version since that - // one will block and wait for a cluster to come online - - final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction()); - - Map<String, String> config = new HashMap<>(); - // This instructs the sink to emit after every element, otherwise they would be buffered - config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1"); - config.put("cluster.name", "my-node-client-cluster"); - - List<InetSocketAddress> transports = new ArrayList<>(); - transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300)); - - source.addSink(new ElasticsearchSink<>(config, transports, new TestElasticsearchSinkFunction())); - - env.execute("Elasticsearch Node Client Test"); + runTransportClientFailsTest(); }
          • private static class TestSourceFunction implements SourceFunction<Tuple2<Integer, String>> {
          • private static final long serialVersionUID = 1L;
            -
          • private volatile boolean running = true;
            -
          • @Override
          • public void run(SourceContext<Tuple2<Integer, String>> ctx) throws Exception {
          • for (int i = 0; i < NUM_ELEMENTS && running; i++) { - ctx.collect(Tuple2.of(i, "message #" + i)); - }
          • }
            -
          • @Override
          • public void cancel() { - running = false; - }

            + @Override
            + protected <T> ElasticsearchSinkBase<T> createElasticsearchSink(Map<String, String> userConfig,
            + List<InetSocketAddress> transportAddresses,
            + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
            + return new ElasticsearchSink<>(userConfig, transportAddresses, elasticsearchSinkFunction);

              • End diff –

          Same as above comment: there don't seem to be trailing spaces.
          If you meant empty lines, there are actually method parameters on the far right.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95978123

          — Diff: docs/dev/connectors/elasticsearch.md —
          @@ -23,158 +23,284 @@ specific language governing permissions and limitations
          under the License.
          -->

          -This connector provides a Sink that can write to an
          -[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
          -following dependency to your project:
          -
          -

          {% highlight xml %}

          -<dependency>

          • <groupId>org.apache.flink</groupId>
          • <artifactId>flink-connector-elasticsearch{{ site.scala_version_suffix }}</artifactId>
          • <version>{{site.version }}</version>
            -</dependency>
            - {% endhighlight %}
            +This connector provides sinks that can request document actions to an
            +[Elasticsearch](https://elastic.co/) Index. To use this connector, add one
            +of the following dependencies to your project, depending on the version
            +of the Elasticsearch installation:
            +
            +<table class="table table-bordered">
            + <thead>
            + <tr>
            + <th class="text-left">Maven Dependency</th>
            + <th class="text-left">Supported since</th>
            + <th class="text-left">Elasticsearch version</th>
            + </tr>
            + </thead>
            + <tbody>
            + <tr>
            + <td>flink-connector-elasticsearch{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>1.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch2{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>2.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch5{{ site.scala_version_suffix }}</td>
            + <td>1.2.0</td>
            + <td>5.x</td>
            + </tr>
            + </tbody>
            +</table>

            Note that the streaming connectors are currently not part of the binary
            -distribution. See
            -[here](site.baseurl/dev/linking)
            -for information about how to package the program with the libraries for
            -cluster execution.
            +distribution. See [here](site.baseurl/dev/linking) for information
            +about how to package the program with the libraries for cluster execution.

            #### Installing Elasticsearch

            Instructions for setting up an Elasticsearch cluster can be found
            [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
            Make sure to set and remember a cluster name. This must be set when
            -creating a Sink for writing to your cluster
            +creating an `ElasticsearchSink` for requesting document actions against your cluster.

            #### Elasticsearch Sink
            -The connector provides a Sink that can send data to an Elasticsearch Index.
            -
            -The sink can use two different methods for communicating with Elasticsearch:
            -
            -1. An embedded Node
            -2. The TransportClient

            -See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
            -for information about the differences between the two modes.
            +The `ElasticsearchSink` uses a `TransportClient` to communicate with an
            +Elasticsearch cluster.

            -This code shows how to create a sink that uses an embedded Node for
            -communication:
            +The example below shows how to configure and create a sink:

            <div class="codetabs" markdown="1">
            -<div data-lang="java" markdown="1">
            +<div data-lang="java, Elasticsearch 1.x" markdown="1">
            {% highlight java %}
            DataStream<String> input = ...;

            -Map<String, String> config = Maps.newHashMap();
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            // This instructs the sink to emit after every element, otherwise they would be buffered
            config.put("bulk.flush.max.actions", "1");
            -config.put("cluster.name", "my-cluster-name");

            -input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
            - @Override
            - public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
            - Map<String, Object> json = new HashMap<>();
            - json.put("data", element);
            +List<TransportAddress> transportAddresses = new ArrayList<>();
            +transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300));
            +transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300));

            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            + public IndexRequest createIndexRequest(String element) { + Map<String, String> json = new HashMap<>(); + json.put("data", element); + return Requests.indexRequest() .index("my-index") .type("my-type") .source(json); }
            +
            + @Override
            + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) { + indexer.add(createIndexRequest(element)); + }
            }));
            {% endhighlight %}

            </div>
            -<div data-lang="scala" markdown="1">
            +<div data-lang="java, Elasticsearch 2.x / 5.x" markdown="1">
            +

            {% highlight java %}
            +DataStream<String> input = ...;
            +
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            +config.put("bulk.flush.max.actions", "1");
            +
            +List<InetSocketAddress> transportAddresses = new ArrayList<>();
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300));
            +
            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            + public IndexRequest createIndexRequest(String element) { + Map<String, String> json = new HashMap<>(); + json.put("data", element); + + return Requests.indexRequest() + .index("my-index") + .type("my-type") + .source(json); + }
            +
            + @Override
            + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) { + indexer.add(createIndexRequest(element)); + }
            +}));{% endhighlight %}
            +</div>
            +<div data-lang="scala, Elasticsearch 1.x" markdown="1">
            {% highlight scala %}
            val input: DataStream[String] = ...

            -val config = new util.HashMap[String, String]
            +val config = new java.util.HashMap[String, String]
            +config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            config.put("bulk.flush.max.actions", "1")
            +
            +val transportAddresses = new java.util.ArrayList[TransportAddress]
            +transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300))
            +transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300))
            +
            +input.addSink(new ElasticsearchSink(config, transportAddresses, new ElasticsearchSinkFunction[String] {
            + def createIndexRequest(element: String): IndexRequest = { + val json = new java.util.HashMap[String, String] + json.put("data", element) + + return Requests.indexRequest() + .index("my-index") + .type("my-type") + .source(json); + }
            +}))
            +{% endhighlight %}
            +</div>
            +<div data-lang="scala, Elasticsearch 2.x / 5.x" markdown="1">
            +{% highlight scala %}
            +val input: DataStream[String] = ...
            +
            +val config = new java.util.HashMap[String, String]
            config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            +config.put("bulk.flush.max.actions", "1")
            +
            +val transportAddresses = new java.util.ArrayList[InetSocketAddress]
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300))
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300))

            -text.addSink(new ElasticsearchSink(config, new IndexRequestBuilder[String] {
            - override def createIndexRequest(element: String, ctx: RuntimeContext): IndexRequest = {
            - val json = new util.HashMap[String, AnyRef]
            +input.addSink(new ElasticsearchSink(config, transportAddresses, new ElasticsearchSinkFunction[String] {
            + def createIndexRequest(element: String): IndexRequest = { + val json = new java.util.HashMap[String, String] json.put("data", element) - println("SENDING: " + element) - Requests.indexRequest.index("my-index").`type`("my-type").source(json) + + return Requests.indexRequest() + .index("my-index") + .type("my-type") + .source(json); }
            }))
            {% endhighlight %}
            </div>
            </div>

            -Note how a Map of Strings is used to configure the Sink. The configuration keys
            -are documented in the Elasticsearch documentation
            +Note how a `Map` of `String`s is used to configure the `ElasticsearchSink`.
            +The configuration keys are documented in the Elasticsearch documentation
            [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html).
            Especially important is the `cluster.name` parameter that must correspond to
            the name of your cluster.

            -Internally, the sink uses a `BulkProcessor` to send index requests to the cluster.
            -This will buffer elements before sending a request to the cluster. The behaviour of the
            -`BulkProcessor` can be configured using these config keys:
            +Also note that the example only demonstrates performing a single index
            +request for each incoming element. Generally, the `ElasticsearchSinkFunction`
            +can be used to perform multiple requests of different types (ex.,
            +`DeleteRequest`, `UpdateRequest`, etc.).
            +
            +Internally, the sink uses a `BulkProcessor` to send action requests to the cluster.
            +This will buffer elements before sending them in bulk to the cluster. The behaviour of the
            +`BulkProcessor` can be set using these config keys in the provided `Map` configuration:
            * *bulk.flush.max.actions*: Maximum amount of elements to buffer
            * *bulk.flush.max.size.mb*: Maximum amount of data (in megabytes) to buffer
            * *bulk.flush.interval.ms*: Interval at which to flush data regardless of the other two
            settings in milliseconds
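
            For illustration, one plausible way to set these flush keys on the config `Map` (the values here are only examples, not recommended defaults):

            ```
            Map<String, String> config = new HashMap<>();
            config.put("cluster.name", "my-cluster-name");
            // flush after 1000 buffered action requests...
            config.put("bulk.flush.max.actions", "1000");
            // ...or after 5 megabytes of buffered data...
            config.put("bulk.flush.max.size.mb", "5");
            // ...or at the latest every 60 seconds, whichever limit is reached first
            config.put("bulk.flush.interval.ms", "60000");
            ```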

            -This example code does the same, but with a `TransportClient`:
            +#### Communication using Embedded Node (only for Elasticsearch 1.x)
            +
            +For Elasticsearch versions 1.x, communication using an embedded node is
            +also supported. See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
            +for information about the differences between communicating with Elasticsearch
            +with an embedded node and a `TransportClient`.
            +
            +Below is an example of how to create an `ElasticsearchSink` that uses an
            +embedded node instead of a `TransportClient`:

            <div class="codetabs" markdown="1">
            <div data-lang="java" markdown="1">
            {% highlight java %}

            DataStream<String> input = ...;

          -Map<String, String> config = Maps.newHashMap();
          +Map<String, String> config = new HashMap<>();
          // This instructs the sink to emit after every element, otherwise they would be buffered
          config.put("bulk.flush.max.actions", "1");
          config.put("cluster.name", "my-cluster-name");

          -List<TransportAddress> transports = new ArrayList<String>();
          -transports.add(new InetSocketTransportAddress("node-1", 9300));
          -transports.add(new InetSocketTransportAddress("node-2", 9300));
          -
          -input.addSink(new ElasticsearchSink<>(config, transports, new IndexRequestBuilder<String>() {

          • @Override
          • public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
          • Map<String, Object> json = new HashMap<>();
            +input.addSink(new ElasticsearchSink<>(config, new ElasticsearchSinkFunction<String>()
             { + public IndexRequest createIndexRequest(String element) { + Map<String, String> json = new HashMap<>(); json.put("data", element); - + return Requests.indexRequest() .index("my-index") .type("my-type") .source(json); } + + @Override + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) { + indexer.add(createIndexRequest(element)); + } }

            ));

            {% endhighlight %}
            </div>
            <div data-lang="scala" markdown="1">
            {% highlight scala %}
            val input: DataStream[String] = ...

            -val config = new util.HashMap[String, String]
            +val config = new java.util.HashMap[String, String]
            config.put("bulk.flush.max.actions", "1")
            config.put("cluster.name", "my-cluster-name")

            -val transports = new ArrayList[String]
            -transports.add(new InetSocketTransportAddress("node-1", 9300))
            -transports.add(new InetSocketTransportAddress("node-2", 9300))
            -
            -text.addSink(new ElasticsearchSink(config, transports, new IndexRequestBuilder[String] {
            - override def createIndexRequest(element: String, ctx: RuntimeContext): IndexRequest = {
            - val json = new util.HashMap[String, AnyRef]
            +input.addSink(new ElasticsearchSink(config, new ElasticsearchSinkFunction[String] {
            + def createIndexRequest(element: String): IndexRequest = {
            +   val json = new java.util.HashMap[String, String]
                json.put("data", element)
            -   println("SENDING: " + element)
            -   Requests.indexRequest.index("my-index").`type`("my-type").source(json)
            +
            +   return Requests.indexRequest()
            +           .index("my-index")
            +           .type("my-type")
            +           .source(json);
              }
            }))
            {% endhighlight %}

            </div>
            </div>

          -The difference is that we now need to provide a list of Elasticsearch Nodes
          -to which the sink should connect using a `TransportClient`.
          +The difference is that now we do not need to provide a list of addresses
          +of Elasticsearch nodes.

          More information about Elasticsearch can be found [here](https://elastic.co).
          +
          +#### Packaging the Elasticsearch Connector into an Uber-jar
          +
          +For the execution of your Flink program,
          +it is recommended to build a so-called uber-jar (executable jar) containing all your dependencies
          +(see [here]({{ site.baseurl }}/dev/linking) for further information).
          +
          +However,
          +when an uber-jar containing an Elasticsearch sink is executed,
          +an `IllegalArgumentException` may occur,
          +which is caused by conflicting files of Elasticsearch and its dependencies
          +in `META-INF/services`:
          +
          +```
          +IllegalArgumentException[An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [es090, completion090, XBloomFilter]]
          +```
          +
          +If the uber-jar is built with Maven,
          +this issue can be avoided by adding the following to the pom file:
          +
          +```
          +<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          + <resource>META-INF/services/org.apache.lucene.codecs.Codec</resource>
          +</transformer>
          +<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          + <resource>META-INF/services/org.apache.lucene.codecs.DocValuesFormat</resource>
          +</transformer>
          +<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
          + <resource>META-INF/services/org.apache.lucene.codecs.PostingsFormat</resource>
          +</transformer>
          +```
          — End diff –

          I'll try it out!
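
          For context, those `<transformer>` entries are not standalone; they would sit inside a `maven-shade-plugin`
          configuration roughly like the sketch below. The plugin version and the execution binding shown here are
          illustrative assumptions rather than part of the diff.

          ```
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <!-- version chosen for illustration only -->
            <version>2.4.3</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
                <configuration>
                  <transformers>
                    <!-- append (merge) the Lucene SPI service files instead of letting one jar overwrite the other -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                      <resource>META-INF/services/org.apache.lucene.codecs.Codec</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                      <resource>META-INF/services/org.apache.lucene.codecs.DocValuesFormat</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                      <resource>META-INF/services/org.apache.lucene.codecs.PostingsFormat</resource>
                    </transformer>
                  </transformers>
                </configuration>
              </execution>
            </executions>
          </plugin>
          ```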

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95979032

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,235 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
           + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
           + * sending the requests to the cluster, as well as passing input records to the user provided
           + * {@link ElasticsearchSinkFunction} for processing.
           + *
           + * <p>
           + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
           + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
           + * constructor of this class.
          + *
          + * @param <T> Type of the elements emitted by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          — End diff –

          Ah, I see. You need boxed types for nullability. Then I would actually keep it as is.
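
          To illustrate the point with a self-contained sketch (the class, field, and method names here are
          hypothetical, not the connector's actual code): a boxed `Integer` can carry "not configured" as `null`,
          which a primitive `int` cannot express.

          ```
          // Hypothetical sketch: boxed types allow "unset" to be modelled as null.
          public class FlushSettings {
              private final Integer bulkFlushMaxActions; // null means the user did not configure it

              public FlushSettings(Integer bulkFlushMaxActions) {
                  this.bulkFlushMaxActions = bulkFlushMaxActions;
              }

              public int effectiveMaxActions(int processorDefault) {
                  // unboxing only happens on the non-null branch, so no NullPointerException here
                  return bulkFlushMaxActions != null ? bulkFlushMaxActions : processorDefault;
              }
          }
          ```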

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95979205

          — Diff: docs/dev/connectors/elasticsearch2.md —
          @@ -1,173 +0,0 @@


          — End diff –

          Will do!

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95979234

          — Diff: flink-connectors/flink-connector-elasticsearch2/src/test/java/org/apache/flink/streaming/connectors/elasticsearch2/ElasticsearchSinkITCase.java —
          @@ -17,217 +17,51 @@
          */
          package org.apache.flink.streaming.connectors.elasticsearch2;

          -import org.apache.flink.api.common.functions.RuntimeContext;
          -import org.apache.flink.api.java.tuple.Tuple2;
          -import org.apache.flink.runtime.client.JobExecutionException;
          -import org.apache.flink.streaming.api.datastream.DataStreamSource;
          -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
          -import org.apache.flink.streaming.api.functions.source.SourceFunction;
          -import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase;
          -import org.elasticsearch.action.get.GetRequest;
          -import org.elasticsearch.action.get.GetResponse;
          -import org.elasticsearch.action.index.IndexRequest;
          -import org.elasticsearch.client.Client;
          -import org.elasticsearch.client.Requests;
          -import org.elasticsearch.common.settings.Settings;
          -import org.elasticsearch.node.Node;
          -import org.elasticsearch.node.NodeBuilder;
          -import org.junit.Assert;
          -import org.junit.ClassRule;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
          +import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase;
          import org.junit.Test;
          -import org.junit.rules.TemporaryFolder;

          -import java.io.File;
          import java.net.InetAddress;
          import java.net.InetSocketAddress;
          import java.util.ArrayList;
          -import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          -public class ElasticsearchSinkITCase extends StreamingMultipleProgramsTestBase {
          -

          • private static final int NUM_ELEMENTS = 20;
            -
          • @ClassRule
          • public static TemporaryFolder tempFolder = new TemporaryFolder();
            +public class ElasticsearchSinkITCase extends ElasticsearchSinkTestBase {

          @Test
          public void testTransportClient() throws Exception {
          -

          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • // Can't use {@link TransportAddress} as its not Serializable in Elasticsearch 2.x
          • List<InetSocketAddress> transports = new ArrayList<>();
          • transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
            -
          • source.addSink(new ElasticsearchSink<>(config, transports, new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) {
          • GetResponse response = client.get(new GetRequest("my-index",
          • "my-type", Integer.toString(i))).actionGet();
          • Assert.assertEquals("message #" + i, response.getSource().get("data"));
          • }

            -

          • node.close();
            + runTransportClientTest();
            }
          • @Test(expected = IllegalArgumentException.class)
          • public void testNullTransportClient() throws Exception {
            -
          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • source.addSink(new ElasticsearchSink<>(config, null, new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) {
          • GetResponse response = client.get(new GetRequest("my-index",
          • "my-type", Integer.toString)).actionGet();
          • Assert.assertEquals("message #" + i, response.getSource().get("data"));
            + @Test
            + public void testNullTransportClient() throws Exception {
            + runNullTransportClientTest();
            + }
          • node.close();
          • }
            -
          • @Test(expected = IllegalArgumentException.class)
          • public void testEmptyTransportClient() throws Exception {
            -
          • File dataDir = tempFolder.newFolder();
            -
          • Node node = NodeBuilder.nodeBuilder()
          • .settings(Settings.settingsBuilder()
          • .put("path.home", dataDir.getParent())
          • .put("http.enabled", false)
          • .put("path.data", dataDir.getAbsolutePath()))
          • // set a custom cluster name to verify that user config works correctly
          • .clusterName("my-transport-client-cluster")
          • .node();
            -
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            -
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
            -
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-transport-client-cluster");
            -
          • source.addSink(new ElasticsearchSink<>(config, new ArrayList<InetSocketAddress>(), new TestElasticsearchSinkFunction()));
            -
          • env.execute("Elasticsearch TransportClient Test");
            -
          • // verify the results
          • Client client = node.client();
          • for (int i = 0; i < NUM_ELEMENTS; i++) {
          • GetResponse response = client.get(new GetRequest("my-index",
          • "my-type", Integer.toString)).actionGet();
          • Assert.assertEquals("message #" + i, response.getSource().get("data"));
            + @Test
            + public void testEmptyTransportClient() throws Exception {
            + runEmptyTransportClientTest();
            + }
          • node.close();
          • }
            -
          • @Test(expected = JobExecutionException.class)
            + @Test
            public void testTransportClientFails() throws Exception {
          • // this checks whether the TransportClient fails early when there is no cluster to
          • // connect to. There isn't a similar test for the Node Client version since that
          • // one will block and wait for a cluster to come online
          •
          • final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
          •
          • DataStreamSource<Tuple2<Integer, String>> source = env.addSource(new TestSourceFunction());
          •
          • Map<String, String> config = new HashMap<>();
          • // This instructs the sink to emit after every element, otherwise they would be buffered
          • config.put(ElasticsearchSink.CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS, "1");
          • config.put("cluster.name", "my-node-client-cluster");
          •
          • List<InetSocketAddress> transports = new ArrayList<>();
          • transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
          •
          • source.addSink(new ElasticsearchSink<>(config, transports, new TestElasticsearchSinkFunction()));
          •
          • env.execute("Elasticsearch Node Client Test");
            + runTransportClientFailsTest();
            }
          • private static class TestSourceFunction implements SourceFunction<Tuple2<Integer, String>> {
          • private static final long serialVersionUID = 1L;
            -
          • private volatile boolean running = true;
            -
          • @Override
          • public void run(SourceContext<Tuple2<Integer, String>> ctx) throws Exception {
          • for (int i = 0; i < NUM_ELEMENTS && running; i++) {
          • ctx.collect(Tuple2.of(i, "message #" + i));
          • }
          • }
            -
          • @Override
          • public void cancel() {
          • running = false;
          • }

            + @Override
            + protected <T> ElasticsearchSinkBase<T> createElasticsearchSink(Map<String, String> userConfig,
            + List<InetSocketAddress> transportAddresses,
            + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
            + return new ElasticsearchSink<>(userConfig, transportAddresses, elasticsearchSinkFunction);

              • End diff –

          I mean the two newlines, which are empty

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95979634

          — Diff: flink-connectors/pom.xml —
          @@ -85,6 +86,17 @@ under the License.
          <module>flink-connector-kinesis</module>
          </modules>
          </profile>
          +
          + <!--
          + Since Elasticsearch 5.x requires Java 8 at a minimum, we use this profile
          + to include it as part of Java 8 builds only.
          + -->
          + <profile>
          + <id>include-elasticsearch5</id>
          — End diff –

          Can you make this a profile that activates itself automatically when java 8 is available?
          http://maven.apache.org/guides/introduction/introduction-to-profiles.html
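
          A profile activated automatically by the JDK version might look like the following sketch; the module
          name is taken from this pull request, while the exact activation range is an assumption.

          ```
          <profile>
            <id>include-elasticsearch5</id>
            <activation>
              <!-- picked up automatically on Java 8 or newer, no -P flag needed -->
              <jdk>[1.8,)</jdk>
            </activation>
            <modules>
              <module>flink-connector-elasticsearch5</module>
            </modules>
          </profile>
          ```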

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r95979696

          — Diff: .travis.yml —
          @@ -16,14 +16,14 @@ matrix:
          include:

          # Always run test groups A and B together
          • jdk: "oraclejdk8"
          • env: PROFILE="-Dhadoop.version=2.7.2 -Dscala-2.11 -Pinclude-yarn-tests,flink-fast-tests-a,include-kinesis -Dmaven.javadoc.skip=true"
            + env: PROFILE="-Dhadoop.version=2.7.2 -Dscala-2.11 -Pinclude-yarn-tests,flink-fast-tests-a,include-kinesis,include-elasticsearch5 -Dmaven.javadoc.skip=true"
          • jdk: "oraclejdk8"
          • env: PROFILE="-Dhadoop.version=2.7.2 -Dscala-2.11 -Pinclude-yarn-tests,flink-fast-tests-b,include-kinesis -Dmaven.javadoc.skip=true"
            + env: PROFILE="-Dhadoop.version=2.7.2 -Dscala-2.11 -Pinclude-yarn-tests,flink-fast-tests-b,include-kinesis,include-elasticsearch5 -Dmaven.javadoc.skip=true"
          • jdk: "oraclejdk8"
          • env: PROFILE="-Dhadoop.version=2.6.3 -Pinclude-yarn-tests,flink-fast-tests-a,include-kinesis -Dmaven.javadoc.skip=true"
            + env: PROFILE="-Dhadoop.version=2.6.3 -Pinclude-yarn-tests,flink-fast-tests-a,include-kinesis,include-elasticsearch5 -Dmaven.javadoc.skip=true"
          • jdk: "oraclejdk8"
          • env: PROFILE="-Dhadoop.version=2.6.3 -Pinclude-yarn-tests,flink-fast-tests-b,include-kinesis -Dmaven.javadoc.skip=true"
            + env: PROFILE="-Dhadoop.version=2.6.3 -Pinclude-yarn-tests,flink-fast-tests-b,include-kinesis,include-elasticsearch5 -Dmaven.javadoc.skip=true"
              • End diff –

          With the automatic activation, you don't need these changes (see my other comment at the profile def)

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          Hi @rmetzger, I've addressed your review comments. I still have some problems however, on the following:

          1. I can't really reproduce the problem with the conflicting dependencies in `META-INF/services`, so I couldn't verify whether the fix actually works correctly, or whether we could use the `ServiceResourceTransformer` instead of the `AppendingTransformer`. From the Maven documentation and what I understand of what the original ES2 documentation was trying to address, I think we can use the `ServiceResourceTransformer`. What do you think?

          2. Regarding the Log4j 2 dependency: it is required for the ES 5 Java client to log correctly, since the ES 5 Java API now uses Log4j 2, does not detect other logging implementations, and must have a Log4j 2 API on the classpath. So the way I think the ES 5 connector works is that the connector and Flink itself log via Log4j 1, while the internally used ES Java client uses the Log4j 2 that is included exclusively in ES 5's POM.

          I am still figuring out how to get the internal ES Java client in the ES 5 connector to log to TaskManager logs when using cluster execution, though (the connector log and Flink log is correctly logged, only the ES Java client log is missing).

          I've included this `log4j2.properties` in a test project to be packaged for execution:
          ```
          appender.file.type=File
          appender.file.filename=${log.file}
          appender.file.name=file
          appender.file.layout.type=PatternLayout
          appender.file.layout.pattern=%-4r [%t] %-5p %c %x - %m%n

          rootLogger.level=info
          rootLogger.appenderRef.file.ref=file
          ```

          Somehow, it isn't picking up the `log.file` property, which is set by the `flink-daemon.sh` script as a system property. Changing `log.file` to a specific file path works.
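
          One possible explanation, offered here as an assumption rather than something established in this thread:
          Log4j 2 resolves a bare `${log.file}` against its own configuration properties, while a JVM system
          property such as the one set by `flink-daemon.sh` is normally referenced through the `sys:` lookup
          prefix, as in the sketch below.

          ```
          # sketch: reference the JVM system property explicitly via the 'sys:' lookup
          appender.file.type=File
          appender.file.name=file
          appender.file.filename=${sys:log.file}
          appender.file.layout.type=PatternLayout
          appender.file.layout.pattern=%-4r [%t] %-5p %c %x - %m%n

          rootLogger.level=info
          rootLogger.appenderRef.file.ref=file
          ```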

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3112

          Gordon and I had a quick offline chat about this and decided the following:
          1. We'll use the `ServiceResourceTransformer` because that's the correct way of solving the problem
          2. There is a log4j2 to SLF4J bridge we'll try to use.
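
          In pom terms, these two decisions would presumably translate into something like the sketch below; the
          transformer class and the bridge artifact are the standard Maven Shade and Log4j names, while the
          version number is only illustrative.

          ```
          <!-- 1. merge every META-INF/services file generically instead of listing each resource -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>

          <!-- 2. route the Elasticsearch client's Log4j 2 calls onto SLF4J (and thus Flink's logging) -->
          <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-to-slf4j</artifactId>
            <version>2.7</version> <!-- illustrative version -->
          </dependency>
          ```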

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          @rmetzger The comments are all addressed now, including the last two problems.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fpompermaier commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r96265039

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,234 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
          + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
          + * sending the requests to the cluster, as well as passing input records to the user provided
          + * {@link ElasticsearchSinkFunction} for processing.
          + *
          + * <p>
          + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
          + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
          + * constructor of this class.
          + *
          + * @param <T> Type of the elements handled by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          +
          + // ------------------------------------------------------------------------
          + // User-facing API and configuration
          + // ------------------------------------------------------------------------
          +
          + /** The user specified config map that we forward to Elasticsearch when we create the {@link Client}. */
          + private final Map<String, String> userConfig;
          +
          + /** The function that is used to construct mulitple {@link ActionRequest ActionRequests} from each incoming element. */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /** Provided to the user via the {@link ElasticsearchSinkFunction} to add {@link ActionRequest ActionRequests}. */
          + private transient BulkProcessorIndexer requestIndexer;
          +
          + // ------------------------------------------------------------------------
          + // Internals for the Flink Elasticsearch Sink
          + // ------------------------------------------------------------------------
          +
          + /** Version-specific factory for Elasticsearch clients, provided by concrete subclasses. */
          + private final ElasticsearchClientFactory clientFactory;
          +
          + /** Elasticsearch client created using the client factory. */
          + private transient Client client;
          +
          + /** Bulk processor to buffer and send requests to Elasticsearch, created using the client. */
          + private transient BulkProcessor bulkProcessor;
          +
          + /** Set from inside the {@link BulkProcessor} listener if there where failures during processing. */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /** Set from inside the @link BulkProcessor} listener if a {@link Throwable} was thrown during processing. */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + public ElasticsearchSinkBase(ElasticsearchClientFactory clientFactory,
          + Map<String, String> userConfig,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          + this.clientFactory = checkNotNull(clientFactory);
          + this.elasticsearchSinkFunction = checkNotNull(elasticsearchSinkFunction);
          +
          + // we eagerly check if the user-provided sink function is serializable;
          + // otherwise, if it isn't serializable, users will merely get a non-informative error message
          + // "ElasticsearchSinkBase is not serializable"
          + try {
          + InstantiationUtil.serializeObject(elasticsearchSinkFunction);
          + } catch (Exception e) {
          + throw new IllegalArgumentException(
          + "The implementation of the provided ElasticsearchSinkFunction is not serializable. " +
          + "The object probably contains or references non serializable fields.");
          + }
          +
          + checkNotNull(userConfig);
          +
          + // extract and remove bulk processor related configuration from the user-provided config,
          + // so that the resulting user config only contains configuration related to the Elasticsearch client.
          + ParameterTool params = ParameterTool.fromMap(userConfig);
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS)) {
          + bulkProcessorFlushMaxActions = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + } else {
          + bulkProcessorFlushMaxActions = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB)) {
          + bulkProcessorFlushMaxSizeMb = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + } else {
          + bulkProcessorFlushMaxSizeMb = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS)) {
          + bulkProcessorFlushIntervalMillis = params.getInt(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + } else {
          + bulkProcessorFlushIntervalMillis = null;
          + }

          +
          + this.userConfig = userConfig;
          + }
          +
          + @Override
          + public void open(Configuration parameters) throws Exception {
          + client = clientFactory.create(userConfig);
          +
          + BulkProcessor.Builder bulkProcessorBuilder = BulkProcessor.builder(
          + client,
          + new BulkProcessor.Listener() {
          + @Override
          + public void beforeBulk(long executionId, BulkRequest request) { }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
          + if (response.hasFailures()) {
          + for (BulkItemResponse itemResp : response.getItems()) {
          + if (itemResp.isFailed()) {
          + LOG.error("Failed to index document in Elasticsearch: " + itemResp.getFailureMessage());
          + failureThrowable.compareAndSet(null, new RuntimeException(itemResp.getFailureMessage()));
          + }
          + }
          + hasFailure.set(true);
          + }
          + }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
          + LOG.error(failure.getMessage());
          + failureThrowable.compareAndSet(null, failure);
          + hasFailure.set(true);
          + }
          + }
          + );
          +
          + // This makes flush() blocking
          + bulkProcessorBuilder.setConcurrentRequests(0);
          +
          + if (bulkProcessorFlushMaxActions != null) {
          + bulkProcessorBuilder.setBulkActions(bulkProcessorFlushMaxActions);
          + }
          +
          + if (bulkProcessorFlushMaxSizeMb != null) {
          + bulkProcessorBuilder.setBulkSize(new ByteSizeValue(bulkProcessorFlushMaxSizeMb, ByteSizeUnit.MB));
          + }
          +
          + if (bulkProcessorFlushIntervalMillis != null) {
          + bulkProcessorBuilder.setFlushInterval(TimeValue.timeValueMillis(bulkProcessorFlushIntervalMillis));
          + }
          +
          + bulkProcessor = bulkProcessorBuilder.build();
          + requestIndexer = new BulkProcessorIndexer(bulkProcessor);
          + }
          +
          + @Override
          + public void invoke(T value) throws Exception {
          + elasticsearchSinkFunction.process(value, getRuntimeContext(), requestIndexer);
          + }
          +
          + @Override
          + public void close() throws Exception {
          + if (bulkProcessor != null) {
          + bulkProcessor.close();
          + bulkProcessor = null;
          + }
          +
          + if (client != null) {
          + client.close();
          + client = null;
          + }

          +
          + clientFactory.cleanup();
          +
          + if (hasFailure.get()) {
          — End diff –

          Do you think it would be possible to also address FLINK-5353 in this PR?

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r96274454

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,234 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
          + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
          + * sending the requests to the cluster, as well as passing input records to the user provided
          + * {@link ElasticsearchSinkFunction} for processing.
          + *
          + * <p>
          + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
          + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
          + * constructor of this class.
          + *
          + * @param <T> Type of the elements handled by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          +
          + // ------------------------------------------------------------------------
          + // User-facing API and configuration
          + // ------------------------------------------------------------------------
          +
          + /** The user specified config map that we forward to Elasticsearch when we create the {@link Client}. */
          + private final Map<String, String> userConfig;
          +
          + /** The function that is used to construct mulitple {@link ActionRequest ActionRequests} from each incoming element. */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /** Provided to the user via the {@link ElasticsearchSinkFunction} to add {@link ActionRequest ActionRequests}. */
          + private transient BulkProcessorIndexer requestIndexer;
          +
          + // ------------------------------------------------------------------------
          + // Internals for the Flink Elasticsearch Sink
          + // ------------------------------------------------------------------------
          +
          + /** Version-specific factory for Elasticsearch clients, provided by concrete subclasses. */
          + private final ElasticsearchClientFactory clientFactory;
          +
          + /** Elasticsearch client created using the client factory. */
          + private transient Client client;
          +
          + /** Bulk processor to buffer and send requests to Elasticsearch, created using the client. */
          + private transient BulkProcessor bulkProcessor;
          +
          + /** Set from inside the {@link BulkProcessor} listener if there where failures during processing. */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /** Set from inside the @link BulkProcessor} listener if a {@link Throwable} was thrown during processing. */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + public ElasticsearchSinkBase(ElasticsearchClientFactory clientFactory,
          + Map<String, String> userConfig,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          + this.clientFactory = checkNotNull(clientFactory);
          + this.elasticsearchSinkFunction = checkNotNull(elasticsearchSinkFunction);
          +
          + // we eagerly check if the user-provided sink function is serializable;
          + // otherwise, if it isn't serializable, users will merely get a non-informative error message
          + // "ElasticsearchSinkBase is not serializable"
          + try {
          + InstantiationUtil.serializeObject(elasticsearchSinkFunction);
          + } catch (Exception e) {
          + throw new IllegalArgumentException(
          + "The implementation of the provided ElasticsearchSinkFunction is not serializable. " +
          + "The object probably contains or references non serializable fields.");
          + }
          +
          + checkNotNull(userConfig);
          +
          + // extract and remove bulk processor related configuration from the user-provided config,
          + // so that the resulting user config only contains configuration related to the Elasticsearch client.
          + ParameterTool params = ParameterTool.fromMap(userConfig);
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS)) {
          + bulkProcessorFlushMaxActions = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + } else {
          + bulkProcessorFlushMaxActions = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB)) {
          + bulkProcessorFlushMaxSizeMb = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + } else {
          + bulkProcessorFlushMaxSizeMb = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS)) {
          + bulkProcessorFlushIntervalMillis = params.getInt(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + } else {
          + bulkProcessorFlushIntervalMillis = null;
          + }

          +
          + this.userConfig = userConfig;
          + }
          +
          + @Override
          + public void open(Configuration parameters) throws Exception {
          + client = clientFactory.create(userConfig);
          +
          + BulkProcessor.Builder bulkProcessorBuilder = BulkProcessor.builder(
          + client,
          + new BulkProcessor.Listener() {
          + @Override
          + public void beforeBulk(long executionId, BulkRequest request) { }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
          + if (response.hasFailures()) {
          + for (BulkItemResponse itemResp : response.getItems()) {
          + if (itemResp.isFailed()) {
          + LOG.error("Failed to index document in Elasticsearch: " + itemResp.getFailureMessage());
          + failureThrowable.compareAndSet(null, new RuntimeException(itemResp.getFailureMessage()));
          + }
          + }
          + hasFailure.set(true);
          + }
          + }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
          + LOG.error(failure.getMessage());
          + failureThrowable.compareAndSet(null, failure);
          + hasFailure.set(true);
          + }
          + }
          + );
          +
          + // This makes flush() blocking
          + bulkProcessorBuilder.setConcurrentRequests(0);
          +
          + if (bulkProcessorFlushMaxActions != null) {
          + bulkProcessorBuilder.setBulkActions(bulkProcessorFlushMaxActions);
          + }
          +
          + if (bulkProcessorFlushMaxSizeMb != null) {
          + bulkProcessorBuilder.setBulkSize(new ByteSizeValue(bulkProcessorFlushMaxSizeMb, ByteSizeUnit.MB));
          + }
          +
          + if (bulkProcessorFlushIntervalMillis != null) {
          + bulkProcessorBuilder.setFlushInterval(TimeValue.timeValueMillis(bulkProcessorFlushIntervalMillis));
          + }
          +
          + bulkProcessor = bulkProcessorBuilder.build();
          + requestIndexer = new BulkProcessorIndexer(bulkProcessor);
          + }
          +
          + @Override
          + public void invoke(T value) throws Exception {
          + elasticsearchSinkFunction.process(value, getRuntimeContext(), requestIndexer);
          + }
          +
          + @Override
          + public void close() throws Exception {
          + if (bulkProcessor != null) {
          + bulkProcessor.close();
          + bulkProcessor = null;
          + }
          +
          + if (client != null) {
          + client.close();
          + client = null;
          + }

          +
          + clientFactory.cleanup();
          +
          + if (hasFailure.get()) {
          — End diff –

          I would suggest first merging this PR and then working on FLINK-5353, so as not to mix up too many things at the same time.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          Rebased to resolve conflicts in documentation changes.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r98340549

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,234 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
          + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
          + * sending the requests to the cluster, as well as passing input records to the user provided
          + * {@link ElasticsearchSinkFunction} for processing.
          + *
          + * <p>
          + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
          + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
          + * constructor of this class.
          + *
          + * @param <T> Type of the elements handled by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          +
          + // ------------------------------------------------------------------------
          + // User-facing API and configuration
          + // ------------------------------------------------------------------------
          +
          + /** The user specified config map that we forward to Elasticsearch when we create the {@link Client}. */
          + private final Map<String, String> userConfig;
          +
          + /** The function that is used to construct mulitple {@link ActionRequest ActionRequests} from each incoming element. */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /** Provided to the user via the {@link ElasticsearchSinkFunction} to add {@link ActionRequest ActionRequests}. */
          + private transient BulkProcessorIndexer requestIndexer;
          +
          + // ------------------------------------------------------------------------
          + // Internals for the Flink Elasticsearch Sink
          + // ------------------------------------------------------------------------
          +
          + /** Version-specific factory for Elasticsearch clients, provided by concrete subclasses. */
          + private final ElasticsearchClientFactory clientFactory;
          +
          + /** Elasticsearch client created using the client factory. */
          + private transient Client client;
          +
          + /** Bulk processor to buffer and send requests to Elasticsearch, created using the client. */
          + private transient BulkProcessor bulkProcessor;
          +
          + /** Set from inside the {@link BulkProcessor} listener if there where failures during processing. */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /** Set from inside the @link BulkProcessor} listener if a {@link Throwable} was thrown during processing. */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + public ElasticsearchSinkBase(ElasticsearchClientFactory clientFactory,
          + Map<String, String> userConfig,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          + this.clientFactory = checkNotNull(clientFactory);
          + this.elasticsearchSinkFunction = checkNotNull(elasticsearchSinkFunction);
          +
          + // we eagerly check if the user-provided sink function is serializable;
          + // otherwise, if it isn't serializable, users will merely get a non-informative error message
          + // "ElasticsearchSinkBase is not serializable"
          + try {
          + InstantiationUtil.serializeObject(elasticsearchSinkFunction);
          + } catch (Exception e) {
          + throw new IllegalArgumentException(
          + "The implementation of the provided ElasticsearchSinkFunction is not serializable. " +
          + "The object probably contains or references non serializable fields.");
          + }
          +
          + checkNotNull(userConfig);
          +
          + // extract and remove bulk processor related configuration from the user-provided config,
          + // so that the resulting user config only contains configuration related to the Elasticsearch client.
          + ParameterTool params = ParameterTool.fromMap(userConfig);
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS)) {
          + bulkProcessorFlushMaxActions = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + } else {
          + bulkProcessorFlushMaxActions = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB)) {
          + bulkProcessorFlushMaxSizeMb = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + } else {
          + bulkProcessorFlushMaxSizeMb = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS)) {
          + bulkProcessorFlushIntervalMillis = params.getInt(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + } else {
          + bulkProcessorFlushIntervalMillis = null;
          + }

          +
          + this.userConfig = userConfig;
          + }
          +
          + @Override
          + public void open(Configuration parameters) throws Exception {
          + client = clientFactory.create(userConfig);
          +
          + BulkProcessor.Builder bulkProcessorBuilder = BulkProcessor.builder(
          + client,
          + new BulkProcessor.Listener() {
          + @Override
          + public void beforeBulk(long executionId, BulkRequest request) { }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
          + if (response.hasFailures()) {
          + for (BulkItemResponse itemResp : response.getItems()) {
          + if (itemResp.isFailed()) {
          + LOG.error("Failed to index document in Elasticsearch: " + itemResp.getFailureMessage());
          + failureThrowable.compareAndSet(null, new RuntimeException(itemResp.getFailureMessage()));
          + }
          + }
          + hasFailure.set(true);
          + }
          + }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
          + LOG.error(failure.getMessage());
          — End diff –

          It's important to log the stacktrace as well: `LOG.error(failure.getMessage(), failure);`
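
          In other words, roughly the following (a sketch of the same `afterBulk` callback from the diff above; only the logging call changes, so that the `Throwable` and its stack trace are written out):

          ```
          @Override
          public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
              // passing the Throwable as the last argument makes the logger print the full stack trace
              LOG.error(failure.getMessage(), failure);
              failureThrowable.compareAndSet(null, failure);
              hasFailure.set(true);
          }
          ```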

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r98340529

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,234 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
          + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
          + * sending the requests to the cluster, as well as passing input records to the user provided
          + * {@link ElasticsearchSinkFunction} for processing.
          + *
          + * <p>
          + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
          + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
          + * constructor of this class.
          + *
          + * @param <T> Type of the elements handled by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          +
          + // ------------------------------------------------------------------------
          + // User-facing API and configuration
          + // ------------------------------------------------------------------------
          +
          + /** The user specified config map that we forward to Elasticsearch when we create the {@link Client}. */
          + private final Map<String, String> userConfig;
          +
          + /** The function that is used to construct mulitple {@link ActionRequest ActionRequests} from each incoming element. */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /** Provided to the user via the {@link ElasticsearchSinkFunction} to add {@link ActionRequest ActionRequests}. */
          + private transient BulkProcessorIndexer requestIndexer;
          +
          + // ------------------------------------------------------------------------
          + // Internals for the Flink Elasticsearch Sink
          + // ------------------------------------------------------------------------
          +
          + /** Version-specific factory for Elasticsearch clients, provided by concrete subclasses. */
          + private final ElasticsearchClientFactory clientFactory;
          +
          + /** Elasticsearch client created using the client factory. */
          + private transient Client client;
          +
          + /** Bulk processor to buffer and send requests to Elasticsearch, created using the client. */
          + private transient BulkProcessor bulkProcessor;
          +
          + /** Set from inside the {@link BulkProcessor} listener if there where failures during processing. */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /** Set from inside the {@link BulkProcessor} listener if a {@link Throwable} was thrown during processing. */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + public ElasticsearchSinkBase(ElasticsearchClientFactory clientFactory,
          + Map<String, String> userConfig,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          + this.clientFactory = checkNotNull(clientFactory);
          + this.elasticsearchSinkFunction = checkNotNull(elasticsearchSinkFunction);
          +
          + // we eagerly check if the user-provided sink function is serializable;
          + // otherwise, if it isn't serializable, users will merely get a non-informative error message
          + // "ElasticsearchSinkBase is not serializable"
          + try {
          + InstantiationUtil.serializeObject(elasticsearchSinkFunction);
          + } catch (Exception e) {
          + throw new IllegalArgumentException(
          + "The implementation of the provided ElasticsearchSinkFunction is not serializable. " +
          + "The object probably contains or references non serializable fields.");
          + }

          +
          + checkNotNull(userConfig);
          +
          + // extract and remove bulk processor related configuration from the user-provided config,
          + // so that the resulting user config only contains configuration related to the Elasticsearch client.
          + ParameterTool params = ParameterTool.fromMap(userConfig);
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS)) {
          + bulkProcessorFlushMaxActions = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + } else {
          + bulkProcessorFlushMaxActions = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB)) {
          + bulkProcessorFlushMaxSizeMb = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + } else {
          + bulkProcessorFlushMaxSizeMb = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS)) {
          + bulkProcessorFlushIntervalMillis = params.getInt(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + } else {
          + bulkProcessorFlushIntervalMillis = null;
          + }

          +
          + this.userConfig = userConfig;
          + }
          +
          + @Override
          + public void open(Configuration parameters) throws Exception {
          + client = clientFactory.create(userConfig);
          +
          + BulkProcessor.Builder bulkProcessorBuilder = BulkProcessor.builder(
          + client,
          + new BulkProcessor.Listener() {
          + @Override
          + public void beforeBulk(long executionId, BulkRequest request) { }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
          + if (response.hasFailures()) {
          + for (BulkItemResponse itemResp : response.getItems()) {
          + if (itemResp.isFailed()) {
          + LOG.error("Failed to index document in Elasticsearch: " + itemResp.getFailureMessage());
          + failureThrowable.compareAndSet(null, new RuntimeException(itemResp.getFailureMessage()));
          — End diff –

          Could be replaced by ` failureThrowable.compareAndSet(null, itemResp.getFailure().getCause());`
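
For illustration, here is a minimal sketch of the listener's bulk-response branch with that suggestion applied. This is only a sketch: it reuses the `LOG`, `failureThrowable`, and `hasFailure` fields from the diff above and assumes `BulkItemResponse.getFailure().getCause()` exposes the underlying per-item `Throwable`; the actual code in the PR may differ.

```java
@Override
public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
    if (response.hasFailures()) {
        for (BulkItemResponse itemResp : response.getItems()) {
            if (itemResp.isFailed()) {
                LOG.error("Failed to index document in Elasticsearch: " + itemResp.getFailureMessage());
                // keep the original cause instead of wrapping the message in a new RuntimeException;
                // compareAndSet means only the first failure is recorded, later ones are ignored
                failureThrowable.compareAndSet(null, itemResp.getFailure().getCause());
            }
        }
        hasFailure.set(true);
    }
}
```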

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user mikedias commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r98340492

          — Diff: flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java —
          @@ -0,0 +1,234 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.streaming.connectors.elasticsearch;
          +
          +import org.apache.flink.api.java.utils.ParameterTool;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
          +import org.apache.flink.util.InstantiationUtil;
          +import org.elasticsearch.action.ActionRequest;
          +import org.elasticsearch.action.bulk.BulkItemResponse;
          +import org.elasticsearch.action.bulk.BulkProcessor;
          +import org.elasticsearch.action.bulk.BulkRequest;
          +import org.elasticsearch.action.bulk.BulkResponse;
          +import org.elasticsearch.client.Client;
          +import org.elasticsearch.common.unit.ByteSizeUnit;
          +import org.elasticsearch.common.unit.ByteSizeValue;
          +import org.elasticsearch.common.unit.TimeValue;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.util.Map;
          +import java.util.concurrent.atomic.AtomicBoolean;
          +import java.util.concurrent.atomic.AtomicReference;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Base class for all Flink Elasticsearch Sinks.
          + *
          + * <p>
          + * This class implements the common behaviour across Elasticsearch versions, such as
          + * the use of an internal {@link BulkProcessor} to buffer multiple {@link ActionRequest}s before
          + * sending the requests to the cluster, as well as passing input records to the user provided
          + * {@link ElasticsearchSinkFunction} for processing.
          + *
          + * <p>
          + * The version specific behaviours for creating a {@link Client} to connect to a Elasticsearch cluster
          + * should be defined by concrete implementations of a {@link ElasticsearchClientFactory}, which is to be provided to the
          + * constructor of this class.
          + *
          + * @param <T> Type of the elements handled by this sink
          + */
          +public abstract class ElasticsearchSinkBase<T> extends RichSinkFunction<T> {
          +
          + private static final long serialVersionUID = -1007596293618451942L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchSinkBase.class);
          +
          + // ------------------------------------------------------------------------
          + // Internal bulk processor configuration
          + // ------------------------------------------------------------------------
          +
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS = "bulk.flush.max.actions";
          + public static final String CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB = "bulk.flush.max.size.mb";
          + public static final String CONFIG_KEY_BULK_FLUSH_INTERVAL_MS = "bulk.flush.interval.ms";
          +
          + private final Integer bulkProcessorFlushMaxActions;
          + private final Integer bulkProcessorFlushMaxSizeMb;
          + private final Integer bulkProcessorFlushIntervalMillis;
          +
          + // ------------------------------------------------------------------------
          + // User-facing API and configuration
          + // ------------------------------------------------------------------------
          +
          + /** The user specified config map that we forward to Elasticsearch when we create the {@link Client}. */
          + private final Map<String, String> userConfig;
          +
          + /** The function that is used to construct mulitple {@link ActionRequest ActionRequests} from each incoming element. */
          + private final ElasticsearchSinkFunction<T> elasticsearchSinkFunction;
          +
          + /** Provided to the user via the {@link ElasticsearchSinkFunction} to add {@link ActionRequest ActionRequests}. */
          + private transient BulkProcessorIndexer requestIndexer;
          +
          + // ------------------------------------------------------------------------
          + // Internals for the Flink Elasticsearch Sink
          + // ------------------------------------------------------------------------
          +
          + /** Version-specific factory for Elasticsearch clients, provided by concrete subclasses. */
          + private final ElasticsearchClientFactory clientFactory;
          +
          + /** Elasticsearch client created using the client factory. */
          + private transient Client client;
          +
          + /** Bulk processor to buffer and send requests to Elasticsearch, created using the client. */
          + private transient BulkProcessor bulkProcessor;
          +
          + /** Set from inside the {@link BulkProcessor} listener if there where failures during processing. */
          + private final AtomicBoolean hasFailure = new AtomicBoolean(false);
          +
          + /** Set from inside the {@link BulkProcessor} listener if a {@link Throwable} was thrown during processing. */
          + private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
          +
          + public ElasticsearchSinkBase(ElasticsearchClientFactory clientFactory,
          + Map<String, String> userConfig,
          + ElasticsearchSinkFunction<T> elasticsearchSinkFunction) {
          + this.clientFactory = checkNotNull(clientFactory);
          + this.elasticsearchSinkFunction = checkNotNull(elasticsearchSinkFunction);
          +
          + // we eagerly check if the user-provided sink function is serializable;
          + // otherwise, if it isn't serializable, users will merely get a non-informative error message
          + // "ElasticsearchSinkBase is not serializable"
          + try {
          + InstantiationUtil.serializeObject(elasticsearchSinkFunction);
          + } catch (Exception e) {
          + throw new IllegalArgumentException(
          + "The implementation of the provided ElasticsearchSinkFunction is not serializable. " +
          + "The object probably contains or references non serializable fields.");
          + }

          +
          + checkNotNull(userConfig);
          +
          + // extract and remove bulk processor related configuration from the user-provided config,
          + // so that the resulting user config only contains configuration related to the Elasticsearch client.
          + ParameterTool params = ParameterTool.fromMap(userConfig);
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS)) {
          + bulkProcessorFlushMaxActions = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_ACTIONS);
          + } else {
          + bulkProcessorFlushMaxActions = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB)) {
          + bulkProcessorFlushMaxSizeMb = params.getInt(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_MAX_SIZE_MB);
          + } else {
          + bulkProcessorFlushMaxSizeMb = null;
          + }
          +
          + if (params.has(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS)) {
          + bulkProcessorFlushIntervalMillis = params.getInt(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + userConfig.remove(CONFIG_KEY_BULK_FLUSH_INTERVAL_MS);
          + } else {
          + bulkProcessorFlushIntervalMillis = null;
          + }

          +
          + this.userConfig = userConfig;
          + }
          +
          + @Override
          + public void open(Configuration parameters) throws Exception {
          + client = clientFactory.create(userConfig);
          +
          + BulkProcessor.Builder bulkProcessorBuilder = BulkProcessor.builder(
          + client,
          + new BulkProcessor.Listener() {
          + @Override
          + public void beforeBulk(long executionId, BulkRequest request) { }
          +
          + @Override
          + public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
          + if (response.hasFailures()) {
          + for (BulkItemResponse itemResp : response.getItems()) {
          + if (itemResp.isFailed()) {
          + LOG.error("Failed to index document in Elasticsearch: " + itemResp.getFailureMessage());
          — End diff –

          It's important to log the stacktrace as well: `LOG.error("message", itemResp.getFailure().getCause())`
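
Passing the cause as the last argument lets SLF4J print the full stack trace. As a small sketch, the failed-item branch with both review suggestions (cause-preserving `compareAndSet` and stack-trace logging) applied, again assuming the fields from the quoted diff; the actual PR code may differ:

```java
if (itemResp.isFailed()) {
    // message plus Throwable: SLF4J logs the full stack trace of the per-item failure
    LOG.error("Failed to index document in Elasticsearch: " + itemResp.getFailureMessage(),
        itemResp.getFailure().getCause());
    failureThrowable.compareAndSet(null, itemResp.getFailure().getCause());
}
```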

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          Thank you for your comments @mikedias, I will address them.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          While rebasing #2861 on the restructured ES connectors in this PR, I went back and forth between different options on whether or not the `BulkProcessor` build process should be abstracted. In the end, I think it is still best to keep it abstracted, to keep duplicated code to a minimum.

          To bridge the error logging problem, I introduced an `ElasticsearchApiCallBridge` interface. This call bridge can be extended further for other incompatible APIs that we bump into in the future.
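
As a rough illustration of that idea (not the exact interface from the PR; the method names below are assumptions made for this sketch), such a call bridge could look like this, with one implementation per Elasticsearch version module:

```java
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.client.Client;

import java.io.Serializable;
import java.util.Map;

/**
 * Sketch of a version-agnostic "call bridge": each Elasticsearch version module
 * implements the handful of APIs that differ between versions, while the shared
 * sink base keeps the common bulk-processing logic. Method names are illustrative only.
 */
interface ElasticsearchApiCallBridge extends Serializable {

    /** Creates the version-specific client from the user-provided configuration. */
    Client createClient(Map<String, String> clientConfig);

    /**
     * Extracts the failure cause of a failed bulk item, since the way the cause
     * is exposed differs between Elasticsearch versions.
     */
    Throwable extractFailureCauseFromBulkItemResponse(BulkItemResponse bulkItemResponse);

    /** Performs any version-specific cleanup when the sink is closed. */
    void cleanup();
}
```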

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          I think the PR is in a stable state for a final review now. I'll keep the PR as is until another review; sorry for jumping around as I was trying out options to make it more future-proof.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fpompermaier commented on the issue:

          https://github.com/apache/flink/pull/3112

          Hi @tzulitai, congrats on the great work! For my use case it is also important to be resilient to malformed documents (https://issues.apache.org/jira/browse/FLINK-5353). Do you think you could address that issue as well?
          That would be awesome for us.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          Hi @fpompermaier! Thanks for bringing it up. However, as Robert suggested, let's not include that in this PR. I'd like to discuss the matter with you on the corresponding JIRA concurrently while this PR is in progress. I've left a comment on the JIRA already, so let's move the discussion there.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r99782995

          — Diff: docs/dev/connectors/elasticsearch.md —
          @@ -23,158 +23,291 @@ specific language governing permissions and limitations
          under the License.
          -->

          -This connector provides a Sink that can write to an
          -[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
          -following dependency to your project:
          -
          -

          {% highlight xml %}

          -<dependency>

          - <groupId>org.apache.flink</groupId>
          - <artifactId>flink-connector-elasticsearch{{ site.scala_version_suffix }}</artifactId>
          - <version>{{site.version }}</version>
            -</dependency>
            - {% endhighlight %}

            +This connector provides sinks that can request document actions to an
            +[Elasticsearch](https://elastic.co/) Index. To use this connector, add one
            +of the following dependencies to your project, depending on the version
            +of the Elasticsearch installation:
            +
            +<table class="table table-bordered">
            + <thead>
            + <tr>
            + <th class="text-left">Maven Dependency</th>
            + <th class="text-left">Supported since</th>
            + <th class="text-left">Elasticsearch version</th>
            + </tr>
            + </thead>
            + <tbody>
            + <tr>
            + <td>flink-connector-elasticsearch{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>1.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch2{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>2.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch5{{ site.scala_version_suffix }}</td>
            + <td>1.2.0</td>
            + <td>5.x</td>
            + </tr>
            + </tbody>
            +</table>

          Note that the streaming connectors are currently not part of the binary
          -distribution. See
          -[here](site.baseurl/dev/linking.html)
          -for information about how to package the program with the libraries for
          -cluster execution.
          +distribution. See [here](site.baseurl/dev/linking.html) for information
          +about how to package the program with the libraries for cluster execution.

          #### Installing Elasticsearch

          Instructions for setting up an Elasticsearch cluster can be found
          [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
          Make sure to set and remember a cluster name. This must be set when
          -creating a Sink for writing to your cluster
          +creating an `ElasticsearchSink` for requesting document actions against your cluster.

          #### Elasticsearch Sink
          -The connector provides a Sink that can send data to an Elasticsearch Index.
          -
          -The sink can use two different methods for communicating with Elasticsearch:
          -
          -1. An embedded Node
          -2. The TransportClient

          -See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
          -for information about the differences between the two modes.
          +The `ElasticsearchSink` uses a `TransportClient` to communicate with an
          +Elasticsearch cluster.

          -This code shows how to create a sink that uses an embedded Node for
          -communication:
          +The example below shows how to configure and create a sink:

          <div class="codetabs" markdown="1">
          -<div data-lang="java" markdown="1">
          +<div data-lang="java, Elasticsearch 1.x" markdown="1">

          {% highlight java %}

          DataStream<String> input = ...;

          -Map<String, String> config = Maps.newHashMap();
          +Map<String, String> config = new HashMap<>();
          +config.put("cluster.name", "my-cluster-name")
          — End diff –

          There's a semicolon missing

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r99783050

          — Diff: docs/dev/connectors/elasticsearch.md —
          @@ -23,158 +23,291 @@ specific language governing permissions and limitations
          under the License.
          -->

          -This connector provides a Sink that can write to an
          -[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
          -following dependency to your project:
          -
          -

          {% highlight xml %}

          -<dependency>

          - <groupId>org.apache.flink</groupId>
          - <artifactId>flink-connector-elasticsearch{{ site.scala_version_suffix }}</artifactId>
          - <version>{{site.version }}</version>
            -</dependency>
            - {% endhighlight %}
            +This connector provides sinks that can request document actions to an
            +[Elasticsearch](https://elastic.co/) Index. To use this connector, add one
            +of the following dependencies to your project, depending on the version
            +of the Elasticsearch installation:
            +
            +<table class="table table-bordered">
            + <thead>
            + <tr>
            + <th class="text-left">Maven Dependency</th>
            + <th class="text-left">Supported since</th>
            + <th class="text-left">Elasticsearch version</th>
            + </tr>
            + </thead>
            + <tbody>
            + <tr>
            + <td>flink-connector-elasticsearch{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>1.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch2{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>2.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch5{{ site.scala_version_suffix }}</td>
            + <td>1.2.0</td>
            + <td>5.x</td>
            + </tr>
            + </tbody>
            +</table>

            Note that the streaming connectors are currently not part of the binary
            -distribution. See
            -[here](site.baseurl/dev/linking.html)
            -for information about how to package the program with the libraries for
            -cluster execution.
            +distribution. See [here](site.baseurl/dev/linking.html) for information
            +about how to package the program with the libraries for cluster execution.

            #### Installing Elasticsearch

            Instructions for setting up an Elasticsearch cluster can be found
            [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
            Make sure to set and remember a cluster name. This must be set when
            -creating a Sink for writing to your cluster
            +creating an `ElasticsearchSink` for requesting document actions against your cluster.

            #### Elasticsearch Sink
            -The connector provides a Sink that can send data to an Elasticsearch Index.
            -
            -The sink can use two different methods for communicating with Elasticsearch:
            -
            -1. An embedded Node
            -2. The TransportClient

            -See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
            -for information about the differences between the two modes.
            +The `ElasticsearchSink` uses a `TransportClient` to communicate with an
            +Elasticsearch cluster.

            -This code shows how to create a sink that uses an embedded Node for
            -communication:
            +The example below shows how to configure and create a sink:

            <div class="codetabs" markdown="1">
            -<div data-lang="java" markdown="1">
            +<div data-lang="java, Elasticsearch 1.x" markdown="1">
            {% highlight java %}
            DataStream<String> input = ...;

            -Map<String, String> config = Maps.newHashMap();
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            // This instructs the sink to emit after every element, otherwise they would be buffered
            config.put("bulk.flush.max.actions", "1");
            -config.put("cluster.name", "my-cluster-name");

            -input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
            - @Override
            - public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
            - Map<String, Object> json = new HashMap<>();
            - json.put("data", element);
            +List<TransportAddress> transportAddresses = new ArrayList<String>();
            +transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300));
            +transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300));

            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            + public IndexRequest createIndexRequest(String element) {
            + Map<String, String> json = new HashMap<>();
            + json.put("data", element);
            + return Requests.indexRequest()
            + .index("my-index")
            + .type("my-type")
            + .source(json);
            + }
            +
            + @Override
            + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
            + indexer.add(createIndexRequest(element));
            + }
            }));
            {% endhighlight %}

            </div>
            -<div data-lang="scala" markdown="1">
            +<div data-lang="java, Elasticsearch 2.x / 5.x" markdown="1">
            +

            {% highlight java %}

            +DataStream<String> input = ...;
            +
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")

          — End diff –

          ; missing
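
For reference, a corrected version of the documentation snippet that both comments point at, with the missing semicolons added. This mirrors the Elasticsearch 1.x example from the diff (cluster name and addresses are the placeholder values used there) and also fixes the `new ArrayList<String>()` that should create a list of `TransportAddress`:

```java
Map<String, String> config = new HashMap<>();
config.put("cluster.name", "my-cluster-name"); // ';' was missing here in the diff
// This instructs the sink to emit after every element, otherwise they would be buffered
config.put("bulk.flush.max.actions", "1");

List<TransportAddress> transportAddresses = new ArrayList<>();
transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300));
transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300));
```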

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r99783072

          — Diff: docs/dev/connectors/elasticsearch.md —
          @@ -23,158 +23,291 @@ specific language governing permissions and limitations
          under the License.
          -->

          -This connector provides a Sink that can write to an
          -[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
          -following dependency to your project:
          -
          -

          {% highlight xml %}

          -<dependency>

          - <groupId>org.apache.flink</groupId>
          - <artifactId>flink-connector-elasticsearch{{ site.scala_version_suffix }}</artifactId>
          - <version>{{site.version }}</version>
            -</dependency>
            - {% endhighlight %}
            +This connector provides sinks that can request document actions to an
            +[Elasticsearch](https://elastic.co/) Index. To use this connector, add one
            +of the following dependencies to your project, depending on the version
            +of the Elasticsearch installation:
            +
            +<table class="table table-bordered">
            + <thead>
            + <tr>
            + <th class="text-left">Maven Dependency</th>
            + <th class="text-left">Supported since</th>
            + <th class="text-left">Elasticsearch version</th>
            + </tr>
            + </thead>
            + <tbody>
            + <tr>
            + <td>flink-connector-elasticsearch{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>1.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch2{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>2.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch5{{ site.scala_version_suffix }}</td>
            + <td>1.2.0</td>
            + <td>5.x</td>
            + </tr>
            + </tbody>
            +</table>

            Note that the streaming connectors are currently not part of the binary
            -distribution. See
            -[here](site.baseurl/dev/linking.html)
            -for information about how to package the program with the libraries for
            -cluster execution.
            +distribution. See [here](site.baseurl/dev/linking.html) for information
            +about how to package the program with the libraries for cluster execution.

            #### Installing Elasticsearch

            Instructions for setting up an Elasticsearch cluster can be found
            [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
            Make sure to set and remember a cluster name. This must be set when
            -creating a Sink for writing to your cluster
            +creating an `ElasticsearchSink` for requesting document actions against your cluster.

            #### Elasticsearch Sink
            -The connector provides a Sink that can send data to an Elasticsearch Index.
            -
            -The sink can use two different methods for communicating with Elasticsearch:
            -
            -1. An embedded Node
            -2. The TransportClient

            -See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
            -for information about the differences between the two modes.
            +The `ElasticsearchSink` uses a `TransportClient` to communicate with an
            +Elasticsearch cluster.

            -This code shows how to create a sink that uses an embedded Node for
            -communication:
            +The example below shows how to configure and create a sink:

            <div class="codetabs" markdown="1">
            -<div data-lang="java" markdown="1">
            +<div data-lang="java, Elasticsearch 1.x" markdown="1">
            {% highlight java %}
            DataStream<String> input = ...;

            -Map<String, String> config = Maps.newHashMap();
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            // This instructs the sink to emit after every element, otherwise they would be buffered
            config.put("bulk.flush.max.actions", "1");
            -config.put("cluster.name", "my-cluster-name");

            -input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
            - @Override
            - public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
            - Map<String, Object> json = new HashMap<>();
            - json.put("data", element);
            +List<TransportAddress> transportAddresses = new ArrayList<String>();
            +transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300));
            +transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300));

            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            + public IndexRequest createIndexRequest(String element) {
            +     Map<String, String> json = new HashMap<>();
            +     json.put("data", element);
            +
            +     return Requests.indexRequest()
            +             .index("my-index")
            +             .type("my-type")
            +             .source(json);
            + }
            +
            + @Override
            + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
            +     indexer.add(createIndexRequest(element));
            + }
            }));
            {% endhighlight %}

            </div>
            -<div data-lang="scala" markdown="1">
            +<div data-lang="java, Elasticsearch 2.x / 5.x" markdown="1">
            +

            {% highlight java %}

            +DataStream<String> input = ...;
            +
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            +config.put("bulk.flush.max.actions", "1");
            +
            +List<InetSocketAddress> transportAddresses = new ArrayList<>();
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300));
            +
            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            +    public IndexRequest createIndexRequest(String element) {
            +        Map<String, String> json = new HashMap<>();
            +        json.put("data", element);
            +
            +        return Requests.indexRequest()
            +                .index("my-index")
            +                .type("my-type")
            +                .source(json);
            +    }
            +
            +    @Override
            +    public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
            +        indexer.add(createIndexRequest(element));
            +    }
            +}));

            {% endhighlight %}

            +</div>
            +<div data-lang="scala, Elasticsearch 1.x" markdown="1">

            {% highlight scala %}

            val input: DataStream[String] = ...

          -val config = new util.HashMap[String, String]
          +val config = new java.util.HashMap[String, String]
          +config.put("cluster.name", "my-cluster-name")
          — End diff –

          ; again

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3112

          +1 to merge.

          I've tried the ES2 connector (just to check one of the connectors) and it worked well.

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          Thank you for the review! I'll address the missing semicolons and merge this today.

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3112#discussion_r99837460

          — Diff: docs/dev/connectors/elasticsearch.md —
          @@ -23,158 +23,291 @@ specific language governing permissions and limitations
          under the License.
          -->

          -This connector provides a Sink that can write to an
          -[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
          -following dependency to your project:
          -
          -

          {% highlight xml %}

          -<dependency>

          - <groupId>org.apache.flink</groupId>
          - <artifactId>flink-connector-elasticsearch{{ site.scala_version_suffix }}</artifactId>
          - <version>{{site.version }}</version>
            -</dependency>
            - {% endhighlight %}
            +This connector provides sinks that can request document actions to an
            +[Elasticsearch](https://elastic.co/) Index. To use this connector, add one
            +of the following dependencies to your project, depending on the version
            +of the Elasticsearch installation:
            +
            +<table class="table table-bordered">
            + <thead>
            + <tr>
            + <th class="text-left">Maven Dependency</th>
            + <th class="text-left">Supported since</th>
            + <th class="text-left">Elasticsearch version</th>
            + </tr>
            + </thead>
            + <tbody>
            + <tr>
            + <td>flink-connector-elasticsearch{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>1.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch2{{ site.scala_version_suffix }}</td>
            + <td>1.0.0</td>
            + <td>2.x</td>
            + </tr>
            + <tr>
            + <td>flink-connector-elasticsearch5{{ site.scala_version_suffix }}</td>
            + <td>1.2.0</td>
            + <td>5.x</td>
            + </tr>
            + </tbody>
            +</table>

            Note that the streaming connectors are currently not part of the binary
            -distribution. See
            -[here](site.baseurl/dev/linking.html)
            -for information about how to package the program with the libraries for
            -cluster execution.
            +distribution. See [here](site.baseurl/dev/linking.html) for information
            +about how to package the program with the libraries for cluster execution.

            #### Installing Elasticsearch

            Instructions for setting up an Elasticsearch cluster can be found
            [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
            Make sure to set and remember a cluster name. This must be set when
            -creating a Sink for writing to your cluster
            +creating an `ElasticsearchSink` for requesting document actions against your cluster.

            #### Elasticsearch Sink
            -The connector provides a Sink that can send data to an Elasticsearch Index.
            -
            -The sink can use two different methods for communicating with Elasticsearch:
            -
            -1. An embedded Node
            -2. The TransportClient

            -See [here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
            -for information about the differences between the two modes.
            +The `ElasticsearchSink` uses a `TransportClient` to communicate with an
            +Elasticsearch cluster.

            -This code shows how to create a sink that uses an embedded Node for
            -communication:
            +The example below shows how to configure and create a sink:

            <div class="codetabs" markdown="1">
            -<div data-lang="java" markdown="1">
            +<div data-lang="java, Elasticsearch 1.x" markdown="1">
            {% highlight java %}
            DataStream<String> input = ...;

            -Map<String, String> config = Maps.newHashMap();
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            // This instructs the sink to emit after every element, otherwise they would be buffered
            config.put("bulk.flush.max.actions", "1");
            -config.put("cluster.name", "my-cluster-name");

            -input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
            - @Override
            - public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
            - Map<String, Object> json = new HashMap<>();
            - json.put("data", element);
            +List<TransportAddress> transportAddresses = new ArrayList<String>();
            +transportAddresses.add(new InetSocketTransportAddress("127.0.0.1", 9300));
            +transportAddresses.add(new InetSocketTransportAddress("10.2.3.1", 9300));

            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            + public IndexRequest createIndexRequest(String element) {
            +     Map<String, String> json = new HashMap<>();
            +     json.put("data", element);
            +
            +     return Requests.indexRequest()
            +             .index("my-index")
            +             .type("my-type")
            +             .source(json);
            + }
            +
            + @Override
            + public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
            +     indexer.add(createIndexRequest(element));
            + }
            }));
            {% endhighlight %}

            </div>
            -<div data-lang="scala" markdown="1">
            +<div data-lang="java, Elasticsearch 2.x / 5.x" markdown="1">
            +

            {% highlight java %}

            +DataStream<String> input = ...;
            +
            +Map<String, String> config = new HashMap<>();
            +config.put("cluster.name", "my-cluster-name")
            +// This instructs the sink to emit after every element, otherwise they would be buffered
            +config.put("bulk.flush.max.actions", "1");
            +
            +List<InetSocketAddress> transportAddresses = new ArrayList<>();
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
            +transportAddresses.add(new InetSocketAddress(InetAddress.getByName("10.2.3.1"), 9300));
            +
            +input.addSink(new ElasticsearchSink<>(config, transportAddresses, new ElasticsearchSinkFunction<String>() {
            +    public IndexRequest createIndexRequest(String element) {
            +        Map<String, String> json = new HashMap<>();
            +        json.put("data", element);
            +
            +        return Requests.indexRequest()
            +                .index("my-index")
            +                .type("my-type")
            +                .source(json);
            +    }
            +
            +    @Override
            +    public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
            +        indexer.add(createIndexRequest(element));
            +    }
            +}));

            {% endhighlight %}

            +</div>
            +<div data-lang="scala, Elasticsearch 1.x" markdown="1">

            {% highlight scala %}

            val input: DataStream[String] = ...

          -val config = new util.HashMap[String, String]
          +val config = new java.util.HashMap[String, String]
          +config.put("cluster.name", "my-cluster-name")
          — End diff –

          This is Scala code, so the semicolon is ignored.
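
          For illustration only (not part of the proposed docs): the Scala compiler infers statement ends at line breaks, so the configuration from the example compiles the same way with or without a trailing semicolon:

          {% highlight scala %}
          val config = new java.util.HashMap[String, String]
          config.put("cluster.name", "my-cluster-name") // no semicolon needed in Scala
          config.put("bulk.flush.max.actions", "1")
          {% endhighlight %}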

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3112

          Merging to `master` ..

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3112

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/2767

          Hi @mikedias! Your contribution has been merged with 8699b03d79a441ca33d9f62b96490d29a0efaf44 and b5caaef82add4a6f424094d526700c77b011724e. Could you manually close this PR? The bot only closed the new restructure PR.

          Thanks a lot for your contribution!

          tzulitai Tzu-Li (Gordon) Tai added a comment - - edited

          Solved for master via http://git-wip-us.apache.org/repos/asf/flink/commit/b5caaef.

          Thank you for the contribution Mike Dias!

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/2767

          githubbot ASF GitHub Bot added a comment -

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/2767

          I closed the PR manually using another commit.


            People

            • Assignee: Unassigned
            • Reporter: mike_dias Mike Dias
            • Votes: 0
            • Watchers: 6

            Dates

            • Created:
            • Updated:
            • Resolved:

            Development