Details

      Description

      Many states in keyed streams are organized as key-value pairs. Currently, these states are implemented by storing the entire map in a ValueState or a ListState. This implementation, however, is very costly because every entry has to be serialized and deserialized even when a single entry is updated. To improve the efficiency of these states, MapState is urgently needed.
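As a rough illustration of that cost (a hypothetical sketch, not Flink code), the following compares the bytes that must be written when one entry changes if the whole map is re-serialized, as happens with a map stored in a ValueState, versus serializing only the affected key/value pair, which is what a dedicated MapState enables:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration (not Flink code): compares the bytes written
// when a single entry is updated if the entire map must be re-serialized
// (a map stored in ValueState) versus serializing only the affected
// key/value pair (what a dedicated MapState enables).
public class SerializationCost {

    // Size of an object's Java serialization, used as a rough cost proxy.
    static int bytesOf(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush();
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            state.put("key-" + i, (long) i);
        }

        // ValueState<Map<...>>: one update rewrites every entry.
        int wholeMap = bytesOf(state);
        // MapState: one update touches only the affected pair.
        int oneEntry = bytesOf("key-42") + bytesOf(42L);

        System.out.println("whole map: " + wholeMap + " bytes");
        System.out.println("one entry: " + oneEntry + " bytes");
    }
}
```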

        Issue Links

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang closed the pull request at:

          https://github.com/apache/flink/pull/3336

          aljoscha Aljoscha Krettek added a comment -

          Implemented in:
          30c9e2b683bf7f4776ffc23b6a860946a4429ae5

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on the issue:

          https://github.com/apache/flink/pull/3336

          Thanks a lot for working on this @shixiaogang! 😃

          I just merged your PR, could you please close it?

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on the issue:

          https://github.com/apache/flink/pull/3336

          @aljoscha Thanks a lot for your hard work. I have fixed the typos in the documentation.

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on the issue:

          https://github.com/apache/flink/pull/3336

          I had two very minor comments about typos in the Javadoc. Otherwise this looks very good now, @shixiaogang 👍

          @StefanRRichter Could you please go ahead and merge when you are satisfied with the internals?

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102426884

          — Diff: docs/dev/stream/state.md —
          @@ -118,6 +118,11 @@ added to the state. Contrary to `ReducingState`, the aggregate type may be diffe
          of elements that are added to the state. The interface is the same as for `ListState` but elements
          added using `add(T)` are folded into an aggregate using a specified `FoldFunction`.

          +* `MapState<UK, UV>`: This keeps a list of mappings. You can put key-value pairs into the state and retrieve
          +retrieve an `Iterable` over all currently stored mappings. Mappings are added using `put(UK, UV)` or
          +`putAll(map<UK, UV>)`. The value associated with a user key can be retrieved using `get(UK)`. The iterable
          — End diff –

          lowercase `map`

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102426841

          — Diff: docs/dev/stream/state.md —
          @@ -118,6 +118,11 @@ added to the state. Contrary to `ReducingState`, the aggregate type may be diffe
          of elements that are added to the state. The interface is the same as for `ListState` but elements
          added using `add(T)` are folded into an aggregate using a specified `FoldFunction`.

          +* `MapState<UK, UV>`: This keeps a list of mappings. You can put key-value pairs into the state and retrieve
          +retrieve an `Iterable` over all currently stored mappings. Mappings are added using `put(UK, UV)` or
          — End diff –

          duplicate "retrieve"

          githubbot ASF GitHub Bot added a comment -

          Github user wenlong88 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102381085

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/state/MapStateDescriptor.java —
          @@ -0,0 +1,147 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.state;
          +
          +import org.apache.flink.annotation.PublicEvolving;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.common.typeutils.base.MapSerializer;
          +import org.apache.flink.api.java.typeutils.MapTypeInfo;
          +
          +import java.util.Map;
          +
          +/**
          + * A {@link StateDescriptor} for {@link MapState}. This can be used to create state where the type
          + * is a list that can be appended and iterated over.
          — End diff –

          Common error, caused by copy and paste.

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on the issue:

          https://github.com/apache/flink/pull/3336

          @StefanRRichter Many thanks for your work. I have rebased the pull request and resolved the conflicts.

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102225285

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -382,11 +342,26 @@ private UV deserializeUserValue(byte[] rawValueBytes)

          { this.rawValueBytes = rawValueBytes; this.deleted = false; }
          +
          + public void remove() {
          +     deleted = true;
          +     rawValueBytes = null;
          +
          +     try {
          +         db.remove(columnFamily, writeOptions, rawKeyBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while removing data from RocksDB.", e);
          — End diff –

          I modified the method signature because I found that, except for `ValueState`, the methods in the other states all throw `Exception`. I think that's okay because `MapState` is a common interface that has no knowledge of the implementation. The implementations of these methods, however, should throw some specific exception like `IOException` or `RocksDBException`.

          I think it's reasonable. What do you think?
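The division of responsibility argued for here can be sketched in plain Java (hypothetical types, not the actual Flink interfaces): the generic interface declares the broad `throws Exception`, while a concrete backend narrows it to the specific exception it can actually raise.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the real Flink interfaces). The state
// interface knows nothing about its backend, so it declares the
// broadest checked exception.
interface SimpleMapState<UK, UV> {
    UV get(UK key) throws Exception;
    void put(UK key, UV value) throws Exception;
}

// A concrete backend may narrow the throws clause to the specific
// exception its implementation can raise (here IOException; a RocksDB
// backend would declare RocksDBException instead).
class HeapMapState<UK, UV> implements SimpleMapState<UK, UV> {
    private final Map<UK, UV> map = new HashMap<>();

    @Override
    public UV get(UK key) throws IOException {
        return map.get(key);
    }

    @Override
    public void put(UK key, UV value) throws IOException {
        map.put(key, value);
    }
}

public class ExceptionNarrowing {
    public static void main(String[] args) throws Exception {
        SimpleMapState<String, Integer> state = new HeapMapState<>();
        state.put("a", 1);
        System.out.println(state.get("a")); // prints 1
    }
}
```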

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102194100

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>
          + * {@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          — End diff –

          Yes, I think we need a follow-up issue because the other state types also have the `WriteOptions` that are not properly cleaned up.

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on the issue:

          https://github.com/apache/flink/pull/3336

          Besides the problem with the `WriteOptions`, which I suggest resolving in another PR, this looks good to merge to me now. +1 from me; waiting for the second approval from @aljoscha.

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102188520

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>
          + * {@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          — End diff –

          In any case, the problem still persists that the native resource `WriteOptions` is never closed properly. It might be okay, because right now the task will typically end when it should be released, but it is still not completely clean and can produce leaks. I would be okay with merging this and creating another JIRA issue for it. What do you think @aljoscha?
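The leak described here is the usual JNI-handle problem. A minimal sketch (hypothetical class names, neither Flink nor RocksDB code) of how such a native handle can be modeled as `AutoCloseable` and released deterministically on the shutdown path:

```java
// Hypothetical stand-in for a JNI-backed handle such as RocksDB's
// WriteOptions: implementing AutoCloseable lets the owning backend
// release the native memory deterministically instead of leaking it.
class NativeWriteOptions implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // A real implementation would free the native handle here.
        closed = true;
    }
}

public class ResourceCleanup {
    public static void main(String[] args) {
        NativeWriteOptions options = new NativeWriteOptions();
        try {
            // ... perform writes with the write-ahead-log disabled ...
        } finally {
            options.close(); // release even on failure paths
        }
        System.out.println("closed=" + options.isClosed());
    }
}
```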

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102187482

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -382,11 +342,26 @@ private UV deserializeUserValue(byte[] rawValueBytes)

          { this.rawValueBytes = rawValueBytes; this.deleted = false; }
          +
          + public void remove() {
          +     deleted = true;
          +     rawValueBytes = null;
          +
          +     try {
          +         db.remove(columnFamily, writeOptions, rawKeyBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while removing data from RocksDB.", e);
          — End diff –

          The intention of my comment about `RuntimeException` was not about changing the method signatures for throwing `RocksDBException`. My suggestion was only to use a proper subclass of `RuntimeException`. We should avoid using `RuntimeException` directly, similar to how we should avoid throwing the class `Exception` directly. I know that there is some code in Flink that does not follow this, but I think it is better code style to stick with more appropriate subclasses.
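As a sketch of that style point (the exception class name below is hypothetical, not a Flink class), a dedicated unchecked subclass keeps the call sites unchanged while letting callers catch the failure precisely:

```java
import java.io.IOException;

// Hypothetical dedicated unchecked exception; catching a bare
// RuntimeException would sweep up far more failures than intended.
class StateAccessException extends RuntimeException {
    StateAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

public class SpecificException {

    // Simulates a state-backend operation that fails internally and
    // wraps the checked cause in the specific unchecked subclass.
    static void remove(String key) {
        try {
            throw new IOException("backend failure while removing " + key);
        } catch (IOException e) {
            throw new StateAccessException("Error while removing data.", e);
        }
    }

    public static void main(String[] args) {
        try {
            remove("k");
        } catch (StateAccessException e) { // precise catch
            System.out.println(e.getMessage());
        }
    }
}
```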

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102173749

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java —
          @@ -834,7 +836,7 @@ private void restoreKVStateData() throws IOException, RocksDBException {
          }

          @Override

          - protected <N, T> InternalValueState<N, T> createValueState(
          + public <N, T> InternalValueState<N, T> createValueState(

          — End diff –

          Hmm I see, let's keep it as you did it then.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102173582

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          +        extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          +        implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          — End diff –

          It was there initially because it would core dump in the very first version of the RocksDB backend otherwise. Not sure if this would still persist now. Maybe it was a problem with the cleanup.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on the issue:

          https://github.com/apache/flink/pull/3336

          I have added the documentation for `MapState`. You may take a look to see if it's properly written.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on the issue:

          https://github.com/apache/flink/pull/3336

          @StefanRRichter I have updated the pull request as suggested. Now the map serializer supports the serialization of null values.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on the issue:

          https://github.com/apache/flink/pull/3336

          I have one more point that I forgot in my initial review. It would be great if you could also provide a paragraph in the Flink documentation about the MapState. This would improve the accessibility and visibility of this nice feature a lot for all users.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102164095

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          +        extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          +        implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
            + /**
            + * Creates a new {@code RocksDBMapState}.
            + *
            + * @param namespaceSerializer The serializer for the namespace.
            + * @param stateDesc The state identifier for the state.
            + */
            + public RocksDBMapState(ColumnFamilyHandle columnFamily,
            +         TypeSerializer<N> namespaceSerializer,
            +         MapStateDescriptor<UK, UV> stateDesc,
            +         RocksDBKeyedStateBackend<K> backend) {
            +
            +     super(columnFamily, namespaceSerializer, stateDesc, backend);
            +
            +     this.userKeySerializer = stateDesc.getKeySerializer();
            +     this.userValueSerializer = stateDesc.getValueSerializer();
            +
            +     writeOptions = new WriteOptions();
            +     writeOptions.setDisableWAL(true);
            + }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
            + @Override
            + public UV get(UK userKey) throws IOException {
            +     try {
            +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
            +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
            +
            +         return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
            +     } catch (RocksDBException e) {
            +         throw new RuntimeException("Error while getting data from RocksDB.", e);
            +     }
            + }
            +
            + @Override
            + public void put(UK userKey, UV userValue) throws IOException {
            +     if (userValue == null) {
            +         remove(userKey);
            +     }
            +
            +     try {
            +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
            +         byte[] rawValueBytes = serializeUserValue(userValue);
            +
            +         backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
            +     } catch (RocksDBException e) {
            +         throw new RuntimeException("Error while putting data into RocksDB", e);
            +     }
            + }
            +
            + @Override
            + public void remove(UK userKey) throws IOException {
            +     try {
            +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
            +
            +         backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
            +     } catch (RocksDBException e) {
            +         throw new RuntimeException("Error while removing data from RocksDB.", e);
            +     }
            + }
            +
            + @Override
            + public boolean contains(UK userKey) throws IOException {
            +     try {
            +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
            +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
            +
            +         return (rawValueBytes != null);
            +     } catch (RocksDBException e) {
            +         throw new RuntimeException("Error while getting data from RocksDB", e);
            +     }
            + }
            +
            + @Override
            + public int size() throws IOException {
            +     Iterator<Map.Entry<UK, UV>> iterator = iterator();
            +
            +     int count = 0;
            +     while (iterator.hasNext()) {
            +         count++;
            +         iterator.next();
            +     }
            +
            +     return count;
            + }
          +
          + @Override
          + public Iterable<UK> keys() {
          + return new Iterable<UK>() {
          — End diff –

          What I meant was not a global singleton, just a single iterable per state. Since this is just a minor optimization, I could also just live with the code as is.
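The "single iterable per state" optimization Stefan describes can be sketched with a toy class (this is illustrative only, not the actual `RocksDBMapState` code; the class name `CachedKeysState` is made up): the state object lazily creates one `Iterable` and returns the same instance on every call, since the iterable itself holds no iteration state and only `iterator()` does real work.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy sketch of caching a single Iterable per state object, as suggested in
// the review. The backing list stands in for iterating RocksDB entries.
public class CachedKeysState {

    private final List<String> backing = Arrays.asList("a", "b", "c");

    // Lazily created once, then reused across calls to keys().
    private Iterable<String> keysIterable;

    public Iterable<String> keys() {
        if (keysIterable == null) {
            keysIterable = new Iterable<String>() {
                @Override
                public Iterator<String> iterator() {
                    // Each call still produces a fresh iterator.
                    return backing.iterator();
                }
            };
        }
        return keysIterable;
    }
}
```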

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102163881

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          +        extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          +        implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
            + /**
            + * Creates a new {@code RocksDBMapState}.
            + *
            + * @param namespaceSerializer The serializer for the namespace.
            + * @param stateDesc The state identifier for the state.
            + */
            + public RocksDBMapState(ColumnFamilyHandle columnFamily,
            +         TypeSerializer<N> namespaceSerializer,
            +         MapStateDescriptor<UK, UV> stateDesc,
            +         RocksDBKeyedStateBackend<K> backend) {
            +
            +     super(columnFamily, namespaceSerializer, stateDesc, backend);
            +
            +     this.userKeySerializer = stateDesc.getKeySerializer();
            +     this.userValueSerializer = stateDesc.getValueSerializer();
            +
            +     writeOptions = new WriteOptions();
            +     writeOptions.setDisableWAL(true);
            + }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
            + @Override
            + public UV get(UK userKey) throws IOException {
            +     try {
            +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
            +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
            +
            +         return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
            +     } catch (RocksDBException e) {
            +         throw new RuntimeException("Error while getting data from RocksDB.", e);
            +     }
            + }
            +
            + @Override
            + public void put(UK userKey, UV userValue) throws IOException {
            +     if (userValue == null) {
            +         remove(userKey);
          — End diff –

          I think that this would be a weakness of the serializer, or that some data types simply would not allow for `null` values (`null` would be a programming error for them). My feeling is that this code should not have to work around this problem. Probably any type serializer could trivially support null by returning a `byte[0]`.
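One common way a serializer can tolerate null values, in the spirit of this discussion, is to prepend a one-byte presence flag to the payload. The sketch below is illustrative only and is not Flink's actual `MapSerializer`; the class name is made up, and checked `IOException`s are wrapped so the API stays unchecked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch: null-tolerant serialization via a presence byte.
// Layout: [false] encodes null, [true][payload] encodes a present value.
public class NullableStringSerde {

    public static byte[] serialize(String value) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            if (value == null) {
                out.writeBoolean(false);      // presence flag: absent
            } else {
                out.writeBoolean(true);       // presence flag: present
                out.writeUTF(value);          // actual payload
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static String deserialize(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            return in.readBoolean() ? in.readUTF() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```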

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102163224

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ * <p>
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
+ /**
+ * Creates a new {@code RocksDBMapState}.
+ *
+ * @param namespaceSerializer The serializer for the namespace.
+ * @param stateDesc The state identifier for the state.
+ */
+ public RocksDBMapState(ColumnFamilyHandle columnFamily,
+ TypeSerializer<N> namespaceSerializer,
+ MapStateDescriptor<UK, UV> stateDesc,
+ RocksDBKeyedStateBackend<K> backend) {
+
+ super(columnFamily, namespaceSerializer, stateDesc, backend);
+
+ this.userKeySerializer = stateDesc.getKeySerializer();
+ this.userValueSerializer = stateDesc.getValueSerializer();
+
+ writeOptions = new WriteOptions();
+ writeOptions.setDisableWAL(true);
+ }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+ return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while getting data from RocksDB.", e);
          — End diff –

          A different PR is fine. In this case, could you create a JIRA so that we do not forget about this point?

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102162964

          — Diff: flink-runtime/src/test/java/org/apache/flink/runtime/query/netty/message/KvStateRequestSerializerTest.java —
          @@ -410,6 +415,124 @@ public void testDeserializeListTooShort2() throws Exception {
KvStateRequestSerializer.deserializeList(new byte[] {1, 1, 1, 1, 1, 1, 1, 1, 2, 3},
LongSerializer.INSTANCE);
}
          +
          + /**
          + * Tests map serialization utils.
          + */
          + @Test
+ public void testMapSerialization() throws Exception {
+ final long key = 0L;
+
+ // objects for heap state list serialisation
+ final HeapKeyedStateBackend<Long> longHeapKeyedStateBackend =
+ new HeapKeyedStateBackend<>(
+ mock(TaskKvStateRegistry.class),
+ LongSerializer.INSTANCE,
+ ClassLoader.getSystemClassLoader(),
+ 1, new KeyGroupRange(0, 0));
+ longHeapKeyedStateBackend.setCurrentKey(key);
+
+ final InternalMapState<VoidNamespace, Long, String> mapState = longHeapKeyedStateBackend.createMapState(
+ VoidNamespaceSerializer.INSTANCE,
+ new MapStateDescriptor<>("test", LongSerializer.INSTANCE, StringSerializer.INSTANCE));
+
+ testMapSerialization(key, mapState);
+ }

          +
+ /**
+ * Verifies that the serialization of a map using the given map state
+ * matches the deserialization with {@link KvStateRequestSerializer#deserializeList}.
+ *
+ * @param key
+ * key of the map state
+ * @param mapState
+ * map state using the {@link VoidNamespace}, must also be a {@link InternalKvState} instance
+ *
+ * @throws Exception
+ */
          + public static void testMapSerialization(
          + final long key,
          + final InternalMapState<VoidNamespace, Long, String> mapState) throws Exception {
          +
          + TypeSerializer<Long> userKeySerializer = LongSerializer.INSTANCE;
          + TypeSerializer<String> userValueSerializer = StringSerializer.INSTANCE;
          + mapState.setCurrentNamespace(VoidNamespace.INSTANCE);
          +
          + // List
          + final int numElements = 10;
          +
          + final Map<Long, String> expectedValues = new HashMap<>();
          + for (int i = 0; i < numElements; i++) {
          + final long value = ThreadLocalRandom.current().nextLong();
          — End diff –

          I understand your reason. Maybe we could also just print the random seed for each run, so that in case of a test error, we can immediately reproduce it?
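The suggestion above can be sketched as follows; the class and method names are illustrative only, not the test code under review. The point is that driving all randomness from a single printed seed makes a failing run replayable:

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: pick a seed per run, print it, and derive every random value
// from it, so a failure can be reproduced by hard-coding the printed seed.
public class SeededTestSketch {

    static long[] randomValues(long seed, int count) {
        System.out.println("test seed = " + seed);  // copy this back to replay a failure
        Random rnd = new Random(seed);
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = rnd.nextLong();
        }
        return values;
    }

    public static void main(String[] args) {
        long seed = System.nanoTime();  // or a hard-coded seed from a failed run
        long[] first = randomValues(seed, 10);
        long[] second = randomValues(seed, 10);
        System.out.println(Arrays.equals(first, second));  // true: same seed, same sequence
    }
}
```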

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102162632

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/DefaultKeyedStateStore.java —
          @@ -93,6 +95,18 @@ public DefaultKeyedStateStore(KeyedStateBackend<?> keyedStateBackend, ExecutionC
          }
          }

          + @Override
          + public <UK, UV> MapState<UK, UV> getMapState(MapStateDescriptor<UK, UV> stateProperties) {
          + requireNonNull(stateProperties, "The state properties must not be null");
+ try {
+ stateProperties.initializeSerializerUnlessSet(executionConfig);
+ MapState<UK, UV> originalState = getPartitionedState(stateProperties);
+ return new UserFacingMapState<>(originalState);
+ } catch (Exception e) {
+ throw new RuntimeException("Error while getting state", e);
          — End diff –

          What I meant was using a more specific subclass of `RuntimeException` (e.g. `IllegalAccess`,`IllegalState`, `IndexOutOfBounds`, etc.) that better reflects the error, but still can leave the method signature unchanged.
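As a rough illustration of that suggestion (names here are hypothetical, not the code that was merged), the rethrow can pick a subclass that describes the failure while leaving the unchecked signature intact:

```java
// Hypothetical sketch: rethrow with a more descriptive RuntimeException
// subclass. IllegalStateException extends RuntimeException, so callers
// see the same (unchecked) method signature as before.
public class SpecificExceptionSketch {

    static String getState(String descriptor) {
        if (descriptor == null) {
            throw new IllegalArgumentException("The state properties must not be null");
        }
        try {
            return "state:" + descriptor;  // stand-in for getPartitionedState(stateProperties)
        } catch (RuntimeException e) {
            // More specific than a bare RuntimeException, same contract for callers.
            throw new IllegalStateException("Error while getting state", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getState("test"));  // state:test
        try {
            getState(null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // The state properties must not be null
        }
    }
}
```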

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102162187

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ * <p>
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          — End diff –

Do you have a core dump file for this problem that you could share with us? I would try to take a look into this.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102153318

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ *
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
+ *
+ * @param <K> The type of the key.
+ * @param <N> The type of the namespace.
+ * @param <UK> The type of the keys in the map state.
+ * @param <UV> The type of the values in the map state.
+ */
+public class RocksDBMapState<K, N, UK, UV>
+ extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
+ implements InternalMapState<N, UK, UV> {
+
+ /** Serializer for the keys and values */
+ private final TypeSerializer<UK> userKeySerializer;
+ private final TypeSerializer<UV> userValueSerializer;
+
+ /**
+ * We disable writes to the write-ahead-log here. We can't have these in the base class
+ * because JNI segfaults for some reason if they are.
+ */
+ private final WriteOptions writeOptions;
+
+ /**
+ * Creates a new {@code RocksDBMapState}.
+ *
+ * @param namespaceSerializer The serializer for the namespace.
+ * @param stateDesc The state identifier for the state.
+ */
+ public RocksDBMapState(ColumnFamilyHandle columnFamily,
+ TypeSerializer<N> namespaceSerializer,
+ MapStateDescriptor<UK, UV> stateDesc,
+ RocksDBKeyedStateBackend<K> backend) {
+
+ super(columnFamily, namespaceSerializer, stateDesc, backend);
+
+ this.userKeySerializer = stateDesc.getKeySerializer();
+ this.userValueSerializer = stateDesc.getValueSerializer();
+
+ writeOptions = new WriteOptions();
+ writeOptions.setDisableWAL(true);
+ }
+
+ // ------------------------------------------------------------------------
+ //  MapState Implementation
+ // ------------------------------------------------------------------------
+
+ @Override
+ public UV get(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+ return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while getting data from RocksDB.", e);
+ }
+ }
+
+ @Override
+ public void put(UK userKey, UV userValue) throws IOException {
+ if (userValue == null) {
+ remove(userKey);
+ }
+
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = serializeUserValue(userValue);
+
+ backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while putting data into RocksDB", e);
+ }
+ }
+
+ @Override
+ public void remove(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+
+ backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while removing data from RocksDB.", e);
+ }
+ }
+
+ @Override
+ public boolean contains(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+ return (rawValueBytes != null);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while getting data from RocksDB", e);
+ }
+ }
+
+ @Override
+ public int size() throws IOException {
+ Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+ int count = 0;
+ while (iterator.hasNext()) {
+ count++;
+ iterator.next();
+ }
+
+ return count;
+ }
+
+ @Override
+ public Iterable<UK> keys() {
+ return new Iterable<UK>() {
          — End diff –

The `Iterable` is used to iterate over the mappings under the current key. Though we could reuse a singleton object across different keys, doing so would complicate the implementation.
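The trade-off described above can be illustrated outside Flink: returning a fresh `Iterable` on every call keeps each iteration tied to the contents at the time it runs, with none of the bookkeeping a reusable singleton would need. A minimal, hypothetical sketch (a plain `HashMap` stands in for the RocksDB-backed state):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FreshIterableDemo {
    private final Map<String, Integer> entries = new HashMap<>();

    public void put(String key, int value) {
        entries.put(key, value);
    }

    /** Returns a new Iterable per call; each iteration sees the current contents. */
    public Iterable<String> keys() {
        return new Iterable<String>() {
            @Override
            public Iterator<String> iterator() {
                return entries.keySet().iterator();
            }
        };
    }

    public static void main(String[] args) {
        FreshIterableDemo demo = new FreshIterableDemo();
        demo.put("a", 1);
        int count = 0;
        for (String ignored : demo.keys()) {
            count++;
        }
        if (count != 1) throw new AssertionError("expected 1 key, got " + count);

        // A later call reflects later mutations, with no reset logic needed.
        demo.put("b", 2);
        count = 0;
        for (String ignored : demo.keys()) {
            count++;
        }
        if (count != 2) throw new AssertionError("expected 2 keys, got " + count);
    }
}
```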

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102138099

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/query/netty/message/KvStateRequestSerializer.java —
          @@ -484,6 +487,71 @@ public static Throwable deserializeServerFailure(ByteBuf buf) throws IOException
          return null;
          }
          }
          +
          + /**
          + * Serializes all values of the Iterable with the given serializer.
          + *
          + * @param entries Key-value pairs to serialize
          + * @param keySerializer Serializer for UK
          + * @param valueSerializer Serializer for UV
          + * @param <UK> Type of the keys
          + * @param <UV> Type of the values
          + * @return Serialized values or <code>null</code> if values <code>null</code> or empty
          + * @throws IOException On failure during serialization
          + */
          + public static <UK, UV> byte[] serializeMap(Iterable<Map.Entry<UK, UV>> entries, TypeSerializer<UK> keySerializer, TypeSerializer<UV> valueSerializer) throws IOException {
          + if (entries != null) {
          + Iterator<Map.Entry<UK, UV>> it = entries.iterator();
          +
          + if (it.hasNext()) {
          + // Serialize
          + DataOutputSerializer dos = new DataOutputSerializer(32);
          +
+ while (it.hasNext()) {
+ Map.Entry<UK, UV> entry = it.next();
+
+ keySerializer.serialize(entry.getKey(), dos);
+ valueSerializer.serialize(entry.getValue(), dos);
+ }
+
+ return dos.getCopyOfBuffer();
+ } else {
+ return null;
          — End diff –

          The function is unused now. I will delete it in the update.
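For reference, the layout `serializeMap` produces is simply each entry's key and value written back-to-back, with `null` for an empty input. A self-contained sketch of the same idea using plain `DataOutputStream`/`DataInputStream` (class and method names here are illustrative, not Flink's):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapBytesDemo {

    /** Writes each entry as (key, value) back-to-back; returns null for a null/empty map. */
    static byte[] serialize(Map<Long, String> map) throws IOException {
        if (map == null || map.isEmpty()) {
            return null;
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        for (Map.Entry<Long, String> entry : map.entrySet()) {
            out.writeLong(entry.getKey());
            out.writeUTF(entry.getValue());
        }
        return baos.toByteArray();
    }

    /** Reads (key, value) pairs until the buffer is exhausted. */
    static Map<Long, String> deserialize(byte[] bytes) throws IOException {
        Map<Long, String> result = new LinkedHashMap<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        while (in.available() > 0) {
            result.put(in.readLong(), in.readUTF());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<Long, String> map = new LinkedHashMap<>();
        map.put(1L, "one");
        map.put(2L, "two");
        Map<Long, String> copy = deserialize(serialize(map));
        if (!map.equals(copy)) {
            throw new AssertionError("round trip failed: " + copy);
        }
    }
}
```

Because the pairs carry no count prefix, the reader must consume the buffer to the end, which matches the "serialize until the iterator is exhausted" shape of the method above.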

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102135289

          — Diff: flink-runtime/src/test/java/org/apache/flink/runtime/query/netty/message/KvStateRequestSerializerTest.java —
@@ -410,6 +415,124 @@ public void testDeserializeListTooShort2() throws Exception {
KvStateRequestSerializer.deserializeList(new byte[] {1, 1, 1, 1, 1, 1, 1, 1, 2, 3},
LongSerializer.INSTANCE);
}
          +
+ /**
+ * Tests map serialization utils.
+ */
+ @Test
+ public void testMapSerialization() throws Exception {
+ final long key = 0L;
+
+ // objects for heap state list serialisation
+ final HeapKeyedStateBackend<Long> longHeapKeyedStateBackend =
+ new HeapKeyedStateBackend<>(
+ mock(TaskKvStateRegistry.class),
+ LongSerializer.INSTANCE,
+ ClassLoader.getSystemClassLoader(),
+ 1, new KeyGroupRange(0, 0)
+ );
+ longHeapKeyedStateBackend.setCurrentKey(key);
+
+ final InternalMapState<VoidNamespace, Long, String> mapState = longHeapKeyedStateBackend.createMapState(
+ VoidNamespaceSerializer.INSTANCE,
+ new MapStateDescriptor<>("test", LongSerializer.INSTANCE, StringSerializer.INSTANCE));
+
+ testMapSerialization(key, mapState);
+ }

          +
+ /**
+ * Verifies that the serialization of a map using the given map state
+ * matches the deserialization with {@link KvStateRequestSerializer#deserializeList}.
+ *
+ * @param key
+ * key of the map state
+ * @param mapState
+ * map state using the {@link VoidNamespace}, must also be a {@link InternalKvState} instance
+ *
+ * @throws Exception
+ */
          + public static void testMapSerialization(
          + final long key,
          + final InternalMapState<VoidNamespace, Long, String> mapState) throws Exception {
          +
          + TypeSerializer<Long> userKeySerializer = LongSerializer.INSTANCE;
          + TypeSerializer<String> userValueSerializer = StringSerializer.INSTANCE;
          + mapState.setCurrentNamespace(VoidNamespace.INSTANCE);
          +
          + // List
          + final int numElements = 10;
          +
          + final Map<Long, String> expectedValues = new HashMap<>();
          + for (int i = 0; i < numElements; i++) {
          + final long value = ThreadLocalRandom.current().nextLong();
          — End diff –

I prefer to use `ThreadLocalRandom.current()`, which is also used by the other tests in this file. Though it makes it difficult to reproduce a failing case, it may help to find corner cases.
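A common middle ground between a fixed seed and `ThreadLocalRandom` is to draw a seed once per run, log it, and feed it to a seeded `Random`: the test still explores varying inputs, but a failure can be replayed from the logged seed. A small sketch (not the test in the diff):

```java
import java.util.Random;

public class SeededRandomDemo {
    public static void main(String[] args) {
        // Pick a fresh seed per run, but print it so a failing run can be replayed.
        long seed = System.nanoTime();
        System.out.println("seed = " + seed);

        Random random = new Random(seed);
        long[] values = new long[10];
        for (int i = 0; i < values.length; i++) {
            values[i] = random.nextLong();
        }

        // Re-creating the generator with the same seed reproduces the sequence exactly.
        Random replay = new Random(seed);
        for (int i = 0; i < values.length; i++) {
            if (values[i] != replay.nextLong()) {
                throw new AssertionError("sequence not reproducible at index " + i);
            }
        }
    }
}
```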

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102129362

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java —
          @@ -834,7 +836,7 @@ private void restoreKVStateData() throws IOException, RocksDBException {
          }

          @Override

- protected <N, T> InternalValueState<N, T> createValueState(
+ public <N, T> InternalValueState<N, T> createValueState(

— End diff –

It is mainly due to the unit tests in `KvStateRequestSerializerTest`, which need access to the `InternalKvState`. A better choice would be to use `getPartitionedState()` to obtain a user-facing state and convert it to an internal state. What do you think?
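The alternative suggested above, obtaining a state through the public accessor and downcasting to the internal interface in tests, amounts to the following pattern (the interfaces here are stand-ins, not Flink's actual types):

```java
public class InternalCastDemo {

    /** User-facing interface, as handed out by the public accessor. */
    interface State {
        String value();
    }

    /** Internal extension with accessors that only tests and the runtime need. */
    interface InternalState extends State {
        byte[] rawBytes();
    }

    static class StateImpl implements InternalState {
        public String value() { return "hello"; }
        public byte[] rawBytes() { return value().getBytes(); }
    }

    /** The public accessor exposes only the user-facing type. */
    static State getPartitionedState() {
        return new StateImpl();
    }

    public static void main(String[] args) {
        State state = getPartitionedState();
        // Tests downcast instead of the backend widening its factory methods to public.
        InternalState internal = (InternalState) state;
        if (internal.rawBytes().length != 5) {
            throw new AssertionError("unexpected length");
        }
    }
}
```

The design benefit is that the backend's factory methods can stay `protected`; only code that already knows about the internal interface performs the cast.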

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102128355

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/DefaultKeyedStateStore.java —
          @@ -93,6 +95,18 @@ public DefaultKeyedStateStore(KeyedStateBackend<?> keyedStateBackend, ExecutionC
          }
          }

          + @Override
          + public <UK, UV> MapState<UK, UV> getMapState(MapStateDescriptor<UK, UV> stateProperties) {
          + requireNonNull(stateProperties, "The state properties must not be null");
+ try {
+ stateProperties.initializeSerializerUnlessSet(executionConfig);
+ MapState<UK, UV> originalState = getPartitionedState(stateProperties);
+ return new UserFacingMapState<>(originalState);
+ } catch (Exception e) {
+ throw new RuntimeException("Error while getting state", e);
          — End diff –

Currently, `KeyedStateStore#getState()` does not declare any checked exceptions, so `RuntimeException` is the only exception that can be thrown. Since changing the interface would affect user code (users would have to handle the newly declared exceptions), I am not sure it's okay to modify the method declarations in `KeyedStateStore`.
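The pattern under discussion, keeping the public interface free of checked exceptions by rethrowing as `RuntimeException` with the original as cause, looks like this in isolation (method names are illustrative):

```java
import java.io.IOException;

public class WrapExceptionDemo {

    /** Internal operation that may throw a checked exception. */
    static String load() throws IOException {
        throw new IOException("disk unavailable");
    }

    /** Public accessor keeps its signature clean by wrapping the checked exception. */
    static String get() {
        try {
            return load();
        } catch (IOException e) {
            throw new RuntimeException("Error while getting state", e);
        }
    }

    public static void main(String[] args) {
        try {
            get();
            throw new AssertionError("expected RuntimeException");
        } catch (RuntimeException e) {
            // The original cause is preserved for debugging.
            if (!(e.getCause() instanceof IOException)) {
                throw new AssertionError("cause lost");
            }
        }
    }
}
```

Callers are not forced to add `try`/`catch` blocks, yet the stack trace still carries the underlying `IOException` as the cause.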

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102127867

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ *
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
+ *
+ * @param <K> The type of the key.
+ * @param <N> The type of the namespace.
+ * @param <UK> The type of the keys in the map state.
+ * @param <UV> The type of the values in the map state.
+ */
+public class RocksDBMapState<K, N, UK, UV>
+ extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
+ implements InternalMapState<N, UK, UV> {
+
+ /** Serializer for the keys and values */
+ private final TypeSerializer<UK> userKeySerializer;
+ private final TypeSerializer<UV> userValueSerializer;
+
+ /**
+ * We disable writes to the write-ahead-log here. We can't have these in the base class
+ * because JNI segfaults for some reason if they are.
+ */
+ private final WriteOptions writeOptions;
+
+ /**
+ * Creates a new {@code RocksDBMapState}.
+ *
+ * @param namespaceSerializer The serializer for the namespace.
+ * @param stateDesc The state identifier for the state.
+ */
+ public RocksDBMapState(ColumnFamilyHandle columnFamily,
+ TypeSerializer<N> namespaceSerializer,
+ MapStateDescriptor<UK, UV> stateDesc,
+ RocksDBKeyedStateBackend<K> backend) {
+
+ super(columnFamily, namespaceSerializer, stateDesc, backend);
+
+ this.userKeySerializer = stateDesc.getKeySerializer();
+ this.userValueSerializer = stateDesc.getValueSerializer();
+
+ writeOptions = new WriteOptions();
+ writeOptions.setDisableWAL(true);
+ }
+
+ // ------------------------------------------------------------------------
+ //  MapState Implementation
+ // ------------------------------------------------------------------------
+
+ @Override
+ public UV get(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+ return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while getting data from RocksDB.", e);
+ }
+ }
+
+ @Override
+ public void put(UK userKey, UV userValue) throws IOException {
+ if (userValue == null) {
+ remove(userKey);
+ }
+
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = serializeUserValue(userValue);
+
+ backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while putting data into RocksDB", e);
+ }
+ }
+
+ @Override
+ public void remove(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+
+ backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while removing data from RocksDB.", e);
+ }
+ }
+
+ @Override
+ public boolean contains(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+ return (rawValueBytes != null);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while getting data from RocksDB", e);
+ }
+ }
+
+ @Override
+ public int size() throws IOException {
+ Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+ int count = 0;
+ while (iterator.hasNext()) {
+ count++;
+ iterator.next();
+ }
+
+ return count;
+ }
          +
          + @Override
          + public Iterable<UK> keys() {
          + return new Iterable<UK>() {
          + @Override
          + public Iterator<UK> iterator() {
          + return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
          + @Override
          + public UK next()

          { + RocksDBMapEntry entry = nextEntry(); + return (entry == null ? null : entry.getKey()); + }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterable<UV> values() {
          + return new Iterable<UV>() {
          + @Override
          + public Iterator<UV> iterator() {
          + return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
          + @Override
          + public UV next()

          { + RocksDBMapEntry entry = nextEntry(); + return (entry == null ? null : entry.getValue()); + }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() {
          + return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
          + @Override
          + public Map.Entry<UK, UV> next()

          { + return nextEntry(); + }
          + };
          + }
          +
          + @Override
          + public Iterable<Map.Entry<UK, UV>> get() throws Exception {
          + final Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + // Return null to make the behavior consistent with other states.
          + if (!iterator.hasNext()) { + return null; + } else {
          + return new Iterable<Map.Entry<UK, UV>>() {
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() { + return iterator; + }
          + };
          + }
          + }
          +
          + @Override
          + public void add(Map<UK, UV> map) throws Exception {
          + if (map == null) { + return; + }
          +
          + for (Map.Entry<UK, UV> entry : map.entrySet()) { + put(entry.getKey(), entry.getValue()); + }
          + }
          +
          + @Override
          + public void clear() {
          + Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + while (iterator.hasNext()) { + iterator.next(); + iterator.remove(); + }
          + }
          +
          + @Override
          + @SuppressWarnings("unchecked")
          + public byte[] getSerializedValue(byte[] serializedKeyAndNamespace) throws Exception {
          + Preconditions.checkNotNull(serializedKeyAndNamespace, "Serialized key and namespace");
          +
          + //TODO make KvStateRequestSerializer key-group aware to save this round trip and key-group computation
          + Tuple2<K, N> des = KvStateRequestSerializer.deserializeKeyAndNamespace(
          + serializedKeyAndNamespace,
          + backend.getKeySerializer(),
          + namespaceSerializer);
          +
          + int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(des.f0, backend.getNumberOfKeyGroups());
          +
          + ByteArrayOutputStreamWithPos outputStream = new ByteArrayOutputStreamWithPos(128);
          + DataOutputViewStreamWrapper outputView = new DataOutputViewStreamWrapper(outputStream);
          +
          + writeKeyWithGroupAndNamespace(keyGroup, des.f0, des.f1, outputStream, outputView);
          + byte[] keyPrefixBytes = outputStream.toByteArray();
          +
          + Iterator<Map.Entry<UK, UV>> iterator = new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, keyPrefixBytes) {
          + @Override
          + public Map.Entry<UK, UV> next() { + return nextEntry(); + }

          + };
          +
          + // Return null to make the behavior consistent
          + if (!iterator.hasNext())

          { + return null; + }

          +
          + outputStream.reset();
          +
          + while (iterator.hasNext())

          { + Map.Entry<UK, UV> entry = iterator.next(); + + userKeySerializer.serialize(entry.getKey(), outputView); + userValueSerializer.serialize(entry.getValue(), outputView); + }

          +
          + return outputStream.toByteArray();
          + }
          +
          + // ------------------------------------------------------------------------
          + // Serialization Methods
          + // ------------------------------------------------------------------------
          +
          + private byte[] serializeCurrentKey() {
          + try

          { + writeCurrentKeyWithGroupAndNamespace(); + + return keySerializationStream.toByteArray(); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while serializing the current key."); + }

          + }
          +
          + private byte[] serializeUserKeyWithCurrentKeyAndNamespace(UK userKey) {
          + try

          { + writeCurrentKeyWithGroupAndNamespace(); + userKeySerializer.serialize(userKey, keySerializationDataOutputView); + + return keySerializationStream.toByteArray(); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while serializing the user key.", e); + }
          + }
          +
          + private byte[] serializeUserKey(int keyGroup, K key, N namespace, UK userKey) {
          + try { + writeKeyWithGroupAndNamespace(keyGroup, key, namespace, keySerializationStream, keySerializationDataOutputView); + userKeySerializer.serialize(userKey, keySerializationDataOutputView); + + return keySerializationStream.toByteArray(); + } catch (IOException e) { + throw new RuntimeException("Error while serializing the user key.", e); + }

          + }
          +
          + private byte[] serializeUserValue(UV userValue) {
          + try

          { + keySerializationStream.reset(); + userValueSerializer.serialize(userValue, keySerializationDataOutputView); + + return keySerializationStream.toByteArray(); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while serializing the user value.", e); + }

          + }
          +
          + private Tuple4<Integer, K, N, UK> deserializeUserKey(byte[] rawKeyBytes) {
          + try

          { + ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawKeyBytes); + DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais); + + Tuple3<Integer, K, N> keyAndNamespace = readKeyWithGroupAndNamespace(bais, in); + UK userKey = userKeySerializer.deserialize(in); + + return new Tuple4<>(keyAndNamespace.f0, keyAndNamespace.f1, keyAndNamespace.f2, userKey); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while deserializing the user key.", e); + }

          + }
          +
          + private UV deserializeUserValue(byte[] rawValueBytes) {
          + try

          { + ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawValueBytes); + DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais); + + return userValueSerializer.deserialize(in); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while deserializing the user value.", e); + }

          + }
          +
          + // ------------------------------------------------------------------------
          + // Internal Classes
          + // ------------------------------------------------------------------------
          +
          + /** A map entry in RocksDBMapState */
          + private class RocksDBMapEntry implements Map.Entry<UK, UV> {
          + private final RocksDB db;
          +
          + /** The raw bytes of the key stored in RocksDB. Each user key is stored in RocksDB
          + * with the format #KeyGroup#Key#Namespace#UserKey. */
          + private final byte[] rawKeyBytes;
          +
          + /** The raw bytes of the value stored in RocksDB */
          + private final byte[] rawValueBytes;
          +
          + /** True if the entry has been deleted. */
          + private boolean deleted;
          +
          + /** The user key and value. The deserialization is performed lazily, i.e. the key
          + * and the value is deserialized only when they are accessed. */
          + private UK userKey = null;
          + private UV userValue = null;
          +
          + RocksDBMapEntry(final RocksDB db, final byte[] rawKeyBytes, final byte[] rawValueBytes)

          { + this.db = db; + + this.rawKeyBytes = rawKeyBytes; + this.rawValueBytes = rawValueBytes; + this.deleted = false; + }

          +
          + @Override
          + public UK getKey() {
          + if (userKey == null)

          { + userKey = deserializeUserKey(rawKeyBytes).f3; + }

          +
          + return userKey;
          + }
          +
          + @Override
          + public UV getValue() {
          + if (deleted)

          { + return null; + }

          else {
          + if (userValue == null)

          { + userValue = deserializeUserValue(rawValueBytes); + }

          +
          + return userValue;
          + }
          + }
          +
          + @Override
          + public UV setValue(UV value) {
          + if (deleted)

          { + throw new IllegalStateException("The value has already been deleted."); + }

          +
          + UV oldValue = getValue();
          +
          + if (value == null) {
          + deleted = true;
          +
          + try

          { + db.remove(columnFamily, writeOptions, rawKeyBytes); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while removing data from RocksDB.", e); + }

          + } else {
          + userValue = value;
          +
          + byte[] rawValueBytes = serializeUserValue(value);
          + try

          { + db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while putting data into RocksDB.", e); + }

          + }
          +
          + return oldValue;
          + }
          + }
          +
          + /** An auxiliary utility to scan all entries under the given key. */
          + private abstract class RocksDBMapIterator<T> implements Iterator<T> {
          +
          + final static int CACHE_SIZE_BASE = 1;
          + final static int CACHE_SIZE_LIMIT = 128;
          +
          + /** The db where data resides. */
          + private final RocksDB db;
          +
          + /**
          + * The prefix bytes of the key being accessed. All entries under the same key
          + * has the same prefix, hence we can stop the iterating once coming across an
          + * entry with a different prefix.
          + */
          + private final byte[] keyPrefixBytes;
          +
          + /**
          + * True if all entries have been accessed or the iterator has come across an
          + * entry with a different prefix.
          + */
          + private boolean expired = false;
          +
          + /** A in-memory cache for the entries in the rocksdb. */
          + private ArrayList<RocksDBMapEntry> cacheEntries = new ArrayList<>();
          + private int cacheIndex = 0;
          +
          +
          + RocksDBMapIterator(final RocksDB db, final byte[] keyPrefixBytes)

          { + this.db = db; + this.keyPrefixBytes = keyPrefixBytes; + }

          +
          + @Override
          + public boolean hasNext()

          { + loadCache(); + + return (cacheIndex < cacheEntries.size()); + }

          +
          + @Override
          + public void remove() {
          + if (cacheIndex == 0 || cacheIndex > cacheEntries.size())

          { + throw new IllegalStateException(); + }
          +
          + RocksDBMapEntry lastEntry = cacheEntries.get(cacheIndex - 1);
          +
          + try { + db.remove(columnFamily, writeOptions, lastEntry.rawKeyBytes); + } catch (RocksDBException e) { + throw new RuntimeException("Error while removing data from RocksDB.", e); + }
          +
          + lastEntry.deleted = true;
          + }
          +
          + final RocksDBMapEntry nextEntry() {
          + loadCache();
          +
          + if (cacheIndex == cacheEntries.size()) {
          + if (!expired) { + throw new IllegalStateException(); + }
          +
          + return null;
          + }
          +
          + RocksDBMapEntry entry = cacheEntries.get(cacheIndex);
          + cacheIndex++;
          +
          + return entry;
          + }
          +
          + private void loadCache() {
          + if (cacheIndex > cacheEntries.size()) { + throw new IllegalStateException(); + }

          +
          + // Load cache entries only when the cache is empty and there still exist unread entries
          + if (cacheIndex < cacheEntries.size() || expired)

          { + return; + }

          +
          + RocksIterator iterator = db.newIterator(columnFamily);
          — End diff –

          Yeah, it's much better. Will update it.
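          For readers of this diff: each map entry is written to RocksDB under a composite key with the layout #KeyGroup#Key#Namespace#UserKey, and the iterator relies on RocksDB's sorted key order to scan all entries that share a (key, namespace) prefix, stopping at the first key with a different prefix. The following self-contained sketch illustrates that prefix-scan idea with a plain `TreeMap` standing in for RocksDB; the `PrefixScanSketch` class, its method names, and the `"k1/"`-style keys are invented for illustration and are not part of the PR.

          ```java
          import java.util.*;

          /** Sketch: prefix scan over a lexicographically sorted byte[] keyspace,
           *  simulating how RocksDBMapState iterates entries by key prefix. */
          class PrefixScanSketch {

              // Unsigned lexicographic comparator over byte arrays, like RocksDB's default ordering.
              static final Comparator<byte[]> LEX = (a, b) -> {
                  int n = Math.min(a.length, b.length);
                  for (int i = 0; i < n; i++) {
                      int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
                      if (c != 0) {
                          return c;
                      }
                  }
                  return Integer.compare(a.length, b.length);
              };

              static boolean hasPrefix(byte[] key, byte[] prefix) {
                  if (key.length < prefix.length) {
                      return false;
                  }
                  for (int i = 0; i < prefix.length; i++) {
                      if (key[i] != prefix[i]) {
                          return false;
                      }
                  }
                  return true;
              }

              /** Collect all values whose key starts with the given prefix. */
              static List<String> scan(NavigableMap<byte[], String> db, byte[] prefix) {
                  List<String> out = new ArrayList<>();
                  // Seek to the first key >= prefix, then read until the prefix no longer matches.
                  for (Map.Entry<byte[], String> e : db.tailMap(prefix, true).entrySet()) {
                      if (!hasPrefix(e.getKey(), prefix)) {
                          break; // left the (key, namespace) range -- the iterator is "expired"
                      }
                      out.add(e.getValue());
                  }
                  return out;
              }

              public static void main(String[] args) {
                  NavigableMap<byte[], String> db = new TreeMap<>(LEX);
                  // Two map states under different serialized key prefixes "k1/" and "k2/".
                  db.put("k1/a".getBytes(), "1");
                  db.put("k1/b".getBytes(), "2");
                  db.put("k2/a".getBytes(), "3");
                  System.out.println(scan(db, "k1/".getBytes()));  // prints [1, 2]
              }
          }
          ```

          Because the entries of one map state are contiguous in key order, the scan touches only that state's entries and never needs to deserialize the whole map, which is exactly the cost advantage over storing the map in a single ValueState.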

          Show
          githubbot ASF GitHub Bot added a comment - Github user shixiaogang commented on a diff in the pull request: https://github.com/apache/flink/pull/3336#discussion_r102127867 — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java — End diff – Yeah, it's much better. Will update it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102127767

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + *
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new

          {@code RocksDBMapState}

          .
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          + TypeSerializer<N> namespaceSerializer,
          + MapStateDescriptor<UK, UV> stateDesc,
          + RocksDBKeyedStateBackend<K> backend)

          { + + super(columnFamily, namespaceSerializer, stateDesc, backend); + + this.userKeySerializer = stateDesc.getKeySerializer(); + this.userValueSerializer = stateDesc.getValueSerializer(); + + writeOptions = new WriteOptions(); + writeOptions.setDisableWAL(true); + }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes); + + return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes)); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while getting data from RocksDB.", e); + }

          + }
          +
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          + if (userValue == null)

          { + remove(userKey); + }

          +
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = serializeUserValue(userValue); + + backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while putting data into RocksDB", e); + }

          + }
          +
          + @Override
          + public void remove(UK userKey) throws IOException {
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + + backend.db.remove(columnFamily, writeOptions, rawKeyBytes); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while removing data from RocksDB.", e); + }

          + }
          +
          + @Override
          + public boolean contains(UK userKey) throws IOException {
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes); + + return (rawValueBytes != null); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while getting data from RocksDB", e); + }

          + }
          +
          + @Override
          + public int size() throws IOException {
          + Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + int count = 0;
          + while (iterator.hasNext())

          { + count++; + iterator.next(); + }

          +
          + return count;
          + }
          +
          + @Override
          + public Iterable<UK> keys() {
          + return new Iterable<UK>() {
          + @Override
          + public Iterator<UK> iterator() {
          + return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
          + @Override
          + public UK next()

          { + RocksDBMapEntry entry = nextEntry(); + return (entry == null ? null : entry.getKey()); + }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterable<UV> values() {
          + return new Iterable<UV>() {
          + @Override
          + public Iterator<UV> iterator() {
          + return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
          + @Override
          + public UV next()

          { + RocksDBMapEntry entry = nextEntry(); + return (entry == null ? null : entry.getValue()); + }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() {
          + return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
          + @Override
          + public Map.Entry<UK, UV> next()

          { + return nextEntry(); + }
          + };
          + }
          +
          + @Override
          + public Iterable<Map.Entry<UK, UV>> get() throws Exception {
          + final Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + // Return null to make the behavior consistent with other states.
          + if (!iterator.hasNext()) { + return null; + } else {
          + return new Iterable<Map.Entry<UK, UV>>() {
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() { + return iterator; + }
          + };
          + }
          + }
          +
          + @Override
          + public void add(Map<UK, UV> map) throws Exception {
          + if (map == null) { + return; + }
          +
          + for (Map.Entry<UK, UV> entry : map.entrySet()) { + put(entry.getKey(), entry.getValue()); + }
          + }
          +
          + @Override
          + public void clear() {
          + Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + while (iterator.hasNext()) { + iterator.next(); + iterator.remove(); + }
          + }
          +
          + @Override
          + @SuppressWarnings("unchecked")
          + public byte[] getSerializedValue(byte[] serializedKeyAndNamespace) throws Exception {
          + Preconditions.checkNotNull(serializedKeyAndNamespace, "Serialized key and namespace");
          +
          + //TODO make KvStateRequestSerializer key-group aware to save this round trip and key-group computation
          + Tuple2<K, N> des = KvStateRequestSerializer.deserializeKeyAndNamespace(
          + serializedKeyAndNamespace,
          + backend.getKeySerializer(),
          + namespaceSerializer);
          +
          + int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(des.f0, backend.getNumberOfKeyGroups());
          +
          + ByteArrayOutputStreamWithPos outputStream = new ByteArrayOutputStreamWithPos(128);
          + DataOutputViewStreamWrapper outputView = new DataOutputViewStreamWrapper(outputStream);
          +
          + writeKeyWithGroupAndNamespace(keyGroup, des.f0, des.f1, outputStream, outputView);
          + byte[] keyPrefixBytes = outputStream.toByteArray();
          +
          + Iterator<Map.Entry<UK, UV>> iterator = new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, keyPrefixBytes) {
          + @Override
          + public Map.Entry<UK, UV> next() { + return nextEntry(); + }

          + };
          +
          + // Return null to make the behavior consistent
          + if (!iterator.hasNext())

          { + return null; + }

          +
          + outputStream.reset();
          +
          + while (iterator.hasNext())

          { + Map.Entry<UK, UV> entry = iterator.next(); + + userKeySerializer.serialize(entry.getKey(), outputView); + userValueSerializer.serialize(entry.getValue(), outputView); + }

          +
          + return outputStream.toByteArray();
          + }
          +
          + // ------------------------------------------------------------------------
          + // Serialization Methods
          + // ------------------------------------------------------------------------
          +
          + private byte[] serializeCurrentKey() {
          + try

          { + writeCurrentKeyWithGroupAndNamespace(); + + return keySerializationStream.toByteArray(); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while serializing the current key."); + }

          + }
          +
          + private byte[] serializeUserKeyWithCurrentKeyAndNamespace(UK userKey) {
          + try

          { + writeCurrentKeyWithGroupAndNamespace(); + userKeySerializer.serialize(userKey, keySerializationDataOutputView); + + return keySerializationStream.toByteArray(); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while serializing the user key.", e); + }
          + }
          +
          + private byte[] serializeUserKey(int keyGroup, K key, N namespace, UK userKey) {
          + try { + writeKeyWithGroupAndNamespace(keyGroup, key, namespace, keySerializationStream, keySerializationDataOutputView); + userKeySerializer.serialize(userKey, keySerializationDataOutputView); + + return keySerializationStream.toByteArray(); + } catch (IOException e) { + throw new RuntimeException("Error while serializing the user key.", e); + }

          + }
          +
          + private byte[] serializeUserValue(UV userValue) {
          + try

          { + keySerializationStream.reset(); + userValueSerializer.serialize(userValue, keySerializationDataOutputView); + + return keySerializationStream.toByteArray(); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while serializing the user value.", e); + }

          + }
          +
          + private Tuple4<Integer, K, N, UK> deserializeUserKey(byte[] rawKeyBytes) {
          + try

          { + ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawKeyBytes); + DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais); + + Tuple3<Integer, K, N> keyAndNamespace = readKeyWithGroupAndNamespace(bais, in); + UK userKey = userKeySerializer.deserialize(in); + + return new Tuple4<>(keyAndNamespace.f0, keyAndNamespace.f1, keyAndNamespace.f2, userKey); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while deserializing the user key.", e); + }

          + }
          +
          + private UV deserializeUserValue(byte[] rawValueBytes) {
          + try

          { + ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawValueBytes); + DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais); + + return userValueSerializer.deserialize(in); + }

          catch (IOException e)

          { + throw new RuntimeException("Error while deserializing the user value.", e); + }

          + }
          +
          + // ------------------------------------------------------------------------
          + // Internal Classes
          + // ------------------------------------------------------------------------
          +
          + /** A map entry in RocksDBMapState */
          + private class RocksDBMapEntry implements Map.Entry<UK, UV> {
          + private final RocksDB db;
          +
          + /** The raw bytes of the key stored in RocksDB. Each user key is stored in RocksDB
          + * with the format #KeyGroup#Key#Namespace#UserKey. */
          + private final byte[] rawKeyBytes;
          +
          + /** The raw bytes of the value stored in RocksDB */
          + private final byte[] rawValueBytes;
          +
          + /** True if the entry has been deleted. */
          + private boolean deleted;
          +
          + /** The user key and value. The deserialization is performed lazily, i.e. the key
          + * and the value is deserialized only when they are accessed. */
          + private UK userKey = null;
          + private UV userValue = null;
          +
          + RocksDBMapEntry(final RocksDB db, final byte[] rawKeyBytes, final byte[] rawValueBytes)

          { + this.db = db; + + this.rawKeyBytes = rawKeyBytes; + this.rawValueBytes = rawValueBytes; + this.deleted = false; + }

          +
          + @Override
          + public UK getKey() {
          + if (userKey == null)

          { + userKey = deserializeUserKey(rawKeyBytes).f3; + }

          +
          + return userKey;
          + }
          +
          + @Override
          + public UV getValue() {
          + if (deleted)

          { + return null; + }

          else {
          + if (userValue == null) {
          + userValue = deserializeUserValue(rawValueBytes);
          — End diff –

          Good point. I will update it as suggested.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102126863

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          + if (userValue == null) {
          + remove(userKey);
          — End diff –

          I remove from RocksDB those mappings whose values are `null` because we currently have no way to serialize a null value. Many type serializers do not support the serialization of `null` (e.g. `IntSerializer`). This also follows `RocksDBValueState`, which removes the data when the value is updated to `null`.

          In the current implementation, the `get(UK)` method still returns the correct result if we remove the mapping from RocksDB, because we always return `null` when the key is not found in RocksDB. The only concern is that the results returned by `contains(UK)` and `size()` will differ from those of a Map that supports `null` values.

          Actually, I would prefer not to remove the entry if its value is `null`. But prior to that, I think a different pull request is needed to let all type serializers support `null` values.
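The missing piece described in the comment above, serializers that can encode `null`, can be approximated by a wrapper that prefixes every value with a one-byte presence flag. The `NullableSerializer` name and the plain `DataOutputStream`-based codec interface below are hypothetical simplifications (Flink's real `TypeSerializer` works against `DataOutputView`); this is only a sketch of the idea:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: wraps a codec that cannot handle null by
// prefixing each encoded value with a one-byte presence flag.
public class NullableSerializer {

    /** Minimal stand-in for a serializer of int values that rejects null. */
    interface IntCodec {
        void write(DataOutputStream out, int value) throws IOException;
        int read(DataInputStream in) throws IOException;
    }

    private final IntCodec inner;

    public NullableSerializer(IntCodec inner) {
        this.inner = inner;
    }

    /** Writes flag byte 0 for null, otherwise 1 followed by the inner encoding. */
    public byte[] serialize(Integer value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        if (value == null) {
            out.writeByte(0);
        } else {
            out.writeByte(1);
            inner.write(out, value);
        }
        return bos.toByteArray();
    }

    /** Returns null when the presence flag is 0. */
    public Integer deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return in.readByte() == 0 ? null : inner.read(in);
    }
}
```

With such a wrapper, `put(userKey, null)` could store the entry instead of removing it, and `contains(UK)` and `size()` would then agree with a Map that permits `null` values.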

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102125445

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + *
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new {@code RocksDBMapState}.
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          +     TypeSerializer<N> namespaceSerializer,
          +     MapStateDescriptor<UK, UV> stateDesc,
          +     RocksDBKeyedStateBackend<K> backend) {
          +
          +     super(columnFamily, namespaceSerializer, stateDesc, backend);
          +
          +     this.userKeySerializer = stateDesc.getKeySerializer();
          +     this.userValueSerializer = stateDesc.getValueSerializer();
          +
          +     writeOptions = new WriteOptions();
          +     writeOptions.setDisableWAL(true);
          + }
          +
          + // ------------------------------------------------------------------------
          + //  MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          +         return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while getting data from RocksDB.", e);
          — End diff –

          This is to be consistent with the behavior of the other RocksDB states. I agree with you that we should use a more specific exception. It would also help to reduce the stack trace printed in the case of an exception. But I think we should make these changes in a different pull request, because they involve changes to the other states as well. What do you think?
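One way to realize the "more specific exception" suggested in this thread is a dedicated unchecked exception type that preserves the underlying cause. The `StateAccessException` name below is hypothetical (it is not the type Flink ultimately adopted); the sketch only illustrates how a narrower type lets callers catch state-access failures without catching every `RuntimeException`:

```java
// Hypothetical sketch: a dedicated unchecked exception for state-access
// failures, keeping the low-level cause (e.g. a RocksDBException) attached.
public class StateAccessException extends RuntimeException {

    public StateAccessException(String message, Throwable cause) {
        super(message, cause);
    }

    // Stand-in for a state accessor wrapping a failing backend call.
    static String readState() {
        // Simulated low-level failure in place of a real RocksDBException.
        Exception lowLevel = new Exception("simulated RocksDBException");
        throw new StateAccessException("Error while getting data from RocksDB.", lowLevel);
    }

    public static void main(String[] args) {
        try {
            readState();
        } catch (StateAccessException e) {
            // Callers can now catch exactly this failure mode
            // and still inspect the original cause.
            System.out.println(e.getMessage());
            System.out.println(e.getCause().getMessage());
        }
    }
}
```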

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102125062

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + *
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          — End diff –

          To be honest, I have no idea why we can't put `writeOptions` in the base class. We put it in `AbstractRocksDBState` and have not come across any problems in our production environment.

          Maybe @aljoscha is more familiar with the problem.

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102034484

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + *
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new {@code RocksDBMapState}.
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          +     TypeSerializer<N> namespaceSerializer,
          +     MapStateDescriptor<UK, UV> stateDesc,
          +     RocksDBKeyedStateBackend<K> backend) {
          +
          +     super(columnFamily, namespaceSerializer, stateDesc, backend);
          +
          +     this.userKeySerializer = stateDesc.getKeySerializer();
          +     this.userValueSerializer = stateDesc.getValueSerializer();
          +
          +     writeOptions = new WriteOptions();
          +     writeOptions.setDisableWAL(true);
          + }
          +
          + // ------------------------------------------------------------------------
          + //  MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          +         return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while getting data from RocksDB.", e);
          +     }
          + }
          +
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          +     if (userValue == null) {
          +         remove(userKey);
          +     }
          +
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = serializeUserValue(userValue);
          +
          +         backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while putting data into RocksDB", e);
          +     }
          + }
          +
          + @Override
          + public void remove(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +
          +         backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while removing data from RocksDB.", e);
          +     }
          + }
          +
          + @Override
          + public boolean contains(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          +         return (rawValueBytes != null);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while getting data from RocksDB", e);
          +     }
          + }
          +
          + @Override
          + public int size() throws IOException {
          +     Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          +     int count = 0;
          +     while (iterator.hasNext()) {
          +         count++;
          +         iterator.next();
          +     }
          +
          +     return count;
          + }
          +
          + @Override
          + public Iterable<UK> keys() {
          +     return new Iterable<UK>() {
          +         @Override
          +         public Iterator<UK> iterator() {
          +             return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
          +                 @Override
          +                 public UK next() {
          +                     RocksDBMapEntry entry = nextEntry();
          +                     return (entry == null ? null : entry.getKey());
          +                 }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterable<UV> values() {
          + return new Iterable<UV>() {
          + @Override
          + public Iterator<UV> iterator() {
          + return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
          + @Override
          + public UV next()

          { + RocksDBMapEntry entry = nextEntry(); + return (entry == null ? null : entry.getValue()); + }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() {
          + return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
          + @Override
          + public Map.Entry<UK, UV> next()

          { + return nextEntry(); + }

          + };
          + }
          +
          + @Override
          + public Iterable<Map.Entry<UK, UV>> get() throws Exception {
          + final Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + // Return null to make the behavior consistent with other states.
          + if (!iterator.hasNext())

          { + return null; + }

          else {
          + return new Iterable<Map.Entry<UK, UV>>() {
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator()

          { + return iterator; + }

          + };
          + }
          + }
          +
          + @Override
          + public void add(Map<UK, UV> map) throws Exception {
          + if (map == null)

          { + return; + }

          +
          + for (Map.Entry<UK, UV> entry : map.entrySet())

          { + put(entry.getKey(), entry.getValue()); + }

          + }
          +
          + @Override
          + public void clear() {
          — End diff –

          Yes that is true, my bad. Then this shortcut doesn't work of course.
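          For context on why this per-entry layout matters (the motivation stated in the issue description): with a map kept in a `ValueState`, every update re-serializes the whole map, while `MapState` stores each user key under its own RocksDB key and serializes only the touched entry. A minimal stand-alone sketch of the two layouts, using a `TreeMap` as a stand-in for the column family (all class and method names here are illustrative, not Flink or RocksDB API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Toy comparison of the two storage layouts for a keyed map state. */
public class MapStateLayoutSketch {

    /** Stand-in for the column family: a sorted byte-keyed store. */
    static final TreeMap<String, byte[]> store = new TreeMap<>();

    /** ValueState&lt;Map&gt;-style: the whole map lives under one key, so every
     *  update re-serializes all entries. Returns bytes written. */
    static int putWholeMap(String stateKey, Map<String, String> map) {
        byte[] blob = map.toString().getBytes(StandardCharsets.UTF_8); // stand-in serializer
        store.put(stateKey, blob);
        return blob.length;
    }

    /** MapState-style: each user key gets its own store entry
     *  (stateKey + separator + userKey), so an update serializes one value. */
    static int putSingleEntry(String stateKey, String userKey, String userValue) {
        byte[] blob = userValue.getBytes(StandardCharsets.UTF_8);
        store.put(stateKey + "#" + userKey, blob);
        return blob.length;
    }
}
```

The per-update write cost of the second layout is independent of the map size, which is exactly what the RocksDB-backed `MapState` in this PR buys.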

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102033998

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          + @Override
          + public void clear() {
          — End diff –

          I think that doesn't work because the iterator doesn't iterate over the whole column family, just over the entries of the current state key, which can have several user-key/value pairs.
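          The layout being discussed can be sketched with a sorted map standing in for RocksDB: all entries of one (key, namespace) pair share a common byte prefix, so iteration and `clear()` must be prefix-scoped rather than act on the whole column family. Names below are illustrative stand-ins, not the Flink or RocksDB API:

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Prefix-scoped iteration/clear over a sorted store, mimicking how
 *  RocksDBMapState keys are laid out as currentKey + namespace + userKey. */
public class PrefixScopeSketch {

    static final TreeMap<String, String> columnFamily = new TreeMap<>();

    static String compositeKey(String currentKey, String namespace, String userKey) {
        return currentKey + "|" + namespace + "|" + userKey;
    }

    /** Sub-view of entries belonging to one state key: everything from the
     *  prefix (inclusive) up to the highest key with that prefix (exclusive). */
    static SortedMap<String, String> scope(String currentKey, String namespace) {
        String prefix = currentKey + "|" + namespace + "|";
        return columnFamily.subMap(prefix, prefix + '\uffff');
    }

    /** clear() must drop only the current scope; entries of other state keys
     *  stored in the same column family survive. */
    static void clear(String currentKey, String namespace) {
        scope(currentKey, namespace).clear();
    }
}
```

This is why a "drop the whole column family" shortcut for `clear()` is unsafe: the column family is shared by all state keys.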

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102033285

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java —
          @@ -834,7 +836,7 @@ private void restoreKVStateData() throws IOException, RocksDBException {
          }

          @Override

          - protected <N, T> InternalValueState<N, T> createValueState(
          + public <N, T> InternalValueState<N, T> createValueState(
          — End diff –

          I think these should stay `private`/`protected`. It was a mistake to make them `public` in the other state backends. Creating a state can always be achieved with `getPartitionedState()` and a `StateDescriptor`.
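          The point about descriptor-based access can be sketched with a tiny registry: callers go through one public lookup method keyed by descriptor, so the per-type factory methods can stay protected implementation details. (In Flink the real entry points are `getPartitionedState(...)` on the backend and `getRuntimeContext().getMapState(descriptor)` in user functions; the classes below are illustrative stand-ins.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Minimal stand-in for a keyed state backend: states are obtained via a
 *  descriptor-keyed lookup, so per-type create methods need not be public. */
public class StateRegistrySketch {

    /** Stand-in for StateDescriptor: a name plus a factory for the state object. */
    static class Descriptor<S> {
        final String name;
        final Supplier<S> factory;
        Descriptor(String name, Supplier<S> factory) { this.name = name; this.factory = factory; }
    }

    private final Map<String, Object> states = new HashMap<>();

    /** Public entry point, analogous to getPartitionedState(descriptor):
     *  returns the existing state for this name, or creates it lazily. */
    @SuppressWarnings("unchecked")
    public <S> S getPartitionedState(Descriptor<S> descriptor) {
        return (S) states.computeIfAbsent(descriptor.name, n -> createState(descriptor));
    }

    /** Protected factory: user code never invokes this directly. */
    protected <S> S createState(Descriptor<S> descriptor) {
        return descriptor.factory.get();
    }
}
```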

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101999084

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new {@code RocksDBMapState}.
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          +         TypeSerializer<N> namespaceSerializer,
          +         MapStateDescriptor<UK, UV> stateDesc,
          +         RocksDBKeyedStateBackend<K> backend) {
          +
          +     super(columnFamily, namespaceSerializer, stateDesc, backend);
          +
          +     this.userKeySerializer = stateDesc.getKeySerializer();
          +     this.userValueSerializer = stateDesc.getValueSerializer();
          +
          +     writeOptions = new WriteOptions();
          +     writeOptions.setDisableWAL(true);
          + }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          +         return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while getting data from RocksDB.", e);
          +     }
          + }
          +
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          +     if (userValue == null) {
          +         remove(userKey);
          +     }
          +
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = serializeUserValue(userValue);
          +
          +         backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while putting data into RocksDB", e);
          +     }
          + }
          +
          + @Override
          + public void remove(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +
          +         backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while removing data from RocksDB.", e);
          +     }
          + }
          +
          + @Override
          + public boolean contains(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          +         return (rawValueBytes != null);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while getting data from RocksDB", e);
          +     }
          + }
          +
          + @Override
          + public int size() throws IOException {
          +     Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          +     int count = 0;
          +     while (iterator.hasNext()) {
          +         count++;
          +         iterator.next();
          +     }
          +
          +     return count;
          + }
          +
          + @Override
          + public Iterable<UK> keys() {
          +     return new Iterable<UK>() {
          — End diff –

          As the iterable is stateless, you could have one singleton object implement this to avoid some object creation. Same holds for the method below. Same case is also in line 210.
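
          The refactoring the review suggests can be sketched in plain Java (illustrative only — the names `keysIterable` and `newKeyIterator` are made up here, and the backing `String[]` stands in for the RocksDB iterator): since the `Iterable` wrapper holds no per-iteration state, one instance can be created once and returned from every call, while each `iterator()` call still produces a fresh `Iterator`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch of the "singleton stateless Iterable" suggestion: keys() returns the
// same Iterable object every time; only the Iterator it creates carries state.
public class StatelessIterableSketch {

    private final String[] data = {"a", "b", "c"};

    // One reusable instance; Iterable's single abstract method iterator()
    // is bound to the factory method below via a method reference.
    private final Iterable<String> keysIterable = this::newKeyIterator;

    private Iterator<String> newKeyIterator() {
        return new Iterator<String>() {
            private int pos; // per-iteration state lives here, not in the Iterable

            @Override
            public boolean hasNext() {
                return pos < data.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return data[pos++];
            }
        };
    }

    public Iterable<String> keys() {
        return keysIterable; // no allocation on each call
    }

    public static void main(String[] args) {
        StatelessIterableSketch s = new StatelessIterableSketch();
        StringBuilder sb = new StringBuilder();
        for (String k : s.keys()) {
            sb.append(k);
        }
        System.out.println(sb);                  // abc
        System.out.println(s.keys() == s.keys()); // true: same Iterable reused
    }
}
```

The same trick would apply to `values()` and any other method that currently allocates a fresh anonymous `Iterable` per call.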

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102022926

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new {@code RocksDBMapState}.
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          +         TypeSerializer<N> namespaceSerializer,
          +         MapStateDescriptor<UK, UV> stateDesc,
          +         RocksDBKeyedStateBackend<K> backend) {
          +
          +     super(columnFamily, namespaceSerializer, stateDesc, backend);
          +
          +     this.userKeySerializer = stateDesc.getKeySerializer();
          +     this.userValueSerializer = stateDesc.getValueSerializer();
          +
          +     writeOptions = new WriteOptions();
          +     writeOptions.setDisableWAL(true);
          + }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          +         return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while getting data from RocksDB.", e);
          +     }
          + }
          +
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          +     if (userValue == null) {
          +         remove(userKey);
          +     }
          +
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = serializeUserValue(userValue);
          +
          +         backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while putting data into RocksDB", e);
          +     }
          + }
          +
          + @Override
          + public void remove(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +
          +         backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while removing data from RocksDB.", e);
          +     }
          + }
          +
          + @Override
          + public boolean contains(UK userKey) throws IOException {
          +     try {
          +         byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          +         byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          +         return (rawValueBytes != null);
          +     } catch (RocksDBException e) {
          +         throw new RuntimeException("Error while getting data from RocksDB", e);
          +     }
          + }
          +
          + @Override
          + public int size() throws IOException {
          +     Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          +     int count = 0;
          +     while (iterator.hasNext()) {
          +         count++;
          +         iterator.next();
          +     }
          +
          +     return count;
          + }
          +
          + @Override
          + public Iterable<UK> keys() {
          +     return new Iterable<UK>() {
          +         @Override
          +         public Iterator<UK> iterator() {
          +             return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
          +                 @Override
          +                 public UK next() {
          +                     RocksDBMapEntry entry = nextEntry();
          +                     return (entry == null ? null : entry.getKey());
          +                 }
          +             };
          +         }
          +     };
          + }
          +
          + @Override
          + public Iterable<UV> values() {
          +     return new Iterable<UV>() {
          +         @Override
          +         public Iterator<UV> iterator() {
          +             return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
          +                 @Override
          +                 public UV next() {
          +                     RocksDBMapEntry entry = nextEntry();
          +                     return (entry == null ? null : entry.getValue());
          +                 }
          +             };
          +         }
          +     };
          + }
          +
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() {
          +     return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
          +         @Override
          +         public Map.Entry<UK, UV> next() {
          +             return nextEntry();
          +         }
          +     };
          + }
          +
          + @Override
          + public Iterable<Map.Entry<UK, UV>> get() throws Exception {
          +     final Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          +     // Return null to make the behavior consistent with other states.
          +     if (!iterator.hasNext()) {
          +         return null;
          +     } else {
          +         return new Iterable<Map.Entry<UK, UV>>() {
          +             @Override
          +             public Iterator<Map.Entry<UK, UV>> iterator() {
          +                 return iterator;
          +             }
          +         };
          +     }
          + }
          +
          + @Override
          + public void add(Map<UK, UV> map) throws Exception {
          +     if (map == null) {
          +         return;
          +     }
          +
          +     for (Map.Entry<UK, UV> entry : map.entrySet()) {
          +         put(entry.getKey(), entry.getValue());
          +     }
          + }
          +
          + @Override
          + public void clear() {
          +     Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          +     while (iterator.hasNext()) {
          +         iterator.next();
          +         iterator.remove();
          +     }
          + }
          +
          + @Override
          + @SuppressWarnings("unchecked")
          + public byte[] getSerializedValue(byte[] serializedKeyAndNamespace) throws Exception {
          +     Preconditions.checkNotNull(serializedKeyAndNamespace, "Serialized key and namespace");
          +
          +     //TODO make KvStateRequestSerializer key-group aware to save this round trip and key-group computation
          +     Tuple2<K, N> des = KvStateRequestSerializer.deserializeKeyAndNamespace(
          +         serializedKeyAndNamespace,
          +         backend.getKeySerializer(),
          +         namespaceSerializer);
          +
          +     int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(des.f0, backend.getNumberOfKeyGroups());
          +
          +     ByteArrayOutputStreamWithPos outputStream = new ByteArrayOutputStreamWithPos(128);
          +     DataOutputViewStreamWrapper outputView = new DataOutputViewStreamWrapper(outputStream);
          +
          +     writeKeyWithGroupAndNamespace(keyGroup, des.f0, des.f1, outputStream, outputView);
          +     byte[] keyPrefixBytes = outputStream.toByteArray();
          +
          +     Iterator<Map.Entry<UK, UV>> iterator = new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, keyPrefixBytes) {
          +         @Override
          +         public Map.Entry<UK, UV> next() {
          +             return nextEntry();
          +         }
          +     };
          +
          +     // Return null to make the behavior consistent
          +     if (!iterator.hasNext()) {
          +         return null;
          +     }
          +
          +     outputStream.reset();
          +
          +     while (iterator.hasNext()) {
          +         Map.Entry<UK, UV> entry = iterator.next();
          +
          +         userKeySerializer.serialize(entry.getKey(), outputView);
          +         userValueSerializer.serialize(entry.getValue(), outputView);
          +     }
          +
          +     return outputStream.toByteArray();
          + }
          +
          + // ------------------------------------------------------------------------
          + // Serialization Methods
          + // ------------------------------------------------------------------------
          +
          + private byte[] serializeCurrentKey() {
          +     try {
          +         writeCurrentKeyWithGroupAndNamespace();
          +
          +         return keySerializationStream.toByteArray();
          +     } catch (IOException e) {
          +         throw new RuntimeException("Error while serializing the current key.");
          +     }
          + }
          +
          + private byte[] serializeUserKeyWithCurrentKeyAndNamespace(UK userKey) {
          +     try {
          +         writeCurrentKeyWithGroupAndNamespace();
          +         userKeySerializer.serialize(userKey, keySerializationDataOutputView);
          +
          +         return keySerializationStream.toByteArray();
          +     } catch (IOException e) {
          +         throw new RuntimeException("Error while serializing the user key.", e);
          +     }
          + }
          +
          + private byte[] serializeUserKey(int keyGroup, K key, N namespace, UK userKey) {
          +     try {
          +         writeKeyWithGroupAndNamespace(keyGroup, key, namespace, keySerializationStream, keySerializationDataOutputView);
          +         userKeySerializer.serialize(userKey, keySerializationDataOutputView);
          +
          +         return keySerializationStream.toByteArray();
          +     } catch (IOException e) {
          +         throw new RuntimeException("Error while serializing the user key.", e);
          +     }
          + }
          +
          + private byte[] serializeUserValue(UV userValue) {
          +     try {
          +         keySerializationStream.reset();
          +         userValueSerializer.serialize(userValue, keySerializationDataOutputView);
          +
          +         return keySerializationStream.toByteArray();
          +     } catch (IOException e) {
          +         throw new RuntimeException("Error while serializing the user value.", e);
          +     }
          + }
          +
          + private Tuple4<Integer, K, N, UK> deserializeUserKey(byte[] rawKeyBytes) {
          + try {
          + ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawKeyBytes);
          + DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais);
          +
          + Tuple3<Integer, K, N> keyAndNamespace = readKeyWithGroupAndNamespace(bais, in);
          — End diff –

          Similar savings can be done with `readKeyWithGroupAndNamespace(...)`.
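
          The kind of allocation saving being discussed — reusing one serialization buffer across calls instead of allocating a fresh one each time — can be shown with a small standalone sketch (plain `java.io`, not Flink's stream wrappers; `serializeKey` is an illustrative name):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the buffer-reuse pattern: one ByteArrayOutputStream is kept as a
// field and rewound with reset(), so repeated serialize calls reuse the same
// backing array instead of allocating a new stream per call.
public class ReusableSerializationBuffer {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(128);
    private final DataOutputStream out = new DataOutputStream(buffer);

    // Serializes an int key into a fresh byte[]; only the final copy allocates.
    public byte[] serializeKey(int key) throws IOException {
        buffer.reset();              // rewind position, keep the backing array
        out.writeInt(key);
        out.flush();
        return buffer.toByteArray(); // copy of the 4 written bytes
    }

    public static void main(String[] args) throws IOException {
        ReusableSerializationBuffer b = new ReusableSerializationBuffer();
        System.out.println(b.serializeKey(42).length); // 4
        System.out.println(b.serializeKey(7).length);  // 4 — buffer was reset, not regrown
    }
}
```

The same idea applies on the read side: a reusable input stream positioned over the raw key bytes avoids constructing wrapper objects for every deserialization.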

keySerializationStream.toByteArray(); + } catch (IOException e) { + throw new RuntimeException("Error while serializing the user key.", e); + } + } + + private byte[] serializeUserValue(UV userValue) { + try { + keySerializationStream.reset(); + userValueSerializer.serialize(userValue, keySerializationDataOutputView); + + return keySerializationStream.toByteArray(); + } catch (IOException e) { + throw new RuntimeException("Error while serializing the user value.", e); + } + } + + private Tuple4<Integer, K, N, UK> deserializeUserKey(byte[] rawKeyBytes) { + try { + ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawKeyBytes); + DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais); + + Tuple3<Integer, K, N> keyAndNamespace = readKeyWithGroupAndNamespace(bais, in); — End diff – Similar savings can be done with `readKeyWithGroupAndNamespace(...)`.
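The issue description's motivation (updating a single entry stored as a whole map in a `ValueState` re-serializes every entry, while a per-entry `MapState` only writes the touched pair) can be quantified with a plain `java.io` sketch. The `writeUTF`/`writeLong` encoding below is an illustrative stand-in, not Flink's actual `TypeSerializer` format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class UpdateCostSketch {

    /** Whole-map write: every entry is re-serialized, as with a ValueState holding the map. */
    static byte[] serializeWholeMap(Map<String, Long> map) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(map.size()); // four-byte entry count
            for (Map.Entry<String, Long> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Per-entry write: only the touched key/value pair is serialized, as with a MapState entry. */
    static byte[] serializeSingleEntry(String key, long value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(key);
            out.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            state.put("key-" + i, (long) i);
        }
        // The per-entry write stays constant-size while the whole-map write grows with the map.
        System.out.println("whole map: " + serializeWholeMap(state).length + " bytes");
        System.out.println("one entry: " + serializeSingleEntry("key-42", 42L).length + " bytes");
    }
}
```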
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101994880

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/HashMapSerializer.java —
          @@ -0,0 +1,179 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.runtime.state;
          +
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.core.memory.DataInputView;
          +import org.apache.flink.core.memory.DataOutputView;
          +import org.apache.flink.util.Preconditions;
          +
          +import java.io.IOException;
          +import java.util.HashMap;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
+ * A serializer for {@link List Lists}. The serializer relies on an element serializer
          — End diff –

          Small mistake from copy-paste in the JavaDoc.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101990286

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/typeutils/base/MapSerializer.java —
          @@ -0,0 +1,179 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.typeutils.base;
          +
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.core.memory.DataInputView;
          +import org.apache.flink.core.memory.DataOutputView;
          +import org.apache.flink.util.Preconditions;
          +
          +import java.io.IOException;
          +import java.util.HashMap;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
+ * A serializer for {@link List Lists}. The serializer relies on an element serializer
          + * for teh serialization of the list's elements.
          + *
          + * <p>The serialization format for the list is as follows: four bytes for the length of the lost,
          + * followed by the serialized representation of each element.
          + *
          + * @param <K> The type of the keys in the map.
          + * @param <V> The type of the values in the map.
          + */
          +public class MapSerializer<K, V> extends TypeSerializer<Map<K, V>> {
          — End diff –

You should also add a test for the serializer itself. Testing all the functions comes almost for free by extending `SerializerTestBase`.
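For context, Flink's `SerializerTestBase` drives lossless round-trip checks through a serializer. The core property it verifies, applied to the map format the quoted JavaDoc describes (a four-byte entry count followed by each serialized key/value pair), can be sketched without any Flink dependency; the String/Integer encoding below is an assumed stand-in for the key and value element serializers:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class MapRoundTripSketch {

    /** Serialize in the documented format: entry count first, then each key/value pair. */
    static byte[] serialize(Map<String, Integer> map) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(map.size());
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Read the count, then that many key/value pairs, rebuilding the map. */
    static Map<String, Integer> deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int size = in.readInt();
            Map<String, Integer> map = new HashMap<>(size);
            for (int i = 0; i < size; i++) {
                map.put(in.readUTF(), in.readInt());
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> original = new HashMap<>();
        original.put("a", 1);
        original.put("b", 2);
        // The essential property a serializer test asserts: round trips are lossless.
        if (!original.equals(deserialize(serialize(original)))) {
            throw new AssertionError("round trip lost data");
        }
    }
}
```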

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102016693

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ * <p>
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
+ *
+ * @param <K> The type of the key.
+ * @param <N> The type of the namespace.
+ * @param <UK> The type of the keys in the map state.
+ * @param <UV> The type of the values in the map state.
+ */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
+ /**
+ * Creates a new {@code RocksDBMapState}.
+ *
+ * @param namespaceSerializer The serializer for the namespace.
+ * @param stateDesc The state identifier for the state.
+ */
+ public RocksDBMapState(ColumnFamilyHandle columnFamily,
+ TypeSerializer<N> namespaceSerializer,
+ MapStateDescriptor<UK, UV> stateDesc,
+ RocksDBKeyedStateBackend<K> backend) {
+
+ super(columnFamily, namespaceSerializer, stateDesc, backend);
+
+ this.userKeySerializer = stateDesc.getKeySerializer();
+ this.userValueSerializer = stateDesc.getValueSerializer();
+
+ writeOptions = new WriteOptions();
+ writeOptions.setDisableWAL(true);
+ }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
+ @Override
+ public UV get(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+ return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while getting data from RocksDB.", e);
+ }
+ }
+
+ @Override
+ public void put(UK userKey, UV userValue) throws IOException {
+ if (userValue == null) {
+ remove(userKey);
+ }
+
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = serializeUserValue(userValue);
+
+ backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while putting data into RocksDB", e);
+ }
+ }
+
+ @Override
+ public void remove(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+
+ backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while removing data from RocksDB.", e);
+ }
+ }
+
+ @Override
+ public boolean contains(UK userKey) throws IOException {
+ try {
+ byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+ byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+ return (rawValueBytes != null);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while getting data from RocksDB", e);
+ }
+ }
+
+ @Override
+ public int size() throws IOException {
+ Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+ int count = 0;
+ while (iterator.hasNext()) {
+ count++;
+ iterator.next();
+ }
+
+ return count;
+ }
          +
          + @Override
          + public Iterable<UK> keys() {
          + return new Iterable<UK>() {
          + @Override
          + public Iterator<UK> iterator() {
          + return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
          + @Override
+ public UK next() {
+ RocksDBMapEntry entry = nextEntry();
+ return (entry == null ? null : entry.getKey());
+ }
          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterable<UV> values() {
          + return new Iterable<UV>() {
          + @Override
          + public Iterator<UV> iterator() {
          + return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
          + @Override
+ public UV next() {
+ RocksDBMapEntry entry = nextEntry();
+ return (entry == null ? null : entry.getValue());
+ }
          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() {
          + return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
          + @Override
+ public Map.Entry<UK, UV> next() {
+ return nextEntry();
+ }
          + };
          + }
          +
          + @Override
          + public Iterable<Map.Entry<UK, UV>> get() throws Exception {
          + final Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + // Return null to make the behavior consistent with other states.
+ if (!iterator.hasNext()) {
+ return null;
+ } else {
          + return new Iterable<Map.Entry<UK, UV>>() {
          + @Override
+ public Iterator<Map.Entry<UK, UV>> iterator() {
+ return iterator;
+ }
          + };
          + }
          + }
          +
          + @Override
          + public void add(Map<UK, UV> map) throws Exception {
+ if (map == null) {
+ return;
+ }
+
+ for (Map.Entry<UK, UV> entry : map.entrySet()) {
+ put(entry.getKey(), entry.getValue());
+ }
          + }
          +
          + @Override
          + public void clear() {
          + Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
+ while (iterator.hasNext()) {
+ iterator.next();
+ iterator.remove();
+ }
          + }
          +
          + @Override
          + @SuppressWarnings("unchecked")
          + public byte[] getSerializedValue(byte[] serializedKeyAndNamespace) throws Exception {
          + Preconditions.checkNotNull(serializedKeyAndNamespace, "Serialized key and namespace");
          +
          + //TODO make KvStateRequestSerializer key-group aware to save this round trip and key-group computation
          + Tuple2<K, N> des = KvStateRequestSerializer.deserializeKeyAndNamespace(
          + serializedKeyAndNamespace,
          + backend.getKeySerializer(),
          + namespaceSerializer);
          +
          + int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(des.f0, backend.getNumberOfKeyGroups());
          +
          + ByteArrayOutputStreamWithPos outputStream = new ByteArrayOutputStreamWithPos(128);
          + DataOutputViewStreamWrapper outputView = new DataOutputViewStreamWrapper(outputStream);
          +
          + writeKeyWithGroupAndNamespace(keyGroup, des.f0, des.f1, outputStream, outputView);
          + byte[] keyPrefixBytes = outputStream.toByteArray();
          +
          + Iterator<Map.Entry<UK, UV>> iterator = new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, keyPrefixBytes) {
          + @Override
+ public Map.Entry<UK, UV> next() {
+ return nextEntry();
+ }

          + };
          +
          + // Return null to make the behavior consistent
+ if (!iterator.hasNext()) {
+ return null;
+ }
+
+ outputStream.reset();
+
+ while (iterator.hasNext()) {
+ Map.Entry<UK, UV> entry = iterator.next();
+
+ userKeySerializer.serialize(entry.getKey(), outputView);
+ userValueSerializer.serialize(entry.getValue(), outputView);
+ }

          +
          + return outputStream.toByteArray();
          + }
          +
          + // ------------------------------------------------------------------------
          + // Serialization Methods
          + // ------------------------------------------------------------------------
          +
+ private byte[] serializeCurrentKey() {
+ try {
+ writeCurrentKeyWithGroupAndNamespace();
+
+ return keySerializationStream.toByteArray();
+ } catch (IOException e) {
+ throw new RuntimeException("Error while serializing the current key.");
+ }
+ }
+
+ private byte[] serializeUserKeyWithCurrentKeyAndNamespace(UK userKey) {
+ try {
+ writeCurrentKeyWithGroupAndNamespace();
+ userKeySerializer.serialize(userKey, keySerializationDataOutputView);
+
+ return keySerializationStream.toByteArray();
+ } catch (IOException e) {
+ throw new RuntimeException("Error while serializing the user key.", e);
+ }
+ }
+
+ private byte[] serializeUserKey(int keyGroup, K key, N namespace, UK userKey) {
+ try {
+ writeKeyWithGroupAndNamespace(keyGroup, key, namespace, keySerializationStream, keySerializationDataOutputView);
+ userKeySerializer.serialize(userKey, keySerializationDataOutputView);
+
+ return keySerializationStream.toByteArray();
+ } catch (IOException e) {
+ throw new RuntimeException("Error while serializing the user key.", e);
+ }
+ }
+
+ private byte[] serializeUserValue(UV userValue) {
+ try {
+ keySerializationStream.reset();
+ userValueSerializer.serialize(userValue, keySerializationDataOutputView);
+
+ return keySerializationStream.toByteArray();
+ } catch (IOException e) {
+ throw new RuntimeException("Error while serializing the user value.", e);
+ }
+ }
+
+ private Tuple4<Integer, K, N, UK> deserializeUserKey(byte[] rawKeyBytes) {
+ try {
+ ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawKeyBytes);
+ DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais);
+
+ Tuple3<Integer, K, N> keyAndNamespace = readKeyWithGroupAndNamespace(bais, in);
+ UK userKey = userKeySerializer.deserialize(in);
+
+ return new Tuple4<>(keyAndNamespace.f0, keyAndNamespace.f1, keyAndNamespace.f2, userKey);
+ } catch (IOException e) {
+ throw new RuntimeException("Error while deserializing the user key.", e);
+ }
+ }
+
+ private UV deserializeUserValue(byte[] rawValueBytes) {
+ try {
+ ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawValueBytes);
+ DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais);
+
+ return userValueSerializer.deserialize(in);
+ } catch (IOException e) {
+ throw new RuntimeException("Error while deserializing the user value.", e);
+ }
+ }
          +
          + // ------------------------------------------------------------------------
          + // Internal Classes
          + // ------------------------------------------------------------------------
          +
          + /** A map entry in RocksDBMapState */
          + private class RocksDBMapEntry implements Map.Entry<UK, UV> {
          + private final RocksDB db;
          +
          + /** The raw bytes of the key stored in RocksDB. Each user key is stored in RocksDB
          + * with the format #KeyGroup#Key#Namespace#UserKey. */
          + private final byte[] rawKeyBytes;
          +
          + /** The raw bytes of the value stored in RocksDB */
          + private final byte[] rawValueBytes;
          +
          + /** True if the entry has been deleted. */
          + private boolean deleted;
          +
          + /** The user key and value. The deserialization is performed lazily, i.e. the key
          + * and the value is deserialized only when they are accessed. */
          + private UK userKey = null;
          + private UV userValue = null;
          +
+ RocksDBMapEntry(final RocksDB db, final byte[] rawKeyBytes, final byte[] rawValueBytes) {
+ this.db = db;
+
+ this.rawKeyBytes = rawKeyBytes;
+ this.rawValueBytes = rawValueBytes;
+ this.deleted = false;
+ }
+
+ @Override
+ public UK getKey() {
+ if (userKey == null) {
+ userKey = deserializeUserKey(rawKeyBytes).f3;
+ }
+
+ return userKey;
+ }
+
+ @Override
+ public UV getValue() {
+ if (deleted) {
+ return null;
+ } else {
+ if (userValue == null) {
+ userValue = deserializeUserValue(rawValueBytes);
+ }
+
+ return userValue;
+ }
+ }
+
+ @Override
+ public UV setValue(UV value) {
+ if (deleted) {
+ throw new IllegalStateException("The value has already been deleted.");
+ }
+
+ UV oldValue = getValue();
+
+ if (value == null) {
+ deleted = true;
+
+ try {
+ db.remove(columnFamily, writeOptions, rawKeyBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while removing data from RocksDB.", e);
+ }
+ } else {
+ userValue = value;
+
+ byte[] rawValueBytes = serializeUserValue(value);
+ try {
+ db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while putting data into RocksDB.", e);
+ }
+ }
+
+ return oldValue;
+ }
          + }
          +
          + /** An auxiliary utility to scan all entries under the given key. */
          + private abstract class RocksDBMapIterator<T> implements Iterator<T> {
          +
          + final static int CACHE_SIZE_BASE = 1;
          + final static int CACHE_SIZE_LIMIT = 128;
          +
          + /** The db where data resides. */
          + private final RocksDB db;
          +
          + /**
          + * The prefix bytes of the key being accessed. All entries under the same key
          + * has the same prefix, hence we can stop the iterating once coming across an
          + * entry with a different prefix.
          + */
          + private final byte[] keyPrefixBytes;
          +
          + /**
          + * True if all entries have been accessed or the iterator has come across an
          + * entry with a different prefix.
          + */
          + private boolean expired = false;
          +
          + /** A in-memory cache for the entries in the rocksdb. */
          + private ArrayList<RocksDBMapEntry> cacheEntries = new ArrayList<>();
          + private int cacheIndex = 0;
          +
          +
+ RocksDBMapIterator(final RocksDB db, final byte[] keyPrefixBytes) {
+ this.db = db;
+ this.keyPrefixBytes = keyPrefixBytes;
+ }
+
+ @Override
+ public boolean hasNext() {
+ loadCache();
+
+ return (cacheIndex < cacheEntries.size());
+ }
+
+ @Override
+ public void remove() {
+ if (cacheIndex == 0 || cacheIndex > cacheEntries.size()) {
+ throw new IllegalStateException();
+ }
+
+ RocksDBMapEntry lastEntry = cacheEntries.get(cacheIndex - 1);
+
+ try {
+ db.remove(columnFamily, writeOptions, lastEntry.rawKeyBytes);
+ } catch (RocksDBException e) {
+ throw new RuntimeException("Error while removing data from RocksDB.", e);
+ }
+
+ lastEntry.deleted = true;
+ }
+
+ final RocksDBMapEntry nextEntry() {
+ loadCache();
+
+ if (cacheIndex == cacheEntries.size()) {
+ if (!expired) {
+ throw new IllegalStateException();
+ }
+
+ return null;
+ }
+
+ RocksDBMapEntry entry = cacheEntries.get(cacheIndex);
+ cacheIndex++;
+
+ return entry;
+ }
+
+ private void loadCache() {
+ if (cacheIndex > cacheEntries.size()) {
+ throw new IllegalStateException();
+ }
+
+ // Load cache entries only when the cache is empty and there still exist unread entries
+ if (cacheIndex < cacheEntries.size() || expired) {
+ return;
+ }
          +
          + RocksIterator iterator = db.newIterator(columnFamily);
          — End diff –

          I suggest using try-with-resources here, so that `iterator` is guaranteed to be closed.
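          The point of the suggestion can be illustrated with a small stand-alone sketch. `TrackedIterator` below is a hypothetical stand-in for `RocksIterator` (which wraps a native handle and, in recent RocksDB versions, implements `AutoCloseable`); the sketch shows that `close()` runs when the try block exits, even on an exception:

```java
// TrackedIterator is a hypothetical stand-in for a native-backed resource such
// as RocksIterator. try-with-resources guarantees close() runs when the block
// exits, whether normally or via an exception, so the resource cannot leak.
class TrackedIterator implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

public class TryWithResourcesSketch {
    static boolean iterateAndClose() {
        TrackedIterator tracked = new TrackedIterator();
        try (TrackedIterator iterator = tracked) {
            // ... seek(keyPrefixBytes), copy entries into the cache ...
        } // iterator.close() is invoked here automatically
        return tracked.closed;
    }

    public static void main(String[] args) {
        System.out.println(iterateAndClose());
    }
}
```

          With the plain `RocksIterator iterator = db.newIterator(columnFamily)` form, an exception thrown while loading the cache would skip any manual cleanup and leak the native iterator.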

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101994211

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapMapState.java —
          @@ -0,0 +1,321 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.runtime.state.heap;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.KeyedStateBackend;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +
          +import java.io.ByteArrayOutputStream;
          +import java.io.IOException;
          +import java.util.HashMap;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * Heap-backed partitioned {@link MapState} that is snapshotted into files.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the state.
          + * @param <UV> The type of the values in the state.
          + */
          +public class HeapMapState<K, N, UK, UV>
          +	extends AbstractHeapState<K, N, HashMap<UK, UV>, MapState<UK, UV>, MapStateDescriptor<UK, UV>>
          +	implements InternalMapState<N, UK, UV> {
          +
          +	/**
          +	 * Creates a new key/value state for the given hash map of key/value pairs.
          +	 *
          +	 * @param backend The state backend backing that created this state.
          +	 * @param stateDesc The state identifier for the state. This contains the name
          +	 *                  and can create a default state value.
          +	 * @param stateTable The state table to use in this key/value state. May contain initial state.
          +	 */
          +	public HeapMapState(KeyedStateBackend<K> backend,
          +			MapStateDescriptor<UK, UV> stateDesc,
          +			StateTable<K, N, HashMap<UK, UV>> stateTable,
          +			TypeSerializer<K> keySerializer,
          +			TypeSerializer<N> namespaceSerializer) {
          +		super(backend, stateDesc, stateTable, keySerializer, namespaceSerializer);
          +	}
          +
          +	@Override
          +	public UV get(UK userKey) throws IOException {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return null;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return null;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.<K>getCurrentKey());
          +		if (userMap == null) {
          +			return null;
          +		}
          +
          +		return userMap.get(userKey);
          +	}
          +
          +	@Override
          +	public void put(UK userKey, UV userValue) throws IOException {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			namespaceMap = createNewMap();
          +			stateTable.set(backend.getCurrentKeyGroupIndex(), namespaceMap);
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			keyedMap = createNewMap();
          +			namespaceMap.put(currentNamespace, keyedMap);
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.getCurrentKey());
          +		if (userMap == null) {
          +			userMap = new HashMap<>();
          +			keyedMap.put(backend.getCurrentKey(), userMap);
          +		}
          +
          +		userMap.put(userKey, userValue);
          +	}
          +
          +	@Override
          +	public void remove(UK userKey) throws IOException {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.getCurrentKey());
          +		if (userMap == null) {
          +			return;
          +		}
          +
          +		userMap.remove(userKey);
          +
          +		if (userMap.isEmpty()) {
          +			clear();
          +		}
          +	}
          +
          +	@Override
          +	public boolean contains(UK userKey) throws IOException {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return false;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return false;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.<K>getCurrentKey());
          +
          +		return userMap != null && userMap.containsKey(userKey);
          +	}
          +
          +	@Override
          +	public int size() throws IOException {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return 0;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return 0;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.<K>getCurrentKey());
          +
          +		return userMap == null ? 0 : userMap.size();
          +	}
          +
          +	@Override
          +	public Iterable<UK> keys() {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return null;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return null;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.<K>getCurrentKey());
          +
          +		return userMap == null ? null : userMap.keySet();
          +	}
          +
          +	@Override
          +	public Iterable<UV> values() {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return null;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return null;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.<K>getCurrentKey());
          +
          +		return userMap == null ? null : userMap.values();
          +	}
          +
          +	@Override
          +	public Iterator<Map.Entry<UK, UV>> iterator() {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return null;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return null;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.<K>getCurrentKey());
          +
          +		return userMap == null ? null : userMap.entrySet().iterator();
          +	}
          +
          +	@Override
          +	public Iterable<Map.Entry<UK, UV>> get() throws Exception {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			return null;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return null;
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.<K>getCurrentKey());
          +
          +		return userMap == null ? null : userMap.entrySet();
          +	}
          +
          +	@Override
          +	public void add(Map<UK, UV> value) throws Exception {
          +		Preconditions.checkState(currentNamespace != null, "No namespace set.");
          +		Preconditions.checkState(backend.getCurrentKey() != null, "No key set.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(backend.getCurrentKeyGroupIndex());
          +		if (namespaceMap == null) {
          +			namespaceMap = createNewMap();
          +			stateTable.set(backend.getCurrentKeyGroupIndex(), namespaceMap);
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			keyedMap = createNewMap();
          +			namespaceMap.put(currentNamespace, keyedMap);
          +		}
          +
          +		HashMap<UK, UV> userMap = keyedMap.get(backend.getCurrentKey());
          +		if (userMap == null) {
          +			userMap = new HashMap<>();
          +			keyedMap.put(backend.getCurrentKey(), userMap);
          +		}
          +
          +		userMap.putAll(value);
          +	}
          +
          +	@Override
          +	public byte[] getSerializedValue(K key, N namespace) throws Exception {
          +		Preconditions.checkState(namespace != null, "No namespace given.");
          +		Preconditions.checkState(key != null, "No key given.");
          +
          +		Map<N, Map<K, HashMap<UK, UV>>> namespaceMap = stateTable.get(KeyGroupRangeAssignment.assignToKeyGroup(key, backend.getNumberOfKeyGroups()));
          +		if (namespaceMap == null) {
          +			return null;
          +		}
          +
          +		Map<K, HashMap<UK, UV>> keyedMap = namespaceMap.get(currentNamespace);
          +		if (keyedMap == null) {
          +			return null;
          +		}
          +
          +		HashMap<UK, UV> result = keyedMap.get(key);
          +		if (result == null) {
          +			return null;
          +		}
          +
          +		TypeSerializer<UK> userKeySerializer = stateDesc.getKeySerializer();
          +		TypeSerializer<UV> userValueSerializer = stateDesc.getValueSerializer();
          +
          +		ByteArrayOutputStream baos = new ByteArrayOutputStream();
          — End diff –

          You could use `ByteArrayOutputStreamWithPos` here. It is basically the same as `ByteArrayOutputStream`, but can give a position and is not synchronized on every method.
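          The difference the reviewer points at can be sketched with a minimal stand-in. `UnsyncBuffer` below is a hypothetical analogue of Flink's `ByteArrayOutputStreamWithPos`, not the actual implementation: unlike `java.io.ByteArrayOutputStream`, whose `write` methods are `synchronized`, it takes no lock on every write and it exposes the current write position:

```java
import java.util.Arrays;

// Hypothetical minimal analogue of ByteArrayOutputStreamWithPos: unsynchronized
// single-threaded writes plus a position query, which the JDK's (synchronized)
// ByteArrayOutputStream does not offer.
public class UnsyncBuffer {
    private byte[] buf = new byte[64];
    private int pos = 0;

    public void write(int b) {
        if (pos == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2); // grow geometrically
        }
        buf[pos++] = (byte) b;
    }

    /** The extra capability: where the next byte will be written. */
    public int getPosition() {
        return pos;
    }

    public byte[] toByteArray() {
        return Arrays.copyOf(buf, pos);
    }
}
```

          Since the serialization path here is single-threaded, the per-call monitor acquisition of `ByteArrayOutputStream` buys nothing, which is why the unsynchronized variant is preferable in this hot path.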

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101994611

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/HashMapSerializer.java —
          @@ -0,0 +1,179 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.runtime.state;
          +
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.core.memory.DataInputView;
          +import org.apache.flink.core.memory.DataOutputView;
          +import org.apache.flink.util.Preconditions;
          +
          +import java.io.IOException;
          +import java.util.HashMap;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
          + * A serializer for {@link List Lists}. The serializer relies on an element serializer
          + * for the serialization of the list's elements.
          + *
          + * <p>The serialization format for the list is as follows: four bytes for the length of the list,
          + * followed by the serialized representation of each element.
          + *
          + * @param <K> The type of the keys in the map.
          + * @param <V> The type of the values in the map.
          + */
          +public class HashMapSerializer<K, V> extends TypeSerializer<HashMap<K, V>> {
          — End diff –

          This could also be tested by extending `SerializerTestBase`.
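As context for the serializer under review, the format its javadoc describes — a four-byte size prefix followed by the serialized representation of each entry — can be sketched with plain `java.io` streams. The class below is illustrative only (the name and fixed `String`/`Integer` encoding are made up); the real `HashMapSerializer` delegates to configurable key and value `TypeSerializer`s.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the length-prefixed map format: size first, then
// one (key, value) pair per entry.
class MapFormatDemo {
    static byte[] serialize(Map<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size()); // four bytes for the number of entries
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                out.writeUTF(entry.getKey());   // key first...
                out.writeInt(entry.getValue()); // ...then the value
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, Integer> deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            Map<String, Integer> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                map.put(in.readUTF(), in.readInt());
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round-trip check of exactly this kind (serialize, deserialize, compare) is what a `SerializerTestBase` subclass would automate across many inputs.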

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102020837

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new {@code RocksDBMapState}.
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          + TypeSerializer<N> namespaceSerializer,
          + MapStateDescriptor<UK, UV> stateDesc,
          + RocksDBKeyedStateBackend<K> backend) {
          +
          + super(columnFamily, namespaceSerializer, stateDesc, backend);
          +
          + this.userKeySerializer = stateDesc.getKeySerializer();
          + this.userValueSerializer = stateDesc.getValueSerializer();
          +
          + writeOptions = new WriteOptions();
          + writeOptions.setDisableWAL(true);
          + }
          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          + try {
          + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          + return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
          + } catch (RocksDBException e) {
          + throw new RuntimeException("Error while getting data from RocksDB.", e);
          + }
          + }
          +
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          + if (userValue == null) {
          + remove(userKey);
          — End diff –

          I suggest that you add a test that verifies that `null` cannot be used for user keys, but works correctly for user values, and also that all iterators work with `null` values.
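A sketch of the behavior such a test could pin down. The helper below is hypothetical and is not Flink's actual encoding; it only illustrates one common scheme: reject null user keys up front (they become part of the RocksDB key bytes), and prefix user values with a null-flag byte so that null values survive a serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: null user keys are rejected, null user values are
// encoded with a leading null-flag byte and decoded back to null.
class NullableValueCodec {
    static void checkUserKey(Object userKey) {
        if (userKey == null) {
            throw new IllegalArgumentException("Map state does not permit null user keys.");
        }
    }

    static byte[] serializeValue(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeBoolean(value == null); // null flag first
            if (value != null) {
                out.writeUTF(value);         // payload only for non-null values
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String deserializeValue(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return in.readBoolean() ? null : in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```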

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102020386

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new {@code RocksDBMapState}.
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          + TypeSerializer<N> namespaceSerializer,
          + MapStateDescriptor<UK, UV> stateDesc,
          + RocksDBKeyedStateBackend<K> backend) {
          +
          + super(columnFamily, namespaceSerializer, stateDesc, backend);
          +
          + this.userKeySerializer = stateDesc.getKeySerializer();
          + this.userValueSerializer = stateDesc.getValueSerializer();
          +
          + writeOptions = new WriteOptions();
          + writeOptions.setDisableWAL(true);
          + }
          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          + try {
          + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
          + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
          +
          + return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
          + } catch (RocksDBException e) {
          + throw new RuntimeException("Error while getting data from RocksDB.", e);
          + }
          + }
          +
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          + if (userValue == null) {
          + remove(userKey);
          — End diff –

          Is there a special reason that `remove` is called for `null`? Afterwards, the method continues, serializes `null` to bytes, and overwrites the entry anyway.
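One reading of this comment is that an early return was intended after the `remove` call, so that mapping a key to `null` deletes the entry instead of writing a serialized `null` back. The stand-alone helper below is a hypothetical sketch of that control flow, operating on a plain `Map` rather than RocksDB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of remove-on-null put semantics: the early return
// prevents the null value from being serialized and written back.
class PutOrRemoveDemo {
    static <K, V> void put(Map<K, V> store, K key, V value) {
        if (value == null) {
            store.remove(key);
            return; // without this return, a stored null would overwrite the removal
        }
        store.put(key, value);
    }
}
```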

          Show
          githubbot ASF GitHub Bot added a comment - Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/3336#discussion_r102020386 — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java — @@ -0,0 +1,579 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * <p/> + * http://www.apache.org/licenses/LICENSE-2.0 + * <p/> + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.contrib.streaming.state; + +import org.apache.flink.api.common.state.MapState; +import org.apache.flink.api.common.state.MapStateDescriptor; +import org.apache.flink.api.common.typeutils.TypeSerializer; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.api.java.tuple.Tuple4; +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos; +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos; +import org.apache.flink.core.memory.DataInputViewStreamWrapper; +import org.apache.flink.core.memory.DataOutputViewStreamWrapper; +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer; +import org.apache.flink.runtime.state.KeyGroupRangeAssignment; +import org.apache.flink.runtime.state.internal.InternalMapState; +import org.apache.flink.util.Preconditions; +import org.rocksdb.ColumnFamilyHandle; +import org.rocksdb.RocksDB; +import org.rocksdb.RocksDBException; +import org.rocksdb.RocksIterator; +import org.rocksdb.WriteOptions; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.Map; + +/** + * {@link MapState} implementation that stores state in RocksDB. + * <p> + * <p> {@link RocksDBStateBackend} must ensure that we set the + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since + * we use the {@code merge()} call. + * + * @param <K> The type of the key. + * @param <N> The type of the namespace. + * @param <UK> The type of the keys in the map state. + * @param <UV> The type of the values in the map state. 
+ */ +public class RocksDBMapState<K, N, UK, UV> + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>> + implements InternalMapState<N, UK, UV> { + + /** Serializer for the keys and values */ + private final TypeSerializer<UK> userKeySerializer; + private final TypeSerializer<UV> userValueSerializer; + + /** + * We disable writes to the write-ahead-log here. We can't have these in the base class + * because JNI segfaults for some reason if they are. + */ + private final WriteOptions writeOptions; + + /** + * Creates a new {@code RocksDBMapState} . + * + * @param namespaceSerializer The serializer for the namespace. + * @param stateDesc The state identifier for the state. + */ + public RocksDBMapState(ColumnFamilyHandle columnFamily, + TypeSerializer<N> namespaceSerializer, + MapStateDescriptor<UK, UV> stateDesc, + RocksDBKeyedStateBackend<K> backend) { + + super(columnFamily, namespaceSerializer, stateDesc, backend); + + this.userKeySerializer = stateDesc.getKeySerializer(); + this.userValueSerializer = stateDesc.getValueSerializer(); + + writeOptions = new WriteOptions(); + writeOptions.setDisableWAL(true); + } + + // ------------------------------------------------------------------------ + // MapState Implementation + // ------------------------------------------------------------------------ + + @Override + public UV get(UK userKey) throws IOException { + try { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes); + + return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes)); + } catch (RocksDBException e) { + throw new RuntimeException("Error while getting data from RocksDB.", e); + } + } + + @Override + public void put(UK userKey, UV userValue) throws IOException { + if (userValue == null) { + remove(userKey); — End diff – Is there a special reason, `remove` is called for `null`? 
Afterwards, the method continues, serializes `null` to bytes, and overwrites the entry anyway.
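The control-flow issue the reviewer describes can be sketched with a plain in-memory stand-in (a HashMap instead of RocksDB; the class and its names are illustrative, not Flink code — the real implementation serializes keys and values to byte[] first):

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the RocksDB-backed map state. It shows the behavior
// the review asks for: with an early return after remove(), put(key, null)
// only deletes the entry instead of also writing back a serialized null.
class SketchMapState<UK, UV> {

    private final Map<UK, UV> store = new HashMap<>();

    public void put(UK userKey, UV userValue) {
        if (userValue == null) {
            store.remove(userKey);
            return; // without this return, a serialized null would be written back
        }
        store.put(userKey, userValue);
    }

    public UV get(UK userKey) {
        return store.get(userKey);
    }
}
```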
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101992577

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/query/netty/message/KvStateRequestSerializer.java —
          @@ -484,6 +487,71 @@ public static Throwable deserializeServerFailure(ByteBuf buf) throws IOException
          return null;
          }
          }
          +
          + /**
          + * Serializes all values of the Iterable with the given serializer.
          + *
          + * @param entries Key-value pairs to serialize
          + * @param keySerializer Serializer for UK
          + * @param valueSerializer Serializer for UV
          + * @param <UK> Type of the keys
          + * @param <UV> Type of the values
          + * @return Serialized values or <code>null</code> if values <code>null</code> or empty
          + * @throws IOException On failure during serialization
          + */
          + public static <UK, UV> byte[] serializeMap(Iterable<Map.Entry<UK, UV>> entries, TypeSerializer<UK> keySerializer, TypeSerializer<UV> valueSerializer) throws IOException {
          + if (entries != null) {
          + Iterator<Map.Entry<UK, UV>> it = entries.iterator();
          +
          + if (it.hasNext()) {
          + // Serialize
          + DataOutputSerializer dos = new DataOutputSerializer(32);
          +
          + while (it.hasNext())

          { + Map.Entry<UK, UV> entry = it.next(); + + keySerializer.serialize(entry.getKey(), dos); + valueSerializer.serialize(entry.getValue(), dos); + }

          +
          + return dos.getCopyOfBuffer();
          + } else {
          + return null;
          — End diff –

I wonder if null and an empty map should be considered the same. From the other code in the class, I think we should return an empty byte[] here. Then you could also simply use a for-each loop over the iterable to make the code shorter.
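Both suggestions can be sketched as follows. This is not the Flink code: plain `DataOutputStream` with String/Integer entries stands in for Flink's `TypeSerializer`/`DataOutputSerializer`, and the class name is invented for the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

// Treats a null iterable and an empty one uniformly (empty byte[]) and
// iterates with a for-each loop instead of an explicit Iterator.
final class MapSerializerSketch {

    static byte[] serializeMap(Iterable<Map.Entry<String, Integer>> entries) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(32);
        DataOutputStream dos = new DataOutputStream(baos);
        if (entries != null) {
            for (Map.Entry<String, Integer> entry : entries) {
                dos.writeUTF(entry.getKey());
                dos.writeInt(entry.getValue());
            }
        }
        dos.flush();
        return baos.toByteArray(); // empty array for null or empty input
    }
}
```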

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102018529

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + *

          {@link MapState}

          implementation that stores state in RocksDB.
          + * <p>
          + * <p>

          {@link RocksDBStateBackend}

          must ensure that we set the
          + *

          {@link org.rocksdb.StringAppendOperator}

          on the column family that we use for our state since
          + * we use the

          {@code merge()}

          call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new

          {@code RocksDBMapState}

          .
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          + TypeSerializer<N> namespaceSerializer,
          + MapStateDescriptor<UK, UV> stateDesc,
          + RocksDBKeyedStateBackend<K> backend)

          { + + super(columnFamily, namespaceSerializer, stateDesc, backend); + + this.userKeySerializer = stateDesc.getKeySerializer(); + this.userValueSerializer = stateDesc.getValueSerializer(); + + writeOptions = new WriteOptions(); + writeOptions.setDisableWAL(true); + }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes); + + return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes)); + }

          catch (RocksDBException e) {
          + throw new RuntimeException("Error while getting data from RocksDB.", e);
          — End diff –

Again, I suggest using a more specific exception type than the generic RuntimeException. There are a couple more cases like this in this class.

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102018403

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/DefaultKeyedStateStore.java —
          @@ -93,6 +95,18 @@ public DefaultKeyedStateStore(KeyedStateBackend<?> keyedStateBackend, ExecutionC
          }
          }

          + @Override
          + public <UK, UV> MapState<UK, UV> getMapState(MapStateDescriptor<UK, UV> stateProperties) {
          + requireNonNull(stateProperties, "The state properties must not be null");
          + try

          { + stateProperties.initializeSerializerUnlessSet(executionConfig); + MapState<UK, UV> originalState = getPartitionedState(stateProperties); + return new UserFacingMapState<>(originalState); + }

          catch (Exception e) {
          + throw new RuntimeException("Error while getting state", e);
          — End diff –

I suggest using a more specific exception type; RuntimeException is very generic.
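The suggestion can be illustrated with a hypothetical, more descriptive exception type. The name `StateAccessException` is invented for this sketch and is not a Flink class:

```java
// Hypothetical exception type illustrating the review suggestion: wrap
// state-access failures in something more descriptive than a bare
// RuntimeException, while still carrying the original cause.
class StateAccessException extends RuntimeException {

    StateAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

class GetMapStateSketch {
    // Mirrors the catch block in getMapState, rethrowing with context.
    static RuntimeException wrap(Exception e) {
        return new StateAccessException("Error while getting state", e);
    }
}
```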

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102023644

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + *

          {@link MapState}

          implementation that stores state in RocksDB.
          + * <p>
          + * <p>

          {@link RocksDBStateBackend}

          must ensure that we set the
          + *

          {@link org.rocksdb.StringAppendOperator}

          on the column family that we use for our state since
          + * we use the

          {@code merge()}

          call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          +
          + /**
          + * Creates a new

          {@code RocksDBMapState}

          .
          + *
          + * @param namespaceSerializer The serializer for the namespace.
          + * @param stateDesc The state identifier for the state.
          + */
          + public RocksDBMapState(ColumnFamilyHandle columnFamily,
          + TypeSerializer<N> namespaceSerializer,
          + MapStateDescriptor<UK, UV> stateDesc,
          + RocksDBKeyedStateBackend<K> backend)

          { + + super(columnFamily, namespaceSerializer, stateDesc, backend); + + this.userKeySerializer = stateDesc.getKeySerializer(); + this.userValueSerializer = stateDesc.getValueSerializer(); + + writeOptions = new WriteOptions(); + writeOptions.setDisableWAL(true); + }

          +
          + // ------------------------------------------------------------------------
          + // MapState Implementation
          + // ------------------------------------------------------------------------
          +
          + @Override
          + public UV get(UK userKey) throws IOException {
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes); + + return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes)); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while getting data from RocksDB.", e); + }

          + }
          +
          + @Override
          + public void put(UK userKey, UV userValue) throws IOException {
          + if (userValue == null)

          { + remove(userKey); + }

          +
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = serializeUserValue(userValue); + + backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while putting data into RocksDB", e); + }

          + }
          +
          + @Override
          + public void remove(UK userKey) throws IOException {
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + + backend.db.remove(columnFamily, writeOptions, rawKeyBytes); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while removing data from RocksDB.", e); + }

          + }
          +
          + @Override
          + public boolean contains(UK userKey) throws IOException {
          + try

          { + byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey); + byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes); + + return (rawValueBytes != null); + }

          catch (RocksDBException e)

          { + throw new RuntimeException("Error while getting data from RocksDB", e); + }

          + }
          +
          + @Override
          + public int size() throws IOException {
          + Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + int count = 0;
          + while (iterator.hasNext())

          { + count++; + iterator.next(); + }

          +
          + return count;
          + }
          +
          + @Override
          + public Iterable<UK> keys() {
          + return new Iterable<UK>() {
          + @Override
          + public Iterator<UK> iterator() {
          + return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
          + @Override
          + public UK next()

          { + RocksDBMapEntry entry = nextEntry(); + return (entry == null ? null : entry.getKey()); + }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterable<UV> values() {
          + return new Iterable<UV>() {
          + @Override
          + public Iterator<UV> iterator() {
          + return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
          + @Override
          + public UV next()

          { + RocksDBMapEntry entry = nextEntry(); + return (entry == null ? null : entry.getValue()); + }

          + };
          + }
          + };
          + }
          +
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator() {
          + return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
          + @Override
          + public Map.Entry<UK, UV> next()

          { + return nextEntry(); + }

          + };
          + }
          +
          + @Override
          + public Iterable<Map.Entry<UK, UV>> get() throws Exception {
          + final Iterator<Map.Entry<UK, UV>> iterator = iterator();
          +
          + // Return null to make the behavior consistent with other states.
          + if (!iterator.hasNext())

          { + return null; + }

          else {
          + return new Iterable<Map.Entry<UK, UV>>() {
          + @Override
          + public Iterator<Map.Entry<UK, UV>> iterator()

          { + return iterator; + }

          + };
          + }
          + }
          +
          + @Override
          + public void add(Map<UK, UV> map) throws Exception {
          + if (map == null)

          { + return; + }

          +
          + for (Map.Entry<UK, UV> entry : map.entrySet())

          { + put(entry.getKey(), entry.getValue()); + }

          + }
          +
          + @Override
          + public void clear() {
          — End diff –

          I wonder if it would be more efficient to just drop and recreate the column family?
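What `clear()` has to do in the proposed implementation can be sketched with a `TreeMap` over String keys standing in for RocksDB's sorted byte[] key space (names and types here are illustrative). Since one column family backs the state for all keys, dropping and recreating it would likely clear every key's map, so the per-key `clear()` scans and removes only the entries under the current (key, namespace) prefix:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Removes all entries whose key starts with the given prefix, mimicking a
// prefix scan over RocksDB's sorted key space.
final class PrefixClearSketch {

    static void clearPrefix(TreeMap<String, String> db, String prefix) {
        // tailMap positions the iterator at the first key >= prefix; because
        // keys are sorted, we can stop at the first non-matching key.
        Iterator<Map.Entry<String, String>> it = db.tailMap(prefix).entrySet().iterator();
        while (it.hasNext()) {
            if (!it.next().getKey().startsWith(prefix)) {
                break;
            }
            it.remove(); // removes the entry last returned by next()
        }
    }
}
```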

          Show
          githubbot ASF GitHub Bot added a comment - Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/3336#discussion_r102023644 — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java — @@ -0,0 +1,579 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * <p/> + * http://www.apache.org/licenses/LICENSE-2.0 + * <p/> + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
— End diff –

I wonder if it would be more efficient to just drop and recreate the column family?
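For context on the drop-and-recreate question above: the column family holds entries for *all* stream keys and namespaces of this state, while `clear()` must delete only the current key's prefix range. The stand-in sketch below (a plain `TreeMap` in place of RocksDB's sorted key space; the key strings are purely illustrative) shows why a prefix-range delete preserves other keys where dropping the whole column family would not:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixClearSketch {
	public static void main(String[] args) {
		// RocksDB stores map-state entries as #KeyGroup#Key#Namespace#UserKey,
		// so all entries of one keyed map share a common byte prefix.
		TreeMap<String, String> columnFamily = new TreeMap<>();
		columnFamily.put("k1/ns/a", "1");
		columnFamily.put("k1/ns/b", "2");
		columnFamily.put("k2/ns/a", "3"); // belongs to a different stream key

		// clear() for key k1: remove only the prefix range, analogous to
		// seeking a RocksIterator to the prefix and deleting each entry.
		SortedMap<String, String> range = columnFamily.subMap("k1/", "k1/" + '\uffff');
		range.clear();

		// k2's state survives; dropping the whole column family would erase it too.
		System.out.println(columnFamily); // prints {k2/ns/a=3}
	}
}
```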
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102019357

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
@@ -0,0 +1,579 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.contrib.streaming.state;
+
+import org.apache.flink.api.common.state.MapState;
+import org.apache.flink.api.common.state.MapStateDescriptor;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.api.java.tuple.Tuple4;
+import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
+import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
+import org.apache.flink.core.memory.DataInputViewStreamWrapper;
+import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
+import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
+import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
+import org.apache.flink.runtime.state.internal.InternalMapState;
+import org.apache.flink.util.Preconditions;
+import org.rocksdb.ColumnFamilyHandle;
+import org.rocksdb.RocksDB;
+import org.rocksdb.RocksDBException;
+import org.rocksdb.RocksIterator;
+import org.rocksdb.WriteOptions;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.Map;
+
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ * <p>
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
+ *
+ * @param <K> The type of the key.
+ * @param <N> The type of the namespace.
+ * @param <UK> The type of the keys in the map state.
+ * @param <UV> The type of the values in the map state.
+ */
+public class RocksDBMapState<K, N, UK, UV>
+	extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
+	implements InternalMapState<N, UK, UV> {
+
+	/** Serializer for the keys and values */
+	private final TypeSerializer<UK> userKeySerializer;
+	private final TypeSerializer<UV> userValueSerializer;
+
+	/**
+	 * We disable writes to the write-ahead-log here. We can't have these in the base class
+	 * because JNI segfaults for some reason if they are.
+	 */
+	private final WriteOptions writeOptions;
+
+	/**
+	 * Creates a new {@code RocksDBMapState}.
+	 *
+	 * @param namespaceSerializer The serializer for the namespace.
+	 * @param stateDesc The state identifier for the state.
+	 */
+	public RocksDBMapState(ColumnFamilyHandle columnFamily,
+			TypeSerializer<N> namespaceSerializer,
+			MapStateDescriptor<UK, UV> stateDesc,
+			RocksDBKeyedStateBackend<K> backend) {
+
+		super(columnFamily, namespaceSerializer, stateDesc, backend);
+
+		this.userKeySerializer = stateDesc.getKeySerializer();
+		this.userValueSerializer = stateDesc.getValueSerializer();
+
+		writeOptions = new WriteOptions();
+		writeOptions.setDisableWAL(true);
+	}
+
+	// ------------------------------------------------------------------------
+	//  MapState Implementation
+	// ------------------------------------------------------------------------
+
+	@Override
+	public UV get(UK userKey) throws IOException {
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+			byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+			return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while getting data from RocksDB.", e);
+		}
+	}
+
+	@Override
+	public void put(UK userKey, UV userValue) throws IOException {
+		if (userValue == null) {
+			remove(userKey);
+		}
+
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+			byte[] rawValueBytes = serializeUserValue(userValue);
+
+			backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while putting data into RocksDB", e);
+		}
+	}
+
+	@Override
+	public void remove(UK userKey) throws IOException {
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+
+			backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while removing data from RocksDB.", e);
+		}
+	}
+
+	@Override
+	public boolean contains(UK userKey) throws IOException {
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+			byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+			return (rawValueBytes != null);
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while getting data from RocksDB", e);
+		}
+	}
+
+	@Override
+	public int size() throws IOException {
+		Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+		int count = 0;
+		while (iterator.hasNext()) {
+			count++;
+			iterator.next();
+		}
+
+		return count;
+	}
+
+	@Override
+	public Iterable<UK> keys() {
+		return new Iterable<UK>() {
+			@Override
+			public Iterator<UK> iterator() {
+				return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
+					@Override
+					public UK next() {
+						RocksDBMapEntry entry = nextEntry();
+						return (entry == null ? null : entry.getKey());
+					}
+				};
+			}
+		};
+	}
+
+	@Override
+	public Iterable<UV> values() {
+		return new Iterable<UV>() {
+			@Override
+			public Iterator<UV> iterator() {
+				return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
+					@Override
+					public UV next() {
+						RocksDBMapEntry entry = nextEntry();
+						return (entry == null ? null : entry.getValue());
+					}
+				};
+			}
+		};
+	}
+
+	@Override
+	public Iterator<Map.Entry<UK, UV>> iterator() {
+		return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
+			@Override
+			public Map.Entry<UK, UV> next() {
+				return nextEntry();
+			}
+		};
+	}
+
+	@Override
+	public Iterable<Map.Entry<UK, UV>> get() throws Exception {
+		final Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+		// Return null to make the behavior consistent with other states.
+		if (!iterator.hasNext()) {
+			return null;
+		} else {
+			return new Iterable<Map.Entry<UK, UV>>() {
+				@Override
+				public Iterator<Map.Entry<UK, UV>> iterator() {
+					return iterator;
+				}
+			};
+		}
+	}
+
+	@Override
+	public void add(Map<UK, UV> map) throws Exception {
+		if (map == null) {
+			return;
+		}
+
+		for (Map.Entry<UK, UV> entry : map.entrySet()) {
+			put(entry.getKey(), entry.getValue());
+		}
+	}
+
+	@Override
+	public void clear() {
+		Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+		while (iterator.hasNext()) {
+			iterator.next();
+			iterator.remove();
+		}
+	}
+
+	@Override
+	@SuppressWarnings("unchecked")
+	public byte[] getSerializedValue(byte[] serializedKeyAndNamespace) throws Exception {
+		Preconditions.checkNotNull(serializedKeyAndNamespace, "Serialized key and namespace");
+
+		//TODO make KvStateRequestSerializer key-group aware to save this round trip and key-group computation
+		Tuple2<K, N> des = KvStateRequestSerializer.deserializeKeyAndNamespace(
+				serializedKeyAndNamespace,
+				backend.getKeySerializer(),
+				namespaceSerializer);
+
+		int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(des.f0, backend.getNumberOfKeyGroups());
+
+		ByteArrayOutputStreamWithPos outputStream = new ByteArrayOutputStreamWithPos(128);
+		DataOutputViewStreamWrapper outputView = new DataOutputViewStreamWrapper(outputStream);
+
+		writeKeyWithGroupAndNamespace(keyGroup, des.f0, des.f1, outputStream, outputView);
+		byte[] keyPrefixBytes = outputStream.toByteArray();
+
+		Iterator<Map.Entry<UK, UV>> iterator = new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, keyPrefixBytes) {
+			@Override
+			public Map.Entry<UK, UV> next() {
+				return nextEntry();
+			}
+		};
+
+		// Return null to make the behavior consistent
+		if (!iterator.hasNext()) {
+			return null;
+		}
+
+		outputStream.reset();
+
+		while (iterator.hasNext()) {
+			Map.Entry<UK, UV> entry = iterator.next();
+
+			userKeySerializer.serialize(entry.getKey(), outputView);
+			userValueSerializer.serialize(entry.getValue(), outputView);
+		}
+
+		return outputStream.toByteArray();
+	}
+
+	// ------------------------------------------------------------------------
+	//  Serialization Methods
+	// ------------------------------------------------------------------------
+
+	private byte[] serializeCurrentKey() {
+		try {
+			writeCurrentKeyWithGroupAndNamespace();
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the current key.");
+		}
+	}
+
+	private byte[] serializeUserKeyWithCurrentKeyAndNamespace(UK userKey) {
+		try {
+			writeCurrentKeyWithGroupAndNamespace();
+			userKeySerializer.serialize(userKey, keySerializationDataOutputView);
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the user key.", e);
+		}
+	}
+
+	private byte[] serializeUserKey(int keyGroup, K key, N namespace, UK userKey) {
+		try {
+			writeKeyWithGroupAndNamespace(keyGroup, key, namespace, keySerializationStream, keySerializationDataOutputView);
+			userKeySerializer.serialize(userKey, keySerializationDataOutputView);
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the user key.", e);
+		}
+	}
+
+	private byte[] serializeUserValue(UV userValue) {
+		try {
+			keySerializationStream.reset();
+			userValueSerializer.serialize(userValue, keySerializationDataOutputView);
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the user value.", e);
+		}
+	}
+
+	private Tuple4<Integer, K, N, UK> deserializeUserKey(byte[] rawKeyBytes) {
+		try {
+			ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawKeyBytes);
+			DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais);
+
+			Tuple3<Integer, K, N> keyAndNamespace = readKeyWithGroupAndNamespace(bais, in);
+			UK userKey = userKeySerializer.deserialize(in);
+
+			return new Tuple4<>(keyAndNamespace.f0, keyAndNamespace.f1, keyAndNamespace.f2, userKey);
+		} catch (IOException e) {
+			throw new RuntimeException("Error while deserializing the user key.", e);
+		}
+	}
+
+	private UV deserializeUserValue(byte[] rawValueBytes) {
+		try {
+			ByteArrayInputStreamWithPos bais = new ByteArrayInputStreamWithPos(rawValueBytes);
+			DataInputViewStreamWrapper in = new DataInputViewStreamWrapper(bais);
+
+			return userValueSerializer.deserialize(in);
+		} catch (IOException e) {
+			throw new RuntimeException("Error while deserializing the user value.", e);
+		}
+	}
+
+	// ------------------------------------------------------------------------
+	//  Internal Classes
+	// ------------------------------------------------------------------------
+
+	/** A map entry in RocksDBMapState */
+	private class RocksDBMapEntry implements Map.Entry<UK, UV> {
+		private final RocksDB db;
+
+		/** The raw bytes of the key stored in RocksDB. Each user key is stored in RocksDB
+		 * with the format #KeyGroup#Key#Namespace#UserKey. */
+		private final byte[] rawKeyBytes;
+
+		/** The raw bytes of the value stored in RocksDB */
+		private final byte[] rawValueBytes;
+
+		/** True if the entry has been deleted. */
+		private boolean deleted;
+
+		/** The user key and value. The deserialization is performed lazily, i.e. the key
+		 * and the value is deserialized only when they are accessed. */
+		private UK userKey = null;
+		private UV userValue = null;
+
+		RocksDBMapEntry(final RocksDB db, final byte[] rawKeyBytes, final byte[] rawValueBytes) {
+			this.db = db;
+
+			this.rawKeyBytes = rawKeyBytes;
+			this.rawValueBytes = rawValueBytes;
+			this.deleted = false;
+		}
+
+		@Override
+		public UK getKey() {
+			if (userKey == null) {
+				userKey = deserializeUserKey(rawKeyBytes).f3;
+			}
+
+			return userKey;
+		}
+
+		@Override
+		public UV getValue() {
+			if (deleted) {
+				return null;
+			} else {
+				if (userValue == null) {
+					userValue = deserializeUserValue(rawValueBytes);
— End diff –

It would be possible to set `rawValueBytes` to null here, it is no longer used after this line. This is potentially more memory friendly for larger states.
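The suggestion above (releasing the raw value bytes once they have been deserialized) can be sketched in isolation. The class below is a simplified, hypothetical stand-in for `RocksDBMapEntry` — strings in place of generic serializers — with `rawValueBytes` made non-final so it can be nulled after first use:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Simplified stand-in for RocksDBMapEntry: deserialization is lazy, and the
// raw bytes are released after first use to reduce retained memory.
final class LazyEntry implements Map.Entry<String, String> {
	private final String key;
	private byte[] rawValueBytes;   // non-final so it can be nulled after deserialization
	private String userValue;       // cached deserialized value

	LazyEntry(String key, byte[] rawValueBytes) {
		this.key = key;
		this.rawValueBytes = rawValueBytes;
	}

	@Override
	public String getKey() {
		return key;
	}

	@Override
	public String getValue() {
		if (userValue == null && rawValueBytes != null) {
			// stand-in for userValueSerializer.deserialize(...)
			userValue = new String(rawValueBytes, StandardCharsets.UTF_8);
			rawValueBytes = null; // no longer needed; let GC reclaim the byte array
		}
		return userValue;
	}

	@Override
	public String setValue(String value) {
		throw new UnsupportedOperationException();
	}
}
```

Repeated `getValue()` calls return the cached value, so dropping the byte array does not change observable behavior — only the memory retained per entry while iterating large states.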

          Show
          githubbot ASF GitHub Bot added a comment - Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/3336#discussion_r102019357
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102022585

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
@@ -0,0 +1,579 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.contrib.streaming.state;
+
+import org.apache.flink.api.common.state.MapState;
+import org.apache.flink.api.common.state.MapStateDescriptor;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.api.java.tuple.Tuple4;
+import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
+import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
+import org.apache.flink.core.memory.DataInputViewStreamWrapper;
+import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
+import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
+import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
+import org.apache.flink.runtime.state.internal.InternalMapState;
+import org.apache.flink.util.Preconditions;
+import org.rocksdb.ColumnFamilyHandle;
+import org.rocksdb.RocksDB;
+import org.rocksdb.RocksDBException;
+import org.rocksdb.RocksIterator;
+import org.rocksdb.WriteOptions;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.Map;
+
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ * <p>
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
+ *
+ * @param <K> The type of the key.
+ * @param <N> The type of the namespace.
+ * @param <UK> The type of the keys in the map state.
+ * @param <UV> The type of the values in the map state.
+ */
+public class RocksDBMapState<K, N, UK, UV>
+	extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
+	implements InternalMapState<N, UK, UV> {
+
+	/** Serializer for the keys and values */
+	private final TypeSerializer<UK> userKeySerializer;
+	private final TypeSerializer<UV> userValueSerializer;
+
+	/**
+	 * We disable writes to the write-ahead-log here. We can't have these in the base class
+	 * because JNI segfaults for some reason if they are.
+	 */
+	private final WriteOptions writeOptions;
+
+	/**
+	 * Creates a new {@code RocksDBMapState}.
+	 *
+	 * @param namespaceSerializer The serializer for the namespace.
+	 * @param stateDesc The state identifier for the state.
+	 */
+	public RocksDBMapState(ColumnFamilyHandle columnFamily,
+			TypeSerializer<N> namespaceSerializer,
+			MapStateDescriptor<UK, UV> stateDesc,
+			RocksDBKeyedStateBackend<K> backend) {
+
+		super(columnFamily, namespaceSerializer, stateDesc, backend);
+
+		this.userKeySerializer = stateDesc.getKeySerializer();
+		this.userValueSerializer = stateDesc.getValueSerializer();
+
+		writeOptions = new WriteOptions();
+		writeOptions.setDisableWAL(true);
+	}
+
+	// ------------------------------------------------------------------------
+	//  MapState Implementation
+	// ------------------------------------------------------------------------
+
+	@Override
+	public UV get(UK userKey) throws IOException {
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+			byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+			return (rawValueBytes == null ? null : deserializeUserValue(rawValueBytes));
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while getting data from RocksDB.", e);
+		}
+	}
+
+	@Override
+	public void put(UK userKey, UV userValue) throws IOException {
+		if (userValue == null) {
+			remove(userKey);
+		}
+
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+			byte[] rawValueBytes = serializeUserValue(userValue);
+
+			backend.db.put(columnFamily, writeOptions, rawKeyBytes, rawValueBytes);
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while putting data into RocksDB", e);
+		}
+	}
+
+	@Override
+	public void remove(UK userKey) throws IOException {
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+
+			backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while removing data from RocksDB.", e);
+		}
+	}
+
+	@Override
+	public boolean contains(UK userKey) throws IOException {
+		try {
+			byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
+			byte[] rawValueBytes = backend.db.get(columnFamily, rawKeyBytes);
+
+			return (rawValueBytes != null);
+		} catch (RocksDBException e) {
+			throw new RuntimeException("Error while getting data from RocksDB", e);
+		}
+	}
+
+	@Override
+	public int size() throws IOException {
+		Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+		int count = 0;
+		while (iterator.hasNext()) {
+			count++;
+			iterator.next();
+		}
+
+		return count;
+	}
+
+	@Override
+	public Iterable<UK> keys() {
+		return new Iterable<UK>() {
+			@Override
+			public Iterator<UK> iterator() {
+				return new RocksDBMapIterator<UK>(backend.db, serializeCurrentKey()) {
+					@Override
+					public UK next() {
+						RocksDBMapEntry entry = nextEntry();
+						return (entry == null ? null : entry.getKey());
+					}
+				};
+			}
+		};
+	}
+
+	@Override
+	public Iterable<UV> values() {
+		return new Iterable<UV>() {
+			@Override
+			public Iterator<UV> iterator() {
+				return new RocksDBMapIterator<UV>(backend.db, serializeCurrentKey()) {
+					@Override
+					public UV next() {
+						RocksDBMapEntry entry = nextEntry();
+						return (entry == null ? null : entry.getValue());
+					}
+				};
+			}
+		};
+	}
+
+	@Override
+	public Iterator<Map.Entry<UK, UV>> iterator() {
+		return new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, serializeCurrentKey()) {
+			@Override
+			public Map.Entry<UK, UV> next() {
+				return nextEntry();
+			}
+		};
+	}
+
+	@Override
+	public Iterable<Map.Entry<UK, UV>> get() throws Exception {
+		final Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+		// Return null to make the behavior consistent with other states.
+		if (!iterator.hasNext()) {
+			return null;
+		} else {
+			return new Iterable<Map.Entry<UK, UV>>() {
+				@Override
+				public Iterator<Map.Entry<UK, UV>> iterator() {
+					return iterator;
+				}
+			};
+		}
+	}
+
+	@Override
+	public void add(Map<UK, UV> map) throws Exception {
+		if (map == null) {
+			return;
+		}
+
+		for (Map.Entry<UK, UV> entry : map.entrySet()) {
+			put(entry.getKey(), entry.getValue());
+		}
+	}
+
+	@Override
+	public void clear() {
+		Iterator<Map.Entry<UK, UV>> iterator = iterator();
+
+		while (iterator.hasNext()) {
+			iterator.next();
+			iterator.remove();
+		}
+	}
+
+	@Override
+	@SuppressWarnings("unchecked")
+	public byte[] getSerializedValue(byte[] serializedKeyAndNamespace) throws Exception {
+		Preconditions.checkNotNull(serializedKeyAndNamespace, "Serialized key and namespace");
+
+		//TODO make KvStateRequestSerializer key-group aware to save this round trip and key-group computation
+		Tuple2<K, N> des = KvStateRequestSerializer.deserializeKeyAndNamespace(
+				serializedKeyAndNamespace,
+				backend.getKeySerializer(),
+				namespaceSerializer);
+
+		int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(des.f0, backend.getNumberOfKeyGroups());
+
+		ByteArrayOutputStreamWithPos outputStream = new ByteArrayOutputStreamWithPos(128);
+		DataOutputViewStreamWrapper outputView = new DataOutputViewStreamWrapper(outputStream);
+
+		writeKeyWithGroupAndNamespace(keyGroup, des.f0, des.f1, outputStream, outputView);
+		byte[] keyPrefixBytes = outputStream.toByteArray();
+
+		Iterator<Map.Entry<UK, UV>> iterator = new RocksDBMapIterator<Map.Entry<UK, UV>>(backend.db, keyPrefixBytes) {
+			@Override
+			public Map.Entry<UK, UV> next() {
+				return nextEntry();
+			}
+		};
+
+		// Return null to make the behavior consistent
+		if (!iterator.hasNext()) {
+			return null;
+		}
+
+		outputStream.reset();
+
+		while (iterator.hasNext()) {
+			Map.Entry<UK, UV> entry = iterator.next();
+
+			userKeySerializer.serialize(entry.getKey(), outputView);
+			userValueSerializer.serialize(entry.getValue(), outputView);
+		}
+
+		return outputStream.toByteArray();
+	}
+
+	// ------------------------------------------------------------------------
+	//  Serialization Methods
+	// ------------------------------------------------------------------------
+
+	private byte[] serializeCurrentKey() {
+		try {
+			writeCurrentKeyWithGroupAndNamespace();
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the current key.");
+		}
+	}
+
+	private byte[] serializeUserKeyWithCurrentKeyAndNamespace(UK userKey) {
+		try {
+			writeCurrentKeyWithGroupAndNamespace();
+			userKeySerializer.serialize(userKey, keySerializationDataOutputView);
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the user key.", e);
+		}
+	}
+
+	private byte[] serializeUserKey(int keyGroup, K key, N namespace, UK userKey) {
+		try {
+			writeKeyWithGroupAndNamespace(keyGroup, key, namespace, keySerializationStream, keySerializationDataOutputView);
+			userKeySerializer.serialize(userKey, keySerializationDataOutputView);
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the user key.", e);
+		}
+	}
+
+	private byte[] serializeUserValue(UV userValue) {
+		try {
+			keySerializationStream.reset();
+			userValueSerializer.serialize(userValue, keySerializationDataOutputView);
+
+			return keySerializationStream.toByteArray();
+		} catch (IOException e) {
+			throw new RuntimeException("Error while serializing the user value.", e);
+		}
+	}
+
+	private Tuple4<Integer, K, N, UK> deserializeUserKey(byte[] rawKeyBytes) {

— End diff –

As far as I can see, all callers of this method are only interested in the user key. We could avoid some object-creation overhead if we only return the user key, in particular because this method is called very often.

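A rough sketch of this optimization, outside of Flink: instead of deserializing the key group, key, and namespace into a `Tuple4`, the caller skips over the serialized prefix and decodes only the user key. The fixed layout used here (int key group, long key, int namespace, then a UTF-encoded user key) is an illustrative stand-in for Flink's real serializers, not the actual wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in showing "deserialize only the user key":
// the prefix is skipped, so no tuple objects are allocated for the
// components the caller does not need.
public class UserKeyOnly {
    // Encode (keyGroup, key, namespace, userKey) with a fixed-size prefix.
    static byte[] encode(int keyGroup, long key, int namespace, String userKey) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(keyGroup);
        out.writeLong(key);
        out.writeInt(namespace);
        out.writeUTF(userKey);
        return bos.toByteArray();
    }

    // Skip the 16-byte prefix (int + long + int) and read only the user key.
    static String deserializeUserKey(byte[] rawKeyBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(rawKeyBytes));
        if (in.skipBytes(4 + 8 + 4) != 16) {
            throw new IOException("truncated key bytes");
        }
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = encode(3, 42L, 0, "user-key");
        if (!"user-key".equals(deserializeUserKey(raw))) throw new AssertionError();
    }
}
```

In Flink's actual format the prefix is not fixed-size, so a real implementation would still have to run the prefix deserializers; the saving is in not materializing the tuple, not in skipping the decode entirely.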
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101997876

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
@@ -0,0 +1,579 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.contrib.streaming.state;
+
+import org.apache.flink.api.common.state.MapState;
+import org.apache.flink.api.common.state.MapStateDescriptor;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.api.java.tuple.Tuple4;
+import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
+import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
+import org.apache.flink.core.memory.DataInputViewStreamWrapper;
+import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
+import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
+import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
+import org.apache.flink.runtime.state.internal.InternalMapState;
+import org.apache.flink.util.Preconditions;
+import org.rocksdb.ColumnFamilyHandle;
+import org.rocksdb.RocksDB;
+import org.rocksdb.RocksDBException;
+import org.rocksdb.RocksIterator;
+import org.rocksdb.WriteOptions;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.Map;
+
+/**
+ * {@link MapState} implementation that stores state in RocksDB.
+ * <p>
+ * <p>{@link RocksDBStateBackend} must ensure that we set the
+ * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
+ * we use the {@code merge()} call.
+ *
+ * @param <K> The type of the key.
+ * @param <N> The type of the namespace.
+ * @param <UK> The type of the keys in the map state.
+ * @param <UV> The type of the values in the map state.
+ */
+public class RocksDBMapState<K, N, UK, UV>
+	extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
+	implements InternalMapState<N, UK, UV> {
+
+	/** Serializer for the keys and values */
+	private final TypeSerializer<UK> userKeySerializer;
+	private final TypeSerializer<UV> userValueSerializer;
+
+	/**
+	 * We disable writes to the write-ahead-log here. We can't have these in the base class
+	 * because JNI segfaults for some reason if they are.
+	 */
+	private final WriteOptions writeOptions;

— End diff –

This comment sounds a bit scary to me. Do you have any idea why this is happening? Besides, I am afraid that this might be a potential memory leak: a native object is created here through JNI, but never explicitly closed.

          Show
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101997876

          — Diff: flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java —
          @@ -0,0 +1,579 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + * <p/>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p/>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.contrib.streaming.state;
          +
          +import org.apache.flink.api.common.state.MapState;
          +import org.apache.flink.api.common.state.MapStateDescriptor;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.java.tuple.Tuple2;
          +import org.apache.flink.api.java.tuple.Tuple3;
          +import org.apache.flink.api.java.tuple.Tuple4;
          +import org.apache.flink.core.memory.ByteArrayInputStreamWithPos;
          +import org.apache.flink.core.memory.ByteArrayOutputStreamWithPos;
          +import org.apache.flink.core.memory.DataInputViewStreamWrapper;
          +import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
          +import org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer;
          +import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
          +import org.apache.flink.runtime.state.internal.InternalMapState;
          +import org.apache.flink.util.Preconditions;
          +import org.rocksdb.ColumnFamilyHandle;
          +import org.rocksdb.RocksDB;
          +import org.rocksdb.RocksDBException;
          +import org.rocksdb.RocksIterator;
          +import org.rocksdb.WriteOptions;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link MapState} implementation that stores state in RocksDB.
          + * <p>
          + * <p>{@link RocksDBStateBackend} must ensure that we set the
          + * {@link org.rocksdb.StringAppendOperator} on the column family that we use for our state since
          + * we use the {@code merge()} call.
          + *
          + * @param <K> The type of the key.
          + * @param <N> The type of the namespace.
          + * @param <UK> The type of the keys in the map state.
          + * @param <UV> The type of the values in the map state.
          + */
          +public class RocksDBMapState<K, N, UK, UV>
          + extends AbstractRocksDBState<K, N, MapState<UK, UV>, MapStateDescriptor<UK, UV>, Map<UK, UV>>
          + implements InternalMapState<N, UK, UV> {
          +
          + /** Serializer for the keys and values */
          + private final TypeSerializer<UK> userKeySerializer;
          + private final TypeSerializer<UV> userValueSerializer;
          +
          + /**
          + * We disable writes to the write-ahead-log here. We can't have these in the base class
          + * because JNI segfaults for some reason if they are.
          + */
          + private final WriteOptions writeOptions;
          — End diff –

          This comment sounds a bit scary to me. Do you have any idea why this is happening? Besides I am afraid that this might be a potential memory leak. A native object is created here through JNI, but never explicitly closed.
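          The usual way to address such a leak is to make the owning state object explicitly dispose its native handles instead of relying on garbage collection. The sketch below is illustrative only: `NativeOptions` is a hypothetical stand-in for a JNI-backed handle such as RocksDB's `WriteOptions`, not Flink's actual fix.

```java
// Illustrative sketch: NativeOptions stands in for a JNI-backed object such as
// RocksDB's WriteOptions; a real wrapper would free native memory in close().
class NativeOptions implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // In a real JNI wrapper this would invoke the native disposal function.
        closed = true;
    }
}

public class NativeResourceExample {

    // The owner releases the native handle deterministically in a finally
    // block (or its own dispose()/close() method), never via finalization.
    static boolean disposePattern() {
        NativeOptions options = new NativeOptions();
        try {
            // ... use options for writes against the native store ...
        } finally {
            options.close(); // explicit disposal instead of waiting for GC
        }
        return options.isClosed();
    }

    public static void main(String[] args) {
        if (!disposePattern()) {
            throw new AssertionError("native handle was not released");
        }
    }
}
```

          The same pattern applies to any column family handles or iterators the state creates: whoever allocates the native object is responsible for closing it.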
          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101995663

          — Diff: flink-runtime/src/test/java/org/apache/flink/runtime/query/netty/message/KvStateRequestSerializerTest.java —
          @@ -410,6 +415,124 @@ public void testDeserializeListTooShort2() throws Exception {
          KvStateRequestSerializer.deserializeList(new byte[] {1, 1, 1, 1, 1, 1, 1, 1, 2, 3},
          LongSerializer.INSTANCE);
          }
          +
          + /**
          + * Tests map serialization utils.
          + */
          + @Test
          + public void testMapSerialization() throws Exception {
          + final long key = 0L;
          +
          + // objects for heap state list serialisation
          + final HeapKeyedStateBackend<Long> longHeapKeyedStateBackend =
          + new HeapKeyedStateBackend<>(
          + mock(TaskKvStateRegistry.class),
          + LongSerializer.INSTANCE,
          + ClassLoader.getSystemClassLoader(),
          + 1, new KeyGroupRange(0, 0)
          + );
          + longHeapKeyedStateBackend.setCurrentKey(key);
          +
          + final InternalMapState<VoidNamespace, Long, String> mapState = longHeapKeyedStateBackend.createMapState(
          + VoidNamespaceSerializer.INSTANCE,
          + new MapStateDescriptor<>("test", LongSerializer.INSTANCE, StringSerializer.INSTANCE));
          +
          + testMapSerialization(key, mapState);
          + }

          +
          + /**
          + * Verifies that the serialization of a map using the given map state
          + * matches the deserialization with {@link KvStateRequestSerializer#deserializeList}.
          + *
          + * @param key
          + * key of the map state
          + * @param mapState
          + * map state using the {@link VoidNamespace}, must also be a {@link InternalKvState} instance
          + *
          + * @throws Exception
          + */
          + public static void testMapSerialization(
          + final long key,
          + final InternalMapState<VoidNamespace, Long, String> mapState) throws Exception {
          +
          + TypeSerializer<Long> userKeySerializer = LongSerializer.INSTANCE;
          + TypeSerializer<String> userValueSerializer = StringSerializer.INSTANCE;
          + mapState.setCurrentNamespace(VoidNamespace.INSTANCE);
          +
          + // List
          + final int numElements = 10;
          +
          + final Map<Long, String> expectedValues = new HashMap<>();
          + for (int i = 0; i < numElements; i++) {
          + final long value = ThreadLocalRandom.current().nextLong();
          — End diff –

          Although it probably doesn't matter too much here, in general I would suggest to use random generators with a seed, so that in case a test fails, it is easier to reproduce the failing case.
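          The suggestion above can be sketched as follows; the seed value and helper method are illustrative, not taken from the PR:

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomExample {

    // With a fixed seed the generated test data is identical across runs, so a
    // failure can be replayed simply by logging the seed that was used.
    static long[] generateTestValues(long seed, int count) {
        Random random = new Random(seed);
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextLong();
        }
        return values;
    }

    public static void main(String[] args) {
        long[] first = generateTestValues(42L, 10);
        long[] second = generateTestValues(42L, 10);
        if (!Arrays.equals(first, second)) {
            throw new AssertionError("same seed must reproduce the same sequence");
        }
    }
}
```

          In contrast, `ThreadLocalRandom.current()` is seeded internally and cannot be re-seeded, so a failing sequence is unrecoverable.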

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r102002192

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/state/MapState.java —
          @@ -0,0 +1,111 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.state;
          +
          +import org.apache.flink.annotation.PublicEvolving;
          +
          +import java.io.IOException;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link State} interface for partitioned key-value state. The key-value pair can be
          + * added, updated and retrieved.
          + *
          + * <p>The state is accessed and modified by user functions, and checkpointed consistently
          + * by the system as part of the distributed snapshots.
          + *
          + * <p>The state is only accessible by functions applied on a KeyedDataStream. The key is
          + * automatically supplied by the system, so the function always sees the value mapped to the
          + * key of the current element. That way, the system can handle stream and state partitioning
          + * consistently together.
          + *
          + * @param <UK> Type of the keys in the state.
          + * @param <UV> Type of the values in the state.
          + */
          +@PublicEvolving
          +public interface MapState<UK, UV> extends AppendingState<Map<UK, UV>, Iterable<Map.Entry<UK, UV>>> {
          — End diff –

          Thanks 👍

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101987352

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/state/MapState.java —
          @@ -0,0 +1,111 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.state;
          +
          +import org.apache.flink.annotation.PublicEvolving;
          +
          +import java.io.IOException;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link State} interface for partitioned key-value state. The key-value pair can be
          + * added, updated and retrieved.
          + *
          + * <p>The state is accessed and modified by user functions, and checkpointed consistently
          + * by the system as part of the distributed snapshots.
          + *
          + * <p>The state is only accessible by functions applied on a KeyedDataStream. The key is
          + * automatically supplied by the system, so the function always sees the value mapped to the
          + * key of the current element. That way, the system can handle stream and state partitioning
          + * consistently together.
          + *
          + * @param <UK> Type of the keys in the state.
          + * @param <UV> Type of the values in the state.
          + */
          +@PublicEvolving
          +public interface MapState<UK, UV> extends AppendingState<Map<UK, UV>, Iterable<Map.Entry<UK, UV>>> {
          — End diff –

          I agree that it's `MultiMapState`, instead of `MapState`, that is supposed to be an `AppendingState`.

          I will update the interface hierarchy, making `MapState` not an `AppendingState`.

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101983507

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/state/MapState.java —
          @@ -0,0 +1,111 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.state;
          +
          +import org.apache.flink.annotation.PublicEvolving;
          +
          +import java.io.IOException;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link State} interface for partitioned key-value state. The key-value pair can be
          + * added, updated and retrieved.
          + *
          + * <p>The state is accessed and modified by user functions, and checkpointed consistently
          + * by the system as part of the distributed snapshots.
          + *
          + * <p>The state is only accessible by functions applied on a KeyedDataStream. The key is
          + * automatically supplied by the system, so the function always sees the value mapped to the
          + * key of the current element. That way, the system can handle stream and state partitioning
          + * consistently together.
          + *
          + * @param <UK> Type of the keys in the state.
          + * @param <UV> Type of the values in the state.
          + */
          +@PublicEvolving
          +public interface MapState<UK, UV> extends AppendingState<Map<UK, UV>, Iterable<Map.Entry<UK, UV>>> {
          — End diff –

          Exactly, the semantics are different from how they are described on `AppendingState` and from how it is used. `MapState` can still have a `add(Map<UK, UV> map)` method but I don't think it makes sense that it is an `AppendingState`. What are the use cases where a user is expected to have an `AppendingState<Map<K, V>, Map<K, V>>` instead of simply having a `MapState<K, V>`?

          As I said before, the semantics of `MapState.add()` would be like `ValueState.update()` and `ValueState` is also not an `AppendingState`. I think a `MultiMapState` that would be like a `Map<K, Iterable<V>>` (and would have the semantics of a `ListState`) would be able to satisfy `AppendingState` but I'm not sure whether we would need the interface there.
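          The hierarchy this discussion converges on can be sketched roughly like this; the `State` marker interface and the heap-backed implementation are simplified stand-ins for illustration, not Flink's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in; Flink's real State interface lives in
// org.apache.flink.api.common.state and carries more methods.
interface State {
    void clear();
}

// MapState exposes per-entry access directly instead of extending
// AppendingState<Map<UK, UV>, ...>: put() overwrites an existing entry,
// like ValueState.update(), rather than appending to it.
interface MapState<UK, UV> extends State {
    UV get(UK key);
    void put(UK key, UV value);
    void remove(UK key);
    boolean contains(UK key);
    Iterable<Map.Entry<UK, UV>> entries();
}

// Minimal in-memory implementation, just to make the semantics concrete.
class HeapMapState<UK, UV> implements MapState<UK, UV> {
    private final Map<UK, UV> entries = new HashMap<>();

    public UV get(UK key) { return entries.get(key); }
    public void put(UK key, UV value) { entries.put(key, value); }
    public void remove(UK key) { entries.remove(key); }
    public boolean contains(UK key) { return entries.containsKey(key); }
    public Iterable<Map.Entry<UK, UV>> entries() { return entries.entrySet(); }
    public void clear() { entries.clear(); }
}

public class MapStateSketch {
    public static void main(String[] args) {
        MapState<Long, String> state = new HeapMapState<>();
        state.put(1L, "a");
        state.put(1L, "b"); // overwrite semantics, not append
        if (!"b".equals(state.get(1L))) throw new AssertionError();
        if (state.contains(2L)) throw new AssertionError();
    }
}
```

          With this shape, `put()` has `ValueState.update()`-like overwrite semantics per user key, which is why the interface no longer needs to claim it is "appending".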

          githubbot ASF GitHub Bot added a comment -

          Github user shixiaogang commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101936792

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/state/MapState.java —
          @@ -0,0 +1,111 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.state;
          +
          +import org.apache.flink.annotation.PublicEvolving;
          +
          +import java.io.IOException;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link State} interface for partitioned key-value state. The key-value pair can be
          + * added, updated and retrieved.
          + *
          + * <p>The state is accessed and modified by user functions, and checkpointed consistently
          + * by the system as part of the distributed snapshots.
          + *
          + * <p>The state is only accessible by functions applied on a KeyedDataStream. The key is
          + * automatically supplied by the system, so the function always sees the value mapped to the
          + * key of the current element. That way, the system can handle stream and state partitioning
          + * consistently together.
          + *
          + * @param <UK> Type of the keys in the state.
          + * @param <UV> Type of the values in the state.
          + */
          +@PublicEvolving
          +public interface MapState<UK, UV> extends AppendingState<Map<UK, UV>, Iterable<Map.Entry<UK, UV>>> {
          — End diff –

          `MapState` provides the `add` method which puts a collection of key-value pairs into the state. Though the semantics may be a little different from those of existing `AppendingState`s, I think it's okay for `MapState` to be an `AppendingState` because the interface does not enforce any restriction on the modification of previous data.

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101750130

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/typeutils/base/MapSerializer.java —
          @@ -0,0 +1,179 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.typeutils.base;
          +
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.core.memory.DataInputView;
          +import org.apache.flink.core.memory.DataOutputView;
          +import org.apache.flink.util.Preconditions;
          +
          +import java.io.IOException;
          +import java.util.HashMap;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
          + * A serializer for {@link List Lists}. The serializer relies on an element serializer
          — End diff –

          Wrong Javadoc
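          For context, a map serializer of this kind typically writes a size prefix followed by the entries. The sketch below shows that wire-format idea with plain `DataOutputStream`/`DataInputStream` rather than Flink's `TypeSerializer` machinery (the real `MapSerializer` additionally handles null values and delegates to configurable key/value serializers):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MapWireFormat {

    // Write the entry count, then each key/value pair in turn. Updating a
    // single entry in this format requires rewriting the whole map, which is
    // exactly the per-entry cost that MapState is meant to avoid.
    static byte[] serialize(Map<Long, String> map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(map.size());
        for (Map.Entry<Long, String> entry : map.entrySet()) {
            out.writeLong(entry.getKey());
            out.writeUTF(entry.getValue());
        }
        return bytes.toByteArray();
    }

    static Map<Long, String> deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        Map<Long, String> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(in.readLong(), in.readUTF());
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Map<Long, String> original = new HashMap<>();
        original.put(1L, "one");
        original.put(2L, "two");
        if (!original.equals(deserialize(serialize(original)))) {
            throw new AssertionError("round trip must preserve the map");
        }
    }
}
```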

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101750262

          — Diff: flink-core/src/main/java/org/apache/flink/api/java/typeutils/MapTypeInfo.java —
          @@ -0,0 +1,147 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.java.typeutils;
          +
          +import org.apache.flink.annotation.PublicEvolving;
          +import org.apache.flink.api.common.ExecutionConfig;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.common.typeutils.base.MapSerializer;
          +import org.apache.flink.util.Preconditions;
          +
          +import java.util.Map;
          +
          +import static org.apache.flink.util.Preconditions.checkNotNull;
          +
          +/**
          + * Type information for the map types of the JAVA API.
          — End diff –

This could be something like "Special {@code TypeInformation} for use by {@link MapStateDescriptor}."

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101749935

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/state/MapState.java —
          @@ -0,0 +1,111 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.state;
          +
          +import org.apache.flink.annotation.PublicEvolving;
          +
          +import java.io.IOException;
          +import java.util.Iterator;
          +import java.util.Map;
          +
          +/**
          + * {@link State} interface for partitioned key-value state. The key-value pair can be
          + * added, updated and retrieved.
          + *
          + * <p>The state is accessed and modified by user functions, and checkpointed consistently
          + * by the system as part of the distributed snapshots.
          + *
          + * <p>The state is only accessible by functions applied on a KeyedDataStream. The key is
          + * automatically supplied by the system, so the function always sees the value mapped to the
          + * key of the current element. That way, the system can handle stream and state partitioning
          + * consistently together.
          + *
          + * @param <UK> Type of the keys in the state.
          + * @param <UV> Type of the values in the state.
          + */
          +@PublicEvolving
          +public interface MapState<UK, UV> extends AppendingState<Map<UK, UV>, Iterable<Map.Entry<UK, UV>>> {
          — End diff –

          What's the reason for making `MapState` an `AppendingState`? `AppendingState` is meant for incorporating elements into the existing state, for example to aggregate something or to accumulate in a bag/list. A `MapState` is more like a `ValueState` in that data that is put in (potentially) replaces previous data.
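The distinction the reviewer draws can be made concrete with a small standalone sketch (plain Java, no Flink dependency; the class and method names here are illustrative, not Flink's actual API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch of the two state semantics discussed above:
 * map-style state replaces the previous value for a user key on put,
 * while appending-style state accumulates every added element.
 */
public class MapVsAppendingSketch {

    /** Map-style semantics: put overwrites the entry for the key. */
    static Map<String, Integer> mapStyle() {
        Map<String, Integer> state = new HashMap<>();
        state.put("k", 1);
        state.put("k", 2); // replaces 1, ValueState-like per user key
        return state;
    }

    /** Appending semantics: add retains every element. */
    static List<Integer> appendingStyle() {
        List<Integer> state = new ArrayList<>();
        state.add(1);
        state.add(2); // both elements are kept
        return state;
    }

    public static void main(String[] args) {
        System.out.println(mapStyle());       // {k=2}
        System.out.println(appendingStyle()); // [1, 2]
    }
}
```

Under map semantics each user key behaves like its own `ValueState`, which is the reviewer's argument against extending `AppendingState`.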

          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3336#discussion_r101750000

          — Diff: flink-core/src/main/java/org/apache/flink/api/common/state/MapStateDescriptor.java —
          @@ -0,0 +1,132 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.api.common.state;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.common.typeutils.TypeSerializer;
          +import org.apache.flink.api.common.typeutils.base.MapSerializer;
          +import org.apache.flink.api.java.typeutils.MapTypeInfo;
          +
          +import java.util.Map;
          +
          +public class MapStateDescriptor<UK, UV> extends StateDescriptor<MapState<UK, UV>, Map<UK, UV>> {
          — End diff –

          Missing Javadoc

          githubbot ASF GitHub Bot added a comment -

          GitHub user shixiaogang opened a pull request:

          https://github.com/apache/flink/pull/3336

          FLINK-4856 [state] Add MapState in KeyedState

          1. Add `MapState` and `MapStateDescriptor`
          2. Implementation of `MapState` in `HeapKeyedStateBackend` and `RocksDBKeyedStateBackend`.
          3. Add accessors to `MapState` in `RuntimeContext`

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/alibaba/flink flink-4856

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3336.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3336


          commit 430b4f596acbff0a9dfdc20fbb2430a8fad819f9
          Author: xiaogang.sxg <xiaogang.sxg@alibaba-inc.com>
          Date: 2017-02-17T03:19:18Z

          Add MapState in KeyedState


          aljoscha Aljoscha Krettek added a comment -

          Thanks for taking care of this! Please let me know if you run into any issues/need a code review.

          xiaogang.shi Xiaogang Shi added a comment - edited

          I have started the implementation of MapState. But prior to that, I think we need some modifications to the current implementation to clarify the concepts. I have opened two JIRA issues describing these problems; see FLINK-5023 and FLINK-5024 for the details.


            People

            • Assignee: Xiaogang Shi (xiaogang.shi)
            • Reporter: Xiaogang Shi (xiaogang.shi)
            • Votes: 0
            • Watchers: 8

            Dates

            • Created:
              Updated:
              Resolved: