tableNames, Throwable t);
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/doc-files/system-overview.dot b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/doc-files/system-overview.dot
deleted file mode 100644
index c5a8dbd..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/doc-files/system-overview.dot
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-digraph "API Usage" {
- nodesep=1.2;
-
- DATA [label="ACID\ndataset",shape=oval,style=filled,color="gray"];
- CHANGES [label="Changed\ndata",shape=oval,style=filled,color="gray"];
-
- META_STORE [label="Hive\nMetaStore",shape=box,style=filled,color="darkseagreen3"];
- HIVE_CLI [label="Hive\nCLI",shape=box,style=filled,color="darkseagreen3"];
-
- MERGE1 [label="Compute\nmutations\n(your code)",shape=box,style=filled,color="khaki1"];
- SORT [label="Group\n& sort\n(your code)",shape=box,style=filled,color="khaki1"];
- CLIENT [label="Mutator\nclient",shape=box,style=filled,color="lightblue"];
- BUCKET [label="Bucket ID\nappender",shape=box,style=filled,color="lightblue"];
- COORD [label="Mutator\ncoordinator",shape=box,style=filled,color="lightblue"];
- CLIENT -> COORD [label="Provides\nconf to"];
- CLIENT -> BUCKET [label="Provides\nconf to"];
-
- CLIENT -> META_STORE [label="Manages\ntxns using"];
- CHANGES -> MERGE1 [label="Reads ∆s\nfrom"];
- DATA -> MERGE1 [label="Reads\nROW__IDs\nfrom"];
- BUCKET -> MERGE1 [label="Appends ids\nto inserts"];
- MERGE1 -> SORT;
- SORT -> COORD [label="Issues\nmutations to"];
- COORD -> DATA [label="Writes to"];
- DATA -> HIVE_CLI [label="Read by"];
- META_STORE -> DATA [label="Compacts"];
-}
\ No newline at end of file
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/package.html b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/package.html
deleted file mode 100644
index 7bc75c0..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/package.html
+++ /dev/null
@@ -1,520 +0,0 @@
-
-
-
-
-
-
-
-
-HCatalog Streaming Mutation API
-
-
-
-
-HCatalog Streaming Mutation API -- high level description
-@deprecated as of Hive 3.0.0
-Background
-
-In certain data processing use cases it is necessary to modify existing
-data when new facts arrive. An example of this is the classic ETL merge
-where a copy of a data set is kept in sync with a master by the frequent
-application of deltas. The deltas describe the mutations (inserts,
-updates, deletes) that have occurred to the master since the previous
-sync. To implement such a case using Hadoop traditionally demands that
-the partitions containing records targeted by the mutations be
-rewritten. This is a coarse approach; a partition containing millions of
-records might be rebuilt because of a single record change. Additionally
-these partitions cannot be restated atomically; at some point the old
-partition data must be swapped with the new partition data. When this
-swap occurs, usually by issuing an HDFS rm followed by a mv, the
-possibility exists that the data appears to be unavailable and hence any
-downstream jobs consuming the data might unexpectedly fail.
-Therefore data processing patterns that restate raw data on HDFS cannot
-operate robustly without some external mechanism to orchestrate
-concurrent access to changing data.
-
-
-
-The availability of ACID tables in Hive provides a mechanism that both
-enables concurrent access to data stored in HDFS (so long as it's in the
-ORC+ACID format), and also permits row-level mutation of records within
-a table, without the need to rewrite the existing data. But while Hive
-itself supports INSERT, UPDATE and DELETE commands, and the ORC format can support large batches of mutations in a
-transaction, Hive's execution engine currently submits each individual
-mutation operation in a separate transaction and issues table scans (M/R
-jobs) to execute them. It does not currently scale to the demands of
-processing large deltas in an atomic manner. Furthermore it would be
-advantageous to extend atomic batch mutation capabilities beyond Hive by
-making them available to other data processing frameworks. The Streaming
-Mutation API does just this.
-
-
-The Streaming Mutation API, although similar to the Streaming
-API, has a number of differences and is built to enable very different
-use cases. Superficially, the Streaming API can only write new data
-whereas the mutation API can also modify existing data. However, the two
-APIs are also based on very different transaction models. The Streaming API
-focuses on surfacing a continuous stream of new data into a Hive table
-and does so by batching small sets of writes into multiple short-lived
-transactions. Conversely the mutation API is designed to infrequently
-apply large sets of mutations to a data set in an atomic fashion; all
-mutations will either be applied or they will not. This instead mandates
-the use of a single long-lived transaction. This table summarises the
-attributes of each API:
-
-
-
-
-| Attribute | Streaming API | Mutation API |
-| --- | --- | --- |
-| Ingest type | Data arrives continuously | Ingests are performed periodically and the mutations are applied in a single batch |
-| Transaction scope | Transactions are created for small batches of writes | The entire set of mutations should be applied within a single transaction |
-| Data availability | Surfaces new data to users frequently and quickly | Change sets should be applied atomically: either the effect of the delta is visible or it is not |
-| Sensitive to record order | No, records do not have pre-existing lastTxnIds or bucketIds. Records are likely being written into a single partition (today's date for example) | Yes, all mutated records have existing RecordIdentifiers and must be grouped by (partitionValues, bucketId) and sorted by lastTxnId. These record coordinates initially arrive in an order that is effectively random. |
-| Impact of a write failure | Transaction can be aborted and the producer can choose to resubmit failed records as ordering is not important | Ingest for the respective transaction must be halted and failed records resubmitted to preserve sequence |
-| User perception of missing data | Data has not arrived yet → "latency?" | "This data is inconsistent, some records have been updated, but other related records have not" - consider here the classic transfer between bank accounts scenario |
-| API end point scope | A given HiveEndPoint instance submits many transactions to a specific bucket, in a specific partition, of a specific table | A set of MutationCoordinators writes changes to an unknown set of buckets, of an unknown set of partitions, of specific tables (can be more than one), within a single transaction |
-
-
-
-
-Structure
-The API comprises two main concerns: transaction management, and
-the writing of mutation operations to the data set. The two concerns
-have a minimal coupling as it is expected that transactions will be
-initiated from a single job-launcher-type process while the writing of
-mutations will be scaled out across any number of worker nodes. In the
-context of Hadoop M/R these can be more concretely defined as the Tool
-and Map/Reduce task components. However, use of this architecture is not
-mandated and in fact both concerns could be handled within a single
-simple process depending on the requirements.
-
-Note that a suitably configured Hive instance is required to
-operate this system even if you do not intend to access the data from
-within Hive. Internally, transactions are managed by the Hive MetaStore.
-Mutations are written directly to HDFS via ORC APIs that bypass the MetaStore.
-Additionally you may wish to configure your MetaStore instance to
-perform periodic data compactions.
-
-
-Note on packaging: The APIs are defined in the org.apache.hive.hcatalog.streaming.mutate
-Java package and shipped in the hive-hcatalog-streaming jar.
-
-
-Data requirements
-
-Generally speaking, to apply a mutation to a record one must have some
-unique key that identifies the record. However, primary keys are not a
-construct provided by Hive. Internally Hive uses
-RecordIdentifiers
-stored in a virtual
-ROW__ID
-column to uniquely identify records within an ACID table. Therefore,
-any process that wishes to issue mutations to a table via this API must
-have available the corresponding row ids for the target records. What
-this means in practice is that the process issuing mutations must first
-read in a current snapshot of the data and then join the mutations on some
-domain specific primary key to obtain the corresponding Hive
-ROW__ID
-. This is effectively what occurs within Hive's table scan process when
-an UPDATE or DELETE statement is executed. The AcidInputFormat
-provides access to this data via AcidRecordReader.getRecordIdentifier().
-
-
-
-The implementation of the ACID format places some constraints on the
-order in which records are written and it is important that this
-ordering is enforced. Additionally, data must be grouped appropriately
-to adhere to the constraints imposed by the
-OrcRecordUpdater
-. Grouping also makes it possible to parallelise the writing of mutations
-for the purposes of scaling. Finally, to correctly bucket new records
-(inserts) there is a slightly unintuitive trick that must be applied.
-
-
-All of these data sequencing concerns are the responsibility of
-the client process calling the API which is assumed to have first class
-grouping and sorting capabilities (Hadoop Map/Reduce etc.) The streaming
-API provides nothing more than validators that fail fast when they
-encounter groups and records that are out of sequence.
-
-In short, API client processes should prepare data for the mutate
-API like so (a sketch follows the list):
-
-- MUST: Order records by ROW__ID.originalTxn, then ROW__ID.rowId.
-- MUST: Assign a ROW__ID containing a computed bucketId to records to be inserted.
-- SHOULD: Group/partition by table partition value, then ROW__ID.bucketId.
-
-
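-
-For illustration only, here is a minimal sketch of a composite key that a client merge job
-might emit with each mutation so that a subsequent shuffle/sort yields the required grouping
-and ordering. The MutationKey type and its fields are hypothetical and not part of this API:
-
-import java.util.List;
-
-/** Hypothetical composite key: group by (partition, bucketId), sort by (originalTxn, rowId). */
-final class MutationKey implements Comparable<MutationKey> {
-
-  final String partitionKey;  // '/'-joined table partition values (grouping)
-  final int bucketId;         // bucket id, computed for inserts (grouping)
-  final long originalTxnId;   // ROW__ID.originalTxn (sorting)
-  final long rowId;           // ROW__ID.rowId (sorting)
-
-  MutationKey(List<String> partitionValues, int bucketId, long originalTxnId, long rowId) {
-    this.partitionKey = String.join("/", partitionValues);
-    this.bucketId = bucketId;
-    this.originalTxnId = originalTxnId;
-    this.rowId = rowId;
-  }
-
-  @Override
-  public int compareTo(MutationKey o) {
-    int cmp = partitionKey.compareTo(o.partitionKey);
-    if (cmp == 0) cmp = Integer.compare(bucketId, o.bucketId);
-    if (cmp == 0) cmp = Long.compare(originalTxnId, o.originalTxnId);
-    if (cmp == 0) cmp = Long.compare(rowId, o.rowId);
-    return cmp;
-  }
-}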
-
-The addition of bucket ids to insert records prior to grouping and
-sorting seems unintuitive. However, it is required both to ensure
-adequate partitioning of new data and bucket allocation consistent with
-that provided by Hive. In a typical ETL the majority of mutation events
-are inserts, often targeting a single partition (new data for the
-previous day, hour, etc.). If more than one worker is writing said
-events, were we to leave the bucket id empty then all inserts would go
-to a single worker (e.g: reducer) and the workload could be heavily
-skewed. The assignment of a computed bucket allows inserts to be more
-usefully distributed across workers. Additionally, when Hive is working
-with the data it may expect records to have been bucketed in a way that
-is consistent with its own internal scheme. A convenience type and
-method are provided to more easily compute and append bucket ids:
-BucketIdResolver and BucketIdResolverImpl.
-
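-
-As a minimal sketch, and assuming a client record layout in which the ROW__ID field sits at
-index 0 and the only bucketed column at index 2 (both indexes are illustrative and refer to
-your record structure), bucket ids might be attached like this before grouping and sorting:
-
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
-import org.apache.hive.hcatalog.streaming.mutate.worker.BucketIdResolver;
-import org.apache.hive.hcatalog.streaming.mutate.worker.BucketIdResolverImpl;
-
-/** Sketch: attach a computed bucket id to a record that is due to be inserted. */
-static Object withBucketId(ObjectInspector recordObjectInspector, int totalBuckets, Object record) {
-  BucketIdResolver resolver =
-      new BucketIdResolverImpl(recordObjectInspector, 0, totalBuckets, new int[] { 2 });
-  return resolver.attachBucketIdToRecord(record); // sets a synthetic ROW__ID carrying the bucket id
-}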
-
-Update operations should not attempt to modify values of
-partition or bucketing columns. The API does not prevent this and such
-attempts could lead to data corruption.
-
-Streaming requirements
-A few things are currently required to use streaming.
-
-
-
-- Currently, only ORC storage format is supported. So 'stored
-as orc' must be specified during table creation.
-
-- The Hive table must be bucketed, but not sorted. So something
-like 'clustered by (colName) into 10 buckets
-' must be specified during table creation.
-
-- The user of the client streaming process must have the necessary
-permissions to write to the table or partition and create partitions in
-the table.
-- Settings required in hive-site.xml for the Metastore (see the sketch after this list):
-
-- hive.txn.manager =
-org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
-- hive.support.concurrency = true
-- hive.compactor.initiator.on = true
-- hive.compactor.worker.threads > 0
-
-
-
-
-
-
-Note: Streaming mutations to unpartitioned tables is also
-supported.
-
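-
-For reference, a minimal sketch of the same settings applied programmatically to a HiveConf
-(for example when embedding a metastore in tests); in a normal deployment these belong in the
-metastore's hive-site.xml as listed above:
-
-import org.apache.hadoop.hive.conf.HiveConf;
-
-/** Sketch: the transactional settings listed above, applied to a HiveConf instance. */
-static HiveConf acidEnabledConf() {
-  HiveConf conf = new HiveConf();
-  conf.set("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
-  conf.set("hive.support.concurrency", "true");
-  conf.set("hive.compactor.initiator.on", "true");
-  conf.set("hive.compactor.worker.threads", "1"); // any value > 0
-  return conf;
-}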
-
-Record layout
-
-The structure, layout, and encoding of records is the exclusive concern
-of the client ETL mutation process and may be quite different from the
-target Hive ACID table. The mutation API requires concrete
-implementations of the
-MutatorFactory
-and
-Mutator
-classes to extract pertinent data from records and serialize data into
-the ACID files. Fortunately base classes are provided (
-AbstractMutator
-,
-RecordInspectorImpl
-) to simplify this effort and usually all that is required is the
-specification of a suitable
-ObjectInspector
-and the provision of the indexes of the
-ROW__ID
-and bucketed columns within the record structure. Note that all column
-indexes in these classes are with respect to your record structure, not
-the Hive table structure.
-
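-
-For illustration, a hypothetical client record type (not part of the API) showing how record
-structure indexes relate to the values handed to these components; here the ROW__ID is field 0
-and the bucketed column is field 2:
-
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-
-/** Hypothetical client record; field indexes refer to this structure, not the Hive table. */
-public class MutableRecord {
-  public RecordIdentifier rowId; // index 0: the ROW__ID of the target row (synthetic for inserts)
-  public String name;            // index 1: ordinary data column
-  public long id;                // index 2: the bucketed column
-}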
-
-You will likely also want to use a
-BucketIdResolver
-to append bucket ids to new records for insertion. Fortunately the core
-implementation is provided in
-BucketIdResolverImpl
-but note that bucket column indexes must be presented in the same order
-as they are in the Hive table definition to ensure consistent bucketing.
-Note that you cannot move records between buckets and an exception will
-be thrown if you attempt to do so. In real terms this means that you
-should not attempt to modify the values in bucket columns with an
-UPDATE
-.
-
-
-Connection and Transaction management
-
-The
-MutatorClient
-class is used to create and manage transactions in which mutations can
-be performed. The scope of a transaction can extend across multiple ACID
-tables. When a client connects it communicates with the meta store to
-verify and acquire meta data for the target tables. An invocation of
-newTransaction
-then opens a transaction with the meta store, finalizes a collection of
-AcidTables
-and returns a new
-Transaction
-instance. The acid tables are light-weight, serializable objects that
-are used by the mutation writing components of the API to target
-specific ACID file locations. Usually your
-MutatorClient
-will be running on some master node and your coordinators on worker
-nodes. In this event the
-AcidTableSerializer
-can be used to encode the tables in a more transportable form, for use
-as a
-Configuration
-property for example.
-
-
-As you would expect, a
-Transaction
-must be initiated with a call to
-begin
-before any mutations can be applied. This invocation acquires a lock on
-the targeted tables using the meta store, and initiates a heartbeat to
-prevent transaction timeouts. It is highly recommended that you register
-a
-LockFailureListener
-with the client so that your process can handle any lock or transaction
-failures. Typically you may wish to abort the job in the event of such
-an error. With the transaction in place you can now start streaming
-mutations with one or more
-MutatorCoordinator
-instances (more on this later), and can finally
-commit
-or
-abort
-the transaction when the change set has been applied, which will release
-the lock with the meta store client. Finally you should
-close
-the mutation client to release any held resources.
-
-
-The
-MutatorClientBuilder
-is provided to simplify the construction of clients.
-
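-
-A minimal sketch of the launcher-side lifecycle described above. The addSinkTable,
-newTransaction, begin, commit, abort and close calls are taken from this document; the other
-client and builder method names, and the package location of the client classes, are
-assumptions based on the behaviour described here:
-
-import java.util.List;
-
-import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
-import org.apache.hive.hcatalog.streaming.mutate.client.MutatorClient;
-import org.apache.hive.hcatalog.streaming.mutate.client.MutatorClientBuilder;
-import org.apache.hive.hcatalog.streaming.mutate.client.Transaction;
-
-/** Sketch: manage a single long-lived transaction from the job launcher process. */
-static void runMergeTransaction(String metaStoreUri) throws Exception {
-  MutatorClient client = new MutatorClientBuilder()
-      .metaStoreUri(metaStoreUri)                   // assumed builder option (cf. MutatorCoordinatorBuilder)
-      // .lockFailureListener(myListener)           // recommended; assumed to be registered via the builder
-      .addSinkTable("my_db", "my_acid_table", true) // true => coordinators may create partitions
-      .build();
-  client.connect();                                 // assumed: connects to the meta store
-
-  Transaction transaction = client.newTransaction();
-  List<AcidTable> tables = client.getTables();      // assumed accessor; ship these to the workers
-  transaction.begin();
-  try {
-    // ... workers apply mutations with MutatorCoordinators (see "Writing data") ...
-    transaction.commit();
-  } catch (Exception e) {
-    transaction.abort();
-    throw e;
-  } finally {
-    client.close();
-  }
-}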
-
-
-WARNING: Hive doesn't currently have a deadlock detector (it is
-being worked on as part of HIVE-9675).
-This API could potentially deadlock with other stream writers or with
-SQL users.
-
-Writing data
-
-
-The
-MutatorCoordinator
-class is used to issue mutations to an ACID table. You will require at
-least one instance per table participating in the transaction. The
-target of a given instance is defined by the respective
-AcidTable
-used to construct the coordinator. It is recommended that a
-MutatorCoordinatorBuilder
-is used to simplify the construction process.
-
-
-
-Mutations can be applied by invoking the respective
-insert
-,
-update
-, and
-delete
-methods on the coordinator. These methods each take as parameters the
-target partition of the record and the mutated record. In the case of an
-unpartitioned table you should simply pass an empty list as the
-partition value. For inserts specifically, only the bucket id will be
-extracted from the
-RecordIdentifier
-; the writeId and rowId will be ignored and replaced by
-appropriate values in the
-RecordUpdater
-. Additionally, in the case of deletes, everything but the
-RecordIdentifier
-in the record will be ignored and therefore it is often easier to simply
-submit the original record.
-
-
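-
-A minimal worker-side sketch using the MutatorCoordinatorBuilder and coordinator methods from
-this patch; the partition values and record variables are illustrative, and the records are
-assumed to arrive already grouped and sorted as described above:
-
-import java.util.Arrays;
-import java.util.List;
-
-import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinator;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinatorBuilder;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorFactory;
-
-/** Sketch: one coordinator per AcidTable participating in the transaction. */
-static void applyMutations(String metaStoreUri, AcidTable acidTable, MutatorFactory mutatorFactory,
-    Object updatedRecord, Object deletedRecord, Object newRecord) throws Exception {
-  MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
-      .metaStoreUri(metaStoreUri)
-      .table(acidTable)
-      .mutatorFactory(mutatorFactory)
-      .deleteDeltaIfExists()      // useful where task retries re-write the same delta
-      .build();
-
-  List<String> partition = Arrays.asList("2016", "01", "01"); // illustrative partition values
-  coordinator.update(partition, updatedRecord);  // records must already be grouped and sorted
-  coordinator.delete(partition, deletedRecord);  // only the ROW__ID of a delete is consulted
-  coordinator.insert(partition, newRecord);      // bucket id is taken from the synthetic ROW__ID
-
-  coordinator.close();
-}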
-
-Caution: As mentioned previously, mutations must arrive in
-a specific order for the resultant table data to be consistent.
-Coordinators will verify a naturally ordered sequence of
-(writeId, rowId) and will throw an exception if this sequence
-is broken. This exception should almost certainly be escalated so that
-the transaction is aborted. This, along with the correct ordering of the
-data, is the responsibility of the client using the API.
-
-
-Dynamic Partition Creation:
-
-It is very likely to be desirable to have new partitions created
-automatically (say on an hourly basis). In such cases, requiring the Hive
-admin to pre-create the necessary partitions may not be reasonable. The
-API allows coordinators to create partitions as needed (see:
-MutatorClientBuilder.addSinkTable(String, String, boolean)
-). Partition creation being an atomic action, multiple coordinators can
-race to create the partition, but only one would succeed, so
-coordinator clients need not synchronize when creating a partition. The
-user of the coordinator process needs to be given write permissions on
-the Hive table in order to create partitions.
-
-
-Care must be taken when using this option as it requires that the
-coordinators maintain a connection with the meta store database. When
-coordinators are running in a distributed environment (as is likely the
-case) it is possible for them to overwhelm the meta store. In such cases it
-may be better to disable partition creation and collect a set of
-affected partitions as part of your ETL merge process. These can then be
-created with a single meta store connection in your client code, once
-the cluster side merge process is complete.
-
-Finally, note that when partition creation is disabled the coordinators
-must synthesize the partition URI as they cannot retrieve it from the
-meta store. This may cause problems if the layout of your partitions in
-HDFS does not follow the Hive standard (as implemented in
-
-org.apache.hadoop.hive.metastore.Warehouse.getPartitionPath(Path,
-LinkedHashMap<String, String>)).
-
-
-Reading data
-
-
-Although this API is concerned with writing changes to data, as
-previously stated we'll almost certainly have to read the existing data
-first to obtain the relevant
-ROW__IDs
-. Therefore it is worth noting that reading ACID data in a robust and
-consistent manner requires the following (a sketch follows the list):
-
-- Obtaining a valid transaction list from the meta store (ValidTxnList).
-- Acquiring a lock with the meta store and issuing heartbeats (LockImpl can help with this).
-- Configuring the OrcInputFormat and then reading the data. Make sure that you also pull in the ROW__ID values. See: AcidRecordReader.getRecordIdentifier.
-- Releasing the lock.
-
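-
-A hedged sketch of the first step (obtaining the transaction list and exposing it to the
-reader configuration); the configuration key is an assumption, and the locking, heart-beating
-and the OrcInputFormat read itself are left as comments:
-
-import org.apache.hadoop.hive.common.ValidTxnList;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hive.hcatalog.common.HCatUtil;
-
-/** Sketch: fetch the valid transaction list and place it on the reader configuration. */
-static void configureValidTxns(HiveConf conf) throws Exception {
-  IMetaStoreClient metaStoreClient = HCatUtil.getHiveMetastoreClient(conf);
-  try {
-    ValidTxnList validTxns = metaStoreClient.getValidTxns();
-    // Assumption: the property consulted by the ORC/ACID readers for the snapshot.
-    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxns.writeToString());
-  } finally {
-    metaStoreClient.close();
-  }
-  // Next: acquire a lock and heartbeat, configure OrcInputFormat with this conf and read,
-  // obtaining ROW__IDs via AcidRecordReader.getRecordIdentifier(), then release the lock.
-}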
-
-
-Example
-
-
-
-So to recap, the sequence of events required to apply mutations
-to a dataset using the API is:
-
-- Create a MutatorClient to manage a transaction for the targeted ACID tables. This set of tables should include any transactional destinations or sources. Don't forget to register a LockFailureListener so that you can handle transaction failures.
-- Open a new Transaction with the client.
-- Get the AcidTables from the client.
-- Begin the transaction.
-- Create at least one MutatorCoordinator for each table. The AcidTableSerializer can help you transport the AcidTables when your workers are in a distributed environment.
-- Compute your mutation set (this is your ETL merge process).
-- Optionally: collect the set of affected partitions.
-- Append bucket ids to insertion records. A BucketIdResolver can help here.
-- Group and sort your data appropriately.
-- Issue mutation events to your coordinators.
-- Close your coordinators.
-- Abort or commit the transaction.
-- Close your mutation client.
-- Optionally: create any affected partitions that do not exist in the meta store.
-
-
-See ExampleUseCase and TestMutations.testUpdatesAndDeletes() for some very simple usages.
-
-
-
-
-
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdException.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdException.java
deleted file mode 100644
index 040fce3..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdException.java
+++ /dev/null
@@ -1,28 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-public class BucketIdException extends WorkerException {
-
- private static final long serialVersionUID = 1L;
-
- BucketIdException(String message) {
- super(message);
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
deleted file mode 100644
index 3432baa..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolver.java
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-/** Computes and appends bucket ids to records that are due to be inserted.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public interface BucketIdResolver {
-
- Object attachBucketIdToRecord(Object record);
-
- /** See: {@link org.apache.hadoop.hive.ql.exec.ReduceSinkOperator#computeBucketNumber(Object, int)}. */
- int computeBucketId(Object record);
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
deleted file mode 100644
index 1d51d85..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdResolverImpl.java
+++ /dev/null
@@ -1,93 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.util.List;
-
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.ql.io.BucketCodec;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
-import org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector;
-import org.apache.hadoop.hive.serde2.objectinspector.StructField;
-
-/**
- * Implementation of a {@link BucketIdResolver} that includes the logic required to calculate a bucket id from a record
- * that is consistent with Hive's own internal computation scheme.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public class BucketIdResolverImpl implements BucketIdResolver {
-
- private static final long INVALID_TRANSACTION_ID = -1L;
- private static final long INVALID_ROW_ID = -1L;
-
- private final SettableStructObjectInspector structObjectInspector;
- private final StructField[] bucketFields;
- private final int totalBuckets;
- private final StructField recordIdentifierField;
-
- /**
- * Note that all column indexes are with respect to your record structure, not the Hive table structure. Bucket column
- * indexes must be presented in the same order as they are in the Hive table definition.
- */
- public BucketIdResolverImpl(ObjectInspector objectInspector, int recordIdColumn, int totalBuckets, int[] bucketColumns) {
- this.totalBuckets = totalBuckets;
- if (!(objectInspector instanceof SettableStructObjectInspector)) {
- throw new IllegalArgumentException("Serious problem, expected a StructObjectInspector, " + "but got a "
- + objectInspector.getClass().getName());
- }
-
- if (bucketColumns.length < 1) {
- throw new IllegalArgumentException("No bucket column indexes set.");
- }
- structObjectInspector = (SettableStructObjectInspector) objectInspector;
- List<? extends StructField> structFields = structObjectInspector.getAllStructFieldRefs();
-
- recordIdentifierField = structFields.get(recordIdColumn);
-
- bucketFields = new StructField[bucketColumns.length];
- for (int i = 0; i < bucketColumns.length; i++) {
- int bucketColumnsIndex = bucketColumns[i];
- bucketFields[i] = structFields.get(bucketColumnsIndex);
- }
- }
-
- @Override
- public Object attachBucketIdToRecord(Object record) {
- int bucketId = computeBucketId(record);
- int bucketProperty =
- BucketCodec.V1.encode(new AcidOutputFormat.Options(null).bucket(bucketId));
- RecordIdentifier recordIdentifier = new RecordIdentifier(INVALID_TRANSACTION_ID, bucketProperty, INVALID_ROW_ID);
- structObjectInspector.setStructFieldData(record, recordIdentifierField, recordIdentifier);
- return record;
- }
-
- @Override
- public int computeBucketId(Object record) {
- Object[] bucketFieldValues = new Object[bucketFields.length];
- ObjectInspector[] bucketFiledInspectors = new ObjectInspector[bucketFields.length];
- for (int columnIndex = 0; columnIndex < bucketFields.length; columnIndex++) {
- bucketFieldValues[columnIndex] = structObjectInspector.getStructFieldData(record, bucketFields[columnIndex]);
- bucketFiledInspectors[columnIndex] = bucketFields[columnIndex].getFieldObjectInspector();
- }
- return ObjectInspectorUtils.getBucketNumber(bucketFieldValues, bucketFiledInspectors, totalBuckets);
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/GroupRevisitedException.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/GroupRevisitedException.java
deleted file mode 100644
index ffa8c3e..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/GroupRevisitedException.java
+++ /dev/null
@@ -1,28 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-public class GroupRevisitedException extends WorkerException {
-
- private static final long serialVersionUID = 1L;
-
- GroupRevisitedException(String message) {
- super(message);
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/GroupingValidator.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/GroupingValidator.java
deleted file mode 100644
index f28b8ff..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/GroupingValidator.java
+++ /dev/null
@@ -1,91 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Objects;
-import java.util.Set;
-
-/**
- * Tracks the (partition, bucket) combinations that have been encountered, checking that a group is not revisited.
- * Potentially memory intensive.
- */
-class GroupingValidator {
-
- private final Map<String, Set<Integer>> visited;
- private final StringBuilder partitionKeyBuilder;
- private long groups;
- private String lastPartitionKey;
- private int lastBucketId = -1;
-
- GroupingValidator() {
- visited = new HashMap<String, Set<Integer>>();
- partitionKeyBuilder = new StringBuilder(64);
- }
-
- /**
- * Checks that this group is either the same as the last or is a new group.
- */
- boolean isInSequence(List<String> partitionValues, int bucketId) {
- String partitionKey = getPartitionKey(partitionValues);
- if (Objects.equals(lastPartitionKey, partitionKey) && lastBucketId == bucketId) {
- return true;
- }
- lastPartitionKey = partitionKey;
- lastBucketId = bucketId;
-
- Set<Integer> bucketIdSet = visited.get(partitionKey);
- if (bucketIdSet == null) {
- // If the bucket id set component of this data structure proves to be too large there is the
- // option of moving it to Trove or HPPC in an effort to reduce size.
- bucketIdSet = new HashSet<>();
- visited.put(partitionKey, bucketIdSet);
- }
-
- boolean newGroup = bucketIdSet.add(bucketId);
- if (newGroup) {
- groups++;
- }
- return newGroup;
- }
-
- private String getPartitionKey(List<String> partitionValues) {
- partitionKeyBuilder.setLength(0);
- boolean first = true;
- for (String element : partitionValues) {
- if (first) {
- first = false;
- } else {
- partitionKeyBuilder.append('/');
- }
- partitionKeyBuilder.append(element);
- }
- String partitionKey = partitionKeyBuilder.toString();
- return partitionKey;
- }
-
- @Override
- public String toString() {
- return "GroupingValidator [groups=" + groups + ",lastPartitionKey=" + lastPartitionKey + ",lastBucketId="
- + lastBucketId + "]";
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MetaStorePartitionHelper.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MetaStorePartitionHelper.java
deleted file mode 100644
index fb88f2d..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MetaStorePartitionHelper.java
+++ /dev/null
@@ -1,119 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.IOException;
-import java.util.List;
-
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.Warehouse;
-import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
-import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
-import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.thrift.TException;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/**
- * A {@link PartitionHelper} implementation that uses the {@link IMetaStoreClient meta store} to both create partitions
- * and obtain information concerning partitions. Exercise care when using this from within workers that are running in a
- * cluster as it may overwhelm the meta store database instance. As an alternative, consider using the
- * {@link WarehousePartitionHelper}, collecting the affected partitions as an output of your merge job, and then
- * retrospectively adding partitions in your client.
- */
-class MetaStorePartitionHelper implements PartitionHelper {
-
- private static final Logger LOG = LoggerFactory.getLogger(MetaStorePartitionHelper.class);
-
- private final IMetaStoreClient metaStoreClient;
- private final String databaseName;
- private final String tableName;
- private final Path tablePath;
-
- MetaStorePartitionHelper(IMetaStoreClient metaStoreClient, String databaseName, String tableName, Path tablePath) {
- this.metaStoreClient = metaStoreClient;
- this.tablePath = tablePath;
- this.databaseName = databaseName;
- this.tableName = tableName;
- }
-
- /** Returns the expected {@link Path} for a given partition value. */
- @Override
- public Path getPathForPartition(List<String> newPartitionValues) throws WorkerException {
- if (newPartitionValues.isEmpty()) {
- LOG.debug("Using path {} for unpartitioned table {}.{}", tablePath, databaseName, tableName);
- return tablePath;
- } else {
- try {
- String location = metaStoreClient
- .getPartition(databaseName, tableName, newPartitionValues)
- .getSd()
- .getLocation();
- LOG.debug("Found path {} for partition {}", location, newPartitionValues);
- return new Path(location);
- } catch (NoSuchObjectException e) {
- throw new WorkerException("Table not found '" + databaseName + "." + tableName + "'.", e);
- } catch (TException e) {
- throw new WorkerException("Failed to get path for partitions '" + newPartitionValues + "' on table '"
- + databaseName + "." + tableName + "' with meta store: " + metaStoreClient, e);
- }
- }
- }
-
- /** Creates the specified partition if it does not already exist. Does nothing if the table is unpartitioned. */
- @Override
- public void createPartitionIfNotExists(List<String> newPartitionValues) throws WorkerException {
- if (newPartitionValues.isEmpty()) {
- return;
- }
-
- try {
- LOG.debug("Attempting to create partition (if not exists) {}.{}:{}", databaseName, tableName, newPartitionValues);
- Table table = metaStoreClient.getTable(databaseName, tableName);
-
- Partition partition = new Partition();
- partition.setDbName(table.getDbName());
- partition.setTableName(table.getTableName());
- StorageDescriptor partitionSd = new StorageDescriptor(table.getSd());
- partitionSd.setLocation(table.getSd().getLocation() + Path.SEPARATOR
- + Warehouse.makePartName(table.getPartitionKeys(), newPartitionValues));
- partition.setSd(partitionSd);
- partition.setValues(newPartitionValues);
-
- metaStoreClient.add_partition(partition);
- } catch (AlreadyExistsException e) {
- LOG.debug("Partition already exisits: {}.{}:{}", databaseName, tableName, newPartitionValues);
- } catch (NoSuchObjectException e) {
- LOG.error("Failed to create partition : " + newPartitionValues, e);
- throw new PartitionCreationException("Table not found '" + databaseName + "." + tableName + "'.", e);
- } catch (TException e) {
- LOG.error("Failed to create partition : " + newPartitionValues, e);
- throw new PartitionCreationException("Failed to create partition '" + newPartitionValues + "' on table '"
- + databaseName + "." + tableName + "'", e);
- }
- }
-
- @Override
- public void close() throws IOException {
- metaStoreClient.close();
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/Mutator.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/Mutator.java
deleted file mode 100644
index e6f968e..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/Mutator.java
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.Closeable;
-import java.io.Flushable;
-import java.io.IOException;
-
-/**
- * Interface for submitting mutation events to a given partition and bucket in an ACID table. Requires records to arrive
- * in the order defined by the {@link SequenceValidator}.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public interface Mutator extends Closeable, Flushable {
-
- void insert(Object record) throws IOException;
-
- void update(Object record) throws IOException;
-
- void delete(Object record) throws IOException;
-
- void flush() throws IOException;
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
deleted file mode 100644
index 67785d0..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinator.java
+++ /dev/null
@@ -1,300 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.Closeable;
-import java.io.Flushable;
-import java.io.IOException;
-import java.util.Collections;
-import java.util.List;
-import java.util.Objects;
-
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.common.JavaUtils;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.ql.io.AcidUtils;
-import org.apache.hadoop.hive.ql.io.BucketCodec;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.ql.io.RecordUpdater;
-import org.apache.hadoop.util.ReflectionUtils;
-import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/**
- * Orchestrates the application of an ordered sequence of mutation events to a given ACID table. Events must be grouped
- * by partition, then bucket and ordered by origTxnId, then rowId. Ordering is enforced by the {@link SequenceValidator}
- * and grouping is by the {@link GroupingValidator}. An acid delta file is created for each combination of partition and
- * bucket id (a single write id is implied). Once a delta file has been closed it cannot be reopened. Therefore
- * care is needed to group the data correctly, otherwise failures will occur if a delta belonging to a group has been
- * previously closed. The {@link MutatorCoordinator} will seamlessly handle transitions between groups, creating and
- * closing {@link Mutator Mutators} as needed to write to the appropriate partition and bucket. New partitions will be
- * created in the meta store if {@link AcidTable#createPartitions()} is set.
- *
- * {@link #insert(List, Object) Insert} events must be artificially assigned appropriate bucket ids in the preceding
- * grouping phase so that they are grouped correctly. Note that any write id or row id assigned to the
- * {@link RecordIdentifier RecordIdentifier} of such events will be ignored by both the coordinator and the underlying
- * {@link RecordUpdater}.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public class MutatorCoordinator implements Closeable, Flushable {
-
- private static final Logger LOG = LoggerFactory.getLogger(MutatorCoordinator.class);
-
- private final MutatorFactory mutatorFactory;
- private final GroupingValidator groupingValidator;
- private final SequenceValidator sequenceValidator;
- private final AcidTable table;
- private final RecordInspector recordInspector;
- private final PartitionHelper partitionHelper;
- private final AcidOutputFormat<?, ?> outputFormat;
- private final BucketIdResolver bucketIdResolver;
- private final HiveConf configuration;
- private final boolean deleteDeltaIfExists;
-
- private int bucketId;
- private List<String> partitionValues;
- private Path partitionPath;
- private Mutator mutator;
-
- MutatorCoordinator(HiveConf configuration, MutatorFactory mutatorFactory, PartitionHelper partitionHelper,
- AcidTable table, boolean deleteDeltaIfExists) throws WorkerException {
- this(configuration, mutatorFactory, partitionHelper, new GroupingValidator(), new SequenceValidator(), table,
- deleteDeltaIfExists);
- }
-
- /** Visible for testing only. */
- MutatorCoordinator(HiveConf configuration, MutatorFactory mutatorFactory, PartitionHelper partitionHelper,
- GroupingValidator groupingValidator, SequenceValidator sequenceValidator, AcidTable table,
- boolean deleteDeltaIfExists) throws WorkerException {
- this.configuration = configuration;
- this.mutatorFactory = mutatorFactory;
- this.partitionHelper = partitionHelper;
- this.groupingValidator = groupingValidator;
- this.sequenceValidator = sequenceValidator;
- this.table = table;
- this.deleteDeltaIfExists = deleteDeltaIfExists;
- this.recordInspector = this.mutatorFactory.newRecordInspector();
- bucketIdResolver = this.mutatorFactory.newBucketIdResolver(table.getTotalBuckets());
-
- bucketId = -1;
- outputFormat = createOutputFormat(table.getOutputFormatName(), configuration);
- }
-
- /**
- * We expect records grouped by (partitionValues,bucketId) and ordered by (origWriteId,rowId).
- *
- * @throws BucketIdException The bucket ID in the {@link RecordIdentifier} of the record does not match that computed
- * using the values in the record's bucketed columns.
- * @throws RecordSequenceException A record was submitted that was not in the correct ascending (origWriteId, rowId)
- * sequence.
- * @throws GroupRevisitedException If an event was submitted for a (partition, bucketId) combination that has already
- * been closed.
- * @throws PartitionCreationException Could not create a new partition in the meta store.
- * @throws WorkerException
- */
- public void insert(List<String> partitionValues, Object record) throws WorkerException {
- reconfigureState(OperationType.INSERT, partitionValues, record);
- try {
- mutator.insert(record);
- LOG.debug("Inserted into partition={}, record={}", partitionValues, record);
- } catch (IOException e) {
- throw new WorkerException("Failed to insert record '" + record + " using mutator '" + mutator + "'.", e);
- }
- }
-
- /**
- * We expect records grouped by (partitionValues,bucketId) and ordered by (origWriteId,rowId).
- *
- * @throws BucketIdException The bucket ID in the {@link RecordIdentifier} of the record does not match that computed
- * using the values in the record's bucketed columns.
- * @throws RecordSequenceException A record was submitted that was not in the correct ascending (origWriteId, rowId)
- * sequence.
- * @throws GroupRevisitedException If an event was submitted for a (partition, bucketId) combination that has already
- * been closed.
- * @throws PartitionCreationException Could not create a new partition in the meta store.
- * @throws WorkerException
- */
- public void update(List<String> partitionValues, Object record) throws WorkerException {
- reconfigureState(OperationType.UPDATE, partitionValues, record);
- try {
- mutator.update(record);
- LOG.debug("Updated in partition={}, record={}", partitionValues, record);
- } catch (IOException e) {
- throw new WorkerException("Failed to update record '" + record + " using mutator '" + mutator + "'.", e);
- }
- }
-
- /**
- * We expect records grouped by (partitionValues,bucketId) and ordered by (origWriteId,rowId).
- *
- * @throws BucketIdException The bucket ID in the {@link RecordIdentifier} of the record does not match that computed
- * using the values in the record's bucketed columns.
- * @throws RecordSequenceException A record was submitted that was not in the correct ascending (origWriteId, rowId)
- * sequence.
- * @throws GroupRevisitedException If an event was submitted for a (partition, bucketId) combination that has already
- * been closed.
- * @throws PartitionCreationException Could not create a new partition in the meta store.
- * @throws WorkerException
- */
- public void delete(List<String> partitionValues, Object record) throws WorkerException {
- reconfigureState(OperationType.DELETE, partitionValues, record);
- try {
- mutator.delete(record);
- LOG.debug("Deleted from partition={}, record={}", partitionValues, record);
- } catch (IOException e) {
- throw new WorkerException("Failed to delete record '" + record + " using mutator '" + mutator + "'.", e);
- }
- }
-
- @Override
- public void close() throws IOException {
- try {
- if (mutator != null) {
- mutator.close();
- }
- } finally {
- partitionHelper.close();
- }
- }
-
- @Override
- public void flush() throws IOException {
- if (mutator != null) {
- mutator.flush();
- }
- }
-
- private void reconfigureState(OperationType operationType, List<String> newPartitionValues, Object record)
- throws WorkerException {
- RecordIdentifier newRecordIdentifier = extractRecordIdentifier(operationType, newPartitionValues, record);
- int newBucketId = newRecordIdentifier.getBucketProperty();
-
- if (newPartitionValues == null) {
- newPartitionValues = Collections.emptyList();
- }
-
- try {
- if (partitionHasChanged(newPartitionValues)) {
- if (table.createPartitions() && operationType == OperationType.INSERT) {
- partitionHelper.createPartitionIfNotExists(newPartitionValues);
- }
- Path newPartitionPath = partitionHelper.getPathForPartition(newPartitionValues);
- resetMutator(newBucketId, newPartitionValues, newPartitionPath);
- } else if (bucketIdHasChanged(newBucketId)) {
- resetMutator(newBucketId, partitionValues, partitionPath);
- } else {
- validateRecordSequence(operationType, newRecordIdentifier);
- }
- } catch (IOException e) {
- throw new WorkerException("Failed to reset mutator when performing " + operationType + " of record: " + record, e);
- }
- }
-
- private RecordIdentifier extractRecordIdentifier(OperationType operationType, List<String> newPartitionValues,
- Object record) throws BucketIdException {
- RecordIdentifier recordIdentifier = recordInspector.extractRecordIdentifier(record);
- int bucketIdFromRecord = BucketCodec.determineVersion(
- recordIdentifier.getBucketProperty()).decodeWriterId(recordIdentifier.getBucketProperty());
- int computedBucketId = bucketIdResolver.computeBucketId(record);
- if (operationType != OperationType.DELETE && bucketIdFromRecord != computedBucketId) {
- throw new BucketIdException("RecordIdentifier.bucketId != computed bucketId (" + computedBucketId
- + ") for record " + recordIdentifier + " in partition " + newPartitionValues + ".");
- }
- return recordIdentifier;
- }
-
- private void resetMutator(int newBucketId, List<String> newPartitionValues, Path newPartitionPath)
- throws IOException, GroupRevisitedException {
- if (mutator != null) {
- mutator.close();
- }
- validateGrouping(newPartitionValues, newBucketId);
- sequenceValidator.reset();
- if (deleteDeltaIfExists) {
- // TODO: Should this be the concern of the mutator?
- deleteDeltaIfExists(newPartitionPath, table.getWriteId(), newBucketId);
- }
- mutator = mutatorFactory.newMutator(outputFormat, table.getWriteId(), newPartitionPath, newBucketId);
- bucketId = newBucketId;
- partitionValues = newPartitionValues;
- partitionPath = newPartitionPath;
- LOG.debug("Reset mutator: bucketId={}, partition={}, partitionPath={}", bucketId, partitionValues, partitionPath);
- }
-
- private boolean partitionHasChanged(List<String> newPartitionValues) {
- boolean partitionHasChanged = !Objects.equals(this.partitionValues, newPartitionValues);
- if (partitionHasChanged) {
- LOG.debug("Partition changed from={}, to={}", this.partitionValues, newPartitionValues);
- }
- return partitionHasChanged;
- }
-
- private boolean bucketIdHasChanged(int newBucketId) {
- boolean bucketIdHasChanged = this.bucketId != newBucketId;
- if (bucketIdHasChanged) {
- LOG.debug("Bucket ID changed from={}, to={}", this.bucketId, newBucketId);
- }
- return bucketIdHasChanged;
- }
-
- private void validateGrouping(List<String> newPartitionValues, int newBucketId) throws GroupRevisitedException {
- if (!groupingValidator.isInSequence(newPartitionValues, bucketId)) {
- throw new GroupRevisitedException("Group out of sequence: state=" + groupingValidator + ", partition="
- + newPartitionValues + ", bucketId=" + newBucketId);
- }
- }
-
- private void validateRecordSequence(OperationType operationType, RecordIdentifier newRecordIdentifier)
- throws RecordSequenceException {
- boolean identiferOutOfSequence = operationType != OperationType.INSERT
- && !sequenceValidator.isInSequence(newRecordIdentifier);
- if (identiferOutOfSequence) {
- throw new RecordSequenceException("Records not in sequence: state=" + sequenceValidator + ", recordIdentifier="
- + newRecordIdentifier);
- }
- }
-
- @SuppressWarnings("unchecked")
- private AcidOutputFormat<?, ?> createOutputFormat(String outputFormatName, HiveConf configuration)
- throws WorkerException {
- try {
- return (AcidOutputFormat<?, ?>) ReflectionUtils.newInstance(JavaUtils.loadClass(outputFormatName), configuration);
- } catch (ClassNotFoundException e) {
- throw new WorkerException("Could not locate class for '" + outputFormatName + "'.", e);
- }
- }
-
- /* A delta may be present from a previous failed task attempt. */
- private void deleteDeltaIfExists(Path partitionPath, long writeId, int bucketId) throws IOException {
- Path deltaPath = AcidUtils.createFilename(partitionPath,
- new AcidOutputFormat.Options(configuration)
- .bucket(bucketId)
- .minimumWriteId(writeId)
- .maximumWriteId(writeId));
- FileSystem fileSystem = deltaPath.getFileSystem(configuration);
- if (fileSystem.exists(deltaPath)) {
- LOG.info("Deleting existing delta path: {}", deltaPath);
- fileSystem.delete(deltaPath, false);
- }
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinatorBuilder.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinatorBuilder.java
deleted file mode 100644
index 698ba7c..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorCoordinatorBuilder.java
+++ /dev/null
@@ -1,121 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.IOException;
-import java.util.ArrayList;
-import java.util.List;
-
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.security.UserGroupInformation;
-import org.apache.hive.hcatalog.common.HCatUtil;
-import org.apache.hive.hcatalog.streaming.mutate.HiveConfFactory;
-import org.apache.hive.hcatalog.streaming.mutate.UgiMetaStoreClientFactory;
-import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
-
-/** Convenience class for building {@link MutatorCoordinator} instances.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public class MutatorCoordinatorBuilder {
-
- private HiveConf configuration;
- private MutatorFactory mutatorFactory;
- private UserGroupInformation authenticatedUser;
- private String metaStoreUri;
- private AcidTable table;
- private boolean deleteDeltaIfExists;
-
- public MutatorCoordinatorBuilder configuration(HiveConf configuration) {
- this.configuration = configuration;
- return this;
- }
-
- public MutatorCoordinatorBuilder authenticatedUser(UserGroupInformation authenticatedUser) {
- this.authenticatedUser = authenticatedUser;
- return this;
- }
-
- public MutatorCoordinatorBuilder metaStoreUri(String metaStoreUri) {
- this.metaStoreUri = metaStoreUri;
- return this;
- }
-
- /** Set the destination ACID table for this client. */
- public MutatorCoordinatorBuilder table(AcidTable table) {
- this.table = table;
- return this;
- }
-
- /**
- * If the delta file already exists, delete it. This is useful in a MapReduce setting where a number of task retries
- * will attempt to write the same delta file.
- */
- public MutatorCoordinatorBuilder deleteDeltaIfExists() {
- this.deleteDeltaIfExists = true;
- return this;
- }
-
- public MutatorCoordinatorBuilder mutatorFactory(MutatorFactory mutatorFactory) {
- this.mutatorFactory = mutatorFactory;
- return this;
- }
-
- public MutatorCoordinator build() throws WorkerException, MetaException {
- configuration = HiveConfFactory.newInstance(configuration, this.getClass(), metaStoreUri);
-
- PartitionHelper partitionHelper;
- if (table.createPartitions()) {
- partitionHelper = newMetaStorePartitionHelper();
- } else {
- partitionHelper = newWarehousePartitionHelper();
- }
-
- return new MutatorCoordinator(configuration, mutatorFactory, partitionHelper, table, deleteDeltaIfExists);
- }
-
- private PartitionHelper newWarehousePartitionHelper() throws MetaException, WorkerException {
- String location = table.getTable().getSd().getLocation();
- Path tablePath = new Path(location);
- List<FieldSchema> partitionFields = table.getTable().getPartitionKeys();
- List<String> partitionColumns = new ArrayList<>(partitionFields.size());
- for (FieldSchema field : partitionFields) {
- partitionColumns.add(field.getName());
- }
- return new WarehousePartitionHelper(configuration, tablePath, partitionColumns);
- }
-
- private PartitionHelper newMetaStorePartitionHelper() throws MetaException, WorkerException {
- String user = authenticatedUser == null ? System.getProperty("user.name") : authenticatedUser.getShortUserName();
- boolean secureMode = authenticatedUser == null ? false : authenticatedUser.hasKerberosCredentials();
- try {
- IMetaStoreClient metaStoreClient = new UgiMetaStoreClientFactory(metaStoreUri, configuration, authenticatedUser,
- user, secureMode).newInstance(HCatUtil.getHiveMetastoreClient(configuration));
- String tableLocation = table.getTable().getSd().getLocation();
- Path tablePath = new Path(tableLocation);
- return new MetaStorePartitionHelper(metaStoreClient, table.getDatabaseName(), table.getTableName(), tablePath);
- } catch (IOException e) {
- throw new WorkerException("Could not create meta store client.", e);
- }
- }
-
-}
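For reference, here is a minimal usage sketch of this builder. The HiveConf, AcidTable, MutatorFactory and metastore URI are assumed to be supplied by your own merge job; the class and method names introduced here are purely illustrative.

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinator;
import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinatorBuilder;
import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorFactory;

public class CoordinatorExample {
  // Builds a coordinator for a single worker; typically called once per task attempt.
  public static MutatorCoordinator newCoordinator(HiveConf conf, String metaStoreUri,
      AcidTable table, MutatorFactory mutatorFactory) throws Exception {
    return new MutatorCoordinatorBuilder()
        .configuration(conf)
        .metaStoreUri(metaStoreUri)      // e.g. "thrift://metastore-host:9083" (illustrative)
        .table(table)                    // destination ACID table obtained from the mutator client
        .mutatorFactory(mutatorFactory)  // caller-supplied MutatorFactory implementation
        .deleteDeltaIfExists()           // optional: tolerate MapReduce task retries
        .build();
  }
}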
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorFactory.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorFactory.java
deleted file mode 100644
index d3d3210..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorFactory.java
+++ /dev/null
@@ -1,38 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.IOException;
-
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-
-/**
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public interface MutatorFactory {
-
- Mutator newMutator(AcidOutputFormat<?, ?> outputFormat, long writeId, Path partitionPath, int bucketId)
- throws IOException;
-
- RecordInspector newRecordInspector();
-
- BucketIdResolver newBucketIdResolver(int totalBuckets);
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorImpl.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorImpl.java
deleted file mode 100644
index 1e0cb72..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/MutatorImpl.java
+++ /dev/null
@@ -1,114 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.IOException;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.ql.io.BucketCodec;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.ql.io.RecordUpdater;
-import org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
-
-/** Base {@link Mutator} implementation. Creates a suitable {@link RecordUpdater} and delegates mutation events.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public class MutatorImpl implements Mutator {
-
- private final long writeId;
- private final Path partitionPath;
- private final int bucketProperty;
- private final Configuration configuration;
- private final int recordIdColumn;
- private final ObjectInspector objectInspector;
- private RecordUpdater updater;
-
- /**
- * @param bucketProperty - from existing {@link RecordIdentifier#getBucketProperty()}
- * @throws IOException
- */
- public MutatorImpl(Configuration configuration, int recordIdColumn, ObjectInspector objectInspector,
- AcidOutputFormat<?, ?> outputFormat, long writeId, Path partitionPath, int bucketProperty) throws IOException {
- this.configuration = configuration;
- this.recordIdColumn = recordIdColumn;
- this.objectInspector = objectInspector;
- this.writeId = writeId;
- this.partitionPath = partitionPath;
- this.bucketProperty = bucketProperty;
-
- updater = createRecordUpdater(outputFormat);
- }
-
- @Override
- public void insert(Object record) throws IOException {
- updater.insert(writeId, record);
- }
-
- @Override
- public void update(Object record) throws IOException {
- updater.update(writeId, record);
- }
-
- @Override
- public void delete(Object record) throws IOException {
- updater.delete(writeId, record);
- }
-
- /**
- * This implementation intentionally does nothing at this time. We only use a single transaction and
- * {@link OrcRecordUpdater#flush()} will purposefully throw an exception in this instance. We keep this here in the
- * event that we support multiple transactions and to make it clear that the omission of an invocation of
- * {@link OrcRecordUpdater#flush()} was not a mistake.
- */
- @Override
- public void flush() throws IOException {
- // Intentionally do nothing
- }
-
- @Override
- public void close() throws IOException {
- updater.close(false);
- updater = null;
- }
-
- @Override
- public String toString() {
- return "ObjectInspectorMutator [writeId=" + writeId + ", partitionPath=" + partitionPath
- + ", bucketId=" + bucketProperty + "]";
- }
-
- protected RecordUpdater createRecordUpdater(AcidOutputFormat<?, ?> outputFormat) throws IOException {
- int bucketId = BucketCodec
- .determineVersion(bucketProperty).decodeWriterId(bucketProperty);
- return outputFormat.getRecordUpdater(
- partitionPath,
- new AcidOutputFormat.Options(configuration)
- .inspector(objectInspector)
- .bucket(bucketId)
- .minimumWriteId(writeId)
- .maximumWriteId(writeId)
- .recordIdColumn(recordIdColumn)
- .finalDestination(partitionPath)
- .statementId(-1));
- }
-
-}
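To show how the pieces above are typically wired together, here is a minimal sketch of a MutatorFactory implementation built around MutatorImpl and RecordInspectorImpl. The class name and constructor arguments are illustrative, and the BucketIdResolver instance is assumed to be supplied by the caller.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hive.hcatalog.streaming.mutate.worker.BucketIdResolver;
import org.apache.hive.hcatalog.streaming.mutate.worker.Mutator;
import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorFactory;
import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorImpl;
import org.apache.hive.hcatalog.streaming.mutate.worker.RecordInspector;
import org.apache.hive.hcatalog.streaming.mutate.worker.RecordInspectorImpl;

public class ExampleMutatorFactory implements MutatorFactory {

  private final Configuration configuration;
  private final int recordIdColumn;
  private final ObjectInspector objectInspector;
  private final BucketIdResolver bucketIdResolver;

  public ExampleMutatorFactory(Configuration configuration, int recordIdColumn,
      ObjectInspector objectInspector, BucketIdResolver bucketIdResolver) {
    this.configuration = configuration;
    this.recordIdColumn = recordIdColumn;     // index of the ROW__ID column in your record structure
    this.objectInspector = objectInspector;   // inspector describing your record class
    this.bucketIdResolver = bucketIdResolver; // caller-supplied bucket id computation
  }

  @Override
  public Mutator newMutator(AcidOutputFormat<?, ?> outputFormat, long writeId, Path partitionPath,
      int bucketId) throws IOException {
    // Delegate to the base implementation, which creates the underlying RecordUpdater.
    return new MutatorImpl(configuration, recordIdColumn, objectInspector, outputFormat,
        writeId, partitionPath, bucketId);
  }

  @Override
  public RecordInspector newRecordInspector() {
    return new RecordInspectorImpl(objectInspector, recordIdColumn);
  }

  @Override
  public BucketIdResolver newBucketIdResolver(int totalBuckets) {
    return bucketIdResolver; // plug in your own resolver; totalBuckets is ignored in this sketch
  }
}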
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/OperationType.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/OperationType.java
deleted file mode 100644
index 3dc2886..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/OperationType.java
+++ /dev/null
@@ -1,24 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-enum OperationType {
- INSERT,
- UPDATE,
- DELETE;
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/PartitionCreationException.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/PartitionCreationException.java
deleted file mode 100644
index ed0c989..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/PartitionCreationException.java
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-public class PartitionCreationException extends WorkerException {
-
- private static final long serialVersionUID = 1L;
-
- PartitionCreationException(String message, Throwable cause) {
- super(message, cause);
- }
-
- PartitionCreationException(String message) {
- super(message);
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/PartitionHelper.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/PartitionHelper.java
deleted file mode 100644
index d064b0c..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/PartitionHelper.java
+++ /dev/null
@@ -1,37 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.Closeable;
-import java.util.List;
-
-import org.apache.hadoop.fs.Path;
-
-/** Implementations are responsible for creating and obtaining path information about partitions.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-interface PartitionHelper extends Closeable {
-
- /** Return the location of the partition described by the provided values. */
- Path getPathForPartition(List<String> newPartitionValues) throws WorkerException;
-
- /** Create the partition described by the provided values if it does not exist already. */
- void createPartitionIfNotExists(List<String> newPartitionValues) throws WorkerException;
-
-}
\ No newline at end of file
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordInspector.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordInspector.java
deleted file mode 100644
index 5d1f175..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordInspector.java
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-
-/** Provide a means to extract {@link RecordIdentifier} from record objects.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public interface RecordInspector {
-
- /** Get the {@link RecordIdentifier} from the record - to be used for updates and deletes only. */
- RecordIdentifier extractRecordIdentifier(Object record);
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordInspectorImpl.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordInspectorImpl.java
deleted file mode 100644
index 37329c3..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordInspectorImpl.java
+++ /dev/null
@@ -1,64 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.util.List;
-
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
-import org.apache.hadoop.hive.serde2.objectinspector.StructField;
-import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
-
-/**
- * Standard {@link RecordInspector} implementation that uses the supplied {@link ObjectInspector} and
- * {@link AcidOutputFormat.Options#recordIdColumn(int) record id column} to extract {@link RecordIdentifier
- * RecordIdentifiers}, and calculate bucket ids from records.
- * @deprecated as of Hive 3.0.0
- */
-@Deprecated
-public class RecordInspectorImpl implements RecordInspector {
-
- private final StructObjectInspector structObjectInspector;
- private final StructField recordIdentifierField;
-
- /**
- * Note that all column indexes are with respect to your record structure, not the Hive table structure.
- */
- public RecordInspectorImpl(ObjectInspector objectInspector, int recordIdColumn) {
- if (!(objectInspector instanceof StructObjectInspector)) {
- throw new IllegalArgumentException("Serious problem, expected a StructObjectInspector, " + "but got a "
- + objectInspector.getClass().getName());
- }
-
- structObjectInspector = (StructObjectInspector) objectInspector;
- List<? extends StructField> structFields = structObjectInspector.getAllStructFieldRefs();
- recordIdentifierField = structFields.get(recordIdColumn);
- }
-
- public RecordIdentifier extractRecordIdentifier(Object record) {
- return (RecordIdentifier) structObjectInspector.getStructFieldData(record, recordIdentifierField);
- }
-
- @Override
- public String toString() {
- return "RecordInspectorImpl [structObjectInspector=" + structObjectInspector + ", recordIdentifierField="
- + recordIdentifierField + "]";
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordSequenceException.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordSequenceException.java
deleted file mode 100644
index 0d3b471..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/RecordSequenceException.java
+++ /dev/null
@@ -1,28 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-public class RecordSequenceException extends WorkerException {
-
- private static final long serialVersionUID = 1L;
-
- RecordSequenceException(String message) {
- super(message);
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/SequenceValidator.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/SequenceValidator.java
deleted file mode 100644
index 320b987..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/SequenceValidator.java
+++ /dev/null
@@ -1,66 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/**
- * Verifies that the sequence of {@link RecordIdentifier RecordIdentifiers} is in a valid order for insertion into an
- * ACID delta file in a given partition and bucket.
- */
-class SequenceValidator {
-
- private static final Logger LOG = LoggerFactory.getLogger(SequenceValidator.class);
-
- private Long lastWriteId;
- private Long lastRowId;
-
- SequenceValidator() {
- }
-
- boolean isInSequence(RecordIdentifier recordIdentifier) {
- if (lastWriteId != null && recordIdentifier.getWriteId() < lastWriteId) {
- LOG.debug("Non-sequential write ID. Expected >{}, recordIdentifier={}", lastWriteId, recordIdentifier);
- return false;
- } else if (lastWriteId != null && recordIdentifier.getWriteId() == lastWriteId && lastRowId != null
- && recordIdentifier.getRowId() <= lastRowId) {
- LOG.debug("Non-sequential row ID. Expected >{}, recordIdentifier={}", lastRowId, recordIdentifier);
- return false;
- }
- lastWriteId = recordIdentifier.getWriteId();
- lastRowId = recordIdentifier.getRowId();
- return true;
- }
-
- /**
- * Validator must be reset for each new partition and/or bucket.
- */
- void reset() {
- lastWriteId = null;
- lastRowId = null;
- LOG.debug("reset");
- }
-
- @Override
- public String toString() {
- return "SequenceValidator [lastWriteId=" + lastWriteId + ", lastRowId=" + lastRowId + "]";
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/WarehousePartitionHelper.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/WarehousePartitionHelper.java
deleted file mode 100644
index ace329a..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/WarehousePartitionHelper.java
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import java.io.IOException;
-import java.util.LinkedHashMap;
-import java.util.List;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.metastore.Warehouse;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-
-/**
- * A {@link PartitionHelper} implementation that uses the {@link Warehouse} class to obtain partition path information.
- * As this does not require a connection to the meta store database, it is safe to use in workers that are distributed on
- * a cluster. However, it does not support the creation of new partitions, so you will need to provide a mechanism to
- * collect affected partitions in your merge job and create them from your client.
- */
-class WarehousePartitionHelper implements PartitionHelper {
-
- private final Warehouse warehouse;
- private final Path tablePath;
- private final LinkedHashMap<String, String> partitions;
- private final List<String> partitionColumns;
-
- WarehousePartitionHelper(Configuration configuration, Path tablePath, List<String> partitionColumns)
- throws MetaException {
- this.tablePath = tablePath;
- this.partitionColumns = partitionColumns;
- this.partitions = new LinkedHashMap<>(partitionColumns.size());
- for (String partitionColumn : partitionColumns) {
- partitions.put(partitionColumn, null);
- }
- warehouse = new Warehouse(configuration);
- }
-
- @Override
- public Path getPathForPartition(List<String> partitionValues) throws WorkerException {
- if (partitionValues.size() != partitionColumns.size()) {
- throw new IllegalArgumentException("Incorrect number of partition values. columns=" + partitionColumns
- + ",values=" + partitionValues);
- }
- if (partitionColumns.isEmpty()) {
- return tablePath;
- }
- for (int columnIndex = 0; columnIndex < partitionValues.size(); columnIndex++) {
- String partitionColumn = partitionColumns.get(columnIndex);
- String partitionValue = partitionValues.get(columnIndex);
- partitions.put(partitionColumn, partitionValue);
- }
- try {
- return warehouse.getPartitionPath(tablePath, partitions);
- } catch (MetaException e) {
- throw new WorkerException("Unable to determine partition path. tablePath=" + tablePath + ",partition="
- + partitionValues, e);
- }
- }
-
- /** Throws {@link UnsupportedOperationException}. */
- @Override
- public void createPartitionIfNotExists(List<String> newPartitionValues) throws WorkerException {
- throw new UnsupportedOperationException("You require a connection to the meta store to do this.");
- }
-
- @Override
- public void close() throws IOException {
- // Nothing to close here.
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/WorkerException.java b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/WorkerException.java
deleted file mode 100644
index 9eb6742..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/WorkerException.java
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-public class WorkerException extends Exception {
-
- private static final long serialVersionUID = 1L;
-
- WorkerException(String message, Throwable cause) {
- super(message, cause);
- }
-
- WorkerException(String message) {
- super(message);
- }
-
-}
diff --git a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/package.html b/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/package.html
deleted file mode 100644
index a879b97..0000000
--- a/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/package.html
+++ /dev/null
@@ -1,181 +0,0 @@
-
-
-
-
-
-
-
-
-HCatalog Streaming API
-
-
-
-
-HCatalog Streaming API -- high level description
-
-NOTE: The Streaming API feature is provided as a technology
-preview. The API may undergo incompatible changes in upcoming
-releases.
-
-
-Traditionally, adding new data into Hive requires gathering a large
-amount of data onto HDFS and then periodically adding a new
-partition. This is essentially a batch insertion. Insertion of
-new data into an existing partition or table is not done in a way that
-gives consistent results to readers. Hive Streaming API allows data to
-be pumped continuously into Hive. The incoming data can be
-continuously committed in small batches (of records) into a Hive
-partition. Once data is committed it becomes immediately visible to
-all Hive queries initiated subsequently.
-
-
-This API is intended for streaming clients such as NiFi, Flume and Storm,
-which continuously generate data. Streaming support is built on top of
-ACID based insert/update support in Hive.
-
-
-The classes and interfaces that make up the Hive streaming API are broadly
-categorized into two sets. The first set provides support for connection
-and transaction management while the second set provides I/O
-support. Transactions are managed by the Hive MetaStore. Writes are
-performed to HDFS via Hive wrapper APIs that bypass MetaStore.
-
-
-Note on packaging: The APIs are defined in the
-org.apache.hive.hcatalog.streaming Java package and included as
-the hive-hcatalog-streaming jar.
-
-STREAMING REQUIREMENTS
-
-
-A few things are currently required to use streaming.
-
-
-
-
- - Currently, only ORC storage format is supported. So
- 'stored as orc' must be specified during table creation.
- - The Hive table may be bucketed but must not be sorted.
- - User of the client streaming process must have the necessary
- permissions to write to the table or partition and create partitions in
- the table.
- - Currently, when issuing queries on streaming tables, the query client must set
-
- - hive.input.format =
- org.apache.hadoop.hive.ql.io.HiveInputFormat
-
- The above client settings are a temporary requirement and the intention is to
- drop the need for them in the near future (a combined sketch of these
- requirements appears below).
- - Settings required in hive-site.xml for Metastore:
-
- - hive.txn.manager =
- org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- - hive.support.concurrency = true
- - hive.compactor.initiator.on = true
- - hive.compactor.worker.threads > 0
-
-
-
-
-Note: Streaming to unpartitioned tables is also
-supported.
-
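The requirements above can be combined as in the following sketch. The property names mirror the list above; the HiveConf usage and the commented-out DDL are illustrative only, and note that ACID tables are also expected to set the 'transactional' table property.

import org.apache.hadoop.hive.conf.HiveConf;

public class StreamingRequirements {
  public static HiveConf configure() {
    HiveConf conf = new HiveConf();
    // Query-client setting for reading streaming tables (a temporary requirement, see above):
    conf.set("hive.input.format", "org.apache.hadoop.hive.ql.io.HiveInputFormat");
    // Metastore-side settings, normally placed in hive-site.xml:
    conf.set("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
    conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, true);
    conf.set("hive.compactor.initiator.on", "true");
    conf.set("hive.compactor.worker.threads", "1");
    return conf;
  }
  // The destination table must be stored as ORC and may be bucketed but not sorted, e.g.:
  //   CREATE TABLE alerts (id INT, msg STRING)
  //     PARTITIONED BY (continent STRING, country STRING)
  //     CLUSTERED BY (id) INTO 2 BUCKETS
  //     STORED AS ORC
  //     TBLPROPERTIES ('transactional' = 'true');
}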
-Transaction and Connection management
-
-
-The class HiveEndPoint is a Hive end
-point to connect to. An endpoint is either a Hive table or
-partition. An endpoint is cheap to create and does not internally hold
-on to any network connections. Invoking the newConnection method on
-it creates a new connection to the Hive MetaStore for streaming
-purposes. It returns a
-StreamingConnection
-object. Multiple connections can be established on the same
-endpoint. StreamingConnection can then be used to initiate new
-transactions for performing I/O.
-
-Dynamic Partition Creation:
-In a setup where data is being streamed continuously (e.g. from Flume), it is
-very likely desirable to have new partitions created automatically (say, on an
-hourly basis). In such cases, requiring the Hive admin to pre-create
-the necessary partitions may not be reasonable. Consequently, the
-streaming API allows streaming clients to create partitions as
-needed. HiveEndPoint.newConnection() accepts an argument to
-indicate if the partition should be auto-created. Partition creation
-being an atomic action, multiple clients can race to create the
-partition, but only one would succeed, so streaming clients need not
-synchronize when creating a partition. The user of the client process
-needs to be given write permissions on the Hive table in order to
-create partitions.
-
-Batching Transactions:
-Transactions are implemented slightly
-differently than traditional database systems. Multiple transactions
-are grouped into a Transaction Batch and each transaction has
-an id. Data from each transaction batch gets a single file on HDFS,
-which eventually gets compacted with other files into a larger file
-automatically for efficiency.
-
-Basic Steps:
-After a connection is established, a streaming
-client first requests a new batch of transactions. In response, it
-receives a set of transaction ids that are part of the transaction
-batch. Subsequently the client proceeds to consume one transaction at
-a time by initiating new transactions. The client will write() one or more
-records per transaction and either commit or abort the current
-transaction before switching to the next one. Each
-TransactionBatch.write() invocation automatically associates
-the I/O attempt with the current transaction id. The user of the
-streaming client needs to have write permissions to the partition or
-table.
-
-
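A minimal sketch of these basic steps follows; the metastore URI, database, table, partition values and records are illustrative, and error handling is omitted for brevity.

import java.util.Arrays;

import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.RecordWriter;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingFlowExample {
  public static void stream() throws Exception {
    HiveEndPoint endPoint = new HiveEndPoint("thrift://metastore-host:9083",
        "testing", "alerts", Arrays.asList("Asia", "India")); // pass null for unpartitioned tables
    StreamingConnection connection = endPoint.newConnection(true, "example-agent"); // true => auto-create partition
    RecordWriter writer = new DelimitedInputWriter(new String[] {"id", "msg"}, ",", endPoint);
    try {
      TransactionBatch batch = connection.fetchTransactionBatch(10, writer);
      try {
        while (batch.remainingTransactions() > 0) {
          batch.beginNextTransaction();
          batch.write("1,hello".getBytes()); // one or more records per transaction
          batch.write("2,world".getBytes());
          batch.commit();                    // or batch.abort()
        }
      } finally {
        batch.close();
      }
    } finally {
      connection.close();
    }
  }
}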
-Concurrency Note: I/O can be performed on multiple
-TransactionBatches concurrently. However, the transactions within a
-transaction batch must be consumed sequentially.
-
-Writing Data
-
-
-These classes and interfaces provide support for writing the data to
-Hive within a transaction.
-RecordWriter is the interface
-implemented by all writers. A writer is responsible for taking a
-record in the form of a byte[] containing data in a known
-format (e.g. CSV) and writing it out in the format supported by Hive
-streaming. A RecordWriter may reorder or drop fields from the incoming
-record if necessary to map them to the corresponding columns in the
-Hive Table. A streaming client will instantiate an appropriate
-RecordWriter type and pass it to
-StreamingConnection.fetchTransactionBatch(). The streaming client
-does not directly interact with the RecordWriter thereafter, but
-relies on the TransactionBatch to do so.
-
-
-Currently, out of the box, the streaming API provides two
-implementations of the RecordWriter interface. One handles delimited
-input data (such as CSV, tab separated, etc.) and the other handles JSON
-(strict syntax). Support for other input formats can be provided by
-additional implementations of the RecordWriter interface.
-
-
-Performance, Concurrency, Etc.
-
- Each StreamingConnection writes data at the rate the underlying
- FileSystem can accept it. If that is not sufficient, multiple StreamingConnection objects can
- be created concurrently, as in the sketch below.
-
-
- Each StreamingConnection can have at most 1 outstanding TransactionBatch and each TransactionBatch
- may have at most 2 threads operating on it.
- See TransactionBatch
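A minimal sketch of the multi-connection pattern mentioned above; the thread count and agent names are illustrative, and the per-connection write loop is elided (see the earlier flow sketch).

import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;

public class ConcurrentWriters {
  public static void start(final HiveEndPoint endPoint, int writers) {
    for (int i = 0; i < writers; i++) {
      final String agent = "example-agent-" + i;
      new Thread(new Runnable() {
        @Override
        public void run() {
          StreamingConnection connection = null;
          try {
            // Each thread gets its own connection; a connection has at most one
            // outstanding TransactionBatch at a time.
            connection = endPoint.newConnection(true, agent);
            // ... fetch transaction batches and write records as in the earlier sketch ...
          } catch (Exception e) {
            e.printStackTrace();
          } finally {
            if (connection != null) {
              connection.close();
            }
          }
        }
      }).start();
    }
  }
}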
-
-
-
-
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/StreamingIntegrationTester.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/StreamingIntegrationTester.java
deleted file mode 100644
index af252aa..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/StreamingIntegrationTester.java
+++ /dev/null
@@ -1,347 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming;
-
-import org.apache.commons.cli.CommandLine;
-import org.apache.commons.cli.GnuParser;
-import org.apache.commons.cli.HelpFormatter;
-import org.apache.commons.cli.OptionBuilder;
-import org.apache.commons.cli.Options;
-import org.apache.commons.cli.ParseException;
-import org.apache.commons.cli.Parser;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-import org.apache.hadoop.hive.common.LogUtils;
-import org.apache.hadoop.util.StringUtils;
-
-import java.util.Arrays;
-import java.util.Random;
-
-/**
- * A standalone utility to write data into the streaming ingest interface.
- */
-public class StreamingIntegrationTester {
-
- static final private Logger LOG = LoggerFactory.getLogger(StreamingIntegrationTester.class.getName());
-
- public static void main(String[] args) {
-
- try {
- LogUtils.initHiveLog4j();
- } catch (LogUtils.LogInitializationException e) {
- System.err.println("Unable to initialize log4j " + StringUtils.stringifyException(e));
- System.exit(-1);
- }
-
- Options options = new Options();
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("abort-pct")
- .withDescription("Percentage of transactions to abort, defaults to 5")
- .withLongOpt("abortpct")
- .create('a'));
-
- options.addOption(OptionBuilder
- .hasArgs()
- .withArgName("column-names")
- .withDescription("column names of table to write to")
- .withLongOpt("columns")
- .withValueSeparator(',')
- .isRequired()
- .create('c'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("database")
- .withDescription("Database of table to write to")
- .withLongOpt("database")
- .isRequired()
- .create('d'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("frequency")
- .withDescription("How often to commit a transaction, in seconds, defaults to 1")
- .withLongOpt("frequency")
- .create('f'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("iterations")
- .withDescription("Number of batches to write, defaults to 10")
- .withLongOpt("num-batches")
- .create('i'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("metastore-uri")
- .withDescription("URI of Hive metastore")
- .withLongOpt("metastore-uri")
- .isRequired()
- .create('m'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("num_transactions")
- .withDescription("Number of transactions per batch, defaults to 100")
- .withLongOpt("num-txns")
- .create('n'));
-
- options.addOption(OptionBuilder
- .hasArgs()
- .withArgName("partition-values")
- .withDescription("partition values, must be provided in order of partition columns, " +
- "if not provided table is assumed to not be partitioned")
- .withLongOpt("partition")
- .withValueSeparator(',')
- .create('p'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("records-per-transaction")
- .withDescription("records to write in each transaction, defaults to 100")
- .withLongOpt("records-per-txn")
- .withValueSeparator(',')
- .create('r'));
-
- options.addOption(OptionBuilder
- .hasArgs()
- .withArgName("column-types")
- .withDescription("column types, valid values are string, int, float, decimal, date, " +
- "datetime")
- .withLongOpt("schema")
- .withValueSeparator(',')
- .isRequired()
- .create('s'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("table")
- .withDescription("Table to write to")
- .withLongOpt("table")
- .isRequired()
- .create('t'));
-
- options.addOption(OptionBuilder
- .hasArg()
- .withArgName("num-writers")
- .withDescription("Number of writers to create, defaults to 2")
- .withLongOpt("writers")
- .create('w'));
-
- options.addOption(OptionBuilder
- .hasArg(false)
- .withArgName("pause")
- .withDescription("Wait on keyboard input after commit & batch close. default: disabled")
- .withLongOpt("pause")
- .create('x'));
-
-
- Parser parser = new GnuParser();
- CommandLine cmdline = null;
- try {
- cmdline = parser.parse(options, args);
- } catch (ParseException e) {
- System.err.println(e.getMessage());
- usage(options);
- }
-
- boolean pause = cmdline.hasOption('x');
- String db = cmdline.getOptionValue('d');
- String table = cmdline.getOptionValue('t');
- String uri = cmdline.getOptionValue('m');
- int txnsPerBatch = Integer.parseInt(cmdline.getOptionValue('n', "100"));
- int writers = Integer.parseInt(cmdline.getOptionValue('w', "2"));
- int batches = Integer.parseInt(cmdline.getOptionValue('i', "10"));
- int recordsPerTxn = Integer.parseInt(cmdline.getOptionValue('r', "100"));
- int frequency = Integer.parseInt(cmdline.getOptionValue('f', "1"));
- int ap = Integer.parseInt(cmdline.getOptionValue('a', "5"));
- float abortPct = ((float)ap) / 100.0f;
- String[] partVals = cmdline.getOptionValues('p');
- String[] cols = cmdline.getOptionValues('c');
- String[] types = cmdline.getOptionValues('s');
-
- StreamingIntegrationTester sit = new StreamingIntegrationTester(db, table, uri,
- txnsPerBatch, writers, batches, recordsPerTxn, frequency, abortPct, partVals, cols, types
- , pause);
- sit.go();
- }
-
- static void usage(Options options) {
- HelpFormatter hf = new HelpFormatter();
- hf.printHelp(HelpFormatter.DEFAULT_WIDTH, "sit [options]", "Usage: ", options, "");
- System.exit(-1);
- }
-
- private String db;
- private String table;
- private String uri;
- private int txnsPerBatch;
- private int writers;
- private int batches;
- private int recordsPerTxn;
- private int frequency;
- private float abortPct;
- private String[] partVals;
- private String[] cols;
- private String[] types;
- private boolean pause;
-
-
- private StreamingIntegrationTester(String db, String table, String uri, int txnsPerBatch,
- int writers, int batches, int recordsPerTxn,
- int frequency, float abortPct, String[] partVals,
- String[] cols, String[] types, boolean pause) {
- this.db = db;
- this.table = table;
- this.uri = uri;
- this.txnsPerBatch = txnsPerBatch;
- this.writers = writers;
- this.batches = batches;
- this.recordsPerTxn = recordsPerTxn;
- this.frequency = frequency;
- this.abortPct = abortPct;
- this.partVals = partVals;
- this.cols = cols;
- this.types = types;
- this.pause = pause;
- }
-
- private void go() {
- HiveEndPoint endPoint = null;
- try {
- if (partVals == null) {
- endPoint = new HiveEndPoint(uri, db, table, null);
- } else {
- endPoint = new HiveEndPoint(uri, db, table, Arrays.asList(partVals));
- }
-
- for (int i = 0; i < writers; i++) {
- Writer w = new Writer(endPoint, i, txnsPerBatch, batches, recordsPerTxn, frequency, abortPct,
- cols, types, pause);
- w.start();
- }
-
- } catch (Throwable t) {
- System.err.println("Caught exception while testing: " + StringUtils.stringifyException(t));
- }
- }
-
- private static class Writer extends Thread {
- private HiveEndPoint endPoint;
- private int txnsPerBatch;
- private int batches;
- private int writerNumber;
- private int recordsPerTxn;
- private int frequency;
- private float abortPct;
- private String[] cols;
- private String[] types;
- private boolean pause;
- private Random rand;
-
- Writer(HiveEndPoint endPoint, int writerNumber, int txnsPerBatch, int batches,
- int recordsPerTxn, int frequency, float abortPct, String[] cols, String[] types
- , boolean pause) {
- this.endPoint = endPoint;
- this.txnsPerBatch = txnsPerBatch;
- this.batches = batches;
- this.writerNumber = writerNumber;
- this.recordsPerTxn = recordsPerTxn;
- this.frequency = frequency * 1000;
- this.abortPct = abortPct;
- this.cols = cols;
- this.types = types;
- this.pause = pause;
- rand = new Random();
- }
-
- @Override
- public void run() {
- StreamingConnection conn = null;
- try {
- conn = endPoint.newConnection(true, "UT_" + Thread.currentThread().getName());
- RecordWriter writer = new DelimitedInputWriter(cols, ",", endPoint);
-
- for (int i = 0; i < batches; i++) {
- long start = System.currentTimeMillis();
- LOG.info("Starting batch " + i);
- TransactionBatch batch = conn.fetchTransactionBatch(txnsPerBatch, writer);
- try {
- while (batch.remainingTransactions() > 0) {
- batch.beginNextTransaction();
- for (int j = 0; j < recordsPerTxn; j++) {
- batch.write(generateRecord(cols, types));
- }
- if (rand.nextFloat() < abortPct) batch.abort();
- else
- batch.commit();
- if (pause) {
- System.out.println("Writer " + writerNumber +
- " committed... press Enter to continue. " + Thread.currentThread().getId());
- System.in.read();
- }
- }
- long end = System.currentTimeMillis();
- if (end - start < frequency) Thread.sleep(frequency - (end - start));
- } finally {
- batch.close();
- if (pause) {
- System.out.println("Writer " + writerNumber +
- " has closed a Batch.. press Enter to continue. " + Thread.currentThread().getId());
- System.in.read();
- }
- }
- }
- } catch (Throwable t) {
- System.err.println("Writer number " + writerNumber
- + " caught exception while testing: " + StringUtils.stringifyException(t));
- } finally {
- if (conn!=null) conn.close();
- }
- }
-
- private byte[] generateRecord(String[] cols, String[] types) {
- // TODO make it so I can randomize the column order
-
- StringBuilder buf = new StringBuilder();
- for (int i = 0; i < types.length; i++) {
- buf.append(generateColumn(types[i]));
- buf.append(",");
- }
- return buf.toString().getBytes();
- }
-
- private String generateColumn(String type) {
- if ("string".equals(type.toLowerCase())) {
- return "When that Aprilis with his showers swoot";
- } else if (type.toLowerCase().startsWith("int")) {
- return "42";
- } else if (type.toLowerCase().startsWith("dec") || type.toLowerCase().equals("float")) {
- return "3.141592654";
- } else if (type.toLowerCase().equals("datetime")) {
- return "2014-03-07 15:33:22";
- } else if (type.toLowerCase().equals("date")) {
- return "1955-11-12";
- } else {
- throw new RuntimeException("Sorry, I don't know the type " + type);
- }
- }
- }
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestDelimitedInputWriter.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestDelimitedInputWriter.java
deleted file mode 100644
index 1aa257f..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestDelimitedInputWriter.java
+++ /dev/null
@@ -1,71 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hive.hcatalog.streaming;
-
-import com.google.common.collect.Lists;
-import junit.framework.Assert;
-import org.junit.Test;
-
-import java.util.ArrayList;
-import java.util.Arrays;
-
-public class TestDelimitedInputWriter {
- @Test
- public void testFieldReordering() throws Exception {
-
- ArrayList<String> colNames = Lists.newArrayList(new String[]{"col1", "col2", "col3", "col4", "col5"});
- {//1) test dropping fields - first middle & last
- String[] fieldNames = {null, "col2", null, "col4", null};
- int[] mapping = DelimitedInputWriter.getFieldReordering(fieldNames, colNames);
- Assert.assertTrue(Arrays.equals(mapping, new int[]{-1, 1, -1, 3, -1}));
- }
-
- {//2) test reordering
- String[] fieldNames = {"col5", "col4", "col3", "col2", "col1"};
- int[] mapping = DelimitedInputWriter.getFieldReordering(fieldNames, colNames);
- Assert.assertTrue( Arrays.equals(mapping, new int[]{4,3,2,1,0}) );
- }
-
- {//3) test bad field names
- String[] fieldNames = {"xyz", "abc", "col3", "col4", "as"};
- try {
- DelimitedInputWriter.getFieldReordering(fieldNames, colNames);
- Assert.fail();
- } catch (InvalidColumn e) {
- // should throw
- }
- }
-
- {//4) test few field names
- String[] fieldNames = {"col3", "col4"};
- int[] mapping = DelimitedInputWriter.getFieldReordering(fieldNames, colNames);
- Assert.assertTrue( Arrays.equals(mapping, new int[]{2,3}) );
- }
-
- {//5) test extra field names
- String[] fieldNames = {"col5", "col4", "col3", "col2", "col1", "col1"};
- try {
- DelimitedInputWriter.getFieldReordering(fieldNames, colNames);
- Assert.fail();
- } catch (InvalidColumn e) {
- // should throw
- }
- }
- }
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
deleted file mode 100644
index 5e5bc83..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
+++ /dev/null
@@ -1,2342 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hive.hcatalog.streaming;
-
-import java.io.ByteArrayOutputStream;
-import java.io.File;
-import java.io.FileFilter;
-import java.io.FileNotFoundException;
-import java.io.IOException;
-import java.io.PrintStream;
-import java.net.URI;
-import java.net.URISyntaxException;
-import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.concurrent.TimeUnit;
-import java.util.concurrent.atomic.AtomicBoolean;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.fs.RawLocalFileSystem;
-import org.apache.hadoop.fs.permission.FsPermission;
-import org.apache.hadoop.hive.cli.CliSessionState;
-import org.apache.hadoop.hive.common.JavaUtils;
-import org.apache.hadoop.hive.common.ValidWriteIdList;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.conf.Validator;
-import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.GetOpenTxnsInfoResponse;
-import org.apache.hadoop.hive.metastore.api.LockState;
-import org.apache.hadoop.hive.metastore.api.LockType;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
-import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.metastore.api.ShowLocksRequest;
-import org.apache.hadoop.hive.metastore.api.ShowLocksResponse;
-import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement;
-import org.apache.hadoop.hive.metastore.api.TxnAbortedException;
-import org.apache.hadoop.hive.metastore.api.TxnInfo;
-import org.apache.hadoop.hive.metastore.api.TxnState;
-import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
-import org.apache.hadoop.hive.metastore.txn.AcidHouseKeeperService;
-import org.apache.hadoop.hive.metastore.txn.TxnDbUtil;
-import org.apache.hadoop.hive.metastore.txn.TxnStore;
-import org.apache.hadoop.hive.metastore.txn.TxnUtils;
-import org.apache.hadoop.hive.ql.DriverFactory;
-import org.apache.hadoop.hive.ql.IDriver;
-import org.apache.hadoop.hive.ql.io.AcidUtils;
-import org.apache.hadoop.hive.ql.io.BucketCodec;
-import org.apache.hadoop.hive.ql.io.IOConstants;
-import org.apache.hadoop.hive.ql.io.orc.OrcFile;
-import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
-import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
-import org.apache.hadoop.hive.ql.io.orc.Reader;
-import org.apache.hadoop.hive.ql.io.orc.RecordReader;
-import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse;
-import org.apache.hadoop.hive.ql.session.SessionState;
-import org.apache.hadoop.hive.ql.txn.compactor.Worker;
-import org.apache.hadoop.hive.serde.serdeConstants;
-import org.apache.hadoop.hive.serde2.objectinspector.StructField;
-import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
-import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector;
-import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector;
-import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector;
-import org.apache.hadoop.hive.shims.Utils;
-import org.apache.hadoop.io.NullWritable;
-import org.apache.hadoop.mapred.InputFormat;
-import org.apache.hadoop.mapred.InputSplit;
-import org.apache.hadoop.mapred.JobConf;
-import org.apache.hadoop.mapred.Reporter;
-import org.apache.hadoop.security.UserGroupInformation;
-import org.apache.orc.impl.OrcAcidUtils;
-import org.apache.orc.tools.FileDump;
-import org.apache.thrift.TException;
-import org.junit.After;
-import org.junit.Assert;
-import org.junit.Before;
-import org.junit.Rule;
-import org.junit.Test;
-import org.junit.rules.TemporaryFolder;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.BUCKET_COUNT;
-
-
-public class TestStreaming {
- private static final Logger LOG = LoggerFactory.getLogger(TestStreaming.class);
-
- public static class RawFileSystem extends RawLocalFileSystem {
- private static final URI NAME;
- static {
- try {
- NAME = new URI("raw:///");
- } catch (URISyntaxException se) {
- throw new IllegalArgumentException("bad uri", se);
- }
- }
-
- @Override
- public URI getUri() {
- return NAME;
- }
-
-
- @Override
- public FileStatus getFileStatus(Path path) throws IOException {
- File file = pathToFile(path);
- if (!file.exists()) {
- throw new FileNotFoundException("Can't find " + path);
- }
- // get close enough
- short mod = 0;
- if (file.canRead()) {
- mod |= 0444;
- }
- if (file.canWrite()) {
- mod |= 0200;
- }
- if (file.canExecute()) {
- mod |= 0111;
- }
- return new FileStatus(file.length(), file.isDirectory(), 1, 1024,
- file.lastModified(), file.lastModified(),
- FsPermission.createImmutable(mod), "owen", "users", path);
- }
- }
-
- private static final String COL1 = "id";
- private static final String COL2 = "msg";
-
- private final HiveConf conf;
- private IDriver driver;
- private final IMetaStoreClient msClient;
-
- final String metaStoreURI = null;
-
- // partitioned table
- private final static String dbName = "testing";
- private final static String tblName = "alerts";
- private final static String[] fieldNames = new String[]{COL1,COL2};
- List<String> partitionVals;
- private static Path partLoc;
- private static Path partLoc2;
-
- // unpartitioned table
- private final static String dbName2 = "testing2";
- private final static String tblName2 = "alerts";
- private final static String[] fieldNames2 = new String[]{COL1,COL2};
-
-
- // for bucket join testing
- private final static String dbName3 = "testing3";
- private final static String tblName3 = "dimensionTable";
- private final static String dbName4 = "testing4";
- private final static String tblName4 = "factTable";
- List<String> partitionVals2;
-
-
- private final String PART1_CONTINENT = "Asia";
- private final String PART1_COUNTRY = "India";
-
- @Rule
- public TemporaryFolder dbFolder = new TemporaryFolder();
-
-
- public TestStreaming() throws Exception {
- partitionVals = new ArrayList<String>(2);
- partitionVals.add(PART1_CONTINENT);
- partitionVals.add(PART1_COUNTRY);
-
- partitionVals2 = new ArrayList<String>(1);
- partitionVals2.add(PART1_COUNTRY);
-
-
- conf = new HiveConf(this.getClass());
- conf.set("fs.raw.impl", RawFileSystem.class.getName());
- conf
- .setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
- "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
- TxnDbUtil.setConfValues(conf);
- if (metaStoreURI!=null) {
- conf.setVar(HiveConf.ConfVars.METASTOREURIS, metaStoreURI);
- }
- conf.setBoolVar(HiveConf.ConfVars.METASTORE_EXECUTE_SET_UGI, true);
- conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, true);
- dbFolder.create();
-
-
- //1) Start from a clean slate (metastore)
- TxnDbUtil.cleanDb(conf);
- TxnDbUtil.prepDb(conf);
-
- //2) obtain metastore clients
- msClient = new HiveMetaStoreClient(conf);
- }
-
- @Before
- public void setup() throws Exception {
- SessionState.start(new CliSessionState(conf));
- driver = DriverFactory.newDriver(conf);
- driver.setMaxRows(200002);//make sure Driver returns all results
- // drop and recreate the necessary databases and tables
- dropDB(msClient, dbName);
-
- String[] colNames = new String[] {COL1, COL2};
- String[] colTypes = new String[] {serdeConstants.INT_TYPE_NAME, serdeConstants.STRING_TYPE_NAME};
- String[] bucketCols = new String[] {COL1};
- String loc1 = dbFolder.newFolder(dbName + ".db").toString();
- String[] partNames = new String[]{"Continent", "Country"};
- partLoc = createDbAndTable(driver, dbName, tblName, partitionVals, colNames, colTypes, bucketCols, partNames, loc1, 1);
-
- dropDB(msClient, dbName2);
- String loc2 = dbFolder.newFolder(dbName2 + ".db").toString();
- partLoc2 = createDbAndTable(driver, dbName2, tblName2, null, colNames, colTypes, bucketCols, null, loc2, 2);
-
- String loc3 = dbFolder.newFolder("testing5.db").toString();
- createStoreSales("testing5", loc3);
-
- runDDL(driver, "drop table testBucketing3.streamedtable");
- runDDL(driver, "drop table testBucketing3.finaltable");
- runDDL(driver, "drop table testBucketing3.nobucket");
- }
-
- @After
- public void cleanup() throws Exception {
- msClient.close();
- driver.close();
- }
-
- private static List<FieldSchema> getPartitionKeys() {
- List<FieldSchema> fields = new ArrayList<FieldSchema>();
- // Defining partition names in unsorted order
- fields.add(new FieldSchema("continent", serdeConstants.STRING_TYPE_NAME, ""));
- fields.add(new FieldSchema("country", serdeConstants.STRING_TYPE_NAME, ""));
- return fields;
- }
-
- private void createStoreSales(String dbName, String loc) throws Exception {
- String dbUri = "raw://" + new Path(loc).toUri().toString();
- String tableLoc = dbUri + Path.SEPARATOR + "store_sales";
-
- boolean success = runDDL(driver, "create database IF NOT EXISTS " + dbName + " location '" + dbUri + "'");
- Assert.assertTrue(success);
- success = runDDL(driver, "use " + dbName);
- Assert.assertTrue(success);
-
- success = runDDL(driver, "drop table if exists store_sales");
- Assert.assertTrue(success);
- success = runDDL(driver, "create table store_sales\n" +
- "(\n" +
- " ss_sold_date_sk int,\n" +
- " ss_sold_time_sk int,\n" +
- " ss_item_sk int,\n" +
- " ss_customer_sk int,\n" +
- " ss_cdemo_sk int,\n" +
- " ss_hdemo_sk int,\n" +
- " ss_addr_sk int,\n" +
- " ss_store_sk int,\n" +
- " ss_promo_sk int,\n" +
- " ss_ticket_number int,\n" +
- " ss_quantity int,\n" +
- " ss_wholesale_cost decimal(7,2),\n" +
- " ss_list_price decimal(7,2),\n" +
- " ss_sales_price decimal(7,2),\n" +
- " ss_ext_discount_amt decimal(7,2),\n" +
- " ss_ext_sales_price decimal(7,2),\n" +
- " ss_ext_wholesale_cost decimal(7,2),\n" +
- " ss_ext_list_price decimal(7,2),\n" +
- " ss_ext_tax decimal(7,2),\n" +
- " ss_coupon_amt decimal(7,2),\n" +
- " ss_net_paid decimal(7,2),\n" +
- " ss_net_paid_inc_tax decimal(7,2),\n" +
- " ss_net_profit decimal(7,2)\n" +
- ")\n" +
- " partitioned by (dt string)\n" +
- "clustered by (ss_store_sk, ss_promo_sk)\n" +
- "INTO 4 BUCKETS stored as orc " + " location '" + tableLoc + "'" + " TBLPROPERTIES ('orc.compress'='NONE', 'transactional'='true')");
- Assert.assertTrue(success);
-
- success = runDDL(driver, "alter table store_sales add partition(dt='2015')");
- Assert.assertTrue(success);
- }
- /**
- * make sure it works with a table where the bucket column is not the first column
- * @throws Exception
- */
- @Test
- public void testBucketingWhereBucketColIsNotFirstCol() throws Exception {
- List<String> partitionVals = new ArrayList<String>();
- partitionVals.add("2015");
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, "testing5", "store_sales", partitionVals);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(new String[] {"ss_sold_date_sk","ss_sold_time_sk", "ss_item_sk",
- "ss_customer_sk", "ss_cdemo_sk", "ss_hdemo_sk", "ss_addr_sk", "ss_store_sk", "ss_promo_sk", "ss_ticket_number", "ss_quantity",
- "ss_wholesale_cost", "ss_list_price", "ss_sales_price", "ss_ext_discount_amt", "ss_ext_sales_price", "ss_ext_wholesale_cost",
- "ss_ext_list_price", "ss_ext_tax", "ss_coupon_amt", "ss_net_paid", "ss_net_paid_inc_tax", "ss_net_profit"},",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.beginNextTransaction();
-
- StringBuilder row = new StringBuilder();
- for(int i = 0; i < 10; i++) {
- for(int ints = 0; ints < 11; ints++) {
- row.append(ints).append(',');
- }
- for(int decs = 0; decs < 12; decs++) {
- row.append(i + 0.1).append(',');
- }
- row.setLength(row.length() - 1);
- txnBatch.write(row.toString().getBytes());
- }
- txnBatch.commit();
- txnBatch.close();
- connection.close();
-
- ArrayList<String> res = queryTable(driver, "select row__id.bucketid, * from testing5.store_sales");
- for (String re : res) {
- System.out.println(re);
- }
- }
-
- /**
- * Test that streaming can write to unbucketed table.
- */
- @Test
- public void testNoBuckets() throws Exception {
- queryTable(driver, "drop table if exists default.streamingnobuckets");
- //todo: why does it need transactional_properties?
- queryTable(driver, "create table default.streamingnobuckets (a string, b string) stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='default')");
- queryTable(driver, "insert into default.streamingnobuckets values('foo','bar')");
- List<String> rs = queryTable(driver, "select * from default.streamingNoBuckets");
- Assert.assertEquals(1, rs.size());
- Assert.assertEquals("foo\tbar", rs.get(0));
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, "Default", "StreamingNoBuckets", null);
- String[] colNames1 = new String[] { "a", "b" };
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter wr = new DelimitedInputWriter(colNames1,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, wr);
- txnBatch.beginNextTransaction();
- txnBatch.write("a1,b2".getBytes());
- txnBatch.write("a3,b4".getBytes());
- TxnStore txnHandler = TxnUtils.getTxnStore(conf);
- ShowLocksResponse resp = txnHandler.showLocks(new ShowLocksRequest());
- Assert.assertEquals(resp.getLocksSize(), 1);
- Assert.assertEquals("streamingnobuckets", resp.getLocks().get(0).getTablename());
- Assert.assertEquals("default", resp.getLocks().get(0).getDbname());
- txnBatch.commit();
- txnBatch.beginNextTransaction();
- txnBatch.write("a5,b6".getBytes());
- txnBatch.write("a7,b8".getBytes());
- txnBatch.commit();
- txnBatch.close();
-
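- // 536870912 == 0x20000000, i.e. the BucketCodec V1 encoding of bucket/writer id 0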
- Assert.assertEquals("", 0, BucketCodec.determineVersion(536870912).decodeWriterId(536870912));
- rs = queryTable(driver,"select ROW__ID, a, b, INPUT__FILE__NAME from default.streamingnobuckets order by ROW__ID");
-
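- // ROW__ID is a struct of (writeid, bucketid, rowid); the INPUT__FILE__NAME suffix shows which delta/bucket file each row landed in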
- Assert.assertTrue(rs.get(0), rs.get(0).startsWith("{\"writeid\":1,\"bucketid\":536870912,\"rowid\":0}\tfoo\tbar"));
- Assert.assertTrue(rs.get(0), rs.get(0).endsWith("streamingnobuckets/delta_0000001_0000001_0000/bucket_00000"));
- Assert.assertTrue(rs.get(1), rs.get(1).startsWith("{\"writeid\":2,\"bucketid\":536870912,\"rowid\":0}\ta1\tb2"));
- Assert.assertTrue(rs.get(1), rs.get(1).endsWith("streamingnobuckets/delta_0000002_0000003/bucket_00000"));
- Assert.assertTrue(rs.get(2), rs.get(2).startsWith("{\"writeid\":2,\"bucketid\":536870912,\"rowid\":1}\ta3\tb4"));
- Assert.assertTrue(rs.get(2), rs.get(2).endsWith("streamingnobuckets/delta_0000002_0000003/bucket_00000"));
- Assert.assertTrue(rs.get(3), rs.get(3).startsWith("{\"writeid\":3,\"bucketid\":536870912,\"rowid\":0}\ta5\tb6"));
- Assert.assertTrue(rs.get(3), rs.get(3).endsWith("streamingnobuckets/delta_0000002_0000003/bucket_00000"));
- Assert.assertTrue(rs.get(4), rs.get(4).startsWith("{\"writeid\":3,\"bucketid\":536870912,\"rowid\":1}\ta7\tb8"));
- Assert.assertTrue(rs.get(4), rs.get(4).endsWith("streamingnobuckets/delta_0000002_0000003/bucket_00000"));
-
- queryTable(driver, "update default.streamingnobuckets set a=0, b=0 where a='a7'");
- queryTable(driver, "delete from default.streamingnobuckets where a='a1'");
- rs = queryTable(driver, "select a, b from default.streamingnobuckets order by a, b");
- int row = 0;
- Assert.assertEquals("at row=" + row, "0\t0", rs.get(row++));
- Assert.assertEquals("at row=" + row, "a3\tb4", rs.get(row++));
- Assert.assertEquals("at row=" + row, "a5\tb6", rs.get(row++));
- Assert.assertEquals("at row=" + row, "foo\tbar", rs.get(row++));
-
- queryTable(driver, "alter table default.streamingnobuckets compact 'major'");
- runWorker(conf);
- rs = queryTable(driver,"select ROW__ID, a, b, INPUT__FILE__NAME from default.streamingnobuckets order by ROW__ID");
-
- Assert.assertTrue(rs.get(0), rs.get(0).startsWith("{\"writeid\":1,\"bucketid\":536870912,\"rowid\":0}\tfoo\tbar"));
- Assert.assertTrue(rs.get(0), rs.get(0).endsWith("streamingnobuckets/base_0000005/bucket_00000"));
- Assert.assertTrue(rs.get(1), rs.get(1).startsWith("{\"writeid\":2,\"bucketid\":536870912,\"rowid\":1}\ta3\tb4"));
- Assert.assertTrue(rs.get(1), rs.get(1).endsWith("streamingnobuckets/base_0000005/bucket_00000"));
- Assert.assertTrue(rs.get(2), rs.get(2).startsWith("{\"writeid\":3,\"bucketid\":536870912,\"rowid\":0}\ta5\tb6"));
- Assert.assertTrue(rs.get(2), rs.get(2).endsWith("streamingnobuckets/base_0000005/bucket_00000"));
- Assert.assertTrue(rs.get(3), rs.get(3).startsWith("{\"writeid\":4,\"bucketid\":536870912,\"rowid\":0}\t0\t0"));
- Assert.assertTrue(rs.get(3), rs.get(3).endsWith("streamingnobuckets/base_0000005/bucket_00000"));
- }
-
- /**
- * this is a clone from TestTxnStatement2....
- */
- public static void runWorker(HiveConf hiveConf) throws MetaException {
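- // stop=true makes the compactor Worker perform a single pass and return instead of looping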
- AtomicBoolean stop = new AtomicBoolean(true);
- Worker t = new Worker();
- t.setThreadId((int) t.getId());
- t.setConf(hiveConf);
- AtomicBoolean looped = new AtomicBoolean();
- t.init(stop, looped);
- t.run();
- }
-
- // stream data into streaming table with N buckets, then copy the data into another bucketed table
- // check if bucketing in both was done in the same way
- @Test
- public void testStreamBucketingMatchesRegularBucketing() throws Exception {
- int bucketCount = 100;
-
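- // the raw:// scheme is backed by RawFileSystem (declared above) so table locations resolve on the local FS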
- String dbUri = "raw://" + new Path(dbFolder.newFolder().toString()).toUri().toString();
- String tableLoc = "'" + dbUri + Path.SEPARATOR + "streamedtable" + "'";
- String tableLoc2 = "'" + dbUri + Path.SEPARATOR + "finaltable" + "'";
- String tableLoc3 = "'" + dbUri + Path.SEPARATOR + "nobucket" + "'";
- try (IDriver driver = DriverFactory.newDriver(conf)) {
- runDDL(driver, "create database testBucketing3");
- runDDL(driver, "use testBucketing3");
- runDDL(driver, "create table streamedtable ( key1 string,key2 int,data string ) clustered by ( key1,key2 ) into "
- + bucketCount + " buckets stored as orc location " + tableLoc + " TBLPROPERTIES ('transactional'='true')");
- // In the 'nobucket' table we capture the bucketid from streamedtable to work around a Hive bug that prevents joining two identically bucketed tables
- runDDL(driver, "create table nobucket ( bucketid int, key1 string,key2 int,data string ) location " + tableLoc3);
- runDDL(driver,
- "create table finaltable ( bucketid int, key1 string,key2 int,data string ) clustered by ( key1,key2 ) into "
- + bucketCount + " buckets stored as orc location " + tableLoc2 + " TBLPROPERTIES ('transactional'='true')");
-
-
- String[] records = new String[]{
- "PSFAHYLZVC,29,EPNMA",
- "PPPRKWAYAU,96,VUTEE",
- "MIAOFERCHI,3,WBDSI",
- "CEGQAZOWVN,0,WCUZL",
- "XWAKMNSVQF,28,YJVHU",
- "XBWTSAJWME,2,KDQFO",
- "FUVLQTAXAY,5,LDSDG",
- "QTQMDJMGJH,6,QBOMA",
- "EFLOTLWJWN,71,GHWPS",
- "PEQNAOJHCM,82,CAAFI",
- "MOEKQLGZCP,41,RUACR",
- "QZXMCOPTID,37,LFLWE",
- "EYALVWICRD,13,JEZLC",
- "VYWLZAYTXX,16,DMVZX",
- "OSALYSQIXR,47,HNZVE",
- "JGKVHKCEGQ,25,KSCJB",
- "WQFMMYDHET,12,DTRWA",
- "AJOVAYZKZQ,15,YBKFO",
- "YAQONWCUAU,31,QJNHZ",
- "DJBXUEUOEB,35,IYCBL"
- };
-
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, "testBucketing3", "streamedtable", null);
- String[] colNames1 = new String[]{"key1", "key2", "data"};
- DelimitedInputWriter wr = new DelimitedInputWriter(colNames1, ",", endPt);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, wr);
- txnBatch.beginNextTransaction();
-
- for (String record : records) {
- txnBatch.write(record.toString().getBytes());
- }
-
- txnBatch.commit();
- txnBatch.close();
- connection.close();
-
- ArrayList<String> res1 = queryTable(driver, "select row__id.bucketid, * from streamedtable order by key2");
- for (String re : res1) {
- System.out.println(re);
- }
-
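- // copy the streamed rows (with their recorded bucket ids) into an unbucketed table, then into a bucketed
- // table so Hive recomputes the bucket assignment; it must match the bucket id recorded during streaming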
- driver.run("insert into nobucket select row__id.bucketid,* from streamedtable");
- runDDL(driver, " insert into finaltable select * from nobucket");
- ArrayList<String> res2 = queryTable(driver,
- "select row__id.bucketid,* from finaltable where row__id.bucketid<>bucketid");
- for (String s : res2) {
- LOG.error(s);
- }
- Assert.assertTrue(res2.isEmpty());
- } finally {
- conf.unset(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname);
- }
- }
-
-
- @Test
- public void testTableValidation() throws Exception {
- int bucketCount = 100;
-
- String dbUri = "raw://" + new Path(dbFolder.newFolder().toString()).toUri().toString();
- String tbl1 = "validation1";
- String tbl2 = "validation2";
-
- String tableLoc = "'" + dbUri + Path.SEPARATOR + tbl1 + "'";
- String tableLoc2 = "'" + dbUri + Path.SEPARATOR + tbl2 + "'";
-
- runDDL(driver, "create database testBucketing3");
- runDDL(driver, "use testBucketing3");
-
- runDDL(driver, "create table " + tbl1 + " ( key1 string, data string ) clustered by ( key1 ) into "
- + bucketCount + " buckets stored as orc location " + tableLoc + " TBLPROPERTIES ('transactional'='false')") ;
-
- runDDL(driver, "create table " + tbl2 + " ( key1 string, data string ) clustered by ( key1 ) into "
- + bucketCount + " buckets stored as orc location " + tableLoc2 + " TBLPROPERTIES ('transactional'='false')") ;
-
-
- try {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, "testBucketing3", "validation1", null);
- endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- Assert.assertTrue("InvalidTable exception was not thrown", false);
- } catch (InvalidTable e) {
- // expecting this exception
- }
- try {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, "testBucketing3", "validation2", null);
- endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- Assert.assertTrue("InvalidTable exception was not thrown", false);
- } catch (InvalidTable e) {
- // expecting this exception
- }
- }
-
- /**
- * @deprecated use {@link #checkDataWritten2(Path, long, long, int, String, boolean, String...)} -
- * there is little value in using InputFormat directly
- */
- @Deprecated
- private void checkDataWritten(Path partitionPath, long minTxn, long maxTxn, int buckets, int numExpectedFiles,
- String... records) throws Exception {
- ValidWriteIdList writeIds = msClient.getValidWriteIds(AcidUtils.getFullTableName(dbName, tblName));
- AcidUtils.Directory dir = AcidUtils.getAcidState(partitionPath, conf, writeIds);
- Assert.assertEquals(0, dir.getObsolete().size());
- Assert.assertEquals(0, dir.getOriginalFiles().size());
- List<AcidUtils.ParsedDelta> current = dir.getCurrentDirectories();
- System.out.println("Files found: ");
- for (AcidUtils.ParsedDelta pd : current) {
- System.out.println(pd.getPath().toString());
- }
- Assert.assertEquals(numExpectedFiles, current.size());
-
- // find the absolute minimum and maximum write ids across the current deltas
- long min = Long.MAX_VALUE;
- long max = Long.MIN_VALUE;
- for (AcidUtils.ParsedDelta pd : current) {
- if (pd.getMaxWriteId() > max) {
- max = pd.getMaxWriteId();
- }
- if (pd.getMinWriteId() < min) {
- min = pd.getMinWriteId();
- }
- }
- Assert.assertEquals(minTxn, min);
- Assert.assertEquals(maxTxn, max);
-
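- // read the partition back directly through OrcInputFormat (configured for ACID) and compare row-by-row with the expected records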
- InputFormat inf = new OrcInputFormat();
- JobConf job = new JobConf();
- job.set("mapred.input.dir", partitionPath.toString());
- job.set(BUCKET_COUNT, Integer.toString(buckets));
- job.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, "id,msg");
- job.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, "bigint:string");
- AcidUtils.setAcidOperationalProperties(job, true, null);
- job.setBoolean(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, true);
- job.set(ValidWriteIdList.VALID_WRITEIDS_KEY, writeIds.toString());
- InputSplit[] splits = inf.getSplits(job, buckets);
- Assert.assertEquals(numExpectedFiles, splits.length);
- org.apache.hadoop.mapred.RecordReader<NullWritable, OrcStruct> rr =
- inf.getRecordReader(splits[0], job, Reporter.NULL);
-
- NullWritable key = rr.createKey();
- OrcStruct value = rr.createValue();
- for (String record : records) {
- Assert.assertEquals(true, rr.next(key, value));
- Assert.assertEquals(record, value.toString());
- }
- Assert.assertEquals(false, rr.next(key, value));
- }
- /**
- * @param validationQuery query to read from table to compare data against {@code records}
- * @param records expected data; each row is a CSV list of values
- */
- private void checkDataWritten2(Path partitionPath, long minTxn, long maxTxn, int numExpectedFiles,
- String validationQuery, boolean vectorize, String... records) throws Exception {
- ValidWriteIdList txns = msClient.getValidWriteIds(AcidUtils.getFullTableName(dbName, tblName));
- AcidUtils.Directory dir = AcidUtils.getAcidState(partitionPath, conf, txns);
- Assert.assertEquals(0, dir.getObsolete().size());
- Assert.assertEquals(0, dir.getOriginalFiles().size());
- List<AcidUtils.ParsedDelta> current = dir.getCurrentDirectories();
- System.out.println("Files found: ");
- for (AcidUtils.ParsedDelta pd : current) {
- System.out.println(pd.getPath().toString());
- }
- Assert.assertEquals(numExpectedFiles, current.size());
-
- // find the absolute minimum and maximum write ids across the current deltas
- long min = Long.MAX_VALUE;
- long max = Long.MIN_VALUE;
- for (AcidUtils.ParsedDelta pd : current) {
- if (pd.getMaxWriteId() > max) {
- max = pd.getMaxWriteId();
- }
- if (pd.getMinWriteId() < min) {
- min = pd.getMinWriteId();
- }
- }
- Assert.assertEquals(minTxn, min);
- Assert.assertEquals(maxTxn, max);
- boolean isVectorizationEnabled = conf.getBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED);
- if(vectorize) {
- conf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
- }
-
- String currStrategy = conf.getVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY);
- for(String strategy : ((Validator.StringSet)HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY.getValidator()).getExpected()) {
- //run it with each split strategy - the results must be the same for all of them
- conf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, strategy.toUpperCase());
- List<String> actualResult = queryTable(driver, validationQuery);
- for (int i = 0; i < actualResult.size(); i++) {
- Assert.assertEquals("diff at [" + i + "]. actual=" + actualResult + " expected=" +
- Arrays.toString(records), records[i], actualResult.get(i));
- }
- }
- conf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, currStrategy);
- conf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, isVectorizationEnabled);
- }
-
- private void checkNothingWritten(Path partitionPath) throws Exception {
- ValidWriteIdList writeIds = msClient.getValidWriteIds(AcidUtils.getFullTableName(dbName, tblName));
- AcidUtils.Directory dir = AcidUtils.getAcidState(partitionPath, conf, writeIds);
- Assert.assertEquals(0, dir.getObsolete().size());
- Assert.assertEquals(0, dir.getOriginalFiles().size());
- List<AcidUtils.ParsedDelta> current = dir.getCurrentDirectories();
- Assert.assertEquals(0, current.size());
- }
-
- @Test
- public void testEndpointConnection() throws Exception {
- // For partitioned table, partitionVals are specified
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName, partitionVals);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName()); //shouldn't throw
- connection.close();
-
- // For unpartitioned table, partitionVals are not specified
- endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, null);
- endPt.newConnection(false, "UT_" + Thread.currentThread().getName()).close(); // should not throw
-
- // For partitioned table, partitionVals are not specified
- try {
- endPt = new HiveEndPoint(metaStoreURI, dbName, tblName, null);
- connection = endPt.newConnection(true, "UT_" + Thread.currentThread().getName());
- Assert.assertTrue("ConnectionError was not thrown", false);
- connection.close();
- } catch (ConnectionError e) {
- // expecting this exception
- String errMsg = "doesn't specify any partitions for partitioned table";
- Assert.assertTrue(e.toString().endsWith(errMsg));
- }
-
- // For unpartitioned table, partition values are specified
- try {
- endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, partitionVals);
- connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- Assert.assertTrue("ConnectionError was not thrown", false);
- connection.close();
- } catch (ConnectionError e) {
- // expecting this exception
- String errMsg = "specifies partitions for unpartitioned table";
- Assert.assertTrue(e.toString().endsWith(errMsg));
- }
- }
-
- @Test
- public void testAddPartition() throws Exception {
- List<String> newPartVals = new ArrayList<String>(2);
- newPartVals.add(PART1_CONTINENT);
- newPartVals.add("Nepal");
-
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName
- , newPartVals);
-
- // Ensure partition is absent
- try {
- msClient.getPartition(endPt.database, endPt.table, endPt.partitionVals);
- Assert.assertTrue("Partition already exists", false);
- } catch (NoSuchObjectException e) {
- // expect this exception
- }
-
- // Create partition
- Assert.assertNotNull(endPt.newConnection(true, "UT_" + Thread.currentThread().getName()));
-
- // Ensure partition is present
- Partition p = msClient.getPartition(endPt.database, endPt.table, endPt.partitionVals);
- Assert.assertNotNull("Did not find added partition", p);
- }
-
- @Test
- public void testTransactionBatchEmptyCommit() throws Exception {
- // 1) to partitioned table
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
-
- txnBatch.beginNextTransaction();
- txnBatch.commit();
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
- txnBatch.close();
- connection.close();
-
- // 2) To unpartitioned table
- endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, null);
- writer = new DelimitedInputWriter(fieldNames2,",", endPt);
- connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
-
- txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.commit();
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
- txnBatch.close();
- connection.close();
- }
-
- /**
- * check that transactions that have not heartbeated and have timed out get properly aborted
- * @throws Exception
- */
- @Test
- public void testTimeOutReaper() throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, null);
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames2,",", endPt);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(5, writer);
- txnBatch.beginNextTransaction();
- conf.setTimeVar(HiveConf.ConfVars.HIVE_TIMEDOUT_TXN_REAPER_START, 0, TimeUnit.SECONDS);
- //ensure txn times out
- conf.setTimeVar(HiveConf.ConfVars.HIVE_TXN_TIMEOUT, 1, TimeUnit.MILLISECONDS);
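- // AcidHouseKeeperService aborts transactions whose heartbeat has exceeded HIVE_TXN_TIMEOUT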
- AcidHouseKeeperService houseKeeperService = new AcidHouseKeeperService();
- houseKeeperService.setConf(conf);
- houseKeeperService.run();
- try {
- //should fail because the TransactionBatch timed out
- txnBatch.commit();
- }
- catch(TransactionError e) {
- Assert.assertTrue("Expected aborted transaction", e.getCause() instanceof TxnAbortedException);
- }
- txnBatch.close();
- txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.commit();
- txnBatch.beginNextTransaction();
- houseKeeperService.run();
- try {
- //should fail because the TransactionBatch timed out
- txnBatch.commit();
- }
- catch(TransactionError e) {
- Assert.assertTrue("Expected aborted transaction", e.getCause() instanceof TxnAbortedException);
- }
- txnBatch.close();
- connection.close();
- }
-
- @Test
- public void testHeartbeat() throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, null);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames2,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(5, writer);
- txnBatch.beginNextTransaction();
- //todo: this should ideally check Transaction heartbeat as well, but heartbeat
- //timestamp is not reported yet
- //GetOpenTxnsInfoResponse txnresp = msClient.showTxns();
- ShowLocksRequest request = new ShowLocksRequest();
- request.setDbname(dbName2);
- request.setTablename(tblName2);
- ShowLocksResponse response = msClient.showLocks(request);
- Assert.assertEquals("Wrong nubmer of locks: " + response, 1, response.getLocks().size());
- ShowLocksResponseElement lock = response.getLocks().get(0);
- long acquiredAt = lock.getAcquiredat();
- long heartbeatAt = lock.getLastheartbeat();
- txnBatch.heartbeat();
- response = msClient.showLocks(request);
- Assert.assertEquals("Wrong number of locks2: " + response, 1, response.getLocks().size());
- lock = response.getLocks().get(0);
- Assert.assertEquals("Acquired timestamp didn't match", acquiredAt, lock.getAcquiredat());
- Assert.assertTrue("Expected new heartbeat (" + lock.getLastheartbeat() +
- ") == old heartbeat(" + heartbeatAt +")", lock.getLastheartbeat() == heartbeatAt);
- txnBatch.close();
- int txnBatchSize = 200;
- txnBatch = connection.fetchTransactionBatch(txnBatchSize, writer);
- for(int i = 0; i < txnBatchSize; i++) {
- txnBatch.beginNextTransaction();
- if(i % 47 == 0) {
- txnBatch.heartbeat();
- }
- if(i % 10 == 0) {
- txnBatch.abort();
- }
- else {
- txnBatch.commit();
- }
- if(i % 37 == 0) {
- txnBatch.heartbeat();
- }
- }
-
- }
- @Test
- public void testTransactionBatchEmptyAbort() throws Exception {
- // 1) to partitioned table
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- StreamingConnection connection = endPt.newConnection(true, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.abort();
- Assert.assertEquals(TransactionBatch.TxnState.ABORTED
- , txnBatch.getCurrentTransactionState());
- txnBatch.close();
- connection.close();
-
- // 2) to unpartitioned table
- endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, null);
- writer = new DelimitedInputWriter(fieldNames,",", endPt);
- connection = endPt.newConnection(true, "UT_" + Thread.currentThread().getName());
-
- txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.abort();
- Assert.assertEquals(TransactionBatch.TxnState.ABORTED
- , txnBatch.getCurrentTransactionState());
- txnBatch.close();
- connection.close();
- }
-
- @Test
- public void testTransactionBatchCommit_Delimited() throws Exception {
- testTransactionBatchCommit_Delimited(null);
- }
- @Test
- public void testTransactionBatchCommit_DelimitedUGI() throws Exception {
- testTransactionBatchCommit_Delimited(Utils.getUGI());
- }
- private void testTransactionBatchCommit_Delimited(UserGroupInformation ugi) throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- StreamingConnection connection = endPt.newConnection(true, conf, ugi, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames,",", endPt, conf, connection);
-
- // 1st Txn
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write("1,Hello streaming".getBytes());
- txnBatch.commit();
-
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}");
-
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
-
- // 2nd Txn
- txnBatch.beginNextTransaction();
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write("2,Welcome to streaming".getBytes());
-
- // data should not be visible
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}");
-
- txnBatch.commit();
-
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}",
- "{2, Welcome to streaming}");
-
- txnBatch.close();
- Assert.assertEquals(TransactionBatch.TxnState.INACTIVE
- , txnBatch.getCurrentTransactionState());
-
-
- connection.close();
-
-
- // To Unpartitioned table
- endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, null);
- connection = endPt.newConnection(true, conf, ugi, "UT_" + Thread.currentThread().getName());
- writer = new DelimitedInputWriter(fieldNames,",", endPt, conf, connection);
-
- // 1st Txn
- txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write("1,Hello streaming".getBytes());
- txnBatch.commit();
-
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
- connection.close();
- }
-
- @Test
- public void testTransactionBatchCommit_Regex() throws Exception {
- testTransactionBatchCommit_Regex(null);
- }
- @Test
- public void testTransactionBatchCommit_RegexUGI() throws Exception {
- testTransactionBatchCommit_Regex(Utils.getUGI());
- }
- private void testTransactionBatchCommit_Regex(UserGroupInformation ugi) throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- StreamingConnection connection = endPt.newConnection(true, conf, ugi, "UT_" + Thread.currentThread().getName());
- String regex = "([^,]*),(.*)";
- StrictRegexWriter writer = new StrictRegexWriter(regex, endPt, conf, connection);
-
- // 1st Txn
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write("1,Hello streaming".getBytes());
- txnBatch.commit();
-
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}");
-
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
-
- // 2nd Txn
- txnBatch.beginNextTransaction();
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write("2,Welcome to streaming".getBytes());
-
- // data should not be visible
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}");
-
- txnBatch.commit();
-
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}",
- "{2, Welcome to streaming}");
-
- txnBatch.close();
- Assert.assertEquals(TransactionBatch.TxnState.INACTIVE
- , txnBatch.getCurrentTransactionState());
-
-
- connection.close();
-
-
- // To Unpartitioned table
- endPt = new HiveEndPoint(metaStoreURI, dbName2, tblName2, null);
- connection = endPt.newConnection(true, conf, ugi, "UT_" + Thread.currentThread().getName());
- regex = "([^:]*):(.*)";
- writer = new StrictRegexWriter(regex, endPt, conf, connection);
-
- // 1st Txn
- txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write("1:Hello streaming".getBytes());
- txnBatch.commit();
-
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
- connection.close();
- }
-
- @Test
- public void testTransactionBatchCommit_Json() throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- StreamingConnection connection = endPt.newConnection(true, "UT_" + Thread.currentThread().getName());
- StrictJsonWriter writer = new StrictJsonWriter(endPt, connection);
-
- // 1st Txn
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- String rec1 = "{\"id\" : 1, \"msg\": \"Hello streaming\"}";
- txnBatch.write(rec1.getBytes());
- txnBatch.commit();
-
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}");
-
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
-
- txnBatch.close();
- Assert.assertEquals(TransactionBatch.TxnState.INACTIVE
- , txnBatch.getCurrentTransactionState());
-
- connection.close();
- List rs = queryTable(driver, "select * from " + dbName + "." + tblName);
- Assert.assertEquals(1, rs.size());
- }
-
- @Test
- public void testRemainingTransactions() throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames,",", endPt);
- StreamingConnection connection = endPt.newConnection(true, "UT_" + Thread.currentThread().getName());
-
- // 1) test with txn.Commit()
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- int batch=0;
- int initialCount = txnBatch.remainingTransactions();
- while (txnBatch.remainingTransactions()>0) {
- txnBatch.beginNextTransaction();
- Assert.assertEquals(--initialCount, txnBatch.remainingTransactions());
- for (int rec=0; rec<2; ++rec) {
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write((batch * rec + ",Hello streaming").getBytes());
- }
- txnBatch.commit();
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
- ++batch;
- }
- Assert.assertEquals(0, txnBatch.remainingTransactions());
- txnBatch.close();
-
- Assert.assertEquals(TransactionBatch.TxnState.INACTIVE
- , txnBatch.getCurrentTransactionState());
-
- // 2) test with txn.Abort()
- txnBatch = connection.fetchTransactionBatch(10, writer);
- batch=0;
- initialCount = txnBatch.remainingTransactions();
- while (txnBatch.remainingTransactions()>0) {
- txnBatch.beginNextTransaction();
- Assert.assertEquals(--initialCount,txnBatch.remainingTransactions());
- for (int rec=0; rec<2; ++rec) {
- Assert.assertEquals(TransactionBatch.TxnState.OPEN
- , txnBatch.getCurrentTransactionState());
- txnBatch.write((batch * rec + ",Hello streaming").getBytes());
- }
- txnBatch.abort();
- Assert.assertEquals(TransactionBatch.TxnState.ABORTED
- , txnBatch.getCurrentTransactionState());
- ++batch;
- }
- Assert.assertEquals(0, txnBatch.remainingTransactions());
- txnBatch.close();
-
- Assert.assertEquals(TransactionBatch.TxnState.INACTIVE
- , txnBatch.getCurrentTransactionState());
-
- connection.close();
- }
-
- @Test
- public void testTransactionBatchAbort() throws Exception {
-
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames,",", endPt, connection);
-
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("1,Hello streaming".getBytes());
- txnBatch.write("2,Welcome to streaming".getBytes());
- txnBatch.abort();
-
- checkNothingWritten(partLoc);
-
- Assert.assertEquals(TransactionBatch.TxnState.ABORTED
- , txnBatch.getCurrentTransactionState());
-
- txnBatch.close();
- connection.close();
-
- checkNothingWritten(partLoc);
-
- }
-
-
- @Test
- public void testTransactionBatchAbortAndCommit() throws Exception {
- String agentInfo = "UT_" + Thread.currentThread().getName();
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- StreamingConnection connection = endPt.newConnection(false, agentInfo);
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("1,Hello streaming".getBytes());
- txnBatch.write("2,Welcome to streaming".getBytes());
- ShowLocksResponse resp = msClient.showLocks(new ShowLocksRequest());
- Assert.assertEquals("LockCount", 1, resp.getLocksSize());
- Assert.assertEquals("LockType", LockType.SHARED_READ, resp.getLocks().get(0).getType());
- Assert.assertEquals("LockState", LockState.ACQUIRED, resp.getLocks().get(0).getState());
- Assert.assertEquals("AgentInfo", agentInfo, resp.getLocks().get(0).getAgentInfo());
- txnBatch.abort();
-
- checkNothingWritten(partLoc);
-
- Assert.assertEquals(TransactionBatch.TxnState.ABORTED
- , txnBatch.getCurrentTransactionState());
-
- txnBatch.beginNextTransaction();
- txnBatch.write("1,Hello streaming".getBytes());
- txnBatch.write("2,Welcome to streaming".getBytes());
- txnBatch.commit();
-
- checkDataWritten(partLoc, 1, 10, 1, 1, "{1, Hello streaming}",
- "{2, Welcome to streaming}");
-
- txnBatch.close();
- connection.close();
- }
-
- @Test
- public void testMultipleTransactionBatchCommits() throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames,",", endPt);
- StreamingConnection connection = endPt.newConnection(true, "UT_" + Thread.currentThread().getName());
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("1,Hello streaming".getBytes());
- txnBatch.commit();
- String validationQuery = "select id, msg from " + dbName + "." + tblName + " order by id, msg";
- checkDataWritten2(partLoc, 1, 10, 1, validationQuery, false, "1\tHello streaming");
-
- txnBatch.beginNextTransaction();
- txnBatch.write("2,Welcome to streaming".getBytes());
- txnBatch.commit();
-
- checkDataWritten2(partLoc, 1, 10, 1, validationQuery, true, "1\tHello streaming",
- "2\tWelcome to streaming");
-
- txnBatch.close();
-
- // 2nd Txn Batch
- txnBatch = connection.fetchTransactionBatch(10, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("3,Hello streaming - once again".getBytes());
- txnBatch.commit();
-
- checkDataWritten2(partLoc, 1, 20, 2, validationQuery, false, "1\tHello streaming",
- "2\tWelcome to streaming", "3\tHello streaming - once again");
-
- txnBatch.beginNextTransaction();
- txnBatch.write("4,Welcome to streaming - once again".getBytes());
- txnBatch.commit();
-
- checkDataWritten2(partLoc, 1, 20, 2, validationQuery, true, "1\tHello streaming",
- "2\tWelcome to streaming", "3\tHello streaming - once again",
- "4\tWelcome to streaming - once again");
-
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch.getCurrentTransactionState());
-
- txnBatch.close();
-
- connection.close();
- }
-
- @Test
- public void testInterleavedTransactionBatchCommits() throws Exception {
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName, tblName,
- partitionVals);
- DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames, ",", endPt);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
-
- // Acquire 1st Txn Batch
- TransactionBatch txnBatch1 = connection.fetchTransactionBatch(10, writer);
- txnBatch1.beginNextTransaction();
-
- // Acquire 2nd Txn Batch
- DelimitedInputWriter writer2 = new DelimitedInputWriter(fieldNames, ",", endPt);
- TransactionBatch txnBatch2 = connection.fetchTransactionBatch(10, writer2);
- txnBatch2.beginNextTransaction();
-
- // Interleaved writes to both batches
- txnBatch1.write("1,Hello streaming".getBytes());
- txnBatch2.write("3,Hello streaming - once again".getBytes());
-
- checkNothingWritten(partLoc);
-
- txnBatch2.commit();
-
- String validationQuery = "select id, msg from " + dbName + "." + tblName + " order by id, msg";
- checkDataWritten2(partLoc, 11, 20, 1,
- validationQuery, true, "3\tHello streaming - once again");
-
- txnBatch1.commit();
- /*now both batches have committed (but not closed), so for each primary file we expect a side
- file to exist that indicates the true length of the primary file*/
- FileSystem fs = partLoc.getFileSystem(conf);
- AcidUtils.Directory dir = AcidUtils.getAcidState(partLoc, conf,
- msClient.getValidWriteIds(AcidUtils.getFullTableName(dbName, tblName)));
- for(AcidUtils.ParsedDelta pd : dir.getCurrentDirectories()) {
- for(FileStatus stat : fs.listStatus(pd.getPath(), AcidUtils.bucketFileFilter)) {
- Path lengthFile = OrcAcidUtils.getSideFile(stat.getPath());
- Assert.assertTrue(lengthFile + " missing", fs.exists(lengthFile));
- long lengthFileSize = fs.getFileStatus(lengthFile).getLen();
- Assert.assertTrue("Expected " + lengthFile + " to be non empty. lengh=" +
- lengthFileSize, lengthFileSize > 0);
- long logicalLength = AcidUtils.getLogicalLength(fs, stat);
- long actualLength = stat.getLen();
- Assert.assertTrue("", logicalLength == actualLength);
- }
- }
- checkDataWritten2(partLoc, 1, 20, 2,
- validationQuery, false,"1\tHello streaming", "3\tHello streaming - once again");
-
- txnBatch1.beginNextTransaction();
- txnBatch1.write("2,Welcome to streaming".getBytes());
-
- txnBatch2.beginNextTransaction();
- txnBatch2.write("4,Welcome to streaming - once again".getBytes());
- //here each batch has written data and committed (to bucket0, since the table has only 1 bucket),
- //so each of the 2 deltas has one bucket0 file and one bucket0_flush_length side file. Furthermore, each bucket0
- //has now received more data (logically - it's buffered) but it is not yet committed.
- //let's check that the side files exist, etc.
- dir = AcidUtils.getAcidState(partLoc, conf, msClient.getValidWriteIds(AcidUtils.getFullTableName(dbName, tblName)));
- for(AcidUtils.ParsedDelta pd : dir.getCurrentDirectories()) {
- for(FileStatus stat : fs.listStatus(pd.getPath(), AcidUtils.bucketFileFilter)) {
- Path lengthFile = OrcAcidUtils.getSideFile(stat.getPath());
- Assert.assertTrue(lengthFile + " missing", fs.exists(lengthFile));
- long lengthFileSize = fs.getFileStatus(lengthFile).getLen();
- Assert.assertTrue("Expected " + lengthFile + " to be non empty. lengh=" +
- lengthFileSize, lengthFileSize > 0);
- long logicalLength = AcidUtils.getLogicalLength(fs, stat);
- long actualLength = stat.getLen();
- Assert.assertTrue("", logicalLength <= actualLength);
- }
- }
- checkDataWritten2(partLoc, 1, 20, 2,
- validationQuery, true,"1\tHello streaming", "3\tHello streaming - once again");
-
- txnBatch1.commit();
-
- checkDataWritten2(partLoc, 1, 20, 2,
- validationQuery, false, "1\tHello streaming",
- "2\tWelcome to streaming",
- "3\tHello streaming - once again");
-
- txnBatch2.commit();
-
- checkDataWritten2(partLoc, 1, 20, 2,
- validationQuery, true, "1\tHello streaming",
- "2\tWelcome to streaming",
- "3\tHello streaming - once again",
- "4\tWelcome to streaming - once again");
-
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch1.getCurrentTransactionState());
- Assert.assertEquals(TransactionBatch.TxnState.COMMITTED
- , txnBatch2.getCurrentTransactionState());
-
- txnBatch1.close();
- txnBatch2.close();
-
- connection.close();
- }
-
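- // Writer thread used by testConcurrentTransactionBatchCommits: each thread opens its own
- // connection and streams the same record in every txn of its batch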
- private static class WriterThd extends Thread {
-
- private final StreamingConnection conn;
- private final DelimitedInputWriter writer;
- private final String data;
- private Throwable error;
-
- WriterThd(HiveEndPoint ep, String data) throws Exception {
- super("Writer_" + data);
- writer = new DelimitedInputWriter(fieldNames, ",", ep);
- conn = ep.newConnection(false, "UT_" + Thread.currentThread().getName());
- this.data = data;
- setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
- @Override
- public void uncaughtException(Thread thread, Throwable throwable) {
- error = throwable;
- LOG.error("Thread " + thread.getName() + " died: " + throwable.getMessage(), throwable);
- }
- });
- }
-
- @Override
- public void run() {
- TransactionBatch txnBatch = null;
- try {
- txnBatch = conn.fetchTransactionBatch(10, writer);
- while (txnBatch.remainingTransactions() > 0) {
- txnBatch.beginNextTransaction();
- txnBatch.write(data.getBytes());
- txnBatch.write(data.getBytes());
- txnBatch.commit();
- } // while
- } catch (Exception e) {
- throw new RuntimeException(e);
- } finally {
- if (txnBatch != null) {
- try {
- txnBatch.close();
- } catch (Exception e) {
- LOG.error("txnBatch.close() failed: " + e.getMessage(), e);
- conn.close();
- }
- }
- try {
- conn.close();
- } catch (Exception e) {
- LOG.error("conn.close() failed: " + e.getMessage(), e);
- }
-
- }
- }
- }
-
- @Test
- public void testConcurrentTransactionBatchCommits() throws Exception {
- final HiveEndPoint ep = new HiveEndPoint(metaStoreURI, dbName, tblName, partitionVals);
- List<WriterThd> writers = new ArrayList<WriterThd>(3);
- writers.add(new WriterThd(ep, "1,Matrix"));
- writers.add(new WriterThd(ep, "2,Gandhi"));
- writers.add(new WriterThd(ep, "3,Silence"));
-
- for(WriterThd w : writers) {
- w.start();
- }
- for(WriterThd w : writers) {
- w.join();
- }
- for(WriterThd w : writers) {
- if(w.error != null) {
- Assert.assertFalse("Writer thread" + w.getName() + " died: " + w.error.getMessage() +
- " See log file for stack trace", true);
- }
- }
- }
-
-
- private ArrayList<SampleRec> dumpBucket(Path orcFile) throws IOException {
- org.apache.hadoop.fs.FileSystem fs = org.apache.hadoop.fs.FileSystem.getLocal(new Configuration());
- Reader reader = OrcFile.createReader(orcFile,
- OrcFile.readerOptions(conf).filesystem(fs));
-
- RecordReader rows = reader.rows();
- StructObjectInspector inspector = (StructObjectInspector) reader
- .getObjectInspector();
-
- System.out.format("Found Bucket File : %s \n", orcFile.getName());
- ArrayList<SampleRec> result = new ArrayList<SampleRec>();
- while (rows.hasNext()) {
- Object row = rows.next(null);
- SampleRec rec = (SampleRec) deserializeDeltaFileRow(row, inspector)[5];
- result.add(rec);
- }
-
- return result;
- }
-
- // Assumes stored data schema = [acid fields],string,int,string
- // Returns an array of 6 fields, where the last field holds the actual data
- private static Object[] deserializeDeltaFileRow(Object row, StructObjectInspector inspector) {
- List<? extends StructField> fields = inspector.getAllStructFieldRefs();
-
- WritableIntObjectInspector f0ins = (WritableIntObjectInspector) fields.get(0).getFieldObjectInspector();
- WritableLongObjectInspector f1ins = (WritableLongObjectInspector) fields.get(1).getFieldObjectInspector();
- WritableIntObjectInspector f2ins = (WritableIntObjectInspector) fields.get(2).getFieldObjectInspector();
- WritableLongObjectInspector f3ins = (WritableLongObjectInspector) fields.get(3).getFieldObjectInspector();
- WritableLongObjectInspector f4ins = (WritableLongObjectInspector) fields.get(4).getFieldObjectInspector();
- StructObjectInspector f5ins = (StructObjectInspector) fields.get(5).getFieldObjectInspector();
-
- int f0 = f0ins.get(inspector.getStructFieldData(row, fields.get(0)));
- long f1 = f1ins.get(inspector.getStructFieldData(row, fields.get(1)));
- int f2 = f2ins.get(inspector.getStructFieldData(row, fields.get(2)));
- long f3 = f3ins.get(inspector.getStructFieldData(row, fields.get(3)));
- long f4 = f4ins.get(inspector.getStructFieldData(row, fields.get(4)));
- SampleRec f5 = deserializeInner(inspector.getStructFieldData(row, fields.get(5)), f5ins);
-
- return new Object[] {f0, f1, f2, f3, f4, f5};
- }
-
- // Assumes row schema => string,int,string
- private static SampleRec deserializeInner(Object row, StructObjectInspector inspector) {
- List<? extends StructField> fields = inspector.getAllStructFieldRefs();
-
- WritableStringObjectInspector f0ins = (WritableStringObjectInspector) fields.get(0).getFieldObjectInspector();
- WritableIntObjectInspector f1ins = (WritableIntObjectInspector) fields.get(1).getFieldObjectInspector();
- WritableStringObjectInspector f2ins = (WritableStringObjectInspector) fields.get(2).getFieldObjectInspector();
-
- String f0 = f0ins.getPrimitiveJavaObject(inspector.getStructFieldData(row, fields.get(0)));
- int f1 = f1ins.get(inspector.getStructFieldData(row, fields.get(1)));
- String f2 = f2ins.getPrimitiveJavaObject(inspector.getStructFieldData(row, fields.get(2)));
- return new SampleRec(f0, f1, f2);
- }
-
- @Test
- public void testBucketing() throws Exception {
- String agentInfo = "UT_" + Thread.currentThread().getName();
- dropDB(msClient, dbName3);
- dropDB(msClient, dbName4);
-
- // 1) Create two bucketed tables
- String dbLocation = dbFolder.newFolder(dbName3).getCanonicalPath() + ".db";
- dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths
- String[] colNames = "key1,key2,data".split(",");
- String[] colTypes = "string,int,string".split(",");
- String[] bucketNames = "key1,key2".split(",");
- int bucketCount = 4;
- createDbAndTable(driver, dbName3, tblName3, null, colNames, colTypes, bucketNames
- , null, dbLocation, bucketCount);
-
- String dbLocation2 = dbFolder.newFolder(dbName4).getCanonicalPath() + ".db";
- dbLocation2 = dbLocation2.replaceAll("\\\\","/"); // for windows paths
- String[] colNames2 = "key3,key4,data2".split(",");
- String[] colTypes2 = "string,int,string".split(",");
- String[] bucketNames2 = "key3,key4".split(",");
- createDbAndTable(driver, dbName4, tblName4, null, colNames2, colTypes2, bucketNames2
- , null, dbLocation2, bucketCount);
-
-
- // 2) Insert data into both tables
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName3, tblName3, null);
- StreamingConnection connection = endPt.newConnection(false, agentInfo);
- DelimitedInputWriter writer = new DelimitedInputWriter(colNames,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("name0,1,Hello streaming".getBytes());
- txnBatch.write("name2,2,Welcome to streaming".getBytes());
- txnBatch.write("name4,2,more Streaming unlimited".getBytes());
- txnBatch.write("name5,2,even more Streaming unlimited".getBytes());
- txnBatch.commit();
-
-
- HiveEndPoint endPt2 = new HiveEndPoint(metaStoreURI, dbName4, tblName4, null);
- StreamingConnection connection2 = endPt2.newConnection(false, agentInfo);
- DelimitedInputWriter writer2 = new DelimitedInputWriter(colNames2,",", endPt2, connection);
- TransactionBatch txnBatch2 = connection2.fetchTransactionBatch(2, writer2);
- txnBatch2.beginNextTransaction();
-
- txnBatch2.write("name5,2,fact3".getBytes()); // bucket 0
- txnBatch2.write("name8,2,fact3".getBytes()); // bucket 1
- txnBatch2.write("name0,1,fact1".getBytes()); // bucket 2
-
- txnBatch2.commit();
-
- // 3) Check data distribution in buckets
-
- HashMap<Integer, ArrayList<SampleRec>> actual1 = dumpAllBuckets(dbLocation, tblName3);
- HashMap<Integer, ArrayList<SampleRec>> actual2 = dumpAllBuckets(dbLocation2, tblName4);
- System.err.println("\n Table 1");
- System.err.println(actual1);
- System.err.println("\n Table 2");
- System.err.println(actual2);
-
- // assert bucket listing is as expected
- Assert.assertEquals("number of buckets does not match expectation", actual1.values().size(), 3);
- Assert.assertTrue("bucket 0 shouldn't have been created", actual1.get(0) == null);
- Assert.assertEquals("records in bucket does not match expectation", actual1.get(1).size(), 1);
- Assert.assertEquals("records in bucket does not match expectation", actual1.get(2).size(), 2);
- Assert.assertEquals("records in bucket does not match expectation", actual1.get(3).size(), 1);
- }
- private void runCmdOnDriver(String cmd) throws QueryFailedException {
- boolean t = runDDL(driver, cmd);
- Assert.assertTrue(cmd + " failed", t);
- }
-
-
- @Test
- public void testFileDump() throws Exception {
- String agentInfo = "UT_" + Thread.currentThread().getName();
- dropDB(msClient, dbName3);
- dropDB(msClient, dbName4);
-
- // 1) Create two bucketed tables
- String dbLocation = dbFolder.newFolder(dbName3).getCanonicalPath() + ".db";
- dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths
- String[] colNames = "key1,key2,data".split(",");
- String[] colTypes = "string,int,string".split(",");
- String[] bucketNames = "key1,key2".split(",");
- int bucketCount = 4;
- createDbAndTable(driver, dbName3, tblName3, null, colNames, colTypes, bucketNames
- , null, dbLocation, bucketCount);
-
- String dbLocation2 = dbFolder.newFolder(dbName4).getCanonicalPath() + ".db";
- dbLocation2 = dbLocation2.replaceAll("\\\\","/"); // for windows paths
- String[] colNames2 = "key3,key4,data2".split(",");
- String[] colTypes2 = "string,int,string".split(",");
- String[] bucketNames2 = "key3,key4".split(",");
- createDbAndTable(driver, dbName4, tblName4, null, colNames2, colTypes2, bucketNames2
- , null, dbLocation2, bucketCount);
-
-
- // 2) Insert data into both tables
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName3, tblName3, null);
- StreamingConnection connection = endPt.newConnection(false, agentInfo);
- DelimitedInputWriter writer = new DelimitedInputWriter(colNames,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("name0,1,Hello streaming".getBytes());
- txnBatch.write("name2,2,Welcome to streaming".getBytes());
- txnBatch.write("name4,2,more Streaming unlimited".getBytes());
- txnBatch.write("name5,2,even more Streaming unlimited".getBytes());
- txnBatch.commit();
-
- PrintStream origErr = System.err;
- ByteArrayOutputStream myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation});
- System.err.flush();
- System.setErr(origErr);
-
- String errDump = new String(myErr.toByteArray());
- Assert.assertEquals(false, errDump.contains("file(s) are corrupted"));
- // since this test runs on the local file system, which does not have an API to tell whether files are
- // open or not, we are testing for the negative case even though the bucket files are still open
- // for writes (transaction batch not closed yet)
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
-
- HiveEndPoint endPt2 = new HiveEndPoint(metaStoreURI, dbName4, tblName4, null);
- DelimitedInputWriter writer2 = new DelimitedInputWriter(colNames2,",", endPt2);
- StreamingConnection connection2 = endPt2.newConnection(false, agentInfo);
- TransactionBatch txnBatch2 = connection2.fetchTransactionBatch(2, writer2);
- txnBatch2.beginNextTransaction();
-
- txnBatch2.write("name5,2,fact3".getBytes()); // bucket 0
- txnBatch2.write("name8,2,fact3".getBytes()); // bucket 1
- txnBatch2.write("name0,1,fact1".getBytes()); // bucket 2
- // no data for bucket 3 -- expect 0 length bucket file
-
- txnBatch2.commit();
-
- origErr = System.err;
- myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation});
- System.out.flush();
- System.err.flush();
- System.setErr(origErr);
-
- errDump = new String(myErr.toByteArray());
- Assert.assertEquals(false, errDump.contains("Exception"));
- Assert.assertEquals(false, errDump.contains("file(s) are corrupted"));
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
- }
-
- @Test
- public void testFileDumpCorruptDataFiles() throws Exception {
- dropDB(msClient, dbName3);
-
- // 1) Create a bucketed table
- String dbLocation = dbFolder.newFolder(dbName3).getCanonicalPath() + ".db";
- dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths
- String[] colNames = "key1,key2,data".split(",");
- String[] colTypes = "string,int,string".split(",");
- String[] bucketNames = "key1,key2".split(",");
- int bucketCount = 4;
- createDbAndTable(driver, dbName3, tblName3, null, colNames, colTypes, bucketNames
- , null, dbLocation, bucketCount);
-
- // 2) Insert data into the table
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName3, tblName3, null);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(colNames,",", endPt, connection);
-
- // we need a side file for this test, so we create 2 txn batches and test with only one
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("name0,1,Hello streaming".getBytes());
- txnBatch.write("name2,2,Welcome to streaming".getBytes());
- txnBatch.write("name4,2,more Streaming unlimited".getBytes());
- txnBatch.write("name5,2,even more Streaming unlimited".getBytes());
- txnBatch.commit();
-
- // intentionally corrupt some files
- Path path = new Path(dbLocation);
- Collection<String> files = FileDump.getAllFilesInPath(path, conf);
- int readableFooter = -1;
- for (String file : files) {
- if (file.contains("bucket_00000")) {
- // empty out the file
- corruptDataFile(file, conf, Integer.MIN_VALUE);
- } else if (file.contains("bucket_00001")) {
- corruptDataFile(file, conf, -1);
- } else if (file.contains("bucket_00002")) {
- corruptDataFile(file, conf, 100);
- } else if (file.contains("bucket_00003")) {
- corruptDataFile(file, conf, 100);
- }
- }
-
- PrintStream origErr = System.err;
- ByteArrayOutputStream myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation});
- System.err.flush();
- System.setErr(origErr);
-
- String errDump = new String(myErr.toByteArray());
- Assert.assertEquals(false, errDump.contains("Exception"));
- Assert.assertEquals(true, errDump.contains("3 file(s) are corrupted"));
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
-
- origErr = System.err;
- myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation, "--recover", "--skip-dump"});
- System.err.flush();
- System.setErr(origErr);
-
- errDump = new String(myErr.toByteArray());
- Assert.assertEquals(true, errDump.contains("bucket_00001 recovered successfully!"));
- Assert.assertEquals(true, errDump.contains("No readable footers found. Creating empty orc file."));
- Assert.assertEquals(true, errDump.contains("bucket_00002 recovered successfully!"));
- Assert.assertEquals(true, errDump.contains("bucket_00003 recovered successfully!"));
- Assert.assertEquals(false, errDump.contains("Exception"));
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
-
- // test after recovery
- origErr = System.err;
- myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation});
- System.err.flush();
- System.setErr(origErr);
-
- errDump = new String(myErr.toByteArray());
- Assert.assertEquals(false, errDump.contains("Exception"));
- Assert.assertEquals(false, errDump.contains("file(s) are corrupted"));
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
-
- // after recovery there shouldn't be any *_flush_length files
- files = FileDump.getAllFilesInPath(path, conf);
- for (String file : files) {
- Assert.assertEquals(false, file.contains("_flush_length"));
- }
-
- txnBatch.close();
- }
-
- private void corruptDataFile(final String file, final Configuration conf, final int addRemoveBytes)
- throws Exception {
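- // addRemoveBytes semantics: Integer.MIN_VALUE truncates the copy to zero length, a negative
- // value drops that many trailing bytes, and a positive value pads the copy with zero bytes.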
- Path bPath = new Path(file);
- Path cPath = new Path(bPath.getParent(), bPath.getName() + ".corrupt");
- FileSystem fs = bPath.getFileSystem(conf);
- FileStatus fileStatus = fs.getFileStatus(bPath);
- int len = addRemoveBytes == Integer.MIN_VALUE ? 0 : (int) fileStatus.getLen() + addRemoveBytes;
- byte[] buffer = new byte[len];
- FSDataInputStream fdis = fs.open(bPath);
- fdis.readFully(0, buffer, 0, (int) Math.min(fileStatus.getLen(), buffer.length));
- fdis.close();
- FSDataOutputStream fdos = fs.create(cPath, true);
- fdos.write(buffer, 0, buffer.length);
- fdos.close();
- fs.delete(bPath, false);
- fs.rename(cPath, bPath);
- }
-
- @Test
- public void testFileDumpCorruptSideFiles() throws Exception {
- dropDB(msClient, dbName3);
-
- // 1) Create a bucketed table
- String dbLocation = dbFolder.newFolder(dbName3).getCanonicalPath() + ".db";
- dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths
- String[] colNames = "key1,key2,data".split(",");
- String[] colTypes = "string,int,string".split(",");
- String[] bucketNames = "key1,key2".split(",");
- int bucketCount = 4;
- createDbAndTable(driver, dbName3, tblName3, null, colNames, colTypes, bucketNames
- , null, dbLocation, bucketCount);
-
- // 2) Insert data into the table
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, dbName3, tblName3, null);
- StreamingConnection connection = endPt.newConnection(false, "UT_" + Thread.currentThread().getName());
- DelimitedInputWriter writer = new DelimitedInputWriter(colNames,",", endPt, connection);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("name0,1,Hello streaming".getBytes());
- txnBatch.write("name2,2,Welcome to streaming".getBytes());
- txnBatch.write("name4,2,more Streaming unlimited".getBytes());
- txnBatch.write("name5,2,even more Streaming unlimited".getBytes());
- txnBatch.write("name6,3,aHello streaming".getBytes());
- txnBatch.commit();
-
- Map<String, List<Long>> offsetMap = new HashMap<String, List<Long>>();
- recordOffsets(conf, dbLocation, offsetMap);
-
- txnBatch.beginNextTransaction();
- txnBatch.write("name01,11,-Hello streaming".getBytes());
- txnBatch.write("name21,21,-Welcome to streaming".getBytes());
- txnBatch.write("name41,21,-more Streaming unlimited".getBytes());
- txnBatch.write("name51,21,-even more Streaming unlimited".getBytes());
- txnBatch.write("name02,12,--Hello streaming".getBytes());
- txnBatch.write("name22,22,--Welcome to streaming".getBytes());
- txnBatch.write("name42,22,--more Streaming unlimited".getBytes());
- txnBatch.write("name52,22,--even more Streaming unlimited".getBytes());
- txnBatch.write("name7,4,aWelcome to streaming".getBytes());
- txnBatch.write("name8,5,amore Streaming unlimited".getBytes());
- txnBatch.write("name9,6,aeven more Streaming unlimited".getBytes());
- txnBatch.write("name10,7,bHello streaming".getBytes());
- txnBatch.write("name11,8,bWelcome to streaming".getBytes());
- txnBatch.write("name12,9,bmore Streaming unlimited".getBytes());
- txnBatch.write("name13,10,beven more Streaming unlimited".getBytes());
- txnBatch.commit();
-
- recordOffsets(conf, dbLocation, offsetMap);
-
- // intentionally corrupt some files
- Path path = new Path(dbLocation);
- Collection<String> files = FileDump.getAllFilesInPath(path, conf);
- for (String file : files) {
- if (file.contains("bucket_00000")) {
- corruptSideFile(file, conf, offsetMap, "bucket_00000", -1); // corrupt last entry
- } else if (file.contains("bucket_00001")) {
- corruptSideFile(file, conf, offsetMap, "bucket_00001", 0); // empty out side file
- } else if (file.contains("bucket_00002")) {
- corruptSideFile(file, conf, offsetMap, "bucket_00002", 3); // total 3 entries (2 valid + 1 fake)
- } else if (file.contains("bucket_00003")) {
- corruptSideFile(file, conf, offsetMap, "bucket_00003", 10); // total 10 entries (2 valid + 8 fake)
- }
- }
-
- PrintStream origErr = System.err;
- ByteArrayOutputStream myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation});
- System.err.flush();
- System.setErr(origErr);
-
- String errDump = new String(myErr.toByteArray());
- Assert.assertEquals(true, errDump.contains("bucket_00000_flush_length [length: 11"));
- Assert.assertEquals(true, errDump.contains("bucket_00001_flush_length [length: 0"));
- Assert.assertEquals(true, errDump.contains("bucket_00002_flush_length [length: 24"));
- Assert.assertEquals(true, errDump.contains("bucket_00003_flush_length [length: 80"));
- Assert.assertEquals(false, errDump.contains("Exception"));
- Assert.assertEquals(true, errDump.contains("4 file(s) are corrupted"));
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
-
- origErr = System.err;
- myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation, "--recover", "--skip-dump"});
- System.err.flush();
- System.setErr(origErr);
-
- errDump = new String(myErr.toByteArray());
- Assert.assertEquals(true, errDump.contains("bucket_00000 recovered successfully!"));
- Assert.assertEquals(true, errDump.contains("bucket_00001 recovered successfully!"));
- Assert.assertEquals(true, errDump.contains("bucket_00002 recovered successfully!"));
- Assert.assertEquals(true, errDump.contains("bucket_00003 recovered successfully!"));
- List<Long> offsets = offsetMap.get("bucket_00000");
- Assert.assertEquals(true, errDump.contains("Readable footerOffsets: " + offsets.toString()));
- offsets = offsetMap.get("bucket_00001");
- Assert.assertEquals(true, errDump.contains("Readable footerOffsets: " + offsets.toString()));
- offsets = offsetMap.get("bucket_00002");
- Assert.assertEquals(true, errDump.contains("Readable footerOffsets: " + offsets.toString()));
- offsets = offsetMap.get("bucket_00003");
- Assert.assertEquals(true, errDump.contains("Readable footerOffsets: " + offsets.toString()));
- Assert.assertEquals(false, errDump.contains("Exception"));
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
-
- // test after recovery
- origErr = System.err;
- myErr = new ByteArrayOutputStream();
-
- // replace stderr and run command
- System.setErr(new PrintStream(myErr));
- FileDump.main(new String[]{dbLocation});
- System.err.flush();
- System.setErr(origErr);
-
- errDump = new String(myErr.toByteArray());
- Assert.assertEquals(false, errDump.contains("Exception"));
- Assert.assertEquals(false, errDump.contains("file(s) are corrupted"));
- Assert.assertEquals(false, errDump.contains("is still open for writes."));
-
- // after recovery there shouldn't be any *_flush_length files
- files = FileDump.getAllFilesInPath(path, conf);
- for (String file : files) {
- Assert.assertEquals(false, file.contains("_flush_length"));
- }
-
- txnBatch.close();
- }
-
- private void corruptSideFile(final String file, final HiveConf conf,
- final Map<String, List<Long>> offsetMap, final String key, final int numEntries)
- throws IOException {
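- // numEntries semantics: a negative value truncates the final 8-byte offset entry to 3 bytes,
- // zero writes an empty side file, and a positive value writes that many entries, padding with
- // fake offsets beyond the last recorded one.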
- Path dataPath = new Path(file);
- Path sideFilePath = OrcAcidUtils.getSideFile(dataPath);
- Path cPath = new Path(sideFilePath.getParent(), sideFilePath.getName() + ".corrupt");
- FileSystem fs = sideFilePath.getFileSystem(conf);
- List<Long> offsets = offsetMap.get(key);
- long lastOffset = offsets.get(offsets.size() - 1);
- FSDataOutputStream fdos = fs.create(cPath, true);
- // corrupt last entry
- if (numEntries < 0) {
- byte[] lastOffsetBytes = longToBytes(lastOffset);
- for (int i = 0; i < offsets.size() - 1; i++) {
- fdos.writeLong(offsets.get(i));
- }
-
- fdos.write(lastOffsetBytes, 0, 3);
- } else if (numEntries > 0) {
- int firstRun = Math.min(offsets.size(), numEntries);
- // add original entries
- for (int i=0; i < firstRun; i++) {
- fdos.writeLong(offsets.get(i));
- }
-
- // add fake entries
- int remaining = numEntries - firstRun;
- for (int i = 0; i < remaining; i++) {
- fdos.writeLong(lastOffset + ((i + 1) * 100));
- }
- }
-
- fdos.close();
- fs.delete(sideFilePath, false);
- fs.rename(cPath, sideFilePath);
- }
-
- private byte[] longToBytes(long x) {
- ByteBuffer buffer = ByteBuffer.allocate(8);
- buffer.putLong(x);
- return buffer.array();
- }
-
- private void recordOffsets(final HiveConf conf, final String dbLocation,
- final Map<String, List<Long>> offsetMap) throws IOException {
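- // records the current length of each bucket file; after a commit these lengths are the
- // readable ORC footer offsets that the recovery assertions check against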
- Path path = new Path(dbLocation);
- Collection<String> files = FileDump.getAllFilesInPath(path, conf);
- for (String file: files) {
- Path bPath = new Path(file);
- FileSystem fs = bPath.getFileSystem(conf);
- FileStatus fileStatus = fs.getFileStatus(bPath);
- long len = fileStatus.getLen();
-
- if (file.contains("bucket_00000")) {
- if (offsetMap.containsKey("bucket_00000")) {
- List<Long> offsets = offsetMap.get("bucket_00000");
- offsets.add(len);
- offsetMap.put("bucket_00000", offsets);
- } else {
- List<Long> offsets = new ArrayList<Long>();
- offsets.add(len);
- offsetMap.put("bucket_00000", offsets);
- }
- } else if (file.contains("bucket_00001")) {
- if (offsetMap.containsKey("bucket_00001")) {
- List<Long> offsets = offsetMap.get("bucket_00001");
- offsets.add(len);
- offsetMap.put("bucket_00001", offsets);
- } else {
- List<Long> offsets = new ArrayList<Long>();
- offsets.add(len);
- offsetMap.put("bucket_00001", offsets);
- }
- } else if (file.contains("bucket_00002")) {
- if (offsetMap.containsKey("bucket_00002")) {
- List<Long> offsets = offsetMap.get("bucket_00002");
- offsets.add(len);
- offsetMap.put("bucket_00002", offsets);
- } else {
- List<Long> offsets = new ArrayList<Long>();
- offsets.add(len);
- offsetMap.put("bucket_00002", offsets);
- }
- } else if (file.contains("bucket_00003")) {
- if (offsetMap.containsKey("bucket_00003")) {
- List<Long> offsets = offsetMap.get("bucket_00003");
- offsets.add(len);
- offsetMap.put("bucket_00003", offsets);
- } else {
- List<Long> offsets = new ArrayList<Long>();
- offsets.add(len);
- offsetMap.put("bucket_00003", offsets);
- }
- }
- }
- }
-
- @Test
- public void testErrorHandling() throws Exception {
- String agentInfo = "UT_" + Thread.currentThread().getName();
- runCmdOnDriver("create database testErrors");
- runCmdOnDriver("use testErrors");
- runCmdOnDriver("create table T(a int, b int) clustered by (b) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true')");
-
- HiveEndPoint endPt = new HiveEndPoint(metaStoreURI, "testErrors", "T", null);
- StreamingConnection connection = endPt.newConnection(false, agentInfo);
- DelimitedInputWriter innerWriter = new DelimitedInputWriter("a,b".split(","),",", endPt, connection);
- FaultyWriter writer = new FaultyWriter(innerWriter);
-
- TransactionBatch txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.close();
- txnBatch.heartbeat();//this is no-op on closed batch
- txnBatch.abort();//ditto
- GetOpenTxnsInfoResponse r = msClient.showTxns();
- Assert.assertEquals("HWM didn't match", 17, r.getTxn_high_water_mark());
- List<TxnInfo> ti = r.getOpen_txns();
- Assert.assertEquals("wrong status ti(0)", TxnState.ABORTED, ti.get(0).getState());
- Assert.assertEquals("wrong status ti(1)", TxnState.ABORTED, ti.get(1).getState());
-
- Exception expectedEx = null;
- try {
- txnBatch.beginNextTransaction();
- }
- catch(IllegalStateException ex) {
- expectedEx = ex;
- }
- Assert.assertTrue("beginNextTransaction() should have failed",
- expectedEx != null && expectedEx.getMessage().contains("has been closed()"));
- expectedEx = null;
- try {
- txnBatch.write("name0,1,Hello streaming".getBytes());
- }
- catch(IllegalStateException ex) {
- expectedEx = ex;
- }
- Assert.assertTrue("write() should have failed",
- expectedEx != null && expectedEx.getMessage().contains("has been closed()"));
- expectedEx = null;
- try {
- txnBatch.commit();
- }
- catch(IllegalStateException ex) {
- expectedEx = ex;
- }
- Assert.assertTrue("commit() should have failed",
- expectedEx != null && expectedEx.getMessage().contains("has been closed()"));
-
- txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("name2,2,Welcome to streaming".getBytes());
- txnBatch.write("name4,2,more Streaming unlimited".getBytes());
- txnBatch.write("name5,2,even more Streaming unlimited".getBytes());
- txnBatch.commit();
-
- //test toString()
- String s = txnBatch.toString();
- Assert.assertTrue("Actual: " + s, s.contains("LastUsed " + JavaUtils.txnIdToString(txnBatch.getCurrentTxnId())));
- Assert.assertTrue("Actual: " + s, s.contains("TxnStatus[CO]"));
-
- expectedEx = null;
- txnBatch.beginNextTransaction();
- writer.enableErrors();
- try {
- txnBatch.write("name6,2,Doh!".getBytes());
- }
- catch(StreamingIOFailure ex) {
- expectedEx = ex;
- txnBatch.getCurrentTransactionState();
- txnBatch.getCurrentTxnId();//test it doesn't throw ArrayIndexOutOfBounds...
- }
- Assert.assertTrue("Wrong exception: " + (expectedEx != null ? expectedEx.getMessage() : "?"),
- expectedEx != null && expectedEx.getMessage().contains("Simulated fault occurred"));
- expectedEx = null;
- try {
- txnBatch.commit();
- }
- catch(IllegalStateException ex) {
- expectedEx = ex;
- }
- Assert.assertTrue("commit() should have failed",
- expectedEx != null && expectedEx.getMessage().contains("has been closed()"));
-
- //test toString()
- s = txnBatch.toString();
- Assert.assertTrue("Actual: " + s, s.contains("LastUsed " + JavaUtils.txnIdToString(txnBatch.getCurrentTxnId())));
- Assert.assertTrue("Actual: " + s, s.contains("TxnStatus[CA]"));
-
- r = msClient.showTxns();
- Assert.assertEquals("HWM didn't match", 19, r.getTxn_high_water_mark());
- ti = r.getOpen_txns();
- Assert.assertEquals("wrong status ti(0)", TxnState.ABORTED, ti.get(0).getState());
- Assert.assertEquals("wrong status ti(1)", TxnState.ABORTED, ti.get(1).getState());
- //txnid 3 was committed and thus not open
- Assert.assertEquals("wrong status ti(2)", TxnState.ABORTED, ti.get(2).getState());
-
- writer.disableErrors();
- txnBatch = connection.fetchTransactionBatch(2, writer);
- txnBatch.beginNextTransaction();
- txnBatch.write("name2,2,Welcome to streaming".getBytes());
- writer.enableErrors();
- expectedEx = null;
- try {
- txnBatch.commit();
- }
- catch(StreamingIOFailure ex) {
- expectedEx = ex;
- }
- Assert.assertTrue("Wrong exception: " + (expectedEx != null ? expectedEx.getMessage() : "?"),
- expectedEx != null && expectedEx.getMessage().contains("Simulated fault occurred"));
-
- r = msClient.showTxns();
- Assert.assertEquals("HWM didn't match", 21, r.getTxn_high_water_mark());
- ti = r.getOpen_txns();
- Assert.assertEquals("wrong status ti(3)", TxnState.ABORTED, ti.get(3).getState());
- Assert.assertEquals("wrong status ti(4)", TxnState.ABORTED, ti.get(4).getState());
-
- txnBatch.abort();
- }
-
- // assumes an unpartitioned table
- // returns a map <bucketNum, List<record>>
- private HashMap<Integer, ArrayList<SampleRec>> dumpAllBuckets(String dbLocation, String tableName)
- throws IOException {
- HashMap<Integer, ArrayList<SampleRec>> result = new HashMap<Integer, ArrayList<SampleRec>>();
-
- for (File deltaDir : new File(dbLocation + "/" + tableName).listFiles()) {
- if(!deltaDir.getName().startsWith("delta")) {
- continue;
- }
- File[] bucketFiles = deltaDir.listFiles(new FileFilter() {
- @Override
- public boolean accept(File pathname) {
- String name = pathname.getName();
- return !name.startsWith("_") && !name.startsWith(".");
- }
- });
- for (File bucketFile : bucketFiles) {
- if(bucketFile.toString().endsWith("length")) {
- continue;
- }
- Integer bucketNum = getBucketNumber(bucketFile);
- ArrayList<SampleRec> recs = dumpBucket(new Path(bucketFile.toString()));
- result.put(bucketNum, recs);
- }
- }
- return result;
- }
-
- //assumes bucket_NNNNN format of file name
- private Integer getBucketNumber(File bucketFile) {
- String fname = bucketFile.getName();
- int start = fname.indexOf('_');
- String number = fname.substring(start+1, fname.length());
- return Integer.parseInt(number);
- }
-
- // delete db and all tables in it
- public static void dropDB(IMetaStoreClient client, String databaseName) {
- try {
- for (String table : client.listTableNamesByFilter(databaseName, "", (short)-1)) {
- client.dropTable(databaseName, table, true, true);
- }
- client.dropDatabase(databaseName);
- } catch (TException e) {
- }
-
- }
-
-
-
- ///////// -------- UTILS ------- /////////
- // returns Path of the partition created (if any) else Path of table
- private static Path createDbAndTable(IDriver driver, String databaseName,
- String tableName, List<String> partVals,
- String[] colNames, String[] colTypes,
- String[] bucketCols,
- String[] partNames, String dbLocation, int bucketCount)
- throws Exception {
-
- String dbUri = "raw://" + new Path(dbLocation).toUri().toString();
- String tableLoc = dbUri + Path.SEPARATOR + tableName;
-
- runDDL(driver, "create database IF NOT EXISTS " + databaseName + " location '" + dbUri + "'");
- runDDL(driver, "use " + databaseName);
- String crtTbl = "create table " + tableName +
- " ( " + getTableColumnsStr(colNames,colTypes) + " )" +
- getPartitionStmtStr(partNames) +
- " clustered by ( " + join(bucketCols, ",") + " )" +
- " into " + bucketCount + " buckets " +
- " stored as orc " +
- " location '" + tableLoc + "'" +
- " TBLPROPERTIES ('transactional'='true') ";
- runDDL(driver, crtTbl);
- if(partNames!=null && partNames.length!=0) {
- return addPartition(driver, tableName, partVals, partNames);
- }
- return new Path(tableLoc);
- }
-
- private static Path addPartition(IDriver driver, String tableName, List<String> partVals, String[] partNames)
- throws Exception {
- String partSpec = getPartsSpec(partNames, partVals);
- String addPart = "alter table " + tableName + " add partition ( " + partSpec + " )";
- runDDL(driver, addPart);
- return getPartitionPath(driver, tableName, partSpec);
- }
-
- private static Path getPartitionPath(IDriver driver, String tableName, String partSpec) throws Exception {
- ArrayList<String> res = queryTable(driver, "describe extended " + tableName + " PARTITION (" + partSpec + ")");
- String partInfo = res.get(res.size() - 1);
- int start = partInfo.indexOf("location:") + "location:".length();
- int end = partInfo.indexOf(",",start);
- return new Path( partInfo.substring(start,end) );
- }
-
- private static String getTableColumnsStr(String[] colNames, String[] colTypes) {
- StringBuilder sb = new StringBuilder();
- for (int i=0; i < colNames.length; ++i) {
- sb.append(colNames[i]).append(" ").append(colTypes[i]);
- if (i < colNames.length - 1) {
- sb.append(",");
- }
- }
- return sb.toString();
- }
-
- private static String getPartsSpec(String[] partNames, List<String> partVals) {
- StringBuilder sb = new StringBuilder();
- for (int i=0; i < partVals.size(); ++i) {
- sb.append(partNames[i]).append(" = '").append(partVals.get(i)).append("'");
- if(i < partVals.size()-1) {
- sb.append(",");
- }
- }
- return sb.toString();
- }
-
- private static String join(String[] values, String delimiter) {
- if(values==null) {
- return null;
- }
- StringBuilder strbuf = new StringBuilder();
-
- boolean first = true;
-
- for (Object value : values) {
- if (!first) { strbuf.append(delimiter); } else { first = false; }
- strbuf.append(value.toString());
- }
-
- return strbuf.toString();
- }
- private static String getPartitionStmtStr(String[] partNames) {
- if ( partNames == null || partNames.length == 0) {
- return "";
- }
- return " partitioned by (" + getTablePartsStr(partNames) + " )";
- }
-
- private static boolean runDDL(IDriver driver, String sql) throws QueryFailedException {
- LOG.debug(sql);
- System.out.println(sql);
- //LOG.debug("Running Hive Query: "+ sql);
- CommandProcessorResponse cpr = driver.run(sql);
- if (cpr.getResponseCode() == 0) {
- return true;
- }
- LOG.error("Statement: " + sql + " failed: " + cpr);
- return false;
- }
-
-
- private static ArrayList<String> queryTable(IDriver driver, String query) throws IOException {
- CommandProcessorResponse cpr = driver.run(query);
- if(cpr.getResponseCode() != 0) {
- throw new RuntimeException(query + " failed: " + cpr);
- }
- ArrayList<String> res = new ArrayList<String>();
- driver.getResults(res);
- return res;
- }
-
- private static class SampleRec {
- public String field1;
- public int field2;
- public String field3;
-
- public SampleRec(String field1, int field2, String field3) {
- this.field1 = field1;
- this.field2 = field2;
- this.field3 = field3;
- }
-
- @Override
- public boolean equals(Object o) {
- if (this == o) {
- return true;
- }
- if (o == null || getClass() != o.getClass()) {
- return false;
- }
-
- SampleRec that = (SampleRec) o;
-
- if (field2 != that.field2) {
- return false;
- }
- if (field1 != null ? !field1.equals(that.field1) : that.field1 != null) {
- return false;
- }
- return !(field3 != null ? !field3.equals(that.field3) : that.field3 != null);
-
- }
-
- @Override
- public int hashCode() {
- int result = field1 != null ? field1.hashCode() : 0;
- result = 31 * result + field2;
- result = 31 * result + (field3 != null ? field3.hashCode() : 0);
- return result;
- }
-
- @Override
- public String toString() {
- return " { " +
- "'" + field1 + '\'' +
- "," + field2 +
- ",'" + field3 + '\'' +
- " }";
- }
- }
- /**
- * This is test-only wrapper around the real RecordWriter.
- * It can simulate faults from lower levels to test error handling logic.
- */
- private static final class FaultyWriter implements RecordWriter {
- private final RecordWriter delegate;
- private boolean shouldThrow = false;
-
- private FaultyWriter(RecordWriter delegate) {
- assert delegate != null;
- this.delegate = delegate;
- }
- @Override
- public void write(long writeId, byte[] record) throws StreamingException {
- delegate.write(writeId, record);
- produceFault();
- }
- @Override
- public void flush() throws StreamingException {
- delegate.flush();
- produceFault();
- }
- @Override
- public void clear() throws StreamingException {
- delegate.clear();
- }
- @Override
- public void newBatch(Long minTxnId, Long maxTxnID) throws StreamingException {
- delegate.newBatch(minTxnId, maxTxnID);
- }
- @Override
- public void closeBatch() throws StreamingException {
- delegate.closeBatch();
- }
-
- /**
- * allows testing of "unexpected" errors
- * @throws StreamingIOFailure
- */
- private void produceFault() throws StreamingIOFailure {
- if(shouldThrow) {
- throw new StreamingIOFailure("Simulated fault occurred");
- }
- }
- void enableErrors() {
- shouldThrow = true;
- }
- void disableErrors() {
- shouldThrow = false;
- }
- }
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/ExampleUseCase.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/ExampleUseCase.java
deleted file mode 100644
index d38950e..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/ExampleUseCase.java
+++ /dev/null
@@ -1,99 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate;
-
-import java.util.List;
-
-import org.apache.hive.hcatalog.streaming.mutate.client.MutatorClient;
-import org.apache.hive.hcatalog.streaming.mutate.client.MutatorClientBuilder;
-import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
-import org.apache.hive.hcatalog.streaming.mutate.client.Transaction;
-import org.apache.hive.hcatalog.streaming.mutate.worker.BucketIdResolver;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinator;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinatorBuilder;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorFactory;
-
-public class ExampleUseCase {
-
- private String metaStoreUri;
- private String databaseName;
- private String tableName;
- private boolean createPartitions = true;
- private List<String> partitionValues1, partitionValues2, partitionValues3;
- private Object record1, record2, record3;
- private MutatorFactory mutatorFactory;
-
- /* This is an illustration, not a functioning example. */
- public void example() throws Exception {
- // CLIENT/TOOL END
- //
- // Singleton instance in the job client
-
- // Create a client to manage our transaction
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(databaseName, tableName, createPartitions)
- .metaStoreUri(metaStoreUri)
- .build();
-
- // Get the transaction
- Transaction transaction = client.newTransaction();
-
- // Get serializable details of the destination tables
- List<AcidTable> tables = client.getTables();
-
- transaction.begin();
-
- // CLUSTER / WORKER END
- //
- // Job submitted to the cluster
- //
-
- BucketIdResolver bucketIdResolver = mutatorFactory.newBucketIdResolver(tables.get(0).getTotalBuckets());
- record1 = bucketIdResolver.attachBucketIdToRecord(record1);
-
- // --------------------------------------------------------------
- // DATA SHOULD GET SORTED BY YOUR ETL/MERGE PROCESS HERE
- //
- // Group the data by (partitionValues, ROW__ID.bucketId)
- // Order the groups by (ROW__ID.writeId, ROW__ID.rowId)
- // --------------------------------------------------------------
-
- // One of these runs at the output of each reducer
- //
- MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(tables.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- coordinator.insert(partitionValues1, record1);
- coordinator.update(partitionValues2, record2);
- coordinator.delete(partitionValues3, record3);
-
- coordinator.close();
-
- // CLIENT/TOOL END
- //
- // The tasks have completed, control is back at the tool
-
- transaction.commit();
-
- client.close();
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/MutableRecord.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/MutableRecord.java
deleted file mode 100644
index 365d20c..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/MutableRecord.java
+++ /dev/null
@@ -1,50 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hive.hcatalog.streaming.mutate;
-
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.io.Text;
-
-public class MutableRecord {
-
- // Column 0
- public final int id;
- // Column 1
- public final Text msg;
- // Column 2
- public RecordIdentifier rowId;
-
- public MutableRecord(int id, String msg, RecordIdentifier rowId) {
- this.id = id;
- this.msg = new Text(msg);
- this.rowId = rowId;
- }
-
- public MutableRecord(int id, String msg) {
- this.id = id;
- this.msg = new Text(msg);
- rowId = null;
- }
-
- @Override
- public String toString() {
- return "MutableRecord [id=" + id + ", msg=" + msg + ", rowId=" + rowId + "]";
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/ReflectiveMutatorFactory.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/ReflectiveMutatorFactory.java
deleted file mode 100644
index c05ddcf..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/ReflectiveMutatorFactory.java
+++ /dev/null
@@ -1,68 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate;
-
-import java.io.IOException;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
-import org.apache.hive.hcatalog.streaming.mutate.worker.BucketIdResolver;
-import org.apache.hive.hcatalog.streaming.mutate.worker.BucketIdResolverImpl;
-import org.apache.hive.hcatalog.streaming.mutate.worker.Mutator;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorFactory;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorImpl;
-import org.apache.hive.hcatalog.streaming.mutate.worker.RecordInspector;
-import org.apache.hive.hcatalog.streaming.mutate.worker.RecordInspectorImpl;
-
-public class ReflectiveMutatorFactory implements MutatorFactory {
-
- private final int recordIdColumn;
- private final ObjectInspector objectInspector;
- private final Configuration configuration;
- private final int[] bucketColumnIndexes;
-
- public ReflectiveMutatorFactory(Configuration configuration, Class<?> recordClass, int recordIdColumn,
- int[] bucketColumnIndexes) {
- this.configuration = configuration;
- this.recordIdColumn = recordIdColumn;
- this.bucketColumnIndexes = bucketColumnIndexes;
- objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(recordClass,
- ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
- }
-
- @Override
- public Mutator newMutator(AcidOutputFormat<?, ?> outputFormat, long writeId, Path partitionPath, int bucketId)
- throws IOException {
- return new MutatorImpl(configuration, recordIdColumn, objectInspector, outputFormat, writeId, partitionPath,
- bucketId);
- }
-
- @Override
- public RecordInspector newRecordInspector() {
- return new RecordInspectorImpl(objectInspector, recordIdColumn);
- }
-
- @Override
- public BucketIdResolver newBucketIdResolver(int totalBuckets) {
- return new BucketIdResolverImpl(objectInspector, recordIdColumn, totalBuckets, bucketColumnIndexes);
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java
deleted file mode 100644
index 0edf1cd..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java
+++ /dev/null
@@ -1,211 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hive.hcatalog.streaming.mutate;
-
-import static org.junit.Assert.assertEquals;
-
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.List;
-
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.common.ValidWriteIdList;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
-import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
-import org.apache.hadoop.hive.ql.io.AcidInputFormat.AcidRecordReader;
-import org.apache.hadoop.hive.ql.io.AcidUtils;
-import org.apache.hadoop.hive.ql.io.IOConstants;
-import org.apache.hadoop.hive.ql.io.AcidUtils.Directory;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
-import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
-import org.apache.hadoop.io.NullWritable;
-import org.apache.hadoop.mapred.InputFormat;
-import org.apache.hadoop.mapred.InputSplit;
-import org.apache.hadoop.mapred.JobConf;
-import org.apache.hadoop.mapred.Reporter;
-import org.apache.thrift.TException;
-
-public class StreamingAssert {
-
- public static class Factory {
- private IMetaStoreClient metaStoreClient;
- private final HiveConf conf;
-
- public Factory(IMetaStoreClient metaStoreClient, HiveConf conf) {
- this.metaStoreClient = metaStoreClient;
- this.conf = conf;
- }
-
- public StreamingAssert newStreamingAssert(Table table) throws Exception {
- return newStreamingAssert(table, Collections.<String> emptyList());
- }
-
- public StreamingAssert newStreamingAssert(Table table, List<String> partition) throws Exception {
- return new StreamingAssert(metaStoreClient, conf, table, partition);
- }
- }
-
- private Table table;
- private List<String> partition;
- private IMetaStoreClient metaStoreClient;
- private Directory dir;
- private ValidWriteIdList writeIds;
- private List<AcidUtils.ParsedDelta> currentDeltas;
- private long min;
- private long max;
- private Path partitionLocation;
-
- StreamingAssert(IMetaStoreClient metaStoreClient, HiveConf conf, Table table, List<String> partition)
- throws Exception {
- this.metaStoreClient = metaStoreClient;
- this.table = table;
- this.partition = partition;
-
- writeIds = metaStoreClient.getValidWriteIds(AcidUtils.getFullTableName(table.getDbName(), table.getTableName()));
- partitionLocation = getPartitionLocation();
- dir = AcidUtils.getAcidState(partitionLocation, conf, writeIds);
- assertEquals(0, dir.getObsolete().size());
- assertEquals(0, dir.getOriginalFiles().size());
-
- currentDeltas = dir.getCurrentDirectories();
- min = Long.MAX_VALUE;
- max = Long.MIN_VALUE;
- System.out.println("Files found: ");
- for (AcidUtils.ParsedDelta parsedDelta : currentDeltas) {
- System.out.println(parsedDelta.getPath().toString());
- max = Math.max(parsedDelta.getMaxWriteId(), max);
- min = Math.min(parsedDelta.getMinWriteId(), min);
- }
- }
-
- public void assertExpectedFileCount(int expectedFileCount) {
- assertEquals(expectedFileCount, currentDeltas.size());
- }
-
- public void assertNothingWritten() {
- assertExpectedFileCount(0);
- }
-
- public void assertMinWriteId(long expectedMinWriteId) {
- if (currentDeltas.isEmpty()) {
- throw new AssertionError("No data");
- }
- assertEquals(expectedMinWriteId, min);
- }
-
- public void assertMaxWriteId(long expectedMaxWriteId) {
- if (currentDeltas.isEmpty()) {
- throw new AssertionError("No data");
- }
- assertEquals(expectedMaxWriteId, max);
- }
-
- List<Record> readRecords() throws Exception {
- return readRecords(1);
- }
-
- /**
- * TODO: this would be more flexible doing a SQL select statement rather than using InputFormat directly
- * see {@link org.apache.hive.hcatalog.streaming.TestStreaming#checkDataWritten2(Path, long, long, int, String, String...)}
- * @param numSplitsExpected
- * @return
- * @throws Exception
- */
- List<Record> readRecords(int numSplitsExpected) throws Exception {
- if (currentDeltas.isEmpty()) {
- throw new AssertionError("No data");
- }
- InputFormat<NullWritable, OrcStruct> inputFormat = new OrcInputFormat();
- JobConf job = new JobConf();
- job.set("mapred.input.dir", partitionLocation.toString());
- job.set(hive_metastoreConstants.BUCKET_COUNT, Integer.toString(table.getSd().getNumBuckets()));
- job.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS, "id,msg");
- job.set(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES, "bigint:string");
- AcidUtils.setAcidOperationalProperties(job, true, null);
- job.setBoolean(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, true);
- job.set(ValidWriteIdList.VALID_WRITEIDS_KEY, writeIds.toString());
- InputSplit[] splits = inputFormat.getSplits(job, 1);
- assertEquals(numSplitsExpected, splits.length);
-
-
- List<Record> records = new ArrayList<>();
- for(InputSplit is : splits) {
- final AcidRecordReader<NullWritable, OrcStruct> recordReader = (AcidRecordReader<NullWritable, OrcStruct>) inputFormat
- .getRecordReader(is, job, Reporter.NULL);
-
- NullWritable key = recordReader.createKey();
- OrcStruct value = recordReader.createValue();
-
- while (recordReader.next(key, value)) {
- RecordIdentifier recordIdentifier = recordReader.getRecordIdentifier();
- Record record = new Record(new RecordIdentifier(recordIdentifier.getWriteId(),
- recordIdentifier.getBucketProperty(), recordIdentifier.getRowId()), value.toString());
- System.out.println(record);
- records.add(record);
- }
- recordReader.close();
- }
- return records;
- }
-
- private Path getPartitionLocation() throws NoSuchObjectException, MetaException, TException {
- Path partitionLocacation;
- if (partition.isEmpty()) {
- partitionLocacation = new Path(table.getSd().getLocation());
- } else {
- // TODO: calculate this instead. Just because we're writing to the location doesn't mean that it'll
- // always be wanted in the meta store right away.
- List<Partition> partitionEntries = metaStoreClient.listPartitions(table.getDbName(), table.getTableName(),
- partition, (short) 1);
- partitionLocacation = new Path(partitionEntries.get(0).getSd().getLocation());
- }
- return partitionLocacation;
- }
-
- public static class Record {
- private RecordIdentifier recordIdentifier;
- private String row;
-
- Record(RecordIdentifier recordIdentifier, String row) {
- this.recordIdentifier = recordIdentifier;
- this.row = row;
- }
-
- public RecordIdentifier getRecordIdentifier() {
- return recordIdentifier;
- }
-
- public String getRow() {
- return row;
- }
-
- @Override
- public String toString() {
- return "Record [recordIdentifier=" + recordIdentifier + ", row=" + row + "]";
- }
-
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingTestUtils.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingTestUtils.java
deleted file mode 100644
index 63690f9..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingTestUtils.java
+++ /dev/null
@@ -1,283 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate;
-
-import java.io.File;
-import java.io.FileNotFoundException;
-import java.io.IOException;
-import java.net.URI;
-import java.net.URISyntaxException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.fs.RawLocalFileSystem;
-import org.apache.hadoop.fs.permission.FsPermission;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.TableType;
-import org.apache.hadoop.hive.metastore.Warehouse;
-import org.apache.hadoop.hive.metastore.api.Database;
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.metastore.api.SerDeInfo;
-import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.metastore.txn.TxnDbUtil;
-import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
-import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
-import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
-import org.apache.hadoop.hive.serde.serdeConstants;
-import org.apache.thrift.TException;
-
-public class StreamingTestUtils {
-
- public HiveConf newHiveConf(String metaStoreUri) {
- HiveConf conf = new HiveConf(this.getClass());
- conf.set("fs.raw.impl", RawFileSystem.class.getName());
- if (metaStoreUri != null) {
- conf.setVar(HiveConf.ConfVars.METASTOREURIS, metaStoreUri);
- }
- conf.setBoolVar(HiveConf.ConfVars.METASTORE_EXECUTE_SET_UGI, true);
- conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, true);
- return conf;
- }
-
- public void prepareTransactionDatabase(HiveConf conf) throws Exception {
- TxnDbUtil.setConfValues(conf);
- TxnDbUtil.cleanDb(conf);
- TxnDbUtil.prepDb(conf);
- }
-
- public IMetaStoreClient newMetaStoreClient(HiveConf conf) throws Exception {
- return new HiveMetaStoreClient(conf);
- }
-
- public static class RawFileSystem extends RawLocalFileSystem {
- private static final URI NAME;
- static {
- try {
- NAME = new URI("raw:///");
- } catch (URISyntaxException se) {
- throw new IllegalArgumentException("bad uri", se);
- }
- }
-
- @Override
- public URI getUri() {
- return NAME;
- }
-
- @Override
- public FileStatus getFileStatus(Path path) throws IOException {
- File file = pathToFile(path);
- if (!file.exists()) {
- throw new FileNotFoundException("Can't find " + path);
- }
- // get close enough
- short mod = 0;
- if (file.canRead()) {
- mod |= 0444;
- }
- if (file.canWrite()) {
- mod |= 0200;
- }
- if (file.canExecute()) {
- mod |= 0111;
- }
- return new FileStatus(file.length(), file.isDirectory(), 1, 1024, file.lastModified(), file.lastModified(),
- FsPermission.createImmutable(mod), "owen", "users", path);
- }
- }
-
- public static DatabaseBuilder databaseBuilder(File warehouseFolder) {
- return new DatabaseBuilder(warehouseFolder);
- }
-
- public static class DatabaseBuilder {
-
- private Database database;
- private File warehouseFolder;
-
- public DatabaseBuilder(File warehouseFolder) {
- this.warehouseFolder = warehouseFolder;
- database = new Database();
- }
-
- public DatabaseBuilder name(String name) {
- database.setName(name);
- File databaseFolder = new File(warehouseFolder, name + ".db");
- String databaseLocation = "raw://" + databaseFolder.toURI().getPath();
- database.setLocationUri(databaseLocation);
- return this;
- }
-
- public Database dropAndCreate(IMetaStoreClient metaStoreClient) throws Exception {
- if (metaStoreClient == null) {
- throw new IllegalArgumentException();
- }
- try {
- for (String table : metaStoreClient.listTableNamesByFilter(database.getName(), "", (short) -1)) {
- metaStoreClient.dropTable(database.getName(), table, true, true);
- }
- metaStoreClient.dropDatabase(database.getName());
- } catch (TException e) {
- }
- metaStoreClient.createDatabase(database);
- return database;
- }
-
- public Database build() {
- return database;
- }
-
- }
-
- public static TableBuilder tableBuilder(Database database) {
- return new TableBuilder(database);
- }
-
- public static class TableBuilder {
-
- private Table table;
- private StorageDescriptor sd;
- private SerDeInfo serDeInfo;
- private Database database;
- private List<List<String>> partitions;
- private List<String> columnNames;
- private List<String> columnTypes;
- private List<String> partitionKeys;
-
- public TableBuilder(Database database) {
- this.database = database;
- partitions = new ArrayList<>();
- columnNames = new ArrayList<>();
- columnTypes = new ArrayList<>();
- partitionKeys = Collections.emptyList();
- table = new Table();
- table.setDbName(database.getName());
- table.setTableType(TableType.MANAGED_TABLE.toString());
- Map<String, String> tableParams = new HashMap<String, String>();
- tableParams.put("transactional", Boolean.TRUE.toString());
- table.setParameters(tableParams);
-
- sd = new StorageDescriptor();
- sd.setInputFormat(OrcInputFormat.class.getName());
- sd.setOutputFormat(OrcOutputFormat.class.getName());
- sd.setNumBuckets(1);
- table.setSd(sd);
-
- serDeInfo = new SerDeInfo();
- serDeInfo.setParameters(new HashMap<String, String>());
- serDeInfo.getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
- serDeInfo.setSerializationLib(OrcSerde.class.getName());
- sd.setSerdeInfo(serDeInfo);
- }
-
- public TableBuilder name(String name) {
- sd.setLocation(database.getLocationUri() + Path.SEPARATOR + name);
- table.setTableName(name);
- serDeInfo.setName(name);
- return this;
- }
-
- public TableBuilder buckets(int buckets) {
- sd.setNumBuckets(buckets);
- return this;
- }
-
- public TableBuilder bucketCols(List<String> columnNames) {
- sd.setBucketCols(columnNames);
- return this;
- }
-
- public TableBuilder addColumn(String columnName, String columnType) {
- columnNames.add(columnName);
- columnTypes.add(columnType);
- return this;
- }
-
- public TableBuilder partitionKeys(String... partitionKeys) {
- this.partitionKeys = Arrays.asList(partitionKeys);
- return this;
- }
-
- public TableBuilder addPartition(String... partitionValues) {
- partitions.add(Arrays.asList(partitionValues));
- return this;
- }
-
- public TableBuilder addPartition(List<String> partitionValues) {
- partitions.add(partitionValues);
- return this;
- }
-
- public Table create(IMetaStoreClient metaStoreClient) throws Exception {
- if (metaStoreClient == null) {
- throw new IllegalArgumentException();
- }
- return internalCreate(metaStoreClient);
- }
-
- public Table build() throws Exception {
- return internalCreate(null);
- }
-
- private Table internalCreate(IMetaStoreClient metaStoreClient) throws Exception {
- List<FieldSchema> fields = new ArrayList<FieldSchema>(columnNames.size());
- for (int i = 0; i < columnNames.size(); i++) {
- fields.add(new FieldSchema(columnNames.get(i), columnTypes.get(i), ""));
- }
- sd.setCols(fields);
-
- if (!partitionKeys.isEmpty()) {
- List<FieldSchema> partitionFields = new ArrayList<FieldSchema>();
- for (String partitionKey : partitionKeys) {
- partitionFields.add(new FieldSchema(partitionKey, serdeConstants.STRING_TYPE_NAME, ""));
- }
- table.setPartitionKeys(partitionFields);
- }
- if (metaStoreClient != null) {
- metaStoreClient.createTable(table);
- }
-
- for (List<String> partitionValues : partitions) {
- Partition partition = new Partition();
- partition.setDbName(database.getName());
- partition.setTableName(table.getTableName());
- StorageDescriptor partitionSd = new StorageDescriptor(table.getSd());
- partitionSd.setLocation(table.getSd().getLocation() + Path.SEPARATOR
- + Warehouse.makePartName(table.getPartitionKeys(), partitionValues));
- partition.setSd(partitionSd);
- partition.setValues(partitionValues);
-
- if (metaStoreClient != null) {
- metaStoreClient.add_partition(partition);
- }
- }
- return table;
- }
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/TestMutations.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/TestMutations.java
deleted file mode 100644
index 3d008e6..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/TestMutations.java
+++ /dev/null
@@ -1,566 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hive.hcatalog.streaming.mutate;
-
-import static org.apache.hive.hcatalog.streaming.TransactionBatch.TxnState.ABORTED;
-import static org.apache.hive.hcatalog.streaming.TransactionBatch.TxnState.COMMITTED;
-import static org.apache.hive.hcatalog.streaming.mutate.StreamingTestUtils.databaseBuilder;
-import static org.apache.hive.hcatalog.streaming.mutate.StreamingTestUtils.tableBuilder;
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
-
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.Database;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.ql.io.BucketCodec;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hive.hcatalog.streaming.TestStreaming;
-import org.apache.hive.hcatalog.streaming.mutate.StreamingAssert.Factory;
-import org.apache.hive.hcatalog.streaming.mutate.StreamingAssert.Record;
-import org.apache.hive.hcatalog.streaming.mutate.StreamingTestUtils.TableBuilder;
-import org.apache.hive.hcatalog.streaming.mutate.client.MutatorClient;
-import org.apache.hive.hcatalog.streaming.mutate.client.MutatorClientBuilder;
-import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
-import org.apache.hive.hcatalog.streaming.mutate.client.Transaction;
-import org.apache.hive.hcatalog.streaming.mutate.worker.BucketIdResolver;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinator;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorCoordinatorBuilder;
-import org.apache.hive.hcatalog.streaming.mutate.worker.MutatorFactory;
-import org.junit.Before;
-import org.junit.Rule;
-import org.junit.Test;
-import org.junit.rules.TemporaryFolder;
-
-/**
- * This test is based on {@link TestStreaming} and has a similar core set of tests to ensure that basic transactional
- * behaviour is as expected in the {@link RecordMutator} line. This is complemented with a set of tests related to the
- * use of update and delete operations.
- */
-public class TestMutations {
-
- private static final List<String> EUROPE_FRANCE = Arrays.asList("Europe", "France");
- private static final List<String> EUROPE_UK = Arrays.asList("Europe", "UK");
- private static final List<String> ASIA_INDIA = Arrays.asList("Asia", "India");
- // id
- private static final int[] BUCKET_COLUMN_INDEXES = new int[] { 0 };
- private static final int RECORD_ID_COLUMN = 2;
-
- @Rule
- public TemporaryFolder warehouseFolder = new TemporaryFolder();
-
- private StreamingTestUtils testUtils = new StreamingTestUtils();
- private HiveConf conf;
- private IMetaStoreClient metaStoreClient;
- private String metaStoreUri;
- private Database database;
- private TableBuilder partitionedTableBuilder;
- private TableBuilder unpartitionedTableBuilder;
- private Factory assertionFactory;
-
- public TestMutations() throws Exception {
- conf = testUtils.newHiveConf(metaStoreUri);
- testUtils.prepareTransactionDatabase(conf);
- metaStoreClient = testUtils.newMetaStoreClient(conf);
- assertionFactory = new StreamingAssert.Factory(metaStoreClient, conf);
- }
-
- @Before
- public void setup() throws Exception {
- database = databaseBuilder(warehouseFolder.getRoot()).name("testing").dropAndCreate(metaStoreClient);
-
- partitionedTableBuilder = tableBuilder(database)
- .name("partitioned")
- .addColumn("id", "int")
- .addColumn("msg", "string")
- .partitionKeys("continent", "country")
- .bucketCols(Collections.singletonList("string"));
-
- unpartitionedTableBuilder = tableBuilder(database)
- .name("unpartitioned")
- .addColumn("id", "int")
- .addColumn("msg", "string")
- .bucketCols(Collections.singletonList("string"));
- }
- private static int encodeBucket(int bucketId) {
- return BucketCodec.V1.encode(
- new AcidOutputFormat.Options(null).bucket(bucketId));
- }
-
- @Test
- public void testTransactionBatchEmptyCommitPartitioned() throws Exception {
- Table table = partitionedTableBuilder.addPartition(ASIA_INDIA).create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), true)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- transaction.begin();
-
- transaction.commit();
- assertThat(transaction.getState(), is(COMMITTED));
- client.close();
- }
-
- @Test
- public void testTransactionBatchEmptyCommitUnpartitioned() throws Exception {
- Table table = unpartitionedTableBuilder.create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), false)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- transaction.begin();
-
- transaction.commit();
- assertThat(transaction.getState(), is(COMMITTED));
- client.close();
- }
-
- @Test
- public void testTransactionBatchEmptyAbortPartitioned() throws Exception {
- Table table = partitionedTableBuilder.addPartition(ASIA_INDIA).create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), true)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- List<AcidTable> destinations = client.getTables();
-
- transaction.begin();
-
- MutatorFactory mutatorFactory = new ReflectiveMutatorFactory(conf, MutableRecord.class, RECORD_ID_COLUMN,
- BUCKET_COLUMN_INDEXES);
- MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- coordinator.close();
-
- transaction.abort();
- assertThat(transaction.getState(), is(ABORTED));
- client.close();
- }
-
- @Test
- public void testTransactionBatchEmptyAbortUnpartitioned() throws Exception {
- Table table = unpartitionedTableBuilder.create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), false)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- List<AcidTable> destinations = client.getTables();
-
- transaction.begin();
-
- MutatorFactory mutatorFactory = new ReflectiveMutatorFactory(conf, MutableRecord.class, RECORD_ID_COLUMN,
- BUCKET_COLUMN_INDEXES);
- MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- coordinator.close();
-
- transaction.abort();
- assertThat(transaction.getState(), is(ABORTED));
- client.close();
- }
-
- @Test
- public void testTransactionBatchCommitPartitioned() throws Exception {
- Table table = partitionedTableBuilder.addPartition(ASIA_INDIA).create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), true)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- List<AcidTable> destinations = client.getTables();
-
- transaction.begin();
-
- MutatorFactory mutatorFactory = new ReflectiveMutatorFactory(conf, MutableRecord.class, RECORD_ID_COLUMN,
- BUCKET_COLUMN_INDEXES);
- MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- BucketIdResolver bucketIdAppender = mutatorFactory.newBucketIdResolver(destinations.get(0).getTotalBuckets());
- MutableRecord record = (MutableRecord) bucketIdAppender.attachBucketIdToRecord(new MutableRecord(1,
- "Hello streaming"));
- coordinator.insert(ASIA_INDIA, record);
- coordinator.close();
-
- transaction.commit();
-
- StreamingAssert streamingAssertions = assertionFactory.newStreamingAssert(table, ASIA_INDIA);
- streamingAssertions.assertMinWriteId(1L);
- streamingAssertions.assertMaxWriteId(1L);
- streamingAssertions.assertExpectedFileCount(1);
-
- List<Record> readRecords = streamingAssertions.readRecords();
- assertThat(readRecords.size(), is(1));
- assertThat(readRecords.get(0).getRow(), is("{1, Hello streaming}"));
- assertThat(readRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
-
- assertThat(transaction.getState(), is(COMMITTED));
- client.close();
- }
-
- @Test
- public void testMulti() throws Exception {
- Table table = partitionedTableBuilder.addPartition(ASIA_INDIA).create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), true)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- List<AcidTable> destinations = client.getTables();
-
- transaction.begin();
-
- MutatorFactory mutatorFactory = new ReflectiveMutatorFactory(conf, MutableRecord.class, RECORD_ID_COLUMN,
- BUCKET_COLUMN_INDEXES);
- MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- BucketIdResolver bucketIdResolver = mutatorFactory.newBucketIdResolver(destinations.get(0).getTotalBuckets());
- MutableRecord asiaIndiaRecord1 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(1,
- "Hello streaming"));
- MutableRecord europeUkRecord1 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(2,
- "Hello streaming"));
- MutableRecord europeFranceRecord1 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(3,
- "Hello streaming"));
- MutableRecord europeFranceRecord2 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(4,
- "Bonjour streaming"));
-
- coordinator.insert(ASIA_INDIA, asiaIndiaRecord1);
- coordinator.insert(EUROPE_UK, europeUkRecord1);
- coordinator.insert(EUROPE_FRANCE, europeFranceRecord1);
- coordinator.insert(EUROPE_FRANCE, europeFranceRecord2);
- coordinator.close();
-
- transaction.commit();
-
- // ASIA_INDIA
- StreamingAssert streamingAssertions = assertionFactory.newStreamingAssert(table, ASIA_INDIA);
- streamingAssertions.assertMinWriteId(1L);
- streamingAssertions.assertMaxWriteId(1L);
- streamingAssertions.assertExpectedFileCount(1);
-
- List<Record> readRecords = streamingAssertions.readRecords();
- assertThat(readRecords.size(), is(1));
- assertThat(readRecords.get(0).getRow(), is("{1, Hello streaming}"));
- assertThat(readRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
-
- // EUROPE_UK
- streamingAssertions = assertionFactory.newStreamingAssert(table, EUROPE_UK);
- streamingAssertions.assertMinWriteId(1L);
- streamingAssertions.assertMaxWriteId(1L);
- streamingAssertions.assertExpectedFileCount(1);
-
- readRecords = streamingAssertions.readRecords();
- assertThat(readRecords.size(), is(1));
- assertThat(readRecords.get(0).getRow(), is("{2, Hello streaming}"));
- assertThat(readRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
-
- // EUROPE_FRANCE
- streamingAssertions = assertionFactory.newStreamingAssert(table, EUROPE_FRANCE);
- streamingAssertions.assertMinWriteId(1L);
- streamingAssertions.assertMaxWriteId(1L);
- streamingAssertions.assertExpectedFileCount(1);
-
- readRecords = streamingAssertions.readRecords();
- assertThat(readRecords.size(), is(2));
- assertThat(readRecords.get(0).getRow(), is("{3, Hello streaming}"));
- assertThat(readRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
- assertThat(readRecords.get(1).getRow(), is("{4, Bonjour streaming}"));
- assertThat(readRecords.get(1).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 1L)));
-
- client.close();
- }
-
- @Test
- public void testTransactionBatchCommitUnpartitioned() throws Exception {
- Table table = unpartitionedTableBuilder.create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), false)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- List<AcidTable> destinations = client.getTables();
-
- transaction.begin();
-
- MutatorFactory mutatorFactory = new ReflectiveMutatorFactory(conf, MutableRecord.class, RECORD_ID_COLUMN,
- BUCKET_COLUMN_INDEXES);
- MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- BucketIdResolver bucketIdResolver = mutatorFactory.newBucketIdResolver(destinations.get(0).getTotalBuckets());
- MutableRecord record = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(1,
- "Hello streaming"));
-
- coordinator.insert(Collections.<String> emptyList(), record);
- coordinator.close();
-
- transaction.commit();
-
- StreamingAssert streamingAssertions = assertionFactory.newStreamingAssert(table);
- streamingAssertions.assertMinWriteId(1L);
- streamingAssertions.assertMaxWriteId(1L);
- streamingAssertions.assertExpectedFileCount(1);
-
- List<Record> readRecords = streamingAssertions.readRecords();
- assertThat(readRecords.size(), is(1));
- assertThat(readRecords.get(0).getRow(), is("{1, Hello streaming}"));
- assertThat(readRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
-
- assertThat(transaction.getState(), is(COMMITTED));
- client.close();
- }
-
- @Test
- public void testTransactionBatchAbort() throws Exception {
- Table table = partitionedTableBuilder.addPartition(ASIA_INDIA).create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), true)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction transaction = client.newTransaction();
-
- List<AcidTable> destinations = client.getTables();
-
- transaction.begin();
-
- MutatorFactory mutatorFactory = new ReflectiveMutatorFactory(conf, MutableRecord.class, RECORD_ID_COLUMN,
- BUCKET_COLUMN_INDEXES);
- MutatorCoordinator coordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- BucketIdResolver bucketIdResolver = mutatorFactory.newBucketIdResolver(destinations.get(0).getTotalBuckets());
- MutableRecord record1 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(1,
- "Hello streaming"));
- MutableRecord record2 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(2,
- "Welcome to streaming"));
-
- coordinator.insert(ASIA_INDIA, record1);
- coordinator.insert(ASIA_INDIA, record2);
- coordinator.close();
-
- transaction.abort();
-
- assertThat(transaction.getState(), is(ABORTED));
-
- client.close();
-
- StreamingAssert streamingAssertions = assertionFactory.newStreamingAssert(table, ASIA_INDIA);
- streamingAssertions.assertNothingWritten();
- }
-
- @Test
- public void testUpdatesAndDeletes() throws Exception {
- // Set up some base data then stream some inserts/updates/deletes to a number of partitions
- MutatorFactory mutatorFactory = new ReflectiveMutatorFactory(conf, MutableRecord.class, RECORD_ID_COLUMN,
- BUCKET_COLUMN_INDEXES);
-
- // INSERT DATA
- //
- Table table = partitionedTableBuilder.addPartition(ASIA_INDIA).addPartition(EUROPE_FRANCE).create(metaStoreClient);
-
- MutatorClient client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), true)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction insertTransaction = client.newTransaction();
-
- List<AcidTable> destinations = client.getTables();
-
- insertTransaction.begin();
-
- MutatorCoordinator insertCoordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- BucketIdResolver bucketIdResolver = mutatorFactory.newBucketIdResolver(destinations.get(0).getTotalBuckets());
- MutableRecord asiaIndiaRecord1 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(1,
- "Namaste streaming 1"));
- MutableRecord asiaIndiaRecord2 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(2,
- "Namaste streaming 2"));
- MutableRecord europeUkRecord1 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(3,
- "Hello streaming 1"));
- MutableRecord europeUkRecord2 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(4,
- "Hello streaming 2"));
- MutableRecord europeFranceRecord1 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(5,
- "Bonjour streaming 1"));
- MutableRecord europeFranceRecord2 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(6,
- "Bonjour streaming 2"));
-
- insertCoordinator.insert(ASIA_INDIA, asiaIndiaRecord1);
- insertCoordinator.insert(ASIA_INDIA, asiaIndiaRecord2);
- insertCoordinator.insert(EUROPE_UK, europeUkRecord1);
- insertCoordinator.insert(EUROPE_UK, europeUkRecord2);
- insertCoordinator.insert(EUROPE_FRANCE, europeFranceRecord1);
- insertCoordinator.insert(EUROPE_FRANCE, europeFranceRecord2);
- insertCoordinator.close();
-
- insertTransaction.commit();
-
- assertThat(insertTransaction.getState(), is(COMMITTED));
- client.close();
-
- // MUTATE DATA
- //
- client = new MutatorClientBuilder()
- .addSinkTable(table.getDbName(), table.getTableName(), true)
- .metaStoreUri(metaStoreUri)
- .build();
- client.connect();
-
- Transaction mutateTransaction = client.newTransaction();
-
- destinations = client.getTables();
-
- mutateTransaction.begin();
-
- MutatorCoordinator mutateCoordinator = new MutatorCoordinatorBuilder()
- .metaStoreUri(metaStoreUri)
- .table(destinations.get(0))
- .mutatorFactory(mutatorFactory)
- .build();
-
- bucketIdResolver = mutatorFactory.newBucketIdResolver(destinations.get(0).getTotalBuckets());
- MutableRecord asiaIndiaRecord3 = (MutableRecord) bucketIdResolver.attachBucketIdToRecord(new MutableRecord(20,
- "Namaste streaming 3"));
-
- mutateCoordinator.update(ASIA_INDIA, new MutableRecord(2, "UPDATED: Namaste streaming 2", new RecordIdentifier(1L,
- encodeBucket(0), 1L)));
- mutateCoordinator.insert(ASIA_INDIA, asiaIndiaRecord3);
- mutateCoordinator.delete(EUROPE_UK, new MutableRecord(3, "Hello streaming 1", new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
- mutateCoordinator.delete(EUROPE_FRANCE,
- new MutableRecord(5, "Bonjour streaming 1", new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
- mutateCoordinator.update(EUROPE_FRANCE, new MutableRecord(6, "UPDATED: Bonjour streaming 2", new RecordIdentifier(
- 1L, encodeBucket(0), 1L)));
- mutateCoordinator.close();
-
- mutateTransaction.commit();
-
- assertThat(mutateTransaction.getState(), is(COMMITTED));
-
- StreamingAssert indiaAssertions = assertionFactory.newStreamingAssert(table, ASIA_INDIA);
- indiaAssertions.assertMinWriteId(1L);
- indiaAssertions.assertMaxWriteId(2L);
- List<Record> indiaRecords = indiaAssertions.readRecords(2);
- assertThat(indiaRecords.size(), is(3));
- assertThat(indiaRecords.get(0).getRow(), is("{1, Namaste streaming 1}"));
- assertThat(indiaRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 0L)));
- assertThat(indiaRecords.get(1).getRow(), is("{2, UPDATED: Namaste streaming 2}"));
- assertThat(indiaRecords.get(1).getRecordIdentifier(), is(new RecordIdentifier(2L,
- encodeBucket(0), 0L)));//with split update, new version of the row is a new insert
- assertThat(indiaRecords.get(2).getRow(), is("{20, Namaste streaming 3}"));
- assertThat(indiaRecords.get(2).getRecordIdentifier(), is(new RecordIdentifier(2L,
- encodeBucket(0), 1L)));
-
- StreamingAssert ukAssertions = assertionFactory.newStreamingAssert(table, EUROPE_UK);
- ukAssertions.assertMinWriteId(1L);
- ukAssertions.assertMaxWriteId(2L);
- //1 split since mutateTransaction txn just does deletes
- List<Record> ukRecords = ukAssertions.readRecords(1);
- assertThat(ukRecords.size(), is(1));
- assertThat(ukRecords.get(0).getRow(), is("{4, Hello streaming 2}"));
- assertThat(ukRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(1L,
- encodeBucket(0), 1L)));
-
- StreamingAssert franceAssertions = assertionFactory.newStreamingAssert(table, EUROPE_FRANCE);
- franceAssertions.assertMinWriteId(1L);
- franceAssertions.assertMaxWriteId(2L);
- List<Record> franceRecords = franceAssertions.readRecords(2);
- assertThat(franceRecords.size(), is(1));
- assertThat(franceRecords.get(0).getRow(), is("{6, UPDATED: Bonjour streaming 2}"));
- assertThat(franceRecords.get(0).getRecordIdentifier(), is(new RecordIdentifier(2L,
- encodeBucket(0), 0L)));//with split update, new version of the row is a new insert
-
- client.close();
- }
-
-}
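The RecordIdentifier assertions in the removed TestMutations class above document the split-update behaviour that the deleted comments call out: an update never rewrites the original row in place; the new version of the row is written as a fresh insert in the updating transaction's delta, so it carries that transaction's write id (2L rather than 1L) and a row id that restarts from 0. The stand-alone sketch below simply restates the expected identifiers for the "{2, Namaste streaming 2}" row; the class name is hypothetical and the snippet is an illustration of the values asserted above, not code from the removed test.

import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.ql.io.BucketCodec;
import org.apache.hadoop.hive.ql.io.RecordIdentifier;

// Hypothetical illustration class; not part of the removed test code.
public class SplitUpdateIdentifierSketch {

  // Same bucket encoding the removed test used to build its expected identifiers.
  private static int encodeBucket(int bucketId) {
    return BucketCodec.V1.encode(new AcidOutputFormat.Options(null).bucket(bucketId));
  }

  public static void main(String[] args) {
    // Row {2, "Namaste streaming 2"} as first inserted: write id 1, bucket 0, row id 1.
    RecordIdentifier beforeUpdate = new RecordIdentifier(1L, encodeBucket(0), 1L);
    // The same logical row after the update: written as a new insert in the second
    // transaction's delta, so write id 2 and a row id that restarts at 0.
    RecordIdentifier afterUpdate = new RecordIdentifier(2L, encodeBucket(0), 0L);
    System.out.println(beforeUpdate + " -> " + afterUpdate);
  }
}

Under this scheme a reader can always tell which transaction produced the visible version of a row, which is exactly what the min/max write id assertions above verify.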
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestAcidTableSerializer.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestAcidTableSerializer.java
deleted file mode 100644
index 1523a10..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestAcidTableSerializer.java
+++ /dev/null
@@ -1,83 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.client;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.hamcrest.CoreMatchers.nullValue;
-import static org.junit.Assert.assertThat;
-
-import java.io.File;
-
-import org.apache.hadoop.hive.metastore.api.Database;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hive.hcatalog.streaming.mutate.StreamingTestUtils;
-import org.junit.Test;
-
-public class TestAcidTableSerializer {
-
- @Test
- public void testSerializeDeserialize() throws Exception {
- Database database = StreamingTestUtils.databaseBuilder(new File("/tmp")).name("db_1").build();
- Table table = StreamingTestUtils
- .tableBuilder(database)
- .name("table_1")
- .addColumn("one", "string")
- .addColumn("two", "integer")
- .partitionKeys("partition")
- .addPartition("p1")
- .buckets(10)
- .build();
-
- AcidTable acidTable = new AcidTable("db_1", "table_1", true, TableType.SINK);
- acidTable.setTable(table);
- acidTable.setWriteId(42L);
-
- String encoded = AcidTableSerializer.encode(acidTable);
- System.out.println(encoded);
- AcidTable decoded = AcidTableSerializer.decode(encoded);
-
- assertThat(decoded.getDatabaseName(), is("db_1"));
- assertThat(decoded.getTableName(), is("table_1"));
- assertThat(decoded.createPartitions(), is(true));
- assertThat(decoded.getOutputFormatName(), is("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"));
- assertThat(decoded.getTotalBuckets(), is(10));
- assertThat(decoded.getQualifiedName(), is("DB_1.TABLE_1"));
- assertThat(decoded.getWriteId(), is(42L));
- assertThat(decoded.getTableType(), is(TableType.SINK));
- assertThat(decoded.getTable(), is(table));
- }
-
- @Test
- public void testSerializeDeserializeNoTableNoTransaction() throws Exception {
- AcidTable acidTable = new AcidTable("db_1", "table_1", true, TableType.SINK);
-
- String encoded = AcidTableSerializer.encode(acidTable);
- AcidTable decoded = AcidTableSerializer.decode(encoded);
-
- assertThat(decoded.getDatabaseName(), is("db_1"));
- assertThat(decoded.getTableName(), is("table_1"));
- assertThat(decoded.createPartitions(), is(true));
- assertThat(decoded.getOutputFormatName(), is(nullValue()));
- assertThat(decoded.getTotalBuckets(), is(0));
- assertThat(decoded.getQualifiedName(), is("DB_1.TABLE_1"));
- assertThat(decoded.getWriteId(), is(0L));
- assertThat(decoded.getTableType(), is(TableType.SINK));
- assertThat(decoded.getTable(), is(nullValue()));
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestMutatorClient.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestMutatorClient.java
deleted file mode 100644
index 91b90ed..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestMutatorClient.java
+++ /dev/null
@@ -1,197 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.client;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-import static org.junit.Assert.fail;
-import static org.mockito.Matchers.anyString;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.when;
-
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.List;
-import java.util.Map;
-
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
-import org.apache.hive.hcatalog.streaming.TransactionBatch.TxnState;
-import org.apache.hive.hcatalog.streaming.mutate.client.lock.Lock;
-import org.apache.hive.hcatalog.streaming.mutate.client.lock.LockFailureListener;
-import org.apache.thrift.TException;
-import org.junit.Before;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.mockito.Mock;
-import org.mockito.runners.MockitoJUnitRunner;
-
-@RunWith(MockitoJUnitRunner.class)
-public class TestMutatorClient {
-
- private static final long TRANSACTION_ID = 42L;
- private static final long WRITE_ID1 = 78L;
- private static final long WRITE_ID2 = 33L;
- private static final String TABLE_NAME_1 = "TABLE_1";
- private static final String TABLE_NAME_2 = "TABLE_2";
- private static final String DB_NAME = "DB_1";
- private static final String USER = "user";
- private static final AcidTable TABLE_1 = new AcidTable(DB_NAME, TABLE_NAME_1, true, TableType.SINK);
- private static final AcidTable TABLE_2 = new AcidTable(DB_NAME, TABLE_NAME_2, true, TableType.SINK);
-
- @Mock
- private IMetaStoreClient mockMetaStoreClient;
- @Mock
- private Lock mockLock;
- @Mock
- private Table mockTable1, mockTable2;
- @Mock
- private StorageDescriptor mockSd;
- @Mock
- private Map<String, String> mockParameters;
- @Mock
- private HiveConf mockConfiguration;
- @Mock
- private LockFailureListener mockLockFailureListener;
-
- private MutatorClient client;
-
- @Before
- public void configureMocks() throws Exception {
- when(mockMetaStoreClient.getTable(DB_NAME, TABLE_NAME_1)).thenReturn(mockTable1);
- when(mockTable1.getDbName()).thenReturn(DB_NAME);
- when(mockTable1.getTableName()).thenReturn(TABLE_NAME_1);
- when(mockTable1.getSd()).thenReturn(mockSd);
- when(mockTable1.getParameters()).thenReturn(mockParameters);
- when(mockMetaStoreClient.getTable(DB_NAME, TABLE_NAME_2)).thenReturn(mockTable2);
- when(mockTable2.getDbName()).thenReturn(DB_NAME);
- when(mockTable2.getTableName()).thenReturn(TABLE_NAME_2);
- when(mockTable2.getSd()).thenReturn(mockSd);
- when(mockTable2.getParameters()).thenReturn(mockParameters);
- when(mockSd.getNumBuckets()).thenReturn(1, 2);
- when(mockSd.getOutputFormat()).thenReturn(OrcOutputFormat.class.getName());
- when(mockParameters.get("transactional")).thenReturn(Boolean.TRUE.toString());
-
- when(mockMetaStoreClient.openTxn(USER)).thenReturn(TRANSACTION_ID);
- when(mockMetaStoreClient.allocateTableWriteId(TRANSACTION_ID, DB_NAME, TABLE_NAME_1)).thenReturn(WRITE_ID1);
- when(mockMetaStoreClient.allocateTableWriteId(TRANSACTION_ID, DB_NAME, TABLE_NAME_2)).thenReturn(WRITE_ID2);
-
- client = new MutatorClient(mockMetaStoreClient, mockConfiguration, mockLockFailureListener, USER,
- Collections.singletonList(TABLE_1));
- }
-
- @Test
- public void testCheckValidTableConnect() throws Exception {
- List<AcidTable> inTables = new ArrayList<>();
- inTables.add(TABLE_1);
- inTables.add(TABLE_2);
- client = new MutatorClient(mockMetaStoreClient, mockConfiguration, mockLockFailureListener, USER, inTables);
-
- client.connect();
- List<AcidTable> outTables = client.getTables();
-
- assertThat(client.isConnected(), is(true));
- assertThat(outTables.size(), is(2));
- assertThat(outTables.get(0).getDatabaseName(), is(DB_NAME));
- assertThat(outTables.get(0).getTableName(), is(TABLE_NAME_1));
- assertThat(outTables.get(0).getTotalBuckets(), is(2));
- assertThat(outTables.get(0).getOutputFormatName(), is(OrcOutputFormat.class.getName()));
- assertThat(outTables.get(0).getWriteId(), is(0L));
- assertThat(outTables.get(0).getTable(), is(mockTable1));
- assertThat(outTables.get(1).getDatabaseName(), is(DB_NAME));
- assertThat(outTables.get(1).getTableName(), is(TABLE_NAME_2));
- assertThat(outTables.get(1).getTotalBuckets(), is(2));
- assertThat(outTables.get(1).getOutputFormatName(), is(OrcOutputFormat.class.getName()));
- assertThat(outTables.get(1).getWriteId(), is(0L));
- assertThat(outTables.get(1).getTable(), is(mockTable2));
- }
-
- @Test
- public void testCheckNonTransactionalTableConnect() throws Exception {
- when(mockParameters.get("transactional")).thenReturn(Boolean.FALSE.toString());
-
- try {
- client.connect();
- fail();
- } catch (ConnectionException e) {
- }
-
- assertThat(client.isConnected(), is(false));
- }
-
- @Test
- public void testCheckUnBucketedTableConnect() throws Exception {
- when(mockSd.getNumBuckets()).thenReturn(0);
-
- try {
- client.connect();
- fail();
- } catch (ConnectionException e) {
- }
-
- assertThat(client.isConnected(), is(false));
- }
-
- @Test
- public void testMetaStoreFailsOnConnect() throws Exception {
- when(mockMetaStoreClient.getTable(anyString(), anyString())).thenThrow(new TException());
-
- try {
- client.connect();
- fail();
- } catch (ConnectionException e) {
- }
-
- assertThat(client.isConnected(), is(false));
- }
-
- @Test(expected = ConnectionException.class)
- public void testGetDestinationsFailsIfNotConnected() throws Exception {
- client.getTables();
- }
-
- @Test
- public void testNewTransaction() throws Exception {
- List<AcidTable> inTables = new ArrayList<>();
- inTables.add(TABLE_1);
- inTables.add(TABLE_2);
- client = new MutatorClient(mockMetaStoreClient, mockConfiguration, mockLockFailureListener, USER, inTables);
-
- client.connect();
- Transaction transaction = client.newTransaction();
- List<AcidTable> outTables = client.getTables();
-
- assertThat(client.isConnected(), is(true));
-
- assertThat(transaction.getTransactionId(), is(TRANSACTION_ID));
- assertThat(transaction.getState(), is(TxnState.INACTIVE));
- assertThat(outTables.get(0).getWriteId(), is(WRITE_ID1));
- assertThat(outTables.get(1).getWriteId(), is(WRITE_ID2));
- }
-
- @Test
- public void testCloseClosesClient() throws Exception {
- client.close();
- assertThat(client.isConnected(), is(false));
- verify(mockMetaStoreClient).close();
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestTransaction.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestTransaction.java
deleted file mode 100644
index c47cf4d..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/TestTransaction.java
+++ /dev/null
@@ -1,112 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.client;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-import static org.mockito.Mockito.doThrow;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.when;
-
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hive.hcatalog.streaming.TransactionBatch;
-import org.apache.hive.hcatalog.streaming.mutate.client.lock.Lock;
-import org.apache.hive.hcatalog.streaming.mutate.client.lock.LockException;
-import org.junit.Before;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.mockito.Mock;
-import org.mockito.runners.MockitoJUnitRunner;
-
-@RunWith(MockitoJUnitRunner.class)
-public class TestTransaction {
-
- private static final String USER = "user";
- private static final long TRANSACTION_ID = 10L;
-
- @Mock
- private Lock mockLock;
- @Mock
- private IMetaStoreClient mockMetaStoreClient;
-
- private Transaction transaction;
-
- @Before
- public void createTransaction() throws Exception {
- when(mockLock.getUser()).thenReturn(USER);
- when(mockMetaStoreClient.openTxn(USER)).thenReturn(TRANSACTION_ID);
- transaction = new Transaction(mockMetaStoreClient, mockLock);
- }
-
- @Test
- public void testInitialState() {
- assertThat(transaction.getState(), is(TransactionBatch.TxnState.INACTIVE));
- assertThat(transaction.getTransactionId(), is(TRANSACTION_ID));
- }
-
- @Test
- public void testBegin() throws Exception {
- transaction.begin();
-
- verify(mockLock).acquire(TRANSACTION_ID);
- assertThat(transaction.getState(), is(TransactionBatch.TxnState.OPEN));
- }
-
- @Test
- public void testBeginLockFails() throws Exception {
- doThrow(new LockException("")).when(mockLock).acquire(TRANSACTION_ID);
-
- try {
- transaction.begin();
- } catch (TransactionException ignore) {
- }
-
- assertThat(transaction.getState(), is(TransactionBatch.TxnState.INACTIVE));
- }
-
- @Test
- public void testCommit() throws Exception {
- transaction.commit();
-
- verify(mockLock).release();
- verify(mockMetaStoreClient).commitTxn(TRANSACTION_ID);
- assertThat(transaction.getState(), is(TransactionBatch.TxnState.COMMITTED));
- }
-
- @Test(expected = TransactionException.class)
- public void testCommitLockFails() throws Exception {
- doThrow(new LockException("")).when(mockLock).release();
- transaction.commit();
- }
-
- @Test
- public void testAbort() throws Exception {
- transaction.abort();
-
- verify(mockLock).release();
- verify(mockMetaStoreClient).rollbackTxn(TRANSACTION_ID);
- assertThat(transaction.getState(), is(TransactionBatch.TxnState.ABORTED));
- }
-
- @Test(expected = TransactionException.class)
- public void testAbortLockFails() throws Exception {
- doThrow(new LockException("")).when(mockLock).release();
- transaction.abort();
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/lock/TestHeartbeatTimerTask.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/lock/TestHeartbeatTimerTask.java
deleted file mode 100644
index 1edec69..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/lock/TestHeartbeatTimerTask.java
+++ /dev/null
@@ -1,117 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.client.lock;
-
-import static org.mockito.Mockito.doThrow;
-import static org.mockito.Mockito.verify;
-
-import java.util.Arrays;
-import java.util.List;
-
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.NoSuchLockException;
-import org.apache.hadoop.hive.metastore.api.NoSuchTxnException;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.metastore.api.TxnAbortedException;
-import org.apache.thrift.TException;
-import org.junit.Before;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.mockito.Mock;
-import org.mockito.runners.MockitoJUnitRunner;
-
-@RunWith(MockitoJUnitRunner.class)
-public class TestHeartbeatTimerTask {
-
- private static final long TRANSACTION_ID = 10L;
- private static final long LOCK_ID = 1L;
- private static final List<Table> TABLES = createTable();
-
- @Mock
- private IMetaStoreClient mockMetaStoreClient;
- @Mock
- private LockFailureListener mockListener;
-
- private HeartbeatTimerTask task;
-
- @Before
- public void create() throws Exception {
- task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, TRANSACTION_ID, TABLES, LOCK_ID);
- }
-
- @Test
- public void testRun() throws Exception {
- task.run();
-
- verify(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
- }
-
- @Test
- public void testRunNullTransactionId() throws Exception {
- task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, null, TABLES, LOCK_ID);
-
- task.run();
-
- verify(mockMetaStoreClient).heartbeat(0, LOCK_ID);
- }
-
- @Test
- public void testRunHeartbeatFailsNoSuchLockException() throws Exception {
- NoSuchLockException exception = new NoSuchLockException();
- doThrow(exception).when(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
-
- task.run();
-
- verify(mockListener).lockFailed(LOCK_ID, TRANSACTION_ID, Arrays.asList("DB.TABLE"), exception);
- }
-
- @Test
- public void testRunHeartbeatFailsNoSuchTxnException() throws Exception {
- NoSuchTxnException exception = new NoSuchTxnException();
- doThrow(exception).when(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
-
- task.run();
-
- verify(mockListener).lockFailed(LOCK_ID, TRANSACTION_ID, Arrays.asList("DB.TABLE"), exception);
- }
-
- @Test
- public void testRunHeartbeatFailsTxnAbortedException() throws Exception {
- TxnAbortedException exception = new TxnAbortedException();
- doThrow(exception).when(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
-
- task.run();
-
- verify(mockListener).lockFailed(LOCK_ID, TRANSACTION_ID, Arrays.asList("DB.TABLE"), exception);
- }
-
- @Test
- public void testRunHeartbeatFailsTException() throws Exception {
- TException exception = new TException();
- doThrow(exception).when(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
-
- task.run();
- }
-
- private static List<Table> createTable() {
- Table table = new Table();
- table.setDbName("DB");
- table.setTableName("TABLE");
- return Arrays.asList(table);
- }
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/lock/TestLock.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/lock/TestLock.java
deleted file mode 100644
index 0a46faf..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/client/lock/TestLock.java
+++ /dev/null
@@ -1,338 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.client.lock;
-
-import static org.apache.hadoop.hive.metastore.api.LockState.ABORT;
-import static org.apache.hadoop.hive.metastore.api.LockState.ACQUIRED;
-import static org.apache.hadoop.hive.metastore.api.LockState.NOT_ACQUIRED;
-import static org.apache.hadoop.hive.metastore.api.LockState.WAITING;
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertNull;
-import static org.junit.Assert.assertTrue;
-import static org.mockito.Matchers.any;
-import static org.mockito.Matchers.anyInt;
-import static org.mockito.Matchers.anyLong;
-import static org.mockito.Matchers.eq;
-import static org.mockito.Mockito.doThrow;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.verifyNoMoreInteractions;
-import static org.mockito.Mockito.verifyZeroInteractions;
-import static org.mockito.Mockito.when;
-
-import java.net.InetAddress;
-import java.util.Collection;
-import java.util.Collections;
-import java.util.List;
-import java.util.Set;
-import java.util.Timer;
-
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.DataOperationType;
-import org.apache.hadoop.hive.metastore.api.LockComponent;
-import org.apache.hadoop.hive.metastore.api.LockLevel;
-import org.apache.hadoop.hive.metastore.api.LockRequest;
-import org.apache.hadoop.hive.metastore.api.LockResponse;
-import org.apache.hadoop.hive.metastore.api.LockType;
-import org.apache.hadoop.hive.metastore.api.NoSuchLockException;
-import org.apache.hadoop.hive.metastore.api.NoSuchTxnException;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.metastore.api.TxnAbortedException;
-import org.apache.thrift.TException;
-import org.junit.Before;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.mockito.ArgumentCaptor;
-import org.mockito.Captor;
-import org.mockito.Mock;
-import org.mockito.runners.MockitoJUnitRunner;
-
-import com.google.common.collect.ImmutableSet;
-
-@RunWith(MockitoJUnitRunner.class)
-public class TestLock {
-
- private static final Table SOURCE_TABLE_1 = createTable("DB", "SOURCE_1");
- private static final Table SOURCE_TABLE_2 = createTable("DB", "SOURCE_2");
- private static final Table SINK_TABLE = createTable("DB", "SINK");
- private static final Set<Table> SOURCES = ImmutableSet.of(SOURCE_TABLE_1, SOURCE_TABLE_2);
- private static final Set<Table> SINKS = ImmutableSet.of(SINK_TABLE);
- private static final Set<Table> TABLES = ImmutableSet.of(SOURCE_TABLE_1, SOURCE_TABLE_2, SINK_TABLE);
- private static final long LOCK_ID = 42;
- private static final long TRANSACTION_ID = 109;
- private static final String USER = "ewest";
-
- @Mock
- private IMetaStoreClient mockMetaStoreClient;
- @Mock
- private LockFailureListener mockListener;
- @Mock
- private LockResponse mockLockResponse;
- @Mock
- private HeartbeatFactory mockHeartbeatFactory;
- @Mock
- private Timer mockHeartbeat;
- @Captor
- private ArgumentCaptor<LockRequest> requestCaptor;
-
- private Lock readLock;
- private Lock writeLock;
- private HiveConf configuration = new HiveConf();
-
- @Before
- public void injectMocks() throws Exception {
- when(mockMetaStoreClient.lock(any(LockRequest.class))).thenReturn(mockLockResponse);
- when(mockLockResponse.getLockid()).thenReturn(LOCK_ID);
- when(mockLockResponse.getState()).thenReturn(ACQUIRED);
- when(
- mockHeartbeatFactory.newInstance(any(IMetaStoreClient.class), any(LockFailureListener.class), any(Long.class),
- any(Collection.class), anyLong(), anyInt())).thenReturn(mockHeartbeat);
-
- readLock = new Lock(mockMetaStoreClient, mockHeartbeatFactory, configuration, mockListener, USER, SOURCES,
- Collections.<Table> emptySet(), 3, 0);
- writeLock = new Lock(mockMetaStoreClient, mockHeartbeatFactory, configuration, mockListener, USER, SOURCES, SINKS,
- 3, 0);
- }
-
- @Test
- public void testAcquireReadLockWithNoIssues() throws Exception {
- readLock.acquire();
- assertEquals(Long.valueOf(LOCK_ID), readLock.getLockId());
- assertNull(readLock.getTransactionId());
- }
-
- @Test(expected = IllegalArgumentException.class)
- public void testAcquireWriteLockWithoutTxn() throws Exception {
- writeLock.acquire();
- }
-
- @Test(expected = IllegalArgumentException.class)
- public void testAcquireWriteLockWithInvalidTxn() throws Exception {
- writeLock.acquire(0);
- }
-
- @Test
- public void testAcquireTxnLockWithNoIssues() throws Exception {
- writeLock.acquire(TRANSACTION_ID);
- assertEquals(Long.valueOf(LOCK_ID), writeLock.getLockId());
- assertEquals(Long.valueOf(TRANSACTION_ID), writeLock.getTransactionId());
- }
-
- @Test
- public void testAcquireReadLockCheckHeartbeatCreated() throws Exception {
- configuration.set("hive.txn.timeout", "100s");
- readLock.acquire();
-
- verify(mockHeartbeatFactory).newInstance(eq(mockMetaStoreClient), eq(mockListener), any(Long.class), eq(SOURCES),
- eq(LOCK_ID), eq(75));
- }
-
- @Test
- public void testAcquireTxnLockCheckHeartbeatCreated() throws Exception {
- configuration.set("hive.txn.timeout", "100s");
- writeLock.acquire(TRANSACTION_ID);
-
- verify(mockHeartbeatFactory).newInstance(eq(mockMetaStoreClient), eq(mockListener), eq(TRANSACTION_ID),
- eq(TABLES), eq(LOCK_ID), eq(75));
- }
-
- @Test
- public void testAcquireLockCheckUser() throws Exception {
- readLock.acquire();
- verify(mockMetaStoreClient).lock(requestCaptor.capture());
- LockRequest actualRequest = requestCaptor.getValue();
- assertEquals(USER, actualRequest.getUser());
- }
-
- @Test
- public void testAcquireReadLockCheckLocks() throws Exception {
- readLock.acquire();
- verify(mockMetaStoreClient).lock(requestCaptor.capture());
-
- LockRequest request = requestCaptor.getValue();
- assertEquals(0, request.getTxnid());
- assertEquals(USER, request.getUser());
- assertEquals(InetAddress.getLocalHost().getHostName(), request.getHostname());
-
- List<LockComponent> components = request.getComponent();
-
- assertEquals(2, components.size());
-
- LockComponent expected1 = new LockComponent(LockType.SHARED_READ, LockLevel.TABLE, "DB");
- expected1.setTablename("SOURCE_1");
- expected1.setOperationType(DataOperationType.INSERT);
- expected1.setIsTransactional(true);
- assertTrue(components.contains(expected1));
-
- LockComponent expected2 = new LockComponent(LockType.SHARED_READ, LockLevel.TABLE, "DB");
- expected2.setTablename("SOURCE_2");
- expected2.setOperationType(DataOperationType.INSERT);
- expected2.setIsTransactional(true);
- assertTrue(components.contains(expected2));
- }
-
- @Test
- public void testAcquireTxnLockCheckLocks() throws Exception {
- writeLock.acquire(TRANSACTION_ID);
- verify(mockMetaStoreClient).lock(requestCaptor.capture());
-
- LockRequest request = requestCaptor.getValue();
- assertEquals(TRANSACTION_ID, request.getTxnid());
- assertEquals(USER, request.getUser());
- assertEquals(InetAddress.getLocalHost().getHostName(), request.getHostname());
-
- List<LockComponent> components = request.getComponent();
-
- assertEquals(3, components.size());
-
- LockComponent expected1 = new LockComponent(LockType.SHARED_READ, LockLevel.TABLE, "DB");
- expected1.setTablename("SOURCE_1");
- expected1.setOperationType(DataOperationType.INSERT);
- expected1.setIsTransactional(true);
- assertTrue(components.contains(expected1));
-
- LockComponent expected2 = new LockComponent(LockType.SHARED_READ, LockLevel.TABLE, "DB");
- expected2.setTablename("SOURCE_2");
- expected2.setOperationType(DataOperationType.INSERT);
- expected2.setIsTransactional(true);
- assertTrue(components.contains(expected2));
-
- LockComponent expected3 = new LockComponent(LockType.SHARED_WRITE, LockLevel.TABLE, "DB");
- expected3.setTablename("SINK");
- expected3.setOperationType(DataOperationType.UPDATE);
- expected3.setIsTransactional(true);
- assertTrue(components.contains(expected3));
- }
-
- @Test(expected = LockException.class)
- public void testAcquireLockNotAcquired() throws Exception {
- when(mockLockResponse.getState()).thenReturn(NOT_ACQUIRED);
- readLock.acquire();
- }
-
- @Test(expected = LockException.class)
- public void testAcquireLockAborted() throws Exception {
- when(mockLockResponse.getState()).thenReturn(ABORT);
- readLock.acquire();
- }
-
- @Test(expected = LockException.class)
- public void testAcquireLockWithWaitRetriesExceeded() throws Exception {
- when(mockLockResponse.getState()).thenReturn(WAITING, WAITING, WAITING);
- readLock.acquire();
- }
-
- @Test
- public void testAcquireLockWithWaitRetries() throws Exception {
- when(mockLockResponse.getState()).thenReturn(WAITING, WAITING, ACQUIRED);
- readLock.acquire();
- assertEquals(Long.valueOf(LOCK_ID), readLock.getLockId());
- }
-
- @Test
- public void testReleaseLock() throws Exception {
- readLock.acquire();
- readLock.release();
- verify(mockMetaStoreClient).unlock(LOCK_ID);
- }
-
- @Test
- public void testReleaseLockNoLock() throws Exception {
- readLock.release();
- verifyNoMoreInteractions(mockMetaStoreClient);
- }
-
- @Test
- public void testReleaseLockCancelsHeartbeat() throws Exception {
- readLock.acquire();
- readLock.release();
- verify(mockHeartbeat).cancel();
- }
-
- @Test
- public void testReadHeartbeat() throws Exception {
- HeartbeatTimerTask task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, null, SOURCES, LOCK_ID);
- task.run();
- verify(mockMetaStoreClient).heartbeat(0, LOCK_ID);
- }
-
- @Test
- public void testTxnHeartbeat() throws Exception {
- HeartbeatTimerTask task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, TRANSACTION_ID, SOURCES,
- LOCK_ID);
- task.run();
- verify(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
- }
-
- @Test
- public void testReadHeartbeatFailsNoSuchLockException() throws Exception {
- Throwable t = new NoSuchLockException();
- doThrow(t).when(mockMetaStoreClient).heartbeat(0, LOCK_ID);
- HeartbeatTimerTask task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, null, SOURCES, LOCK_ID);
- task.run();
- verify(mockListener).lockFailed(LOCK_ID, null, Lock.asStrings(SOURCES), t);
- }
-
- @Test
- public void testTxnHeartbeatFailsNoSuchLockException() throws Exception {
- Throwable t = new NoSuchLockException();
- doThrow(t).when(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
- HeartbeatTimerTask task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, TRANSACTION_ID, SOURCES,
- LOCK_ID);
- task.run();
- verify(mockListener).lockFailed(LOCK_ID, TRANSACTION_ID, Lock.asStrings(SOURCES), t);
- }
-
- @Test
- public void testHeartbeatFailsNoSuchTxnException() throws Exception {
- Throwable t = new NoSuchTxnException();
- doThrow(t).when(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
- HeartbeatTimerTask task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, TRANSACTION_ID, SOURCES,
- LOCK_ID);
- task.run();
- verify(mockListener).lockFailed(LOCK_ID, TRANSACTION_ID, Lock.asStrings(SOURCES), t);
- }
-
- @Test
- public void testHeartbeatFailsTxnAbortedException() throws Exception {
- Throwable t = new TxnAbortedException();
- doThrow(t).when(mockMetaStoreClient).heartbeat(TRANSACTION_ID, LOCK_ID);
- HeartbeatTimerTask task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, TRANSACTION_ID, SOURCES,
- LOCK_ID);
- task.run();
- verify(mockListener).lockFailed(LOCK_ID, TRANSACTION_ID, Lock.asStrings(SOURCES), t);
- }
-
- @Test
- public void testHeartbeatContinuesTException() throws Exception {
- Throwable t = new TException();
- doThrow(t).when(mockMetaStoreClient).heartbeat(0, LOCK_ID);
- HeartbeatTimerTask task = new HeartbeatTimerTask(mockMetaStoreClient, mockListener, TRANSACTION_ID, SOURCES,
- LOCK_ID);
- task.run();
- verifyZeroInteractions(mockListener);
- }
-
- private static Table createTable(String databaseName, String tableName) {
- Table table = new Table();
- table.setDbName(databaseName);
- table.setTableName(tableName);
- return table;
- }
-
-}
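Two expectations in the removed TestLock class above (setting hive.txn.timeout to "100s" and then expecting the heartbeat factory to be created with a period of 75) imply that the lock schedules heartbeats at roughly 75% of the configured transaction timeout. The helper below is a hypothetical restatement of that relationship for illustration only; it is not taken from the removed Lock implementation, whose actual calculation does not appear in this diff.

// Hypothetical sketch of the timeout-to-heartbeat relationship implied by the assertions above.
public class HeartbeatPeriodSketch {

  // Assumed rule: heartbeat period = 75% of the transaction timeout,
  // which reproduces the 100 s -> 75 pairing asserted in the removed tests.
  static int heartbeatPeriodSeconds(int txnTimeoutSeconds) {
    return (int) (txnTimeoutSeconds * 0.75);
  }

  public static void main(String[] args) {
    System.out.println(heartbeatPeriodSeconds(100)); // prints 75
  }
}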
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
deleted file mode 100644
index e890c52..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestBucketIdResolverImpl.java
+++ /dev/null
@@ -1,59 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.ql.io.BucketCodec;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
-import org.apache.hive.hcatalog.streaming.mutate.MutableRecord;
-import org.junit.Test;
-
-public class TestBucketIdResolverImpl {
-
- private static final int TOTAL_BUCKETS = 12;
- private static final int RECORD_ID_COLUMN = 2;
- // id - TODO: use a non-zero index to check for offset errors.
- private static final int[] BUCKET_COLUMN_INDEXES = new int[] { 0 };
-
- private BucketIdResolver capturingBucketIdResolver = new BucketIdResolverImpl(
- ObjectInspectorFactory.getReflectionObjectInspector(MutableRecord.class,
- ObjectInspectorFactory.ObjectInspectorOptions.JAVA), RECORD_ID_COLUMN, TOTAL_BUCKETS, BUCKET_COLUMN_INDEXES);
-
- @Test
- public void testAttachBucketIdToRecord() {
- MutableRecord record = new MutableRecord(1, "hello");
- capturingBucketIdResolver.attachBucketIdToRecord(record);
- assertThat(record.rowId, is(new RecordIdentifier(-1L,
- BucketCodec.V1.encode(new AcidOutputFormat.Options(null).bucket(1)),
- -1L)));
- assertThat(record.id, is(1));
- assertThat(record.msg.toString(), is("hello"));
- }
-
- @Test(expected = IllegalArgumentException.class)
- public void testNoBucketColumns() {
- new BucketIdResolverImpl(ObjectInspectorFactory.getReflectionObjectInspector(MutableRecord.class,
- ObjectInspectorFactory.ObjectInspectorOptions.JAVA), RECORD_ID_COLUMN, TOTAL_BUCKETS, new int[0]);
-
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestGroupingValidator.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestGroupingValidator.java
deleted file mode 100644
index 1d171c4..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestGroupingValidator.java
+++ /dev/null
@@ -1,87 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;
-
-import java.util.Arrays;
-import java.util.Collections;
-
-import org.junit.Test;
-
-public class TestGroupingValidator {
-
- private GroupingValidator validator = new GroupingValidator();
-
- @Test
- public void uniqueGroups() {
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 1));
- assertTrue(validator.isInSequence(Arrays.asList("c", "C"), 3));
- assertTrue(validator.isInSequence(Arrays.asList("b", "B"), 2));
- }
-
- @Test
- public void sameGroup() {
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 1));
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 1));
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 1));
- }
-
- @Test
- public void revisitedGroup() {
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 1));
- assertTrue(validator.isInSequence(Arrays.asList("c", "C"), 3));
- assertFalse(validator.isInSequence(Arrays.asList("a", "A"), 1));
- }
-
- @Test
- public void samePartitionDifferentBucket() {
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 1));
- assertTrue(validator.isInSequence(Arrays.asList("c", "C"), 3));
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 2));
- }
-
- @Test
- public void sameBucketDifferentPartition() {
- assertTrue(validator.isInSequence(Arrays.asList("a", "A"), 1));
- assertTrue(validator.isInSequence(Arrays.asList("c", "C"), 3));
- assertTrue(validator.isInSequence(Arrays.asList("b", "B"), 1));
- }
-
- @Test
- public void uniqueGroupsNoPartition() {
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 1));
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 3));
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 2));
- }
-
- @Test
- public void sameGroupNoPartition() {
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 1));
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 1));
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 1));
- }
-
- @Test
- public void revisitedGroupNoPartition() {
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 1));
- assertTrue(validator.isInSequence(Collections.<String> emptyList(), 3));
- assertFalse(validator.isInSequence(Collections.<String> emptyList(), 1));
- }
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMetaStorePartitionHelper.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMetaStorePartitionHelper.java
deleted file mode 100644
index 335ecd2..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMetaStorePartitionHelper.java
+++ /dev/null
@@ -1,129 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.verifyZeroInteractions;
-import static org.mockito.Mockito.when;
-
-import java.io.IOException;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
-
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.metastore.IMetaStoreClient;
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.junit.Before;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.mockito.ArgumentCaptor;
-import org.mockito.Captor;
-import org.mockito.Mock;
-import org.mockito.runners.MockitoJUnitRunner;
-
-@RunWith(MockitoJUnitRunner.class)
-public class TestMetaStorePartitionHelper {
-
- private static final Path TABLE_PATH = new Path("table");
- private static final String TABLE_LOCATION = TABLE_PATH.toString();
-
- private static final FieldSchema PARTITION_KEY_A = new FieldSchema("A", "string", null);
- private static final FieldSchema PARTITION_KEY_B = new FieldSchema("B", "string", null);
- private static final List<FieldSchema> PARTITION_KEYS = Arrays.asList(PARTITION_KEY_A, PARTITION_KEY_B);
- private static final Path PARTITION_PATH = new Path(TABLE_PATH, "a=1/b=2");
- private static final String PARTITION_LOCATION = PARTITION_PATH.toString();
-
- private static final String DATABASE_NAME = "db";
- private static final String TABLE_NAME = "one";
-
- private static final List<String> UNPARTITIONED_VALUES = Collections.emptyList();
- private static final List<String> PARTITIONED_VALUES = Arrays.asList("1", "2");
-
- @Mock
- private IMetaStoreClient mockClient;
- @Mock
- private Table mockTable;
- private StorageDescriptor tableStorageDescriptor = new StorageDescriptor();
-
- @Mock
- private Partition mockPartition;
- @Mock
- private StorageDescriptor mockPartitionStorageDescriptor;
- @Captor
- private ArgumentCaptor<Partition> partitionCaptor;
-
- private PartitionHelper helper;
-
- @Before
- public void injectMocks() throws Exception {
- when(mockClient.getTable(DATABASE_NAME, TABLE_NAME)).thenReturn(mockTable);
- when(mockTable.getDbName()).thenReturn(DATABASE_NAME);
- when(mockTable.getTableName()).thenReturn(TABLE_NAME);
- when(mockTable.getPartitionKeys()).thenReturn(PARTITION_KEYS);
- when(mockTable.getSd()).thenReturn(tableStorageDescriptor);
- tableStorageDescriptor.setLocation(TABLE_LOCATION);
-
- when(mockClient.getPartition(DATABASE_NAME, TABLE_NAME, PARTITIONED_VALUES)).thenReturn(mockPartition);
- when(mockPartition.getSd()).thenReturn(mockPartitionStorageDescriptor);
- when(mockPartitionStorageDescriptor.getLocation()).thenReturn(PARTITION_LOCATION);
-
- helper = new MetaStorePartitionHelper(mockClient, DATABASE_NAME, TABLE_NAME, TABLE_PATH);
- }
-
- @Test
- public void getPathForUnpartitionedTable() throws Exception {
- Path path = helper.getPathForPartition(UNPARTITIONED_VALUES);
- assertThat(path, is(TABLE_PATH));
- verifyZeroInteractions(mockClient);
- }
-
- @Test
- public void getPathForPartitionedTable() throws Exception {
- Path path = helper.getPathForPartition(PARTITIONED_VALUES);
- assertThat(path, is(PARTITION_PATH));
- }
-
- @Test
- public void createOnUnpartitionTableDoesNothing() throws Exception {
- helper.createPartitionIfNotExists(UNPARTITIONED_VALUES);
- verifyZeroInteractions(mockClient);
- }
-
- @Test
- public void createOnPartitionTable() throws Exception {
- helper.createPartitionIfNotExists(PARTITIONED_VALUES);
-
- verify(mockClient).add_partition(partitionCaptor.capture());
- Partition actual = partitionCaptor.getValue();
- assertThat(actual.getSd().getLocation(), is(PARTITION_LOCATION));
- assertThat(actual.getValues(), is(PARTITIONED_VALUES));
- }
-
- @Test
- public void closeSucceeds() throws IOException {
- helper.close();
- verify(mockClient).close();
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMutatorCoordinator.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMutatorCoordinator.java
deleted file mode 100644
index fab56b3..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMutatorCoordinator.java
+++ /dev/null
@@ -1,261 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.mockito.Matchers.any;
-import static org.mockito.Matchers.anyInt;
-import static org.mockito.Matchers.anyList;
-import static org.mockito.Matchers.anyLong;
-import static org.mockito.Matchers.eq;
-import static org.mockito.Mockito.never;
-import static org.mockito.Mockito.times;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.verifyZeroInteractions;
-import static org.mockito.Mockito.when;
-
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
-
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
-import org.apache.hive.hcatalog.streaming.mutate.client.AcidTable;
-import org.junit.Before;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.mockito.Mock;
-import org.mockito.runners.MockitoJUnitRunner;
-
-@RunWith(MockitoJUnitRunner.class)
-public class TestMutatorCoordinator {
-
- private static final List<String> UNPARTITIONED = Collections.<String> emptyList();
- private static final List<String> PARTITION_B = Arrays.asList("B");
- private static final List<String> PARTITION_A = Arrays.asList("A");
- private static final long WRITE_ID = 2L;
- private static final int BUCKET_ID = 0;
- private static final Path PATH_A = new Path("X");
- private static final Path PATH_B = new Path("B");
- private static final Object RECORD = "RECORD";
- private static final RecordIdentifier ROW__ID_B0_R0 = new RecordIdentifier(10L, BUCKET_ID, 0L);
- private static final RecordIdentifier ROW__ID_B0_R1 = new RecordIdentifier(10L, BUCKET_ID, 1L);
- private static final RecordIdentifier ROW__ID_B1_R0 = new RecordIdentifier(10L, BUCKET_ID + 1, 0L);
- private static final RecordIdentifier ROW__ID_INSERT = new RecordIdentifier(-1L, BUCKET_ID, -1L);
-
- @Mock
- private MutatorFactory mockMutatorFactory;
- @Mock
- private PartitionHelper mockPartitionHelper;
- @Mock
- private GroupingValidator mockGroupingValidator;
- @Mock
- private SequenceValidator mockSequenceValidator;
- @Mock
- private AcidTable mockAcidTable;
- @Mock
- private RecordInspector mockRecordInspector;
- @Mock
- private BucketIdResolver mockBucketIdResolver;
- @Mock
- private Mutator mockMutator;
-
- private MutatorCoordinator coordinator;
-
- private HiveConf configuration = new HiveConf();
-
- @Before
- public void createCoordinator() throws Exception {
- when(mockAcidTable.getOutputFormatName()).thenReturn(OrcOutputFormat.class.getName());
- when(mockAcidTable.getTotalBuckets()).thenReturn(1);
- when(mockAcidTable.getWriteId()).thenReturn(WRITE_ID);
- when(mockAcidTable.createPartitions()).thenReturn(true);
- when(mockMutatorFactory.newRecordInspector()).thenReturn(mockRecordInspector);
- when(mockMutatorFactory.newBucketIdResolver(anyInt())).thenReturn(mockBucketIdResolver);
- when(mockMutatorFactory.newMutator(any(OrcOutputFormat.class), anyLong(), any(Path.class), anyInt())).thenReturn(
- mockMutator);
- when(mockPartitionHelper.getPathForPartition(any(List.class))).thenReturn(PATH_A);
- when(mockRecordInspector.extractRecordIdentifier(RECORD)).thenReturn(ROW__ID_INSERT);
- when(mockSequenceValidator.isInSequence(any(RecordIdentifier.class))).thenReturn(true);
- when(mockGroupingValidator.isInSequence(any(List.class), anyInt())).thenReturn(true);
-
- coordinator = new MutatorCoordinator(configuration, mockMutatorFactory, mockPartitionHelper, mockGroupingValidator,
- mockSequenceValidator, mockAcidTable, false);
- }
-
- @Test
- public void insert() throws Exception {
- coordinator.insert(UNPARTITIONED, RECORD);
-
- verify(mockPartitionHelper).createPartitionIfNotExists(UNPARTITIONED);
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID));
- verify(mockMutator).insert(RECORD);
- }
-
- @Test
- public void multipleInserts() throws Exception {
- coordinator.insert(UNPARTITIONED, RECORD);
- coordinator.insert(UNPARTITIONED, RECORD);
- coordinator.insert(UNPARTITIONED, RECORD);
-
- verify(mockPartitionHelper).createPartitionIfNotExists(UNPARTITIONED);
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID));
- verify(mockMutator, times(3)).insert(RECORD);
- }
-
- @Test
- public void insertPartitionChanges() throws Exception {
- when(mockPartitionHelper.getPathForPartition(PARTITION_A)).thenReturn(PATH_A);
- when(mockPartitionHelper.getPathForPartition(PARTITION_B)).thenReturn(PATH_B);
-
- coordinator.insert(PARTITION_A, RECORD);
- coordinator.insert(PARTITION_B, RECORD);
-
- verify(mockPartitionHelper).createPartitionIfNotExists(PARTITION_A);
- verify(mockPartitionHelper).createPartitionIfNotExists(PARTITION_B);
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID));
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_B), eq(BUCKET_ID));
- verify(mockMutator, times(2)).insert(RECORD);
- }
-
- @Test
- public void bucketChanges() throws Exception {
- when(mockRecordInspector.extractRecordIdentifier(RECORD)).thenReturn(ROW__ID_B0_R0, ROW__ID_B1_R0);
-
- when(mockBucketIdResolver.computeBucketId(RECORD)).thenReturn(0, 1);
-
- coordinator.update(UNPARTITIONED, RECORD);
- coordinator.delete(UNPARTITIONED, RECORD);
-
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID));
- verify(mockMutatorFactory)
- .newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID + 1));
- verify(mockMutator).update(RECORD);
- verify(mockMutator).delete(RECORD);
- }
-
- @Test
- public void partitionThenBucketChanges() throws Exception {
- when(mockRecordInspector.extractRecordIdentifier(RECORD)).thenReturn(ROW__ID_B0_R0, ROW__ID_B0_R1, ROW__ID_B1_R0,
- ROW__ID_INSERT);
-
- when(mockBucketIdResolver.computeBucketId(RECORD)).thenReturn(0, 0, 1, 0);
-
- when(mockPartitionHelper.getPathForPartition(PARTITION_A)).thenReturn(PATH_A);
- when(mockPartitionHelper.getPathForPartition(PARTITION_B)).thenReturn(PATH_B);
-
- coordinator.update(PARTITION_A, RECORD); /* PaB0 */
- coordinator.insert(PARTITION_B, RECORD); /* PbB0 */
- coordinator.delete(PARTITION_B, RECORD); /* PbB0 */
- coordinator.update(PARTITION_B, RECORD); /* PbB1 */
-
- verify(mockPartitionHelper).createPartitionIfNotExists(PARTITION_B);
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID));
- verify(mockMutatorFactory, times(2)).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_B),
- eq(BUCKET_ID));
- verify(mockMutatorFactory)
- .newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_B), eq(BUCKET_ID + 1));
- verify(mockMutator, times(2)).update(RECORD);
- verify(mockMutator).delete(RECORD);
- verify(mockMutator).insert(RECORD);
- verify(mockSequenceValidator, times(4)).reset();
- }
-
- @Test
- public void partitionThenBucketChangesNoCreateAsPartitionEstablished() throws Exception {
- when(mockRecordInspector.extractRecordIdentifier(RECORD)).thenReturn(ROW__ID_B0_R0, ROW__ID_INSERT);
- when(mockBucketIdResolver.computeBucketId(RECORD)).thenReturn(0, 0);
- when(mockPartitionHelper.getPathForPartition(PARTITION_B)).thenReturn(PATH_B);
-
- coordinator.delete(PARTITION_B, RECORD); /* PbB0 */
- coordinator.insert(PARTITION_B, RECORD); /* PbB0 */
-
- verify(mockPartitionHelper, never()).createPartitionIfNotExists(anyList());
- }
-
- @Test(expected = RecordSequenceException.class)
- public void outOfSequence() throws Exception {
- when(mockSequenceValidator.isInSequence(any(RecordIdentifier.class))).thenReturn(false);
-
- coordinator.update(UNPARTITIONED, RECORD);
- coordinator.delete(UNPARTITIONED, RECORD);
-
- verify(mockPartitionHelper).createPartitionIfNotExists(UNPARTITIONED);
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID));
- verify(mockMutator).update(RECORD);
- verify(mockMutator).delete(RECORD);
- }
-
- @Test(expected = GroupRevisitedException.class)
- public void revisitGroup() throws Exception {
- when(mockGroupingValidator.isInSequence(any(List.class), anyInt())).thenReturn(false);
-
- coordinator.update(UNPARTITIONED, RECORD);
- coordinator.delete(UNPARTITIONED, RECORD);
-
- verify(mockPartitionHelper).createPartitionIfNotExists(UNPARTITIONED);
- verify(mockMutatorFactory).newMutator(any(OrcOutputFormat.class), eq(WRITE_ID), eq(PATH_A), eq(BUCKET_ID));
- verify(mockMutator).update(RECORD);
- verify(mockMutator).delete(RECORD);
- }
-
- @Test(expected = BucketIdException.class)
- public void insertWithBadBucket() throws Exception {
- when(mockRecordInspector.extractRecordIdentifier(RECORD)).thenReturn(ROW__ID_B0_R0);
-
- when(mockBucketIdResolver.computeBucketId(RECORD)).thenReturn(1);
-
- coordinator.insert(UNPARTITIONED, RECORD);
- }
-
- @Test(expected = BucketIdException.class)
- public void updateWithBadBucket() throws Exception {
- when(mockRecordInspector.extractRecordIdentifier(RECORD)).thenReturn(ROW__ID_B0_R0);
-
- when(mockBucketIdResolver.computeBucketId(RECORD)).thenReturn(1);
-
- coordinator.update(UNPARTITIONED, RECORD);
- }
-
- @Test
- public void deleteWithBadBucket() throws Exception {
- when(mockRecordInspector.extractRecordIdentifier(RECORD)).thenReturn(ROW__ID_B0_R0);
-
- when(mockBucketIdResolver.computeBucketId(RECORD)).thenReturn(1);
-
- coordinator.delete(UNPARTITIONED, RECORD);
- }
-
- @Test
- public void closeNoRecords() throws Exception {
- coordinator.close();
-
- // No mutator created
- verifyZeroInteractions(mockMutator);
- }
-
- @Test
- public void closeUsedCoordinator() throws Exception {
- coordinator.insert(UNPARTITIONED, RECORD);
- coordinator.close();
-
- verify(mockMutator).close();
- verify(mockPartitionHelper).close();
- }
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMutatorImpl.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMutatorImpl.java
deleted file mode 100644
index d2c89e5..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestMutatorImpl.java
+++ /dev/null
@@ -1,116 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-import static org.mockito.Matchers.any;
-import static org.mockito.Matchers.eq;
-import static org.mockito.Mockito.never;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.when;
-
-import java.io.IOException;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
-import org.apache.hadoop.hive.ql.io.AcidOutputFormat.Options;
-import org.apache.hadoop.hive.ql.io.RecordUpdater;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
-import org.junit.Before;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.mockito.ArgumentCaptor;
-import org.mockito.Captor;
-import org.mockito.Mock;
-import org.mockito.runners.MockitoJUnitRunner;
-
-@RunWith(MockitoJUnitRunner.class)
-public class TestMutatorImpl {
-
- private static final Object RECORD = new Object();
- private static final int RECORD_ID_COLUMN = 2;
- private static final int BUCKET_ID = 0;
- private static final Path PATH = new Path("X");
- private static final long WRITE_ID = 1L;
-
- @Mock
- private AcidOutputFormat<?, ?> mockOutputFormat;
- @Mock
- private ObjectInspector mockObjectInspector;
- @Mock
- private RecordUpdater mockRecordUpdater;
- @Captor
- private ArgumentCaptor<Options> captureOptions;
-
- private final HiveConf configuration = new HiveConf();
-
- private Mutator mutator;
-
- @Before
- public void injectMocks() throws IOException {
- when(mockOutputFormat.getRecordUpdater(eq(PATH), any(Options.class))).thenReturn(mockRecordUpdater);
- mutator = new MutatorImpl(configuration, RECORD_ID_COLUMN, mockObjectInspector, mockOutputFormat, WRITE_ID,
- PATH, BUCKET_ID);
- }
-
- @Test
- public void testCreatesRecordReader() throws IOException {
- verify(mockOutputFormat).getRecordUpdater(eq(PATH), captureOptions.capture());
- Options options = captureOptions.getValue();
- assertThat(options.getBucketId(), is(BUCKET_ID));
- assertThat(options.getConfiguration(), is((Configuration) configuration));
- assertThat(options.getInspector(), is(mockObjectInspector));
- assertThat(options.getRecordIdColumn(), is(RECORD_ID_COLUMN));
- assertThat(options.getMinimumWriteId(), is(WRITE_ID));
- assertThat(options.getMaximumWriteId(), is(WRITE_ID));
- }
-
- @Test
- public void testInsertDelegates() throws IOException {
- mutator.insert(RECORD);
- verify(mockRecordUpdater).insert(WRITE_ID, RECORD);
- }
-
- @Test
- public void testUpdateDelegates() throws IOException {
- mutator.update(RECORD);
- verify(mockRecordUpdater).update(WRITE_ID, RECORD);
- }
-
- @Test
- public void testDeleteDelegates() throws IOException {
- mutator.delete(RECORD);
- verify(mockRecordUpdater).delete(WRITE_ID, RECORD);
- }
-
- @Test
- public void testCloseDelegates() throws IOException {
- mutator.close();
- verify(mockRecordUpdater).close(false);
- }
-
- @Test
- public void testFlushDoesNothing() throws IOException {
- mutator.flush();
- verify(mockRecordUpdater, never()).flush();
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestRecordInspectorImpl.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestRecordInspectorImpl.java
deleted file mode 100644
index 55da312..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestRecordInspectorImpl.java
+++ /dev/null
@@ -1,48 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
-import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
-import org.apache.hive.hcatalog.streaming.mutate.MutableRecord;
-import org.junit.Test;
-
-public class TestRecordInspectorImpl {
-
- private static final int ROW_ID_COLUMN = 2;
-
- private RecordInspectorImpl inspector = new RecordInspectorImpl(ObjectInspectorFactory.getReflectionObjectInspector(
- MutableRecord.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA), ROW_ID_COLUMN);
-
- @Test
- public void testExtractRecordIdentifier() {
- RecordIdentifier recordIdentifier = new RecordIdentifier(10L, 4, 20L);
- MutableRecord record = new MutableRecord(1, "hello", recordIdentifier);
- assertThat(inspector.extractRecordIdentifier(record), is(recordIdentifier));
- }
-
- @Test(expected = IllegalArgumentException.class)
- public void testNotAStructObjectInspector() {
- new RecordInspectorImpl(PrimitiveObjectInspectorFactory.javaBooleanObjectInspector, 2);
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestSequenceValidator.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestSequenceValidator.java
deleted file mode 100644
index 2b3f79f..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestSequenceValidator.java
+++ /dev/null
@@ -1,108 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-
-import org.apache.hadoop.hive.ql.io.RecordIdentifier;
-import org.junit.Test;
-
-public class TestSequenceValidator {
-
- private static final int BUCKET_ID = 1;
-
- private SequenceValidator validator = new SequenceValidator();
-
- @Test
- public void testSingleInSequence() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- }
-
- @Test
- public void testRowIdInSequence() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 1)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 4)), is(true));
- }
-
- @Test
- public void testTxIdInSequence() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(1L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(4L, BUCKET_ID, 0)), is(true));
- }
-
- @Test
- public void testMixedInSequence() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 1)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(1L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(1L, BUCKET_ID, 1)), is(true));
- }
-
- @Test
- public void testNegativeTxId() {
- assertThat(validator.isInSequence(new RecordIdentifier(-1L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- }
-
- @Test
- public void testNegativeRowId() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, -1)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- }
-
- @Test
- public void testRowIdOutOfSequence() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 4)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 1)), is(false));
- }
-
- @Test
- public void testReset() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 4)), is(true));
- // New partition for example
- validator.reset();
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 1)), is(true));
- }
-
- @Test
- public void testTxIdOutOfSequence() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(4L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(1L, BUCKET_ID, 0)), is(false));
- }
-
- @Test
- public void testMixedOutOfSequence() {
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 0)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(1L, BUCKET_ID, 4)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(1L, BUCKET_ID, 0)), is(false));
- assertThat(validator.isInSequence(new RecordIdentifier(1L, BUCKET_ID, 5)), is(true));
- assertThat(validator.isInSequence(new RecordIdentifier(0L, BUCKET_ID, 6)), is(false));
- }
-
- @Test(expected = NullPointerException.class)
- public void testNullRecordIdentifier() {
- validator.isInSequence(null);
- }
-
-}
diff --git a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestWarehousePartitionHelper.java b/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestWarehousePartitionHelper.java
deleted file mode 100644
index 1011d34..0000000
--- a/hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/worker/TestWarehousePartitionHelper.java
+++ /dev/null
@@ -1,74 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.hive.hcatalog.streaming.mutate.worker;
-
-import static org.hamcrest.CoreMatchers.is;
-import static org.junit.Assert.assertThat;
-
-import java.io.IOException;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
-
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.conf.HiveConf;
-import org.junit.Test;
-
-public class TestWarehousePartitionHelper {
-
- private static final HiveConf CONFIGURATION = new HiveConf();
- private static final Path TABLE_PATH = new Path("table");
-
- private static final List<String> UNPARTITIONED_COLUMNS = Collections.emptyList();
- private static final List<String> UNPARTITIONED_VALUES = Collections.emptyList();
-
- private static final List<String> PARTITIONED_COLUMNS = Arrays.asList("A", "B");
- private static final List<String> PARTITIONED_VALUES = Arrays.asList("1", "2");
-
- private final PartitionHelper unpartitionedHelper;
- private final PartitionHelper partitionedHelper;
-
- public TestWarehousePartitionHelper() throws Exception {
- unpartitionedHelper = new WarehousePartitionHelper(CONFIGURATION, TABLE_PATH, UNPARTITIONED_COLUMNS);
- partitionedHelper = new WarehousePartitionHelper(CONFIGURATION, TABLE_PATH, PARTITIONED_COLUMNS);
- }
-
- @Test(expected = UnsupportedOperationException.class)
- public void createNotSupported() throws Exception {
- unpartitionedHelper.createPartitionIfNotExists(UNPARTITIONED_VALUES);
- }
-
- @Test
- public void getPathForUnpartitionedTable() throws Exception {
- Path path = unpartitionedHelper.getPathForPartition(UNPARTITIONED_VALUES);
- assertThat(path, is(TABLE_PATH));
- }
-
- @Test
- public void getPathForPartitionedTable() throws Exception {
- Path path = partitionedHelper.getPathForPartition(PARTITIONED_VALUES);
- assertThat(path, is(new Path(TABLE_PATH, "A=1/B=2")));
- }
-
- @Test
- public void closeSucceeds() throws IOException {
- partitionedHelper.close();
- unpartitionedHelper.close();
- }
-
-}
diff --git a/hcatalog/streaming/src/test/sit b/hcatalog/streaming/src/test/sit
deleted file mode 100644
index 38cc352..0000000
--- a/hcatalog/streaming/src/test/sit
+++ /dev/null
@@ -1,39 +0,0 @@
-#!/bin/sh
-
-if [ "${HADOOP_HOME}x" == "x" ]
- then
- echo "Please set HADOOP_HOME";
- exit 1
-fi
-
-if [ "${HIVE_HOME}x" == "x" ]
- then
- echo "Please set HIVE_HOME";
- exit 1
-fi
-
-if [ "${JAVA_HOME}x" == "x" ]
- then
- echo "Please set JAVA_HOME";
- exit 1
-fi
-
-for jar in ${HADOOP_HOME}/client/*.jar
- do
- CLASSPATH=${CLASSPATH}:$jar
-done
-
-for jar in ${HIVE_HOME}/lib/*.jar
- do
- CLASSPATH=${CLASSPATH}:$jar
-done
-
-for jar in ${HIVE_HOME}/hcatalog/share/hcatalog/*.jar
- do
- CLASSPATH=${CLASSPATH}:$jar
-done
-
-CLASSPATH=${CLASSPATH}:${HADOOP_HOME}/etc/hadoop
-CLASSPATH=${CLASSPATH}:${HIVE_HOME}/conf
-
-$JAVA_HOME/bin/java -cp ${CLASSPATH} org.apache.hive.hcatalog.streaming.StreamingIntegrationTester $@
diff --git a/itests/hive-unit/pom.xml b/itests/hive-unit/pom.xml
index 5264617..a6008f5 100644
--- a/itests/hive-unit/pom.xml
+++ b/itests/hive-unit/pom.xml
@@ -81,11 +81,6 @@
 <version>${project.version}</version>
- <groupId>org.apache.hive.hcatalog</groupId>
- <artifactId>hive-hcatalog-streaming</artifactId>
- <version>${project.version}</version>
- </dependency>
- <dependency>
 <groupId>org.apache.hive</groupId>
 <artifactId>hive-streaming</artifactId>
 <version>${project.version}</version>
diff --git a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java b/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
index 4a0e834..97530cd 100644
--- a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
+++ b/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
@@ -21,12 +21,10 @@
import static org.junit.Assert.assertNull;
import java.io.File;
-import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
-import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Random;
@@ -80,11 +78,7 @@
import org.apache.hadoop.hive.ql.session.SessionState;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hive.common.util.Retry;
-import org.apache.hive.common.util.RetryTestRunner;
import org.apache.hive.hcatalog.common.HCatUtil;
-import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
-import org.apache.hive.hcatalog.streaming.HiveEndPoint;
-import org.apache.hive.hcatalog.streaming.TransactionBatch;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StreamingConnection;
import org.apache.hive.streaming.StreamingException;
@@ -96,12 +90,9 @@
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;
-import org.junit.runner.RunWith;
-import org.junit.runners.Parameterized;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
-@RunWith(Parameterized.class)
public class TestCompactor {
private static final AtomicInteger salt = new AtomicInteger(new Random().nextInt());
private static final Logger LOG = LoggerFactory.getLogger(TestCompactor.class);
@@ -111,17 +102,6 @@
private final String BASIC_FILE_NAME = TEST_DATA_DIR + "/basic.input.data";
private final String TEST_WAREHOUSE_DIR = TEST_DATA_DIR + "/warehouse";
- @Parameterized.Parameters
- public static Collection
 <groupId>org.apache.hive.hcatalog</groupId>
- <artifactId>hive-hcatalog-streaming</artifactId>
- <version>${project.version}</version>
- </dependency>
- <dependency>
- <groupId>org.apache.hive.hcatalog</groupId>
 <artifactId>hive-hcatalog-core</artifactId>
 <version>${project.version}</version>
diff --git a/packaging/src/main/assembly/bin.xml b/packaging/src/main/assembly/bin.xml
index a9557cf..957b643 100644
--- a/packaging/src/main/assembly/bin.xml
+++ b/packaging/src/main/assembly/bin.xml
@@ -77,7 +77,6 @@
 <include>org.apache.hive.hcatalog:hive-hcatalog-core</include>
 <include>org.apache.hive.hcatalog:hive-hcatalog-pig-adapter</include>
 <include>org.apache.hive.hcatalog:hive-hcatalog-server-extensions</include>
- <include>org.apache.hive.hcatalog:hive-hcatalog-streaming</include>