[HIVE-17923] 'cluster by' should not be needed for a bucketed table - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Duplicate
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: None
Labels: None
Target Version/s: 3.0.0

Description

Given:

CREATE TABLE over10k_orc_bucketed(t tinyint,
           si smallint,
           i int,
           b bigint,
           f float,
           d double,
           bo boolean,
           s string,
           ts timestamp,
           `dec` decimal(4,2),
           bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;

insert into over10k_orc_bucketed select * from over10k

produces 1 data file (bucket 0). It should produce 4, based on the input data.

insert into over10k_orc_bucketed select * from over10k cluster by si

does the right thing.
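
One way to check how many data files each insert produced is to group the rows by Hive's `INPUT__FILE__NAME` virtual column. This is only a sketch for inspecting the result and is not part of the original report. It assumes the table is queried right after a single insert, so that each bucket maps to one file.

```sql
-- Count rows per underlying data file of the bucketed table.
-- Per the report, the plain insert yields a single file (bucket 0),
-- while the insert with 'cluster by si' should yield one file per
-- populated bucket.
SELECT INPUT__FILE__NAME AS data_file,
       count(*)          AS row_count
FROM over10k_orc_bucketed
GROUP BY INPUT__FILE__NAME;
```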

acid_vectorization_original.q has the full script (~~HIVE-17458~~)

Issue Links

blocks

HIVE-17458 VectorizedOrcAcidRowBatchReader doesn't handle 'original' files

Closed

is duplicated by

HIVE-18157 Vectorization : Insert in bucketed table is broken with vectorization

Closed

People

Assignee: Deepak Jaiswal

Reporter: Eugene Koifman

Votes: 0

Watchers: 2

Dates

Created: 27/Oct/17 15:58

Updated: 07/Dec/17 20:26

Resolved: 07/Dec/17 20:11
