Details
Type: Bug
Status: Open
Priority: Blocker
Resolution: Unresolved
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: None
Labels: None
Environment:
Release label: emr-5.24.1
Hadoop distribution: Amazon 2.8.5
Applications: Hue 4.4.0, Spark 2.4.5, JupyterHub 0.9.6
Jar compiled with:
apache-carbondata: 2.3.0-SNAPSHOT
spark: 2.4.5
hadoop: 2.8.3
Description
as described here
After the commit https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7
I can successfully create a partitioned table, but when I try to insert data, the job ends with success
while the segment is marked as "Marked for Delete".
I am running:
CREATE TABLE lior_carbon_tests.mark_for_del_bug( timestamp string, name string ) STORED AS carbondata PARTITIONED BY (dt string, hr string)
INSERT INTO lior_carbon_tests.mark_for_del_bug select '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'
select * from lior_carbon_tests.mark_for_del_bug
gives:
+---------+----+---+---+
|timestamp|name| dt| hr|
+---------+----+---+---+
+---------+----+---+---+
And
show segments for TABLE lior_carbon_tests.mark_for_del_bug
gives
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status |Load Start Time |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0 |Marked for Delete|2021-09-02 15:24:21.022|11.798S |NA |NA |NA |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
I took a look at the folder structure in S3, and it seems fine.