CarbonData / CARBONDATA-4279

Inserting data into a partitioned table results in a 'Marked for Delete' segment in Spark on EMR


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment:
      Release label: emr-5.24.1
      Hadoop distribution: Amazon 2.8.5
      Applications: Hue 4.4.0, Spark 2.4.5, JupyterHub 0.9.6

      Jar compiled with:
      apache-carbondata: 2.3.0-SNAPSHOT
      spark: 2.4.5
      hadoop: 2.8.3

    Description

      as described here

      After the commit https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7
      I have successfully created a table with partitions, but when I try to insert data, the job ends
      successfully and the segment is marked as "Marked for Delete".

      I am running:

      CREATE TABLE lior_carbon_tests.mark_for_del_bug(
      timestamp string,
      name string
      )
      STORED AS carbondata
      PARTITIONED BY (dt string, hr string)
      
      INSERT INTO lior_carbon_tests.mark_for_del_bug select '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'
      
      select * from lior_carbon_tests.mark_for_del_bug
      

      gives:

      +---------+----+---+---+
      |timestamp|name| dt| hr|
      +---------+----+---+---+
      +---------+----+---+---+
      

      And

      show segments for TABLE lior_carbon_tests.mark_for_del_bug
      

      gives:

       

      +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
      |ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
      +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
      |0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
      +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
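
      For anyone triaging similar output in a script, the SHOW SEGMENTS result above can be treated as plain text. The following is a minimal sketch, assuming the ASCII-table layout shown in this report; `parse_ascii_table` and `marked_for_delete` are hypothetical helper names, not part of CarbonData or Spark.

```python
def parse_ascii_table(text):
    """Parse a show()-style ASCII table into a list of dicts keyed by header."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        # Border lines such as +---+------+ carry no data, so skip them.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data-like line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def marked_for_delete(rows):
    """Return the IDs of segments whose Status is 'Marked for Delete'."""
    return [row["ID"] for row in rows if row.get("Status") == "Marked for Delete"]


# Output copied from this report.
SEGMENTS_OUTPUT = """
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
"""

rows = parse_ascii_table(SEGMENTS_OUTPUT)
print(marked_for_delete(rows))
```

      A check like this could be run after each INSERT to catch loads that report success but leave no usable segment.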
      

       
      I took a look at the folder structure in S3 and it seems fine.
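
      One way to make that folder check repeatable is to confirm that the expected Hive-style partition directory (`dt=.../hr=...`) holds at least one `.carbondata` data file. This is only a sketch: the object keys below are hypothetical placeholders, not the actual S3 listing for this table, and `partitions_with_data` is an illustrative helper rather than a CarbonData API.

```python
import re
from collections import defaultdict

# Matches keys ending in dt=<value>/hr=<value>/<file name>.
PARTITION_RE = re.compile(r"dt=(?P<dt>[^/]+)/hr=(?P<hr>[^/]+)/(?P<file>[^/]+)$")


def partitions_with_data(keys):
    """Map each (dt, hr) partition to the .carbondata files found under it."""
    found = defaultdict(list)
    for key in keys:
        match = PARTITION_RE.search(key)
        if match and match.group("file").endswith(".carbondata"):
            found[(match.group("dt"), match.group("hr"))].append(match.group("file"))
    return dict(found)


# Hypothetical keys for illustration only; real file names will differ.
example_keys = [
    "warehouse/mark_for_del_bug/Metadata/schema",
    "warehouse/mark_for_del_bug/dt=2021-07-07/hr=13/part-0-example.carbondata",
    "warehouse/mark_for_del_bug/dt=2021-07-07/hr=13/example.carbonindex",
]

print(partitions_with_data(example_keys))
```

      This only inspects the file layout; it says nothing about why the segment status ends up as "Marked for Delete".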

          People

            Assignee: Unassigned
            Reporter: bigicecream