CarbonData / CARBONDATA-1824

Carbon 1.3.0 - Spark 2.2 - Residual segment files left over when a load failure happens


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: data-load
    • Environment: Test - 3 node ant cluster

    Description

      Steps (from Beeline):
      1. Create a table with batch sort as the sort scope and keep the block size small.
      2. Run a Load/Insert/Compaction on the table.
      3. Bring down the Thrift server while CarbonData is being written to the segment.
      4. Run SHOW SEGMENTS on the table.

      Expected: Residual segments should not be shown.
      Actual: The segment intended for the load is shown as Marked for Delete and is not removed by CLEAN FILES. The table itself is otherwise unaffected.

      Query:
      create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='1','sort_scope'='BATCH_SORT','batch_sort_size_inmb'='5000');

      load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');

      0: jdbc:hive2://10.18.98.34:23040> select count(*) from t_carbn0161;
      +----------+
      | count(1) |
      +----------+
      | 0        |
      +----------+
      1 row selected (13.011 seconds)
      0: jdbc:hive2://10.18.98.34:23040> show segments for table lineitem1;
      +-------------------+-------------------+-------------------------+-------------------------+-----------+-------------+
      | SegmentSequenceId | Status            | Load Start Time         | Load End Time           | Merged To | File Format |
      +-------------------+-------------------+-------------------------+-------------------------+-----------+-------------+
      | 1                 | Marked for Delete | 2017-11-28 19:14:46.265 | 2017-11-28 19:15:28.396 | NA        | COLUMNAR_V3 |
      | 0                 | Marked for Delete | 2017-11-28 19:12:58.269 | 2017-11-28 19:13:37.26  | NA        | COLUMNAR_V3 |
      +-------------------+-------------------+-------------------------+-------------------------+-----------+-------------+
      0: jdbc:hive2://10.18.98.34:23040> clean files for table t_carbn0161;
      +--------+
      | Result |
      +--------+
      +--------+
      No rows selected (7.473 seconds)
      0: jdbc:hive2://10.18.98.34:23040> show segments for table lineitem1;
      +-------------------+-------------------+-------------------------+-------------------------+-----------+-------------+
      | SegmentSequenceId | Status            | Load Start Time         | Load End Time           | Merged To | File Format |
      +-------------------+-------------------+-------------------------+-------------------------+-----------+-------------+
      | 1                 | Marked for Delete | 2017-11-28 19:14:46.265 | 2017-11-28 19:15:28.396 | NA        | COLUMNAR_V3 |
      | 0                 | Marked for Delete | 2017-11-28 19:12:58.269 | 2017-11-28 19:13:37.26  | NA        | COLUMNAR_V3 |
      +-------------------+-------------------+-------------------------+-------------------------+-----------+-------------+
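
The manual comparison of the two SHOW SEGMENTS listings could be automated when re-testing the fix. The following is only a minimal sketch, not part of CarbonData: the helper names `parse_segments` and `residual_segments` are hypothetical, and it assumes the Beeline output has been captured as plain text before and after CLEAN FILES.

```python
import re

# Matches one data row of SHOW SEGMENTS text output: segment id, status,
# then the load start timestamp. Assumes rows look like the ones in this
# report; "|" column delimiters are replaced by spaces before matching.
ROW = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}(?:\.\d+)?\s"
)


def parse_segments(output):
    """Return {segment_id: status} parsed from captured SHOW SEGMENTS text."""
    segments = {}
    for line in output.splitlines():
        match = ROW.match(line.replace("|", " "))
        if match:
            segments[int(match.group(1))] = match.group(2).strip()
    return segments


def residual_segments(before_clean, after_clean):
    """Ids of segments marked for delete before CLEAN FILES that are still listed after it."""
    before = parse_segments(before_clean)
    after = parse_segments(after_clean)
    return sorted(
        seg for seg, status in before.items()
        if status == "Marked for Delete" and seg in after
    )


# Rows copied from the listings in this report (identical before and after).
listing = """
1 Marked for Delete 2017-11-28 19:14:46.265 2017-11-28 19:15:28.396 NA COLUMNAR_V3
0 Marked for Delete 2017-11-28 19:12:58.269 2017-11-28 19:13:37.26 NA COLUMNAR_V3
"""
print(residual_segments(listing, listing))  # prints [0, 1]
```

An empty list would correspond to the expected behaviour; a non-empty list reproduces the symptom described here.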

    People

      Assignee: dhatchayani
      Reporter: Ramakrishna S