CarbonData / CARBONDATA-2085

Query results differ between "load twice, then create datamap" and "load, create datamap, then load again"


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0
    • Component/s: core, spark-integration
    • Labels: None

    Description

      The query results differ between these two test cases:

      test case 1: load data, create a timeseries datamap, then load again, and query
      test case 2: load data twice, create the datamap, and query

      test("load data into mainTable after create timeseries datamap on table 1") {
        sql("drop table if exists mainTable")
        sql(
          """
            | CREATE TABLE mainTable(
            |   mytime timestamp,
            |   name string,
            |   age int)
            | STORED BY 'org.apache.carbondata.format'
          """.stripMargin)

        sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")

        sql(
          """
            | create datamap agg0 on table mainTable
            | using 'preaggregate'
            | DMPROPERTIES (
            |   'timeseries.eventTime'='mytime',
            |   'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')
            | as select mytime, sum(age)
            | from mainTable
            | group by mytime""".stripMargin)

        // the second load happens after the datamap was created, so it has to be
        // rolled up into the timeseries child tables as well
        sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
        val df = sql(
          """
            | select
            |   timeseries(mytime,'minute') as minuteLevel,
            |   sum(age) as sum
            | from mainTable
            | where timeseries(mytime,'minute')>='2016-02-23 01:01:00'
            | group by
            |   timeseries(mytime,'minute')
            | order by
            |   timeseries(mytime,'minute')
          """.stripMargin)

        // only for test, needs to be removed before merge
        df.show()
        sql("select * from maintable_agg0_minute").show(100)

        checkAnswer(df,
          Seq(Row(Timestamp.valueOf("2016-02-23 01:01:00"), 120),
            Row(Timestamp.valueOf("2016-02-23 01:02:00"), 280)))
      }

      test("load data into mainTable after create timeseries datamap on table 2") {
        sql("drop table if exists mainTable")
        sql(
          """
            | CREATE TABLE mainTable(
            |   mytime timestamp,
            |   name string,
            |   age int)
            | STORED BY 'org.apache.carbondata.format'
          """.stripMargin)

        // here both loads happen before the datamap is created
        sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
        sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")
        sql(
          """
            | create datamap agg0 on table mainTable
            | using 'preaggregate'
            | DMPROPERTIES (
            |   'timeseries.eventTime'='mytime',
            |   'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')
            | as select mytime, sum(age)
            | from mainTable
            | group by mytime""".stripMargin)

        val df = sql(
          """
            | select
            |   timeseries(mytime,'minute') as minuteLevel,
            |   sum(age) as sum
            | from mainTable
            | where timeseries(mytime,'minute')>='2016-02-23 01:01:00'
            | group by
            |   timeseries(mytime,'minute')
            | order by
            |   timeseries(mytime,'minute')
          """.stripMargin)

        // only for test, needs to be removed before merge
        df.show()
        sql("select * from maintable_agg0_minute").show(100)

        checkAnswer(df,
          Seq(Row(Timestamp.valueOf("2016-02-23 01:01:00"), 120),
            Row(Timestamp.valueOf("2016-02-23 01:02:00"), 280)))
      }
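      The snippets above are excerpts from a test suite and are not self-contained. A minimal set of
      imports they rely on is sketched below; the base class that provides sql(), checkAnswer() and
      resourcesPath is assumed to be the project's QueryTest-style test harness.

        // assumed imports for the test excerpts above
        import java.sql.Timestamp        // Timestamp.valueOf("2016-02-23 01:01:00")
        import org.apache.spark.sql.Row  // expected rows passed to checkAnswer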

      Both test cases should pass, but test case 1 fails.
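
      One way to narrow this down (a minimal diagnostic sketch, not part of the reported tests; it
      assumes the mainTable, the agg0 datamap and the child table name maintable_agg0_minute shown
      above) is to check whether a load issued after the datamap exists actually adds rows to the
      minute-level child table:

        // minimal sketch: run right after "create datamap agg0 ..." from test case 1
        val before = sql("select count(*) from maintable_agg0_minute").collect().head.getLong(0)

        // load into the main table while the datamap already exists; if the roll-up works,
        // the child table should gain rows for the new segment
        sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable")

        val after = sql("select count(*) from maintable_agg0_minute").collect().head.getLong(0)
        assert(after > before,
          s"load after datamap creation was not aggregated into maintable_agg0_minute ($before -> $after)")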

            People

              Assignee: Unassigned
              Reporter: Bo Xu (xubo245)
