CarbonData / CARBONDATA-3905

Presto queries fail when there are many segment files

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: presto-integration
    • Labels: None

    Description

      Test case 1

      Data is inserted with a streaming merge:

      df.writeStream.foreachBatch{ (batchDF: DataFrame, batchId: Long) => {
          ...    
          val cond = $"B.id".isin(df.select(col = "id").as[Int].collect: _*)
          target.as("A")
            .merge(df.as("B"), "A.id = B.id")
            .whenMatched(cond)
            .updateExpr(Map("name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
            .whenNotMatched(cond)
            .insertExpr(Map("id" -> "B.id", "name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
            .execute()    
           ...
      }}.outputMode("update").trigger(Trigger.ProcessingTime("3600 seconds")).start()
      

      After a few hours, a large number of segment files have been generated.
      When I then query the table through Presto, queries with a single filter condition succeed, but queries with multiple conditions fail.

      select name from test_table; // ok
      select name from test_table where name = 'joe'; // ok
      select name from test_table where name = 'joe' AND age > 25; // query failed
      select name from test_table where name = 'joe' AND age > 25 AND city = 'shenzhen'; // query failed

      I also ran a 'major' compaction on the segment files to reduce the number of segments, but the queries still fail.
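
For anyone reproducing this, the segment count and the compaction attempt described above can be checked from the Spark side. The sketch below only builds the CarbonData SQL statements involved (`SHOW SEGMENTS` and `ALTER TABLE ... COMPACT 'MAJOR'`). The object and method names are hypothetical, and running the statements assumes a `SparkSession` named `spark` with CarbonData enabled, which the report does not show.

```scala
// Hypothetical helper, not part of CarbonData: builds the SQL statements for
// listing the segments of a CarbonData table and running a major compaction.
object SegmentMaintenance {
  def segmentMaintenanceSql(table: String): Seq[String] = Seq(
    s"SHOW SEGMENTS FOR TABLE $table",      // segment list before compaction
    s"ALTER TABLE $table COMPACT 'MAJOR'",  // the major compaction tried in this report
    s"SHOW SEGMENTS FOR TABLE $table"       // segment list after compaction
  )
}

// Usage (assumption: `spark` is a SparkSession with CarbonData enabled):
//   SegmentMaintenance.segmentMaintenanceSql("test_table")
//     .foreach(stmt => spark.sql(stmt).show(false))
```

Comparing the two `SHOW SEGMENTS` outputs shows whether the compaction actually reduced the number of active segments before the Presto queries are retried.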

      Presto server logs:

      java.lang.IllegalArgumentException: Invalid position 0 in block with 0 positions
      at io.prestosql.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:62)
      at io.prestosql.spi.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:160)
      at io.prestosql.spi.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:154)
      at io.prestosql.spi.block.LazyBlock.isNull(LazyBlock.java:248)
      at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
      at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
      at io.prestosql.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115)
      at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:254)
      at io.prestosql.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:246)
      at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
      at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:278)
      at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
      at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
      at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
      at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
      at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
      at io.prestosql.operator.Driver.processInternal(Driver.java:379)
      at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
      at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
      at io.prestosql.operator.Driver.processFor(Driver.java:276)
      at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
      at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
      at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
      at io.prestosql.$gen.Presto_316____20200623_163219_1.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)

       

      Test case 2

      When I load the data directly:
      INSERT OVERWRITE TABLE test_table SELECT * FROM other_table

      I found only a few segment files (about 3), and all of the queries succeed:

      select name from test_table; // ok
      select name from test_table where name = 'joe'; // ok
      select name from test_table where name = 'joe' AND age > 25; // ok
      select name from test_table where name = 'joe' AND age > 25 AND city = 'shenzhen'; // ok

          People

            Assignee: Unassigned
            Reporter: hugess1f XiaoWen