  1. ManifoldCF
  2. CONNECTORS-1746

Add conditions for executing PostgreSQL's ANALYZE command to keep crawling from becoming extremely slow.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: ManifoldCF 2.25
    • Component/s: Web connector
    • Labels: None
    • Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database.

    Description

      Sometimes crawling stops processing documents for a while, and nothing is logged about long-running queries. Running the 'ANALYZE' command manually restores performance, which suggests that a bad query plan is causing the problem.

      Therefore, in addition to the existing configuration parameter 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', 'ANALYZE' should also be executed in the following situations:
      1. After a job starts, when the number of records in the table exceeds the number required for creating an execution plan.
      2. When crawling performance slows down, for example when the document processing rate drops below a specified threshold.
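
      The two conditions above can be sketched roughly as follows. This is only an illustration and not the attached DBInterfacePostgreSQL.java.patch: the class name AnalyzeTrigger, its thresholds, and the cooldown between ANALYZE runs are all assumptions made for the example.

```java
/**
 * Hypothetical helper (not ManifoldCF code) that decides when to run ANALYZE.
 * All names and thresholds here are assumptions made for illustration.
 */
final class AnalyzeTrigger {
    private final long minRowsForPlan;     // assumed row-count threshold (condition 1)
    private final double minDocsPerSecond; // assumed processing-rate threshold (condition 2)
    private final long cooldownMillis;     // assumed minimum gap between two ANALYZE runs

    private boolean rowThresholdFired = false;
    private boolean everAnalyzed = false;
    private long lastAnalyzeMillis = 0L;

    AnalyzeTrigger(long minRowsForPlan, double minDocsPerSecond, long cooldownMillis) {
        this.minRowsForPlan = minRowsForPlan;
        this.minDocsPerSecond = minDocsPerSecond;
        this.cooldownMillis = cooldownMillis;
    }

    /** Call when a job starts so that condition 1 can fire once for the new job. */
    void onJobStart() {
        rowThresholdFired = false;
    }

    /**
     * Returns true if ANALYZE should be run now. The caller is expected to
     * sample the rate only while a job is actively crawling.
     */
    boolean shouldAnalyze(long rowCount, double docsPerSecond, long nowMillis) {
        // Never run ANALYZE more often than the cooldown allows.
        if (everAnalyzed && nowMillis - lastAnalyzeMillis < cooldownMillis) {
            return false;
        }
        // Condition 1: the table has grown past the threshold since the job started.
        boolean rowsCrossed = !rowThresholdFired && rowCount > minRowsForPlan;
        // Condition 2: the document processing rate has dropped below the threshold.
        boolean tooSlow = docsPerSecond < minDocsPerSecond;
        if (!rowsCrossed && !tooSlow) {
            return false;
        }
        if (rowsCrossed) {
            rowThresholdFired = true;
        }
        everAnalyzed = true;
        lastAnalyzeMillis = nowMillis;
        return true;
    }

    /** Builds the SQL statement, rejecting anything that is not a plain identifier. */
    static String analyzeSql(String tableName) {
        if (tableName == null || !tableName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected table name: " + tableName);
        }
        return "ANALYZE " + tableName;
    }
}
```

      In this sketch the cooldown keeps a persistently slow crawl from triggering ANALYZE on every check. Whether the actual patch uses a similar guard is not stated in this issue.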

      Attachments

        1. DBInterfacePostgreSQL.java.patch
          4 kB
          Mingchun Zhao

        Activity

          People

            Assignee: kwright@metacarta.com Karl Wright
            Reporter: mingchun.zhao Mingchun Zhao
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: