Ambari / AMBARI-6702

Ambari detects RPM DB corruption


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.7.0
    • Component/s: ambari-web
    • Labels: None

    Description

      Users have described scenarios in which the RPM DB becomes corrupt, usually after stopping all services, rebooting all hosts (including the Ambari server host), and restarting all services.

      http://hortonworks.com/community/forums/topic/cant-restart-cluster-ambari-not-proving-useful/
      http://hortonworks.com/community/forums/topic/ambari-corrupts-rpmdb/

      • Problem: yum commands fail to run because the RPM database is corrupt.
      • Symptom: The Ambari agent log shows entries similar to the following, and running rpm -qa as root on the host fails with Berkeley DB errors:
        INFO 2014-04-24 05:30:11,051 Controller.py:186 - RegistrationCommand received - repeat agent registration
        ERROR 2014-04-24 05:33:22,669 PackagesAnalyzer.py:43 - Task timed out and will be killed
        INFO 2014-04-24 05:35:12,815 HostCheckReportFileHandler.py:43 - Host check report at /var/lib/ambari-agent/data/hostcheck.result
        INFO 2014-04-24 05:35:12,845 HostCheckReportFileHandler.py:104 - Removing old host check file at /var/lib/ambari-agent/data/hostcheck.result
        INFO 2014-04-24 05:35:12,845 HostCheckReportFileHandler.py:109 - Creating host check file at /var/lib/ambari-agent/data/hostcheck.result
        root@xhadoopm32p rpm# rpm -qa
        rpmdb: Thread/process 30282/xx failed: Thread died in Berkeley DB library
        error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
        error: cannot open Packages index using db3 - (-30974)
        error: cannot open Packages database in /var/lib/rpm
        rpmdb: Thread/process 30282/xx failed: Thread died in Berkeley DB library
        error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
        error: cannot open Packages database in /var/lib/rpm
        
      • Fix:
        Run the following as root on each affected host:
        rm /var/lib/rpm/__db*
        rpm --rebuilddb
        

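      This appears to be an underlying issue with yum (either a lock is not released, or multiple yum commands are run in parallel). The agent log above shows PackagesAnalyzer reporting "Task timed out and will be killed", which suggests that killing a slow yum run may leave the database in this state. To reduce how often this happens, the fix increases the time the agent's PackagesAnalyzer waits for the "yum list available" and "yum list installed" commands from 10 seconds to 20 seconds.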


People

    • Assignee: Alejandro Fernandez (afernandez)
    • Reporter: Alejandro Fernandez (afernandez)