Drupal revision removal and database cleanup by brute force

Type: 
Code
Date: 
February 5, 2010

We had a website with >500,000 old revisions in their database. There is a nice module for knocking out small numbers of revisions called http://drupal.org/project/revision_deletion. But, to quickly remove a large set we wrote this script.

The general idea is to remove all revisions older than a certain date and then do the same for each CCK table. No matter what date limit you pick the current revision is not deleted.

WARNING: it makes assumptions about your table naming conventions -- use only on a backup

function remove_revisions() {
  
  // Set limit for revision age
  $days_ago = 30;
  
  // delete all revisions from node_revisions
  $limit = time() - (60*60*24*$days_ago);
  $sql = "DELETE FROM node_revisions WHERE timestamp < $limit AND vid NOT IN (SELECT vid FROM node)";
  db_query($sql);
  
  $rows = db_affected_rows();
  drupal_set_message("$rows revisions deleted -- node_revisions");
  
  $table_types = array('content_field_%%', 'content_type_%%');
  
  // remove orphaned data from CCK content tables
  foreach($table_types as $table_type) {
    $result= db_query("SHOW TABLES LIKE '$table_type'");
    while ($row = db_fetch_array($result)) {
      $table = current($row);
      $sql = "DELETE FROM $table WHERE vid NOT IN (SELECT vid FROM node_revisions)";
      db_query($sql);
      $rows = db_affected_rows();
      drupal_set_message("$rows revisions deleted -- $table");
    }
  }
}