Drupal revision removal and database cleanup by brute force

We had a website with >500,000 old revisions in their database. There is a nice module for knocking out small numbers of revisions called http://drupal.org/project/revision_deletion. But, to quickly remove a large set we wrote this script.

The general idea is to remove all revisions older than a certain date and then do the same for each CCK table. No matter what date limit you pick the current revision is not deleted.

WARNING: it makes assumptions about your table naming conventions -- use only on a backup

<?php
// Note - this is for Drupal 6.x
function remove_revisions() {
 
 
// Set limit for revision age
 
$days_ago = 30;
 
 
// delete all revisions from node_revisions
 
$limit = time() - (60*60*24*$days_ago);
 
$sql = "DELETE FROM node_revisions WHERE timestamp < $limit AND vid NOT IN (SELECT vid FROM node)";
 
db_query($sql);
 
 
$rows = db_affected_rows();
 
drupal_set_message("$rows revisions deleted -- node_revisions");
 
 
$table_types = array('content_field_%%', 'content_type_%%');
 
 
// remove orphaned data from CCK content tables
 
foreach($table_types as $table_type) {
   
$result= db_query("SHOW TABLES LIKE '$table_type'");
    while (
$row = db_fetch_array($result)) {
     
$table = current($row);
     
$sql = "DELETE FROM $table WHERE vid NOT IN (SELECT vid FROM node_revisions)";
     
db_query($sql);
     
$rows = db_affected_rows();
     
drupal_set_message("$rows revisions deleted -- $table");
    }
  }
}
?>