Stop Junk Mail archive

One of my resolutions for 2020 was to relaunch my anti-junk mail website. I'm pretty sure that was on my list of resolutions for 2019 as well. Unfortunately, I haven't found the time to work on the project. As it doesn't look like I'll be enjoying much spare time this year either, I decided to create a minimal, read-only copy of the site.

wget to the rescue

The first step was to clone the Drupal install and create a minimal version of the website. There was a time when I was working pretty much full time on the site. It grew quite large, which is why it became too much to maintain. I didn't want to keep all the content. Much of it had become dated, and junk mail historians can always find the site on the Wayback Machine (there's also a copy on the UK Web Archive).

Stripping the site down to just the home page and the guide to stamping out junk mail was fairly straightforward. I just needed to make sure to remove all links to other pages, such as the old news pages – a single link to that section would pull the entire news section into the copy.
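A quick way to double-check for stray links is to search the HTML for them. The sketch below assumes the pages are available as files in a directory (for example, the static copy made with wget) and that links to the old section contain a recognisable fragment such as news. The function name and both arguments are placeholders, not something from the actual site.

```shell
# Hypothetical helper: list files under a directory that still link to a path fragment.
#   $1 = directory holding the HTML files (assumed)
#   $2 = fragment to look for inside href attributes, e.g. "news" (assumed)
find_links_to() {
    # -r: search recursively, -l: print only the names of matching files
    grep -rl "href=\"[^\"]*$2" "$1"
}
```

If the function prints nothing (and returns a non-zero status), no page links to that section any more.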

I used wget to create a read-only copy of the site. As the site was on my localhost, with nothing being downloaded from external domains, the following command worked just fine:

$ wget -mpck --html-extension http://localhost/stopjunkmail.org.uk

I probably didn't need the --html-extension option. The downside of using the option is that all the URLs now have the extension .html. That's bad for SEO and I'm pretty sure a simple directive in a .htaccess file would have prevented the need for the extension. Frankly, though, I don't care about SEO. The website is a minimal archive, and I'm not interested in visitor numbers. In fact, I'm keen to get fewer visitors – it helps keep my hosting costs low.
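For anyone not fluent in wget's short flags, here is the same command spelled out with long option names and wrapped in a small function. The function name mirror_site is made up for illustration. Note that --adjust-extension is the current name for --html-extension; recent wget versions still accept the old name but treat it as deprecated.

```shell
# Hypothetical wrapper: the mirror command above, with long option names.
mirror_site() {
    # --mirror            (-m) recursive download with timestamping
    # --page-requisites   (-p) also fetch the CSS, JavaScript and images each page needs
    # --continue          (-c) resume partially downloaded files
    # --convert-links     (-k) rewrite links so the copy works as static files
    # --adjust-extension  (-E) append .html where needed (formerly --html-extension)
    wget --mirror --page-requisites --continue --convert-links \
         --adjust-extension "$1"
}
```

Calling mirror_site http://localhost/stopjunkmail.org.uk should then behave like the one-liner above.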

To handle the inevitable flood of 404 errors I simply added a custom 404 page to the virtual host:

ErrorDocument 404 /404.html

Any request for a non-existent page is now answered with /404.html. On the page I apologise profusely for the error.
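To confirm the directive works, it helps to look at the status code rather than just the page. With a local path, Apache's ErrorDocument normally serves the error page while keeping the 404 status, so the check below should print 404 for a page that no longer exists. This is a minimal sketch: http_status is a made-up helper name, and any URL passed to it is an example.

```shell
# Hypothetical helper: print only the HTTP status code for a URL.
http_status() {
    # -s: no progress output, -o /dev/null: discard the body,
    # -w '%{http_code}': print the numeric status code after the transfer
    curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, running http_status against any old news URL on the live site should report 404 rather than 200.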

Switching hosts

To switch things over smoothly, I added the domain to a different VPS (the one that serves this website). I did run into one error: CSS and JavaScript files weren't loading. The files were there and they were returning the status code 200 ("OK"), but the site looked like stallman.org. The issue was the Content Security Policy. This threw me a bit – the policy is very strict, but as every file is served from the site itself I didn't think it would cause any issues. I've commented out the policy for now and will dive into the documentation the next time I can't get to sleep.
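A possible starting point for that investigation is to look at the exact policy the server sends, and compare it with the violations the browser's developer console reports. The sketch below only prints the header; show_csp is a made-up name, and it makes no assumption about which directive caused the problem here.

```shell
# Hypothetical helper: print the Content-Security-Policy header a URL returns.
show_csp() {
    # -s: quiet, -I: send a HEAD request and print the response headers only.
    # Headers end in CRLF, so strip the carriage returns before matching.
    curl -sI "$1" | tr -d '\r' | grep -i '^content-security-policy:'
}
```

If the function prints nothing, the server isn't sending a policy for that URL at all.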

I now need to keep an eye on the VPS. It's a cheap 'n' cheerful Hetzner VPS with just one CPU core and 2 GB of RAM. It would be great if it could cope, as the cost is less than €3 per month, but I'm expecting it will need more juice.

The future of Stop Junk Mail

The ultimate goal is to replace the archived website with a new site. I might use Pelican, as I'm fairly familiar with it now. Honestly though, this feels like a retirement project – and I'm still quite a way off from my retirement.