Rooting out SEO spam from a Drupal site
The Problem.
There's a particularly frustrating type of spam known as SEO spam. It isn't the obvious porn or phishing scam that would be easier to detect. Instead, it's often subtle posts that look almost legitimate but include links to another site to boost that site's Google ranking. For years I've kept a close eye on comments and posts from new users because of posts like
"THANKS! This is the best but my friend really likes www.site.com"
Over that time I implemented the CAPTCHA module (a basic math challenge to cut down on spambots), then the reCAPTCHA module (more sophisticated; it uses the reCAPTCHA service), and have now moved to Mollom. Mollom works pretty well at catching spam posts, but I found that a good chunk of my spam seemed to come from humans, since they were answering the challenge questions correctly. Things were at a tolerable level of annoyance (given that this site generates a good amount of traffic) until the past few months.
Notice the huge spike in spam (roughly October 2010). That's when I realized something was wrong and I needed to look into things further. Over the following weeks and months I discovered a massive infestation of user profile spam, which I suspect is prevalent on many social-network-style sites that let users add signatures and profile information. On my site, users were creating dummy profiles and filling them with links to their spam sites. Every text area field, from the "bio" field to signatures, was filled with links. It was incredibly frustrating to see how many profiles were affected.
My user statistics clearly show a huge increase in the total number of site users (nearly 15%) from Sept 2010 to Jan 2011, when I realized what was happening. I can't say for sure whether these were purely dummy accounts or whether Mollom blocked them when they attempted to post. What I can say is that I had thousands of infested user profiles that had to be cleaned out.
The Solution.
All of the spam profiles have a few things in common: they include links in the profile fields, they belong to relatively new users, and those users don't participate the way real users do (i.e. posting valid comments, accruing points, etc.). So I wrote a view that selected users with "href" in their profile bio field and excluded valid users (based on account age and user points). This has proven to be a fantastic method of detection: I was able to find and block over 4,000 users.
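The view's logic boils down to a simple predicate. Here is a minimal Python sketch of it. It is not the actual view configuration. The `UserProfile` record, the `looks_like_profile_spam` function and both thresholds are hypothetical, and on a real site the same conditions live in Views filters on the user table rather than in code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class UserProfile:
    # Hypothetical stand-in for a Drupal user row plus its profile fields.
    uid: int
    created: datetime
    points: int
    profile_fields: dict = field(default_factory=dict)


# Hypothetical thresholds for "established" and "active" users; tune per site.
MIN_TRUSTED_AGE = timedelta(days=365)
MIN_TRUSTED_POINTS = 10


def has_link(text: str) -> bool:
    """Mirror the view's filter: the field text contains 'href'."""
    return "href" in text.lower()


def looks_like_profile_spam(user: UserProfile, now: datetime) -> bool:
    """Flag users with links in any profile field, unless they are old or active."""
    if not any(has_link(value) for value in user.profile_fields.values()):
        return False
    is_established = now - user.created >= MIN_TRUSTED_AGE
    is_active = user.points >= MIN_TRUSTED_POINTS
    # Exclude valid users: an old account OR real participation clears the flag.
    return not (is_established or is_active)


# Usage with made-up sample accounts.
now = datetime(2011, 1, 15)
users = [
    UserProfile(1, datetime(2008, 3, 1), 250, {"bio": '<a href="http://example.com">me</a>'}),
    UserProfile(2, datetime(2010, 11, 2), 0, {"bio": 'Great site! <a href="http://spam.example">cheap</a>'}),
    UserProfile(3, datetime(2010, 12, 5), 0, {"bio": "Just a new member."}),
]
flagged = [u.uid for u in users if looks_like_profile_spam(u, now)]
print(flagged)  # [2]
```

User 1 has a link but an old, active account, and user 3 has no link, so only user 2 is flagged.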
Now that I had a way to select users with spam profiles, I needed a way to block them. That's where Views Bulk Operations (VBO) comes in. It lets you take any view and change its row style to "Bulk Operations", which allows the module to operate on the results of that view. Since my view selected users with spam profiles, VBO was able to block those users instantly. Amazing. I thought I was going to have to weed through the user list by hand, hours and hours of tedious work, but thanks to VBO it turned out to be pretty quick.
Once that was out of the way, I realized I could use VBO to improve the user experience too. I created a view that selected hundreds of longtime active users and then promoted them to a new user role that bypasses the spam measures. This ensures that active community members aren't bothered by Mollom's CAPTCHA every time they post to the site. Again, VBO worked flawlessly and I was able to promote hundreds of users. Fantastic.
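The bulk-operation pattern itself is simple: take the rows a view returns and run one action over each of them. The Python sketch below only illustrates that idea and is not VBO's API. The `Account` record, `bulk_apply`, `block`, `promote_to` and the "trusted user" role name are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class Account:
    # Hypothetical stand-in for a Drupal user account.
    uid: int
    created: datetime
    points: int
    blocked: bool = False
    roles: set = field(default_factory=lambda: {"authenticated user"})


def bulk_apply(rows: Iterable[Account], action: Callable[[Account], None]) -> int:
    """Run one action over every row of a result set; return how many were processed."""
    count = 0
    for row in rows:
        action(row)
        count += 1
    return count


def block(account: Account) -> None:
    """Hypothetical 'block user' action."""
    account.blocked = True


def promote_to(role: str) -> Callable[[Account], None]:
    """Build a hypothetical 'add role' action for the given role name."""
    def action(account: Account) -> None:
        account.roles.add(role)
    return action


# Usage: the list comprehensions stand in for the two views' results.
accounts = [
    Account(1, datetime(2007, 5, 1), 500),
    Account(2, datetime(2010, 11, 20), 0),
    Account(3, datetime(2009, 2, 14), 120),
]
spam_rows = [a for a in accounts if a.uid == 2]
veteran_rows = [a for a in accounts if a.created < datetime(2010, 1, 1) and a.points >= 100]

blocked_count = bulk_apply(spam_rows, block)
promoted = bulk_apply(veteran_rows, promote_to("trusted user"))
print(blocked_count, promoted)  # 1 2
```

Keeping the selection (the view) separate from the action is what makes the same mechanism work for both blocking spammers and promoting veterans.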
The Future.
Moving forward I'd like to automate much more of this than I currently do. I want user accounts to upgrade automatically based on site participation. Since I already use the Voting API extensively, I'd like to look into integrating it with Rules and Actions to create tiers of users (and maybe tie that back to badges). For example, Joe creates an account and all he can do is vote. Once Joe has voted on X items, he's upgraded to an account that can comment. Once Joe's comments have been flagged as "useful" by other users X times, he's upgraded to another account level. I can see this tying back into upload quotas too: initial accounts can't upload, others can upload 1MB, others can upload 10MB, and so on.
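A tier scheme like this is easy to model before wiring it into Rules. This is a hypothetical Python sketch. The tier names, the thresholds standing in for "X", and `tier_for` are made up for illustration. The upload quotas follow the no upload, 1MB and 10MB example in the paragraph above.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class Tier:
    name: str
    can_comment: bool
    upload_quota_bytes: int


# Hypothetical tiers; quotas follow the no-upload / 1MB / 10MB idea.
NEW_MEMBER = Tier("new member", can_comment=False, upload_quota_bytes=0)
COMMENTER = Tier("commenter", can_comment=True, upload_quota_bytes=1 * MB)
TRUSTED = Tier("trusted", can_comment=True, upload_quota_bytes=10 * MB)

# Hypothetical values for the unspecified "X" thresholds.
VOTES_TO_COMMENT = 25
USEFUL_FLAGS_TO_TRUST = 10


def tier_for(votes_cast: int, useful_flags: int) -> Tier:
    """Return the highest tier whose participation requirements are met."""
    if votes_cast >= VOTES_TO_COMMENT and useful_flags >= USEFUL_FLAGS_TO_TRUST:
        return TRUSTED
    if votes_cast >= VOTES_TO_COMMENT:
        return COMMENTER
    return NEW_MEMBER


print(tier_for(3, 0).name)                   # new member
print(tier_for(40, 2).name)                  # commenter
print(tier_for(40, 12).upload_quota_bytes)   # 10485760
```

Checking the highest tier first keeps the rules readable as more levels are added.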
The Tools.
- Views module, plus a view that selects the questionable accounts. In my case I filtered by user creation date, activity (via the Voting API) and whether there were links in the various user profile and signature fields.
- Views Bulk Operations module, which makes it easy to operate on the results of your view.
- Mollom, which I'm using to moderate submitted content, though there are other options (like reCAPTCHA) too.
- An Acquia Network subscription, which is totally optional. I use it for their awesome hosted Solr search, but it also provides the site monitoring data I referenced in this post.
- Drupal. Of course none of this would be possible without the Drupal CMS, which is the backbone of my community site.