A Bit About Spammers
A few weeks ago we re-enabled new account creation on the Open Hub. New account creation had been shut off for a number of months because it had become impossible to stave off the flood of spam accounts. We were desperate and needed a better solution. Back in February of this year, we talked about the problem we were having with spammers, the reality that we had built a Spam Farm, and some of the solutions we were considering. To the right is a chart of how the number of accounts grew each year. I would have loved to believe each of these accounts was someone with interest in Open Source Software (OSS) and the OSS community.
We have over 800,000 accounts. Back in February, when I wrote that post, we had 660,000. That’s 140,000 new accounts in essentially a month and a half. More on that in a moment.
We chose to use Twitter Digits as supplied by Digits.com as the technology to block spammers. This service has a high enough barrier to entry that it would probably not be easily defeated by spammy account bots and “marketing firms”. We initially thought that we could use the service after accounts were created to verify the user and thereby not be too intrusive. That really didn’t work. Look at the chart below. See that spike at the right? The line is the ratio of spam accounts to valid accounts. For every legitimate account, there were more than 7 spam accounts created. We shut that down quickly.
Hey, we should note that these are detected spam accounts. We choose to err on the side of letting an account that really looks spammy, but isn’t violating any Terms Of Usage, remain unblocked. There are probably a lot of accounts that should be flagged as spam but aren’t.
The way Twitter Digits works is by presenting the new account applicant with a Digits dialog box prompting for an SMS-capable number. Digits.com sends a four-digit code to that number. The applicant enters the code into a field we provide and we send that code to Digits, which tells us whether the code is valid. At that point we get an ID that we can use to identify the holder of that SMS number.
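The exchange described above can be sketched in a few lines of Python. This is only an illustration of the general shape of the flow, not the Digits API and not the Open Hub's code: the `FakeSmsVerifier` class, its `start` and `check` methods, the salted hash used to build the opaque ID, and the `register_account` helper are all hypothetical names made up for this sketch.

```python
import hashlib
import secrets


class FakeSmsVerifier:
    """Hypothetical stand-in for an SMS verification service (NOT the Digits API)."""

    def __init__(self):
        # session id -> (phone number, code); lives only inside the service
        self._sessions = {}

    def start(self, phone_number):
        """Called from the applicant's dialog; the website never sees phone_number."""
        session_id = secrets.token_hex(8)
        code = f"{secrets.randbelow(10000):04d}"  # four-digit code
        self._sessions[session_id] = (phone_number, code)
        # A real service would send the code by SMS. It is returned here
        # only so the sketch can be exercised without a phone.
        return session_id, code

    def check(self, session_id, code):
        """Called by the website. Returns an opaque ID on success, else None."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        phone_number, expected = entry
        if code != expected:
            return None
        del self._sessions[session_id]
        # Assumption: the opaque ID is modeled as a salted hash of the number.
        return hashlib.sha256(b"demo-salt:" + phone_number.encode()).hexdigest()


def register_account(verifier, session_id, submitted_code, login):
    """Create the account only after the code checks out (hypothetical helper)."""
    holder_id = verifier.check(session_id, submitted_code)
    if holder_id is None:
        return None  # verification failed, so no account is created
    # Only the opaque ID is stored; the phone number never reaches this function.
    return {"login": login, "verified_holder_id": holder_id}
```

Note that `register_account` only ever receives a session ID and a code, which is the point of the design: the site can tell that someone holds a verified number without learning the number itself.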
Please note that the Open Hub does not receive your phone number. The Open Hub can’t call you. We can’t get your phone number. Digits is a standalone service. You don’t need a Twitter account. You don’t need a Digits account. Twitter does not link your number to any other account information. (source: New York Times Blog).
All we get is a unique identifier, but we can’t trace that back to a phone number. Let’s be clear: The Open Hub does not get or store your phone number. (source: Me. I wrote the technical specifications and reviewed the code)
As mentioned, we have over 800,000 accounts. Just over 28,500 accounts have claimed OSS contributions. There could be that many accounts again that are interested in OSS, but haven’t claimed any contributions or haven’t made contributions, or are OSS consumers. That would give us 57,000 accounts that are “legitimate” accounts. Oh, heck. Let’s double that again just to be generous. Now we have almost 120,000 accounts. This still leaves 680,000 illegitimate accounts (spammers!) in our system, using our bandwidth, gumming up our analyses, and impacting the experience of those who seek to use the Open Hub for its intended purposes. That means for every one of those legitimate accounts, there are 5.6 accounts that are nothing but junk.
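For anyone who wants to check the back-of-the-envelope math, here is a small Python sketch using the rounded figures quoted above. The variable names are made up for the example, and the "double it, then double it again" steps are the generous guesses from the text, not measured values.

```python
# Rounded figures taken from the paragraph above.
total_accounts = 800_000
claimed_contributions = 28_500

# First guess: as many interested accounts again as have claimed contributions.
first_estimate = claimed_contributions * 2   # 57,000

# Second guess: double it once more to be generous.
generous_estimate = first_estimate * 2       # 114,000, "almost 120,000"

# The text rounds the generous estimate up to 120,000 before subtracting.
legitimate_rounded = 120_000
illegitimate = total_accounts - legitimate_rounded   # 680,000

# Junk accounts per legitimate account; works out to between 5.6 and 5.7.
junk_per_legitimate = illegitimate / legitimate_rounded
```

Even with the legitimate count rounded up in the spammers' favor, junk accounts outnumber real ones by more than five to one.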
Let’s talk about what happened when we opened up new account creation on August 10, 2015. At that time we let users sign up and then verify their account with Digits. In the next few weeks we had 4,077 new accounts created. We identified 3,807 of them, or 93.4%, as spam accounts.
We re-worked the flow of new account creation to ensure that the SMS verification with Digits is done first. Since then we’ve had 391 new accounts created. Of those, only 20, or 5.1%, have been identified as spam.
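The two percentages can be reproduced from the counts given above. This is just a quick sketch; `spam_rate` is a helper name invented for the example.

```python
def spam_rate(spam_accounts, total_accounts):
    """Return detected spam accounts as a percentage of all new accounts."""
    return 100 * spam_accounts / total_accounts


# Sign up first, verify afterwards (the flow opened on August 10, 2015).
verify_after = spam_rate(3_807, 4_077)

# Verify by SMS first, then create the account (the re-worked flow).
verify_first = spam_rate(20, 391)

print(f"verify after signup: {verify_after:.1f}% spam")
print(f"verify before signup: {verify_first:.1f}% spam")
```

Keep in mind that these are rates of detected spam, so the real numbers in both cases may be somewhat higher.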
Twitter Digits is currently successfully controlling new account creation and blocking the vast majority of empty spam accounts. But that does not solve the problem of all those accounts that are currently in the system. We need a way to get rid of all those worthless accounts, which is why we ask for verification using Digits from all Open Hub account holders who are looking to make edits on the Open Hub.
We are looking at expanding our account verification options to include something like a GitHub OAuth verification.
We really hope that existing users will verify their accounts with these new verification techniques because we are working on the next part of the Spammer Purge, which is to request re-verification of every account that has not been verified by SMS or OAuth. Accounts that have not been SMS or OAuth verified will be deactivated a few weeks after the request and then deleted after a few months. We understand that this may impact a few legitimate users, which is why we will wait months before deleting any account and why we will send email notifications at each step of the process.
This is only one part of the work that we’re doing at the Open Hub to improve the speed and reliability of our service. Other current work is to move off of our aging crawler infrastructure and update our Analytics Engine so that it is a standalone application with its own database. This will leave the database under the Ohloh-UI responsible only for serving the website. These plans will enable the Open Hub team to invent new analyses and bring in other elements of the open source landscape to support enriched comparisons and conversations about OSS.
The Open Hub Team is grateful to you, our community, for your support and patience as we address these important infrastructure elements. We’re also grateful to Black Duck for providing all the funding for our team, all our crawlers, our web servers, the IT support, and all other support costs that make it possible to provide the Open Hub as a completely free service to the OSS community.