New Accounts are Back. But so are the spammers.

New account creation is once again available on the Open Hub.  You may have already been asked for your SMS-capable device when logging back into the Open Hub.

This marks a few milestones. Of course, re-enabling new account creation is a big one. We shut off new account creation back around April and have been responding to users who have requested accounts.  But we know that a modern community-oriented web site needs to let users sign up when they wish.  We promised that after working something out, we would re-enable new account creation.  What we worked out is to use Twitter Digits (discussed back in April in Update: New Account Creation, PURR, and Project Analyses) to validate users when signing up.  We have a few more ideas about working the sign-up flow and handling existing accounts.  More on that in a bit.

Another milestone is that we’ve been running the Open Hub on the new Ohloh-UI code upon which we have been working.  The new code has been in production for about a month and we’ve closed all critical and urgent issues that were discovered after we started handling traffic.  Most major issues are now closed and we’ve moved into only the low priority fixes.  When the database isn’t loaded with analyses, the new UI code runs about 20% faster than the old code base.  We’re really pleased with this work.  In addition to the slight performance lift, the new code base is just a pleasure to read and in which to work.  Plus, the entire team is now familiar with nearly every aspect of the new code. This will be a big help as we start building new features.

Back to new accounts for a moment.  We’ve gotten tweets and emails from some folks who will never use our site if we require them to enter an SMS number to verify their account.  We respect the difficulty this decision will cause some folks, such as those who don’t have an SMS number or who are blocked from accessing the digits.com service. We don’t know if and how we can help them right now.  We are in agony that some folks who want to use our site legitimately choose not to, or cannot, because of our decision to use Twitter Digits.

We really want everyone who wants to legitimately use our site to have an account.  The tricky part is how to block those users who have only ill intent for the use of the site.  Of our 757,896 accounts, the vast majority of them are spammers.  Ah, but how to tell?  How to get rid of them?  How to keep more from signing up?  Oh, we should mention that within mere seconds of new account creation being enabled, they were back.

We should first confirm that blocking these users is important.  We posit that it is.  Spammers can use 30% or more of our site resources.  This blocks legitimate users from accessing our site via our web pages and our API.  Spammers sometimes create empty projects so that they can have more links for their link farming. Then they direct their spam traffic to their spammy links to get someone to land on their “money site”.

So, we don’t want them to sign up.  Many web sites don’t deal with this because they don’t offer publicly searchable user profiles.  We feel this is an important aspect of the Open Hub: that open source members can claim their work and get aggregated analysis that is publicly available.  The best advice in the development community is to provide multiple levels of verification and then additional regular checks.  Hence the addition of Twitter Digits to replace the easily defeatable captcha (even Google’s new improved reCaptcha).

The next part is cleaning up existing accounts. Any account that isn’t verified via Twitter Digits will get an email requiring re-verification.  We’re going to have to send these out on a regular basis, maybe annually. Any account that isn’t re-verified will be flagged as a spammer and, after some period of time, will be deleted.
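
To make the clean-up flow above concrete, here is a minimal sketch in Python. It is an illustration only, not the Open Hub's actual code: the `Account` fields, the status names, and both time windows are hypothetical, since the plan only says "some period of time".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical windows; the plan above does not fix any durations.
REVERIFY_WINDOW = timedelta(days=30)  # time allowed to respond to the email
DELETE_AFTER = timedelta(days=90)     # extra time before a flagged account is deleted


@dataclass
class Account:
    name: str
    verified: bool                      # True once verified via Twitter Digits
    notified_on: Optional[date] = None  # date the re-verification email was sent


def account_status(account: Account, today: date) -> str:
    """Return 'ok', 'needs_email', 'pending', 'flagged', or 'delete'."""
    if account.verified:
        return "ok"
    if account.notified_on is None:
        # Unverified and not yet contacted: send the re-verification email.
        return "needs_email"
    elapsed = today - account.notified_on
    if elapsed <= REVERIFY_WINDOW:
        return "pending"   # still inside the response window
    if elapsed <= REVERIFY_WINDOW + DELETE_AFTER:
        return "flagged"   # treated as a spammer, but not yet deleted
    return "delete"
```

Keeping "flagged" and "delete" as separate states leaves room for an account holder to speak up before anything is removed.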

We imagine this will also ruffle some feathers.  We’re sorry about that.  We’re also thinking about using external services such as StackExchange, GitHub and LinkedIn OAuth to verify an account for those users who can’t or don’t want to use Twitter Digits.  But there will have to be something else because you, the valued users of the Open Hub site, deserve to have us, the developers, focus on keeping your project analyses up to date, making it possible for you to get recognition of everything you do in the open source community, discover and discuss great up-and-coming open source projects, and have new ways of looking at open source software invented for you.

About Peter Degen-Portnoy

Mars-One Round 3 Candidate. Engineer on the Open Hub development team at Black Duck Software. Family man, athlete, inventor
  • Kaz Nishimura

    I saw several topics on your forum stating code locations could not be added. Is it related to the new code?

  • Quattro Bajeena

    The project metadata hasn’t been updated properly in a long time. Look at Chromium. It’s one of the projects in your frontpage and it says this: “Analyzed about 1 month ago. based on code collected 3 months ago.”. I go to the project page and the last commit was 31 minutes ago. The FAQ claims turnaround of 24 hours for the crawler. This is quite off. Like two orders of magnitude off.

    • Hi Quattro;

      We cache the front page for 24 hours, so a newly updated analysis won’t show up there for about a day. When we get a chance, we can bust the front page cache when any of the projects there have an updated status. I’ll put in a ticket for that.

      The database runs on SSDs, on a 32 AMD processor server with 196 GB RAM, and still struggles to keep 681,000 repositories up to date while serving all our web traffic. We’re working aggressively to address the architectural shortcomings that have developed as the Open Hub has grown tremendously over the past 5 years since Black Duck obtained the site and promoted it.

      Thanks so much for your support. We’re working hard to make the site super responsive for all queries; this overhaul should take us most of this year.

  • Matthijs Kooijman

    I suspect that this is also why you need to verify your phone number to actually make changes to your account? I just tried to claim a few contributions, which then told me I had to verify my phone number first. Frankly, I was slightly pissed off about this, I wasn’t planning on sharing my phone number. Now, reading this, it makes more sense if it is just to ward off spammers.

    Perhaps it would be good to add a link to this post or some other page that explains the reasons to the message that tells you the account needs a phone number? That might help people understand, instead of feeling blackmailed into submitting a phone number? 🙂

  • Regular account re-verification may be annoying. Is there any reason to do this for all accounts? Also, could the verification be done by some form of web-of-trust system?

    • I agree; we hope to tone things down very quickly after an initial clean up. Maybe we should consider doing just the initial clean up and then put the re-verification on a manual switch in case we need it again in the future.

  • Maurice van der Pot

    I really hope you do add authentication through GitHub, because I would love to use the site to register my projects, but not if I have to provide my phone number.

    If you guys reach any decisions (even if it is to not support it), will you blog about it?

    • Hi Maurice;

      Yes! We’ve bandied about using GitHub, LinkedIn, and/or StackExchange as external authorizers. The issue with any external authorizer is whether it is sufficient to prevent spammers.

      One note about the phone verification: it seems to be very successful. The number of new accounts being created since we adjusted the flow has dropped down to reasonable amounts, with only one or two spammers getting through. This is well within our ability to monitor and address.

      I also want to stress that we *do not* get your phone number. Digits.com provides the service, and you do not need either a twitter or digits.com account to use the service. All we get is an identifier that we cannot use in any other way.

    • I second that, please consider adding alternative options (GitHub would be an obvious choice) that do not involve phone numbers.

  • Are you aware that the third-party service you rely upon, digits/com, is considered to have a questionable reputation due to “malvertising”? My browser actually blocks browsing to it (related to Disconnect/me).

    • Hi Christoph;

      No, we have not heard of any such allegations. Digits.com is a service provided by Twitter. I would be interested in seeing any references you could provide discussing digits.com and malvertising.

      That said, we have seen that some tracking cookie blockers do block digits.com by default, but that seems more algorithmic than intentional.

      • Thanks for the reply. It seems the mentioned block was based on black lists maintained by Disconnect/me. Admittedly the block may be more related to Digits’ other service, the mobile graph, which is indeed very questionable regarding privacy concerns.

      • FWIW it seems Disconnect/me has changed ownership lately and is much less open since – and thus less relevant. You can probably ignore it safely.

  • Citing from the docs of Digits: “From your web server, over SSL, you can use this response to securely request the userID, phone number, and oAuth tokens of the Digits user.” This seems contrary to your statement that you do not get our phone numbers. Maybe you do not actually query them, but technically you could do that any time with no way for us to prevent this.
    Can you clarify how you enforce technically that you indeed can not get the phone numbers?

    • I promise you, Christoph, as the team lead of the Open Hub team, that we do not get your phone number, only a unique identifier that we use to ensure that users do not make multiple accounts using the same SMS number.

  • Bla

    Why do you need a phone number when I want to add one of my projects? Especially a project where you already identified me as being a contributor. And no, I do not believe any claims that the majority of spammers have made significant contributions to multiple projects.

    • Hi Bla;

      Thanks for expressing your concerns. I started writing a response to your concerns to try and express all the thinking and data that went into our choice to use Digits and express that the Open Hub *does not* get your phone number.

      It became, ah, lengthy, so I converted it to a blog post with pictures and everything: http://blog.openhub.net/2015/09/why-do-we-ask-for-your-phone-number/

      I apologize if we said anything that led you to believe that we feel the spam accounts have made any contributions to OSS projects. They just get in the way of our doing the analysis and reporting that is the purpose of the Open Hub.

      Please feel free to contact me at info@openhub.net if you’d like to continue a conversation privately.

  • Timothy Pearson

    Well, I can see how this new verification was so successful in cutting the number of spam users, since it has also blocked abnormally high numbers of legitimate users (myself included). While I have several landlines that could be used for verification (home, office, etc.) you chose a service locked to SMS only, which means I have to wait for SMS support to be added by our telephony provider before I can use OpenHub. This, combined with the constant update issues, is enough to make me both avoid the service and recommend that the other FOSS developers I know also avoid the service as it presents an incorrect view of many projects, with no easy way to let you know there is a problem.

    Interestingly I had to use Google phone verification to log in to this comment board. It worked just fine, no SMS required. Perhaps you should look into this type of service instead?

    I originally tried to log in to report the TDE project not updating in over 5 months(!), which presents a highly inaccurate view of this large project. It seems there is yet another stuck enlistment: https://www.openhub.net/p/tde/enlistments?page=8 , please update it, allow me to communicate on the site, or remove the TDE project entirely from your site.

    Thank you,

    • Hi Timothy,

      Thank you for taking the time to share your experiences with us.

      We are working to add alternative forms of user authentication because we’ve heard various reports of similar difficulties with Twitter Digits. Some folks have had more success after creating an account at digits.com and verifying that there is a successful pathway using their mobile number.

      I see that the reason for the delayed analysis on TDE is that one of the repositories failed to update a few months ago. I’ve rescheduled the FetchJob and will monitor it.

      • Timothy Pearson

        Thank you for taking the time to respond to my somewhat frustrated post. I really like the idea behind OpenHub, and am more frustrated at the spammers that are ruining it for everyone than anything else.

        What happens to my account while I am locked out? Will it sit there until a non-SMS method of authentication is available, or will I end up having to completely recreate it at some point in the future?

        Would it be possible to just automatically authenticate accounts that are created with the same Email address as FOSS developers with thousands of existing commits (such as myself)? I would think the spammers would have a hard time establishing that kind of history, although it might end up being too much strain on your analytics.

        • You raise a good point — that we automatically validate users who have claimed commits. There are a small number of spam accounts that do so, but they are reasonably easy to track down.

          Your account will sit tight while we wire up an alternative OAuth mechanism. The first, which is being tested now, is to use your GitHub login to validate your Open Hub account. That should solve a major pain point for lots of folks.

          After this is in place, we will begin a long, multi-month process of notifying accounts that have not yet been validated and asking for validation, erring on the side of giving folks plenty of time to let us know if they want to keep their account.

          I’d love to really know that our users are real people who intend to use the site legitimately while walking the fine line of not being too intrusive or draconian in our re-validation methods. Clearly, we aren’t yet at this point but are continuing to work on it.

          Thanks again for your time and thoughts.

          • Timothy Pearson

            The TDE project is still not updating. Also, since I am locked out I am not able to update the enlistments as I normally would, therefore the project data being presented is growing more and more out of date and incorrect.

            Can you add this new enlistment to that project, and check into why the project status is not updating?
            http://scm.trinitydesktop.org/scm/git/kooldock

            Thanks!

          • Hi Timothy;

            The last update failed a few days ago due to an internal error that has since been fixed. I’ve rescheduled the Analyze Job.

            I went to the referred new enlistment site, but continuously get an error accessing the site. isup.me is currently reporting that the site isn’t available.

          • Timothy Pearson

            I see that the TDE project has finally updated; thank you for nudging that.

            The TDE servers were down over the weekend for maintenance. The provided URL should be functional again; can you please add it to the enlistments?

            Thanks!