The current system supporting KDE Identity is severely outdated and hindering the upgrade of several other systems.
Among other things, this blocks an update to the 3.1 series of phpBB for the Forum and the transfer of repository hosting to Phabricator. It also blocks updates to our MediaWiki installations, and makes it too difficult to integrate Identity login support into pre-existing sites like the Dot or Bugzilla, leading to a poor user experience.
We are also unable to implement proper multi-factor login support with the current implementation of Identity.
The Current System is Comprised Of:
- Underlying datastore is an OpenLDAP directory.
- Web frontend is a custom application (codenamed Solena) written in PHP using the Yii framework.
It should be noted that OpenLDAP imposes horrible limitations on us, which the web application works around to a certain degree (for example, memberOf is not searchable, so the application populates a custom groupMember attribute).
Additionally, most applications' pre-built support for LDAP is broken, as they copy details like email address to their internal database on first login and don't sync changes made later on. This makes email address updates for users a pain and basically always requires Sysadmin intervention.
Issues of the old system which currently constrain us
- Account removal is hard, requiring significant manual intervention and effort (several hours' work in some instances)
- Account registration takes 30 seconds or more to complete, creating a poor user experience
- Groups don't scale effectively
- Anti-spam measures are too crude
What a replacement system needs to do
Provide a flexible way for us to have community members store profiles of themselves, where certain details about them have been validated.
- In particular we must always double-confirm any email address which is provided by a user
- We should be able to add custom profile fields, or groups of profile fields reasonably easily, and ideally at runtime.
- Some profile fields will be "multiple value" while others will be single value (you can have multiple email addresses but usually only have one name) so there should be a mechanism to mark this accordingly.
- Given that some people only want to log in to the Forum and Bugzilla, just about everything apart from email address and password should be optional (those accessing Phabricator, the Wikis, or any of our CMSes can be expected to provide more detail)
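The profile requirements above could be modelled roughly as follows. This is only a sketch: the names `FieldDefinition`, `FieldValue` and `Profile` are hypothetical, and a real implementation would persist field definitions in the datastore so they can be added at runtime.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldDefinition:
    """Describes one profile field; definitions could be created at runtime."""
    name: str
    multiple: bool = False  # True: several values allowed (e.g. email addresses)
    required: bool = False  # only email address and password should be mandatory


@dataclass
class FieldValue:
    value: str
    validated: bool = False  # e.g. an email address the user has confirmed


@dataclass
class Profile:
    definitions: dict[str, FieldDefinition]
    values: dict[str, list[FieldValue]] = field(default_factory=dict)

    def add(self, name: str, value: str) -> FieldValue:
        definition = self.definitions[name]  # KeyError for unknown fields
        existing = self.values.setdefault(name, [])
        if existing and not definition.multiple:
            raise ValueError(f"{name!r} accepts only a single value")
        entry = FieldValue(value)
        existing.append(entry)
        return entry


definitions = {
    "email": FieldDefinition("email", multiple=True, required=True),
    "name": FieldDefinition("name"),
}
profile = Profile(definitions)
profile.add("email", "a@example.org").validated = True  # confirmed address
profile.add("email", "b@example.org")                   # still unconfirmed
profile.add("name", "Example User")
```

Marking "multiple value" on the definition rather than on each value keeps the rule in one place, so a second name is rejected while a second email address is accepted.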
Shouldn't have a concept of usernames
- History tells us that users will forget them, or will be unhappy when their desired choice is not available.
- People should log in using their email address (either the primary or one of their secondary addresses) and password instead.
- Accounts should instead be uniquely identified across the new platform using something like a UUID (just a number is not a good idea for various reasons)
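A minimal sketch of the idea above, assuming an in-memory dictionary in place of the real datastore. The class `AccountDirectory` is hypothetical, and comparing addresses case-insensitively is an assumption of this sketch rather than a stated requirement.

```python
from __future__ import annotations

import uuid


class AccountDirectory:
    """In-memory stand-in for the real datastore (hypothetical)."""

    def __init__(self) -> None:
        self._account_by_email: dict[str, uuid.UUID] = {}

    @staticmethod
    def _key(email: str) -> str:
        # Assumption: addresses are compared case-insensitively.
        return email.strip().casefold()

    def register(self, primary_email: str) -> uuid.UUID:
        account_id = uuid.uuid4()  # random identifier, not a sequential number
        self.add_email(account_id, primary_email)
        return account_id

    def add_email(self, account_id: uuid.UUID, email: str) -> None:
        key = self._key(email)
        if key in self._account_by_email:
            raise ValueError("email address already in use")
        self._account_by_email[key] = account_id

    def find(self, email: str) -> uuid.UUID | None:
        """Resolve a primary or secondary address to its account, if any."""
        return self._account_by_email.get(self._key(email))
```

Because every address maps to the same UUID, a user can log in with any of their addresses while connected sites only ever see the stable identifier.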
On the subject of login:
- People should have the option of authenticating with their Google/GitHub/etc account if they prefer that. This should not be mandatory by any means though.
- 2FA should be supported, the most common kinds being TOTP (e.g. Google Authenticator or Authy) and YubiKey
- For those who enable 2FA we probably need some kind of recovery mechanism like pre-generated recovery codes or security questions (it is only a matter of time until someone's phone dies or YubiKey is lost). Enabling several kinds of 2FA at once (both TOTP and YubiKey) is something we may want to consider, though it may also belong in the too-hard basket.
- People who are members of certain groups should be considered to have "privileged" access and have to use 2FA when logging in (i.e. it's mandatory)
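To illustrate the 2FA points above, here is a sketch of TOTP as specified in RFC 6238 (HMAC-SHA1 variant), together with recovery codes and a mandatory-2FA check. The group name `sysadmins`, the recovery code format and the one-step drift window are assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
import struct
import time


def totp(secret: bytes, at: float | None = None, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) for the given Unix time."""
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation as in RFC 4226
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, code: str, at: float | None = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate clock drift."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + drift * step, step).encode(), code.encode())
        for drift in range(-window, window + 1)
    )


def generate_recovery_codes(count: int = 10) -> list[str]:
    """Pre-generated one-time codes for when the second factor is lost."""
    return [secrets.token_hex(5) for _ in range(count)]


PRIVILEGED_GROUPS = {"sysadmins"}  # hypothetical group name


def requires_2fa(groups: set[str]) -> bool:
    """2FA is mandatory for members of any privileged group."""
    return bool(groups & PRIVILEGED_GROUPS)
```

In a real deployment the recovery codes would be stored hashed and invalidated after use, much like passwords.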
Have groups which are highly scalable:
- Since the inception of KDE over 2,000 people have been granted a developer account, which means the 'developers' and 'disabled-developers' groups between them have more than 2,000 members
- We should also be able to delegate administration of a group to others (we can't at the moment, at least not easily)
- Users should be able to specify what details of their profile they want to share with a group or its admins
- This could be used for sprints: details like dietary requirements can be a profile field which is then shared with the sprint organiser, who is the group admin. After the sprint we just delete the group.
- It is also needed for the KDE e.V. membership database
- We should probably make this something which has a default defined at the group level as user customisation might cause too many issues here.
- Having the ability to give a reasonable description and embed information from elsewhere (group admins will only be trusted community members so letting them enter raw HTML is probably okay, at least for v1) would likely be all that's needed to provide the rest of the functionality which we lost when sprints.kde.org was shut down
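The delegated administration and field-sharing ideas above could look something like this. It is a sketch only: the `Group` class, its attribute names and the use of plain strings as account identifiers are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    admins: set[str] = field(default_factory=set)         # delegated administrators
    members: set[str] = field(default_factory=set)
    shared_fields: set[str] = field(default_factory=set)  # default set at group level

    def visible_profile(self, viewer: str, member: str,
                        profile: dict[str, str]) -> dict[str, str]:
        """Return only the fields this group shares, and only to its admins."""
        if viewer not in self.admins or member not in self.members:
            return {}
        return {k: v for k, v in profile.items() if k in self.shared_fields}


sprint = Group(
    "example-sprint",
    admins={"organiser"},
    members={"alice"},
    shared_fields={"name", "dietary_requirements"},
)
alice = {"name": "Alice", "dietary_requirements": "vegetarian", "phone": "555-0100"}
shared = sprint.visible_profile("organiser", "alice", alice)
```

Deleting the group after the sprint removes the only path through which the organiser could see those fields.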
Have support for means to be integrated with other systems
- The best way of doing this is probably using something which is already defined to a certain extent like OAuth 2
- Sites which are "connecting" with the new system should be able to specify what information (profile fields) they'd like about a user. If the user hasn't provided this, they should be prompted (required) to provide it.
- Even if a user decides to "disconnect" a site, a record of them having authorised that connection in the past should be retained as part of their profile, as we need it to remove someone's account (dropping the site to the bottom of the list with an option to "Reconnect" it should suffice for this purpose)
- The new system should be able to push profile change details out to sites users have connected to their profiles, so email address and name changes are reflected everywhere in a reasonably short space of time
- Ideally, it should also be possible to auto log-out a given user from a connected system.
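Two of the integration points above can be sketched in a few lines: working out which requested profile fields the user still has to provide, and pushing a signed profile change to a connected site. OAuth 2 itself does not define such a push, so the payload shape and the per-site shared secret here are purely hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def missing_fields(requested: list[str], profile: dict[str, str]) -> list[str]:
    """Fields a connecting site asked for that the user has not yet provided."""
    return [name for name in requested if not profile.get(name)]


def sign_change_notification(account_id: str, changes: dict[str, str],
                             site_secret: bytes) -> tuple[bytes, str]:
    """Build a profile-change payload and an HMAC-SHA256 signature for it."""
    body = json.dumps(
        {"account": account_id, "changes": changes}, sort_keys=True
    ).encode("utf-8")
    signature = hmac.new(site_secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_change_notification(body: bytes, signature: str, site_secret: bytes) -> bool:
    """Run by the connected site to check the notification is genuine."""
    expected = hmac.new(site_secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Signing the payload lets a connected site trust an email address or name change without having to call back to the identity system first.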
Provide a unified anti-spam mechanism for all sites which use it:
- All content which users try to post to websites should be submitted by those sites to the new system for examination
- It should then run various checks on it (URLs checked against domain blacklists like URIBL, regular expression checks as defined by an admin, similar content already posted in the past hour, etc) and respond to the site accordingly as to whether it thinks it is spam or not
- We should have the ability to apply modifiers based on group membership (so established contributors aren't hit by more aggressive rules which apply to those who have just registered an account)
- If submitted content is bad enough, or if the user hits enough flagged submissions in a given timeframe, it should be able to automatically suspend their account (requiring an admin to unlock them is okay here, as spammers can stay locked for eternity, and established contributors can come to us and explain how their account got hacked)
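The checks described above might be combined along these lines. This is a sketch under several assumptions: the blocked domain set stands in for a real URIBL lookup, and the weights, threshold, group modifier and suspension limits are invented for illustration.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")
BLOCKED_DOMAINS = {"spam.example"}  # stand-in for a URIBL-style lookup
ADMIN_PATTERNS = [re.compile(r"cheap\s+pills", re.IGNORECASE)]  # admin-defined
GROUP_MODIFIERS = {"developers": 0.25}  # hypothetical: trusted groups score lower
SPAM_THRESHOLD = 1.0


def host_of(url: str) -> str:
    try:
        return urlparse(url).hostname or ""
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return ""


def spam_score(content: str, groups: set[str], recent_posts: list[str]) -> float:
    score = 0.0
    score += sum(1.0 for url in URL_PATTERN.findall(content)
                 if host_of(url) in BLOCKED_DOMAINS)
    score += sum(0.5 for pattern in ADMIN_PATTERNS if pattern.search(content))
    if content in recent_posts:  # identical content posted recently
        score += 0.5
    # The most lenient modifier among the user's groups applies.
    modifier = min((GROUP_MODIFIERS.get(g, 1.0) for g in groups), default=1.0)
    return score * modifier


def is_spam(content: str, groups: set[str], recent_posts: list[str]) -> bool:
    return spam_score(content, groups, recent_posts) >= SPAM_THRESHOLD


def should_suspend(flag_times: list[float], now: float,
                   window: float = 3600.0, limit: int = 3) -> bool:
    """Suspend once enough submissions were flagged within the time window."""
    return sum(1 for t in flag_times if now - t <= window) >= limit
```

Keeping the group modifier as a multiplier means established contributors face the same rules, only with more headroom before a post is flagged.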
Development Concerns
Over the past few years, there have been some loose attempts at building out an OAuth 2 based solution. Aside from finding the personal time to do so, there are a few things that should be considered:
Staged Deployment:
Identity is one of the most important parts of the KDE infrastructure and as such, pressure to deploy it correctly is rather high. It should be possible to deploy across a few web applications and platforms without disrupting the existing system until full migration has been completed. While sync between them would be ideal, it is recognised that the existing system has such significant constraints that this may not be possible (allowing a one-off "migration" where users are able to log in to the new system with their Identity credentials would be a good start here).
Data Model and Storage:
As the notes below state, an RDBMS may be the most appropriate route for storing data. Some time should be taken to architect a fairly flexible data model. Experiments with NoSQL databases for this use case showed that the same care needs to be taken to build out an appropriate data model. To this end, NoSQL might not offer any advantages and could just complicate things.
As GDPR compliance recommends data encryption, some form of record encryption, especially for fields which do not need to be searched, would also be desirable if it is possible.
Some assets, such as avatars and ssh keys might be better suited to be stored in some object/document store rather than a database.
Other things to keep in mind
The community has historically been strong with both PHP and Python coding, and they're both stable and mature languages in terms of their bindings. Ruby and Node.js should be avoided as they break stuff too much.
With regard to the underlying datastore, Postgres is not an option. We use MySQL for our RDBMS duties almost exclusively, although some apps have used MongoDB as well (and it might be a better fit here given what we are trying to accomplish with user profiles).
Care should also be taken when handling data, as some users' details are entirely in CJK/Cyrillic/etc character sets (the current system doesn't handle that well: it has some support for the Western European ranges, but Eastern European or CJK support is a bit patchy, especially in the username generator system).
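As a small illustration of the character set concern, names can be stored in a normalised Unicode form without transliterating or stripping any script. The helper name below is hypothetical, and choosing NFC as the normal form is an assumption of this sketch.

```python
import unicodedata


def normalize_display_name(name: str) -> str:
    """NFC-normalise and trim a name, leaving non-Latin scripts untouched."""
    return unicodedata.normalize("NFC", name).strip()
```

Normalising on input means that two visually identical names compare equal regardless of how the client composed the characters.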