Replacement of KDE Identity System
Open, Needs Triage · Public

Description

The current system supporting KDE Identity is severely outdated and hindering the upgrade of several other systems.

Among the things this blocks are an update to the 3.1 series of phpBB for the Forum and the transfer of repository hosting support to Phabricator. It also blocks updates to our Mediawiki installations, and makes it too difficult to integrate Identity login support into pre-existing sites like the Dot or Bugzilla, leading to a poor user experience.

We are also unable to implement proper multi-factor login support with the current implementation of Identity.

The Current System is Comprised Of:

  • Underlying datastore is an OpenLDAP directory.
  • Web frontend is a custom application (codenamed Solena) written in PHP using the Yii framework.

It should be noted that OpenLDAP imposes horrible limitations on us, which the web application works around to a certain degree (for example, memberOf is not searchable, so the application populates a custom groupMember attribute).

Additionally, most applications' pre-built support for LDAP is broken, as they copy details like email address to their internal database on first login and don't sync changes made later on. This makes email address updates for users a pain and basically always requires Sysadmin intervention.

Issues of the old system which currently constrain us

  • Account removal is hard, requiring significant manual intervention and effort (several hours' work in some instances)
  • Account registration takes 30 seconds or more to complete, creating a poor user experience
  • Groups don't scale effectively
  • Anti-spam measures are too crude

What a replacement system needs to do

Provide a flexible way for us to have community members store profiles of themselves, where certain details about them have been validated.

  • In particular we must always double-confirm any email address which is provided by a user
  • We should be able to add custom profile fields, or groups of profile fields reasonably easily, and ideally at runtime.
  • Some profile fields will be "multiple value" while others will be single value (you can have multiple email addresses but usually only have one name) so there should be a mechanism to mark this accordingly.
  • Given that some people only want to login to the Forum and Bugzilla, just about everything apart from email address and password should be optional (those accessing Phabricator, the Wikis, any of our CMSes can be expected to provide more detail)
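
The profile field requirements above could be modelled roughly as follows. This is only a sketch: the `FieldDef` and `Profile` names and the field keys are hypothetical and do not come from any existing KDE code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldDef:
    """Definition of one profile field (hypothetical structure)."""
    key: str
    multi_value: bool = False   # e.g. email addresses
    required: bool = False      # only email and password would be required


@dataclass
class Profile:
    values: dict = field(default_factory=dict)

    def set_value(self, fdef: FieldDef, value: str) -> None:
        if fdef.multi_value:
            # Multi-value fields accumulate distinct values in order.
            current = self.values.setdefault(fdef.key, [])
            if value not in current:
                current.append(value)
        else:
            # Single-value fields are simply overwritten.
            self.values[fdef.key] = value


# Field definitions are plain data, so new ones can be added at runtime.
EMAIL = FieldDef("email", multi_value=True, required=True)
NAME = FieldDef("name")

profile = Profile()
profile.set_value(EMAIL, "a@example.org")
profile.set_value(EMAIL, "b@example.org")
profile.set_value(NAME, "Example User")
```

Keeping the definitions as data rather than as database columns is one way to allow new fields without a schema change; whether that is worth the extra complexity is an open question.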

Shouldn't have a concept of usernames

  • History tells us that users will forget them, or will be unhappy when they cannot have their desired choice.
  • People should login using their email address (either the primary or one of their secondary addresses) and password instead.
  • Accounts should instead be uniquely identified across the new platform using something like a UUID (just a number is not a good idea for various reasons)
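
A minimal sketch of what username-free accounts might look like, assuming a random (version 4) UUID as the identifier and a simplified case-insensitive email match. The `Account` class and `find_by_email` helper are hypothetical names used for illustration only.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Account:
    primary_email: str
    secondary_emails: list = field(default_factory=list)
    # Random UUID rather than a sequential number (an assumption of this
    # sketch: one reason is that it does not reveal registration order).
    account_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def emails(self) -> list:
        return [self.primary_email, *self.secondary_emails]


def find_by_email(accounts, email):
    """Return the account owning `email`, or None.

    Comparison is case-insensitive, which is a simplification.
    """
    wanted = email.strip().lower()
    for account in accounts:
        if wanted in (e.lower() for e in account.emails()):
            return account
    return None
```

Login would then look up the account by whichever confirmed address the user typed, and every other system would refer to the account by `account_id` only.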

On the subject of login:

  • People should have the option of authenticating with their Google/Github/etc account for those that like that. Not mandatory by any means though.
  • 2FA should be supported - the most common ones being TOTP (aka. Google Authenticator or Authy) and Yubikey
  • For those who enable 2FA we probably need some kind of recovery mechanism like pre-generated recovery codes or security questions (it is only a matter of time until someone's phone dies or a Yubikey is lost). Enabling several kinds of 2FA at once (both TOTP and Yubikey) is something we may want to consider (but possibly also put in the too-hard basket)
  • People who are members of certain groups should be considered to have "privileged" access and have to use 2FA when logging in (ie. it's mandatory)
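
For reference, TOTP itself is small enough to sketch with the Python standard library. The code below follows RFC 6238 with HMAC-SHA1 and 30 second steps (the defaults most authenticator apps use); the recovery code format and the `requires_2fa` helper are assumptions made for illustration, not a settled design.

```python
import hashlib
import hmac
import secrets
import struct
import time


def totp(secret: bytes, at=None, step=30, digits=6) -> str:
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) for the given time."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, code: str, at=None, window=1) -> bool:
    """Accept codes from adjacent time steps to tolerate clock drift."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + i * 30), code)
        for i in range(-window, window + 1)
    )


def recovery_codes(count=10):
    """Generate one-time recovery codes (hypothetical format)."""
    return [secrets.token_hex(5) for _ in range(count)]


def requires_2fa(user_groups, privileged_groups):
    """2FA is mandatory for members of any privileged group."""
    return bool(set(user_groups) & set(privileged_groups))
```

In a real deployment an existing, audited library would most likely be preferable to hand-rolled code; the sketch is only meant to show how little state the server needs (one shared secret per user plus the unused recovery codes).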

Have groups which are highly scalable:

  • Since the inception of KDE over 2,000 people have been granted a developer account, which means the 'developers' and 'disabled-developers' groups between them have more than 2,000 members
  • We should also be able to delegate administration of the management of a group to others (we can't at the moment, at least not easily)
  • Users should be able to specify what details of their profile they want to share with a group or its admins
    • This could be used for sprints: details like dietary requirements can be a profile field which is then shared with the sprint organiser (who is the group admin), and after the sprint we just delete the group
    • It is also needed for the KDE e.V. membership database
    • We should probably make this something which has a default defined at the group level as user customisation might cause too many issues here.
  • Having the ability to give a reasonable description and embed information from elsewhere (group admins will only be trusted community members so letting them enter raw HTML is probably okay, at least for v1) would likely be all that's needed to provide the rest of the functionality which we lost when sprints.kde.org was shut down
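
The per-group sharing of profile details could work along these lines. This is a sketch under the assumption that the set of shared fields is defined at group level, as suggested above; `shared_view` and the field names are hypothetical.

```python
def shared_view(profile: dict, shared_fields) -> dict:
    """Return only the profile fields a group's admins may see."""
    allowed = set(shared_fields)
    return {key: value for key, value in profile.items() if key in allowed}


profile = {
    "name": "Example User",
    "email": ["user@example.org"],
    "dietary_requirements": "vegetarian",
}

# A sprint group that only needs names and dietary requirements.
sprint_group = {
    "name": "example-sprint",
    "shared_fields": ["name", "dietary_requirements"],
}

organiser_view = shared_view(profile, sprint_group["shared_fields"])
```

Deleting the group after the sprint then removes the only path through which the organiser could see those fields.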

Have support for means to be integrated with other systems

  • The best way of doing this is probably using something which is already defined to a certain extent like OAuth 2
  • Sites which are "connecting" with the new system should be able to specify what information (profile fields) they'd like about a user. If the user hasn't provided this, they should be prompted (required) to provide it.
  • Even if a user decides to "disconnect" a site, a record of them having authorised that connection in the past should be retained as part of their profile, as we need it to remove someone's account (dropping the site to the bottom of the list with an option to "Reconnect" it should suffice for this purpose)
  • The new system should be able to push profile change details out to sites users have connected to their profiles, so email address and name changes are reflected everywhere in a reasonably short space of time
  • Ideally, it should also be possible to auto log-out a given user from a connected system.
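
The bookkeeping for connected sites described above might look like this. It is a sketch only: the `Connection` structure and helper names are assumptions, and the OAuth 2 flow itself is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    site: str
    requested_fields: tuple
    active: bool = True   # False once the user "disconnects" the site


def missing_fields(profile: dict, connection: Connection) -> list:
    """Fields the site asked for which the user still has to provide."""
    return [f for f in connection.requested_fields if not profile.get(f)]


def disconnect(connection: Connection) -> None:
    """Deactivate rather than delete, so account removal can still find
    every site the user has ever authorised."""
    connection.active = False


def ordered_for_display(connections: list) -> list:
    """Active connections first; disconnected ones drop to the bottom."""
    return sorted(connections, key=lambda c: not c.active)
```

Because `sorted` is stable, sites keep their relative order within the active and disconnected groups.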

Provide a unified anti-spam mechanism for all sites which use it:

  • All content which users try to post to websites should be submitted by those sites to the new system for examination
  • It should then run various checks on it (URLs checked against domain blacklists like URIBL, regular expression checks as defined by an admin, similar content already posted in the past hour, etc) and respond to the site accordingly as to whether it thinks it is spam or not
  • We should have the ability to apply modifiers based on group membership (so established contributors aren't hit by more aggressive rules which apply to those who have just registered an account)
  • If submitted content is bad enough, or if the user accumulates enough flagged submissions in a given timeframe, it should be able to automatically suspend their account (requiring an admin to unlock them is okay here, as spammers can stay locked for eternity, and established contributors can come to us and explain how their account got hacked)
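
As a rough illustration of the checks listed above, the sketch below scores a submission and applies a group-based modifier. All weights, thresholds and names are placeholders, and a local set of blocked domains stands in for a real DNS-based lookup against a service like URIBL.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")


def spam_score(text, blocked_domains, patterns, recent_posts):
    """Add up simple penalty points; all weights here are placeholders."""
    score = 0
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname or ""
        if host in blocked_domains:
            score += 5
    # Admin-defined regular expressions.
    score += sum(3 for p in patterns if re.search(p, text, re.IGNORECASE))
    # Identical content already posted recently.
    if text in recent_posts:
        score += 4
    return score


def verdict(score, user_groups, group_modifiers, threshold=5):
    """Established contributors get a more lenient threshold."""
    bonus = max((group_modifiers.get(g, 0) for g in user_groups), default=0)
    return "spam" if score >= threshold + bonus else "ok"
```

Automatic suspension could then be a counter of "spam" verdicts per account within a time window, which is not shown here.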

Development Concerns

Over the past few years, there have been some loose attempts at building out an OAuth2 based solution. Beyond finding the personal time to do so, there are a few things that should be considered:

Staged Deployment:

Identity is one of the most important parts of the KDE infrastructure and as such, pressure to deploy it correctly is rather high. It should be possible to deploy across a few web applications and platforms without disrupting the existing system until full migration has been completed. While sync between them would be ideal, it is recognised that the existing system has such significant constraints that this may not be possible (allowing a one off "migration" where users are able to login with their Identity credentials into the new system would be a good start here)

Data Model and Storage:

As the notes below state, an RDBMS may be the most appropriate route for storing data. Some time should be taken to architect a fairly flexible data model. Experiments with NoSQL databases for this use case showed that the same care has to be taken to build out an appropriate data model there, so NoSQL might not offer any advantages and could just complicate things.

As GDPR compliance recommends data encryption, some form of record encryption, especially for fields which do not need to be searched would also be desirable if it is possible.

Some assets, such as avatars and ssh keys might be better suited to be stored in some object/document store rather than a database.

Other things to keep in mind

The community has historically been strong with both PHP and Python coding, and they're both stable and mature languages in terms of their bindings. Ruby and Node.js should be avoided as they break stuff too much.

In regards to underlying datastore, Postgres is not an option. We use MySQL for our RDBMS duties almost exclusively, although some apps have used MongoDB as well (and it might be a better fit here given what we are trying to accomplish with user profiles)

Care should also be taken when handling data, as some users' details are entirely in CJK/Cyrillic/etc character sets (the current system doesn't handle that well: it has some support for the Western European ranges, but Eastern European or CJK support is a bit patchy, especially in the username generator system)

Restricted Application added a subscriber: sysadmin.Apr 6 2018, 10:18 AM
lydia added a subscriber: lydia.Apr 6 2018, 10:37 AM
joshua added a subscriber: joshua.Apr 6 2018, 1:14 PM
ngraham added a subscriber: ngraham.Apr 6 2018, 1:23 PM
kfunk added a subscriber: kfunk.Apr 6 2018, 4:57 PM

Where does one go to learn about how identity.kde.org is currently developed? I did a quick search on the Wiki and didn't find much.

sharvey added a subscriber: sharvey.Apr 7 2018, 1:14 AM

I speak reasonably fluent Python. Got a stack of O'Reilly books next to my chair for light reading. If I can help, I certainly will.

You can find the current sources for Solena at websites/identity-kde-org on KDE Git.
https://phabricator.kde.org/source/websites-identity-kde-org/

Please note that I'd advise against using it as a reference for how a replacement system should be built, as many of the requirements outlined above are radically different from what the existing system enforces/provides.

kcoyle added a subscriber: kcoyle.Apr 14 2018, 5:16 PM

Keeping usernames is fine as long as they are not used as the primary identifier for a user.

Usernames provide some useful flexibility:

  • Letting users log in with either username or email would make things easier for them.
  • Referencing user profiles via url (eg. identity.kde.org/rgb-one)

Please note that as developers have to have a Subversion Username assigned to them, I'd rather not have a concept of usernames on the system (as then you either have two usernames for those developers with Subversion access, which will generate confusion, or have to impose the username-is-based-on-real-name requirement, which has caused so many issues with the usernames users can select)

Using email addresses should be perfectly fine, and public profiles aren't something we'll be pursuing as we won't be collecting the sort of information people might find useful (like Phabricator's activity feed) on the Identity platform.

helio added a subscriber: helio.May 4 2018, 8:39 AM
helio added a comment.May 4 2018, 8:44 AM

Ok, so we need a proper industry solution that is open source.

As I discussed yesterday on the akademy channel, we should be able to allow registration through external accounts, like Google, Facebook, Linkedin, Github.
So, the solution goes through OpenID and OAuth 2.0

I really think that we should take a look at this: https://www.keycloak.org/

It's fully open source, we can use whatever we want.
The only drawback that I saw is the use of JBoss, which demands some resources.
But still, it is fully open source and meets all our requirements, including creating our own KDE login if someone decides not to go through any social media login.

It provides everything we need AND the federated login from all structures. An example of how easy it would be to integrate the Google login button:
https://github.com/metadatacenter/cedar-docs/wiki/Configuring-Keycloak-to-use-Google-Identity-Provider

Feasible or too much ?

Please note that I mentioned the possibility of allowing external providers to be supported through this (under On the subject of login) however I don't consider it mandatory at all as there are elements of the community for whom all of those external services will be completely unacceptable.

In regards to Keycloak specifically, we considered it in the past but dismissed it as a potential option as it would require customisation to fit our needs, and that would still not eliminate its need for usernames (doing away with usernames is a hard requirement, as previously stated).

We would also be stuck with some limitations around how data could be managed (any changes there would require substantial, invasive modifications). To my understanding it also has a requirement that a first and last name be provided, which many users (who only want to use the Forum and maybe Bugzilla) don't want to do.

Usage of an "off the shelf" solution would also prevent us from implementing solutions to some of the issues we currently face around data propagation (see Have support for means to be integrated with other systems) which cause quite a few problems for us.

Systems like Keycloak are intended for usage in a corporate context, where details like names and email addresses rarely (if ever) change, which is in stark contrast to our open source context (where email addresses can and do change on a fairly regular basis)

justJanne added a subscriber: justJanne.EditedMay 14 2018, 2:22 PM

@helio You suggested OpenID, which is significantly outdated and unsupported. There is a replacement, with much improved security, called OpenID Connect (it’s based on OAuth2 and has nothing but the name in common with OpenID), so I’d suggest that instead.

Luckily, Keycloak has native support for it.

As previously mentioned, Keycloak is not suitable for our purposes due to the limited customisability it offers in regards to custom fields, along with its hard dependency on usernames in addition to the other issues noted in my earlier comments above.

bport added a subscriber: bport.Aug 25 2018, 8:31 PM

Some notes after Akademy. I'll expand on these as we move forward:

Application Components

Persistent Layer:

  • Legacy (LDAP)
  • In-Live (PostgreSQL)

Microservices:

  • Identity REST API & OAuth2 Provider
  • Rest Client / LDAP Server
  • Front-End / REST Client

LDAP Proxy

A large segment of the KDE infrastructure is tightly coupled to identity. In particular, many services are integrated with the LDAP interface itself.

To minimise any potential impact of replacing identity throughout our infrastructure, we will be building a lightweight LDAP server that uses a REST API as its backend / data store.
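
The core of such a proxy is a translation between REST records and LDAP entries. A minimal sketch of one direction is shown below, assuming a hypothetical JSON shape for the REST API and a placeholder base DN; the LDAP attribute names are the ones mentioned elsewhere in this task (uid, givenName, sn, mail, groupMember).

```python
def to_ldap_entry(user: dict, base_dn: str = "ou=people,dc=example,dc=org"):
    """Translate a REST API user record into an LDAP-style entry.

    LDAP attributes are multi-valued octet strings, so every value is
    wrapped in a list and UTF-8 encoded. Empty attributes are dropped.
    DN escaping (RFC 4514) is omitted for brevity.
    """
    attrs = {
        "uid": [user["uid"]],
        "givenName": [user["given_name"]],
        "sn": [user["surname"]],
        "mail": list(user["emails"]),
        "groupMember": list(user.get("groups", [])),
    }
    encoded = {
        name: [value.encode("utf-8") for value in values]
        for name, values in attrs.items()
        if values
    }
    return f"uid={user['uid']},{base_dn}", encoded
```

Encoding everything as UTF-8 at this boundary is also where names in non-Latin scripts need to survive intact.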

Phased Deployment

  • Phase 1: Deploy new API and LDAP Proxy
  • Phase 2: Migrate Low-Priority Applications
  • Phase 3: Deploy new Identity Interface and OAuth2 Provider
  • Phase 4: Migrate remaining active users
  • Phase 5: Wait for it … Wait for it … Kill OpenLDAP

@tcanabrava - Ben mentioned that you were interested in helping out with this. Are you happy for me to add you to this task?

Yes, put me up

On Sat, 1 Sep 2018 at 13:01, Kenny Coyle <noreply@phabricator.kde.org>
wrote:

kcoyle added a subscriber: tcanabrava.
kcoyle added a comment.

@tcanabrava https://phabricator.kde.org/p/tcanabrava/ - Ben mentioned
that you were interested in helping out with this. Are you happy for me to
add you to this task?


helio added a comment.Sep 9 2018, 7:53 AM

I want to be involved as well, in a lesser capacity, but I am willing to do some work too.

[]'s

kcoyle added a comment.Sep 9 2018, 8:30 AM

Awesome.

So it looks like we have a bit of a team together now.

Who would like to take point on building out the initial REST API?

The high-level requirements for this are:

  • Simple REST API to expose all user and group information that currently exists in LDAP.
  • Superuser credentials that can access and modify all records.
  • User credentials per account that can update a user's own information.
  • RDBMS backend with simple schema.
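
The two access levels in that list boil down to a single check. A sketch, using an in-memory dict in place of the RDBMS and hypothetical names throughout:

```python
class PermissionDenied(Exception):
    """Raised when an actor tries to modify a record they do not own."""


def update_user(store: dict, actor: dict, target_id: str, changes: dict) -> dict:
    """Apply `changes` to a user record if the actor is allowed to.

    Superusers may modify any record; everyone else only their own.
    """
    if not (actor.get("is_superuser") or actor["id"] == target_id):
        raise PermissionDenied(f"{actor['id']} may not modify {target_id}")
    store[target_id].update(changes)
    return store[target_id]
```

A REST layer would call something like this from its PUT/PATCH handler after authenticating the caller.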

We can go into more detail and build out tasks once we have a simple API in place.

I'm trying to set up the current system (Solena) in a development environment, but I currently get an error when registering a user:

2019/07/16 20:05:11 [error] [system.db.CDbCommand] CDbCommand::execute() failed: SQLSTATE[HY000]: General error: 1364 Field 'uid' doesn't have a default value. The SQL statement executed was: INSERT INTO `tokens` (`type`, `givenName`, `sn`, `mail`, `token`) VALUES (:yp0, :yp1, :yp2, :yp3, :yp4).
2019/07/16 20:05:11 [error] [exception.CDbException] exception 'CDbException' with message 'CDbCommand failed to execute the SQL statement: SQLSTATE[HY000]: General error: 1364 Field 'uid' doesn't have a default value' in /var/www/html/framework/db/CDbCommand.php:354
Stack trace:
#0 /var/www/html/framework/db/ar/CActiveRecord.php(1014): CDbCommand->execute()
#1 /var/www/html/framework/db/ar/CActiveRecord.php(787): CActiveRecord->insert(NULL)
#2 /var/www/html/protected/controllers/RegistrationController.php(109): CActiveRecord->save()
#3 /var/www/html/framework/web/actions/CInlineAction.php(50): RegistrationController->actionEnterDetails()
#4 /var/www/html/framework/web/CController.php(309): CInlineAction->runWithParams(Array)
#5 /var/www/html/framework/web/filters/CFilterChain.php(134): CController->runAction(Object(CInlineAction))
#6 /var/www/html/framework/web/filters/CFilter.php(41): CFilterChain->run()
#7 /var/www/html/framework/web/CController.php(1146): CFilter->filter(Object(CFilterChain))
#8 /var/www/html/framework/web/filters/CInlineFilter.php(59): CController->filterAccessControl(Object(CFilterChain))
#9 /var/www/html/framework/web/filters/CFilterChain.php(131): CInlineFilter->filter(Object(CFilterChain))
#10 /var/www/html/framework/web/CController.php(292): CFilterChain->run()
#11 /var/www/html/framework/web/CController.php(266): CController->runActionWithFilters(Object(CInlineAction), Array)
#12 /var/www/html/framework/web/CWebApplication.php(276): CController->run('enterDetails')
#13 /var/www/html/framework/web/CWebApplication.php(135): CWebApplication->runController('registration/en...')
#14 /var/www/html/framework/base/CApplication.php(162): CWebApplication->processRequest()
#15 /var/www/html/index.php(13): CApplication->run()
#16 {main}
REQUEST_URI=/index.php?r=registration/enterDetails
HTTP_REFERER=http://localhost:9000/index.php?r=registration/enterDetails

The SQL error is clear: the uid column is marked as NOT NULL (according to protected/data/database-schema.sql), but its value is not given in the INSERT INTO request.

However, after reading the PHP source code I didn't find any place where uid could be set. Maybe the sources in the Git repo are not in sync with the sources used in the production environment for https://identity.kde.org?

The registration form sends the following params in the POST request; uid is not among them:

YII_CSRF_TOKEN: c51fbacd4b5c1a2a2ffedbf1a39d8e2fb3fe7616
Token[givenName]: 123
Token[sn]: 213
Token[mail]: 123@g312.com
g-recaptcha-response: 03AOLTBLS32zwVbDyS6-gdW1eeCm5e4xKoMJg-fVT45J6nQWEbsXMYEGm9mezFDZs79OLvor92xG8zIy47UYyx_YNQhnpRuz0e7yGw_tkyG2j7PCigEkrCodbCK2rUndb51eDYI6CJYysMkcWqA0VK7_3MmT_AWK2LfDRLmI8iQwyCHSUguTRpbcSRYXWiTeWJ-GRc_EHh2CZfaaNyf0OAzjXAQVAFNJypFdJqtjlIuOIxH_cnlYbBTT0ziWCMu8EURJaH49-c8lULRVc8rYzncq2xQVwz5BMW8LdR4Gtf0ZoNjEndd1jL-40yk9dMZKQzjO8Ac1d4mUgV
yt0: Register Account

The tokens table is multi-use and is used for doing several different things (registration, adding multiple email addresses, etc) so the lack of uid in that POST submission is normal.

The correct schema should be:

CREATE TABLE `tokens` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `token` varchar(50) NOT NULL,
  `type` int(1) NOT NULL,
  `mail` varchar(255) NOT NULL DEFAULT '',
  `uid` varchar(255) NOT NULL DEFAULT '',
  `givenName` varchar(255) NOT NULL DEFAULT '',
  `sn` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=75848 DEFAULT CHARSET=utf8
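
The effect of the added DEFAULT '' can be reproduced in isolation. The sketch below uses SQLite purely so it is self-contained; the table is cut down to three columns and the column types are SQLite's, so it illustrates the principle rather than the exact MySQL behaviour (error 1364 is what MySQL reports for the same situation when running in strict SQL mode).

```python
import sqlite3

# Reduced versions of the tokens table: without and with a default for uid.
STRICT = (
    "CREATE TABLE tokens (id INTEGER PRIMARY KEY, "
    "token TEXT NOT NULL, uid TEXT NOT NULL)"
)
FIXED = (
    "CREATE TABLE tokens (id INTEGER PRIMARY KEY, "
    "token TEXT NOT NULL, uid TEXT NOT NULL DEFAULT '')"
)


def insert_without_uid(schema: str) -> bool:
    """Return True if an INSERT that omits `uid` succeeds under `schema`."""
    db = sqlite3.connect(":memory:")
    db.execute(schema)
    try:
        db.execute("INSERT INTO tokens (token) VALUES (?)", ("abc",))
        return True
    except sqlite3.IntegrityError:
        # NOT NULL column with no default and no supplied value.
        return False
    finally:
        db.close()
```

With the first schema the insert is rejected; with the second it succeeds and uid is stored as an empty string.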

Additionally, most applications' pre-built support for LDAP is broken, as they copy details like email address to their internal database on first login and don't sync changes made later on. This makes email address updates for users a pain and basically always requires Sysadmin intervention.

@bcooksley , how should this be implemented on top of OAuth2?

Let's say

  1. I use identity.kde.org's OAuth2 Provider to create a UserBase wiki account,
  2. start "watching" some wiki pages,
  3. then go to identity.kde.org and change the default email address,
  4. someone edits the watched page, so that UserBase wiki wants to notify me over email about this change. How would it know if my old email address is still valid or it needs to request an update from OAuth2?

In regards to underlying datastore, Postgres is not an option.

Why not PostgreSQL? Because Sysadmin team has little experience with it?

PostgreSQL was disqualified because the majority of KDE servers (all apart from the machine running Gitlab and Mirrorbrain) run using MySQL, and having two database servers running is an inefficient use of system resources. In any event, implementing something like this should not require anything other than simple database queries, which an ORM or equivalent should be able to handle (giving us flexibility to change in the future).

With regards to how details should be updated, there are two possibilities:

  1. Have sites update local profiles each time a user logs in. While this means there would be some propagation delay (until the next time the user goes through the login flow for that site), it is very simple to implement.
  2. Have the OAuth2 provider send notifications to sites a user has logged into when details of their profile (including group membership) change. This would be required for integration with services such as Gitlab because changes in membership of some groups (such as Developers or Sysadmin) are extremely security sensitive and thus an immediate sync is necessary.

Due to our requirements, we'd therefore want to have Option 2, at least for some sites (at which point we may as well just implement it for all sites, as it also has a better user experience).

In terms of how Option 2 is implemented, the way I would do this would depend on the application/site in question.

For something like Gitlab, it has a full featured API which allows those with Administrator access to update account details (without needing confirmation in the case of email addresses) so working with that is the path of least resistance and requires us to do less custom code on the Gitlab side (improving the odds that things won't break in future updates, a key concern of a system like this).

Other software (such as Mediawiki or Drupal) lacks this functionality and doesn't have such an API (to my knowledge at least; Mediawiki might have something, and chances are there is a Drupal module out there which provides some kind of nice API). For these applications, having our own mini protocol (where Identity sends a POST notification to an endpoint to trigger custom code to do the update on the site side) should be sufficient for our purposes.
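
Such a mini protocol could be as small as a signed JSON body. The sketch below is one possible shape, not an agreed design: the payload fields, the `X-Identity-Signature` header and the use of an HMAC over a per-site shared secret are all assumptions, and the actual HTTP transport is omitted.

```python
import hashlib
import hmac
import json


def build_notification(account_id: str, changes: dict, shared_secret: bytes):
    """Identity side: build the POST body and headers for a profile change."""
    body = json.dumps(
        {"account_id": account_id, "changes": changes}, sort_keys=True
    ).encode("utf-8")
    signature = hmac.new(shared_secret, body, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/json",
        "X-Identity-Signature": signature,  # hypothetical header name
    }
    return body, headers


def handle_notification(body: bytes, headers: dict, shared_secret: bytes,
                        local_users: dict) -> bool:
    """Site side: verify the signature, then update the local copy."""
    expected = hmac.new(shared_secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, headers.get("X-Identity-Signature", "")):
        return False
    payload = json.loads(body)
    local_users.setdefault(payload["account_id"], {}).update(payload["changes"])
    return True
```

The site-side handler is the piece of custom code each application would need; for software with an admin API (as with Gitlab above) Identity could call that API directly instead.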

Because OAuth2 will need custom code on the application/site side anyway, this shouldn't be an issue (while the flow may be generic, the underlying APIs you need to talk to for things like user profiles always differ, so you invariably need some kind of custom code on the consumer side even if it is relatively straightforward - hence the preference for having Identity talk to site APIs, as quite a bit of software has scaffolding you can plug into for writing OAuth2 consumer plugins)

It's for the above reasons we've eliminated all of the off the shelf / existing solutions for OAuth2 providers we've evaluated so far - because none of them attempt to do profile syncing.

This comment was removed by nalvarez.
ognarb added a subscriber: ognarb.Fri, Aug 30, 4:41 PM

I looked at the Blender ID system as a replacement. Under the hood it's using Django and the Django authentication plugin.

Functionalities

  • Roles can be created at runtime
  • There is already a hook for role changes (a Python function is called each time the roles of a user are modified)
  • OAuth2
  • Basic moderation functionalities (admin and moderation)
  • We can add a lot of optional fields with the Django ORM
  • Badge feature (needs to be disabled)

Missing functionalities

One of the advantages of using Django is that it's quite easy to add new features (like 2FA and the spam system). So of the missing functionalities, only adding fields at runtime would be really difficult to implement.

Adding fields at runtime isn't something we've got a hard requirement for, so that won't be a big deal.

The lack of 2FA and abuse prevention systems is a bit more of an issue though, and we'll probably need to add the account change notifications we need to keep sites up to date with account details as users change them. Otherwise it is looking like a very good base to start from.

Do we know if it supports multiple email addresses?

Adding fields at runtime isn't something we've got a hard requirement for, so that won't be a big deal.

The lack of 2FA and abuse prevention systems is a bit more of an issue though, and we'll probably need to add the account change notifications we need to keep sites up to date with account details as users change them. Otherwise it is looking like a very good base to start from.

I implemented 2FA with a Django plugin in my local build; this wasn't so difficult (just some configuration changes). I will need to clean it up a bit and then push it to a repo in the Gitlab instance. Creating a basic abuse prevention system should not be too difficult on the Identity side; most of the work is on the other apps' side. We will need to add hooks in Gitlab that send a request to Identity for each comment.

Do we know if it supports multiple email addresses?

For the moment it only supports one email address. What would be the use case: only storing multiple emails with one primary, or allowing authentication with whichever email is provided? The first case would be easy to implement (just add a custom JSON field in the database); the second case would be more difficult and I'm not sure it would present any advantage.

I found another problem: the Blender ID devs didn't explicitly add a LICENSE file to the repo (there are some mentions of GPL 2.0+ in the repo). I think they forgot (most of the KDE website repos don't have a LICENSE file either) but I prefer to be sure, so I created a request to add it explicitly.

Awesome, thanks for investigating all of this.