Harald rewrote the kde.org/applications code to use AppStream data and modern infrastructure. This is a todo item so we don't forget about it.
The main problems, off the top of my head:
- I was too opinionated about excluding certain types of apps (unmaintained/* repos come to mind, but there are others)
- For the same reason, some fields from the old format were not carried over into the new one, so there was less data overall
- Numerous URLs broke because the hand-assigned category didn't match the actual category in the AppStream data
- A number of applications had missing, bad, or uncrawlable AppStream data (I think there's a list in a comment in appstream.rb)
It may also fail to extract data properly right now, due to bitrot and the fact that things were reshuffled on the CI...
The inner workings are fairly straightforward. It gets a list of all projects from the projects.kde.org API, picks out the ones we also have on build.kde.org, iterates over them, and grabs from some server the tarball created from the make install result. It then crawls all the application appdata inside, runs it through appstreamcli to convert it to YAML, converts that to JSON, and done. Well, almost: projects that are not under CI get their git repo crawled for an actual AppStream file as a last-ditch effort. That's also where AppStream data counts as uncrawlable: if the appdata is configure_file'd through CMake, there's no reliable way of knowing what the output is unless it gets CI'd (being the CI fanboy that I am, I'd argue that the solution of course is to CI them ;))
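A rough sketch of three of those steps in Ruby, since appstream.rb suggests that's what the code is written in. This is not the actual implementation. The helper names (`split_by_ci`, `crawlable_appstream_files`, `yaml_to_json`) are made up for illustration, the project lists are assumed to already be plain arrays of names, and the YAML is assumed to be a DEP-11 style stream (a header document followed by one document per component). Fetching from the APIs, downloading tarballs and calling appstreamcli are left out.

```ruby
require 'json'
require 'yaml'

# Hypothetical helper: split projects into those with a CI build (which have
# a make install tarball to crawl) and those without (which fall back to
# crawling their git repository). Returns [with_ci, without_ci].
def split_by_ci(projects, ci_projects)
  projects.partition { |project| ci_projects.include?(project) }
end

# Hypothetical helper for the git fallback: only files that are already final
# AppStream XML are usable. Templates such as foo.appdata.xml.in are
# configure_file'd by CMake, so their real content is unknown without a build.
APPSTREAM_SUFFIXES = %w[.appdata.xml .metainfo.xml].freeze

def crawlable_appstream_files(paths)
  paths.select { |path| APPSTREAM_SUFFIXES.any? { |suffix| path.end_with?(suffix) } }
end

# Hypothetical helper: turn appstreamcli's YAML output into JSON. Assumes a
# DEP-11 style multi-document stream; the header document has no "ID" key,
# so only component documents are kept.
def yaml_to_json(yaml_text)
  components = YAML.load_stream(yaml_text).select { |doc| doc.is_a?(Hash) && doc.key?('ID') }
  JSON.pretty_generate(components)
end
```

The `.in` suffix check is where the configure_file problem shows up: the git fallback can only use files that are already final XML in the repo.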
On the UI side, it mostly changes the data frontend class so that it is backed by the new AppStream JSON format.
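A minimal sketch of what a JSON-backed frontend data class could look like. The class name `AppData` and its methods are hypothetical, not the real frontend class. It assumes the crawler output is a JSON array with one object per component, using DEP-11 key names ("ID", "Name", "Summary", "Categories"), where localized fields map a locale to a string with "C" as the untranslated default.

```ruby
require 'json'

# Hypothetical frontend data class wrapping one AppStream component
# as parsed from the crawler's JSON output.
class AppData
  # Build one AppData per component in a JSON array.
  def self.from_json(json_text)
    JSON.parse(json_text).map { |component| new(component) }
  end

  def initialize(component)
    @component = component
  end

  def id
    @component.fetch('ID')
  end

  def name(locale = 'C')
    localized('Name', locale)
  end

  def summary(locale = 'C')
    localized('Summary', locale)
  end

  # AppStream categories as listed in the data (may be empty).
  def categories
    @component.fetch('Categories', [])
  end

  private

  # Look up a localized field, falling back to the untranslated "C" value.
  def localized(key, locale)
    values = @component.fetch(key, {})
    values[locale] || values['C']
  end
end
```

Keeping the raw component hash and adding accessors on top means fields that were dropped from the old format could be exposed again later without touching the crawler.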