Port syndication away from QXmlInputSource API
Open, Needs Triage, Public

Description

OLD OUTDATED INFORMATION: Qt5::Xml and therefore QDom* is supposed to go away in Qt6. This is a meta task to collect all places in KF5 we need to adjust for this.

vkrause created this task. Nov 23 2019, 9:37 AM

Turns out that QDom won't actually go away, but we still need to check for any usage of deprecated QDomDocument APIs.

vkrause moved this task from Backlog to Metatasks on the KF6 board. Nov 25 2019, 9:35 PM
dfaure added a subscriber: dfaure. Sep 8 2020, 4:52 PM

What's deprecated is only QXmlInputSource, QXmlReader/QXmlSimpleReader and associated classes (QXmlEntityResolver, QXmlAttributes...).

https://lxr.kde.org/ident?_i=QXmlInputSource&_remember=1 says only syndication needs to be ported away.

dfaure renamed this task from Port away from QDom API to Port syndication away from QXmlInputSource API. Sep 8 2020, 4:52 PM
dfaure updated the task description.
dfaure claimed this task. Sep 8 2020, 5:28 PM
dfaure added a comment. Sep 8 2020, 6:21 PM

This is not as easy as I thought it would be. When used without QXmlInputSource, QDomDocument simplifies whitespace-only text nodes.
This patch: http://www.davidfaure.fr/2020/port_syndication_away_from_qxmlinputsource.diff
leads to a failure in autotests/atom/atom10_entry_content.xml which can be narrowed to
-id: #hash:aff2c4358030579d2c3dcea6e92b40fe#
+id: #hash:a359558b397d24593c3b55afb85d173a#

Apparently the hash is calculated over the contents, and this breaks due to the whitespace simplification?
But the HTML itself doesn't really care about that whitespace (see the unittest fixes in the patch).
What can we do about this? Is the hash useful enough that we *have* to get a no-whitespace-collapsing feature into Qt again? Or can we somehow adapt?
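
To illustrate why the whitespace handling can change such an ID, here is a minimal Python sketch. It assumes the fallback ID is a digest of the entry's serialized content; MD5 is chosen only because the IDs above look like 32-hex-digit digests, and the real Syndication code may compute it differently. The helper names `content_hash_id` and `drop_whitespace_only_text` are hypothetical.

```python
import hashlib
import re


def content_hash_id(content: str) -> str:
    """Hypothetical fallback ID of the form '#hash:<md5 of content>#'."""
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    return f"#hash:{digest}#"


def drop_whitespace_only_text(content: str) -> str:
    """Crude stand-in for a parser that discards whitespace-only text between tags."""
    return re.sub(r">\s+<", "><", content)


# Same markup, once as written in the feed and once after whitespace stripping.
original = "<div>\n  <p>Hello</p>\n  <p>World</p>\n</div>"
stripped = drop_whitespace_only_text(original)

print(content_hash_id(original))
print(content_hash_id(stripped))
print(content_hash_id(original) == content_hash_id(stripped))  # False
```

The two strings render identically as HTML, but any byte-level digest over them differs, which matches the symptom in the failing autotest.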

I think this would be fine in general (there is no guarantee on how exactly the hash is computed from what I can see), the only problem could be that we at some point rely on it being long-term stable, i.e. store it on disk for something actually relevant (e.g. losing selection state is probably acceptable, breaking Akregator's database entirely probably not). Hard to tell from a quick look at Akregator though.

Disclaimer: I haven't looked at Akregator's codebase in almost 10 years...

Originally, the hash was meant to serve as the item ID where the source feed didn't provide IDs an item could be identified with. In Akregator, this ID was used to tell items that were already seen from new items when fetching the RSS/Atom. So I would assume that if the hash changes, you would end up with duplicate items for every item that already existed locally and is still in the feed, because the old item and the new item now have different IDs/hashes.

I would say that's annoying but not critical, provided that it only happens once.
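
A minimal Python sketch of that scenario, assuming a reader simply keeps a set of already-seen IDs (the function `merge_fetch` is hypothetical and is not Akregator's actual storage logic; the two IDs are the ones from the failing autotest above):

```python
from typing import List, Set


def merge_fetch(seen_ids: Set[str], fetched_ids: List[str]) -> List[str]:
    """Return the IDs treated as new and record them as seen."""
    new_ids = [i for i in fetched_ids if i not in seen_ids]
    seen_ids.update(new_ids)
    return new_ids


# The local archive holds an item stored under the old hash.
seen = {"#hash:aff2c4358030579d2c3dcea6e92b40fe#"}

# After the port, the same feed item yields a different hash.
feed = ["#hash:a359558b397d24593c3b55afb85d173a#"]

first = merge_fetch(seen, feed)   # the item shows up once more as "new"
second = merge_fetch(seen, feed)  # later fetches find nothing new
print(len(first), len(second))  # 1 0
```

Under this assumption the duplicate appears only on the first fetch after the hash change, since the new ID is stored from then on.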

So, to be sure: I continue with this patch, adjusting the unittest to the new hash, and no change is required in Akregator?

alex moved this task from Metatasks to Done on the KF6 board. Apr 14 2021, 8:11 AM