The current parsing mechanism to obtain the current NatGeo POTD based on XML is broken, this patch proposes an alternative mechanism using QRegularExpression library.
For this method to continue working in the future it is supposed that the web page will maintain a sane format, namely that the relevant tag will not be broken among many lines, the fields inside the tag will maintain the same relative order, and obviously that it will not be obfuscated.
Details
Diff Detail
- Repository
- R114 Plasma Addons
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
dataengines/potd/natgeoprovider.cpp | ||
---|---|---|
96 ↗ | (On Diff #14202) | Why this change? Moving away from https to http seems backward... |
dataengines/potd/natgeoprovider.cpp | ||
---|---|---|
61 ↗ | (On Diff #14202) | An alternative regex may be "^<meta\\s.*property=\"og:image\"\\s.*content=\"(.*)\".*/>$", so to match any possible future fields before property, and between property and content. |
96 ↗ | (On Diff #14202) | The versions with https or without the trailing slash were not working for me, so I changed it to that one, but it could as well be just be my mistake. |
dataengines/potd/natgeoprovider.cpp | ||
---|---|---|
96 ↗ | (On Diff #14202) | The https redirects to http (HTTP 302 temp move), then redirects to add the trailing slash (HTTP 301 Perm Move). |
So, what do we want to do with this patch?
I've been using it for a week now, and didn't notice any problems. Has anyone else been testing it?
As for the link, I am for keeping the http version, as it is the one used by NatGeo, so it would be less likely to be broken in the near future. If and when they will move to TLS it should be their responsibility to issue the right redirect.
Anyway, any decision will be fine for me, as long as it works.
Lastly, any opinion on the regex?