Restore the ability to view man and info pages from Konqueror
AcceptedPublic

Authored by stefanocrocco on Jul 6 2018, 1:49 PM.

Details

Summary

Add a QWebEngineUrlSchemeHandler which uses KIO::get to produce HTML code to be sent using QWebEngineUrlRequestJob::reply. This scheme handler is added whenever WebEnginePart::openUrl is called for an URL for which KProtocolManager::defaultMimetype returns text/html.

The basic idea is taken from the implementation used in the webengine_stream branch, but instead of changing the way KonqView::openUrl works, it only adds the appropriate scheme handler.

The main issue for this implementation to work is that QWebEngine refuses to load local resources in the generated HTML (images and CSS files) because of cross-origin rules. The only way to work around this limitation that I could think of is to parse the HTML code produced by KIO::get and to replace, in img and link elements the URL with a data URL embedding the content of the file. To do this, the scheme handler uses the external program htmltidy to convert the HTML generated by KIO into XHTML, then uses QDomDocument to parse the resulting XHTML file and replace the URLs appropriately. If one of these two steps fails, the original HTML will be used: this means that the user won't be able to see most of the formatting in the man/info pages but will be able to read the text.

Unfortunately, there's still a problem I couldn't solve: some resources in the HTML produced by KIO use a help URL rather than the standard file one, but I couldn't find out to which file they point, so they aren't changed, meaning that the page is not displayed completely as it was intended.

Test Plan

Open man and info pages from location bar and following links between such pages.

Diff Detail

Repository
R226 Konqueror
Branch
add-man
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 969
Build 982: arc lint + arc unit
stefanocrocco requested review of this revision.Jul 6 2018, 1:49 PM
stefanocrocco created this revision.
stefanocrocco edited the summary of this revision. (Show Details)Jul 6 2018, 2:04 PM
stefanocrocco edited the test plan for this revision. (Show Details)
stefanocrocco added a reviewer: dfaure.
pino added a subscriber: pino.Jul 6 2018, 2:08 PM
pino added inline comments.
webenginepart/src/webengineparthtmlmimetypehandler.cpp
38–41 ↗(On Diff #37256)

Instead of using KIO::get() + manually keeping the received data, just use KIO::storedGet() which gives a job that does the same.

  • Use KIO::storedGet instead of KIO::get
dfaure requested changes to this revision.Jul 8 2018, 9:50 PM
dfaure added inline comments.
webenginepart/src/webenginepart.cpp
351 ↗(On Diff #37276)

coding style: space after if, and space before '{'

352 ↗(On Diff #37276)

The space before "->" is strange

webenginepart/src/webengineparthtmlmimetypehandler.cpp
41 ↗(On Diff #37276)

If you captured job, you wouldn't need this [unchecked] dynamic_cast.
Or use a static_cast, you know j is a StoredTransferJob.

74 ↗(On Diff #37276)

coding style: space before '{'

87 ↗(On Diff #37276)

url.toLocalFile(), so that this works on Windows too.

91 ↗(On Diff #37276)

Use path here, i.e. the local variable that contains the result of toLocalFile.

92 ↗(On Diff #37276)

Should use if() here.
I know you checked QFile::exists, but I would remove that, it's faster to just check the retval from open, and this will also cover the case of missing permissions.

94 ↗(On Diff #37276)

Not necessary, the QFile dtor does it.

101 ↗(On Diff #37276)

missing spaces around '='

121 ↗(On Diff #37276)

Why does this run tidy? Can't we fix the kioslaves to generate proper HTML instead?

webenginepart/src/webengineparthtmlmimetypehandler.h
67 ↗(On Diff #37276)

remove "virtual", add "override" at the end instead.

125 ↗(On Diff #37276)

s/the the/then the/ ?
or maybe there's just one "the" too many.

This revision now requires changes to proceed.Jul 8 2018, 9:50 PM

Make requested corrections

Make missing corrections to vwebengineparthtmlmimetypehandler.h

dfaure added a comment.Jul 9 2018, 8:01 AM

What about the dependency on tidy? Can it be removed?

stefanocrocco added inline comments.Jul 9 2018, 8:05 AM
webenginepart/src/webengineparthtmlmimetypehandler.cpp
121 ↗(On Diff #37276)

The code generated by the kioslaves is proper HTML (except in some cases where it seems to use an invalid DOCTYPE). The problem is that, QDomDocument represents an XML document, so I need to convert the HTML code into XHTML before passing it to QDomDocument. I haven't actually tried to use it on the original HTML, but since its documentation states that "The QDomDocument class represents an XML document", I assumed it wouldn't work. Using tidy is the simplest way I found to convert an HTML document into XHTML.

dfaure added inline comments.Jul 9 2018, 8:09 AM
webenginepart/src/webengineparthtmlmimetypehandler.cpp
121 ↗(On Diff #37276)

Ah, I see.

Two thoughts:

  1. the only purpose in life of these kioslaves is to send HTML to konqueror, so we could change them to generate XHTML in the first place.
  1. parsing HTML isn't a job for QDomDocument, but for an HTML parser. The usual solution these days (like in KMail) is to use QtWebEngine and inject javascript code to fetch the stuff you're looking for. Somewhat overkill in general, but you're already linking to QtWebEngine here ;-)

Either of these solutions would however be better than a dependency on tidy IMHO
(bad for portability, bad for performance, bad for the user experience when tidy is not installed) :-)

stefanocrocco added inline comments.Jul 9 2018, 8:25 AM
webenginepart/src/webengineparthtmlmimetypehandler.cpp
121 ↗(On Diff #37276)

Ok, I'll look into using QWebEngine and javascript to change the HTML code.

Use QWebEnginePage to parse the HTML

The QWebEnginePage used here is different from that associated with the
WebEnginePart because we don't have access to that page from the handler.

Looks good - almost ;)

webenginepart/src/webengineparthtmlmimetypehandler.cpp
36 ↗(On Diff #37469)

static QStrings slow down startup time. Better use static const char s_...[] = "..."

65 ↗(On Diff #37469)

making m_request a QPointer does the same with less runtime cost and less code.

webenginepart/src/webengineparthtmlmimetypehandler.h
31 ↗(On Diff #37469)

no longer needed

34 ↗(On Diff #37469)

not needed

36 ↗(On Diff #37469)

not needed

Use QPointer and static char, remove useless includes and forward declarations

dfaure accepted this revision.Jul 15 2018, 10:39 PM
This revision is now accepted and ready to land.Jul 15 2018, 10:39 PM

Don't link agains QtXml

I forgot to remove Qt5::Xml from target_link_libraries for kwebenginepartlib
when switching to using QWebEnginePage to parse the HTML code

dfaure accepted this revision.Jul 16 2018, 7:48 AM
This revision was automatically updated to reflect the committed changes.

Thanks for working on this, but a note about:

Unfortunately, there's still a problem I couldn't solve: some resources in the HTML produced by KIO use a help URL rather than the standard file one, but I couldn't find out to which file they point, so they aren't changed, meaning that the page is not displayed completely as it was intended.

This is intended, and the other engines were able to get those resources by retrieving again them through the help:/ URL.

arfrever added inline comments.
webenginepart/src/CMakeLists.txt
2

Since you dropped linking against Qt5::Xml (using target_link_libraries), this find_package(Qt5Xml ... is probably unnecessary.

stefanocrocco reopened this revision.Jul 18 2018, 9:11 AM
This revision is now accepted and ready to land.Jul 18 2018, 9:11 AM

Remove unneed check for QXml

Thanks for working on this, but a note about:

Unfortunately, there's still a problem I couldn't solve: some resources in the HTML produced by KIO use a help URL rather than the standard file one, but I couldn't find out to which file they point, so they aren't changed, meaning that the page is not displayed completely as it was intended.

This is intended, and the other engines were able to get those resources by retrieving again them through the help:/ URL.

Thanks for the information. I was confused because I didn't notice that the help kioslave wasn't installed on my system, so those URLs weren't resolved even when using the KHTML part. Now I've installed it and created a new revision (D14233) adding support for it.