Resurrect the Web Archiver
Closed, Resolved (Public)

Description

This refers to the "Archive Web Page" plugin previously available in Konqueror. It was removed in 2019 because it relied on KHTML DOM access and could not be ported.

It was useful, though, so I've looked at bringing the web archiver function back without depending on KHTML, or indeed on whichever KPart is being used. The new version has a very small Konqueror plugin that simply passes the page URL to an external archiver program. That program puts up a dialogue for configuration options and then passes the URL to wget in its 'page and requisites' mode, which downloads the page together with all the images, stylesheets etc. that it requires. The downloaded page is then saved in the original web archive format, or optionally as a tar or zip archive, or unpacked into a directory. No access to the part's DOM is needed, and the wget command is the only runtime dependency.
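To illustrate the flow described above, here is a minimal Python sketch of the 'download with wget, then pack' step. It is not the code of the actual archiver: the function names (build_wget_command, pack_directory, archive_page) are hypothetical, the choice of wget options beyond --page-requisites is an assumption, and the gzip-compressed tar used for the "tar" case is only a stand-in for whatever the archiver really writes.

```python
import subprocess
import tarfile
import tempfile
import zipfile
from pathlib import Path


def build_wget_command(url: str, dest_dir: str) -> list[str]:
    """Build a wget command line for 'page and requisites' mode.

    Only --page-requisites is implied by the description; the other
    options are assumptions that make the saved copy browsable offline.
    """
    return [
        "wget",
        "--page-requisites",    # also fetch images, stylesheets etc.
        "--convert-links",      # rewrite links to point at the local copies
        "--adjust-extension",   # add .html/.css suffixes where missing
        "--no-directories",     # keep everything flat in dest_dir
        "--directory-prefix", dest_dir,
        url,
    ]


def pack_directory(src_dir: str, archive_path: str, fmt: str = "tar") -> None:
    """Pack every file under src_dir into a tar.gz or zip archive."""
    src = Path(src_dir)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    if fmt == "tar":
        with tarfile.open(archive_path, "w:gz") as tar:
            for f in files:
                tar.add(f, arcname=str(f.relative_to(src)))
    elif fmt == "zip":
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in files:
                zf.write(f, arcname=str(f.relative_to(src)))
    else:
        raise ValueError(f"unsupported archive format: {fmt}")


def archive_page(url: str, archive_path: str, fmt: str = "tar") -> None:
    """Download url with wget into a temporary directory, then pack it."""
    with tempfile.TemporaryDirectory() as tmp:
        result = subprocess.run(build_wget_command(url, tmp), check=False)
        # wget exits with status 8 when the server returned an error for
        # some request (e.g. one missing image); this sketch assumes the
        # rest of the page is still worth saving in that case.
        if result.returncode not in (0, 8):
            raise RuntimeError(f"wget failed with status {result.returncode}")
        pack_directory(tmp, archive_path, fmt)
```

With wget installed, calling archive_page("https://example.org/", "example.tar.gz") would produce a single archive file, and passing fmt="zip" would produce a zip instead. Nothing here touches the browser part or its DOM, which is the point of the approach.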

I've also brought back the web archive thumbnailer, using either WebEngine or WebKit as the renderer. WebEngine is the supported default option, but because it loads pages asynchronously the thumbnailer has to rely on a timeout and so is not 100% reliable. WebKit is unsupported but still works. Neither of these depends on any KPart or other framework; they use QWebEngine/QWebPage only. The thumbnailer can generate thumbnails for web archives and also for HTML files.

So the question: is this worth putting back into Konqueror, and if so in what form? At the moment all three of the major components (plugin, archiver program and thumbnailer) are where they were originally, in the Konqueror plugins/webarchiver directory. Only the first of these really needs to be in the Konqueror source tree, though, and the archiver could be useful as a standalone command (kcreatewebarchive <URL>). So should everything stay within the Konqueror source, or could there be three components:

  • plugin - remains in Konqueror source
  • archiver - as a separate repository (in the 'network' category?)
  • thumbnailer - as a separate repository (in kdegraphics?)

The desktop group and those interested in Konqueror are added as subscribers. Any thoughts would be appreciated.

marten created this task. Aug 28 2020, 11:53 AM

I think it would be worth putting it back. I think QtWebEngine already provides support for doing something like this through QWebEnginePage::save() (although I have never tried using it); however, a solution which doesn't rely on a specific part would, in my opinion, be even better. I remember that when I ported the webarchiver plugin to KF5 (before switching to QtWebEngine) I wondered why it didn't use wget instead of doing all the work manually. Of course, this requires having wget installed, but I don't think that would be an issue.

As for the form, I don't have a preference: I'd be OK either with keeping the whole plugin in Konqueror or with splitting it as you suggest.

marten closed this task as Resolved. Mar 30 2021, 6:40 PM
marten claimed this task.