Port KFind away from KMimeType
ClosedPublic

Authored by yurchor on Oct 5 2018, 7:38 PM.

Details

Reviewers
broulik
Group Reviewers
KDE Applications
Summary

The code is shamelessly borrowed from kdelibs4support implementation:

https://cgit.kde.org/kdelibs4support.git/tree/src/kdecore/kmimetype.cpp

Personally, I do not know if there is a better way to implement this check without libmagic. Please help.

Test Plan
  1. Compile and run patched KFind.
  2. Try to find some short text in the folder with binary files. Nothing should be found.
  3. Try to find binary files with some metadata (e.g., "Radiohead" in the folder with Radiohead MP3s) and empty "Containing text" field. All files with "Radiohead" in metadata should be found.

Diff Detail

Repository
R228 KFind
Lint
Lint Skipped
Unit
Unit Tests Skipped
yurchor requested review of this revision.Oct 5 2018, 7:38 PM
yurchor created this revision.

The question was raised previously here:
https://mail.kde.org/pipermail/kde-frameworks-devel/2015-January/021254.html
I guess that this is the second user of isBinaryData...

Better ask @dfaure

("KDE Applications" is not meant to be used as tag for reviews, it's used to collect the projects; the last one who did significant changes to kfind is @broulik )

aacid added a subscriber: aacid.Oct 5 2018, 8:34 PM

("KDE Applications" is not meant to be used as tag for reviews, it's used to collect the projects; the last one who did significant changes to kfind is @broulik )

Is it not? Why not?

+1 overall

src/kquery.cpp
447–457

use .toLocalFile()

448

Coding style, {} also for single line statements

yurchor marked 2 inline comments as done.Oct 8 2018, 11:26 AM
yurchor updated this revision to Diff 43124.Oct 8 2018, 11:28 AM
  • Use toLocalFile().
  • Use braces to make the condition unambiguous.
broulik accepted this revision.Oct 8 2018, 12:37 PM

Thanks

This revision is now accepted and ready to land.Oct 8 2018, 12:37 PM
dfaure added inline comments.Oct 13 2018, 7:36 PM
src/kquery.cpp
451

Actually, that's 128 bytes nowadays.
See this code from qmimedatabase.cpp:

static inline bool isTextFile(const QByteArray &data)
{
    // UTF16 byte order marks
    static const char bigEndianBOM[] = "\xFE\xFF";
    static const char littleEndianBOM[] = "\xFF\xFE";
    if (data.startsWith(bigEndianBOM) || data.startsWith(littleEndianBOM))
        return true;

    // Check the first 128 bytes (see shared-mime spec)
    const char *p = data.constData();
    const char *e = p + qMin(128, data.size());
    for ( ; p < e; ++p) {
        if ((unsigned char)(*p) < 32 && *p != 9 && *p !=10 && *p != 13)
            return false;
    }

    return true;                                                                                                                                                                                                 
}

But yeah I'm not sure it's worth its own public method in QMimeDatabase, it's just some inexact heuristic for last-chance fallback...

yurchor marked an inline comment as done.Oct 14 2018, 8:13 AM