Differential D15972

Port KFind away from KMimeType
ClosedPublic
Actions

Authored by yurchor on Oct 5 2018, 7:38 PM.

Details

Reviewers

broulik

Group Reviewers

KDE Applications

Summary

The code is shamelessly borrowed from kdelibs4support implementation:

https://cgit.kde.org/kdelibs4support.git/tree/src/kdecore/kmimetype.cpp

Personally, I do not know if there is a better way to implement this check without libmagic. Please help.

Test Plan

Compile and run patched KFind.
Try to find some short text in the folder with binary files. Nothing should be found.
Try to find binary files with some metadata (e.g., "Radiohead" in the folder with Radiohead MP3s) and empty "Containing text" field. All files with "Radiohead" in metadata should be found.

Diff Detail

Repository

R228 KFind

Lint

Lint Skipped

Unit

Unit Tests Skipped

yurchor requested review of this revision.Oct 5 2018, 7:38 PM

yurchor created this revision.

The question was raised previously here:
https://mail.kde.org/pipermail/kde-frameworks-devel/2015-January/021254.html
I guess that this is the second user of isBinaryData...

Better ask @dfaure

("KDE Applications" is not meant to be used as tag for reviews, it's used to collect the projects; the last one who did significant changes to kfind is @broulik )

In D15972#337295, @ltoscano wrote:

("KDE Applications" is not meant to be used as tag for reviews, it's used to collect the projects; the last one who did significant changes to kfind is @broulik )

Is it not? Why not?

+1 overall

src/kquery.cpp
447–457	use `.toLocalFile()`
448	Coding style, `{}` also for single line statements

yurchor marked 2 inline comments as done.Oct 8 2018, 11:26 AM

Use toLocalFile().
Use braces to make the condition unambiguous.

Thanks

This revision is now accepted and ready to land.Oct 8 2018, 12:37 PM

Committed to repo.

https://commits.kde.org/kfind/4b4dd16de57485f7ed7030efffcb7349ebfa33ce

dfaure added inline comments.Oct 13 2018, 7:36 PM

src/kquery.cpp
451	Actually, that's 128 bytes nowadays. See this code from qmimedatabase.cpp: static inline bool isTextFile(const QByteArray &data) { // UTF16 byte order marks static const char bigEndianBOM[] = "\xFE\xFF"; static const char littleEndianBOM[] = "\xFF\xFE"; if (data.startsWith(bigEndianBOM) \|\| data.startsWith(littleEndianBOM)) return true; // Check the first 128 bytes (see shared-mime spec) const char p = data.constData(); const char e = p + qMin(128, data.size()); for ( ; p < e; ++p) { if ((unsigned char)(p) < 32 && p != 9 && p !=10 && p != 13) return false; } return true; } But yeah I'm not sure it's worth its own public method in QMimeDatabase, it's just some inexact heuristic for last-chance fallback...

src/kquery.cpp

451

Actually, that's 128 bytes nowadays.
See this code from qmimedatabase.cpp:

static inline bool isTextFile(const QByteArray &data)
{
    // UTF16 byte order marks
    static const char bigEndianBOM[] = "\xFE\xFF";
    static const char littleEndianBOM[] = "\xFF\xFE";
    if (data.startsWith(bigEndianBOM) || data.startsWith(littleEndianBOM))
        return true;

    // Check the first 128 bytes (see shared-mime spec)
    const char *p = data.constData();
    const char *e = p + qMin(128, data.size());
    for ( ; p < e; ++p) {
        if ((unsigned char)(*p) < 32 && *p != 9 && *p !=10 && *p != 13)
            return false;
    }

    return true;                                                                                                                                                                                                 
}

But yeah I'm not sure it's worth its own public method in QMimeDatabase, it's just some inexact heuristic for last-chance fallback...

yurchor marked an inline comment as done.Oct 14 2018, 8:13 AM

Revision Contents
Changeset List

			Path	Packages
M			src/kquery.cpp (14 lines)

Diff 43124

View Options

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	42943		Oct 5 2018, 7:26 PM	★	★
Diff 2	43124		Oct 8 2018, 11:26 AM	★	★