- User Since
- Feb 14 2017, 10:36 AM (113 w, 5 d)
Sun, Apr 7
I didn't realize LMDB does not modify pages; and if we change one, it creates a new one, copies the data from the old page, modifies it and marks old one as "dirty".
In that case we'll most likely end up modifying a single page in both cases.
Fri, Apr 5
Just because you hide the RMW, it does not mean it does not happen. For LMDB, duplicate keys are just a plain array, see e.g. MDB_APPENDDUP in http://www.lmdb.tech/doc/group__mdb.html#ga4fa8573d9236d54687c61827ebf8cac0.
The data entries are sorted by LMDB as soon as you do a mdb_put, and this is of course a RMW cycle.
I thought the structure behind MDB_DUPSORT is a bit more clever that just a plain array, i.e. it's still a search tree.
And insertion still should cost much less compared to what is done now (fetch whole list + insert + put it back).
Of course it still might do some RMW work (tree rebalancing, for example), my point is that it should be more efficient.
Thu, Apr 4
I'd like to revive this discussion.
Mar 20 2019
Mar 19 2019
Mar 17 2019
Feb 23 2019
I've looked through the patch (quite large indeed), apart from the single note I think it's good to go.
Feb 15 2019
Forgot to define fname inside EventMoveTo
Updated comment, removed duplicated QFile::decodeName
Feb 12 2019
Feb 6 2019
Something like that? I've decided not to emit created signal from inside the function, just to have a bit less branching in the code (and documented this behavior, since it might be a bit confusing)
Added recursive iteration over all contents for Create event as well
Feb 5 2019
Added code to work with first entry that pops from FilteredDirIterator (that is the directory itself)
Test still works; but we should emit created() signal for it as well.
I am not sure if I understood your description correctly, but I am quite sure this race condition does not exist - the files/folders inside the moved folder are not created/moved one by one, but the containing folder ist just "renamed" - it is unlinked from the old parent and linked into the new one, atomically.
Sure. That concern corresponded only to the second note - if moving from another device, system has to do actual copy/move.
Feb 4 2019
Explained the race condition in summary, expanded test to check if watches were installed correctly.
Feb 3 2019
Feb 2 2019
Nice! I like it, it's definitely much better than Q_ASSERT_X macros that are just silently ignored in non-debug builds.
Dec 18 2018
Nov 21 2018
I believe can do something better here.
I think if we stick to canonical paths everywhere, and resolve symlinks ASAP (but still follow them), that might solve all the problems.
Nov 15 2018
It seems like you've pushed something that was not intended to be pushed (XML extractor parts)
Nov 14 2018
Apart from trivial comment, this looks fine. I've tested it on my setup (with bunch of (e)ps files), and randomly chosen files seems to be indexed nicely. It also reduced the size of the index by almost 50MB, because those are not indexed as plaintext anymore :)
Yet I would also vote for replacing it (eventually) with a full-featured extractor based on libspectre
(I'm not a security specialist in any way, but that CVE doesn't look too harmful, and from my point of view it's not worth to abandon full support of (E)PS because of it)
It's a bad idea to removeRecursively starting from root of the tree (documentid 0).
If user has indexed /home/username folder, there is also an index entry for /home (that's how IdTreeDB works).
However, /home should not be indexed, according to checks (because it's not in includeFolders, while /home/username is)
This will lead to removeRecursively("/home") call, which will wipe index for /home/username as well.
Rebase on master
Oct 30 2018
Yep, fine by me
That's nice! I'll test it a little.
Sorry, I've postponed this one for some time :(
Oct 20 2018
BTW. Is there any reason why i.e. inside PostingDB Baloo stores key -> encoded list of values, instead of using MDB_DUPSORT | MDB_DUPFIXED (which allow to store multiple values for keys)?
Dropped in favor of D16265: [Scheduler] Use flag to track when a runner is going idle, which handles this problem better.
I like it, it's better than D15959: Wait for the extraction process to finish before scheduling.
And it seems to be working, as far as I can see :)
Oct 17 2018
Oct 16 2018
Here is some test data for my (rather simple) setup.
Oct 15 2018
Oct 14 2018
Oct 13 2018
Replaced size() by count(), which is more appropriate here
Oct 12 2018
That's weird. I caught it only once and thought that it is because I trivially forgot the endl. But now I cannot reproduce it anymore.
Seems like it should be fine, since we do emit finished signal, which will write the 'OK' part.
Sorry for the noise.
Woops! startedIndexingFile does not print a newline. I guess I can just add m_out << endl...
Address raised issues
I'm not entirely sure how translating system works, but won't it pop up as 4 identical lines in i.e. Lokalize, causing frustration to our translators?
It would also mean that if we would want to change the message, we would have to do it in 4 different lines.
Oct 9 2018
Does this run at startup? If so, this will erase the entries of files on a removable volume not already mounted.
Whoa, thanks for the notice. Not cool, forgot about it.
Probably need to add some checks in IndexCleaner, about the device.
Oct 8 2018
Hah, looks like you pushed a commit for it last year!
I'm not wedded to symlink support, but if we're not going to do it, we should close the bug with some good reasons.
Looks fine to me. But do we really need to optimize it? I mean, I didn't see it running more than ~20 ms, and with this patch for small queries it runs like ~16 ms. Worst case is when user types something in KRunner, but again, the lag is negligible there.
Maybe we should print also a suggestion to user, something like (maybe rephrase it better)
WARNING: Looks like your index is corrupted. We suggest you to run `balooctl disable && balooctl disable` to wipe it and rebuild from scratch
so they won't have to google what to do (there's still not much can be done in that case)
Oct 7 2018
I never experienced such corruption, though, but sanity check shouldn't hurt.
I would like to comment on that and add my 2 cents:
Oct 6 2018
I've found the relevant bug: https://bugs.kde.org/show_bug.cgi?id=333678
That doesn't looks like it works all the time: sometimes I still have this issue.
Apparently, it still can happen that even when the process is finished, signal is emitted, but the thread is not finished yet.
I can look at these, shouldn't be too hard.
Weird, bug 364858 should be fine as it is - user asks for a way to display files Baloo is currently processing, but that's exactly what monitor does (even without this patch).
Oct 5 2018
Oct 4 2018