- User Since
- Feb 14 2017, 10:36 AM (126 w, 6 d)
Fri, Jul 19
Wed, Jul 17
Sun, Jul 14
Thu, Jul 11
Apparently, it does fix bug 409257, which is pretty serious one (db corruption, after all).
Sun, Jun 30
Note that it somewhat duplicates work done inside QueryParser. Which is almost not used anywhere - most of the parsing is done by AdvancedQueryParser.
Not entirely sure, but bug 409257 might be caused by that (at least in my case, it looked the same).
Fri, Jun 28
I've found a way to reproduce a related issue:
$ mkdir ~/test $ balooctl config add includeFolders ~/test $ balooctl stop <make some changes with ~/test, i.e. add a tag> $ balooctl start
Jun 21 2019
Jun 16 2019
Rebase on master.
As the limit is somewhat arbitrary, maybe we can just limit the QString? I don't think this has any serious side effects.
Actually, there is an issue with that code right now, which I wanted to fix, but forgot.
The trimming part finalArr = finalArr.mid(0, maxTermSize); actually should be performed on QString instead of QByteArray - unicode symbols inside term can consist of two bytes, and cutting at maxTermSize bytes can actually cut half of last symbol. I end up with terms like тождественно� inside balooshow -x.
Not to mention that russian terms end up being pretty small.
Jun 10 2019
Apart from small nitpick, I think it's fine.
@bruns what do you think about it?
Can we also cover this case with tests?
That's a nice catch!
I thought about it myself. I googled it a bit (i.e. here) and saw that there might be some quite unwanted runtime overhead because of using std::function. It might be negligible (since we're doing some costly DB operations inside anyways), but I'd prefer if we did some profiling to make sure it's OK.
Makes sense to me
Jun 4 2019
I didn't know QFileInfo fetches information on demand (and caches it). That is the reason for this change, right?
I think it would be nice to elaborate on that in summary, or maybe as a brief comment in the code.
Jun 3 2019
Use single temp dir
Fixed comment with directory structure
Moved m_nameChanged check inside separate block
Split test to three separate test functions, which cover different test cases.
May 31 2019
May 29 2019
May 27 2019
May 25 2019
I've spent some time to implement structure I propose (the code is available in private clone).
Most notable changes:
Apr 7 2019
I didn't realize LMDB does not modify pages; and if we change one, it creates a new one, copies the data from the old page, modifies it and marks old one as "dirty".
In that case we'll most likely end up modifying a single page in both cases.
Apr 5 2019
Just because you hide the RMW, it does not mean it does not happen. For LMDB, duplicate keys are just a plain array, see e.g. MDB_APPENDDUP in http://www.lmdb.tech/doc/group__mdb.html#ga4fa8573d9236d54687c61827ebf8cac0.
The data entries are sorted by LMDB as soon as you do a mdb_put, and this is of course a RMW cycle.
I thought the structure behind MDB_DUPSORT is a bit more clever that just a plain array, i.e. it's still a search tree.
And insertion still should cost much less compared to what is done now (fetch whole list + insert + put it back).
Of course it still might do some RMW work (tree rebalancing, for example), my point is that it should be more efficient.
Apr 4 2019
I'd like to revive this discussion.
Mar 20 2019
Mar 19 2019
Mar 17 2019
Feb 23 2019
I've looked through the patch (quite large indeed), apart from the single note I think it's good to go.
Feb 15 2019
Forgot to define fname inside EventMoveTo
Updated comment, removed duplicated QFile::decodeName
Feb 12 2019
Feb 6 2019
Something like that? I've decided not to emit created signal from inside the function, just to have a bit less branching in the code (and documented this behavior, since it might be a bit confusing)
Added recursive iteration over all contents for Create event as well
Feb 5 2019
Added code to work with first entry that pops from FilteredDirIterator (that is the directory itself)
Test still works; but we should emit created() signal for it as well.
I am not sure if I understood your description correctly, but I am quite sure this race condition does not exist - the files/folders inside the moved folder are not created/moved one by one, but the containing folder ist just "renamed" - it is unlinked from the old parent and linked into the new one, atomically.
Sure. That concern corresponded only to the second note - if moving from another device, system has to do actual copy/move.
Feb 4 2019
Explained the race condition in summary, expanded test to check if watches were installed correctly.
Feb 3 2019
Feb 2 2019
Nice! I like it, it's definitely much better than Q_ASSERT_X macros that are just silently ignored in non-debug builds.
Dec 18 2018
Nov 21 2018
I believe can do something better here.
I think if we stick to canonical paths everywhere, and resolve symlinks ASAP (but still follow them), that might solve all the problems.
Nov 15 2018
It seems like you've pushed something that was not intended to be pushed (XML extractor parts)
Nov 14 2018
Apart from trivial comment, this looks fine. I've tested it on my setup (with bunch of (e)ps files), and randomly chosen files seems to be indexed nicely. It also reduced the size of the index by almost 50MB, because those are not indexed as plaintext anymore :)
Yet I would also vote for replacing it (eventually) with a full-featured extractor based on libspectre
(I'm not a security specialist in any way, but that CVE doesn't look too harmful, and from my point of view it's not worth to abandon full support of (E)PS because of it)
It's a bad idea to removeRecursively starting from root of the tree (documentid 0).
If user has indexed /home/username folder, there is also an index entry for /home (that's how IdTreeDB works).
However, /home should not be indexed, according to checks (because it's not in includeFolders, while /home/username is)
This will lead to removeRecursively("/home") call, which will wipe index for /home/username as well.
Rebase on master
Oct 30 2018
Yep, fine by me
That's nice! I'll test it a little.
Sorry, I've postponed this one for some time :(
Oct 20 2018
BTW. Is there any reason why i.e. inside PostingDB Baloo stores key -> encoded list of values, instead of using MDB_DUPSORT | MDB_DUPFIXED (which allow to store multiple values for keys)?
Dropped in favor of D16265: [Scheduler] Use flag to track when a runner is going idle, which handles this problem better.
I like it, it's better than D15959: Wait for the extraction process to finish before scheduling.
And it seems to be working, as far as I can see :)