Rewrite Baloo indexing engine
Open, Wishlist, Public

Description

Baloo is used as the main indexing engine and provides search for other apps like Dolphin.

But it fails to deliver on its promise, because it never finds the files being searched for the way KFind does.

The problem is that many core apps rely heavily on it, so any search-related bug gets attributed to Baloo.

And when you browse the Baloo bugs, most of them are still open because no valid fixes are possible, given how unstable and fragile Baloo is at the moment.

This situation is really urgent because correctly searching files is one of the main functions that a file manager should provide in any desktop.

So Plasma really needs a new, well-designed search engine that is solid and correct, to match the wide adoption of the Plasma desktop.

medmedin created this task. Jun 6 2024, 11:46 PM
lydia added a subscriber: lydia. Jun 7 2024, 8:40 AM

Please rewrite this using the template provided in T17356.

lydia triaged this task as Wishlist priority. Jun 14 2024, 6:32 PM
mrjulius added a subscriber: mrjulius. Edited Jul 5 2024, 7:55 AM

The Baloo framework was actually one of the things I fell in love with in KDE. Yet I sometimes miss the somewhat better working indexed search of Windows 7 (in an otherwise abusive relationship).

In my case, the problem you describe was mainly due to Baloo not supporting indexing of any encoding other than UTF-8 (such as ANSI/Latin-1/ISO-8859-1), which left heaps of files unindexed.

Furthermore, these additions to Baloo would take it to the next level in practical value:

  • Refining and clarifying the query syntax (taking the whole path into account, optional fuzzy search, or advanced syntax using wildcards, AND/OR logic, etc.)
  • Indexing of mbox & maildir (thus also supporting Thunderbird, which is the most popular email client)
  • Providing examples (for example in Python) of how to query the Baloo database (to allow future developers to learn and become familiar with Baloo by creating their own CLI tools and scripts, and to encourage the community to experiment with open source virtual assistants, for instance by plugging Baloo into LlamaIndex)

Note that this is more of a feature request/rant, not really a goal. You also don't list any champions, so it's not even eligible to vote.

@redstrate

It's not a rant, it's the result of many attempts over four years of using it with various PIM apps. Baloo and Akonadi are the only two weaknesses that give the Plasma desktop a bad reputation.

Perhaps it's not my place to say, as I'm currently just a non-contributing user, but to me this observation by medmedin is very valid, despite not yet being eligible for voting.

If there is room for conversation, the problem will be better understood, ideas will come from the community, and champions will then emerge.

Personally, I'd feel quite obnoxious declaring myself a champion to solve a problem without first discussing it and gaining a proper understanding of earlier community efforts and challenges. On the other hand, discussing it with others could engage people and give them the encouragement needed to roll up their sleeves.

If this is not the place to talk about the issues, but simply to pitch and vote, I apologize and will refrain from giving any further feedback here.

frdbr added a subscriber: frdbr. Jul 29 2024, 3:54 PM
bruns added a subscriber: bruns. Aug 2 2024, 6:58 PM

@redstrate

It's not a rant, it's the result of many attempts over four years of using it with various PIM apps. Baloo and Akonadi are the only two weaknesses that give the Plasma desktop a bad reputation.

The fact that PIM and Baloo are totally unrelated only underlines the point that you are ranting.

bruns added a comment. Aug 2 2024, 7:12 PM

The Baloo framework was actually one of the things I fell in love with in KDE. Yet I sometimes miss the somewhat better working indexed search of Windows 7 (in an otherwise abusive relationship).

In my case, the problem you describe was mainly due to Baloo not supporting indexing of any encoding other than UTF-8 (such as ANSI/Latin-1/ISO-8859-1), which left heaps of files unindexed.

Content extraction is not the domain of Baloo itself, but of KFileMetadata. For formats that specify their encoding, other encodings have always worked. And for plain text and HTML, the encoding detection has been enhanced recently and made more robust (e.g. when a nominally UTF-8 document contains invalid code sequences).
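
To illustrate what such a fallback can look like in general, here is a minimal Python sketch. It is not KFileMetadata's actual implementation; the function name `decode_text`, the `max_bad_ratio` threshold, and the choice of Latin-1 as the fallback are all assumptions made for the example. It tries strict UTF-8 first, tolerates a small number of invalid sequences in a nominally UTF-8 document, and otherwise falls back to a single-byte encoding.

```python
def decode_text(raw: bytes, max_bad_ratio: float = 0.01) -> tuple[str, str]:
    """Return (text, label) for raw bytes of unknown encoding.

    Hypothetical helper for illustration only; the threshold and the
    Latin-1 fallback are assumptions, not KFileMetadata behaviour.
    """
    # 1. Clean UTF-8: the common case.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # 2. Nominally UTF-8 with a few invalid sequences: keep the text and
    #    replace the damaged bytes with U+FFFD.
    repaired = raw.decode("utf-8", errors="replace")
    bad = repaired.count("\ufffd")
    if bad <= max_bad_ratio * max(len(repaired), 1):
        return repaired, "utf-8 (repaired)"

    # 3. Too many errors: assume a single-byte encoding. Latin-1 maps
    #    every byte to a code point, so this step cannot fail.
    return raw.decode("latin-1"), "latin-1"
```

For example, `decode_text("café".encode("latin-1"))` ends up in the third branch, while a long ASCII text with one stray `0xFF` byte is kept as repaired UTF-8.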

Furthermore, these additions to Baloo would take it to the next level in practical value:

  • Refining and clarifying the query syntax (taking the whole path into account, optional fuzzy search, or advanced syntax using wildcards, AND/OR logic, etc.)

These are two orthogonal topics:

  • Documentation: AND/OR, for example, are already supported.
  • Feature additions: wildcards etc. can be supported well with the existing engine.

  • Indexing of mbox & maildir (thus also supporting Thunderbird, which is the most popular email client)

Again, mbox is content, so a topic for KFileMetadata, not Baloo.

On the other hand, Akonadi, for example, already has its own full-text search engine; duplicating that would just be wasteful.

  • Providing examples (for example in Python) of how to query the Baloo database (to allow future developers to learn and become familiar with Baloo by creating their own CLI tools and scripts, and to encourage the community to experiment with open source virtual assistants, for instance by plugging Baloo into LlamaIndex)

The DB itself is *not* public API - anything else would block any evolution of the DB format. If you want access to more DB contents:

  1. Write down your requirements
  2. Discuss what an API could look like
  3. Propose this API, and then add e.g. Python wrappers around it.
This comment was removed by medmedin.

@bruns
Thank you for taking the time to respond and give pointers. I will keep familiarizing myself with the project.

frdbr added a comment. Aug 12 2024, 7:41 PM

Hello,

Please note that the deadline is just around the corner, on Wednesday, so now is the time to finalize your proposal. Remember that proposals without a Goal Champion will be disqualified, so this step is crucial to ensure your idea moves forward. If you need help or have any questions, please let me know.

If you’re unable to finish your proposal but still want to participate, consider contributing to other ongoing tasks.

Thank you for submitting your ideas for the KDE Goals!