Allow overriding to disable auto language detection
ClosedPublic

Authored by sdepiets on Jan 1 2020, 3:06 PM.

Details

Summary

Currently, if language detection is enabled at Sonnet-wide level level, there is no way to disable it in a specific context.

In Lokalize for instance, the language of the source and target strings are already known by the application.

In that context, the language detection feature creates a lot of uncertainty in terms of spellchecking, as it may change quickly from one language to another or from one dictionnary to another within the same language. A user should not have to disable Sonnet-wide language detection to work around this, as it can be very useful elsewhere.

https://bugs.kde.org/show_bug.cgi?id=394347
BUG:394347

Diff Detail

Repository
R246 Sonnet
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
sdepiets created this revision.Jan 1 2020, 3:06 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptJan 1 2020, 3:06 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
sdepiets requested review of this revision.Jan 1 2020, 3:06 PM
sdepiets edited the summary of this revision. (Show Details)Jan 1 2020, 3:12 PM
sdepiets added a reviewer: Frameworks.
mludwig added a subscriber: mludwig.Jan 1 2020, 4:44 PM

I'd support such a change for the BackgroundChecker as well. KTextEditor needs a way to do its own language detection or override the detected language.

sdepiets updated this revision to Diff 74851.Feb 2 2020, 10:09 AM

Allow overriding to disable auto language detection in BackGround Checker

I'd support such a change for the BackgroundChecker as well. KTextEditor needs a way to do its own language detection or override the detected language.

I've updated the diff

@mludwig: ok with the current patch?

Can we get some traction on this?
It's a relatively light change that doesn't affect the default Sonnet behavior.

aacid added a subscriber: aacid.Feb 29 2020, 8:15 AM

without having used Sonnet much, this seems the wrong API to me.

Are you saying that it can happen that you tell Sonnet "use this language" and it goes and say "nah i'll ignore you and do my thing".

It seems to me that what would make sense is that the "use this language" is what sets autoDetectLanguageDisabled to false, or at least the function that should get a new overload saying setLanguage(¿qstring? language, bool disableAutoDetection) if we want to be sure not to change existing behaviour

without having used Sonnet much, this seems the wrong API to me.

Are you saying that it can happen that you tell Sonnet "use this language" and it goes and say "nah i'll ignore you and do my thing".

Yep, that's right, and it won't even tell you which language it has detected. When the autodetection feature was introduced, it fundamentally changed the behaviour of Sonnet, breaking KatePart's spell checking in the process.

It seems to me that what would make sense is that the "use this language" is what sets autoDetectLanguageDisabled to false, or at least the function that should get a new overload saying setLanguage(¿qstring? language, bool disableAutoDetection) if we want to be sure not to change existing behaviour

Ideally, the autodetection feature should be redesigned as it is only usable for simple text fields in its current form. KatePart requires a more sophisticated way of detecting languages(s).

mludwig added a comment.EditedMar 1 2020, 8:37 AM

@Simon: If you add a call 'setAutoDetectLanguageDisabled(true)' to 'BackgroundChecker::changeLanguage' and 'BackgroundChecker::setSpeller', you should be able to take Albert's suggestion into account.

For the Highlighter it should be similar.

sdepiets updated this revision to Diff 76992.Mar 5 2020, 8:33 AM

Disable autodetect on language change

aacid added inline comments.Mar 7 2020, 9:33 AM
src/core/backgroundchecker.cpp
134

question from someone that has never really used this setSpeller API.

Is this behaviour change for sure 100% of the times wanted?

If not you should add a new API that is setSpeller(speller, bool)

166

same question here

I don't see a use case where you would want to set the Language, but then let the system override it (and only if the user has auto-detection).
I left the option in the API to "setAutoDetectLanguageDisabled" to false, for instance if you are not sure which language is being used anymore and want to let the system re-decide by itself.

aacid added a comment.Mar 7 2020, 11:17 AM

Would it make sense to enshrine this behaviour with an autotest?

Would it make sense to enshrine this behaviour with an autotest?

Sorry what do you mean by that? Similar to tests/BackgroundTest.cpp ?

Would it make sense to enshrine this behaviour with an autotest?

Sorry what do you mean by that? Similar to tests/BackgroundTest.cpp ?

No, something like the stuff autotests/

one example would be something like, use a text written in french that the autodetect algorithm properly detects as french, but then you go and force say German on it and then you go and ask for which language is and it should say German when autodetetaLanguageDisabled = true but then if you go and say setAutoDetectLanguageDisabled(false) again it should say again French

Am i making sense?

sdepiets updated this revision to Diff 81176.Apr 25 2020, 2:31 PM
  • Merge branch 'master' of git.kde.org:sonnet
aacid added a comment.Apr 25 2020, 2:39 PM

You need to update the \since markers?

After that i guess you can commit unless anyone really disagrees?

@mludwig @cullmann ?

sdepiets updated this revision to Diff 81177.Apr 25 2020, 2:51 PM

Bump since + autotest checks the language as well

aacid accepted this revision.May 3 2020, 3:19 PM

Noone seems to really disagree.

Let's merge it. @sdepiets I hope this doesn't break everything ;)

This revision is now accepted and ready to land.May 3 2020, 3:19 PM
This revision was automatically updated to reflect the committed changes.
dfaure added a subscriber: dfaure.Jun 13 2020, 12:51 PM

This actually breaks language auto-detection for me in the KMail composer.

Testcase:

  • New Mail
  • I type "Bonjour," in the body

Before:
It's detected as French and not underlined as a typo

After:
The language remains English, and the word is underlined as a typo

Workaround:
Tools / Spelling / change language from English to French

Too bad I'm realizing this is the reason for the regression only today (day of 5.71 release) by looking at the KF-5.71 changelog :(

This actually breaks language auto-detection for me in the KMail composer.

Testcase:

  • New Mail
  • I type "Bonjour," in the body

    Before: It's detected as French and not underlined as a typo

    After: The language remains English, and the word is underlined as a typo

    Workaround: Tools / Spelling / change language from English to French

    Too bad I'm realizing this is the reason for the regression only today (day of 5.71 release) by looking at the KF-5.71 changelog :(

I don't think that's a regression, in the previous behavior you could try to set any language to proofread, it would always auto-detect "Bonjour" as French, thus the "Tools / Spelling / change language" had not effect if autodetect was enabled at system level (while autodetection should be an application or even case by case decision).

It may not seem like an issue for simple cases, but actually for mixed contents (i.e. an email that is 50% French, 50% English) that would be detected as English, you would have no way at all to check the French text without disabling system-wide autodectection.

Calling setAutoDetectLanguageDisabled(false) restores the previous behavior

aacid added a comment.Jun 13 2020, 4:07 PM

This actually breaks language auto-detection for me in the KMail composer.

Testcase:

  • New Mail
  • I type "Bonjour," in the body

    Before: It's detected as French and not underlined as a typo

    After: The language remains English, and the word is underlined as a typo

    Workaround: Tools / Spelling / change language from English to French

    Too bad I'm realizing this is the reason for the regression only today (day of 5.71 release) by looking at the KF-5.71 changelog :(

I don't think that's a regression, in the previous behavior you could try to set any language to proofread, it would always auto-detect "Bonjour" as French, thus the "Tools / Spelling / change language" had not effect if autodetect was enabled at system level (while autodetection should be an application or even case by case decision).

It may not seem like an issue for simple cases, but actually for mixed contents (i.e. an email that is 50% French, 50% English) that would be detected as English, you would have no way at all to check the French text without disabling system-wide autodectection.

Calling setAutoDetectLanguageDisabled(false) restores the previous behavior

The problem here is that David probably never set any language on purpose.

you could argue that this is a deficiency of the UI, should have a checkbox for "enable spellchecking" and another for "enable spellchecking *exactly* in this language"

And this is were we failed, we changed the default behaviour and that's probably not the greatest of the ideas in retrospect. Even if we could not see why anyone would set a language and would still want the auto language detection to be enabled, well it seems that at least David wanted because it's been the behaviour for ages :D

Not sure what's the correct path forward though :/

I don't think that's a regression, in the previous behavior you could try to set any language to proofread, it would always auto-detect "Bonjour" as French, thus the "Tools / Spelling / change language" had not effect if autodetect was enabled at system level (while autodetection should be an application or even case by case decision).

Autodetection *is* an application setting, it's in Settings / Spellchecker... / [x] Enable autodetection of language

It may not seem like an issue for simple cases, but actually for mixed contents (i.e. an email that is 50% French, 50% English) that would be detected as English, you would have no way at all to check the French text without disabling system-wide autodectection.

In my experience a mixed email was perfectly handled, with each paragraph having a different language.
See http://www.davidfaure.fr/2020/Screenshot_20200613_194827.png

But I think you're talking about the spellchecking dialog (which I never use, except this month because of the need to find a workaround).
My concern is for autodetection and background spellchecking. If a fix for the spellcheck dialog introduces a change of behaviour for the background spellchecking, that's a regression.

Calling setAutoDetectLanguageDisabled(false) restores the previous behavior

Changes in KF5 should NOT require applications to adapt in order to restore previous *working* behaviour.

The problem here is that David probably never set any language on purpose.

Right.
In fact, if I open the settings dialog, it has "français" checked in the top listview, and only that one.
Meanwhile the Tools / Spelling dialog shows "American English (UnitedStates)" in the combobox.
This particular issue happens even with this patch reverted. Something's rather broken.

you could argue that this is a deficiency of the UI, should have a checkbox for "enable spellchecking" and another for "enable spellchecking *exactly* in this language"

Not sure I understand this.

And this is were we failed, we changed the default behaviour and that's probably not the greatest of the ideas in retrospect. Even if we could not see why anyone would set a language and would still want the auto language detection to be enabled, well it seems that at least David wanted because it's been the behaviour for ages :D

*I* don't really set a language. Maybe KMail does though. @mlaurent might know more?

Please see below proposal to restore the previous default behavior

https://invent.kde.org/frameworks/sonnet/-/merge_requests/1