Fix Scheduler slewing stage and dome interaction.
ClosedPublic

Authored by TallFurryMan on Sep 16 2018, 12:06 AM.

Details

Summary

Fixed dome isMoving property, which was declared as method.
Moved dome movement check to checkJobStage to avoid a lock-up in the case the dome has not finished moving when the mount has.
That lock-up would happen because the dome check was done upon notification of the end of the slew, and that no further notification would come to test the dome again.
Moving that code block also avoids duplicating it for stages "STAGE_SLEWING" and "STAGE_RESLEWING".
Admittedly, when there is no dome in the setup, the next action could be taken directly in the mount callback, but the cost of deferring this to checkJobStage is light.
Moreover the use of the intermediate stages "STAGE_SLEW_COMPLETE" and "STAGE_RESLEWING_COMPLETE" clarifies the code of the state machine.

Test Plan

Use test vector "simple_test_no_twilight", ensure there is a dome in the INDI profile and that it is not required to unpark.
Without the change, Scheduler starts slewing to the first target and hangs there doing nothing.
With the change, Scheduler completes the first target successfully.

Stop the Scheduler, check the Unpark box for the dome, restart the Scheduler.
Without the change, Scheduler hangs at the INDI property check.
With the change, Scheduler hangs for 30 seconds, then restart both indi and Ekos to recover the dome connection.

Diff Detail

Repository
R321 KStars
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
TallFurryMan created this revision.Sep 16 2018, 12:06 AM
Restricted Application added a project: KDE Edu. · View Herald TranscriptSep 16 2018, 12:06 AM
Restricted Application added a subscriber: kde-edu. · View Herald Transcript
TallFurryMan requested review of this revision.Sep 16 2018, 12:06 AM

SchedulerJob::STAGE_SLEWING is handled in an event driven fashion in setMountStatus(). I don't think we should be using slewStatus() for this, the mount overall status should suffice (ISD::Telescope::STATUS). The reason I moved SLEW_RESLEWING to checkJob status is that in the event when reslewing is not necessary (i.e. mount is already at location so when you tell it to reslew, its state would remain as is (TRACKING), and therefore won't emit a signal that we can catch in setMountStatus) then it uses polling to check mount status and then proceed. But I agree with your dome check changes they make sense. Right now, STAGE_SLEWING is checked twice in your code, in checkJobStatus and in setMountStatus

Don't get me wrong, I completely agree with your changes. I am using slewStatus as a mean to check the mount is still there, because there's nothing else I can rely on. The problem is really to manage, on one hand, the best case, that is, the interface has a working connection with the device, and the problematic case, that is, the interface lost contact but because no one is requesting anything, no event will ever come up.

I believe my changes are temporary, as indicated in the FIXME: we need an event that says "something is wrong", even if the interface is not requested to do anything. If you can provide an event source that goes to setMountStatus that notifies such loss, we can build on that.

Right now, I chose to poll slewStatus to preserve the existing feature, but from the point of view of the Scheduler we cannot be sure the mount connection is checked when using that property. We need a dedicated event.

Besides, I think there will be other parts of the code that check the job stage independently. Right now nothing stops guiding when the mount suddenly slews. Also, it's still possible to guide and align at the same time, to dramatic effect. As I mentioned in a forum thread, guiding corrections do break the mount model.

I note only now that I left a vi :wq in the title :)

Oh I apologize!! I made the wrong conclusion looking at the diff! I just applied it now so it's clear. There is one minor change with casting, but the issue is that you moved STAGE_RESLEWING back to setMountStatus(...). The problem is we are now waiting for an EVENT from the mount, but most likely there is no new event after ALIGN_COMPLETE. Since by ALIGN_COMPLETE, the mount would be TRACKING. I actually now do not recall why we need RESLEWING stage? I just looked at alignment module and it would only emit ALIGN_COMPLETE once the diff is within acceptable limits. Maybe before, it used to do a last slew to target? Not sure.

kstars/ekos/scheduler/scheduler.cpp
6511

Maybe use dynamic_cast here? I get warnings of "old cast"

TallFurryMan planned changes to this revision.Sep 16 2018, 4:25 PM

Actually, I moved the dome check to checkJobStage to avoid code duplication, and to fix the concurrency issue. I'm not clear on why you think this is a problem?
This said, I didn't check the alignment stage. But because the tracking stage was not working, I'm not surprised the alignment stage isn't neither!

I think the objective of RESLEWING is to reposition the target in the case a failed guiding calibration did offset the target.
This is a good idea, but I don't think it was fully implemented.

I'll fix the cast, sorry for this unfortunate change.

I clearly remember it was related to alignment having the mount still in motion after it emits ALIGN_COMPLETE long ago. I don't think that alignment emits that signal before it is completely stopped now. It was never related to guiding.

Fixed raw cast on local time.

I was thinking about https://github.com/KDE/kstars/commit/c5ef911920df5a4d47e2def464c41924ffc18c12, but this doesn't make use of RESLEWING. But because there is not much doc on this change and the state machine, I'm not sure how this was supposed to work.

TallFurryMan retitled this revision from Fix Scheduler slewing stage andwq dome interaction. to Fix Scheduler slewing stage and dome interaction..Sep 17 2018, 9:12 AM
TallFurryMan planned changes to this revision.Sep 18 2018, 4:37 PM

On hold until I really understand how QT_PROPERTY works.
Modifying the interface xml apparently has no impact on the resulting executable, and thus my test is not effective on dome's isMoving.
It just happens that the value returned by the QVariant conversion is false, and no actual communication with the mount interface is taking place through the properties!

OK, so properties are straightened up (not all of them) and I understand better the design.
I'm working on the issue that if the mount/dome/dustcap park/unpark is not checked, properties don't come up.
If scheduler is stopped, then those boxes are checked, then scheduler is restarted, scheduler hangs waiting for either one.
Mitigation is possible, but stopEkos does not reset its state properly, and cannot recover.
Probably a regression. I'm about to break through on this.

TallFurryMan edited the summary of this revision. (Show Details)Sep 28 2018, 8:44 PM
TallFurryMan edited the test plan for this revision. (Show Details)
TallFurryMan edited the summary of this revision. (Show Details)
TallFurryMan edited the test plan for this revision. (Show Details)

Updated differential.

Mmmh, dome recovery is not working entirely yet. But this is a first step. It can be merged, or I can push an update if I find the time to fix.

My problem is that I can't seem to regenerate mocs properly. The XML used by the mocs are generated on-the-fly, supposedly from the class header and not from the source XML, but while dustcap, mount and other interfaces create the ready() signal flawlessly, the dome interface will just not get it through. And I don't know why yet, which is irritating...

THanks I'll check the dome interface later today to find out if there is a problem in it.

Just tested with dome simulator and it worked flawlessly. Maybe you need to delete the complete build and rebuild everything?

This revision was not accepted when it landed; it landed in state Needs Review.Sep 29 2018, 11:32 AM
This revision was automatically updated to reflect the committed changes.

Ok tested and this fixed the dome/slew issue in the scheduler. The only issue I saw was that if "Remember Job Progress" is not checked, then it would restart the job indefinitely as I reported before.

I'm working on the job termination issue. Will fix all of them at once. I'll first debug the cmake stuff with the moc generation, perhaps we could remove the XMLs in the source and install the generated XML interfaces instead. If that's already the case, I'll just drop the source XMLs.