Prevent rescheduling aborted jobs until all jobs are processed
ClosedPublic

Authored by TallFurryMan on Feb 27 2019, 5:17 PM.

Details

Summary

Add Ekos Scheduler option for altitude cutoff of setting objects.
Fix completion time restriction violation not showing startup time.
Make Scheduler not re-schedule aborted jobs while running.

Test Plan

Create a first scheduler job with a restriction that will trigger soon, with a duration that will make it violate the restriction.
Create a second scheduler job that will run without restrictions just after the first.
Start the Scheduler, observe as the first job executes, then aborts because of the restriction.
Without the fix, the aborted job is rescheduled to the next available time, and pushes the second job afterwards.
With the fix, the aborted job is not rescheduled, and the second job starts.

Test with schedule lists made of:

  • one single job: aborted job is rescheduled, eventually to the next day.
  • multiple jobs: aborted job is left at the same index, next scheduled is run.
  • multiple jobs with a single one valid: aborted job is rescheduled as if alone.

Observe as complete/invalid/error jobs are reordered at the end of the list automatically. This was improving performance, but might not be needed anymore.

Diff Detail

Repository
R321 KStars
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
TallFurryMan created this revision.Feb 27 2019, 5:17 PM
Restricted Application added a project: KDE Edu. · View Herald TranscriptFeb 27 2019, 5:17 PM
Restricted Application added a subscriber: kde-edu. · View Herald Transcript
TallFurryMan requested review of this revision.Feb 27 2019, 5:17 PM
TallFurryMan planned changes to this revision.Feb 27 2019, 5:18 PM

I upload this commit as first step to fixing the issue. Work is still needed to reschedule the aborted jobs when all runnable jobs are complete.

I tested it with the first job with an Alt restriction of 15deg, but it seems not to work. The scheduler changes to sleeping mode when Alt reaches approx. 18deg. Bug or feature?

I would have expected abort of the first job and continuing the second one. What type of restriction do you use?

TallFurryMan added a comment.EditedMar 7 2019, 8:02 AM

When Scheduler assigns a startup date to the job, it considers the altitude restriction and adds the altitude cutoff, then checks if the job has time to run.
On the opposite, when Scheduler is executing the job, it only aborts it when the altitude restriction is crossed. The altitude cutoff is not considered.
Because that's confusing to the end-user, I first provided the customization of that cutoff as an option. That's not extremely clearer that way, but at least a simpler test can be run by setting the cutoff to zero.

As I mentioned, the case that is of interest is the situation where the job has been executing for some time, and crosses the altitude restriction (or whatever restriction or operational issue that causes it to abort occurs).
The runtime part of Scheduler marks it aborted, and the first block of evaluateJobs bypasses its re-evaluation. The job is then removed from the sorted list before evaluating the schedule.
Yes, it ends up just after the last job that can still be run, but there's not much more the Scheduler can do while running.

So I'll be considering the JOB_ABORTED as a special case that must remain in the list in the original order.
When only JOB_ABORTED jobs remain in the list, I'll reset them to JOB_SCHEDULED.
It means that completed/error/invalid jobs will go to the end of the list to avoid disturbing the schedule, and that the order for aborted jobs will be preserved until the end.

I am still testing locally. If it makes clearer sense to leave ALL jobs in the same order, I'll do that.

Hm, I am not so sure whether such a cut-off makes it better. At the end we have two parameters that describe the same limit. For me it was confusing, because I was not aware of the additional cut-off parameter.
Regarding re-sorting completed/error/invalid jobs, I would vote against. We have two modes for the scheduler: either manual sequence or automatically sorted. In the manual sort, I would prefer no change of the order at all. With the automatic sort, that's another story, there it makes sense.

Updated the patch to properly handle aborted jobs.
Now, we can discuss whether it makes sense to reorder completed/invalid/error jobs.
The altitude cutoff, while confusing, remains important to avoid focusing/aligning and finding the job under the restriction when starting capture.

Ok, so I'm bit late to join this discussion. Is this now rebased on D19528 and the discussion that followed there?

D19528 is needed, and is independent of D19393. D19528 does fix a guiding recovery issue, but relies on D19393 to fix a problem with aborted jobs being rescheduled and pushing following jobs away in that situation.
The current discussion on D19393 is about whether all jobs should be left in the original schedule order or completed/errored/invalid jobs should be moved at the end. The current patch implements the second item, but should still be tested. I can do the first item as a later improvement over the original 3.0 design.
I'd say D19528 and D19393 should be merged at the same time in that order. In terms of merge, however, they are independent and should not cause conflicts.

TallFurryMan edited the test plan for this revision. (Show Details)
wreissenberger accepted this revision.Mar 8 2019, 9:04 PM

Looks good, both restarting aborted jobs and the cutoff limit work fine. If I am the only one who doubts the necessity of a cutoff for the altitude limit, let's go. Maybe a hint in the hover help about the existence of the parameter might be helpful.

This revision is now accepted and ready to land.Mar 8 2019, 9:04 PM

Ah yes I forgot to comment on this. I think the feature is important because for now we cannot estimate properly how long preliminary steps will take. So that's one more of them for the altitude. It was there before, I don't remove it and I make it customizable, as I will for others. This said, I fully agree it is quite hidden for now. I want to rework the UI of the Scheduler to better manage all these parameters and I will take this into account. But before that, I will follow up on your attempts to divide the code of the Scheduler because frankly that file is horrible :)

Well, horrible is such a strong word :-P Fully agree! Remember, the idea was making the scheduler job more intelligent. Shall I give it a try?

Shall I give it a try?

Sure, and I have suggestions :) let's do this really step by step. At the extreme, because the source is so complex, I'd say let's go function by function, ensuring that they match features.

There are different stages I think:

  • Frame map consolidation. There is a local part in the job, and a consolidation part in the scheduler. Here job can be considered independent from scheduler I think. But it's easy to add regressions, as we observed last year.
  • Astronomical calculations for restrictions. It was difficult to make those independent from each other, and they rely on scheduler properties, so prototypes will be tricky. Besides, calculating some values in the future like dawn and dusk is still completely bugged. We won't be able to do that in one step.
  • Job state manipulation. This was mostly covered last year, but may still be fragile.
  • Serialization to/from storage. We had a discussion last year about switching to QT serialization instead of manipulating files. Should be a good opportunity to move forward.
  • Relation with sequence files. If we move that part we can do far better in terms of storage organization. For instance, it's annoying to require the end-user to build a sequence file with a target folder before a scheduler job can be started. The sequence file would still need the correct CCD to be connected to Ekos prior to be used, but at least we could start templating sequences used in the scheduler, replacing a few values only. That is too difficult to do for now with the code as it is.

As a side note I want to make all jobs except the running one editable while scheduler is running, so that we'll be able to read schedules from an external source. Will be easier when code is moved away.

wreissenberger added a comment.EditedMar 10 2019, 11:31 AM

Shall I give it a try?

Sure, and I have suggestions :) let's do this really step by step. At the extreme, because the source is so complex, I'd say let's go function by function, ensuring that they match features.

Fully agreed! Wouldn't it better if we lead this discussion in a separate thread or on the kstars forum?

As a side note I want to make all jobs except the running one editable while scheduler is running, so that we'll be able to read schedules from an external source. Will be easier when code is moved away.

Take a look at D19648 - at least we can now inspect other jobs while they are running.

mutlaqja accepted this revision.Mar 17 2019, 5:09 PM
This revision was automatically updated to reflect the committed changes.
TallFurryMan added a comment.EditedDec 4 2019, 4:13 PM

EDIT: wrong ticket......