Fix execution of repeating scheduler jobs.
Closed · Public

Authored by TallFurryMan on May 20 2018, 8:44 PM.

Details

Summary

This change properly executes jobs when they repeat or are duplicated.
While checking the interesting edge case of captures stored in /tmp being immediately deleted, I made sure that jobs evaluated with no repeats remaining, but for some reason still not completed, are properly re-checked for captures. I also added a safeguard to keep the remaining repeat count from unexpectedly going negative.
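The evaluation rule described above can be sketched roughly as follows. This is a minimal illustration only: `NextStep`, `consumeRepeat` and `evaluate` are hypothetical names, not identifiers from the KStars scheduler, and the real code tracks considerably more state.

```cpp
#include <algorithm>

// Hypothetical names for illustration; these are not KStars identifiers.
enum class NextStep { RunBatch, RecheckCaptures, Done };

// Consume one repeat, clamping at zero so the counter never goes negative.
int consumeRepeat(int repeatsRemaining)
{
    return std::max(0, repeatsRemaining - 1);
}

// Decide what to do with a job when the scheduler evaluates it.
NextStep evaluate(int repeatsRemaining, bool capturesComplete)
{
    if (capturesComplete)
        return NextStep::Done;
    if (repeatsRemaining > 0)
        return NextStep::RunBatch;
    // No repeats left, yet captures are missing (for example, files that
    // vanished from /tmp): look at storage again rather than marking the
    // job complete.
    return NextStep::RecheckCaptures;
}
```

With this sketch, a job evaluated with zero repeats remaining and incomplete captures yields `NextStep::RecheckCaptures`, and calling `consumeRepeat` on a counter that is already zero leaves it at zero.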

This change also makes sure job completion is evaluated before a batch repeats. Repeats, in conjunction with the 'Remember Job Progress' option, ensure the proper number of captures is stored. For the same target, if job A has a repeat count equal to 1 and job B has a repeat count equal to 2, the number of captures stored after A and B finish will equal 2 sequence items. This also means that if B runs first, A will not run as it will be considered complete.
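The A/B example can be checked with a small model. This is a sketch under stated assumptions: both jobs use the same sequence and write to the same storage folder, 'Remember Job Progress' is enabled, and completion is judged purely from the number of stored frames. `JobModel` and the helper functions are hypothetical, not the KStars classes.

```cpp
#include <algorithm>

// Hypothetical model, not the KStars classes.
struct JobModel
{
    int repeats;            // repeat count configured on the scheduler job
    int framesPerSequence;  // captures produced by one pass of the sequence
};

// Frames that must exist in the shared folder for the job to be complete.
int requiredFrames(const JobModel &job)
{
    return job.repeats * job.framesPerSequence;
}

bool isComplete(const JobModel &job, int storedFrames)
{
    return storedFrames >= requiredFrames(job);
}

// Frames the job still has to capture, given what is already stored.
int framesToCapture(const JobModel &job, int storedFrames)
{
    return std::max(0, requiredFrames(job) - storedFrames);
}

// Run two jobs in order against one shared folder; return the final count.
int runInOrder(const JobModel &first, const JobModel &second)
{
    int stored = 0;
    stored += framesToCapture(first, stored);
    stored += framesToCapture(second, stored);
    return stored;
}
```

With A as `{1, 1}` and B as `{2, 1}`, `runInOrder` gives 2 stored frames in either order. When B runs first, `framesToCapture(A, 2)` is 0, which corresponds to A being considered complete without running.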

This change also works around an issue with /tmp, where stored captures are immediately deleted for some odd reason, and uses /var/tmp instead for unit tests.

This change also introduces test vector 'distant_jobs_no_twilight', which tests the ability of the scheduler to sleep and park the mount between jobs.
Speaking of parking, it seems the Telescope Simulator is unable to handle parking on my setup. The simulator will always initialize unparked, and will never notify its parking state. It will, however, move and park properly. There is still an issue with the scheduler in a specific edge case, where the scheduler is unable to consider the mount unparked, and will loop until the mount is unparked manually. Because these issues are interleaved, I consider that the problem in the Telescope Simulator must be fixed first.

This change also fixes an issue observed with duplicated repeating jobs, where the optimization on step pipelines was preventing the job from completing and also letting the next job execute. On the subject of capturing, there is still a weird problem with the Capture Tab and the CCD Simulator, where sometimes not all captures of a set of sequence jobs are stored. This is visible with test vector 'duplicated_scheduler_jobs_duplicated_sequence_jobs_no_twilight', in which sometimes only 4 captures are stored on the first job run instead of the expected 7 RGBLRGB. The scheduler is able to mitigate this by noticing there are missing captures, and rescheduling the job to execute again.
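The mitigation amounts to comparing what a run should have stored with what is actually on disk. The following is an illustrative sketch only: the per-filter bookkeeping, the type alias and the function names are assumptions made for this example, not the scheduler's implementation.

```cpp
#include <map>
#include <string>

// Hypothetical bookkeeping: number of captures per filter name.
using CaptureCounts = std::map<std::string, int>;

// Expected counts for a run described as a string of one-letter filters,
// e.g. "RGBLRGB" gives R:2, G:2, B:2, L:1.
CaptureCounts expectedCounts(const std::string &filters)
{
    CaptureCounts counts;
    for (char filter : filters)
        ++counts[std::string(1, filter)];
    return counts;
}

// Total number of captures that are expected but not stored.
int missingCaptures(const CaptureCounts &expected, const CaptureCounts &stored)
{
    int missing = 0;
    for (const auto &entry : expected)
    {
        const auto it = stored.find(entry.first);
        const int have = (it == stored.end()) ? 0 : it->second;
        if (have < entry.second)
            missing += entry.second - have;
    }
    return missing;
}

// The job should be scheduled again if anything is missing.
bool needsReschedule(const CaptureCounts &expected, const CaptureCounts &stored)
{
    return missingCaptures(expected, stored) > 0;
}
```

For the 7-capture RGBLRGB run, a hypothetical first pass that stored one frame per filter (4 files; the actual split is not stated above) leaves 3 captures missing, so `needsReschedule` returns true and the job is queued again.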

This change also fixes - again - the capture count algorithm. Now capture counts are displayed properly in the scheduler queue! This one was tricky, so I added a large documentation block in updateCompletedJobsCount. It's important to note that, while its name doesn't give any hint, this function will also change the behavior of the set of sequence jobs associated with the scheduler job.

Finally this change fixes an issue/regression on scheduler shutdown by rewriting and clarifying how the scheduler decides to stop when evaluating jobs.

Test Plan

Use 'simple_test_no_twilight' to verify the general behavior of the scheduler. The issue with parking the Telescope Simulator can be triggered there. Depending on the time this test is executed, the ability of the scheduler to park and wait or to sleep can be verified too.

Use 'duplicated_scheduler_jobs_duplicated_sequence_jobs_no_twilight' to test completion checks. The issue with partial captures during the first run of a set of sequence jobs can be triggered there. Note the special manipulation of duplicated scheduler jobs, which all return to idle simultaneously in order to be re-evaluated for completion.

Use 'distant_jobs_no_twilight' to test specifically the ability of the scheduler to park and wait for a job that is 12 hours later.

Diff Detail

Repository
R321 KStars
Branch
bugfix__repeated_job_not_scheduling (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
TallFurryMan created this revision. May 20 2018, 8:44 PM
Restricted Application added a project: KDE Edu. May 20 2018, 8:44 PM
Restricted Application added a subscriber: kde-edu.
TallFurryMan requested review of this revision. May 20 2018, 8:44 PM

Sorry, this differential is not final, as stated in the message. Hence there was not - yet - a reviewer.

TallFurryMan planned changes to this revision. May 24 2018, 3:44 PM

Didn't know about "plan changes", thanks.

Nearly ready. Little things here and there to check again and again.

The telescope simulator has a problem with parking. I can't mitigate this in the scheduler, but at least it's behaving as correctly as it can. There's still a state machine issue with parking that I need to take care of.

The CCD simulator plus the capture sequence tab have an issue with the delay between captures. I don't know where it is exactly, but the scheduler is able to circumvent this now. There is also a problem with storage of captures that silently fails sometimes, but I won't do anything about this. The same goes for storage in /tmp, where captures get deleted, probably because of the FITS viewer.

Finally there is still an issue with repeated scheduler jobs that will repeat regardless of the completion status. I'm still looking into this; it should go away quickly.

Stay tuned :)

What's the issue with Telescope Simulator parking exactly?

I don't know precisely. The park command succeeds, but the notification that the mount is parked never seems to arrive. That makes the scheduler abort at stop, but that's OK for testing. Besides, the mount is always unparked when the simulator is initialized. This causes the scheduler to sometimes miss a step when checking park state, but that's OK for testing too.

  • Fix execution of repeating scheduler jobs.
  • Work around /tmp issue in .esq jobs by using /var/tmp.
  • Rewrite and fix capture completion estimation
  • Fix re-evaluation of successive duplicated scheduler jobs
  • Fix issue with scheduler aborting schedule too soon
  • Fix startup/shutdown and park wait state.
TallFurryMan edited the summary of this revision. May 27 2018, 8:32 PM
TallFurryMan edited the test plan for this revision.
TallFurryMan edited the summary of this revision.

Jasem, this is good to go. I think there are cool stability and robustness features now, I tried to shake the scheduler in some interesting ways.
Now I've probably got a new item on my todo list: record a helper video showing the new capabilities...

This is getting pretty exciting Eric! A video would be awesome and should clarify the mysterious scheduler to a lot of users.

mutlaqja accepted this revision. May 28 2018, 5:32 AM
This revision is now accepted and ready to land. May 28 2018, 5:32 AM
This revision was automatically updated to reflect the committed changes.