weixin_39687881 2020-11-20 20:11 Acceptance rate: 0%

Job Concept Unification

This fixes #3215 by replacing JobGraph and JobNode with a single JobDescription class, which holds the one authoritative copy of a job's requirements and of its service and child relationships, all by ID reference.

This should make #3216 much easier by providing a sensible type to use.

This question comes from the open-source project: DataBiosphere/toil

5 answers

  • weixin_39687881 2020-11-20 20:11

    This is starting to pass a lot of the tests, so I think it's about ready for review. There are a few fixes for other issues that came up in testing, and some py2->3 fixes I made along the way, but I'm not sure it makes sense to try to back them out here and split them into their own PRs.

    The major changes are:

    * JobGraph and JobNode are no more. There is now only JobDescription, which contains all the scheduling information.
    * JobDescriptions are not comparable for equality. We want to move towards a system where only one JobDescription for a particular job can exist in memory at a time.
    * Instead of duplicating the fields for ID, name, predecessors/successors, and requirements, Job now contains a JobDescription.
    * On serialization, the JobDescription is persisted separately from the Job, so the leader can load the description only.
    * Jobs storing other Jobs for insertion along with them into the job store has been unified. Instead of keeping a list of child Jobs, and a list of follow-on Jobs, or whatever, Jobs use ID references (via JobDescription) to describe relationships, and all jobs not yet in the job store live in a "registry" dict by ID, shared among all jobs in a connected component of the graph of job relationships.
    * When JobDescriptions are first created, they use temporary fake IDs to facilitate ID references. The JobStore now assigns final IDs to all the JobDescriptions in a graph before we serialize any of them.
    * Services are no longer themselves Jobs, since they were never scheduled as Jobs.
    * I created some service start-up bugs and had to refactor the ServiceManager slightly to understand them. The ServiceManager now has a concept of a job's services having failed to start, and will log accordingly.
    * JobDescriptions internally distinguish child and follow-on relationships. The stack member is now a property that synthesizes a stack from the child and follow-on relationships, instead of a directly mutable list. Instead of splicing the stack, there are now methods on JobDescription to drop completed successors.
    * Because the stack can't be freely spliced anymore, I couldn't work out how to chain to a child, and then go on to chain to a follow-on of the original job. Now, when a job has both children and follow-ons, chaining stops.
    * I created promise serialization order bugs, and to debug them I improved reporting of promise passing mistakes. Error messages should now describe the jobs that the offending promise was passed between.
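
    To make the ID-reference, registry, and synthesized-stack ideas above more concrete, here is a rough Python sketch. It is not Toil's actual code: `JobDescriptionSketch`, `drop_successor`, `assign_final_ids`, the `tmp-`/`job-` ID formats, and the ordering of batches within `stack` are all hypothetical names and choices made up for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # no generated __eq__, so "==" falls back to object identity
class JobDescriptionSketch:
    """Hypothetical stand-in for a job description; not Toil's real class."""
    name: str
    # Temporary fake ID so other descriptions can refer to this one before saving.
    job_id: str = field(default_factory=lambda: "tmp-" + uuid.uuid4().hex)
    child_ids: List[str] = field(default_factory=list)
    follow_on_ids: List[str] = field(default_factory=list)

    @property
    def stack(self) -> Tuple[Tuple[str, ...], ...]:
        """Build a read-only stack; by assumption the last batch runs next."""
        batches = []
        if self.follow_on_ids:
            batches.append(tuple(self.follow_on_ids))
        if self.child_ids:
            batches.append(tuple(self.child_ids))
        return tuple(batches)

    def drop_successor(self, successor_id: str) -> None:
        """Forget a completed successor instead of splicing the stack."""
        self.child_ids = [i for i in self.child_ids if i != successor_id]
        self.follow_on_ids = [i for i in self.follow_on_ids if i != successor_id]


def assign_final_ids(
    registry: Dict[str, JobDescriptionSketch]
) -> Dict[str, JobDescriptionSketch]:
    """Swap every temporary ID, and every reference to it, for a final ID."""
    mapping = {old: f"job-{n}" for n, old in enumerate(registry)}
    for desc in registry.values():
        desc.job_id = mapping[desc.job_id]
        desc.child_ids = [mapping[i] for i in desc.child_ids]
        desc.follow_on_ids = [mapping[i] for i in desc.follow_on_ids]
    # Re-key the registry by the final IDs.
    return {desc.job_id: desc for desc in registry.values()}


# Relationships are recorded as ID references, never as object references.
root = JobDescriptionSketch("root")
child = JobDescriptionSketch("child")
follow_on = JobDescriptionSketch("follow_on")
root.child_ids.append(child.job_id)
root.follow_on_ids.append(follow_on.job_id)

# All not-yet-saved descriptions in the connected component share one registry.
registry = {d.job_id: d for d in (root, child, follow_on)}
registry = assign_final_ids(registry)
```

    In this sketch the final IDs are assigned to the whole registry in one pass, so no description is ever written out while it still points at a temporary ID. Once the child finishes, `root.drop_successor(child.job_id)` leaves only the follow-on batch in `root.stack`.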
