This should begin work on #2650.
I've reorganized the FileJobStore's file layout.
- We now break things up across `jobs/`, `stats/`, `files/shared`, `files/global`, and `files/for-job`.
- Files that get cleaned up when a job is destroyed are no longer under the job's directory. The `jobs/` hierarchy holds only actual jobs.
- We only create multiple levels of random subdirectories when a directory starts to get a lot of files/directories in it.
- We sort jobs by (and thus prefix their IDs with) a filename-safe version of the `jobName`. This also applies to files that live in `files/for-job`.
- Each file written gets its own uniquely-named directory, so the file itself can have a more reasonable name.
Things I still want to do (eventually):
- Combine `files/for-job` and `files/global`. Have just one directory for all non-shared files written by a job, with a subdirectory for the files that need to get cleaned up when the job is destroyed.
- Stop naming files based on the call stack, and just keep their full original names in their unique subdirectories.
- Document the file job store file layout for people who need to dig into it for a failed workflow.
(This question originates from the open-source project DataBiosphere/toil.)