Firstly: if you're CPU-bound for 40 seconds, chances are you need to rethink your algorithm or your programming language. PHP, IMHO, still does best as a user-facing engine that produces HTML, JSON, or whatever output format you need. That's why it's built on a shared-nothing architecture: one request gets one CPU, computes, and terminates. You scale this architecture over multiple CPUs by distributing the requests among them.
That said, if a single request takes 40s to complete, you might have luck with Worker threads, which you would need to spin off from your main thread and which can run on your under-utilized CPUs. But again, that's a whole mountain of complexity to scale: think hard before starting down that route.
A probably easier alternative to do-it-yourself thread management is to fork out work to dedicated worker processes. Gearman is a well-known framework to help you with this.
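To give a rough idea of what the Gearman route looks like, here is a minimal sketch, not a drop-in solution. It assumes the PECL `gearman` extension is installed and a `gearmand` server is listening on `127.0.0.1:4730`. The names `count_primes`, `run_worker` and `submit_jobs` are made up for illustration, and the prime counter just stands in for whatever is eating your 40 seconds.

```php
<?php
// Hypothetical CPU-heavy task: counts the primes <= $limit by trial division.
// It stands in for the real 40-second computation.
function count_primes(int $limit): int
{
    $count = 0;
    for ($n = 2; $n <= $limit; $n++) {
        $isPrime = true;
        for ($d = 2; $d * $d <= $n; $d++) {
            if ($n % $d === 0) {
                $isPrime = false;
                break;
            }
        }
        if ($isPrime) {
            $count++;
        }
    }
    return $count;
}

// Worker side: run one of these per CPU you want to keep busy.
// Host and port are assumptions; adjust them to your gearmand setup.
function run_worker(string $host = '127.0.0.1', int $port = 4730): void
{
    $worker = new GearmanWorker();
    $worker->addServer($host, $port);
    $worker->addFunction('count_primes', function (GearmanJob $job) {
        // The workload arrives as a string; the returned string is the job result.
        return (string) count_primes((int) $job->workload());
    });
    while ($worker->work()) {
        // Block and process jobs one at a time until work() fails.
    }
}

// Client side: split the work into chunks and let the workers run them in parallel.
function submit_jobs(array $limits, string $host = '127.0.0.1', int $port = 4730): array
{
    $client = new GearmanClient();
    $client->addServer($host, $port);

    $results = [];
    $client->setCompleteCallback(function (GearmanTask $task) use (&$results) {
        // Key each result by the unique id assigned to the task below.
        $results[$task->unique()] = (int) $task->data();
    });

    foreach ($limits as $i => $limit) {
        $client->addTask('count_primes', (string) $limit, null, "job-$i");
    }
    $client->runTasks(); // Blocks until every queued task has completed.

    return $results;
}
```

You would call `run_worker()` from a CLI script started once per core (under supervisord or similar), and call `submit_jobs([...])` from the request that needs the results. How you chop the original computation into independent chunks is the hard part, and that depends entirely on your algorithm.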