I'm using Doctrine ODM to connect to MongoDB. I have a three-node replica set: two full members and one arbiter. Replication is solely for higher availability; I don't seek to distribute reads across nodes. My application is logging MongoCursorExceptions every so often with the message "not master and slaveok=false". I don't see any evidence that a failover event occurred in the mongodb logs and the primary didn't change. CORRECTION: failover had indeed occurred, but the exception with "not master and slaveok=false" was appearing often, even showing up 6 hours after a new primary was successfully elected.
What to do? I see our version of doctrine-mongodb includes (experimental?) retry functionality, but I don't see an easy way to enable that.
Not sure if it matters, but this is a Symfony2 (v2.0) app.
https://groups.google.com/d/topic/mongodb-user/6p710Rdycpg/discussion implies that we need retries (emphasis mine):
Your application must be written to reconnect/retry since there are any number of transient (network) errors which could come up much like the rolling upgrade process during normal operation.
The Mongo PHP extension docs seem to account for this:
The driver will automatically retry "plain" queries (not commands) a couple of times if the first attempt failed for certain reasons. This is to cause fewer exceptions during replica set failover (although you will probably still have to deal with some) and gloss over transient network issues.
And I thought doctrine-mongodb just used the PHP extension to actually talk to mongod. So I'm left a bit confused whether or not I should have to worry about retry configuration.
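For reference, the kind of application-level retry the mailing-list quote describes could be sketched roughly as below. This is only an illustration, not Doctrine's built-in retry feature: retryOnFailover() and isFailoverError() are hypothetical names, and the attempt count and delay are arbitrary assumptions. The sketch catches the base Exception so it runs without the Mongo extension; in a real application one would probably narrow that to MongoCursorException and MongoConnectionException.

```php
<?php
// Sketch only: retryOnFailover() and isFailoverError() are hypothetical helpers,
// not part of Doctrine ODM or the Mongo PHP extension.

/**
 * Returns true when the exception message matches one of the transient
 * failover errors quoted in the question.
 */
function isFailoverError(Exception $e)
{
    $transient = array('not master', "couldn't determine master");
    foreach ($transient as $needle) {
        if (strpos($e->getMessage(), $needle) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Runs $operation, retrying up to $maxAttempts times when it fails with a
 * failover-style error. The wait grows linearly with the attempt number.
 * Any other exception, or the last failed attempt, is rethrown unchanged.
 */
function retryOnFailover($operation, $maxAttempts = 3, $delayMs = 200)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return call_user_func($operation);
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts || !isFailoverError($e)) {
                throw $e;
            }
            // Give the replica set a moment to finish electing a primary.
            usleep($delayMs * 1000 * $attempt);
        }
    }
}
```

A query would then be wrapped in a closure, e.g. retryOnFailover(function () use ($repository) { return $repository->findAll(); }), where $repository stands in for whatever Doctrine repository the app uses. A wrapper like this only papers over short elections; it would not help if the driver keeps reporting a stale primary for minutes or hours.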
I think I solved part of the problem: I removed the arbiter from the connection string following this advice from Kristina Chodorow. I'm no longer seeing any MongoCursorExceptions with the message "not master and slaveok=false". I might have been hitting https://jira.mongodb.org/browse/PHP-392.
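To illustrate what removing the arbiter amounts to, here is a minimal sketch that builds the seed list from data-bearing members only. The host names and the buildSeedList() helper are made up for the example; the resulting string is what would go into the server setting of the connection.

```php
<?php
// Hypothetical helper and host names; none of these come from the real setup.

/**
 * Builds a mongodb:// seed list that skips any member flagged as an arbiter,
 * so the driver is only ever pointed at nodes that can become primary.
 */
function buildSeedList(array $members)
{
    $hosts = array();
    foreach ($members as $member) {
        if (empty($member['arbiter'])) {
            $hosts[] = $member['host'];
        }
    }
    return 'mongodb://' . implode(',', $hosts);
}

$members = array(
    array('host' => 'db1.example.com:27017'),
    array('host' => 'db2.example.com:27017'),
    array('host' => 'arbiter.example.com:27017', 'arbiter' => true),
);

$server = buildSeedList($members);
// $server is now "mongodb://db1.example.com:27017,db2.example.com:27017"
```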
However, I'm still getting a few MongoCursorExceptions with the message "couldn't determine master" during failover. For example, I just did a failover; based on the mongod logs, a new primary was elected after a few seconds, but the web application was throwing that exception even 5 minutes later.