###### weixin_39628180
2020-12-31 03:21 阅读 8

# Wavelet Log Likelihood (Carter & Winn 2009)

Is your feature request related to a problem? Please describe. I'm using exoplanet to integrate PyMC3 with efficient transit modeling. I have the base system running smoothly -- thanks to great tutorials. Thank you.

I was requested by a collaborator to include the Carter & Winn (2009) Daubechies wavelet likelihood.

(1) I am able to generate log_likelihood as a float value. So that first feature request that I would ask is how to add that into the xo.optimize or pm.sample calls?

(2) If we can bypass wavelets altogether by using a GP to estimate sigma_r and sigma_w efficiently, then I would be more than happy to do that. We've previously discussed using a GP to model the Daubechies wavelet likelihood; but I would need for help in selecting the GP+Kernels combinations to do it.

As a secondary feature request, I would ask for a tutorial to estimate the residual red noise in a light curve the way that our community to put in tables.

Describe the solution you'd like I see from #70, that adding in a likelihood function is feasible; but I get the impression that there is more to do than a simple xo_logp + wavelet_logp.

Is there a way to take a function (with 2 inputs + 3 PyM3 parameters) and "add" it to the likelihood computed inside exoplanet?

Describe alternatives you've considered I grabbed the necessary bits of code form 's package mc3 https://github.com/pcubillos/mc3/blob/master/mc3/stats/stats.py [line 209]

I can compute the wavelet likelihood (outside of PyMC3+XO) as a float value; but how to wrap this into something that xo can interpret as an addition to the log-likelihood built in?

Additional context If there is a much simpler way to use the GP module to estimate sigma_r and sigma_w, then I could bypass this whole issue.

At the same time, others may still want a wavelet likelihood added onto their light curve posteriors. As such, may I suggest this feature overall.

• 点赞
• 写回答
• 关注问题
• 收藏
• 复制链接分享

#### 15条回答默认 最新

• weixin_39628180 2020-12-31 03:21

In a directly related problem, I am able to compute the wavelet likelihood given the model_prediction and data in np.array format, to later subtract as the residuals.

The function call is

dwt_chisq(model_prediction, data, [gamma, sigma_r, sigma_w])

To run this with xo via PyMC it places model_prediction as a tensor (i.e. theano.tensor.var.TensorVariable); also, gamma, sigma_r, and sigma_w are each of type theano.tensor.var.TensorVariable.

In order to run the dwt_chisq as is, I would need to output a numpy array from each of those tensors. What is the appropriate method to do so?

I tried .eval(); but that had very little useful outputs. Please let me know if I'm off base, such as "it's not possible to output a np.array type" or "that's now how you use .eval()"

Thank you!

点赞 评论 复制链接分享
• weixin_39948277 2020-12-31 03:21

Hi Jonathan,

The catch here is that you need to provide the wavelet likelihood as a Theano op and you need to be able to compute the gradient of this operation with respect to its inputs. Here's how I worked out the gradients for celerite and I bet something similar would be possible (and maybe not even too onerous!) for the C&W method.

But, that being said, I wouldn't bother! You can use celerite and it'll be just as fast and have fewer restrictions (it can model more complicated power spectra and it doesn't require evenly spaced data). If your model is something like:

python
log_s2 = pm.Normal(...)
log_Sw4 = pm.Normal(...)
log_w0 = pm.Normal(...)

kernel = xo.gp.terms.SHOTerm(log_Sw4=log_Sw4, log_w0=log_w0, log_Q=1/np.sqrt(2))
gp = xo.gp.GP(kernel, x, yerr ** 2 + pm.math.exp(log_s2))


then log_s2 is the variance of the white noise component and log_Sw4 will be related to the "red noise" variance that you want. Honestly it's not obvious that the sigma_r parameter is a very meaningful number since its value will depend sensitively on the specific choice of power spectrum model, but it should be possible to calibrate a relationship between that number and the parameters of a celerite model if you want. Take a look at Section 4 of the celerite paper to see more about the model specification there.

Hope this helps!

点赞 评论 复制链接分享
• weixin_39838829 2020-12-31 03:21

To add to what said, the wavelet model from the Carter & Winn paper solves for the correlated noise component assuming a specific power-spectrum: f^{-1}, aka pink noise. (Their paper in principle allows for other power-laws, f^{-\nu}, but the fast scaling of the algorithm only works for \nu=1, if I remember correctly).

This power-spectrum is not a great description of stellar variability. It has a scaling at high frequency which causes sharper variations in the correlated noise component that does a stochastically-driven simple damped harmonic oscillator. It is also even stronger at high frequency than a damped random walk (aka Ornstein-Uhlenbeck process). In practice, however, what I expect this means is that the short-timescale variability will be conflated with the white noise component of variability.

Also, at low frequencies f^{-1} is too large - in fact, it diverges, which means that there is power on long timescales. In practice this may not be too much of an issue since any dataset has a finite duration. So, I think f^{-1} worked well for Carter & Winn thanks to these saving factors.

That said, there may be ranges of frequency where an f^{-1} slope is valid, and that can be approximated well with, e.g., three Q=1/2 celerite terms, as in this plot:

Comparing these, you can see that the pink noise continues to rise at low frequencies, and dominates at high frequencies. But, as I mentioned above, a white-noise component will always dominate at the highest frequencies. So, as Dan mentioned, you can approximate this wavelet spectrum over some dynamic range of the power spectrum with the celerite GP.

点赞 评论 复制链接分享
• weixin_39948277 2020-12-31 03:21

: Do you mean Q=1/sqrt(2) rather than 1/2?

点赞 评论 复制链接分享
• weixin_39838829 2020-12-31 03:21

I used Q=1/2 for this plot. Here's the Julia code:


using PyPlot
clf()
nu = 10.0.^(collect(-1:0.01:1.5))
s(s0,nu0,nu) = s0/((nu/nu0)^2+1)^2

s1 = s.(1.0,1.0,nu)
s2 = s.(0.25,4.0,nu)
s3 = s.(0.0625,16.0,nu)
loglog(nu, s1 .+ s2 .+ s3, label=L"3 Q=$1/2$ terms")
loglog(nu, 0.52 ./nu, label = L"Pink: $f^{-1}$")
loglog(nu, s1, alpha=0.3,color="C0")
loglog(nu, s2, alpha=0.3,color="C0")
loglog(nu, s3, alpha=0.3,color="C0")
legend()
axis([0.1,30.0,1e-2,10.0])
xlabel(L"$\omega$")
ylabel(L"$S(\omega)$")

点赞 评论 复制链接分享
• weixin_39948277 2020-12-31 03:21

: there will be numerical issues for Q=1/2 so I generally recommend Q=1/sqrt(2) or something else that's not exactly 1/2.

点赞 评论 复制链接分享
• weixin_39838829 2020-12-31 03:22

Right, that's the Matern kernel. Here's a case with Q=1/3 which looks even better.


using PyPlot
clf()
nu = 10.0.^(collect(-1:0.01:1.5))
s(s0,nu0,nu,Q) = sqrt(2/pi)*s0*nu0^4/((nu^2-nu0^2)^2+(nu*nu0)^2/Q^2)

#loglog(nu,1.0./(nu.^2 .+1).^2 .+ 0.25 ./ ((nu./4).^2 .+1).^2 .+ 0.0625 ./ ((nu ./16).^2 .+ 1).^2)
s1 = s.(1.4,0.8,nu,0.33)
s2 = s.(0.25,4.0,nu,0.33)
s3 = s.(0.0725,20.0,nu,0.33)
loglog(nu, s1 .+ s2 .+ s3, label=L"3 Q=$1/3$ terms")
loglog(nu, 0.27 ./nu, label = L"Pink: $f^{-1}$")
loglog(nu, s1, alpha=0.3,color="C0")
loglog(nu, s2, alpha=0.3,color="C0")
loglog(nu, s3, alpha=0.3,color="C0")
legend()
axis([0.1,30.0,1e-2,10.0])
xlabel(L"$\omega$")
ylabel(L"$S(\omega)$")

点赞 评论 复制链接分享
• weixin_39628180 2020-12-31 03:22

python
log_s2 = pm.Normal(...)
log_Sw4 = pm.Normal(...)
log_w0 = pm.Normal(...)
<p>kernel = xo.gp.terms.SHOTerm(log_Sw4=log_Sw4, log_w0=log_w0, log_Q=1/np.sqrt(2))
gp = xo.gp.GP(kernel, x, yerr ** 2 + pm.math.exp(log_s2))
</p>

then log_s2 is the variance of the white noise component and log_Sw4 will be related to the "red noise" variance that you want.

You are both great! Thank you for the discussion; and, especially, the code segments. I'm a big fan of GPs. I just have a lot of problems picking out the 'best' kernel+hyperparameter combos, for a given problem set.

For instance, I read many sections of the the celerite paper; and, we had talked in person about using the SHO kernel in replacement for the CW-Wavelets. My issue is that I find it difficult to know which SHO hyperparameters to set. I would easily have let Q float, instead of setting it to 1/3 or 1/sqrt(2). I still don't fully know what prior bounds to place on the log_s2, log_sw4, log_w0 terms; but I'll probably model it after these examles: example1, example2.

Side question: what did you (both) mean, when you said "Q=1/2 is the Matern kernel". Related to the above question, I was already trying out the Matern kernel, because I thought that it was the appropriate method to characterize "1/f" noise.

This discussion has been a big help. I already cited the exoplanet+celerite paper; I think that I'll also add a footnote to this discussion.

点赞 评论 复制链接分享
• weixin_39628180 2020-12-31 03:22

As a secondary conversation: when collaborating on an observation paper, the conversation often starts with 'we need to measure the red noise'; but then it also includes 'GPs are too complicated/versatile'. I have personally said, and heard from others, phrases such as "GPs may eat the transit/eclipse"; i.e. when fitting a GP freely, they can 'model' the transit/eclipse as a correlated noise source.

In the framework of academic inertia, may I suggest that we could make small examples (i.e. notebooks) that show how to use exoplanet+celerite to model previously established norms in our field.

In the celerite paper [page 34], you discuss CW-wavelets, their limitations, and how to approximate them using celerite with S(w) ~ w^(-1). I didn't know what that would look like in code format until today.

I can also make a notebook comparing CW-Wavelets to Celerite+SHO as well. I'll try to make that soon -- i.e., this semester -- then PR it for you to include if you wish. I think that our community would find it helpful to know where to use celerite as an improvement (or just replacement) for the existing methods.

Thank you both; and the team for this great work!

点赞 评论 复制链接分享
• weixin_39628180 2020-12-31 03:22

Combining example1 and example2, here is what I came up with:

python
def build_gp_pink_noise(time, data, dataerr, Q=1.0 / np.sqrt(2)):

log_S0 = pm.Normal("log_S0", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))
log_w0 = pm.Normal("log_w0", mu=0.0, sigma=15.0,
testval=np.log(3.0))
log_sw4 = pm.Deterministic("log_variance_r", log_S0 + 4 * log_w0)

log_s2 = = pm.Normal("log_variance_w", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))

kernel = xo.gp.terms.SHOTerm(log_Sw4=log_Sw4, log_w0=log_w0, log_Q=np.log(Q))
gp = xo.gp.GP(kernel, time, dataerr ** 2 + pm.math.exp(log_s2))

return pink_gp


such that my script, which also creates PyMC3 priors for eclipse_depth, mean, etc inside:

python
with pm.Model() as model:
mean = pm.Normal(f"mean", mu=1.0, sd=1.0)

edepth = pm.Uniform("edepth", lower=0, upper=0.01)
sqrt_edepth = pm.math.sqrt(edepth)

# Compute the model light curve using starry
# Set up a Keplerian orbit for the planets
light_curve = star.get_light_curve(orbit=orbit, r=sqrt_edepth, t=times)

# The likelihood function assuming known Gaussian uncertainty
pm.Normal(f"obs", mu=light_curve, sd=dataerr, observed=data)

# Build GP model
gp = build_gp_pink_noise(time, data, dataerr, Q=1.0 / np.sqrt(2))

map_soln = xo.optimize(start=model.test_point)
trace = pm.sample(
tune=3000,
draws=3000,
start=map_soln,
chains=mp.cpu_count(),
step=xo.get_dense_nuts_step(target_accept=0.9),
cores=mp.cpu_count()
)


[New Question]: In my example above, are log_s2 and log_S0 the exact same thing? or do I need both variables? Should I (effectively) set log_s2 = log_S0 and log_SW4 = log_S0 + 4 * log_w0?

点赞 评论 复制链接分享
• weixin_39948277 2020-12-31 03:22

No - log_s2 is the white noise variance. There's no reason why this should be related to the GP amplitude!

点赞 评论 复制链接分享
• weixin_39628180 2020-12-31 03:22

I created a synthetic data set with pink noise, and then modeled that data with exoplanet + celerite. The full example, which runs from end-to-end for me, can be found here

The important part for this discussion is included here

python

def build_gp_pink_noise(times, data, dataerr,
log_Q=np.log(1.0 / np.sqrt(2))):
''' Build a pink noise GP iterating over S0, w0, Sw4

times (nDarray): time stamps for x-array
data (nDarray): normalized flux for transit light curve
dataerr (nDarray): uncertainty values per data point
log_Q (float): hyperparameter for SHO kernel [Q=1/sqrt(2): pink noise]
'''
log_w0 = pm.Normal("log_w0", mu=0.0, sigma=15.0,
testval=np.log(3.0))
log_Sw4 = pm.Normal("log_variance_r", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))
log_s2 = pm.Normal("log_variance_w", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))

kernel = xo.gp.terms.SHOTerm(
log_Sw4=log_Sw4, log_w0=log_w0, log_Q=log_Q)

return xo.gp.GP(kernel, times, dataerr**2 + pm.math.exp(log_s2))


I was able to run this model and converge to a meaningful solution; but I have 3 questions about this.

1. Is this model what you were suggesting?

2. The results that I am getting are not within my expectations: log_s2 is ~ -30 (very very small) log_Sw4 is ~ -5; which is small, but always much larger log_s2.

I was expecting the log_s2 ~ -10 (i.e. 50 ppm) and the log_Sw4 ~ -12 (i.e. 5 ppm). I think that I set the pink scatter to be 10% of the white scatter (see example, lines 186 + 207: with sigma_r=0.1

Would you happen to know why I am finding a much larger log_Sw4 than log_S2? I assume that I screwed something up.

3. How does the GP method interface with an AICc / BIC calculation?

I fit 32 models (other than GPs) with a range of AICc / BIC values from 150-350. When I add the GPs, should I assume 3 extra parameters? I am finding improved logp (within 10%). The AICc and BIC both increase by ~4% (more than enough to reject the GP models).

But I don't know if that is as direct a calculation as I think it is. And of course, I could always have simply coded something wrong to begin with, GP-wise.

Here is a colab link to see the example in action

点赞 评论 复制链接分享
• weixin_39628180 2020-12-31 03:22

There's a completely unrelated problem that I've been encountering. I can enter either the inclination, impact parameters, or transit duration; but only one a time. [this is built in; no problem]

The problem arises that I've tried this exact same code on 3 machines, and ~10 times per machine. Depending on the machine and time of day (i.e. randomness), the KeplerianOrbit object does not like the form of one or two values in the set [inclination, impact parameter, duration] that I give it. Note that I give it the exact same values every time.

Every 3rd - 5th attempt results in an error that says something like "inclination is a bad value" (I'll edit that when I find it next). So I change to using duration or impact parameter. That will work for 3 - 5 more attempts, and then fail. At which point, I rotate through the list and it works again.

On the colab link, if you find an error related to [inc, b, or dur], then please goto the Cell 4 and change the inc for the dur or b as needed.

I should probably make this its own issue

点赞 评论 复制链接分享
• weixin_39948277 2020-12-31 03:22

That second comment should be a separate issue - feel free to open one!

There are a few issues with the first one:

1. First, you're including the likelihood twice:
python
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=light_curve, sd=dataerr, observed=data)
gp = build_gp_pink_noise(times, data, dataerr, log_Q=log_Q)
gp.marginal("gp", observed=data)


The first one is the likelihood assuming white noise (with the quoted error bars) and the second one is assuming both white noise and a GP. The first one shouldn't be there at all!

1. The GP model currently isn't taking the transit light curve model into account so the values you're getting are meaningless. The easiest change would be to condition on the residuals (observed=data - light_curve) or you could provide a mean function to the GP:
python
def lc_model(t):
light_curves = xo.LimbDarkLightCurve(u).get_light_curve(
orbit=orbit, r=r, t=t
)
return pm.math.sum(light_curves, axis=-1) + mean

gp = xo.gp.GP(..., mean=lc_model)

1. Finally, I think that the number you actually want will be something like the integrated autocorrelation which would give you the total power in the GP (although I'm not very familiar with how "pink noise" is defined in the literature you're familiar with so it might take some more digging, perhaps has more thoughts). If that is what you want then you'll want to compute something like:

2 * integral(kernel(dt), {dt, 0, infty})


Which, I think, for an exoplanet celerite kernel will be:

python
ar, cr, ac, bc, cc, dc = kernel.coefficients
variance = 2 * tt.sum(a / c)
variance += 2 * tt.sum(ac * cc / (cc ** 2 + dc ** 2))
variance += 2 * tt.sum(bc * dc / (cc ** 2 + dc ** 2))


but use this with a grain of salt for now and double check the math!

点赞 评论 复制链接分享
• weixin_39838829 2020-12-31 03:22

My understanding from Carter & Winn is that pink noise has a 1/f power spectrum. However, this power spectrum isn't normalizable, so the ACF doesn't exist! In practice I think it is defined over some range of frequencies, which might correspond to the inverse exposure time and total duration of the data. Then, the ACF is defined in terms of the difference of inverse cosine integrals (which I gathered from some googling).

点赞 评论 复制链接分享