Given a function \(f\) (we’ll assume \(f\) is \(C^\infty\) for simplicity, but some of these arguments can be adapted depending on the smoothness) defined on the unit interval \([0, 1]\), we would like to compute the integral \(I = \int_0^1 f(x) dx\). Quadrature methods can easily deal with estimating these integrals 1.
1 Methods such as Clenshaw-Curtis or Gaussian Quadrature can do very well on smooth integrands, with errors of \({\displaystyle O([2N]^{-k}/k)}\), where \(N\) is the number of quadrature points and \(k\) is the number of derivatives \(f\) has.
Our plan will be to come up with a lower variance (but biased) estimator, and then generalize to computing integrals of the form \(\int_A f(x) p(x) dx\), where \(p(x)\) is a probability density on some subset of \(\mathbb{R}\). Finally, we’ll be able to get an estimator even if we only have access to a sampler and the unnormalized log density (e.g. in the case of MCMC).
The typical Monte Carlo estimator for this type of integral is \(S_n = \frac{1}{n}\sum_{i=1}^n f(u_i)\), where \(u_i \sim U(0, 1)\) are IID uniform samples. This is an unbiased estimator for \(I\) and has variance \(var(S_n) = \frac{var(f)}{n}\), where \(var(f)\) is the variance of the random variable \(f(U), U \sim U(0, 1)\). In particular, the variance is inversely proportional to \(n\), so the typical error decays like \(O(\frac{1}{\sqrt n})\).
Trapezoid Rule Estimator
If we look at error estimates of various quadrature rules, we see that their errors decay faster than the \(O(\frac{1}{\sqrt n})\) rate for estimating \(I\).
We can see from the table below that we can get errors \(S_n - I\) that decay at least as fast as \(O(\frac{1}{n^2})\) for various Newton-Cotes formulas (from Wikipedia).
| Name | Error Bound |
|---|---|
| Trapezoid Rule | \(-\frac{1}{12n^2}f''(\xi)\) |
| Simpson’s \(\frac{1}{3}\) Rule | \(-\frac{1}{90n^4}f^{(4)}(\xi)\) |
| Boole’s Rule | \(-\frac{8}{945n^6}f^{(6)}(\xi)\) |

where \(n\) is the number of subintervals and \(\xi\) lies in the interval of integration \([0, 1]\).
The main idea is we can exploit the structure of the trapezoid rule (and in general Newton-Cotes rules) to get better scaling of the variance with respect to the number of samples \(n\).
Given \(n\) samples \(u_i \sim U(0, 1)\), we can sort the samples to form the order statistics \(v_i\). We’ll also add the points \(0\) and \(1\) (let \(v_0 = 0\) and \(v_{n + 1} = 1\)), giving \(n + 2\) points in total.
We can then consider the trapezoid rule estimator: \(S'_n = \sum_{i=0}^{n} \frac{1}{2} (v_{i+1} - v_i)(f(v_{i+1}) + f(v_i))\)
Figure 1: Randomized Trapezoid Rule for \(\sin(\frac{\pi}{2} x)\)
The intuition is that for sufficiently smooth \(f\), \(S'_n\) should be a good estimator of the integral \(I\) due to the use of the trapezoid rule. Note that \(S'_n\) may be biased, and indeed Figure 1 shows an example of this, where the trapezoid rule approximates the integral well, but will always be an underestimate due to the concavity of the integrand. The hope is that we may trade off some bias for a reduction in variance.
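To make this concrete, here is a minimal NumPy sketch of the plain Monte Carlo estimator \(S_n\) and the randomized trapezoid estimator \(S'_n\) (an illustrative implementation, not a reference one; the function names are made up for this sketch):

```python
import numpy as np

def mc_estimate(f, u):
    # Plain Monte Carlo: average f over the uniform samples.
    return np.mean(f(u))

def trapezoid_estimate(f, u):
    # Sort the samples, append the endpoints 0 and 1, and apply the
    # trapezoid rule to the resulting (random) grid.
    v = np.concatenate(([0.0], np.sort(u), [1.0]))
    fv = f(v)
    return np.sum(0.5 * np.diff(v) * (fv[1:] + fv[:-1]))

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi / 2 * x)   # integrand from Figure 1; true integral is 2/pi
u = rng.uniform(size=100)
print(mc_estimate(f, u), trapezoid_estimate(f, u), 2 / np.pi)
```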
As mentioned above, we can note the following deficiencies of this method, compared to quadrature rules and Monte Carlo estimators.
This can be a biased estimator.
This is for \(U(0, 1)\) random variables and in one dimension, for which there are better methods.
In the current formulation, you have to evaluate the integrand at the endpoints \(0\) and \(1\). This can be problematic for integrands (such as certain \(Beta(\alpha, \beta)\) densities) which go off to \(\infty\) at the endpoints. We can actually get away with the \(n\) sampled points, without evaluating at \(0\) and \(1\); adding these points just simplifies the analysis and reduces the error by a constant.
Error bound on \(S'_n\)
Let’s take a look at the error \(S'_n - I\).
For simplicity we’ll reuse a form of the Trapezoid Rule Error 2.
If we are estimating \(\int_{v_i}^{v_{i+1}} f(x) dx\), then the error of using \(\frac{1}{2}(f(v_{i+1}) + f(v_{i}))(v_{i+1} - v_i)\) is bounded by \(\frac{K_i(v_{i+1} - v_i)^3}{12}\), where \(K_i = \sup_{[v_i, v_{i+1}]} |f''(x)|\) (this can be derived through repeated applications of the Mean Value Theorem).
Let \(x_i = v_{i+1} - v_i\) denote the spacings between consecutive points. These spacings are jointly \(\mathrm{Dirichlet}(1, \dots, 1)\) distributed, so the \(k\)-th marginal moment \(\mathbb{E}[x_i^k]\) decays like \(O(\frac{1}{n^k})\), and the mixed moment \(\mathbb{E}[x_i^k x_j^k]\) decays like \(O(\frac{1}{n^{2k}})\).
We can look at \(\mathbb{E}[(S'_n - I)^2] \le \mathbb{E}\left[\left(\frac{K}{12}\right)^2 \left(\sum_{i=0}^{n} x_i^3\right)^2\right] = \frac{K^2}{144}\sum_{i, j} \mathbb{E}[x_i^3 x_j^3]\), where \(K = \sup_{[0, 1]} |f''(x)|\).
As the sum is over \((n + 1)^2\) terms, each of which is \(O(\frac{1}{n^6})\), this results in \(\mathbb{E}[(S'_n - I)^2] = O(\frac{1}{n^4})\) (with an \(O(\frac{1}{n^2})\) bound on the bias) 4. A precise bound can be derived by carrying through the constants from the Trapezoid Rule error and the Dirichlet moments.
4 One can easily get an error bound on the bias \(|\mathbb{E}[S'_n - I]|\) without squaring the sum. Looking at \(\mathbb{E}[(S'_n - I)^2]\) mainly gives us a stronger form of convergence, and one can still recover a bound on the bias from it.
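For concreteness, the standard Dirichlet moment formula gives these bounds directly: the \(n + 1\) spacings are jointly \(\mathrm{Dirichlet}(1, \dots, 1)\), so each \(x_i \sim \mathrm{Beta}(1, n)\) and, for \(i \ne j\),
\(\mathbb{E}[x_i^k] = \frac{k!\, n!}{(n + k)!} \le \frac{k!}{n^k} = O\left(\frac{1}{n^k}\right), \qquad \mathbb{E}[x_i^k x_j^k] = \frac{(k!)^2\, n!}{(n + 2k)!} = O\left(\frac{1}{n^{2k}}\right).\)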
Examples
Here are some examples with a few different functions on \([0, 1]\):
\(f = \frac{1}{1 + x^2}\)
\(g = e^x\sin(\frac{x}{2})\)
\(h = x^3 - x + x^2\).
Table 1: Absolute Error in estimating the integral of \(f = \frac{1}{1 + x^2}\), \(g = e^x \sin(\frac{x}{2})\), \(h = x^3 - x + x^2\).
| Integral | Method | N=10 | N=50 | N=100 | N=200 | N=1000 |
|---|---|---|---|---|---|---|
| 0.785398 | Monte Carlo | 0.026491 | 0.019358 | 0.008144 | 0.002608 | 0.001540 |
| 0.785398 | Trapezoid Rule | 0.000393 | 0.000010 | 0.000038 | 0.000009 | 0.000000 |
| 0.785398 | Simpson's 1/3 Rule | 0.000002 | 0.000010 | 0.000001 | 0.000000 | 0.000000 |
| 0.488364 | Monte Carlo | 0.146778 | 0.003383 | 0.026795 | 0.034787 | 0.009164 |
| 0.488364 | Trapezoid Rule | 0.005188 | 0.000636 | 0.000118 | 0.000022 | 0.000001 |
| 0.488364 | Simpson's 1/3 Rule | 0.000131 | 0.000002 | 0.000000 | 0.000000 | 0.000000 |
| 0.083333 | Monte Carlo | 0.178741 | 0.057350 | 0.007101 | 0.011000 | 0.000868 |
| 0.083333 | Trapezoid Rule | 0.011265 | 0.001467 | 0.000173 | 0.000051 | 0.000002 |
| 0.083333 | Simpson's 1/3 Rule | 0.004317 | 0.000017 | 0.000001 | 0.000000 | 0.000000 |
We can see that the error of the estimates from the Trapezoid Monte Carlo estimator seems to decrease very rapidly.
“Improving” the Estimator
The above estimator can be extended to other Newton-Cotes-type estimators. A Newton-Cotes formula is an integration rule that estimates the integral via the integration of a polynomial interpolant on \(k + 1\) equally spaced points.
For instance, the trapezoid rule is just the integral of the linear polynomial between two equally spaced points.
For our purposes, we can take the interpolant and use it to derive an integration rule on non-equally spaced points.
A Simpson’s Rule-type estimator, for instance, would be of order \(2\) and use \(3\) adjacent points (as seen in Figure 2).
Figure 2: Randomized Simpson Rule for \(\sin(2 \pi x)\)
The interpolating polynomial through the ordered points \(a, b, c\) would be \(p(x) = f(a)\frac{(x - b)(x - c)}{(a - b)(a - c)} + f(b)\frac{(x - a)(x - c)}{(b - a)(b - c)} + f(c)\frac{(x - a)(x - b)}{(c - a)(c - b)}\).
Error analysis for this can proceed in a similar way (though it gets messy), which should result in an estimator with \(\mathbb{E}[(S''_n - I)^2] = O(\frac{1}{n^8})\). Similarly, we should have an \(O(\frac{1}{n^4})\) bound on the bias.
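A sketch of what such a Simpson’s-rule-type estimator could look like (fitting and exactly integrating a quadratic over each consecutive non-overlapping triple of points; an illustrative implementation, not the code behind the tables):

```python
import numpy as np

def simpson_type_estimate(f, u):
    # Sort the samples, append the endpoints 0 and 1, then fit a quadratic
    # through each consecutive non-overlapping triple of points and integrate
    # it exactly.  A leftover final interval is handled with a trapezoid.
    v = np.concatenate(([0.0], np.sort(u), [1.0]))
    fv = f(v)
    total, i = 0.0, 0
    while i + 2 < len(v):
        xs, ys = v[i:i + 3], fv[i:i + 3]
        coeffs = np.polyfit(xs, ys, deg=2)    # quadratic interpolant
        antideriv = np.polyint(coeffs)        # its antiderivative
        total += np.polyval(antideriv, xs[2]) - np.polyval(antideriv, xs[0])
        i += 2
    if i + 1 < len(v):                        # one interval left over
        total += 0.5 * (v[i + 1] - v[i]) * (fv[i + 1] + fv[i])
    return total
```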
In general, we can set up estimators from larger interpolating polynomials, to get asymptotically faster converging estimators 56.
5 Note that polynomial interpolation on equally spaced points can be very numerically unstable. These estimators use random rather than equally spaced points, but some care still needs to be taken.
6 Note that while the Dirichlet moments will decay faster in \(n\) for higher order interpolants, they also pick up larger constants.
As seen in Table 1, we empirically get faster convergence with the Simpson’s-rule type estimator.
Extending to other probability distributions
Suppose now we have \(\int_{[a, b]} f(x) p(x) dx\), where \(p(x)\) is some probability density on an interval \([a, b]\).
The Monte Carlo estimate would be \(\frac{1}{n}\sum_{i=1}^{n} f(x_i)\), where \(x_i \sim p(x)\). We would like to use our trapezoid estimator here, where we now have samples from \(p(x)\) rather than uniform samples.
We can naively apply the estimator by computing order statistics of \(x_i\), called \(z_i\), appending \(a\) and \(b\) to the ends much like before.
This gives the estimator \(S'_n = \sum_{i=0}^n \frac{1}{2}(z_{i+1} - z_i) (f(z_{i+1})p(z_{i+1}) + f(z_i)p(z_i))\), where \(z_0 = a\) and \(z_{n+1} = b\).
Let \(v_i = F(z_i)\), where \(F\) is the CDF (we’ll assume it is continuously differentiable with \(p > 0\) on \([a, b]\)); the \(v_i\) are then the order statistics of uniform samples, plus the endpoints \(0\) and \(1\).
Via an application of the Mean Value Theorem we have: \(S'_n = \sum_{i=0}^n \frac{1}{2}(F^{-1})'(s_i) (v_{i+1} - v_i)\left(f(F^{-1}(v_{i+1}))p(F^{-1}(v_{i+1})) + f(F^{-1}(v_i))p(F^{-1}(v_i))\right)\)
for some \(s_i \in [v_i, v_{i+1}]\).
This suggests that if \(\sup_{[0, 1]} |(F^{-1})'(u)| = \sup_{[a, b]} \frac{1}{p(x)} < \infty\), then we only pick up another constant and can proceed with our error analysis as before, and thus get an \(O(\frac{1}{n^4})\) bound on \(\mathbb{E}[(I - S'_n)^2]\). A more careful treatment, as in Phillipe (1997), can loosen this restriction.
We can apply this rule with \(p(x)\) the density of a \(Beta(2, 3)\) random variable and the functions from Table 1 to estimate \(\int_0^1 f(x)p(x) dx\).
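For example, a sketch of the trapezoid-type estimate of the \(Beta(2, 3)\)-weighted integral of \(f\) (illustrative only; the helper name and the explicit \(Beta(2, 3)\) density \(12x(1-x)^2\) are spelled out for this sketch):

```python
import numpy as np

def weighted_trapezoid_estimate(f, samples, density, a=0.0, b=1.0):
    # Sort the samples, append the interval endpoints, and apply the
    # trapezoid rule to the weighted integrand f(x) * p(x).
    z = np.concatenate(([a], np.sort(samples), [b]))
    g = f(z) * density(z)
    return np.sum(0.5 * np.diff(z) * (g[1:] + g[:-1]))

rng = np.random.default_rng(0)
beta_pdf = lambda x: 12.0 * x * (1.0 - x) ** 2   # Beta(2, 3) density
f = lambda x: 1.0 / (1.0 + x ** 2)
x = rng.beta(2, 3, size=200)
print(np.mean(f(x)))                                 # plain Monte Carlo estimate
print(weighted_trapezoid_estimate(f, x, beta_pdf))   # trapezoid-type estimate
```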
Table 2: Absolute Error in estimating the integral of \(f = \frac{1}{1 + x^2}\), \(g = e^x \sin(\frac{x}{2})\), \(h = x^3 - x + x^2\) with respect to the Beta distribution.
| Integral | Method | N=10 | N=50 | N=100 | N=200 | N=1000 |
|---|---|---|---|---|---|---|
| 0.849556 | Monte Carlo | 0.022431 | 0.015612 | 0.006972 | 0.006267 | 0.003704 |
| 0.849556 | Trapezoid Rule | 0.071244 | 0.004386 | 0.001175 | 0.003429 | 0.000172 |
| 0.849556 | Simpson's 1/3 Rule | 0.021684 | 0.000002 | 0.000070 | 0.000004 | 0.000000 |
| 0.331193 | Monte Carlo | 0.047922 | 0.032177 | 0.012697 | 0.005025 | 0.003999 |
| 0.331193 | Trapezoid Rule | 0.000814 | 0.001131 | 0.005256 | 0.001520 | 0.000146 |
| 0.331193 | Simpson's 1/3 Rule | 0.011204 | 0.000972 | 0.001880 | 0.001769 | 0.000013 |
| -0.085714 | Monte Carlo | 0.061996 | 0.014922 | 0.000664 | 0.006580 | 0.001264 |
| -0.085714 | Trapezoid Rule | 0.009933 | 0.001845 | 0.000278 | 0.000761 | 0.000123 |
| -0.085714 | Simpson's 1/3 Rule | 0.015353 | 0.000206 | 0.001751 | 0.001618 | 0.000059 |
As seen in Table 2, we can still outperform the naive Monte-Carlo estimator.
If we only have access to an unnormalized density \(h(x)\), with \(\frac{h(x)}{Z} = p(x)\), an application of Slutsky’s Theorem gives us the following estimator, where we simultaneously estimate the normalizing constant \(Z\) with the same trapezoid rule: \(\tilde{S}_n = \frac{\sum_{i=0}^n \frac{1}{2}(z_{i+1} - z_i)\left(f(z_{i+1})h(z_{i+1}) + f(z_i)h(z_i)\right)}{\sum_{i=0}^n \frac{1}{2}(z_{i+1} - z_i)\left(h(z_{i+1}) + h(z_i)\right)}\)
Thus, given a non-zero unnormalized density on \([a, b]\) and access to a sampler, we can estimate the expectation of any smooth function with this estimator 7.
7 In general we can mix and match estimators for the normalizing constant in the denominator. These estimators will be consistent, but in general will not be unbiased.
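A sketch of the self-normalized version (illustrative only; \(h\) is the unnormalized density, and the same trapezoid rule is applied in the numerator and denominator):

```python
import numpy as np

def self_normalized_trapezoid(f, h, samples, a, b):
    # Estimate E_p[f(X)] with p = h / Z, using only the unnormalized density h
    # and samples from p.  The numerator estimates the integral of f * h and
    # the denominator estimates Z = integral of h, with the same trapezoid rule.
    z = np.concatenate(([a], np.sort(samples), [b]))
    w = 0.5 * np.diff(z)
    fz, hz = f(z), h(z)
    numerator = np.sum(w * (fz[1:] * hz[1:] + fz[:-1] * hz[:-1]))
    denominator = np.sum(w * (hz[1:] + hz[:-1]))
    return numerator / denominator

rng = np.random.default_rng(0)
h = lambda x: x * (1.0 - x) ** 2        # unnormalized Beta(2, 3) kernel
samples = rng.beta(2, 3, size=200)
print(self_normalized_trapezoid(lambda x: 1.0 / (1.0 + x ** 2), h, samples, 0.0, 1.0))
```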
Other
This type of estimator seems to have been originally derived in Yakowitz, Krimmel, and Szidarovszky (1978) for the unit interval. Phillipe (1997) and related works extend the Riemann sum estimator to other distributions.
Yakowitz, S., J. Krimmel, and F. Szidarovszky. 1978. “Weighted Monte Carlo Integration.” SIAM Journal on Numerical Analysis 15 (6): 1289–1300.
Phillipe, Anne. 1997. “Importance Sampling and Riemann Sums.” Publications IRMA, Université de Lille 43 (IV).