Subset Sampling
The basic idea of subset sampling is the subdivision of the failure event into a sequence of m partial failure events (subsets) F1, F2, ···, Fm = F. Generally, the determination of small failure probabilities PF with the aid of Monte Carlo simulation requires the expensive simulation of rare events. The division into subsets (subproblems) offers the possibility to transfer the simulation of rare events into a set of simulations of more frequent events. The determination of the failure regions Fi can be effected by presetting a series gi|i=1...m of limit values, whereas m denotes the (unknown) total number of subsets.
This enables the computation of the failure probability as a product of conditional probabilities P(Fi+1|Fi) and P(F1).

The determination of the failure regions Fi and subsequently the partial failure probabilities Pi =P(Fi+1 | Fi) influences the accuracy of the simulation. It is convenient to specify the limit values gi|i=1...m so that nearly equal partial failure probabilities Pi|i=2...m are obtained for each subset. Unfortunately, it is difficult to specify the limit values gi in advance according to the prescribed probability Pi. Therefore the limit values have to be determined adaptively within the simulation.
Subset sampling algortihm
In the first step the probability P1=P(F1) is determined by application of the direct Monte Carlo simulation.

To obtain conditional probabilities P(Fi+1|Fi) the evaluation of the respective probability functions

is necessary. With the application of the Markov chain Monte Carlo simulation in conjunction with the Metropolis-Hastings algorithm samples may be generated in a numerically efficient way according to f(x|Fi).





Numerical efficiency
The numerical efficiency can be assessed with the demanded number of samples. In contrast to Monte Carlo simulation the required sample size for subset sampling cannot be predetermined in advance. A coarse approximation of the number of samples NT, which is sufficient to estimate PF, is given by

The parameter p0 denotes the predefined failure probability of the respective susbet and
denotes the coefficient of variation. The exponent r < 3 specifies the correlation of the Pi of the individual subsets. The value of
represent the correlation among Markov chain samples and depends itself on the selected proposal density function q. The accuracy of the conditional probability estimator Pi is reduced for
> 0 compared to the case of independently distributed samples (
= 0).
The estimated computational effort of subset sampling vs. Monte Carlo simulation is shown in the following diagrams


Example
The susbet sampling method is demonstrated by means of a numerical example. The performance function

ss![]() 1st subset (direct Monte Carlo simulation) |
![]() 2nd subset (Markov chain Monte Carlo simulation) |

References
- Au, S, and Beck, JL (2001) Estimation of small failure probabilities in high dimensions by subset simulation, Probabilistic Engineering Mechanics 16(4):263–277.
- Liebscher, M, Pannier, S, Sickert, J, and Graf, W (2006) Efficiency improvement of stochastic simulations by means of subset sampling, In: Proceedings of the 5th German LS-DYNA Forum 2006. DYNAmore GmbH, Ulm.

