Chapter 12 Estimating transition intensities

12.0.1 Chapter 9: Exposed to Risk – Estimating Transition Intensities

This chapter provides the essential framework for quantifying risk from observed data. While the fundamental idea of calculating rates (deaths divided by exposure) seems simple, the practicalities involve meticulous handling of data and careful adherence to actuarial principles.

12.0.1.1 1. Introduction to Estimating Transition Intensities

At its core, estimating mortality rates involves counting the number of deaths within a specified observation period for a particular age and dividing by the number of lives exposed to the risk of death at that age. However, as your exam coach, I must highlight that this seemingly straightforward task introduces several computational complexities. The primary challenge lies in accurately defining and measuring the “population at risk” throughout the observation period. This chapter equips you with the methodologies to navigate these complications and derive robust mortality rate estimates.

12.0.1.2 2. The Concept of “Exposed to Risk”

The “exposed to risk,” fundamentally an observable quantity, is also known as the total waiting time. It represents the total time individuals are observed and, crucially, “at risk” of experiencing a specific event, such as death, during an investigation period. This measure is observable even if a life is only under observation for a fraction of a year of age. The concept of exposed to risk is central to both the two-state Markov model and the Poisson model for mortality estimation, which you’ve encountered in earlier chapters.

12.0.1.3 3. Importance of Homogeneous Classes

For an accurate and meaningful assessment of risks, it is paramount to segment the observed data into homogeneous classes. A group of lives is considered homogeneous if all individuals within it are assumed to follow the same stochastic model of mortality, thereby sharing similar mortality characteristics. This crucial subdivision serves to reduce heterogeneity within the dataset, allowing for the calculation of more specific, relevant, and accurate mortality rates for distinct subgroups.

Common characteristics typically employed for subdividing data into homogeneous classes include:

* Age.
* Sex.
* Policy type.
* Smoker status.
* Postcode.
* Type of treatment.
* Level of medication.
* Severity of symptoms.

12.0.1.4 4. The Principle of Correspondence

The Principle of Correspondence is a bedrock principle in mortality investigations, ensuring that there is perfect consistency between the numerator (observed deaths) and the denominator (exposed to risk) in your rate calculations. This principle states: “A life alive at time t should be included in the exposure at age x at time t if and only if, were that life to die immediately, he or she would be counted in the death data d_x at age x”.

Adherence to this principle is particularly vital when data might be collected using different definitions of age (e.g., age last birthday, age next birthday, or age nearest birthday). By ensuring this correspondence, you prevent inherent biases from creeping into your estimated rates, which could lead to inaccurate risk assessments.

12.0.1.5 5. Exact Calculation of the Central Exposed to Risk

For an exact calculation of the central exposed to risk (c_xE), a comprehensive set of information for every life under observation is indispensable. The procedure involves:

* Recording all dates of birth.
* Recording all dates of entry into observation.
* Recording all dates of exit from observation.
* If the reason for exit is also recorded (e.g., death, withdrawal), then the number of deaths (d_x) for a given age can also be determined precisely.
* Finally, computing c_xE by summing the exact time each life spent under observation while it had age label x.

More specifically, for a life contributing to the exposed to risk for a given age label x, its contribution to c_xE is the time from Date A to Date B (zero if Date B does not fall after Date A), where:

* Date A is the latest of: the date of reaching age label x, the start of the investigation period, or the date of entry into the study.
* Date B is the earliest of: the date of reaching age label x+1, the end of the investigation period, the date of death, or the date of exit from the study for any other reason.

For example, if “age next birthday” is used as the age label, a life has age label x from its (x-1)th birthday until its xth birthday, so it contributes to the exposed to risk at age label x for whatever part of that year of age it spends under observation.
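The Date A / Date B rule can be sketched in Python as follows. This is a minimal sketch, assuming age label x means age last birthday, that the investigation runs from `inv_start` up to `inv_end`, and that days are converted to years using an average year of 365.25 days; the dictionary keys (`dob`, `entry`, `exit`, `died`) and the function names are hypothetical. Deaths are counted using the same window as the exposure, in line with the Principle of Correspondence.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: average year length used to convert days to years


def add_years(d: date, years: int) -> date:
    """Return the anniversary of d after `years` years (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def exact_central_exposure(lives, x, inv_start, inv_end):
    """Return (c_xE in years, d_x) for age label x defined as age last birthday.

    Each life is a dict with hypothetical keys 'dob', 'entry', 'exit', 'died'.
    """
    exposure_days = 0
    deaths = 0
    for life in lives:
        reach_x = add_years(life["dob"], x)       # start of age label x
        reach_x1 = add_years(life["dob"], x + 1)  # start of age label x+1
        # Date A: latest of reaching age x, start of investigation, entry
        date_a = max(reach_x, inv_start, life["entry"])
        # Date B: earliest of reaching age x+1, end of investigation, exit
        date_b = min(reach_x1, inv_end, life["exit"])
        if date_b > date_a:
            exposure_days += (date_b - date_a).days
        # Correspondence: count the death at age x only if exit by death
        # is what ends this life's exposure window at age x
        if (life["died"] and date_a <= life["exit"] == date_b
                and life["exit"] < reach_x1):
            deaths += 1
    return exposure_days / DAYS_PER_YEAR, deaths


lives = [
    {"dob": date(1960, 7, 1), "entry": date(2019, 1, 1), "exit": date(2021, 6, 1), "died": False},
    {"dob": date(1959, 10, 1), "entry": date(2015, 1, 1), "exit": date(2020, 4, 1), "died": True},
    {"dob": date(1950, 1, 1), "entry": date(2018, 1, 1), "exit": date(2022, 1, 1), "died": False},
]
exposure, deaths = exact_central_exposure(lives, 60, date(2020, 1, 1), date(2021, 1, 1))
print(round(exposure, 4), deaths)
```

With these illustrative lives, the first contributes 184 days (1 July 2020 to 1 January 2021), the second 91 days up to its death on 1 April 2020, and the third nothing, because it passed age 61 long before the investigation began.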

12.0.1.6 6. Approximate Procedures: The Census Approach

In many practical actuarial investigations, particularly those involving large populations or historical data, the precise individual entry and exit dates are often unavailable or “incomplete”. In these common scenarios, the census approach is employed to approximate the exposed to risk.

The census approximation relies on periodic population counts (censuses) taken at specific dates. If P_x(t) denotes the number of lives with age label x at time t, the central exposed to risk over an investigation period from time 0 to N can be theoretically expressed as the integral: c_xE = ∫₀^N P_x(t) dt.

Since P_x(t) is typically unknown as a continuous function, this integral is approximated using numerical methods. The most common method is the trapezium rule, which assumes that the population number varies linearly between census dates.
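A minimal sketch of the trapezium rule for the census approximation, assuming census times are given in years and that the counts P_x(t) already use the same age definition as the deaths. With annual censuses at t = 0, 1, ..., N this reduces to c_xE ≈ Σ ½[P_x(t) + P_x(t+1)], summed over t = 0 to N-1. The function name is hypothetical.

```python
def census_exposure(times, counts):
    """Trapezium-rule approximation to c_xE = integral of P_x(t) dt.

    times  -- census times in years, strictly increasing (need not be equally spaced)
    counts -- P_x(t) at each census time, using the same age definition as the deaths
    """
    if len(times) != len(counts) or len(times) < 2:
        raise ValueError("need at least two censuses with one count per census time")
    total = 0.0
    for i in range(len(times) - 1):
        # P_x(t) assumed to vary linearly between consecutive censuses
        total += 0.5 * (counts[i] + counts[i + 1]) * (times[i + 1] - times[i])
    return total


# Annual censuses at t = 0, 1, 2 over a two-year investigation
print(census_exposure([0, 1, 2], [1000, 1100, 1050]))  # 2125.0
```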

A critical aspect of the census approach, which follows directly from the Principle of Correspondence, is ensuring that the age definition used for the population counts (P_x(t)) matches the age definition used for recording deaths (d_x). The age definition used for the death data also determines the exact age to which the estimated mortality rate applies.

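The “rate interval” is defined as the one-year period during which a life has a particular age label, as determined by the age definition used for the deaths. If the force of mortality is approximately linear over the rate interval, its average over the interval equals its value at the midpoint, so d_x / c_xE is taken as an estimate of the force of mortality at the midpoint of the rate interval. The age definition therefore matters:

* If deaths are classified by age nearest birthday, the rate interval runs from exact age x - 0.5 to x + 0.5, and the estimate d_x / c_xE applies to exact age x (i.e., *μ̂_x* estimates μ_x).
* If deaths are classified by age last birthday, the rate interval runs from exact age x to x + 1, and the estimate d_x / c_xE applies to exact age x + 0.5 (i.e., *μ̂_x* estimates μ_{x+0.5}).
* If deaths are classified by age next birthday, the rate interval runs from exact age x - 1 to x, and the estimate d_x / c_xE applies to exact age x - 0.5 (i.e., *μ̂_x* estimates μ_{x-0.5}).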
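The mapping from age definition to rate interval can be written down directly. This sketch (hypothetical names) returns the start, end and midpoint of the rate interval for a given age label; the midpoint is the exact age at which d_x / c_xE is taken to estimate the force of mortality, assuming μ is approximately linear over the interval.

```python
# Offset from age label x to the start of the rate interval, by age definition
RATE_INTERVAL_START = {
    "last": 0.0,     # age last birthday x: exact ages x to x+1
    "nearest": -0.5, # age nearest birthday x: exact ages x-0.5 to x+0.5
    "next": -1.0,    # age next birthday x: exact ages x-1 to x
}


def rate_interval(x, definition):
    """Return (start, end, midpoint) of the rate interval for age label x."""
    start = x + RATE_INTERVAL_START[definition]
    return start, start + 1.0, start + 0.5


for definition in ("nearest", "last", "next"):
    start, end, mid = rate_interval(60, definition)
    print(f"{definition:>7}: rate interval [{start}, {end}], mu_hat_60 estimates mu at {mid}")
```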

You should be prepared to develop census formulae given various age definitions for both census and death data.
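As one illustration of developing census formulae, suppose the census counts are recorded by age last birthday but the deaths may use a different definition. A life with age next birthday x has age last birthday x-1, so the matching count is P_{x-1}(t); for age nearest birthday, assuming birthdays are spread uniformly over the calendar year, the matching count is ½[P_{x-1}(t) + P_x(t)]. The sketch below encodes these three cases; the function name and dictionary input are hypothetical, and each returned count would then take the place of P_x(t) in the trapezium rule.

```python
def corresponding_census(p_last, x, death_definition):
    """Census count at one date matching deaths with age label x.

    p_last maps age last birthday -> number of lives at that census date.
    """
    if death_definition == "last":
        return p_last.get(x, 0)
    if death_definition == "next":
        # age next birthday x  <=>  age last birthday x-1
        return p_last.get(x - 1, 0)
    if death_definition == "nearest":
        # assumes birthdays uniform over the year: half of those aged x-1 last
        # birthday and half of those aged x last birthday are aged x nearest birthday
        return 0.5 * (p_last.get(x - 1, 0) + p_last.get(x, 0))
    raise ValueError(f"unknown age definition: {death_definition}")


census = {59: 400, 60: 500, 61: 450}
for definition in ("last", "next", "nearest"):
    print(definition, corresponding_census(census, 60, definition))
```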

12.0.1.7 7. Estimating Transition Intensities

Once the number of observed deaths (d_x) and the corresponding exposed to risk (c_xE) have been rigorously determined, the estimated transition intensity (or force of mortality, *μ̂_x*) is calculated as a simple ratio: *μ̂_x = d_x / c_xE*.

This estimate represents the average force of mortality over the specific rate interval. From this estimated force of mortality, the probability of dying within a year (q_x) can be estimated using the fundamental relationship derived from a constant force of mortality assumption: *q̂_x = 1 - e^(-μ̂_x)*.
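A minimal sketch of the two estimation steps, assuming d_x and c_xE are already available (the function name is hypothetical). For example, 12 deaths over a central exposed to risk of 1,000 years gives μ̂ = 0.012 and q̂ = 1 - e^(-0.012) ≈ 0.0119.

```python
import math


def estimate_mu_q(deaths, central_exposure):
    """Return (mu_hat, q_hat): mu_hat = d_x / c_xE and q_hat = 1 - exp(-mu_hat)."""
    if central_exposure <= 0:
        raise ValueError("central exposed to risk must be positive")
    mu_hat = deaths / central_exposure
    # a constant force of mortality over the year of age gives q = 1 - exp(-mu)
    q_hat = 1.0 - math.exp(-mu_hat)
    return mu_hat, q_hat


mu_hat, q_hat = estimate_mu_q(12, 1000.0)
print(mu_hat, round(q_hat, 6))
```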

It’s important to note for your CS2 studies, particularly in the context of graduation (covered in Chapters 10 and 11), that these estimated piecewise constant intensities over single years of age are conventionally treated as estimates of the force of mortality at the midpoint of the year of age; with x defined as age last birthday, *μ̂_x* estimates μ_{x+0.5}. While the true force of mortality generally varies within a year of age, *μ̂_x* estimates its average over that interval.

The maximum likelihood estimators (MLEs) for transition intensities, derived earlier in Chapter 4, have desirable statistical properties:

* They are asymptotically normally distributed.
* They are asymptotically unbiased.
* Asymptotically, their variance equals the Cramér-Rao lower bound (CRLB).

These results are key to constructing confidence intervals and performing hypothesis tests.
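In the two-state model the Cramér-Rao lower bound for μ works out as μ divided by the expected waiting time, so approximately μ̂ ~ N(μ, μ / c_xE). The sketch below turns this into an approximate confidence interval, assuming the unknown μ in the variance is replaced by μ̂ and that a normal quantile z (1.96 for 95%) is adequate; the function name is hypothetical. With 100 deaths and c_xE = 10,000 years it gives roughly (0.00804, 0.01196).

```python
import math


def mu_confidence_interval(deaths, central_exposure, z=1.96):
    """Approximate two-sided CI for mu using mu_hat ~ N(mu, mu / c_xE).

    The unknown mu in the variance is replaced by its estimate mu_hat.
    """
    mu_hat = deaths / central_exposure
    std_error = math.sqrt(mu_hat / central_exposure)
    return mu_hat - z * std_error, mu_hat + z * std_error


lower, upper = mu_confidence_interval(100, 10000.0)
print(f"95% CI for mu: ({lower:.5f}, {upper:.5f})")
```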

Furthermore, for a single decrement, the observed number of deaths D is often well approximated by a Poisson distribution with parameter μ c_xE. Under this Poisson model the MLE of μ is again d_x / c_xE, the same as under the two-state Markov model, so the two models give consistent estimates of μ.
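To see that the Poisson model gives the same estimate, note that its log-likelihood d log(μ c_xE) - μ c_xE - log(d!) has derivative d/μ - c_xE, which is zero at μ = d_x / c_xE. The sketch below checks this numerically with a crude grid search; the grid and the function name are illustrative assumptions, not a recommended fitting method.

```python
import math


def poisson_log_likelihood(mu, deaths, central_exposure):
    """log P(D = deaths) when D ~ Poisson(mu * central_exposure)."""
    lam = mu * central_exposure
    return deaths * math.log(lam) - lam - math.lgamma(deaths + 1)


deaths, central_exposure = 12, 1000.0
# Crude grid search over mu = 0.00001, 0.00002, ..., 0.05
grid = [i / 100_000 for i in range(1, 5_001)]
mu_best = max(grid, key=lambda mu: poisson_log_likelihood(mu, deaths, central_exposure))
print(mu_best, deaths / central_exposure)
```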

12.0.1.8 8. R Implementation for Exposed to Risk

While the Core Reading for Chapter 9 itself does not explicitly detail R code for exposed to risk calculations, the ActEd materials (including CS2B) heavily emphasise that R is an indispensable tool for performing the computational tasks discussed in this chapter and throughout the course. As your exam coach, I strongly advise you to gain hands-on experience using R for:

* Loading and viewing data, especially handling date information.
* Calculating exposed to risk using both the exact method (summing individual observation times) and the census method (approximating integrals of population counts).
* Estimating transition rates based on the calculated exposed to risk and observed deaths.
* Beyond this chapter, R is also used extensively for subsequent processes like performing graduation tests to assess the quality of fitted mortality rates.

Thorough comprehension of these concepts – from the theoretical underpinnings of exposed to risk and the Principle of Correspondence to the practical application of exact and census methods, and their R implementation – is absolutely fundamental. Mastering this chapter will provide a solid foundation for later topics such as graduation and mortality projection, and is crucial for your success in the CS2 examination.