Macroeconomic Impact on General Insurance Claims
Built an analytics pipeline to test whether Australian macro-economic indicators — wage growth, construction cost inflation, and interest rate cycles — can predict general insurance claim severity before the insurer's loss account catches up. End-to-end: ABS + RBA + APRA ingestion, lead-lag correlation analysis, regularised regression, and a four-page Streamlit dashboard with live stress-test scenario controls.
Domain: General Insurance · Macro
Data Sources: ABS · RBA · APRA
Stack: Python · Scikit-learn · Streamlit
Published: April 2026
Rebuilding a flood-damaged home, repairing a hail-dented car, or replacing stolen equipment all cost more when wages rise and materials inflate. Premiums are set months in advance — by the time claim severity jumps, the pricing cycle has already closed.
This project tests a commercially valuable hypothesis: can macro-economic indicators — wage growth, construction costs, CPI, and interest rates — provide an early warning signal 2–4 quarters before the loss account catches up?
Analysis was framed against three internal personas: Head of Claims (early severity signals), CFO (capital stress-test inputs), and Underwriting (loss ratio risk in pricing cycles).
All data is publicly available from ABS, RBA, and APRA. The critical constraint: APRA migrated to AASB 17 reporting standards in 2023, meaning class-level claims data only extends back 9 quarters — this is the binding statistical limit across all models.
Five source series were ingested: ABS 6401.0 (CPI Insurance subgroup, monthly), ABS 6345.0 (Wage Price Index, quarterly since 1997), ABS 6427.0 (PPI Construction, quarterly), RBA F1.1 (Cash Rate Target, monthly since 1969), and APRA GI Statistics (quarterly claims, GWP, and loss ratios — n = 9 quarters).
Six-step pipeline from raw Excel to live Streamlit dashboard:
- 01. Ingestion: Built a _find_header_row() helper that dynamically locates the Series ID row in ABS Excel files, which store 9 rows of metadata before the data header, so the same loader works across the ABS and RBA files used here without manual inspection.
- 02. Feature Engineering: 18 derived features including YoY % changes, 3- and 12-month rolling averages, inflation acceleration (first difference of YoY), volatility scores, binary stress flags, and macro values at t−1Q through t−6Q to directly test the lagged impact hypothesis.
- 03. EDA: Spearman rank correlations (appropriate for non-normal distributions and small N), dual-axis overlay charts pairing each macro series with insurance claims, and a lead-lag heatmap across 0–6 quarter lags with p-value significance testing.
- 04. Modelling: Chronological 60/40 train-test split. OLS achieved R²≈1.000, a textbook overfit treated as a warning sign rather than a result. Ridge (R²=0.46–0.69) and Lasso (R²=0.59–0.86) are reported as the authoritative estimates.
- 05. Stress Testing: Four macro scenarios (Baseline, Persistent Inflation, Stagflation, Rate Cut) modelled against the Householders loss ratio, with the projected reserve buffer change and dollar implications at current GWP levels.
- 06. Dashboard: Four-page Streamlit app covering Macro Trends, Insurance Performance, interactive Lag Correlation with selectable pairs and scatter drill-down, and a Scenario Stress Test with live sliders, so stakeholders can explore assumptions without requiring an analyst.
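The feature engineering in step 02 can be sketched as follows. This is a minimal illustration, not the project's code: the function name `add_macro_features`, the column naming scheme, the 4-quarter rolling window (standing in for the 12-month average on quarterly data), and the choice to lag the YoY series rather than the raw level are all assumptions, and the sample values are made up.

```python
import pandas as pd


def add_macro_features(df: pd.DataFrame, col: str, max_lag: int = 6) -> pd.DataFrame:
    """Derive YoY change, acceleration, a rolling mean and lagged values
    for one quarterly macro series (4 periods = 1 year)."""
    out = df.copy()
    # Year-on-year % change: each quarter against the same quarter a year earlier.
    out[f"{col}_yoy"] = out[col].pct_change(periods=4) * 100
    # Inflation acceleration: first difference of the YoY series.
    out[f"{col}_accel"] = out[f"{col}_yoy"].diff()
    # 4-quarter rolling mean as a smoothed level (assumed window).
    out[f"{col}_roll4"] = out[col].rolling(window=4).mean()
    # Lagged YoY values at t-1Q ... t-max_lag Q.
    for k in range(1, max_lag + 1):
        out[f"{col}_yoy_lag{k}"] = out[f"{col}_yoy"].shift(k)
    return out


# Illustrative index values only, not ABS data.
demo = pd.DataFrame({"wpi": [100.0, 101.0, 102.0, 103.0, 104.0, 106.0, 108.0, 110.0]})
features = add_macro_features(demo, "wpi", max_lag=2)
print(features.round(2))
```

Every shift and rolling window costs rows at the start of the series, which matters a great deal when the target only has 9 quarters.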
01 — Cash rate is the strongest predictor — and the direction is counterintuitive
The RBA cash rate showed the clearest directional signal of the macro indicators tested. Motor LR: r = −0.668 at 2Q lag. Householders LR: r = −0.655 at 3Q lag. The negative sign is economically plausible: the 2022–23 rate hike cycle (0.1% → 4.35%) suppressed economic activity and claim frequency, and claims recovered as rates eased from early 2025. With only 9 quarters of data, the lead-lag relationship is directionally useful rather than statistically established.
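A lead-lag correlation of this kind can be computed by shifting the macro series back k quarters and correlating it with the current loss ratio. The sketch below uses `scipy.stats.spearmanr`; the function name `lead_lag_spearman`, the minimum-overlap cut-off, and the synthetic series (built so the target mirrors the macro series two quarters later) are assumptions for illustration, not the project's data or code.

```python
import pandas as pd
from scipy.stats import spearmanr


def lead_lag_spearman(macro: pd.Series, target: pd.Series, max_lag: int = 6) -> pd.DataFrame:
    """Spearman rho and p-value between target[t] and macro[t - k], k = 0..max_lag."""
    rows = []
    for k in range(max_lag + 1):
        # Align macro lagged by k quarters with the target and drop incomplete pairs.
        pair = pd.concat([macro.shift(k), target], axis=1, keys=["macro", "target"]).dropna()
        if len(pair) < 3:  # too few overlapping quarters to rank meaningfully
            continue
        rho, p_value = spearmanr(pair["macro"], pair["target"])
        rows.append({"lag_q": k, "n": len(pair), "rho": rho, "p_value": p_value})
    return pd.DataFrame(rows)


# Synthetic example: the target is the negated macro series, delayed by 2 quarters.
macro = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0])
target = -macro.shift(2)
result = lead_lag_spearman(macro, target, max_lag=3)
print(result)
```

The `n` column is worth reading alongside `rho`: each extra quarter of lag removes one observation, so a 3Q-lag correlation on 9 quarters rests on at most 6 pairs.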
02 — Householders claims are weather-driven, not inflation-driven
Quarterly Householders gross claims ranged from $1.7B (Jun 2025) to $5.4B (Dec 2025), a roughly 3× swing within six months that was driven by Australian summer weather events, not cost inflation. Macro indicators alone are insufficient here; catastrophe event flags are required as covariates. Domestic Motor was stable at $3.2–$4.3B per quarter, making it a far cleaner regression target.
03 — WPI and PPI up 32–51% since 2015 — the structural cost pressure is real
The Wage Price Index has risen 32.5% since 2015. Construction PPI has risen 51.4%, a cost increase that flows directly into tradesperson rates and motor repair quotes. Even without statistically significant correlations from 9 quarters of APRA data, the directional structural case for pricing and reserve adjustment is compelling.
04 — OLS R²≈1.000 is a red flag, not a result
With n=7 training observations and 4 predictors, OLS has almost no residual degrees of freedom and effectively memorises the data. This finding is itself an analytical insight: regression coefficients should be treated as directional guides until APRA publishes historical class-level data for pre-2023 years. Ridge and Lasso are reported as the authoritative results, a choice of statistical maturity over optimisation gaming.
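The overfitting risk can be reproduced on synthetic data of the same shape. In the sketch below, 9 rows with 4 predictors are split chronologically into 7 training and 2 test rows, leaving OLS only two residual degrees of freedom (7 observations minus 4 slopes and an intercept). The random data, the seed, and `alpha=1.0` are assumptions chosen for illustration; they are not the project's inputs or tuned values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 9 "quarters", 4 predictors, target is mostly noise.
X = rng.normal(size=(9, 4))
y = 0.3 * X[:, 0] + rng.normal(scale=1.0, size=9)

# Chronological split: earliest 7 rows train, latest 2 rows test (no shuffling).
X_train, X_test = X[:7], X[7:]
y_train, y_test = y[:7], y[7:]

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

# In-sample R^2: OLS always fits the training rows at least as well as Ridge.
ols_train_r2 = r2_score(y_train, ols.predict(X_train))
ridge_train_r2 = r2_score(y_train, ridge.predict(X_train))

# Coefficient size: the Ridge penalty shrinks the slope vector towards zero.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print(f"train R^2  OLS={ols_train_r2:.3f}  Ridge={ridge_train_r2:.3f}")
print(f"coef norm  OLS={ols_norm:.3f}  Ridge={ridge_norm:.3f}")
```

A high in-sample R² here says little about the relationship, because OLS will fit noise nearly as well as signal when parameters almost match observations. The shrunken Ridge coefficients are the more cautious summary.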
Four macro scenarios stress-tested against a baseline Householders Loss Ratio of 0.720, with sensitivities calibrated to industry benchmarks:
- Baseline (0% macro change): Projected LR = 0.720. Reference point, no reserve adjustment required.
- Persistent Inflation (+4% CPI p.a., +2.5% WPI, +3.0% PPI): Projected LR = 0.759, reserve delta +3.90pp.
- Stagflation (+5% CPI p.a., +1.5% WPI, +4.5% PPI): Projected LR = 0.762, reserve delta +4.20pp ≈ $168M per quarter at current Householders GWP (~$4B+).
- Rate Cut / Soft Landing (−1% CPI, +0.5% WPI, −0.5% PPI): Projected LR = 0.718, reserve delta −0.20pp, a marginal improvement.
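The reserve deltas and the dollar figure above follow from simple arithmetic on the projected loss ratios. The sketch below reproduces it, assuming the delta is applied to a round $4.0B GWP base (the text only says "~$4B+"); the function name `reserve_impact` is hypothetical. It does not reproduce how the projected loss ratios themselves were derived, since the underlying sensitivities are not stated here.

```python
BASELINE_LR = 0.720
GWP_BASE = 4.0e9  # assumed round figure taken from the "~$4B+" in the text

# Projected Householders loss ratios as listed in the scenarios above.
scenarios = {
    "Baseline": 0.720,
    "Persistent Inflation": 0.759,
    "Stagflation": 0.762,
    "Rate Cut / Soft Landing": 0.718,
}


def reserve_impact(projected_lr, baseline_lr=BASELINE_LR, gwp=GWP_BASE):
    """Return (loss ratio change in percentage points, dollar change at the GWP base)."""
    delta = projected_lr - baseline_lr
    return round(delta * 100, 2), round(delta * gwp)


for name, lr in scenarios.items():
    delta_pp, dollars = reserve_impact(lr)
    print(f"{name}: {delta_pp:+.2f}pp, {dollars / 1e6:+,.0f}M AUD")
```

For Stagflation this gives 4.2 percentage points on a $4.0B base, which is the $168M quoted above.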
Four non-trivial problems encountered and resolved during development:
- ABS header offset: Files returned "Series not found" because the 9 metadata rows before the data header caused an off-by-one error that loaded the wrong column names. Fixed by changing from skiprows=header_row+1 to header=header_row, with a reusable _find_header_row() helper across all sources.
- Date deduplication: Outer-joining monthly and quarterly series produced duplicate dates from period-start vs period-end boundary differences. Resolved by keeping the row with the most non-null values at each duplicate date, which preserves maximum data density without manual selection.
- Empty model inputs: All models were silently skipped because rolling-average derived features had NaN values across all 9 APRA rows. Fixed by narrowing the predictor set to the four raw macro series (non-null for all 7 usable training rows), with the full feature set preserved for when extended APRA data is available.
- XGBoost on Apple Silicon: A C-library linker error (libomp.dylib not found) was not caught by an except ImportError clause. Widened to except Exception with a clear fix message ("run: brew install libomp"). The pipeline degrades gracefully and all other models proceed.
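The first two fixes can be sketched in a few lines of pandas. This is a hypothetical reconstruction, not the project's `_find_header_row()`: the names `find_header_row_sketch` and `keep_densest_row`, the assumption that the "Series ID" marker sits in the first column, and the `date` column name are all illustrative.

```python
import pandas as pd


def find_header_row_sketch(raw: pd.DataFrame, marker: str = "Series ID") -> int:
    """Return the position of the first row whose first cell equals `marker`.

    `raw` is expected to be a sheet read with header=None, so row labels
    match row positions.
    """
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    hits = first_col[first_col == marker]
    if hits.empty:
        raise ValueError(f"{marker!r} row not found")
    return int(hits.index[0])


def keep_densest_row(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """For duplicate dates, keep the row with the most non-null values."""
    n_valid = df.drop(columns=date_col).notna().sum(axis=1)
    scored = df.assign(_n_valid=n_valid)
    # Sort so the densest row comes first within each date, then keep that one.
    scored = scored.sort_values([date_col, "_n_valid"], ascending=[True, False])
    deduped = scored.drop_duplicates(subset=date_col, keep="first")
    return deduped.drop(columns="_n_valid").reset_index(drop=True)


# Miniature stand-in for an ABS sheet: metadata rows, then the Series ID row.
raw = pd.DataFrame(
    [["Unit", "Index"], ["Series Type", "Original"], ["Series ID", "A1"], ["2024-03-01", 1.0]]
)
header_row = find_header_row_sketch(raw)

# Miniature stand-in for the merged frame with a duplicated quarter-end date.
merged = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-03-31", "2024-03-31", "2024-06-30"]),
        "cpi": [None, 1.0, 2.0],
        "wpi": [None, 3.0, None],
    }
)
clean = keep_densest_row(merged)
```

With a real workbook, the detected position would be passed straight to the reader, for example `pd.read_excel(path, sheet_name=sheet, header=header_row)`, where `path` and `sheet` are placeholders. Passing it as `header=` rather than as a `skiprows` offset is what removes the off-by-one.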
With only 9 quarters of APRA class-level data, no correlation result in this project crosses a conventional significance threshold. Every coefficient should be treated as directional, not precise. Householders loss ratio spikes in Q1 and Q4 2025 are almost certainly weather-event driven — macro models cannot distinguish these without a catastrophe flag variable.
Path to statistical validity: manually appending APRA's annual GI statistics publications from 2019–2022 would extend the dataset to ~40 quarters — sufficient for meaningful regression and conventional significance thresholds. That work is scoped as the natural next step.
- Data Engineering: Multi-source ingestion, dynamic header detection, deduplication, frequency alignment across monthly and quarterly series.
- Statistical Analysis: Spearman and Pearson correlation, lead-lag testing, p-value significance testing, small-N constraint acknowledgement.
- Modelling: OLS, Ridge, Lasso with honest regularisation rationale, chronological train-test splitting.
- Visualisation: Plotly, Seaborn, correlation heatmaps, dual-axis overlay charts.
- Domain Knowledge: Insurance loss ratios, APRA / AASB 17 reporting standards, RBA cash rate policy cycles, ABS statistical publications.
- Product Delivery: Four-page Streamlit dashboard, interactive scenario stress test, stakeholder persona framing (Claims, Finance, Underwriting).