Short Side · May 31, 2026

How The NRL Tryscorer Model Works

The tryscorer model answers one focused question:

If a team is expected to score a given number of tries, how should those tries be distributed across the named players?

It does not independently predict team scoring volume. That comes from the result model through nrl.nrl_predictions.expected_tries. The tryscorer model takes that team total and allocates it across the lineup.

In simple terms:

team expected tries x player try share = player expected tries

If a team is projected for 3.0 tries and a winger is estimated to own 18% of that team's try-scoring opportunity, the player receives:

3.0 x 0.18 = 0.54 expected tries

That expected-tries number is then converted into anytime, two-plus, and three-plus try probabilities.

Why It Predicts Share

Raw player tries are extremely noisy. Most players score zero tries in most games, and even strong finishers can go several weeks without scoring. A direct raw-tries model can easily mix up team opportunity with player ability.

This model separates the problem into two layers:

result model    -> how many tries the team should score
tryscorer model -> which players should receive those tries

The training target is:

target_try_share = player tries / team tries

Rows where the team scored no tries are excluded from training, because the share would be a division by zero and there is no try pool to distribute. This keeps the player model focused on allocation, while the result model owns team scoring expectation.
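The target construction can be sketched in a few lines of plain Python. This is a minimal illustration, not the training script itself: the row layout and the key names (match, team, player, tries) are assumptions chosen for the example.

```python
def add_try_share_target(rows):
    """Return trainable rows with target_try_share = player tries / team tries.

    `rows` is a list of dicts with hypothetical keys: match, team, player, tries.
    """
    # Sum tries per (match, team) to get the pool being distributed.
    team_tries = {}
    for row in rows:
        key = (row["match"], row["team"])
        team_tries[key] = team_tries.get(key, 0) + row["tries"]

    trainable = []
    for row in rows:
        pool = team_tries[(row["match"], row["team"])]
        if pool == 0:
            continue  # no tries to distribute, so the share is undefined
        trainable.append(
            {**row, "team_tries": pool, "target_try_share": row["tries"] / pool}
        )
    return trainable


rows = [
    {"match": "m1", "team": "A", "player": "p1", "tries": 2},
    {"match": "m1", "team": "A", "player": "p2", "tries": 1},
    {"match": "m1", "team": "A", "player": "p3", "tries": 0},
    {"match": "m1", "team": "B", "player": "p4", "tries": 0},  # team B scored nothing
]
trainable = add_try_share_target(rows)
```

In this example team B's only row is dropped, and team A's three shares sum to 1.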

Training Data

The model trains from nrl.player_stats, starting from the configured start year:

python

TRYSCORER_PLAYER_STATS_START_YEAR = 2013

The main historical inputs are:

match date
match
team
player
player id
jersey number
position
minutes played
total fantasy points
tries
try assists
line breaks
tackle breaks
run metres
kick return metres
hit ups
receipts

These fields describe where the player is named, how often he has scored, how involved he has been, how many minutes he usually plays, and what team context he is entering. Team names, match keys, player names, and positions are canonicalised so historical stats and future lineup rows can be joined consistently.

Role Buckets

Position is simplified into role buckets using jersey number and position text:

fullback
wing
centre
half
hooker
middle
edge
bench
unknown

This matters because tryscoring is heavily role-dependent. Wingers and centres usually own a larger share of team tries than middles. Fullbacks, halves, hookers, and edge forwards each have different profiles. The role bucket gives the model a sensible baseline when a player has limited recent evidence.
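A rough sketch of how such a mapping might look is below. The exact rules are not given here, so everything in this function is an assumption: it uses the conventional rugby league starting numbers (1 fullback, 2 and 5 wing, 3 and 4 centre, 6 and 7 halves, 8, 10 and 13 middle, 9 hooker, 11 and 12 edge, 14 and above bench), and it lets informative position text override the jersey number.

```python
def role_bucket(jersey_number, position_text=""):
    """Map a jersey number and position label to a coarse role bucket.

    Hypothetical rules for illustration only.
    """
    text = (position_text or "").strip().lower()

    # Assumption: position text wins when it is informative, because jersey
    # numbers can be misleading after a reshuffle.
    text_rules = [
        ("fullback", "fullback"),
        ("wing", "wing"),
        ("centre", "centre"),
        ("five-eighth", "half"),
        ("halfback", "half"),
        ("hooker", "hooker"),
        ("prop", "middle"),
        ("lock", "middle"),
        ("second row", "edge"),
        ("2nd row", "edge"),
        ("interchange", "bench"),
        ("bench", "bench"),
    ]
    for needle, bucket in text_rules:
        if needle in text:
            return bucket

    # Fall back to the conventional starting numbers.
    by_number = {
        1: "fullback",
        2: "wing", 5: "wing",
        3: "centre", 4: "centre",
        6: "half", 7: "half",
        8: "middle", 10: "middle", 13: "middle",
        9: "hooker",
        11: "edge", 12: "edge",
    }
    if jersey_number in by_number:
        return by_number[jersey_number]
    if jersey_number is not None and jersey_number >= 14:
        return "bench"
    return "unknown"
```

For example, role_bucket(2) gives "wing", role_bucket(20, "Winger") also gives "wing" because the text is checked first, and a row with neither field falls through to "unknown".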

Leakage-Safe Features

The feature engineering is designed so the current match never uses its own result as an input.

For player-level signals, the model uses shifted rolling and exponentially weighted averages. "Shifted" means the current row is excluded before the average is calculated.

The main player features include:

prior player games
recent player tries
recent player try share
recent minutes
recent fantasy points
recent run metres
recent kick return metres
recent line breaks
recent tackle breaks
recent try assists
recent hit ups
recent receipts

For example, player_tries_ewm_6 captures recent try output from previous games, while player_try_share_rolling_10 captures the player's recent share of team tries. Both are historical-only at the point of prediction.
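The shifting idea can be shown without any libraries. The sketch below assumes the values are one player's games in chronological order, that the suffix in a name like player_tries_ewm_6 is a span (so alpha = 2 / (span + 1)), and that the recursive form of the exponential average is used (the form pandas calls adjust=False). None of those details are confirmed by the description above.

```python
def shifted_rolling_mean(values, window):
    """Rolling mean of up to `window` previous values, excluding the current one."""
    out = []
    for i in range(len(values)):
        prior = values[max(0, i - window):i]  # slice stops before index i
        out.append(sum(prior) / len(prior) if prior else None)
    return out


def shifted_ewm(values, span):
    """Exponentially weighted mean built from prior values only."""
    alpha = 2.0 / (span + 1.0)
    out, state = [], None
    for value in values:
        out.append(state)  # emit the state before the current game is seen
        state = value if state is None else alpha * value + (1 - alpha) * state
    return out


tries = [0, 2, 0, 1]  # one player's tries, oldest game first
rolling = shifted_rolling_mean(tries, window=10)
ewm = shifted_ewm(tries, span=6)
```

The first game has no history, so both features are None there. Changing the last value in tries leaves every feature unchanged, which is the leakage-safety property in miniature.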

The model also creates team and opponent context:

team tries for
team tries against
opponent tries for
opponent tries against

These are also shifted exponentially weighted averages. They help the allocation model understand recent attacking and defensive environments, but they do not replace the result model's expected_tries.

Model Training

The training script compares several regressors:

Linear Regression
Ridge
ElasticNet
Random Forest
HistGradientBoosting
XGBoost, if installed
LightGBM, if installed

Each model predicts target_try_share. Training rows are weighted by team tries, so games with a larger try pool carry more allocation information than games with one team try.
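To see why the weighting matters, consider a simple weighted average of try share by role. This is only an illustration of the weighting principle, not one of the regressors listed above, and the key names are assumptions. In scikit-learn style estimators, weights like these are typically passed through the sample_weight argument of fit.

```python
from collections import defaultdict


def weighted_role_share(rows):
    """Average try share per role, weighting each row by its team's try count."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for row in rows:
        weight = row["team_tries"]
        numerator[row["role"]] += weight * row["target_try_share"]
        denominator[row["role"]] += weight
    return {role: numerator[role] / denominator[role] for role in numerator}


rows = [
    # Winger scored the team's only try: a 100% share from a tiny pool.
    {"role": "wing", "team_tries": 1, "target_try_share": 1.0},
    # Winger scored one of five tries: a 20% share from a large pool.
    {"role": "wing", "team_tries": 5, "target_try_share": 0.2},
]
shares = weighted_role_share(rows)
```

The unweighted mean of the two shares would be 0.6. The weighted mean is one third, which is simply two wing tries out of six team tries, so the one-try game no longer dominates.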

The deployed model is selected from the test leaderboard. By default, inference prefers a portable model so the saved artifact can be loaded without optional packages like XGBoost or LightGBM.

The artifact is saved to:

model/tryscorer_model_artifacts.pkl

It contains the trained model, model version, feature lists, numeric fill values, encoded categorical columns, role priors, training metadata, and model comparison results.

Validation

The train/test split is time-aware. By default, the holdout starts at:

python

TRYSCORER_HOLDOUT_START_DATE = "2025-01-01"

If that split leaves too few rows on either side, the script falls back to a date quantile split. The principle is the same: earlier games train the model, later games test it.
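A minimal sketch of that split logic follows. The minimum row count and the 0.8 quantile are hypothetical parameters, since the script's actual thresholds are not stated, and the sketch does not re-check the fallback result.

```python
from datetime import date


def time_split(rows, holdout_start, min_rows=2, quantile=0.8):
    """Split rows by match date, falling back to a date quantile if needed."""
    rows = sorted(rows, key=lambda r: r["match_date"])
    train = [r for r in rows if r["match_date"] < holdout_start]
    test = [r for r in rows if r["match_date"] >= holdout_start]
    if len(train) >= min_rows and len(test) >= min_rows:
        return train, test

    # Fallback: cut at the date found `quantile` of the way through the rows.
    # Cutting on a date rather than a row index keeps a match on one side.
    cutoff = rows[int(quantile * (len(rows) - 1))]["match_date"]
    train = [r for r in rows if r["match_date"] < cutoff]
    test = [r for r in rows if r["match_date"] >= cutoff]
    return train, test


# Ten monthly rows, all in 2024, so a 2025 holdout would leave the test set empty.
rows = [{"match_date": date(2024, month, 1)} for month in range(1, 11)]
train, test = time_split(rows, holdout_start=date(2025, 1, 1))
```

Here the fallback places the seven earliest rows in training and the three latest in the test set, so every training date still precedes every test date.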

The model reports error on predicted try share and on player tries after multiplying the predicted share by actual team tries in the test match. That second metric checks whether the allocation layer would have distributed the team's real try total sensibly. In live inference, actual team tries are replaced by result-model expected tries.
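The two error views can be sketched as follows. Mean absolute error is used here as an assumption, because the specific metric is not named, and the key names are hypothetical.

```python
def allocation_errors(rows):
    """Mean absolute error on try share and on implied player tries."""
    share_error = 0.0
    tries_error = 0.0
    for row in rows:
        share_error += abs(row["pred_share"] - row["target_try_share"])
        # Allocate the team's actual tries using the predicted share.
        implied_tries = row["pred_share"] * row["team_tries"]
        tries_error += abs(implied_tries - row["tries"])
    n = len(rows)
    return share_error / n, tries_error / n


rows = [
    {"pred_share": 0.4, "target_try_share": 0.5, "team_tries": 4, "tries": 2},
    {"pred_share": 0.1, "target_try_share": 0.0, "team_tries": 4, "tries": 0},
]
share_mae, tries_mae = allocation_errors(rows)
```

The same share miss costs more in tries when the team's try pool is larger, which is why the second metric is a useful complement to the first.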

Lineup Normalisation

At inference time, the model scores named lineup rows. The raw model output is:

raw_try_share_score

That score is clipped so it stays above zero, then normalised across all named players in the same team and match:

try_share = player raw score / sum of team raw scores

This guarantees:

all player try shares in a team sum to 1

That constraint keeps the player projections consistent with the team total. If the result model says a team should score 2.7 tries, the player expected tries across that team will sum to 2.7. The tryscorer layer can change who gets the tries, but it cannot invent extra team scoring volume.
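The clip, normalise, and allocate steps can be sketched as below. The floor value of 1e-6 is a hypothetical choice, and the player labels and raw scores are made up for the example.

```python
def normalise_team_shares(raw_scores, team_expected_tries, floor=1e-6):
    """Turn raw scores for one team into shares and expected tries.

    `raw_scores` maps player -> raw_try_share_score for one team in one match.
    """
    # Keep every score positive so each share is valid and the total is non-zero.
    clipped = {player: max(score, floor) for player, score in raw_scores.items()}
    total = sum(clipped.values())
    shares = {player: score / total for player, score in clipped.items()}
    expected = {
        player: share * team_expected_tries for player, share in shares.items()
    }
    return shares, expected


raw = {"winger": 0.20, "centre": 0.12, "prop": -0.01, "half": 0.08}
shares, expected = normalise_team_shares(raw, team_expected_tries=2.7)
```

Whatever the raw scores look like, the shares sum to 1 and the expected tries sum to the team total of 2.7.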

Probabilities

Player expected tries are converted into probabilities with a Poisson assumption:

lambda = player expected tries
P(1+ tries) = 1 - exp(-lambda)
P(2+ tries) = 1 - exp(-lambda) x (1 + lambda)
P(3+ tries) = 1 - exp(-lambda) x (1 + lambda + lambda^2 / 2)

These become the model's anytime, two-plus, and three-plus probabilities.
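The formulas above translate directly into code. The sketch below reuses the earlier example of 3.0 team tries and an 18% share. The function name and output keys are assumptions.

```python
import math


def try_probabilities(expected_tries):
    """Convert expected tries into Poisson probabilities of 1+, 2+ and 3+ tries."""
    lam = expected_tries
    p0 = math.exp(-lam)       # P(exactly 0 tries)
    p1 = p0 * lam             # P(exactly 1 try)
    p2 = p0 * lam ** 2 / 2    # P(exactly 2 tries)
    return {
        "anytime": 1 - p0,
        "two_plus": 1 - p0 - p1,
        "three_plus": 1 - p0 - p1 - p2,
    }


probs = try_probabilities(3.0 * 0.18)  # lambda = 0.54
```

With lambda = 0.54 the anytime probability comes out at about 0.42 and the two-plus probability at about 0.10.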

Inference Workflow

The inference script:

loads the saved artifact
fetches future lineups from nrl.lineups
fetches team expected tries from nrl.nrl_predictions
attaches expected tries to lineup rows
fetches historical player_stats
builds features for future lineup placeholders
predicts raw player try-share scores
normalises shares within each team
calculates player expected tries
calculates try probabilities
upserts rows into nrl.tryscorer_predictions

The output table stores the result-model team total as team_expected_tries and the player-level projection as expected_tries.

Why It Is Useful

The model combines team scoring expectation, named lineups, player scoring history, attacking involvement, role, minutes history, team context, opponent context, lineup normalisation, and probability conversion.

That makes it more disciplined than sorting players by recent tries. A player with three recent tries can still receive a modest projection if his role, minutes, involvement, and team expected tries do not support a large share. A winger with strong involvement in a high-try team projection can rate well even if his recent try count is quiet.

The key strength is that every player projection is tied back to the same team total. That prevents a common tryscorer-market problem: too many players from the same team being treated as if the side is likely to score far more tries than its match projection supports.

Limits

The model is not a play-by-play simulator. It does not know set plays, edge assignments, weather, late positional switches, or whether a team is likely to attack one side of the field more often.

It also depends on lineup quality. If jersey numbers are wrong or a late reshuffle changes a player's role, the projection can be wrong for reasons the historical data cannot fix.

The Poisson conversion is a practical approximation, not a perfect description of tryscoring. Real try events are affected by game state, repeated edge exposure, injuries, sin bins, and tactical matchups.

Final Thought

The tryscorer model's main strength is that it separates team opportunity from player allocation.

The result model estimates how much scoring opportunity exists. The tryscorer model estimates how that opportunity should be shared by the named players. A probability layer then turns those expected tries into market-friendly outputs.