A classic challenge in forensic anthropology is building a biological profile from skeletal remains. One of the first and most critical questions is determining the individual’s sex. But what if the remains belong to an older person? A long-held belief is that skulls, particularly those of females, can become more “robust” or male-like with age, potentially throwing off our analyses. This raises a crucial question for casework: does an individual’s age at death compromise the accuracy of sex estimation from the skull?
A new study published in the International Journal of Legal Medicine by lead author Sarah-Kelly Houston and her team tackles this very issue, evaluating the popular Walker method on a contemporary South African sample to see if age really is a significant confounding factor.
The Research Breakdown
Methodology: Novelty and Robustness
The research team analyzed a substantial sample of 453 skulls from two large, documented South African skeletal collections: the Pretoria Bone Collection and the Raymond A. Dart Collection. The individuals ranged in age from 14 to 108 years, providing a wide spectrum to test the effects of aging.
The core of their methodology was scoring five sexually dimorphic cranial traits as described by Walker: the glabella, supra-orbital margins, nuchal crest, mastoid processes, and mental eminence. Each was scored on a 1 (female-leaning) to 5 (male-leaning) scale. Notably, the team had to exclude the mental eminence (chin) from their final analysis due to poor inter-observer agreement, a pragmatic decision that strengthens the reliability of their findings.
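To make the idea of inter-observer agreement concrete, here is a minimal sketch of unweighted Cohen's kappa for two observers scoring the same skulls on the 1 to 5 scale. This is an assumption for illustration: the summary above does not say which agreement statistic the authors used, and the scores below are invented, not data from the study.

```python
from collections import Counter


def cohen_kappa(scores_a, scores_b):
    """Unweighted Cohen's kappa for two observers scoring the same skulls."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(scores_a)
    # Observed agreement: share of skulls given the identical score.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Chance agreement from each observer's marginal score frequencies.
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1-5) for ten skulls; NOT data from the study.
observer_1 = [1, 2, 2, 3, 3, 4, 4, 5, 5, 3]
observer_2 = [1, 2, 3, 3, 3, 4, 5, 5, 5, 2]
kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 means the observers agree far more often than chance would predict, while a value near 0 means their agreement is no better than chance. For ordinal scores like these, a weighted kappa that gives partial credit for near-misses is often preferred; the unweighted version is shown only because it is the simplest to read.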
To test their hypothesis, they employed two different statistical classification models: Ordinal Logistic Regression (LR), the method used in the original Walker paper, and Random Forest Models (RFM), a more modern machine-learning technique. This dual-analysis approach adds a layer of robustness to their conclusions.
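For readers who have not worked with logistic models, the sketch below shows how four trait scores could be turned into a probability of "male". It assumes a simple binary logistic model, and every number in it is hypothetical: the intercept and coefficients are placeholders picked for readability, not the published Walker equations and not the values fitted in this study.

```python
import math

# Hypothetical coefficients chosen only for illustration. They are NOT the
# published Walker equations or the values fitted by Houston et al.
INTERCEPT = -9.0
COEFFICIENTS = {
    "glabella": 1.0,
    "supra_orbital_margin": 0.5,
    "nuchal_crest": 0.5,
    "mastoid_process": 1.0,
}


def probability_male(scores):
    """Return P(male) from 1-5 trait scores under the toy logistic model."""
    if set(scores) != set(COEFFICIENTS):
        raise KeyError("scores must cover exactly the four modelled traits")
    for trait, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{trait} score must be between 1 and 5")
    # Linear predictor: intercept plus the weighted sum of trait scores.
    linear = INTERCEPT + sum(COEFFICIENTS[t] * s for t, s in scores.items())
    # The logistic function maps the linear predictor onto a 0-1 probability.
    return 1.0 / (1.0 + math.exp(-linear))


def classify(scores, threshold=0.5):
    """Label a skull 'male' when P(male) reaches the threshold."""
    return "male" if probability_male(scores) >= threshold else "female"


# Hypothetical skull with strongly male-leaning scores.
example = {
    "glabella": 4,
    "supra_orbital_margin": 4,
    "nuchal_crest": 3,
    "mastoid_process": 5,
}
print(classify(example), round(probability_male(example), 3))
```

A random forest replaces the single equation with many decision trees that each vote on the class, but the input (trait scores) and output (a predicted sex) are the same.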
Key Findings: Data and Caveats
The study produced several clear and impactful findings:
- Nuchal Crest Was the Anomaly: Of the traits analyzed, only the nuchal crest (the bony ridge at the back of the skull) showed statistically significant differences across age groups, and this trend was driven almost entirely by the female sample. Older females tended to have more robust nuchal crests.
- Population as a Confounding Factor: Here’s the critical caveat. The sample had a skewed demographic distribution. The younger individuals were predominantly black South Africans, while the older individuals were predominantly white South Africans. Previous research has already shown that white South Africans tend to be more cranially robust than black South Africans. This means the observed “age-related” change in the nuchal crest was much more likely a reflection of population-based differences rather than a true biological aging process.
- Age-Specific Standards Didn’t Help: When the researchers created age-specific models (dividing the sample into under-40 and over-40 groups), the overall classification accuracy did not significantly improve. While accuracy for the younger group saw a slight bump, the accuracy for the older group decreased slightly.
- The Bottom Line: The study concludes that pre-selecting for age before applying the Walker traits is unnecessary for this population. The influence of population affinity on cranial morphology is a far stronger factor than age.
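The confounding described in the findings above can be illustrated with a toy example. The numbers below are invented purely to show the mechanism and are not data from the study: two hypothetical population groups, "A" and "B", have different typical nuchal crest scores, and the age groups contain different proportions of each.

```python
from statistics import mean

# Hypothetical nuchal crest scores (1-5) for female skulls, invented to
# illustrate confounding. NOT taken from the study.
# Each record is (age_group, population_group, score).
records = (
    [("younger", "A", 2)] * 8 + [("younger", "B", 3)] * 2
    + [("older", "A", 2)] * 2 + [("older", "B", 3)] * 8
)


def mean_score(rows, age_group, population=None):
    """Mean score for an age group, optionally within one population group."""
    selected = [
        score
        for age, pop, score in rows
        if age == age_group and (population is None or pop == population)
    ]
    return mean(selected)


# Pooled comparison: older skulls look more robust on average.
pooled_gap = mean_score(records, "older") - mean_score(records, "younger")

# Stratified comparison: within each population group the gap disappears.
gap_within = {
    pop: mean_score(records, "older", pop) - mean_score(records, "younger", pop)
    for pop in ("A", "B")
}

print(f"pooled older-minus-younger gap: {pooled_gap:.1f}")
print(f"gap within each population group: {gap_within}")
```

In this toy data the apparent age effect exists only because the older group is mostly population B. Once the comparison is made within each population group, age makes no difference, which is the kind of pattern the authors argue is behind the nuchal crest result.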
Expert Commentary:
Practical Implications
For forensic analysts, this is good news. It suggests that we don’t need to develop complex, age-partitioned standards for the Walker method, which would add another layer of potential error and complexity to our workflow. Standard operating procedures (SOPs) can remain focused on the most impactful variable: using the correct population-specific data.
In a legal context, this research provides a strong basis for defending a sex estimation in court without having to add a major caveat that the individual’s advanced age could have skewed the results. It simplifies the testimony by showing that, for this validated method, age isn’t the bogeyman we once thought it was.
Contextualization
This study is a fantastic example of the validation required by forensic quality standards like ISO 17025. It’s not enough to simply adopt a method developed on a different reference sample; labs and researchers must test and prove its efficacy on the populations they actually serve.
In the context of the hierarchy of propositions, this research operates squarely at the source level—it helps us answer the question, “Is this individual male or female?” By demonstrating that age is not a significant variable, it strengthens the confidence we can place in that source-level conclusion.
My Perspective: The STR Analyst’s View
From an STR analyst’s perspective, the core issue in this study—a strong signal masking a weaker one—is something we see every day. The challenge of a powerful population signal drowning out a subtle age-related signal is a perfect analogy for DNA mixture interpretation.
Think of it like this: trying to detect a faint biological signal from aging is like trying to hear a tiny cricket chirping while someone plays loud music right next to you. The music is the population effect. It is so dominant that you can’t reliably hear the cricket, or you might mistake part of the music for the cricket’s sound. In this research, population variation was the loud music that made the faint chirp of aging statistically insignificant.
In the DNA
This study also underscores the absolute necessity of rigorous internal validation. Just as we would never implement a new STR kit or interpretation software without first testing it on our local population samples to establish reliable stochastic thresholds, forensic anthropologists must validate morphoscopic methods. This paper is a textbook case of that critical validation process in action, ensuring a fundamental tool in the forensic toolkit is being used reliably and accurately.
Conclusion
Houston and colleagues have delivered a clear and practical finding for forensic anthropology. Their work provides strong evidence
Original Research Paper
Houston, S.K., Brits, D., Myburgh, J. et al. The impact of age-related changes in the skull on sex estimation using morphoscopic traits. Int J Legal Med 139, 2991–3003 (2025). (Open Access | Creative Commons Attribution 4.0 International License)
Term Definitions
- Morphoscopic Traits: Qualitative, visually assessed features of the skeleton (e.g., shape, contour, robustness) used to estimate aspects of the biological profile.
- Sexual Dimorphism: Systematic differences in size and shape between males and females of the same species.
- Nuchal Crest: A prominent ridge of bone on the back of the skull (occipital bone) where neck muscles attach. It is generally more pronounced in males.
- Glabella: The smooth area of bone on the forehead between the eyebrows. It is typically more rounded and prominent in males.
- Population Affinity: An individual’s ancestral background (e.g., African, European, Asian). It is a more accurate and neutral term than “race” in a biological context.
- Ordinal Scale: A scale of measurement where the order of the values is important, but the differences between them are not necessarily equal (e.g., scoring a trait from 1 to 5).
- Logistic Regression (LR): A statistical method used to model the probability of a certain outcome (e.g., male or female) based on one or more predictor variables.
- Random Forest Model (RFM): A machine-learning method that operates by constructing a multitude of decision trees during training and outputting the mode of the classes for classification.
- Edentulism: The condition of being toothless, either partially or completely. This can cause resorption and remodeling of the jaw bones.
- Confounding Variable: An “extra” variable that was not accounted for in a study but can independently influence the relationship between the variables being studied.
