Repeatability and reproducibility are often cited as the hallmarks of good science. In forensic science, establishing these characteristics for pattern comparison disciplines, where the examiner is the de facto “instrument”, is crucial for courtroom admissibility. Forensic firearms and toolmark analysis, which involves matching microscopic striations on bullets and cartridge cases, is one such discipline. A comprehensive new study, published in the Journal of Forensic Sciences, provides key data on the reliability of firearms examiners, finding that while definitive conclusions are highly consistent, the indeterminate range remains subjective.
- The Research: Quantifying Consistency Across Examiners
- Methodology: Repeatability vs. Reproducibility
- Key Findings: High Reliability, Low Consistency in the Gray Area
- The Need for Standardizing the ‘Inconclusive’ Call
- The Statistical Value of Definitive Conclusions
- Lessons from STR DNA Analysis
- My Perspective: Upholding Confidence in Pattern Evidence
- Conclusion
The Examiner as the ‘Instrument’ of Comparison
In pattern comparison, the reliability of the technique is directly linked to the consistency of the human examiner. This study aims to quantify that consistency in forensic firearms examiners by measuring both:
- Repeatability: The ability of one examiner to make the same decision when re-examining the same material.
- Reproducibility: The ability of two different examiners to come to the same conclusion when independently evaluating the same material.
Decisions were recorded using the AFTE Range of Conclusions (Identification, Inconclusive A, B, or C, Elimination, or Unsuitable), which offers a granular view of examiner decisions.
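To make the two measures concrete, here is a minimal sketch of how percent agreement could be computed from paired conclusions. The function name `percent_agreement` and the toy data are hypothetical illustrations, not the study's data or code.

```python
def percent_agreement(pairs):
    """Return the percentage of paired conclusions that are identical."""
    if not pairs:
        raise ValueError("at least one pair of conclusions is required")
    matches = sum(1 for first, second in pairs if first == second)
    return 100.0 * matches / len(pairs)


# Hypothetical data: (first examination, re-examination) by the SAME examiner.
repeatability_pairs = [
    ("Identification", "Identification"),
    ("Identification", "Inconclusive A"),
    ("Elimination", "Elimination"),
    ("Inconclusive B", "Inconclusive C"),
]

# Hypothetical data: (examiner 1, examiner 2) evaluating the same material.
reproducibility_pairs = [
    ("Identification", "Identification"),
    ("Elimination", "Inconclusive C"),
    ("Inconclusive A", "Inconclusive B"),
    ("Elimination", "Inconclusive B"),
]

repeatability = percent_agreement(repeatability_pairs)      # 2 of 4 pairs agree
reproducibility = percent_agreement(reproducibility_pairs)  # 1 of 4 pairs agree
print(f"repeatability: {repeatability:.1f}%")
print(f"reproducibility: {reproducibility:.1f}%")
```

The calculation is the same in both cases. Only the source of the second conclusion differs: the same examiner on a later occasion, or a different examiner.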
The Research: Quantifying Consistency Across Examiners
The study was based on thousands of comparisons of both bullets and cartridge cases, with test sets blindly resubmitted to examiners.
Methodology: Repeatability vs. Reproducibility
The researchers used comparison sets from three different types of firearms. The same examiner re-examined samples for repeatability (over 5,700 comparisons by 105 examiners), and different examiners compared the same material for reproducibility (over 5,700 comparisons by over 190 examiner pairs). The analysis focused purely on the agreement of the paired conclusions, not their overall accuracy.
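The distinction between agreement and accuracy can be illustrated with a small sketch. It assumes, purely for illustration, that the "expected" call is Identification for a known match and Elimination for a known nonmatch. The records and function names are hypothetical and do not reflect how the study scored anything.

```python
# Assumption for this sketch only: the expected call for each ground truth.
EXPECTED = {"match": "Identification", "nonmatch": "Elimination"}

# Hypothetical records: (ground truth, first conclusion, second conclusion).
records = [
    ("match", "Identification", "Identification"),
    ("match", "Inconclusive A", "Inconclusive A"),
    ("nonmatch", "Elimination", "Inconclusive C"),
    ("nonmatch", "Elimination", "Elimination"),
]


def agreement_rate(records):
    """Percentage of records whose two conclusions are identical."""
    agree = sum(1 for _, first, second in records if first == second)
    return 100.0 * agree / len(records)


def expected_call_rate(records):
    """Percentage of individual conclusions that equal the expected call."""
    calls = [(truth, call) for truth, first, second in records
             for call in (first, second)]
    hits = sum(1 for truth, call in calls if call == EXPECTED[truth])
    return 100.0 * hits / len(calls)


agreement = agreement_rate(records)          # ignores ground truth entirely
expected_calls = expected_call_rate(records)  # depends on ground truth
print(f"agreement: {agreement:.1f}%, expected calls: {expected_calls:.1f}%")
```

In the second record, both conclusions agree with each other, yet neither is the expected call. That is why a high agreement rate says nothing on its own about accuracy.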
Key Findings: High Reliability, Low Consistency in the Gray Area
The data demonstrated a clear trend: definitive conclusions are highly reliable, but the subjective range is not:
- Repeatability (Same Examiner): Averaged over bullets and cartridge cases, repeatability was high: 78.3% for known matches (Identification) and 64.5% for known nonmatches (Elimination). Disagreements were predominantly between a definitive decision and an Inconclusive category.
- Reproducibility (Different Examiners): Consistency between different examiners was lower, averaging 67.3% for known matches and 36.5% for known nonmatches.
- Reliability of Definitive Calls: The reliability of Identification and Elimination conclusions was high; instances of contradictory definitive decisions (ID to Elimination or vice versa) were rare (around 0.11% to 2.68% depending on the comparison).
- The Inconclusive Problem: The vast majority of disagreements were contained within the Inconclusive categories. When the three sub-levels of Inconclusive were pooled into a single category, agreement increased substantially, particularly for nonmatching sets. This highlights that the subjectivity is not in the final definitive call, but in choosing the level of Inconclusive to report.
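The effect of pooling described above can be sketched as follows. The pairs are hypothetical toy data and the helper names are illustrative, so the numbers show only the mechanism, not the study's results.

```python
def pool_inconclusive(conclusion):
    """Collapse Inconclusive A, B, and C into a single Inconclusive category."""
    return "Inconclusive" if conclusion.startswith("Inconclusive") else conclusion


def percent_agreement(pairs, pooled=False):
    """Percentage of identical pairs, optionally after pooling Inconclusives."""
    if pooled:
        pairs = [(pool_inconclusive(a), pool_inconclusive(b)) for a, b in pairs]
    return 100.0 * sum(1 for a, b in pairs if a == b) / len(pairs)


def contradictory_rate(pairs):
    """Percentage of pairs with opposite definitive calls (ID vs. Elimination)."""
    opposite = {("Identification", "Elimination"),
                ("Elimination", "Identification")}
    return 100.0 * sum(1 for pair in pairs if pair in opposite) / len(pairs)


# Hypothetical paired conclusions on the same material.
pairs = [
    ("Identification", "Identification"),
    ("Inconclusive A", "Inconclusive B"),
    ("Inconclusive C", "Inconclusive B"),
    ("Elimination", "Inconclusive C"),
    ("Elimination", "Elimination"),
    ("Identification", "Elimination"),
]

unpooled = percent_agreement(pairs)              # 2 of 6 pairs agree
pooled = percent_agreement(pairs, pooled=True)   # 4 of 6 pairs agree
contradictory = contradictory_rate(pairs)        # 1 of 6 pairs
print(f"unpooled: {unpooled:.1f}%, pooled: {pooled:.1f}%, "
      f"contradictory: {contradictory:.1f}%")
```

Pooling only rescues disagreements that stay within the Inconclusive sub-levels. A pair that splits between a definitive call and an Inconclusive, or between the two definitive calls, still counts as a disagreement.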
The Need for Standardizing the ‘Inconclusive’ Call
This research is vital because it scientifically validates concerns about the subjective gray area of firearms and toolmark analysis. The data confirm that examiners are reliable in their core function, but the consistency of their reporting needs refinement.
The Statistical Value of Definitive Conclusions
The study’s finding that definitive conclusions are rarely reversed and rarely confused for their opposite is a crucial piece of data that supports the efficacy of firearms and toolmark analysis in court. However, the high variability in the Inconclusive category presents a challenge to transparency and trustworthiness. It suggests that while the AFTE Range of Conclusions provides a framework, the middle ground relies too heavily on individual judgment, which is detrimental to the scientific rigor of the discipline.
Lessons from STR DNA Analysis
The issues found in toolmark analysis (inconsistent grading and subjectivity in non-definitive calls) mirror historical challenges in other pattern-comparison disciplines. As a Senior DNA analyst experienced in STR DNA analysis, I recognize the parallel. Our field addressed similar issues in interpreting complex DNA mixtures by moving toward standardized, statistical software that removed much of the subjectivity. The high disagreement rate within the Inconclusive categories strongly suggests that forensic firearms and toolmark analysis would benefit from implementing objective, statistically driven decision models to guide or replace the subjective gradations of Inconclusive opinions.
My Perspective: Upholding Confidence in Pattern Evidence
This research is an essential step in upholding the confidence placed in pattern evidence. It supports the view that the underlying principle of individualization is sound, but that the process by which examiners report uncertainty needs to be standardized. By embracing objective data (such as the pooled-category analysis demonstrated here), the field can reduce subjective variability and ensure that the evidence presented is not only accurate but also consistent across all laboratories and examiners.
Conclusion
This study provides compelling evidence that the repeatability and reproducibility of forensic firearms examiners are high for definitive conclusions, demonstrating their essential reliability. However, the research also reveals a significant source of variability within the subjective Inconclusive categories of the AFTE Range of Conclusions. These findings underscore the necessity for the firearms and toolmark analysis community to refine its decision-making framework, moving toward standardized, statistically guided protocols to ensure consistency and transparency in all reported conclusions.
Original Research Paper
Monson, K. L., Smith, E. D., & Peters, E. M. (2023). Repeatability and reproducibility of comparison decisions by firearms examiners. Journal of Forensic Sciences, 68(5), 1721-1740. https://doi.org/10.1111/1556-4029.15318
Term Definitions
- Firearms and Toolmark Analysis: The forensic discipline that examines and compares microscopic markings on bullets, cartridge cases, and other objects to link them to a specific firearm or tool.
- Repeatability: The degree of consistency shown by a single examiner when comparing the same evidence multiple times.
- Reproducibility: The degree of consistency shown when different examiners evaluate the same evidence and reach the same conclusion.
- Reliability (Test-Retest Reliability): The overall consistency of a measurement method, encompassing both repeatability and reproducibility.
- AFTE Range of Conclusions (Association of Firearm & Tool Mark Examiners): The standardized set of possible conclusions used by examiners, including Identification, Elimination, and Inconclusive (often with sub-levels A, B, and C).
- Inconclusive Call: A conclusion where the examiner cannot definitively identify or eliminate a source, often deemed the “gray area” of subjective judgment.