Bethel University Criminal Justice Vito & Higgins Evaluability Paper

Description
CJUS 801
This assignment requires you to develop an evaluability assessment for your chosen problem (examples listed below) using Vito & Higgins' evaluability assessment approach in Chapter 4.
You will identify and describe the program theory by outlining the components of the program
and determining which of them is measurable. You must cover the following in the paper:
identify the purpose and scope of the assessment, develop a program template that describes the
goals and objectives of the program, and create a short list of questions (5–10) for a focus group
or an interview that will help narrow down the scope of the program. You must discuss each
theory that supports different aspects of the program if multiple theories are being used. You do
not need to address how the program will be analyzed; this will be covered in the Program
Impact Paper. You must follow the outline recommended in Chapter 4 of Vito & Higgins.
The exact requirements of the assignment are outlined as follows:
• Length of assignment is 5–7 pages
o Excluding the title page, abstract, and reference section
• Format of assignment is current APA format
• Number of citations is five (5)
• Acceptable sources are peer-reviewed journal articles, scholarly articles published within the last five years, and textbooks.
• Program examples are DARE, Scared Straight, MADD, Juvenile Diversion
programs, Drug Court, etc.
Note: Your assignment will be checked for originality via the Turnitin plagiarism tool.
Improving Evaluation of Anticrime Programs (2005)
90 pages | 6 x 9 | Paperback
ISBN 978-0-309-09706-2 | DOI 10.17226/11337
Committee on Improving Evaluation of Anti-Crime Programs; Committee on Law and Justice; Division of Behavioral and Social Sciences and Education; National Research Council
National Research Council. 2005. Improving Evaluation of Anticrime Programs. Washington, DC: The National Academies Press.
This PDF is protected by copyright and owned by the National Academy of Sciences; unless otherwise indicated, the National Academy of Sciences retains copyright to all materials in this PDF with all rights reserved.
Improving Evaluation of Anticrime Programs
Committee on Improving Evaluation of Anti-Crime Programs
Committee on Law and Justice
Division of Behavioral and Social Sciences and Education
Copyright National Academy of Sciences. All rights reserved.
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from
the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible
for the report were chosen for their special competences and with regard for appropriate balance.
This study was supported by Contract/Grant No. LJXX-I-03-02-A, between the
National Academy of Sciences and the United States Department of Justice. Support of the work of the Committee on Law and Justice is provided by the National
Institute of Justice. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect
the views of the organizations or agencies that provided support for the project.
International Standard Book Number 0-309-09706-1
Additional copies of this report are available from the National Academies Press,
500 Fifth Street, N.W., Lockbox 285, Washington, DC 20055; (800) 624-6242 or (202)
334-3313 (in the Washington metropolitan area); Internet,
Copyright 2005 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America.
Suggested citation: National Research Council. (2005). Improving Evaluation of Anticrime Programs. Committee on Improving Evaluation of Anti-Crime Programs.
Committee on Law and Justice, Division of Behavioral and Social Sciences and
Education. Washington, DC: The National Academies Press.
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general
welfare. Upon the authority of the charter granted to it by the Congress in 1863,
the Academy has a mandate that requires it to advise the federal government on
scientific and technical matters. Dr. Ralph J. Cicerone is president of the National
Academy of Sciences.
The National Academy of Engineering was established in 1964, under the charter
of the National Academy of Sciences, as a parallel organization of outstanding
engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors
engineering programs aimed at meeting national needs, encourages education and
research, and recognizes the superior achievements of engineers. Dr. Wm. A. Wulf
is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of
Sciences to secure the services of eminent members of appropriate professions in
the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its
congressional charter to be an adviser to the federal government and, upon its
own initiative, to identify issues of medical care, research, and education. Dr.
Harvey V. Fineberg is president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with
the Academy’s purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National
Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of
Medicine. Dr. Ralph J. Cicerone and Dr. Wm. A. Wulf are chair and vice chair,
respectively, of the National Research Council.
Committee on Improving Evaluation of Anti-Crime Programs
Mark W. Lipsey (Chair), Center for Evaluation Research and
Methodology, Vanderbilt University
John L. Adams, Statistics Group, RAND Corporation, Santa Monica, CA
Denise C. Gottfredson, Department of Criminology and Criminal
Justice, University of Maryland, College Park
John V. Pepper, Department of Economics, University of Virginia
David Weisburd, Criminology Department, Hebrew University Law School
Carol V. Petrie, Study Director
Ralph Patterson, Senior Program Assistant
Committee on Law and Justice
Charles Wellford (Chair), Department of Criminology and Criminal
Justice, University of Maryland at College Park
Mark H. Moore (Vice Chair), Hauser Center for Non-Profit Institutions
and John F. Kennedy School of Government, Harvard University
David H. Bayley, School of Criminal Justice, University at Albany,
Alfred Blumstein, H. John Heinz III School of Public Policy and
Management, Carnegie Mellon University
Richard Bonnie, Institute of Law, Psychiatry, and Public Policy,
University of Virginia Law School
Jeanette Covington, Department of Sociology, Rutgers University
Martha Crenshaw, Department of Political Science, Wesleyan University
Steven Durlauf, Department of Economics, University of Wisconsin–Madison
Jeffrey Fagan, School of Law and School of Public Health, Columbia University
John Ferejohn, Hoover Institution, Stanford University
Darnell Hawkins, Department of Sociology, University of Illinois at Chicago
Phillip Heymann, Harvard Law School, Harvard University
Robert L. Johnson, Department of Pediatric and Clinical Psychiatry and
Department of Adolescent and Young Adult Medicine, New Jersey
Medical School
Candace Kruttschnitt, Department of Sociology, University of Minnesota
John H. Laub, Department of Criminology and Criminal Justice,
University of Maryland at College Park
Mark W. Lipsey, Center for Evaluation Research and Methodology,
Vanderbilt University
Daniel D. Nagin, H. John Heinz III School of Public Policy and
Management, Carnegie Mellon University
Richard Rosenfeld, Department of Criminology and Criminal Justice,
University of Missouri, St. Louis
Christy Visher, Justice Policy Center, Urban Institute, Washington, DC
Cathy Spatz Widom, Department of Psychiatry, New Jersey Medical School
Carol V. Petrie, Director
Ralph Patterson, Senior Program Assistant
Preface
Millions of dollars have been spent on crime prevention and control programs over the past decade. However, scientifically strong impact evaluations of these programs, while improving, are still uncommon relative to the overall number of programs that have received funding. The report of the Committee on Improving Evaluation of
Anti-Crime Programs is designed as a guide for agencies and organizations responsible for program evaluation, for researchers who must design scientifically credible evaluations of government and privately sponsored programs, and for policy officials who are investing more and more
in the concept of evidence-based policy to guide their decisions in crucial
areas of crime prevention and control.
The committee could not have completed its work without the help of
numerous individuals who participated in the workshop that led to this
report. We are especially grateful to the presenters: John Baron, The Council for Excellence in Government; Richard Berk, University of California,
Los Angeles; Anthony Braga, Harvard University; Patricia Chamberlain,
Oregon Social Learning Center; Adele Harrell, the Urban Institute; Steven
Levitt, University of Chicago; Robert Moffitt, Johns Hopkins University;
Lawrence Sherman, University of Pennsylvania; Petra Todd, University
of Pennsylvania; Alex Wagenaar, University of Minnesota; and Edward
Zigler, Yale University. The committee thanks Sarah Hart, the director of
the National Institute of Justice, for her ongoing encouragement and interest in our work, Patrick Clark, our program officer, and Betty Chemers,
the director of the Evaluation Division, who both provided invaluable
guidance as we developed the workshop themes. The committee also
thanks all of those who gave of their time and intellectual talents to enrich
this report through their participation in the workshop discussion of the
papers. We have included biographical sketches of committee members
and staff as Appendix A and also a complete list of workshop participants
as Appendix B of this report.
This report has been reviewed in draft form by individuals chosen for
their diverse perspectives and technical expertise, in accordance with procedures approved by the National Research Council’s Report Review
Committee. The purpose of this independent review is to provide candid
and critical comments that will assist the institution in making its published report as sound as possible and to ensure that the report meets
institutional standards for objectivity, evidence, and responsiveness to the
study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process. We wish to
thank the following individuals for their review of this report: Philip J.
Cook, Department of Public Policy, Duke University; Brian R. Flay, Institute for Health Research and Policy, University of Illinois at Chicago;
Rebecca A. Maynard, Graduate School of Education, University of Pennsylvania; Therese D. Pigott, Research Methodology, School of Education,
Loyola University, Chicago; Patrick H. Tolan, Institute for Juvenile Research and Department of Psychiatry, University of Illinois at Chicago;
and Jack L. Vevea, Department of Psychology, University of California,
Santa Cruz.
Although the reviewers listed above have provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations, nor did they see the final draft of the report
before its release. The review of this report was overseen by Brian Junker,
Department of Statistics, Carnegie Mellon University. Appointed by the
National Research Council, he was responsible for making certain that an
independent examination of this report was carried out in accordance with
institutional procedures and that all review comments were carefully considered. Responsibility for the final content of this report rests entirely
with the authoring committee and the institution.
Mark W. Lipsey, Chair
Committee on Improving
Evaluation of Anti-Crime Programs
Contents

Executive Summary
What Questions Should the Evaluation Address?
When Is an Impact Evaluation Appropriate?
How Should an Impact Evaluation Be Designed?
How Should the Evaluation Be Implemented?
What Organizational Infrastructure and Procedures Support
High-Quality Evaluation?
Summary, Conclusions, and Recommendations:
Priorities and Focus
Biographical Sketches of Committee Members and Staff
Participant List: Workshop on Improving Evaluation of
Criminal Justice Programs
Executive Summary
Effective guidance of criminal justice policy and practice requires
evidence about their effects on the populations and conditions they
are intended to influence. The role of evaluation research is to provide that evidence and to do so in a manner that is accessible and informative to policy makers. Recent criticisms of evaluation research in criminal justice indicate a need for greater attention to the quality of evaluation
design and the implementation of evaluation plans.
In the context of concerns about evaluation methods and quality, the
National Institute of Justice asked the Committee on Law and Justice of
the National Research Council to conduct a workshop on improving the
evaluation of criminal justice programs and to follow up with a report
that extracts guidance for effective evaluation practices from those discussions.
The workshop participants presented and discussed examples of
evaluation-related studies that represent the methods and challenges associated with research at three levels: interventions directed toward individuals; interventions in neighborhoods, schools, prisons, or communities; and interventions at a broad policy level.
This report highlights major considerations in developing and implementing evaluation plans for criminal justice programs. It is organized
around a series of questions that require thoughtful analysis in the development of any evaluation plan.
What Questions Should the Evaluation Address?
Program evaluation is often taken to mean impact evaluation—assessing the effects of the program on its intended outcomes. However, the
concepts and methods of evaluation research include evaluation of other
aspects of a program such as the need for the program, its design, implementation, and cost-effectiveness. Questions about program effects are
not necessarily the evaluation questions most appropriate to address for
all programs, although they are usually the ones with the greatest generality and potential practical significance.
Moreover, evaluations of criminal justice programs may have no
practical, policy, or theoretical significance if the program is not sufficiently well developed for the results to have generality or if there is
no audience likely to be interested in the results. Allocating limited
evaluation resources productively requires careful assignment of priorities to the programs to be evaluated and the questions to be asked
about their performance.
• Agencies that sponsor and fund evaluations of criminal justice programs should assess and assign priorities to the evaluation opportunities
within their scope. Resources should be directed mainly toward evaluations with the greatest potential for practical and policy significance from
expected evaluation results and for which the program circumstances are
amenable to productive research.
• For such public agencies as the National Institute of Justice, that
process should involve input from practitioners, policy makers, and researchers about the practical significance of the knowledge likely to be
generated and the appropriate priorities to apply.
When Is an Impact Evaluation Appropriate?

A sponsoring agency cannot launch an impact evaluation with reasonable prospects for success unless the specific program to be evaluated has been identified, background information has been gathered indicating that evaluation is feasible, and the key considerations shaping the design of the evaluation have been identified.
• The requisite background work may be done by an evaluator proposing an evaluation prior to submitting the proposal. To stimulate and
capitalize on such situations, sponsoring agencies should consider devoting some portion of the funding available for evaluation to support (a)
researchers proposing early stages of evaluation that address issues of
priority, feasibility, and evaluability and (b) opportunistic funding of impact evaluations proposed by researchers who find themselves in those
fortuitous circumstances that allow a strong evaluation to be conducted
of a significant criminal justice program.
• Alternatively, the requisite background work may be instigated by
the sponsoring agency for programs judged to be of high priority for impact evaluation. To accomplish this, agencies should undertake feasibility
or design studies that will assess whether an impact evaluation is likely to
be successful for a program of interest.
• The preconditions for successful impact evaluation are most easily
attained when they are built into a program from the start. Agencies that
sponsor program initiatives should consider which new programs may
be significant candidates for impact evaluation. The program initiative
should then be configured to require or encourage as much as possible
the inclusion of the well-defined program structures, record-keeping and
data collection, documentation of program activities, and other such components supportive of an eventual impact evaluation.
How Should an Impact Evaluation Be Designed?

Evaluation design involves many practical and technical considerations related to sampling and the generalizability of results, statistical power, measurement, methods for estimating program effects, and information that helps explain effects. There are no simple answers to the question of which designs best fit which evaluation situations, and all choices inevitably involve tradeoffs between what is desirable and what is practical and between the relative strengths and weaknesses of different methods. Nonetheless, some general guidelines can be applied when considering the approach to be used for a particular impact evaluation.
• A well-developed and clearly stated Request for Proposals (RFP)
is the first step in guarding against implementation failure. When requesting an impact evaluation for a program of interest, the sponsoring agency
should specify as completely as possible the evaluation questions to be
answered, the program sites expected to participate, the relevant outcomes, and the preferred methods to be used. Agencies should devote
sufficient resources during the RFP-development stage, including support for site visits, evaluability assessments, pilot studies, pipeline analyses, and other such preliminary investigations necessary to ensure the
development of strong guidance to the field in RFPs.
• Development of the specifications for an impact evaluation (e.g.,
an RFP) and the review of proposals for conducting the evaluation should
involve expert panels of evaluators with diverse methodological backgrounds and sufficient opportunity for them to explore and discuss the
trade-offs and potential associated with different approaches.
• In order to strengthen the quality of application reviews, a two-stage review is recommended: the policy relevance of the programs under consideration for evaluation should first be judged by knowledgeable policy makers, practitioners, and researchers. Proposals that pass this screen should then receive a scientific review from a panel of well-qualified researchers, focusing solely on the scientific merit and likelihood of successful implementation of the proposed research.
• Given the state of criminal justice knowledge, randomized experimental designs should be favored in situations where it is likely that they
can be implemented with integrity and will yield useful results. This is
particularly the case where the intervention is applied to units for which
assignment to different conditions is feasible, e.g., individual persons or
clusters of moderate scope such as schools or centers.
• Before an impact evaluation design is implemented, the assumptions on which the validity of its results depends should be made explicit,
the data and analyses required to support credible conclusions about program effects should be identified, and the availability or feasibility of obtaining the required data should be demonstrated.
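Two of the design considerations above, random assignment of units such as schools and statistical power, can be made concrete with a small sketch. This is not from the report: the cluster IDs, effect size, and sample sizes below are hypothetical, and the power estimate uses a simple simulated z-test with known unit variance.

```python
import random

def randomize_clusters(cluster_ids, seed=0):
    """Randomly split clusters (e.g., schools) into treatment and control."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def simulated_power(n_per_group, effect_size, trials=1000, seed=1):
    """Estimate the power of a two-group mean comparison by simulation.

    Outcomes are drawn from unit-variance normal distributions; the
    test is a z-test on the difference in means (reject when |z| > 1.96).
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5  # standard error of the mean difference
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > 1.96:
            rejections += 1
    return rejections / trials

# Hypothetical example: 20 schools randomized, then a power check
# for a 0.5 standard-deviation effect with 100 individuals per group.
schools = [f"school_{i}" for i in range(20)]
treatment, control = randomize_clusters(schools)
```

A design team would run this kind of calculation before committing to an evaluation: if the simulated power is low, the planned sample cannot credibly detect the program effect of interest.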
How Should the Evaluation Be Implemented?

High-quality evaluation is most likely to occur when (a) the design is
tailored to the respective program circumstances in ways that facilitate
adequate implementation, (b) the program being evaluated understands,
agrees to, and fulfills its role in the evaluation, and (c) problems that arise
during implementation are anticipated as much as possible and dealt with
promptly and effectively.
• Plans and commitments for impact evaluation should be built into the design of programs during their developmental phase whenever possible.
• A detailed management plan should be developed for implementation of an impact evaluation that specifies the key events and activities
and associated timeline for both the evaluation team and the program.
• Knowledgeable staff of the sponsoring agency should monitor the
implementation of the evaluation.
• Especially for larger projects, implementation and problem solving
may be facilitated by support of the evaluation team through such activities as meetings or cluster conferences of evaluators with similar projects
for the purpose of cross-project sharing or consultation with advisory
groups of veteran researchers.
The research methods for conducting an impact evaluation, the data
resources needed to adequately support it, and the integration and synthesis of results for policy makers and researchers are all areas in which
the basic tools need further development to advance high-quality evaluation of criminal justice programs. Agencies with a major investment in
evaluation, such as the National Institute of Justice, should devote a portion of available funds to methodological development in areas such as
the following:
• Research aimed at adapting and improving impact evaluation
designs for criminal justice applications; for example, development and
validation of effective uses of alternative designs such as regression-discontinuity, selection bias models for nonrandomized comparisons, and
techniques for modeling program effects with observational data.
• Development and improvement of new and existing databases in
ways that would better support impact evaluation of criminal justice programs.
• Measurement studies that would expand the repertoire of relevant
outcome variables and knowledge about their characteristics and relationships for purposes of impact evaluation (e.g., self-report delinquency and
criminality; official records of arrests, convictions, and the like; measures
of critical mediators).
• Synthesis and integration of the findings of impact evaluations in
ways that would inform practitioners and policy makers about the effectiveness of different types of criminal justice programs and the characteristics of the most effective programs of each type and that would inform
researchers about gaps in the research and the influence of methodological variation on evaluation results.
What Organizational Infrastructure and Procedures Support High-Quality Evaluation?

To support high-quality impact evaluation, the sponsoring agency must itself incorporate and maintain sufficient expertise to set effective and feasible evaluation priorities, manage the background preparation necessary to develop the specifications for evaluation projects, monitor implementation, and work well with expert advisory boards and review panels.
• Agencies that sponsor a significant portfolio of evaluation research
in criminal justice, such as the National Institute of Justice, should maintain a separate evaluation unit with clear responsibility for developing
and completing high-quality evaluation projects. To be effective, such a
unit will generally need a dedicated budget, some authority over evaluation research budgets and projects, and independence from undue program and political influence on the nature and implementation of the
evaluation projects undertaken.
• The agency personnel responsible for developing and overseeing
impact evaluation projects should include individuals with relevant research backgrounds who are assigned to evaluation functions and maintained in those positions in ways that ensure continuity of experience with
the challenges of criminal justice evaluation, methodological developments, and the community of researchers available to conduct quality evaluations.
• The unit and personnel responsible for developing and completing
evaluation projects should be supported by review and advisory panels
that provide expert consultation in developing RFPs, reviewing evaluation proposals and plans, monitoring the implementation of evaluation
studies, and other such functions that must be performed well in order to
facilitate high-quality evaluation research.
This is an especially opportune time to consider current practices
and future prospects for the evaluation of criminal justice programs. In recent years there have been increased calls from policy
makers for “evidence-based practice” in health and human services that
have extended to criminal justice as, for example, in the joint initiative of
the Office of Justice Programs and the Coalition for Evidence-Based Policy
on evidence-based crime and substance-abuse policy.1 This trend has been
accompanied by various organized attempts to use the findings of evaluation research to determine “what works” in criminal justice. The Maryland Report (Sherman et al., 1997) responded to a request by Congress to
review existing research and identify effective programs and practices.
The Crime and Justice Group of the Campbell Collaboration has embarked
on an ambitious effort to develop systematic reviews of research on the
effectiveness of crime and justice programs. The OJJDP Blueprints for Violence Prevention project identifies programs whose effectiveness is demonstrated by evaluation research, and other lists of programs alleged to be
effective on the basis of research have proliferated (e.g., the National Registry of Effective Programs sponsored by the Substance Abuse and Mental
Health Services Administration). In addition, the National Research
Council’s (NRC) Committee on Law and Justice has been commissioned
to prepare reports assessing research evidence on such topics as the effectiveness of policing policies (NRC, 2004), firearms policies (NRC, 2005),
illicit drug policies (NRC, 2001), and the prevention, treatment, and control of juvenile crime (NRC and Institute of Medicine, 2001).
These developments reflect recognition that effective guidance of
criminal justice policy and practice requires evidence about the effects of
those policies and practices on the populations and conditions they are
intended to influence. For example, knowledge of the ability of various
programs to reduce crime or protect potential victims allows resources to
be allocated in ways that support effective programs and efficiently promote these outcomes. The role of evaluation research is to provide evidence about these kinds of program effects and to do so in a manner that
is accessible and informative to policy makers. Fulfilling that function, in
turn, requires that evaluation research be designed and implemented in a
manner that provides valid and useful results of sufficient quality to be
relied upon by policy makers.
In this context especially, significant methodological shortcomings
would seriously compromise the value of evaluation research. And, it is
methodological issues that are at the heart of what has arguably been the
most influential stimulus for attention to the current state of evaluation
research in criminal justice. A series of reports2 by the U.S. General Accounting Office has been sharply critical of the evaluation studies conducted under the auspices of the Department of Justice. Because several
offices within the Department of Justice are major funders of evaluation
research on criminal justice programs, especially the larger and more influential evaluation projects, this is a matter of concern not only to the
Department of Justice, but to others who conduct and sponsor criminal
justice evaluation research.
The GAO reports focus on impact evaluation, that is, assessment of
the effects of programs on the populations or conditions they are intended
2 Juvenile Justice: OJJDP Reporting Requirements for Discretionary and Formula Grantees and
Concerns About Evaluation Studies (GAO, 2001). Drug Courts: Better DOJ Data Collection and
Evaluation Efforts Needed to Measure Impact of Drug Court Programs (GAO, 2002a). Justice Impact Evaluations: One Byrne Evaluation Was Rigorous; All Reviewed Violence Against Women
Office Evaluations Were Problematic (GAO, 2002b). Violence Against Women Office: Problems
with Grant Monitoring and Concerns About Evaluation Studies (GAO, 2002c). Justice Outcome
Evaluations: Design and Implementation of Studies Require More NIJ Attention (GAO, 2003a).
Program Evaluation: An Evaluation Culture and Collaborative Partnerships Help Build Agency
Capacity (GAO, 2003b).
Copyright National Academy of Sciences. All rights reserved.
Improving Evaluation of Anticrime Programs
to change. The impact evaluations selected for review cover a wide range
of programs, most of which are directed toward a particular criminal justice problem or population and implemented in multiple sites (see Box
1-1). As such, these programs are relatively representative of the kinds of
initiatives that a major funder of criminal justice programs might support
and wish to evaluate for impact.
The GAO review of the design and implementation of the impact
evaluations for these programs identified a number of problem areas that
highlight the major challenges that must be met in a sound impact evaluation. These generally fell into two categories: (a) deficiencies in the evaluation design and procedures that were initially proposed and (b) difficulties implementing the evaluation plan. It is indicative of the magnitude of
the challenge posed by impact evaluation at this scale that, of the 30 evaluations for the programs shown in Box 1-1, one or both of these problems
were noted for 20 of them, and some of the remaining 10 were still in the
proposal stage and had not yet been implemented.
The most frequent deficiencies in the initial plan or the implementation of the evaluation identified in the GAO reviews were as follows:
• The sites selected to participate in the evaluation were not representative of the sites that had received the program.
• The program participants selected at the evaluation sites were not
representative of the population the program served.
• Pre-program baseline data on key outcome variables were not included in the design or could not be collected as planned so that change
over time could not be assessed.
• The intended program outcomes (e.g., reduced criminal activity,
drug use, or victimization in contrast to intermediate outcomes such as
increases in knowledge) were not measured or outcome measures with
doubtful reliability and validity were used.
• No means for isolating program effects from the influence of external factors on the outcomes, such as a nonparticipant comparison group
or appropriate statistical controls, were included in the design or the
planned procedure could not be implemented.
• The program and comparison groups differed on outcome-related
characteristics at the beginning of the program or became different due to
differential attrition before the outcomes were measured.
• Data collection was problematic; needed data could not be obtained
or response rates were low when it was likely that those who responded
differed from those who did not.
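Several of the deficiencies above, notably baseline nonequivalence and differential attrition, can be screened for with routine diagnostics before outcomes are analyzed. As a rough illustration, not drawn from the GAO reports, here is a Python sketch using invented summary statistics and common rule-of-thumb thresholds (the 0.25 and 0.05 cutoffs are assumptions, not GAO criteria):

```python
# Hypothetical diagnostic sketch: baseline equivalence and differential
# attrition checks for a two-group impact evaluation. All data,
# variable names, and thresholds are illustrative assumptions.

def standardized_diff(mean_t, mean_c, sd_pooled):
    """Standardized mean difference on a baseline covariate."""
    return (mean_t - mean_c) / sd_pooled

def attrition_gap(enrolled_t, followed_t, enrolled_c, followed_c):
    """Difference in follow-up rates between program and comparison groups."""
    return followed_t / enrolled_t - followed_c / enrolled_c

# Baseline prior-arrest counts (hypothetical summary statistics)
d = standardized_diff(mean_t=2.4, mean_c=1.9, sd_pooled=2.0)

# Follow-up: 200 of 250 program cases vs. 210 of 240 comparison cases
gap = attrition_gap(250, 200, 240, 210)

# Rules of thumb (assumptions): |d| > 0.25 suggests the groups already
# differed at baseline; a follow-up-rate gap over ~0.05 suggests
# differential attrition worth investigating.
print(f"baseline standardized difference: {d:.2f}")   # 0.25
print(f"differential attrition: {gap:.3f}")           # -0.075
```

Checks like these do not repair a flawed design, but they flag, early, two of the problems the GAO reviews found most often.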
BOX 1-1
Programs Represented in the Impact Evaluation Plans and
Projects Reviewed in Recent GAO Reports

Arrest Policies Program (treating domestic violence as a serious violation of the law)
Breaking the Cycle (comprehensive services for adult offenders with drug-use histories)
Chicago's Citywide Community Policing Program (policing organized around small geographic areas)
Children at Risk Program (comprehensive services for high-risk youth)
Comprehensive Gang Initiative (community-based program to reduce gang-related crime)
Comprehensive Service-Based Intervention Strategy in Public Housing (program to reduce drug activity and crime)
Corrections and Law Enforcement Family Support (CLEFS) (stress intervention programs for law enforcement officers and families)
Court Monitoring and Batterer Intervention Programs (batterer counseling programs and court monitoring)
Culturally Focused Batterer Counseling for African-American Men
Domestic Violence Victims' Civil Legal Assistance Program (legal services for victims of domestic violence)
Drug Courts (specialized court procedures and services for drug offenders)
Enforcement of Underage Drinking Laws Program
Gang Resistance Education and Training (GREAT) Program (school-based gang prevention program)
Intensive Aftercare (programs for juvenile offenders after release from confinement)
Juvenile Justice Mental Health Initiative (mental health services to families of delinquent youths with serious emotional disturbances)
Juvenile Mentoring Program (volunteer adult mentors for at-risk youth)
Multi-Site Demonstration for Enhanced Judicial Oversight of Domestic Violence Cases (coordinated response to domestic violence offenses)
Multi-Site Demonstration of Collaborations to Address Domestic Violence and Child Maltreatment (community-based programs for coordinated response to families with co-occurring domestic violence and child maltreatment)
Parents Anonymous (support groups for child abuse prevention)
Partnership to Reduce Juvenile Gun Violence Program (coordinated community strategies for selected areas in cities)
Project PATHE (school-based violence prevention)
Reducing Non-Emergency Calls to 911: Four Approaches
Responding to the Problem Police Officer: Early Warning Systems (identification and treatment for officers whose behavior is problematic)
Rural Domestic Violence and Child Victimization Enforcement Grant Program (coordinated strategies for responding to domestic violence)
Rural Domestic Violence and Child Victimization Grant Program (cooperative community-based efforts to reduce domestic violence, dating violence, and child abuse)
Rural Gang Initiative (community-based gang prevention programs)
Safe Schools/Healthy Students (school services to promote healthy development and prevent violence and drug abuse)
Safe Start Initiative (integrated service delivery to reduce impact of family and community violence on young children)
STOP Grant Programs (culture-specific strategies to reduce violence against Indian women)
Victim Advocacy with a Team Approach (domestic violence teams to assist victims)

No recent review of evaluation research in the general criminal justice literature provides an assessment of methodology that is as comprehensive as that represented in the collection of GAO reports summarized above. What does appear in that literature in recent years is considerable discussion of the role and applicability of randomized field experiments for investigating program effects. In Feder and Boruch (2000), a special issue of Crime and Delinquency was devoted to the potential for experiments in criminal justice settings, followed a few years later by a special issue (Weisburd, 2003) of Evaluation Review on randomized trials in criminology. More recently, a new journal, Experimental Criminology, was launched with an explicit focus on experimental and quasi-experimental research for investigating crime and justice practice and policy. The view that research on the effects of criminal justice interventions would be improved by greater emphasis on randomized experiments, however, is by no means universal. The limitations of experimental methods for such purposes and alternatives using econometric modeling have also received critical attention (e.g., Heckman and Robb, 1985; Manski, 1996).
In the context of these various concerns about evaluation methods
and quality, the National Institute of Justice asked the NRC Committee on
Law and Justice to organize a workshop on improving the evaluation of
criminal justice programs and to follow up with a report that extracted
guidance for effective evaluation practices from those proceedings. The
Academies appointed a small steering committee to guide workshop development. The workshop was held in September 2003, and this report is
the result of the efforts of the steering committee to further develop the
themes raised there and integrate them as constructive advice about conducting evaluations of criminal justice programs.
The purpose of the Workshop on Improving the Evaluation of Criminal Justice Programs was to foster broader implementation of credible
evaluations in the field of criminal justice by promoting informed discussion of:
• the repertoire of applicable evaluation methods;
• issues in matching methods to program and policy circumstances;
• the organizational infrastructure requirements for supporting
sound evaluation.
This purpose was pursued through presentation and discussion of
case examples of evaluation-related studies selected to represent the methods and challenges associated with research at each of three different levels of intervention. The three levels are distinguished by different social
units that are the target of intervention and thus constitute the units of
analysis for the evaluation design. The levels and the exemplary evaluation studies and assigned discussant for each were as follows:
(1) Interventions directed toward individuals, a situation in which
there are generally a relatively large number of units within the
scope of the program being evaluated and potential for assigning
those units to different intervention conditions.

Multidimensional Family Foster Care (Patricia Chamberlain)
A Randomized Experiment: Testing Inmate Classification
Systems (Richard Berk)
Discussant (Adele Harrell)
(2) Interventions with neighborhoods, schools, prisons, or communities, a situation generally characterized by relatively few units
within the scope of the program and often limited potential for
assigning those units to different intervention conditions.

Hot Spots Policing and Crime Prevention (Anthony Braga)
Communities Mobilizing for Change (Alex Wagenaar)
Discussant (Edward Zigler)
(3) Interventions at the broad local, state, or national level where the
program scope encompasses a macro unit and there is virtually
no potential for assigning units to different intervention conditions.

An Empirical Analysis of LOJACK (Steven Levitt)
Racial Bias in Motor Vehicle Searches (Petra Todd)
Discussant (John V. Pepper)
After the research case studies in each category were presented, their
implications for conducting high-quality evaluations were discussed. A
final panel at the end of the workshop then discussed the infrastructure
requirements for strong evaluations.

Infrastructure Requirements for Consumption (and Production) of Strong Evaluations (Lawrence Sherman)
Recommendations for Evaluation (Robert Moffitt)
Bringing Evidence-Based Policy to Substance Abuse and
Criminal Justice (Jon Baron)
Papers presented at the workshop are provided on the Committee on
Law and Justice website.
The intent of this report is not to summarize the workshop but, rather,
to draw upon its contents to highlight the major considerations in developing and implementing evaluation plans for criminal justice programs.
In particular, the report is organized around five interrelated questions
that require thoughtful analysis in the development of any evaluation
plan, with particular emphasis on impact evaluation:
1. What questions should the evaluation address?
2. When is it appropriate to conduct an impact evaluation?
3. How should an impact evaluation be designed?
4. How should the evaluation be implemented?
5. What organizational infrastructure and procedures support high-quality evaluation?
In the pages that follow, each of these questions is examined and advice is distilled from the workshop presentations and discussion, and from
subsequent committee deliberations, for answering them in ways that will
help improve the evaluation of criminal justice programs. The intended
audience for this report includes NIJ, the workshop sponsor and a major
funder of criminal justice evaluations, but also other federal, state, and
local agencies, foundations, and other such organizations that plan, sponsor, or administer evaluations of criminal justice programs.
What Questions Should the
Evaluation Address?
Criminal justice programs arise in many different ways. Some are
developed by researchers or practitioners and fielded rather narrowly at first in demonstration projects. The practice of arresting
perpetrators of domestic violence when police were called to the scene
began in this fashion (Sherman, 1992). Others spring into broad acceptance as a result of grass roots enthusiasm, such as Project DARE with its
use of police officers to provide drug prevention education in schools.
Still others, such as intensive probation supervision, arise from the challenges of everyday criminal justice practice. Our concern in this report is
not with the origins of criminal justice programs but with their evaluation
when questions about their effectiveness arise among policy makers, practitioners, funders, or sponsors of evaluation research.
The evaluation of such programs is often taken to mean impact evaluation, that is, an assessment of the effects of the program intervention on
the intended outcomes (also called outcome evaluation). This is a critical
issue for any criminal justice program and its stakeholders. Producing
beneficial effects (and avoiding harmful ones) is the central purpose of
most programs and the reason for investing resources in them. For this
reason, all the subsequent chapters of this report discuss various aspects
of impact evaluation.
It does not follow, however, that every evaluation should automatically focus on impact questions (Rossi, Lipsey, and Freeman, 2004; Weiss,
1998). Though important, those questions may be premature in light of
limited knowledge about other aspects of program performance that are
prerequisites for producing the intended effects. Or, they may be inap14
Copyright National Academy of Sciences. All rights reserved.
Improving Evaluation of Anticrime Programs
propriate in the context of issues with greater political salience or more
relevance to the concerns of key audiences for the evaluation.
In particular, questions about aspects of program performance other
than impact that may be important to answer in their own right, or in
conjunction with addressing impact questions, include the following:
1. Questions about the need for the program, e.g., the nature and
magnitude of the problem the program addresses and the characteristics
of the population served. Assessment of the need for a program deals
with some of the most basic evaluation questions—whether there is a
problem that justifies a program intervention and what characteristics of
the problem make it more or less amenable to intervention. For a program
to reduce gang-related crime, for instance, it is useful to know how much
crime is gang-related, what crimes, in what neighborhoods, and by which gangs.
2. Questions about program conceptualization or design, e.g.,
whether the program targets the appropriate clientele or social units, embodies an intervention that could plausibly bring about the desired
changes in those units and involves a delivery system capable of applying the intervention to the intended units. Assessment of the program
design examines the soundness of the logic inherent in the assumption
that the intervention as intended can bring about positive change in the
social conditions to which it is directed. One might ask, for instance,
whether it is a sound assumption that prison visitation programs for juvenile offenders, such as Scared Straight, will have a deterrent effect for
impressionable antisocial adolescents (Petrosino et al., 2003a).
3. Questions about program implementation and service delivery,
e.g., whether the intended intervention is delivered to the intended clientele in sufficient quantity and quality, if the clients believe they benefit
from the services, and how well administrative, organizational, personnel, and fiscal functions are handled. Assessment of program implementation, often called process evaluation, is a core evaluation function aimed
at determining how well the program is operating, especially whether it is
actually delivering enough of the intervention to have a reasonable chance
of producing effects. With a program for counseling victims of domestic
violence, for example, an evaluation might consider the number of eligible victims who participate, attendance at the counseling sessions, and
the quality of the counseling provided.
4. Questions about program cost and efficiency, e.g., what the program costs are per unit of service, whether the program costs are reasonable in relation to the services provided or the magnitude of the intended
benefits, and if alternative approaches would yield equivalent benefits at
equal or lower cost. Cost and efficiency questions about the delivery of
services relate to important policy and management functions even without evidence that those services actually produce benefits. Cost-benefit
and cost-effectiveness assessments are especially informative, however,
when they build on the findings of impact evaluation to examine the cost
required to attain whatever effects the program produces. Cost questions
for a drug court, for instance, might ask how much it costs per offender
served and the cost for each recidivistic drug offense prevented.
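The two drug-court cost questions come down to simple ratios once service counts and an impact estimate are in hand. A minimal sketch, with all figures invented for illustration:

```python
# Hypothetical cost-effectiveness sketch for a drug court.
# All figures are invented for illustration.

total_cost = 1_500_000          # annual program cost in dollars
offenders_served = 300

# From an impact evaluation: recidivistic drug offenses per offender
control_rate = 0.50             # expected offenses without the program
program_rate = 0.35             # observed offenses with the program

cost_per_offender = total_cost / offenders_served
offenses_prevented = (control_rate - program_rate) * offenders_served
cost_per_offense_prevented = total_cost / offenses_prevented

print(f"cost per offender served: ${cost_per_offender:,.0f}")             # $5,000
print(f"offenses prevented: {offenses_prevented:.0f}")                    # 45
print(f"cost per offense prevented: ${cost_per_offense_prevented:,.0f}")  # $33,333
```

Note that the second ratio is only meaningful if the impact estimate is credible; absent one, an evaluation can report cost per unit of service but not cost per outcome, which is exactly the distinction the text draws.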
The design and implementation of impact evaluations capable of producing credible findings about program effects are challenging and often
costly. It may not be productive to undertake them without assurance
that there is a well-defined need for the program, a plausible program
concept for bringing about change, and sufficient implementation of the
program to potentially have measurable effects. Among these, program
implementation is especially critical. In criminal justice contexts, the organizational and administrative demands associated with delivering program services of sufficient quality, quantity, and scope to bring about
meaningful change are considerable. Offenders often resist or manipulate
programs, victims may feel threatened and distrustful, legal and administrative factors constrain program activities, and crime, by its nature, is
difficult to control. Under these circumstances, programs are often implemented in such weak form that significant effects cannot be expected.
Information about the nature of the problem a program addresses,
the program concept for bringing about change, and program implementation are also important to provide an explanatory context within which
to interpret the results of an impact evaluation. Weak effects from a poorly
implemented program leave open the possibility that the program concept is sound and better outcomes would occur if implementation were
improved. Weak effects from a well-implemented program, however, are
more likely to indicate theory failure—the program concept or approach
itself may be so flawed that no improvement in implementation would
produce the intended effects. Even when positive effects are found, it is
generally useful to know what aspects of the program circumstances
might have contributed to producing those effects and how they might be
strengthened. Absent this information, we have what is often referred to
as a “black box” evaluation—we know if the expected effects occurred
but have no information about how or why they occurred or guidance for
how to improve on them.
An important step in the evaluation process, therefore, is developing
the questions the evaluation is to answer and ensuring that they are appropriate to the program circumstances and the audience for the evaluation. The diversity of possible evaluation questions that can be addressed
and the importance of determining which should be addressed in any
given evaluation have several implications for the design and management of evaluation research. Some of the more important of those implications are discussed below.
Evaluations that focus on different questions, assess different programs in different circumstances, and respond to the concerns of different
audiences generally require different designs and methods. There will
thus be no single template or set of criteria for how evaluations should be
conducted or what constitutes high quality. That said, however, there are
several recognizable forms of evaluation to which similar design and quality standards apply (briefly described in Box 2-1).
A common and significant distinction is between evaluations concerned primarily with program process and implementation and those
focusing on program effects. Process evaluations address questions about
how and how well a program functions in its use of resources and delivery of services. They are typically designed to collect data on selected
performance indicators that relate to the most critical of these functions,
for instance, the amount, quality, and coverage of services provided. These
performance indicators are assessed against administrative goals, contractual obligations, legal requirements, professional norms, and other such
applicable standards. The relevant performance dimensions, indicators,
and standards will generally be specific to the particular program. Thus
this form of evaluation will be tailored to the program being evaluated
and will show little commonality across programs that are not replicates
of each other.
Process evaluations may assess program performance at one point in
time or be configured to produce periodic reports on program performance, generally referred to as “performance monitoring.” In the latter
case, the procedures for collecting and reporting data on performance indicators are often designed by an evaluation specialist but then routinized in the program as a management information system (MIS). When
conducted as a one-time assessment, however, process evaluations are
generally the responsibility of a designated evaluation team. In that case,
assessment of program implementation may be the main aim of the evaluation, or it may be integrated with an impact evaluation.
Program performance monitoring sometimes involves indicators of
program outcomes. This situation must be distinguished from impact
evaluation because it does not answer questions about the program’s effects on those outcomes. A performance monitoring scheme, for instance,
might routinely gather information about the recidivism rates of the offenders treated by the program. This information describes the postprogram status of the offenders with regard to their reoffense rates and may be informative if it shows higher or lower rates than expected for the population being treated or interesting changes over time. It does not, however, reveal the program impact on recidivism, that is, what change in recidivism results from the program intervention and would not have occurred otherwise.

BOX 2-1
Major Forms of Program Evaluation

Process or Implementation Evaluation
An assessment of how well a program functions in its use of resources, delivery of the intended services, operation and management, and the like. Process evaluation may also examine the need for the program, the program concept, or cost.

Performance Monitoring
A continuous process evaluation that produces periodic reports on the program's performance on a designated set of indicators and is often incorporated into program routines as a form of management information system. It may include monitoring of program outcome indicators but does not address the program impact on those outcomes.

Impact Evaluation
An assessment of the effects produced by the program; that is, the outcomes for the target population or settings brought about by the program that would not have occurred otherwise. Impact evaluation may also incorporate cost-effectiveness analysis.

Evaluability Assessment
An assessment of the likely feasibility and utility of conducting an evaluation made before the evaluation is designed. It is used to inform decisions about whether an evaluation should be undertaken and, if so, what form it should take.
Impact evaluations, in turn, are oriented toward determining
whether a program produces the intended outcomes, for instance, reduced recidivism among treated offenders, decreased stress for police
officers, less trauma for victims, lower crime rates, and the like. The programs that are evaluated may be demonstration programs, such as the
early forms of Multidimensional Treatment Foster Care Program (Chamberlain, 2003), that are not widely implemented and which may be
mounted or supervised by researchers to find out if they work (often
called efficacy studies). Or they may involve programs already rather
widely used in practice, such as drug courts, that operate with representative personnel, training, client selection, and the like (often called effectiveness studies). Such differences in the program circumstances, and
many other program variations, influence the nature of the evaluation,
which must always be at least somewhat responsive to those circumstances. For present purposes, we will focus on broader considerations
that apply across the range of criminal justice impact evaluations.
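The monitoring-versus-impact distinction drawn above reduces to a toy calculation: performance monitoring yields the observed recidivism rate for treated offenders, while an impact estimate requires a counterfactual, here supplied by a hypothetical comparison group (all counts invented):

```python
# Hypothetical illustration: monitored recidivism vs. program impact.
# All counts are invented.

# Performance monitoring: recidivism among treated offenders only
treated_reoffended, treated_total = 60, 200
monitored_rate = treated_reoffended / treated_total         # 0.30

# Impact evaluation adds a counterfactual: a comparison group
comparison_reoffended, comparison_total = 80, 200
comparison_rate = comparison_reoffended / comparison_total  # 0.40

# The monitored rate alone cannot distinguish a program that lowered
# recidivism from one that had no effect; the impact estimate can.
impact_estimate = monitored_rate - comparison_rate

print(f"monitored recidivism rate: {monitored_rate:.2f}")
print(f"estimated program impact: {impact_estimate:+.2f}")
```

A monitored rate of 0.30 could reflect a substantial reduction, no effect, or even harm; only the comparison against what would have occurred otherwise turns it into an effect estimate.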
Determining the priority evaluation questions for a program or group
of programs may itself require some investigation into the program circumstances, stakeholder concerns, utility of the expected information, and
the like. Moreover, in some instances it may be necessary to have the answers to some questions before asking others. For instance, with relatively
new programs, it may be important to establish that the program has
reached an adequate level of implementation before embarking on an outcome evaluation. A community policing program, for instance, could require changes in well-established practices that may occur slowly or not
at all. In addition, any set of evaluation results will almost inevitably raise
additional significant questions. These may involve concerns, for example,
about why the results came out the way they did, what factors were most
associated with program effectiveness, what side effects might have been
missed, whether the effects would replicate in another setting or with a
different population, or whether an efficacious program would prove effective in routine practice.
It follows that producing informative, useful evaluation results may
require a series of evaluation studies rather than a single study. Such a
sustained effort, in turn, requires a relatively long time period over which
the studies will be supported and continuity in their planning, implementation, and interpretation.
The nature of a program, the circumstances in which it is situated, or
the available resources (including time, data, program cooperation, and
evaluation expertise) may be such that evaluation is not feasible for a particular program. Alternatively, the evaluation questions it is feasible to
answer for the program may not be useful to any identifiable audience.
Unfortunately, evaluation is often commissioned and well under way before these conditions are discovered.
The technique of evaluability assessment (Wholey, 1994) was developed
as a diagnostic procedure evaluators could use to find out if a program
was amenable to evaluation and, if so, what form of evaluation would
provide the most useful information to the intended audience. A typical
evaluability assessment considers how well defined the program is, the
availability of performance data, the resources required, and the needs
and interests of the audience for the evaluation. Its purpose is to inform
decisions about whether an evaluation should be undertaken and, if so,
what form it should take. For an agency wishing to plan and commission
an evaluation, especially of a large, complex, or diffuse program, a preliminary evaluability assessment can provide background information
useful for defining what questions the evaluation should address, what
form it should take, and what resources will be required to successfully
complete it. Evaluability assessments are discussed in more detail in
Chapter 3.
The diversity of potential evaluation questions and approaches that
may be applicable to any program allows much room for variation from
one evaluation team to another. Agencies that commission and sponsor
evaluations will experience this variation if the specifications for the evaluations they fund are not spelled out precisely. Such mechanisms as Requests for Proposals (RFPs) and scope of work statements in contracts are
often the initial forms of communication between evaluation sponsors and
evaluators about the questions the evaluation will answer and the form it
will take. Sponsors who clearly specify the questions of interest and the
form in which they expect the answers are more likely to obtain the information they want from an evaluation. At the same time, an evaluation
must be responsive to unanticipated events and circumstances in the field
that necessitate changes in the plan. It is advantageous, therefore, for the
evaluation plan to be both well-specified and also to have provisions for
adaptation and renegotiation when needed.
Development of a well-specified evaluation solicitation and plan shifts
much of the burden for identifying the focal evaluation questions and the
form of useful answers to the evaluation sponsor. More often, in contrast,
the sponsor provides only general guidelines and relies on the applicants
to shape the specific questions and approach. For the sponsor to be proactive in defining the evaluation focus, the sponsoring agency and personnel must have the capacity to engage in thoughtful planning prior to commissioning the evaluation. That, in turn, may require some preliminary
investigation of the program circumstances, the policy context, feasibility,
and the like. When a programmatic approach to evaluation is needed, the
planning process must take a correspondingly long-term perspective, with
associated implications for continuity from one fiscal year to the next.
Agencies’ capabilities to engage in focused evaluation planning and
develop well-specified evaluation plans will depend on their ability to
develop expertise and sources of information that support that process.
This may involve use of outside expertise for advice, including researchers, practitioners, and policy makers. It may also require the capability to
conduct or commission preliminary studies to provide input to the process. Such studies might include surveys of programs and policy makers
to identify issues and potential sites, feasibility studies to determine if it is
likely that certain questions can be answered, and evaluability assessments that examine the readiness and appropriateness of evaluation for
candidate programs.
When Is an Impact
Evaluation Appropriate?
Of the many evaluation questions that might be asked for any
criminal justice program, the one that is generally of most interest to policy makers is, “Does it work?” That is, does the program
have the intended beneficial effects on the outcomes of interest? Policy
makers, for example, might wish to know the effects of a “hot spots” policing program on the rate of violent crime (Braga, 2003) or whether vigorous enforcement of drug laws results in a decrease in drug consumption.
As described in the previous chapter, answering these types of questions
is the main focus of impact evaluation.
A valid and informative impact evaluation, however, cannot necessarily be conducted for every criminal justice program whose effects are
of interest to policy makers. Impact evaluation is inherently difficult and
depends upon specialized research designs, data collections, and statistical analysis (discussed in more detail in the next chapter). It simply cannot be carried out effectively unless certain minimum conditions and resources are available no matter how skilled the researchers or insistent
the policy makers. Moreover, even under otherwise favorable circumstances, it is rarely possible to obtain credible answers about the effects of
a criminal justice program within a short time period or at low cost.
For policy makers and sponsors of impact evaluation research, this
situation has a number of significant implications. Most important, it
means that to have a reasonable probability of success, impact evaluations should be launched only with careful planning and firm indications
that the prerequisite conditions are in place. In the face of the inevitable
limited resources for evaluation research, how programs are selected for
impact evaluation may also be critical. Broad priorities that spread resources too thinly may reduce the likelihood that any evaluation can be
carried out well enough to produce credible and useful results. Focused
priorities that concentrate resources in relatively few impact evaluations
may be equally unproductive if the program circumstances for those few
are not amenable to evaluation.
There are no criteria for determining which programs are most appropriate for impact evaluation that will ensure that every evaluation can
be effectively implemented and yield valid findings. Two different kinds
of considerations that are generally relevant are developed here—one relating to the practical or political significance of the program and one relating to how amenable it is to evaluation.
Across the full spectrum of criminal justice programs, those that may
be appropriate for impact evaluation will not generally be identifiable
through any single means or source. Participants in different parts of the
system will have different interests and priorities that focus their attention on different programs. Sponsors and funders of programs will often
want to know if the programs in which they have made investments have
the desired effects. Practitioners may be most interested in evaluations of
the programs they currently use and of alternative programs that might
be better. Policy makers will be interested in evaluations that help them
make resource allocation decisions about the programs they should support. Researchers often focus their attention on innovative program concepts with potential importance for future application.
It follows that adequate identification of programs that may be significant enough to any one of these groups to be candidates for impact
evaluation will require input from informed representatives of that group.
Sponsors of evaluation research across the spectrum of criminal justice
programs will need input from all these groups if they wish to identify
the candidates for impact evaluation likely to be most significant for the field.
Two primary mechanisms create programs for which impact evaluation may contribute vital practical information. One mechanism is the evolution of innovative programs or the combination of existing program elements into new programs that have great potential in the eyes of the
policy community. Such programs may be developed by researchers or
practitioners and fielded rather narrowly. The practice of arresting perpetrators of domestic violence when police were called to the scene began in
this fashion (Sherman, 1992). With the second mechanism, programs
spring into broad acceptance as a result of grassroots enthusiasm but may
lack an empirical or theoretical underpinning. Project DARE, with its use
of police officers to provide drug prevention education in schools, followed that path. Programs stemming from both sources are potentially
significant, though for different reasons, and it would be shortsighted to
focus on one to the exclusion of the other.
Given a slate of candidate programs for which impact evaluation may
have significance for the field from the perspective of one concerned group
or another, it may still be necessary to set priorities among them. A useful
conceptual framework from health intervention research for appraising
the significance of an intervention is summarized in the acronym,
RE-AIM, for Reach, Effectiveness, Adoption, Implementation, and Maintenance (Glasgow, Vogt, and Boles, 1999). When considering whether a
program is a candidate for impact evaluation these elements can be
thought of as a chain with the potential value of an evaluation constrained
by the weakest link in that chain. These criteria can be used to assess a
program’s significance and, correspondingly, the value of evaluation results about its effects. We will consider these elements in order.
Reach. Reach is the scope of the population that could potentially benefit from the intervention if it proves effective. Other things equal, an
intervention validated by evaluation that is applicable to a larger population has more practical significance than one applicable to a smaller
population. Reach may also encompass specialized, hard-to-serve populations for which more general programs may not be suitable. Drug
courts, from this perspective, have great reach because of the high prevalence of substance abuse among offenders. A culture-specific program to
reduce violence against Native American women, however, would also
have reach because there are currently few programs tailored for this population.
Effectiveness. The potential value of a program is, of course, constrained by its effectiveness when it is put into practice. It is the job of
impact evaluation to determine effectiveness, which makes this a difficult
criterion to apply when selecting programs for impact evaluation. Nonetheless, an informed judgment call about the potential effectiveness of a
program can be important for setting evaluation priorities. For some programs, there may be preliminary evidence of efficacy or effectiveness that
can inform judgment. Consistency with well-established theory and the
clinical judgment of experienced practitioners may also be useful touchstones. The positive effects of cognitive-behavioral therapies demonstrated for a range of mental health problems, for instance, support the
expectation that they might also be effective for sex offenders.
Adoption. Adoption is the potential market for a program. Adoption
is a complex constellation of ideology, politics, and bureaucratic preferences that is influenced by intellectual fashion and larger social forces
well as rational assessment of the utility of a program. Given equal effectiveness and ease of implementation, some programs will be less attractive and acceptable to potential users than others. The assessment of those
factors by potential adopters can thus provide valuable information for
prioritizing programs for impact evaluation. The widespread adoption
of bootcamps during the 1990s, for instance, indicated that this type
of paramilitary program had considerable political and social appeal
and was compatible with the program concepts held by criminal justice practitioners.
Implementation. Some programs are more difficult to implement than
others, and for some it may be more difficult to sustain the quality of the
service delivery in ongoing practice. Other things equal, a program that is
straightforward to implement and sustain is more valuable than a program that requires a great deal of effort and monitoring to yield its full
potential. Mentoring programs as a delinquency prevention strategy for
at-risk juveniles, for instance, are generally easier and less costly to implement than family counseling programs with their requirements for highly
trained personnel and regular meetings with multiple family members.
Maintenance. Maintenance, in this context, refers to the maintenance
of positive program effects over time. The more durable the effect of a
program, the greater is its value as a beneficial social intervention. For
instance, if improved street lighting reduces street crimes by making high
crime areas more visible (Farrington and Welsh, 2002), the effects are not
likely to diminish significantly as long as criminals prefer to conduct their
business away from public view.
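The weakest-link logic of the RE-AIM framework can be sketched as a toy scoring rule. The criterion names come from Glasgow, Vogt, and Boles; the numeric ratings, the `reaim_priority` helper, and the example programs are hypothetical illustrations, not assessments of real programs:

```python
# Toy illustration of RE-AIM "weakest link" prioritization. Ratings
# (0-10) for each criterion are hypothetical analyst judgments.

REAIM = ("reach", "effectiveness", "adoption", "implementation", "maintenance")

def reaim_priority(ratings: dict) -> int:
    """The potential value of evaluating a program is capped by its
    weakest RE-AIM criterion, so the score is the minimum rating."""
    return min(ratings[c] for c in REAIM)

# Hypothetical candidate programs rated by a review panel.
candidates = {
    "drug_court": {"reach": 8, "effectiveness": 6, "adoption": 7,
                   "implementation": 5, "maintenance": 6},
    "boot_camp":  {"reach": 7, "effectiveness": 2, "adoption": 9,
                   "implementation": 6, "maintenance": 4},
}

ranked = sorted(candidates, key=lambda p: reaim_priority(candidates[p]),
                reverse=True)
print(ranked)  # drug_court (min rating 5) ranks above boot_camp (min rating 2)
```

Note that the boot camp's high adoption score cannot compensate for its weak effectiveness rating, which is exactly the chain metaphor in the text.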
Making good judgments on such criteria in advance of an impact
evaluation will rarely be an easy task and will almost always have to be
done on the basis of insufficient information. To assess the potential significance of a criminal justice program and, hence, the potential significance of an impact evaluation of that program, however, requires some
such assessment. Because it is a difficult task, expert criminal justice professionals, policy makers, and researchers should be employed to review
candidate programs, discuss their significance for impact evaluation, and
make recommendations about the corresponding priorities.
A criminal justice program that is significant in terms of the criteria
described above may, nonetheless, be inappropriate for impact evaluation. The nature of the program and its circumstances, the prerequisites
for credible research, or the available resources may fall short of what is
required to conduct an adequate assessment of program effects. This is an
unfortunate circumstance, but one that must be recognized in any process
of decision making about where to invest resources for impact evaluation.
The number of impact evaluations found to be inadequately implemented
in the GAO reports reviewed in Chapter 1 of this report is evidence of the
magnitude of the potential difficulties in completing even well-designed
projects of this sort.
At issue is the evaluability of a program—whether the conceptualization, configuration, and situation of a program make it amenable to
evaluation research and, if so, what would be required to conduct the
research. Ultimately, effective impact evaluation depends on four basic
preconditions: (a) a sufficiently developed and documented program to
be evaluated, (b) the ability to obtain relevant and reliable data on the
program outcomes of interest, (c) a research design capable of distinguishing program effects from other influences on the outcomes, and (d) sufficient resources to adequately conduct the research. Item (c), relating to
research design for impact evaluation, poses considerable technical and
practical challenges and, additionally, must be tailored rather specifically
to the circumstances of the program being evaluated. It is discussed in the
next chapter of this report. The other preconditions for effective impact
evaluation are somewhat more general and are reviewed below.
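The four preconditions can be summarized as a minimal screening checklist. The `EvaluabilityCheck` structure and the example answers below are illustrative assumptions, not part of any official instrument:

```python
# Minimal evaluability screen mirroring preconditions (a)-(d) above.
from dataclasses import dataclass

@dataclass
class EvaluabilityCheck:
    program_documented: bool       # (a) developed, documented program
    outcome_data_obtainable: bool  # (b) relevant, reliable outcome data
    design_feasible: bool          # (c) design can isolate program effects
    resources_sufficient: bool     # (d) funding, time, and expertise

    def evaluable(self) -> bool:
        # All four preconditions must hold; any single failure blocks
        # a credible impact evaluation.
        return all((self.program_documented, self.outcome_data_obtainable,
                    self.design_feasible, self.resources_sufficient))

# Hypothetical program with no credible way to isolate its effects.
check = EvaluabilityCheck(True, True, False, True)
print(check.evaluable())  # False
```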
The Program
At the most basic level, impact evaluation is most informative when
there is a well-defined program to evaluate. Finding effects is of little value
if it is not possible to specify what was done to bring about those effects,
that is, the program’s theory of change and the way it is operationalized.
Such a program can be neither replicated nor easily used by other practitioners who wish to adopt it. Moreover, before beginning a study, researchers should be able to identify the effects, positive and negative, that
the program might plausibly produce and know what target population
or social conditions are expected to show those effects.
Programs can be poorly defined in several different ways that will
create difficulties for impact evaluation. One is simply that the intended
program activities and services are not documented, though they may be
well-structured in practice. It is commonplace for many medical and mental health programs to develop treatment protocols—manuals that describe what the treatment is and how it is to be delivered—but this is not
generally the case for criminal justice programs. In such instances, the
evaluation research may need to include an observational and descriptive
component to characterize the nature of the program under consideration.
As mentioned in Chapter 2, a process evaluation to determine how well
the program is implemented and how completely and adequately it delivers the intended services is also frequently conducted along with an impact evaluation. These procedures allow any findings about program effects to be accompanied by a description of the program as actually
delivered as well as of the program as intended.
Another variant on the issue of program definition occurs for programs that provide significantly different services to different program
participants, whether inadvertently or by intent. A juvenile diversion
project, for instance, may prescribe quite different services for different
first offenders based on a needs assessment. A question about the impact
of this diversion program may be answered in terms of the average effect
on recidivism across the variously treated juveniles served. The mix of
services provided to each juvenile and the basis for deciding on that mix,
however, may be critical to any success the program shows. If those aspects are not well-defined in the program procedures, it can be challenging for the evaluation to fully specify these key features in a way that
adequately describes the program or permits replication and emulation.
One of the more challenging situations for impact evaluation is a
multisite program with substantial variation across sites in how the program is configured and implemented (Herrell and Straw, 2002). Consider,
for example, a program that provides grants to communities to better coordinate the law enforcement, prosecutorial, and judicial response to domestic violence through more vigorous enforcement of existing laws. The
activities developed at each site to accomplish this purpose may be quite
different, as well as the mix of criminal justice participants, the roles designated for them in the program, and the specific laws selected for emphasis. Arguably under such circumstances each site has implemented a
different program and each would require its own impact evaluation. A
national evaluation that attempts to encompass the whole program has
the challenge of sampling sites in a representative manner but, even then,
is largely restricted to examining the average effects across these rather
different program implementations. With sufficient specification of the
program variants and separate effects at each site, more differentiated
findings about impact could be developed, but at what may be greatly
increased cost.
Outcome Data
Impact evaluation requires data describing key outcomes, whether
drawn from existing sources or collected as part of the evaluation. The
most important outcome data are those that relate to the most policy-relevant outcomes, e.g., crime reduction. Even when we observe relevant
outcomes, there may be important trade-offs between the sensitivity and
scope of the measure. For example, when evaluating the minimum drinking age laws, Cook and Tauchen (1984) considered whether to use “fatal
nighttime single-vehicle accidents” (which has a high percentage of alcohol-related cases, making it sensitive to an alcohol-oriented intervention)
or an overall measure of highway fatalities (which should capture the full
effect of the law, but is less sensitive to small changes). In some instances,
the only practical measures may be for intermediate outcomes presumed
to lead to the ultimate outcome (e.g., improved conflict-resolution skills
for a violence prevention program or drug consumption during the last
month rather than lifetime consumption). There are several basic features
that should be considered when assessing the adequacy and availability
of outcome data for an impact evaluation. In particular, the quality of the
evaluation will depend, in part, on the representativeness, accuracy, and
accessibility of the relevant data (NRC, 2004).
A fundamental requirement for outcome data is that they represent
the population addressed by the program. The standard scheme for accomplishing this when conducting an impact evaluation is to select the
research participants with a random sample from the target population,
but other well-defined sampling schemes can also be used in some instances. For example, case-control or response-based sampling designs
can be useful for studying rare events. To investigate factors associated
with homicide, a case-control design might select as cases those persons
who have been murdered, and then select as controls a number of subjects
from the same population with similar characteristics who were not murdered. If random sampling or another representative selection is not feasible given the circumstances of the program to be evaluated, the outcome
data, by definition, will not characterize the outcomes for the actual target
population served by the program. Similar considerations apply when
the outcome data are collected from existing records or data archives.
Many of the data sets used to study criminal justice policy are not probability samples from the particular populations at which the policy may
be aimed (see NRC, 2001). The National Crime Victimization Survey
(NCVS), for example, records information on nonfatal incidents of crime
victims but does not survey offenders. Household-based surveys such as
the NCVS and the General Social Survey (GSS) are limited to the population of persons with stable residences, thereby omitting transients and
other persons at high risk for crime and violence. The GSS is representative of the United States and the nine census regions, but it is too sparse
geographically to support conclusions at the finer levels of geographical
aggregation where the target populations for many criminal justice programs will be found.
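The case-control design described earlier in this section, in which every occurrence of a rare outcome is retained and controls are drawn from the same population, can be sketched with synthetic data. The population size, the 1 percent event rate, and the sampling code below are invented for illustration:

```python
import random

# Synthetic sketch of case-control (outcome-based) sampling for a rare
# event; all numbers here are made up.
random.seed(0)
population = [(person_id, random.random() < 0.01)  # (id, experienced event?)
              for person_id in range(10_000)]

cases = [p for p in population if p[1]]            # keep every case
noncases = [p for p in population if not p[1]]
controls = random.sample(noncases, k=len(cases))   # equal-sized control draw

# The analysis sample retains all occurrences of the rare event, which a
# simple random sample of the same total size usually would not.
sample = cases + controls
print(len(sample) == 2 * len(cases))  # True
```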
The accuracy of the outcome data available is also an important consideration for an impact evaluation. The validity of outcome data is compromised when the measures do not adequately represent the behaviors
or events the program is intended to affect, as when perpetrators understate the frequency of their criminal behavior in self-report surveys. The
reliability of the data suffers when unsystematic errors are reflected in the
outcome measures, as when arrest records are incomplete. The bias and
noise associated with outcome data with poor validity or reliability can
easily be great enough to distort or mask program effects. Thus credible
impact evaluation cannot be conducted with outcome data lacking sufficient accuracy in either of these ways.
If the necessary outcome data are not accessible to the researcher, it
will obviously not be possible to conduct an impact evaluation. Data on
individuals’ criminal offense records that are kept in various local or regional archives, for instance, are usually not accessible to researchers without a court order or analogous legal authorization. If the relevant authorities are unwilling to provide that authorization, those records become
unavailable as a source of outcome data. The programs being evaluated
may themselves have outcome data that they are not willing to provide to
the evaluator, perhaps for ethical reasons (e.g., victimization reported to
counselors) or because they view it as proprietary. In addition, researchers may find that increasingly stringent Institutional Review Board (IRB)
standards preclude them from using certain sources of data that may be
available (Brainard, 2001; Oakes, 2002). Relevant data collected and
archived in existing databases may also be unavailable even when collected with public funding (e.g., Monitoring the Future; NRC, 2001).
Still another form of inaccessible data is encountered when nonresponse rates are likely to be high for an outcome measure, e.g., when a
significant portion of the sampled individuals decline to respond at all or
fail to answer one or more questions. Nonresponse is an endemic problem in self-report surveys and is especially high with disadvantaged,
threatened, deviant, or mobile populations of the sort that are often involved in criminal justice programs. An example from the report on illicit
drug policy (NRC, 2001:95-96) illustrates the problem:
Suppose that 100 individuals are asked whether they used illegal drugs
during the past year. Suppose that 25 do not respond, so the nonresponse
rate is 25 percent. Suppose that 19 of the 75 respondents used illegal drugs
during the past year and that the others did not. Then the reported prevalence of illegal drug use is 19/75 = 25.3 percent. However, true prevalence among the 100 surveyed individuals depends on how many of the
nonrespondents used illegal drugs. If none did, then true prevalence is
19/100 = 19 percent. If all did, then true prevalence is [(19 + 25)/100] = 44
percent. If between 0 and 25 nonrespondents used illegal drugs, then
true prevalence is between 19 and 44 percent. Thus, in this example,
nonresponse causes true prevalence to be uncertain within a range of 25 percentage points.
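The arithmetic in this example generalizes to simple worst-case bounds on prevalence under nonresponse. The `prevalence_bounds` helper below is a sketch using the report's own numbers:

```python
def prevalence_bounds(n_total: int, n_respondents: int, n_positive: int):
    """Worst-case bounds on true prevalence under nonresponse: the lower
    bound assumes no nonrespondent is positive; the upper bound assumes
    every nonrespondent is."""
    n_nonresp = n_total - n_respondents
    return n_positive / n_total, (n_positive + n_nonresp) / n_total

# The report's numbers: 100 surveyed, 75 respond, 19 report drug use.
low, high = prevalence_bounds(100, 75, 19)
print(low, high)  # 0.19 0.44 -- uncertain within 25 percentage points
```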
Resources
The ability to conduct an adequate impact evaluation of a criminal
justice program will clearly depend on the availability of resources. Relevant resources include direct funding as a major component, but also
encompass a range of nonmonetary considerations. The time available for
the evaluation, for instance, is an important resource. Impact evaluations
not only require that specialized research designs be implemented but
that outcomes for relatively large numbers of individuals (or other affected units) be tracked long enough to determine program effects. Similarly, the availability of expertise related to the demanding technical aspects of impact evaluation research, cooperation from the program to be
evaluated, and access to relevant data that has already been collected are
important resources for impact evaluation.
The need for these various resources for an impact evaluation is a
function of the program’s structure and circumstances and the evaluation
methods to be used. For example, evaluations of community-based programs, with the community as the unit of analysis, will require participation by a relatively large number of communities. This situation will
make for a difficult and potentially expensive evaluation project. Evaluating a rehabilitation program for offenders in a correctional institution with
outcome data drawn from administrative records, on the other hand,
might require fewer resources.
No agency or group of agencies that sponsor program evaluation will
have the resources to support impact evaluation for every program of
potential interest to some relevant party. If the objective is to optimize the
practical and policy relevance of the resulting knowledge, programs
should be selected for evaluation on the basis of (a) the significance of the
program, e.g., the scope of practice and policy likely to be affected and (b)
the extent to which the circumstances of the program make it amenable to
sound evaluation research.
The procedures for making this selection should not necessarily be
the same for both these criteria. Judging the practical importance of a
program validated by impact evaluation requires informed opinion from
a range of perspectives. The same is true for identifying new program
concepts that are ripe for evaluation study. Surveys or expert review procedures that obtain input from criminal justice practitioners, policy makers, advocacy groups, researchers, and the like might be used for this
With a set of programs judged significant identified, assessment of
how amenable they are to sound impact evaluation research is a different
matter. The expertise relevant to this judgment resides mainly with evaluation researchers who have extensive field experience conducting impact
evaluations of criminal justice programs. This expertise might be marshaled through a separate expert review procedure, but there are inherent
limits to that approach if the expert informants have insufficient information about the programs at issue. Trustworthy assessments of program
evaluability depend upon rather detailed knowledge of the nature of the
program and its services, the target population, the availability of relevant
data, and a host of other such matters.
More informed judgments about the likelihood of successful impact
evaluation will result if this information is first collected in a relatively
systematic manner from the programs under consideration. The procedure for accomplishing this is called evaluability assessment (introduced in
Chapter 2). The National Institute of Justice has recently begun conducting evaluability assessments as part of its process for selecting programs
for impact evaluation. Their procedure[1] involves two stages: an initial
screening using administrative records and telephone inquiries plus a site
visit to programs that survive the initial screening. The site visit involves
observations of the project as well as interviews with key project staff, the
project director, and (if appropriate) key partners and members of the
target population. Box 3-1 lists some of the factors assessed at each of
these stages.
The extent to which the results of such an assessment are informative
when considering programs for impact evaluation is illustrated by NIJ’s
[1] There are actually two different assessment tools, one for local and another for national programs. This description focuses on the local assessment instrument.
experience with this procedure. In the most recent round of evaluability
assessments, a pool of approximately 200 earmarked programs was reduced to only eight that were ultimately judged to be good candidates for
an impact evaluation that would have a reasonable probability of yielding useful information.
BOX 3-1
Factors Considered in Each Stage of NIJ Evaluability Assessments
Initial Project Screening

What do we already know about projects like these?
What could an evaluation of this project add to what we know?
Which audiences would benefit from this evaluation?
What could they do with the findings?
Is the grantee interested in being evaluated?
What is the background/history of this project?
At what stage of implementation is it?
What are the project’s outcome goals in the view of the project director?
Does the proposal/project director describe key project elements?
Do they describe how the project’s primary activities contribute to its outcome goals?
Can you sketch the logic by which activities should affect goals?
Are there other local projects providing similar services that could be
used for comparison?
Will samples that figure in outcome measurement be large enough
to generate statistically significant findings for modest effect sizes?
Is the grantee planning an evaluation?
What data systems exist that would facilitate evaluation?
What are the key data elements contained in these systems?
Are there data to estimate unit costs of services or activities?
Are there data about possible comparison samples?
In general, how useful are the data systems to an impact evaluation?
Site Visit

Is the project being implemented as advertised?
What is the intervention to be evaluated?
What outcomes could be assessed? By what measures?
Are there valid comparison groups?
Is random assignment possible?
What threats to a sound evaluation are most likely to occur?
Are there hidden strengths in the project?
What are the sizes and characteristics of the target populations?
How is the target population identified (i.e., what are eligibility
criteria)? Who/what gets excluded as a target?
Have the characteristics of the target population changed over time?
How large would target and comparison samples be after one year of operation?
What would the target population receive in a comparison sample?
What are the shortcomings/gaps in delivering the intervention?
What do recipients of the intervention think the project does?
How do they assess the services received?
What kinds of data elements are available from existing data sources?
What specific input, process, and outcome measures would they support?
How complete are data records? Can you get samples?
What routine reports are produced?
Can target populations be followed over time?
Can services delivered be identified?
Can systems help diagnose implementation problems?
Does staff tell consistent stories about the project?
Are their backgrounds appropriate for the project’s activities?
What do partners provide/receive?
How integral to project success are the partners?
What changes is the director willing to make to support the evaluation?
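The screening question about whether samples will be large enough for modest effect sizes can be checked with a standard normal-approximation power calculation for comparing two proportions. The function and the recidivism rates below are illustrative assumptions, not part of the NIJ instrument:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions,
    e.g., recidivism rates in treatment vs. comparison samples."""
    z = NormalDist()
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)

# Hypothetical modest effect: 50% vs. 40% recidivism.
print(round(power_two_proportions(0.50, 0.40, 100), 2))  # well under 0.8
print(round(power_two_proportions(0.50, 0.40, 500), 2))  # comfortably over 0.8
```

With 100 cases per group, a 10-point recidivism difference would usually go undetected; several hundred per group are needed for conventional 80 percent power, which is why the screening stage asks about sample sizes at all.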
How Should an Impact Evaluation Be Designed?
Assuming that a criminal justice program is evaluable and an impact evaluation is feasible, an appropriate research design must
be developed. The basic idea of an impact evaluation is simple.
Program outcomes are measured and compared to the outcomes that
would have resulted in the absence of the program. In practice, however,
it is difficult to design a credible evaluation study in which such a comparison can be made. The fundamental difficulty is that whereas the program being evaluated is operational and its outcomes are observable, at
least in principle, the outcomes in the absence of the program are counterfactual and not observable. This situation requires that the design provide
some basis for constructing a credible estimate of the outcomes for the
counterfactual conditions.
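A small simulation makes the counterfactual logic concrete: under random assignment, the control group's mean outcome stands in for the unobservable no-program outcome of the treated group. All numbers below are synthetic:

```python
import random

# Synthetic simulation: every unit has a latent "no-program" outcome;
# the program shifts outcomes by a true effect of -0.2 (e.g., lower
# reoffending propensity). All values are invented for illustration.
random.seed(42)
TRUE_EFFECT = -0.2

latent = [random.gauss(1.0, 0.5) for _ in range(2000)]
assigned = [random.random() < 0.5 for _ in latent]   # random assignment

treated = [y + TRUE_EFFECT for y, t in zip(latent, assigned) if t]
control = [y for y, t in zip(latent, assigned) if not t]

# Randomization makes the control mean a credible estimate of what the
# treated group's outcomes would have been without the program.
effect_estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(round(effect_estimate, 2))  # close to the true effect of -0.2
```

Without randomization (or some other basis for constructing the counterfactual), the same difference in means would mix the program effect with whatever pre-existing differences separate the two groups.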
Another fundamental characteristic of impact evaluation is that the
design must be tailored to the circumstances of the particular program
being evaluated, the nature of its target population, the outcomes of interest, the data available, and the constraints on collecting new data. As a
result, it is difficult to define a “best” design for impact evaluation a priori.
Rather, the issue is one of determining the best design for a particular
program under the particular conditions presented to the researcher when
the evaluation is undertaken. This feature of impact evaluation has significant implications for how such research should be designed and also
for how the quality of the design should be evaluated.
Establishing credible estimates of what the outcomes would have been
without the program, all else equal, is the most demanding part of impact
evaluation, but also the most critical. When those estimates are convincing, the effects found in the evaluation can be attributed to the program
rather than to any of the many other possible influences on the outcome
variables. In this case, the evaluation is considered to have high internal
validity. For example, a simple comparison of recidivism rates …