
Research in Developmental Disabilities 79 (2018) 10–18


Results reporting in single case experiments and single case meta-analysis

Kimberly J. Vannest a,⁎, Corey Peltier b, April Haas a
a Texas A&M University, United States
b University of Oklahoma, United States

⁎ Corresponding author. E-mail address: [email protected] (K.J. Vannest).
https://doi.org/10.1016/j.ridd.2018.04.029
Received 4 November 2017; Received in revised form 24 April 2018; Accepted 30 April 2018; Available online 28 June 2018

Keywords: Single case; Experimental design; Effect size; Results; Reporting; Single case research; Guidelines

Abstract

Single Case Experimental Design is a discipline grounded in applied behavior analysis where the needs of individual clients and the application of scientific inquiry are fundamental tenets. These two principles remain paramount in the conduct of research using this methodology and in the expansion of the method into evidence-based practice determinations. Although recommendations for quality indicators are widespread, implementation is not. Concurrent to the rise of quality indicators is an increasing interest in analysis methodology. Visual analysis has a history of application and validity; newer forms of analysis less so. While some argue for concordance between the two, it may be the differences that are worth exploring in understanding characteristics of trend and variability in much of the published literature. Design choice and visual analysis decisions are rarely fully articulated. Statistical analyses are likewise inadequately justified or described. Recommendations for the explicit language of reporting, derived from prior meta-analysis and a current review of two leading journals, provide a scaffold consistent with existing guidelines but additive in detail, exemplars, and justification. This is intended to improve reporting of results for individual studies and their potential use in future meta-analytic work.

What this paper adds

This paper addresses common omissions in the current literature and responds to a need for more explicit illustration. Although visual, statistical, and meta-analysis standards are addressed in existing documents, less attention is given specifically to what adequate and explicit reporting looks like in the context of the decision making required for analysis (e.g., phase change comparisons, data and participants included in a design, omnibus analysis, and moderator analysis).
Single case experimental designs (SCED) refer to a specific time-series methodology with unique and identifying characteristics
(Anon, 2008; Kazdin, 1986; Kennedy, 2005). Measures are repeated and may be direct or indirect (e.g., observation or self-rating).
Treatment effects are determined when a functional relation exists between a dependent and independent variable based on an
experimental manipulation. Single cases may be an individual person, behavior, or a group of persons or behaviors (Lobo, Moeyaert,
Cunha, & Babik, 2017). These subjects or participants serve as their own control in the comparison of “phases” such as a non-
treatment or baseline phase and an intervention; phases may also be two or more interventions (Beretvas & Chung, 2008; Kratochwill et al., 2010; Moeyaert, Ugille, Ferron, Beretvas, & Van den Noortgate, 2014; Smith, 2012).
High quality studies contribute to the knowledge base (Faggion & Giannakopoulos, 2013; Hall, Lee, & Zurakowski, 2017; Rotta,
Salgado, Silva, Correr, & Fernandez-Llimos, 2015; Wells, Kolt, Marshall, Hill, & Bialocerkowski, 2013). Standards for engaging in high quality studies using these designs have been available for more than a decade (Horner et al., 2005) and are now a component of
federal funding expectations (IES, WWC, 2017a,b). Many reporting guidelines, checklists, rubrics, and recommendations exist
(Maggin, Briesch, Chafouleas, Ferguson, & Clark, 2014) and more continue to appear and evolve. As illustration, Tate et al. (2008) provided an 11-item rating scale for quality and use of statistics, field tested with 85 reports and an overall reliability of 0.84. This was followed by a "radical revision" (p. 619) published as a Risk of Bias Scale with improved reliability on subscales (0.87–0.95). It contained an increase in the number of items from 11 to 15, a change in the scale from binary (2) to three points, and a broadening from SCED to include n = 1 cases (Tate et al., 2013). Later, Tate et al. (2016) developed a 26-item checklist for reporting guidelines, the Single-Case Reporting Guideline In BEhavioural interventions (SCRIBE), with global expert consensus. This document provides "rationale and examples of adequate reporting" (p. 44). Alternate versions appear in Wolery, Dunlap, and Ledford (2011).

1. Inconsistencies in guideline implementation

Problems continue to exist in the quality of reporting results for individual studies and meta-analysis. Evaluations of quality in the published research indicate just one-third of studies meet or exceed criteria (Moeller, Dattilo, & Rusch, 2014). Meta-analyses of single-case experimental designs are also notably deficient in reporting overall (Jamshidi et al., 2017) and particularly related to publication bias (Denis, Van den Noortgate, & Maes, 2011; Gage, Cook, & Reichow, 2017; Goldman & Meghan, 2016; Gough, Oliver, & Thomas, 2013; Talbott, Maggin, Van Acker, & Kumm, 2017). Other critical omissions relate to fidelity of implementation, treatment integrity, and social validity (Brock & Huber, 2017; Kalef, Reid, & MacDonald, 2013; King, Lemons, & Davidson, 2016; Snodgrass, Chung, Meadan, & Halle, 2018). Some issues may be problems with the quality of the study itself. Examples include missing justification of design selection in relationship to threats to internal validity (Moeller et al., 2014), incomplete or thin participant description (Kalef et al., 2013), insufficient replications of treatment effect (King, Lemons, & Davidson, 2016), a lack of generalization and maintenance data (Neely, Garcia, Bankston, & Green, 2018), and a lack of decision-making rules and problems with reliability in visual analysis (Ninci, Vannest, Willson, & Zhang, 2015).
The assertion to "always report effect sizes" (Wilkinson and the Task Force on Statistical Inference, 1999, p. 599), or that "it is almost always necessary to include some measure of effect in the results section" (APA Publication Manual, 2010, p. 34), is not widely applied in either group or SCED research. Some reports indicate only half of research studies report measures of effect (Meline & Wang, 2004; Peng, Chen, Chiang, & Chiang, 2013). Studies that do may fail to describe the relationship between the effect size (ES) and the conclusions of the study and/or fail to calculate effects for all results (Alhija & Levy, 2009; Fritz, Morris, & Richler, 2012; Odgaard & Fowler, 2010; Zientek, Capraro, & Capraro, 2008). Additional errors in reporting results include unaddressed discrepant p-values and effect sizes (Peng et al., 2013) and large numbers of studies without experimental control (Thompson, 2006).

2. Reasons for continued limitations and omissions

Problematic reporting of SCED results may be related to evolving or conflicting expectations in the standards across time (Alhija & Levy, 2009; Fidler, 2005; Wood, 2017). Problems may also be related to mixed messages and controversy regarding the type of analysis (i.e., Baer, 1977; Harrison, Thompson, & Vannest, 2009; Maggin et al., 2014; Michael, 1974; Parsonson, Baer, Kratochwill, & Levin, 1992; Perone, 1999; Tincani & Travers, 2017). Although the expectations and practices for reporting the results of visual analysis appear widely accepted and clearly articulated (Baer, Wolf, & Risley, 1968; Cooper, Heron, & Heward, 2007; Lane & Gast, 2014; Wolery, Busick, Reichow, & Barton, 2010), it appears the addition of statistical analysis is at least part of the confound.
However, the use of both visual and statistical analysis is advocated by many (Brossart, Parker, Olson, & Mahadevan, 2006; Brossart, Vannest, Davis, & Patience, 2014; Franklin, Allison, & Gorman, 1996; Manolov, 2017) and indicated by What Works Clearinghouse (WWC) design standards v3 after a study meets quality indicators (WWC, 2017a,b). Recommending multiple forms of analysis to understand treatment effects, particularly in applied environments, is not new: "The environment is not under the immediate control of the observer… we do the best we can, calling in all observational, experimental, and statistical aids we can summon" (Watson, 1919, p. 28). The pairing of visual and statistical analyses provides a triangulation of information for understanding treatment effects; certainly there is room for more clarity, illustration, and training.

3. Purpose of the manuscript

This paper addresses some of the prevalent and persistent omissions in results reporting of statistical analysis in SCED. The work also outlines reporting relationships between statistical and visual analysis, and between single case reporting and meta-analysis. We are not addressing the broader quality of a study or meta-analysis, but more specifically addressing the results section and describing why this level of explicitness is needed. The talking points are intended to be applicable across statistical methodological choices. Illustrations are provided using Tau-U as a robust and accessible non-parametric index, with the recognition that multiple analyses are ideal and that any analysis should fit the characteristics of the data.
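To make the illustrative index concrete, the following minimal sketch (hypothetical data, not the authors' software) computes the basic A-versus-B Tau nonoverlap described by Parker et al. (2011): the proportion of improving minus deteriorating pairwise comparisons between baseline and intervention observations. Full Tau-U additionally corrects for undesirable baseline trend, which this sketch omits.

```python
# Minimal sketch of the basic A-vs-B Tau nonoverlap index (Parker et al., 2011).
# Data are hypothetical; full Tau-U would additionally adjust for Phase A trend.

def tau_nonoverlap(baseline, intervention):
    """(Improving pairs - deteriorating pairs) / total pairwise comparisons."""
    pos = neg = 0
    for a in baseline:
        for b in intervention:
            if b > a:        # improvement (assumes higher scores are desirable)
                pos += 1
            elif b < a:      # deterioration
                neg += 1
            # ties contribute to neither count
    return (pos - neg) / (len(baseline) * len(intervention))

phase_a = [2, 3, 2, 4, 3]        # hypothetical baseline observations
phase_b = [5, 6, 4, 7, 6, 8]     # hypothetical intervention observations
print(f"Tau (A vs. B) = {tau_nonoverlap(phase_a, phase_b):.2f}")  # 0.97
```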
The recommendations and explicit illustrations presented here are considered additive, have not yet appeared in the literature, and are intended to improve the science. The paper does three things. First, it presents a conceptual framework for how effect size reporting in SCED is complementary to visual analysis and illustrates explicit language for reporting results. Second, it counters the idea that perfect agreement between visual and statistical analysis validates the use of an effect size, instead presenting correspondence-reporting between visual and statistical analyses as a source of reliability. Third, it illustrates recommendations and provides justifications for improved results reporting based on prior and current reviews of the literature.


4. Results reporting of statistical analysis in relationship to visual analysis

Visual analysis is a precursor to statistical analysis (WWC, 2017a,b) and may provide utility as a scaffold for results reporting. The
recommendations presented here for reporting statistical results in relationship to visual analysis are derived from classic and
contemporary works (Baer et al., 1968; Cooper et al., 2007; Lane & Gast, 2014; Ledford, Lane, & Severini, 2018; Wolery et al., 2010).
Improved visual and statistical results reporting improves our understanding of experimental outcomes and facilitates better subsequent meta-analysis.
A strong reporting of visual analysis references a source of procedural steps or decision making such as those outlined by Lane and Gast (2014) or by Parsonson and Baer (1978) and described in WWC guidelines (e.g., baseline prediction, within-phase patterns given sufficiency and consistency, adjacent phase change contrast, and determination of a functional relation). Ideal results reporting of statistical analysis would include parallel data reporting between statistical and visual analysis across each point. Illustrative examples include a) levels and trend for baseline prediction (Parker, Cryer, & Byrns, 2006); b) variability and trend lines for patterns (Wolery et al., 2010); c) quantitative descriptions of change at onset of intervention and which data were used in the determination (Lane & Gast, 2014); d) confidence intervals and p values for data sufficiency (Busk & Marascuilo, 2015); e) reliability of functional relation determinations between trained raters and explanations of the criteria used in the determination, such as overlap and consistency (Parsonson & Baer, 1992); and f) subsequent effect sizes, confidence intervals, and p-values after a functional relation determination to support and confirm, or identify potential errors in, the statistical or visual analysis (Harrington & Velicer, 2015; Hott et al., 2015; Ledford et al., 2018; Manolov, 2017; Vannest & Ninci, 2016).
Wolery et al. (2010) identified critical areas for consideration in the appropriate use of a statistical analysis. We use these topics as
our scaffold in constructing illustrations for reporting (e.g. include the logic of replication, use all the data, estimate magnitude,
reflect data characteristics, address autocorrelation, and allow moderator analysis). Currently published guidelines do not address
this level of specificity in results reporting (see Table 1; CEC, 2014; WWC, 2017a,b). Eight critical areas are identified; in this paper we challenge a single recommendation: the need for agreement between visual and statistical analysis.
Given the low rates of agreement across visual analyses found in comprehensive meta-analysis (Ninci et al., 2015; Smith, 2012), we do not hold that a statistical analysis needs to be in high agreement with visual analysis to provide valuable information. Our reasons are several. First, either or both types of analysis could be wrong. Second, reliability or agreement between two visual analysts is rarely reported in the literature (Brossart et al., 2006; Maggin, Johnson, Chafouleas, Ruberto, & Berggren, 2012; Ninci et al., 2015). Third, the interventionist and the visual analyst are often the same individual. Fourth, the visual analyst is rarely, if ever, blind to the purpose of the study in deciding if a functional relation exists and is more typically a part of the author team with a conflict of interest. Fifth, post-hoc analysis of author-reported results finds large disagreement in conclusion validity (Matyas & Greenwood, 1990).
Instead, we propose that results reporting include two sources of reliability or consistency. First, inter-rater reliability of the visual analysis (independent, and preferably naïve to the study). This seems justified given broadly documented challenges to visual analysis (Ninci et al., 2015; Smith, 2012). Second, an evaluation of the agreement between a reliable visual analysis result and the subsequent statistical analysis result (Vannest & Ninci, 2016). Agreement ratings could be aggregated across sub-decisions such as within-baseline trend, variability, and overlap, or agreement could be between an omnibus evaluation of certainty in treatment effect and a calculation of effect size, p-value, and confidence interval (Appendix A; WWC, 2017a,b). Correspondence between findings adds support to conclusions (Gast, 2005). Prior studies find positive predictive value at 0.91 (Bobrovitz & Ottenbacher, 1998). Disagreement in findings identifies potential idiosyncrasies of the data or the scale that create a mismatch for one or both analyses (Brossart, Vannest, Davis, & Patience, 2014; Ottenbacher, 1993). The finding of 0.14 agreement between interrupted time-series analysis (ITSA) and visual analysis is illustrative (Harrington & Velicer, 2015), as is the 0.38 agreement across three statistical analyses (Nourbakhsh & Ottenbacher, 1994). This is valuable information for the field in developing understandings of best-fit analysis (Bobrovitz & Ottenbacher, 1998; Manolov & Moeyaert, 2016; Moeyaert, Ugille et al., 2014; Moeyaert, Ferron, Beretvas, & Van den Noortgate, 2014).

Table 1
How to explicitly evaluate for consistency between statistical and visual analysis. Each row pairs a requirement (what a statistical analysis should do) with decision making and reporting prompts.

Include logic of replication: Identify the phase contrasts selected for analysis (A1-B1) and how they represent the design. Identify whether effects are consistent across demonstrations. If an omnibus ES is reported, does it reflect the individual scores?
Use all the data: Identify data used, reduced, or removed from within a phase (e.g., all data in baseline were compared pair-wise to all data in intervention; 6 data points provided a median score and trend line; a single outlier was removed from a baseline of 5). Identify whether data from a reversal are included and how they are reported or interpreted.
Estimate magnitude: Raw score differences are X. The effect size index of X shows change as X. This ES index is interpreted as ____. The distribution of these ES is ____.
Reflect data characteristics: Identify and adjust for undesirable baseline trend. Use confidence intervals to inform variability. Identify or plan for weaknesses in the number of data points that may impact interpretation.
Agree with visual analysis: Two raters (interventionist, independent blind) were provided a figure of appropriate ratio and all labels to identify a functional relation in the design and the size of the effect (small/medium/large; %; significance). The correlation is X or the reliability is X. This is consistent or not. Report disagreements and resolutions. Describe features of the data that may contribute, such as trend, short data streams, missing data, or questionable consistency.
Address autocorrelation: Assess and report on autocorrelation and use an analysis robust enough to address problems.
Moderator analysis: Identify the number of studies, participants, and data contributing to each moderator analysis.

5. A best effect size

This decade, a growing number of effect size indexes have been created for or applied to SCED: Bayesian methods (Rindskopf, 2013); multilevel analysis (Baek et al., 2014); randomization tests (Heyvaert & Onghena, 2014); the d-statistic (Shadish, Hedges, Pustejovsky, Boyajian et al., 2014); regression (Swaminathan, Rogers, Horner, Sugai, & Smolkowski, 2014); Generalized Least Squares (GLS; Swaminathan et al., 2014); Standardized Mean Differences (SMD; Shadish, Hedges, & Pustejovsky, 2014); Hierarchical Linear Modeling (HLM; Gage, Lewis, & Stichter, 2012); Non-overlap of All Pairs (NAP; Parker & Vannest, 2009); the Improvement Rate Difference (IRD; Parker, Vannest, & Brown, 2009); the Percent of Data Exceeding the Phase A Median Trend (PEM-T; Wolery et al., 2010); Tau, or Tau nonoverlap, and Tau-U (Parker, Vannest, Davis, & Sauber, 2011); and Tau-C (Tarlow, 2017). This list is not exhaustive but illustrative. The large and growing number is consistent with the group research literature, where more than 40 effect sizes are available (Vacha-Haase & Thompson, 2004).
No current or historical consensus exists for use of a single or best effect size index in single case experimental designs (Campbell,
2004; Wolery, Busick, Reichow, & Barton, 2008; WWC, 2017a,b). There is no one best ES index and no singular way to describe the
data in a SCED. It is inappropriate to run multiple analyses looking for the strongest effect size result. It is good practice, however, to identify the appropriate effect size analysis based on an understanding of the characteristics of the data within a design (e.g., trend, stability, autocorrelation, amount of data). It is also a strength in reporting to enlist multiple analysis calculations and
compare results. Our science should seek greater understanding of the data in order to inform future studies of the variables of
interest, future design choices, future parameters for data requirements, and improved analysis methodologies.

6. Recommendations for reporting results in individual and meta-analytic studies of SCED

Recommendations, justifications, and concrete illustrations for improved results reporting are the third purpose of the work. The recommendations that follow are derived from the limitations identified in prior comprehensive reviews of the literature on results reporting (i.e., Alhija & Levy, 2009; Fritz et al., 2012; Meline & Wang, 2004; Peng et al., 2013) and reporting single case results (i.e., Crawford, Garthwaite, & Porter, 2010; Hott, Limberg, Ohrt, & Schmit, 2015; Maggin et al., 2014; Odgaard & Fowler, 2010; Shadish & Sullivan, 2011; Smith, 2012; Tate et al., 2016; Zientek et al., 2008), and from a current search in two relevant, high impact factor (IF) journals: Exceptional Children (EC; IF 2.29) and the Journal of Special Education (JSE; IF 1.279).
Reviews and meta-analyses containing single case data (2005–2018), including online pre-publication papers, were examined for the details contained in results reporting (n = 20). Searches and terms, coding procedures, and reliability conform to relevant PRISMA standards. This work is part of a larger ongoing study; specific procedures, extended results, and raw data beyond the scope of the current manuscript are available from the first author. Shortfalls in the examined literature are consistent with findings reported in comprehensive reviews (as cited in the introduction): information is lacking for describing the types of participants and who is included in which analysis, designs are described without rationale or relationship to the research question, and information is missing on which data are included in analysis. Statistical analyses are varied and incompletely described. PND remains the most prevalent effect size reported (n = 11), followed by SMD (5), Tau-U (4), IRD (4), HLM (2), and DHPS, PAND, and PEM (once each). Six meta-analyses included multiple methods or compared methods; none compared statistical outcomes to visual outcomes. Confidence intervals and p-values were provided in four studies (Brock et al., 2017; Cumming & Draper Rodríguez, 2017; Losinski, Wiseman, White, & Balluch, 2016); seven studies provided one but not the other. Quality indicators (QI; e.g., CEC, 2014; Horner et al., 2005) were employed in thirteen studies, but fewer than half included score reliability. Just two studies reported publication bias.

7. Recommendation I. Report on issues related to participants

The need to adequately describe participants included in an SCED study is widely articulated (e.g., CEC guidelines, WWC guidelines, SCRIBE guidelines). However, consistent shortcomings in participant descriptions remain (i.e., ethnicity, diagnostic information), so problems meeting quality indicators remain (e.g., 67% of studies reviewed met the quality indicator (QI) criterion for participants; Common, Lane, Pustejovsky, Johnson, & Johl, 2017). Meta-analyses of meta-analyses (such as are common in the CBT literature where large numbers of studies exist) report a failure to adequately describe participants in 68–86% of reviewed meta-analyses (Moeller et al., 2014).
What is not explicit in existing guidelines is the need to numerically identify the relative amount of data contributed by participants. Typically, this information is conveyed in a design figure through visually representing the data points in phases. The reader can readily access the amount of data per participant by looking at the figure. However, in an omnibus calculation of effect, the data may be obscured. If a multiple baseline design (MBD) across four participants includes a first baseline phase of 5 data, a last participant baseline phase of 20 data, or another participant with missing data, those numbers should be reflected clearly to point out these differing relative contributions.
The data included in analysis could be specified in text or table. Consider two examples of a design demonstrating experimental
control. One is a multiple baseline design example where participant A demonstrates an increase in three verbal behaviors. The
second is an MBD where three participants demonstrate increases in a verbal behavior. A functional relation may be demonstrated in each case; however, the threats to internal validity are not the same and the generalizability is not the same. In the first example one person demonstrated an effect three times. In the second example, three people demonstrated effects just once.
Combining studies for interpretation within a meta-analysis may lose this level of participant information. Reporting the results of
a meta-analysis should include the total number of participants providing data in any omnibus analysis as well as the number of
participants for each moderator analysis. Particularly when moderator analyses are involved, numbers of participants and their data
for each analysis should also be described.
Caution in interpretation is suggested for instances of moderator analysis when the number of studies or participants is small. The 5-20-3 rule (5 studies, 20 participants, 3 geographic areas; U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse, 2017) for determining an evidence-based practice seems reasonable, but is notably arbitrary for this application. This does not mean that moderator analysis should not be conducted in smaller groupings, as these results are important for building theory and for suggesting new studies to empirically test the role of hypothesized moderator variables. When author teams contribute multiple studies to a body of literature, or studies with large numbers of participants, transparency indicates identifying whether a disproportionate number of participants reflects work from a single research team. Explicit descriptions of data contributed by participants provide needed information for subsequent use and interpretation.

7.1. Rationale

Unequal amounts of data across participants can statistically skew an aggregation of effect across the design, so that a large effect for student A could overwhelm lesser effects for students B and C. An important implication for interpretation of this type of data could be lost (i.e., that the effect varied in impact across participants). Illustrating this point in the context of a meta-analysis, envision 5 studies with a total of 20 participants each providing 10 data points (200 total). This same 200 data points could also reflect 10 participants with 6 data each (60) and a single participant in a multiple baseline across behaviors contributing the remaining data (140). Another implication would be obfuscation when those 200 data points derive from a smaller participant group, e.g., 5 studies with 1 participant each and 40 data per participant. Reporting participant numbers included in analysis enables the reader to understand if the statistical representation includes the logic of replication (Wolery et al., 2010).
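A worked numerical sketch of this rationale (the effect sizes and data counts are hypothetical): when the omnibus effect is weighted by the amount of data each case contributes, a single data-rich participant can dominate the aggregate and mask variation across cases.

```python
# Hypothetical illustration: a data-weighted aggregate can be dominated by the
# participant contributing the most data, masking variation in per-case effects.
cases = {
    # participant: (effect size, number of data points contributed)
    "Student A": (0.95, 140),   # e.g., one MBD-across-behaviors case with many data
    "Student B": (0.30, 30),
    "Student C": (0.25, 30),
}

weighted = sum(es * n for es, n in cases.values()) / sum(n for _, n in cases.values())
unweighted = sum(es for es, _ in cases.values()) / len(cases)

print(f"Data-weighted omnibus ES = {weighted:.2f}")   # 0.75, pulled toward Student A
print(f"Unweighted mean ES       = {unweighted:.2f}") # 0.50
```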

8. Recommendation II: describe the design in relationship to the research

Identifying the design used in an individual study or listing and counting the designs employed in a meta-analysis is common.
Information is generally not reported about why the design was selected to answer the research question. Information about limitations of the design is likewise missing from most results reporting. Discussing design choice in more detail provides explicit
information for interpretation or subsequent use in a meta-analysis. Referencing how a design choice is related to a research question
may facilitate better statistical analysis choices.
Design choices have consequences for the analysis of SCED. Selecting an alternating treatment design may be well suited for
comparing relative effects of a treatment for a participant but may not provide enough data for subsequent statistical analysis. Engaging in an MBD where the initial baseline is dramatically shortened to expedite treatment onset for the last participant may be socially valid but risks uninterpretable data in all but the most ideal experimental control situations. Avoiding baseline may be
justified for the client with self-injurious behavior but limits data for analysis. Describing how a design is adequate for addressing
threats to the internal validity of a study guides decision making in analysis. Design logic is purposeful and informs which phases will
be compared and which data are used in a statistical analysis or meta-analytic review.
Decisions about data reduction in statistical analysis may be related to the design. For example, in an MBD across three behaviors,
the full design shows three replications of effect by comparing A to B. Calculating an ES on the design would involve aggregating 3
AvB contrasts into a weighted average. If this MBD had a total of 45 data points (10, 15, and 20 for each leg of the design, respectively), all the data would be used in the calculation. Compare this to another study with a reversal or probe design; in these examples, calculating the effect size would likely involve only the first AvB contrast of the design, thus reducing a long series of data to half or a fraction of the total data.
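To illustrate the MBD aggregation just described (using hypothetical data for the 10-, 15-, and 20-point legs and the basic Tau index sketched earlier; pair-count weighting is one common choice, not the only one), the design-level estimate below is a weighted average of the three AvB contrasts.

```python
# Hypothetical sketch: aggregating three AvB contrasts from an MBD into a single
# design-level estimate, weighting each leg by its number of pairwise comparisons.

def tau(a_phase, b_phase):
    """Basic A-vs-B Tau: (improving - deteriorating pairs) / total pairs."""
    s = sum((b > a) - (b < a) for a in a_phase for b in b_phase)
    return s / (len(a_phase) * len(b_phase))

legs = [
    # (baseline, intervention) per leg: 10, 15, and 20 total data points
    ([1, 2, 1, 2], [4, 5, 4, 6, 5, 5]),
    ([2, 2, 3, 2, 3], [5, 6, 6, 7, 5, 6, 7, 6, 7, 8]),
    ([1, 1, 2, 2, 1, 2, 1], [3, 4, 4, 5, 4, 5, 6, 5, 6, 5, 6, 7, 6]),
]

weighted_sum = total_pairs = 0
for baseline, intervention in legs:
    pairs = len(baseline) * len(intervention)
    weighted_sum += tau(baseline, intervention) * pairs
    total_pairs += pairs

print(f"Design-level Tau (pair-weighted) = {weighted_sum / total_pairs:.2f}")
```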
Results reporting in meta-analysis should include the number and types of designs and the relative data contributed to each
analysis. List and quantify the designs used. Grouping designs that appear just once or infrequently as “other” is expedient but less
transparent than listing the specific design variations. Aggregated data are derived from designs demonstrating a functional relation and meeting quality thresholds; identifying the types of designs and the frequency with which they occur reflects a connection to the field-based work.

8.1. Rationale

The rigor of individual SCED is sometimes questioned for short data series (Kratochwill et al., 2010) or a lack of randomization
(Kratochwill & Levin, 2010). Designs are sometimes mismatched to the research question and this mismatch impacts results or
interpretability. Selecting a reversal or ABAB design indicates a belief that the behavior will function under stimulus control and is
not learned. The use of a graphic organizer for example assumes the improved behavior will revert without the learning strategy,
when a careful review of prior studies may indicate that participants sometimes adopt the strategy cognitively, or adapts the strategy
by using their own form of scaffolding even without the use of the formal tool and thus the behavior will not go back to baseline
levels. Articulating the design selected and providing the justification in relationship to the research question or population could
avoid errors in design and pre-correct editorial critique. This transparent decision making adds rigor to the science and discussion

14
K.J. Vannest et al. Research in Developmental Disabilities 79 (2018) 10–18

points for future replication (i.e. was this design an ideal match? Would a different design be likely to provide more data or better
demonstration of effect?).
Transparency in describing design choices and the impact of that design on results would aid future researchers in designing
better studies. Describing designs found in a meta-analysis allows inference of data reduction, reveals patterns in a scientific area of
inquiry, and adds to the knowledge base. Adequate design description enables the reader to understand if the statistical representation reflects the logic of replication and uses all the data (Wolery et al., 2010).

9. Recommendation III. Provide details of the data by analysis

The amount of data used in analyses will vary widely and will not correspond simply to the number of participants, behaviors, settings, or variables under investigation. Data may number 50 in an analysis from one source or several. The amount of data may be dictated by the analysis (e.g., 50 data points for ARIMA). An MBD procedure would indicate intervention is introduced after stability in baseline or the prior legs of the design (Ledford & Gast, 2018). A probe design would employ rapidly reversing treatment and baseline conditions
or multiple treatments (Ledford & Gast, 2018). Results on the length of data streams collected should include the number of data in
baseline and treatment(s), the range of data per phase across the design, and the mean number of data points in the design per phase.
Decisions to collect an amount of data should be referenced as either an a priori determination or related to a specific criterion within the intervention. For example, "The researcher identified that a minimum of 9 data points would be needed to reduce error and increase certainty," or "baseline data of 5 points were collected to align with the What Works Clearinghouse standards (2017) for evidence based practices," or even "baseline data (n = 1) reflects a decision made for ethical principles related to the needs of the client (i.e., dependent variable of self-injury)." This may also include a comment about baseline data collection ending when a criterion of stability is reached, defining stability as a specific number of data in the same value or value range (e.g., 3 data with the same score or 9 data within a 20% variance of the mean). Any criterion should be empirically described, such as "stability equals a +/− value of 2 frequency counts using the raw data" or "stability is defined as a 20% variance from the mean of the baseline."
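As one way of stating such a criterion empirically, the small sketch below (an assumed operationalization, not a standard) flags a baseline as stable when every data point falls within 20% of the phase mean.

```python
# Hypothetical operationalization of a stability criterion: the baseline is
# "stable" when every data point falls within +/- 20% of the baseline mean.
def is_stable(baseline, tolerance=0.20):
    mean = sum(baseline) / len(baseline)
    return all(abs(x - mean) <= tolerance * mean for x in baseline)

print(is_stable([5, 5, 6, 5, 4]))   # True: all values within 20% of the mean (5.0)
print(is_stable([2, 9, 0, 7, 5]))   # False: highly variable baseline
```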
Reporting results for a meta-analysis requires the same degree of specificity about the amount of data used in an omnibus effect and in moderators. As a fictitious example: overall, eight studies provided 14 SCED with a total of 120 baseline data and 180 intervention data. AB phase contrasts total 30. The shortest baseline was one datum, the longest 20; the mean baseline length was seven. Intervention data ranged from three to 15, and the mean length of intervention data phases was five. Two moderators were of interest: the importance of setting in treatment effect and the relative effectiveness of trained and untrained implementers. Three studies occurred in a classroom, four studies in the home, and one study was in a clinic and not included in the analysis. Studies in a classroom provided 20 baseline data and 20 intervention data from 15 AB phase contrasts. The four home studies produced 90 baseline data and 150 intervention data from 15 AB phase contrasts.
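One way to generate the counts in the fictitious example above is a simple tabulation; the sketch below (the per-study breakdown is an assumed split consistent with the totals given, and the column names are illustrative) sums baseline data, intervention data, and AB contrasts by the setting moderator.

```python
# Hypothetical sketch: tabulating data counts by moderator level for reporting.
# The per-study values are an assumed split consistent with the fictitious totals.
import pandas as pd

records = pd.DataFrame({
    "study":          ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"],
    "setting":        ["classroom"] * 3 + ["home"] * 4 + ["clinic"],
    "ab_contrasts":   [5, 5, 5, 4, 4, 4, 3, 2],
    "baseline_n":     [7, 7, 6, 25, 25, 20, 20, 10],
    "intervention_n": [7, 7, 6, 40, 40, 35, 35, 10],
})

included = records[records["setting"] != "clinic"]   # the clinic study is excluded
summary = included.groupby("setting")[["ab_contrasts", "baseline_n", "intervention_n"]].sum()
print(summary)
```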

9.1. Rationale

It is important to know how much data contributed to conclusions about functional relations, aggregated effects, meta-analytic
interpretations, and moderator analysis. Meta-analyses of studies will involve inconsistent numbers of studies, participants, AB phase contrasts, and amounts of data for each analysis. Reporting data for each level of analysis is consistent with recommendations that statistical analysis use all the data (Wolery et al., 2010). Differentiating when all or part of the data are removed in the analysis itself (e.g., in an overlap technique that reduces data, or in a design where only specific phases are used, such as A1vB1 from a reversal design) is important.

10. Recommendation IV. Address variability and trend in baseline

Undesirable baseline trend weakens or negates the demonstration of treatment effects. Quantitatively describing data in baseline by the range of values, a mean, median, or mode, trend, and stability is more than adequately covered in the literature (see Anon, 1986; Gast & Ledford, 2014; Horner et al., 2005; Kazdin, 1982; Wolery et al., 2010; WWC, 2017a,b). For example, an individual study could describe baseline data that range from zero to nine with a mean of 4.71. The description could also quantify variability with an average distance from the mean or a standard deviation and provide a standardized percentage of variability. For example, if the SD was 3.25, the variability of the baseline would be 31% [(mean − SD)/mean]. A general rule for "stability" is that the data vary by less than 20%, but there are many examples in the literature demonstrating wide variability in baseline due to the applied setting of the study.
Quantitatively describing trend is calculable in many ways and should include reference to the number of data in the analysis, because when data streams demonstrate trend with 5 or fewer data points that trend may be unlikely to continue. In a previous
study 38% (13 of 34) of baselines demonstrating statistically significant trend in the first five data did not maintain a significant trend
in the next five (Parker et al., 2011). Clarify the directionality of the trend as desirable or undesirable rather than increasing or
decreasing without a qualifier. Undesirable trend in a baseline condition reflects data in baseline going in the same direction as the
intended treatment effect.
Adjusting for trend in the analysis can be accomplished by hand (ECL, White & Haring, 1980, 1989; GROT, Parker & Vannest, 2012; Tau-U or Tau, Parker et al., 2011) or with more advanced calculations such as OLS regression (Allison & Gorman, 1993), SMD (Shadish, Hedges, & Pustejovsky, 2014; Shadish, Hedges, Pustejovsky, Boyajian et al., 2014), and ARIMA (Harrop & Velicer, 1985). Other recommended methods include Slope and Level Change (Solanas, Manolov, & Onghena, 2010). Procedures addressing trend, such as randomization or maximum likelihood estimation, are reviewed in Manolov (2017) with a comprehensive discussion and access to a web-based program for visual and statistical agreement related to trend.
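To illustrate, a minimal sketch (hypothetical baseline data) that computes the quantitative baseline description and the (mean − SD)/mean variability percentage used above, plus a Theil-Sen slope as one robust, accessible way to quantify baseline trend before deciding whether an adjustment is needed.

```python
# Hypothetical sketch: quantitative baseline description plus a robust trend estimate.
import numpy as np
from scipy.stats import theilslopes

baseline = np.array([0, 3, 5, 9, 4, 6, 6])           # hypothetical baseline data

mean, sd = baseline.mean(), baseline.std(ddof=1)
variability_pct = (mean - sd) / mean * 100            # the (mean - SD)/mean index above
slope, intercept, lo, hi = theilslopes(baseline, np.arange(len(baseline)))

print(f"Range {baseline.min()}-{baseline.max()}, mean {mean:.2f}, SD {sd:.2f}")
print(f"Variability = {variability_pct:.0f}%")
print(f"Theil-Sen slope = {slope:.2f} per session (95% CI {lo:.2f} to {hi:.2f})")
```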

10.1. Rationale

Comparing trend statistically and visually is notably important (Parsonson & Baer, 1992) but remains a problem in results
reporting. Plotting trend in baseline affects the accuracy of visual analysis; errors in visual analysis occur when trend is present but
not presented graphically or with trend lines to aid in the visual analysis (Ninci et al., 2015). Manolov (2017) and Tate et al. (2016)
suggest transparency and explicitness in reporting, recommending that “researchers clearly state exactly how a trend line is fitted and
how the trend is controlled for, if such a control is performed" (Manolov, 2017, p. 17).

11. Conclusions

Conducting SCED studies and SCED meta-analysis involves several design and analytic choices. Improvements in our reporting practices are expanding, largely due to published standards and guidelines for quality (CEC, 2014; Horner et al., 2005; Tate et al., 2016; WWC, 2017a,b), but consistent shortcomings include the level of specificity with which we describe the data going into the analysis. Several meta-analyses of meta-analyses and a current review of two leading journals find these deficits continue. Problems are specific to a lack of information about participants, designs, data streams, trend, and variability, all of which contribute to an obfuscation of effects. As more complex and sophisticated analyses take center stage, a greater level of specificity is needed to ensure an accurate portrayal of data and related interpretation when describing results.
Overall, the consistency, transparency, and quality of reporting for individual studies, systematic reviews, and meta-analyses varies widely (Delaney et al., 2005; Jamshidi et al., 2017; Maggin, Briesch, & Chafouleas, 2013; Tate et al., 2008, 2013). This problem is not specific to education and related fields but is prevalent also in epidemiology and medicine (e.g., Delaney et al., 2005; Moher, Liberati, Tetzlaff, & Altman, 2010; Moher, Tetzlaff, Tricco, Sampson, & Altman, 2007). Future studies may begin to focus more on how to improve the use of the reporting guidelines by authors and editors or determine if there is a need for more widespread training and accessibility. This is critical given that continued results reporting problems confound the accuracy of meta-analytic techniques and the determination of evidence-based practices (Jamshidi et al., 2017).

References

Alhija, F. N. A., & Levy, A. (2009). Effect size reporting practices in published articles. Educational and Psychological Measurement, 69, 245–265. http://dx.doi.org/10.
1177/0013164408315266.
Allison, D. B., & Gorman, B. S. (1993). Calculating effect sizes for meta-analysis: The case of the single case. Behaviour Research and Therapy, 31, 621–631. http://dx.doi.org/10.1016/0005-7967(93)90115-B.
American Psychological Association (2010). Publication manual of the American psychological association (6th ed.). Washington, DC: American Psychological
Association.
Baek, E. K., Moeyaert, M., Petit-Bois, M., Beretvas, S. N., Van den Noortgate, W., & Ferron, J. M. (2014). The use of multilevel analysis for integrating single-case
experimental design results within a study and across studies. Neuropsychological Rehabilitation, 24, 590–606. http://dx.doi.org/10.1080/09602011.2013.835740.
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97. http://dx.doi.
org/10.1901/jaba.1968.1-91.
Baer, D. M. (1977). Perhaps it would be better not to know everything. Journal of Applied Behavior Analysis, 10, 167–172. http://dx.doi.org/10.1901/jaba.1977.10-
167.
Beretvas, S. N., & Chung, H. (2008). A review of meta-analyses of single-subject experimental designs: Methodological issues and practice. Evidence-Based
Communication Assessment and Intervention, 2, 129–141. http://dx.doi.org/10.1080/17489530802446302.
Bobrovitz, C. D., & Ottenbacher, K. J. (1998). Comparison of visual inspection and statistical analysis of single-subject data in rehabilitation research. American Journal of Physical Medicine & Rehabilitation, 77(2), 94–102.
Brock, M. E., & Huber, H. B. (2017). Are peer support arrangements an evidence-based practice? A systematic review. The Journal of Special Education, 51, 150–163.
http://dx.doi.org/10.1177/0022466917708184.
Brock, M. E., Cannella-Malone, H. I., Seaman, R. L., Andzik, N. R., Schaefer, J. M., Page, E. J., et al. (2017). Findings across practitioner training studies in special
education: A comprehensive review and meta-analysis. Exceptional Children, 84, 7–26. http://dx.doi.org/10.1177/0014402917698008.
Brossart, D. F., Parker, R. I., Olson, E. A., & Mahadevan, L. (2006). The relationship between visual analysis and five statistical analyses in a simple AB single-case
research design. Behavior Modification, 30, 531–563. http://dx.doi.org/10.1177/0145445503261167.
Brossart, D. F., Vannest, K. J., Davis, J. L., & Patience, M. A. (2014). Incorporating nonoverlap indices with visual analysis for quantifying intervention effectiveness in
single-case experimental designs. Neuropsychological Rehabilitation, 24, 464–491. http://dx.doi.org/10.1080/09602011.2013.868361.
Busk, P. L., & Marascuilo, L. A. (2015). Statistical analysis in single-case research: Issues, procedures, and recommendations, with applications to multiple behaviors. In
T. R. Kratochwill, & J. R. Levin (Eds.). Single-case research design and analysis: New directions for psychology and education (pp. 159–186). New York NY: Routledge.
Campbell, J. M. (2004). Statistical comparison of four effect sizes for single-subject designs. Behavior Modification, 28, 234–246. http://dx.doi.org/10.1177/
0145445503259264.
Common, E. A., Lane, K. L., Pustejovsky, J. E., Johnson, A. H., & Johl, L. E. (2017). Functional assessment-based interventions for students with or at-risk of high-
incidence disabilities: Field testing single-case synthesis methods. Remedial and Special Education, 38(6), 331–352.
Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Council for Exceptional Children (2014). Council for exceptional children standards for evidence-based practices in special education. Arlington, VA: Council for Exceptional
Children.
Crawford, J. R., Garthwaite, P. H., & Porter, S. (2010). Point and interval estimates of effect sizes for the case-controls design in neuropsychology: Rationale, methods,
implementations, and proposed reporting standards. Cognitive Neuropsychology, 27, 245–260. http://dx.doi.org/10.1080/02643294.2010.513967.
Cumming, T. M., & Draper Rodríguez, C. (2017). A meta-analysis of mobile technology supporting individuals with disabilities. The Journal of Special Education, 51,
164–176. http://dx.doi.org/10.1177/0022466917713983.
Delaney, A., Bagshaw, S. M., Ferland, A., Manns, B., Laupland, K. B., & Doig, C. J. (2005). A systematic evaluation of the quality of meta-analyses in the critical care
literature. Critical Care, 9, R575–R582. http://dx.doi.org/10.1186/cc3803.
Denis, J., Van den Noortgate, W., & Maes, B. (2011). Self-injurious behavior in people with profound intellectual disabilities: A meta-analysis of single-case studies.
Research in Developmental Disabilities, 32, 911–923. http://dx.doi.org/10.1016/j.ridd.2011.01.014.
Faggion, C. M., & Giannakopoulos, N. N. (2013). Critical appraisal of systematic reviews on the effect of a history of periodontitis on dental implant loss. Journal of
Clinical Periodontology, 40, 542–552. http://dx.doi.org/10.1111/jcpe.12096.


Fidler, F. (2005). From statistical significance to effect estimation: Statistical reform in psychology, medicine, and ecology. University of Melbourne [Unpublished doctoral
dissertation].
Franklin, R. D., Allison, D. B., & Gorman, B. S. (Eds.). (1996). Design and analysis of single-case research. Mahwah, NJ: Lawrence Erlbaum Associates.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141,
2–18. http://dx.doi.org/10.1037/a0024338.
Gage, N. A., Lewis, T. J., & Stichter, J. P. (2012). Functional behavioral assessment-based interventions for students with or at risk for emotional and/or behavioral
disorders in school: A hierarchical linear modeling meta-analysis. Behavioral Disorders, 37, 55–77. http://dx.doi.org/10.1177/019874291203700201.
Gage, N. A., Cook, B. G., & Reichow, B. (2017). Publication bias in special education meta-analyses. Exceptional Children, 83, 428–445. http://dx.doi.org/10.1177/
0014402917691016.
Gast, D. L. (2005). Visual analysis of graphic data. In G. Sugai, & R. Horner (Vol. Eds.), Encyclopedia of behavior modification and cognitive behavior therapy: Educational
applications: Vol. 3, (pp. 1595–1599). Thousand Oaks, CA: Sage.
Gast, D. L., & Ledford, J. R. (2014). Single case research methodology: Applications in special education and behavioral sciences (2nd E.d.)). New York, NY: Routledge.
Gough, D. A., Oliver, S., & Thomas, J. (2013). Learning from research: Systematic reviews for informing policy decisions: A quick guide. London: Nesta.
Hall, A. M., Lee, S., & Zurakowski, D. (2017). Quality assessment of meta-analyses published in leading anesthesiology journals from 2005 to 2014. Anesthesia &
Analgesia, 124, 2063–2067. http://dx.doi.org/10.1213/ANE.0000000000002074.
Harrington, M., & Velicer, W. F. (2015). Comparing visual and statistical analysis in single-case studies using published studies. Multivariate Behavioral Research, 50(2), 162–183.
Harrison, J., Thompson, B., & Vannest, K. J. (2009). Interpreting the evidence for effective interventions to increase the academic performance of students with ADHD:
Relevance of the statistical significance controversy. Review of Educational Research, 79, 740–775. http://dx.doi.org/10.3102/0034654309331516.
Harrop, J. W., & Velicer, W. F. (1985). A comparison of alternative approaches to the analysis of interrupted time-series. Multivariate Behavioral Research, 20(1), 27–44.
http://dx.doi.org/10.1207/s15327906mbr2001_2.
Heyvaert, M., & Onghena, P. (2014). Analysis of single-case data: Randomisation tests for measures of effect size. Neuropsychological Rehabilitation, 24, 507–527.
http://dx.doi.org/10.1080/09602011.2013.818564.
Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special
education. Exceptional Children, 71, 165–179. http://dx.doi.org/10.1177/001440290507100203.
Hott, B. L., Limberg, D., Ohrt, J. H., & Schmit, M. K. (2015). Reporting results of single-case studies. Journal of Counseling & Development, 93, 412–417. http://dx.doi.
org/10.1002/jcad.12039.
Jamshidi, L., Heyvaert, M., Declercq, L., Fernández-Castilla, B., Ferron, J. M., Moeyaert, M., et al. (2017). Methodological quality of meta-analyses of single-case
experimental studies. Research in Developmental Disabilities. http://dx.doi.org/10.1016/j.ridd.2017.12.016.
Kalef, L., Reid, G., & MacDonald, C. (2013). Evidence-based practice: A quality indicator analysis of peer-tutoring in adapted physical education. Research in
Developmental Disabilities, 34, 2514–2522. http://dx.doi.org/10.1016/j.ridd.2013.05.004.
Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.
Kazdin, A. E. (1986). Comparative outcome studies of psychotherapy: Methodological issues and strategies. Journal of Consulting and Clinical Psychology, 54, 95–105.
http://dx.doi.org/10.1037/0022-006X.54.1.95.
Kennedy, C. H. (2005). Single case designs for educational research. Boston, MA: Allyn & Bacon.
King, S. A., Lemons, C. J., & Davidson, K. A. (2016). Math interventions for students with autism spectrum disorder: A best-evidence synthesis. Exceptional Children, 82,
443–462. http://dx.doi.org/10.1177/0014402915625066.
Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods,
15(2), 124–144. http://dx.doi.org/10.1037/a0017736.
Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., et al. (2010). What Works Clearinghouse Single-Case Designs Technical Documentation.
Lane, J. D., & Gast, D. L. (2014). Visual analysis in single case experimental design studies: Brief review and guidelines. Neuropsychological Rehabilitation, 24, 445–463.
Ledford, J. R., & Gast, D. L. (Eds.). (2018). Single case research methodology: Applications in special education. New York: Routledge.
Ledford, J. R., Lane, J. D., & Severini, K. E. (2018). Systematic use of visual analysis for assessing outcomes in single case design studies. Brain Impairment, 19, 4–17.
http://dx.doi.org/10.1017/BrImp.2017.16.
Lobo, M. A., Moeyaert, M., Cunha, A. B., & Babik, I. (2017). Single-case design, analysis, and quality assessment for intervention research. Journal of Neurologic Physical
Therapy, 41, 187–197. http://dx.doi.org/10.1097/NPT.0000000000000187.
Losinski, M., Wiseman, N., White, S. A., & Balluch, F. (2016). A meta-analysis of video-modeling based interventions for reduction of challenging behaviors for students
with EBD. The Journal of Special Education, 49, 243–252. http://dx.doi.org/10.1177/0022466915602493.
Maggin, D. M., Johnson, A. H., Chafouleas, S. M., Ruberto, L. M., & Berggren, M. (2012). A systematic evidence review of school-based group contingency inter-
ventions for students with challenging behavior. Journal of School Psychology, 50, 625–654. http://dx.doi.org/10.1016/j.jsp.2012.06.001.
Maggin, D. M., Briesch, A. M., & Chafouleas, S. M. (2013). An application of the What Works Clearinghouse standards for evaluating single-subject research: Synthesis
of the self-management literature base. Remedial and Special Education, 34, 44–58. http://dx.doi.org/10.1177/0741932511435176.
Maggin, D. M., Briesch, A. M., Chafouleas, S. M., Ferguson, T. D., & Clark, C. (2014). A comparison of rubrics for identifying empirically supported practices with
single-case research. Journal of Behavioral Education, 23, 287–311. http://dx.doi.org/10.1007/s10864-013-9187-z.
Manolov, R., & Moeyaert, M. (2016). How can single-case data be analyzed? Software resources, tutorial, and reflections on analysis. Behavior Modification, 41(2),
179–228.
Manolov, R. (2017). Reporting single-case design studies: Advice in relation to the designs’ methodological and analytical peculiarities. Anuario De Psicología, 47,
45–55. http://dx.doi.org/10.1016/j.anpsic.2017.05.004.
Matyas, T. A., & Greenwood, K. M. (1990). Visual analysis of single-case time series: Effects of variability, serial dependence and magnitude of intervention effects.
Journal of Applied Behavior Analysis, 23, 341–351.
Meline, T., & Wang, B. (2004). Effect-size reporting practices in AJSLP and other ASHA journals, 1999–2003. American Journal of Speech-Language Pathology, 13,
202–207. http://dx.doi.org/10.1044/1058-0360.
Michael, J. L. (1974). Statistical inference for individual organism research: Mixed blessing or curse? Journal of Applied Behavior Analysis, 7, 647–653. http://dx.doi.
org/10.1901/jaba.1974.7-647.
Moeller, J. D., Dattilo, J., & Rusch, F. (2014). Applying quality indicators to single-case research designs used in special education: A systematic review. Psychology in
the Schools, 52, 139–153. http://dx.doi.org/10.1002/pits.21801.
Moeyaert, M., Ferron, J. M., Beretvas, S. N., & Van den Noortgate, W. (2014). From a single-level analysis to a multilevel analysis of single-case experimental designs.
Journal of School Psychology, 52, 191–211. http://dx.doi.org/10.1016/j.jsp.2013.11.003.
Moeyaert, M., Ugille, M., Ferron, J. M., Beretvas, S. N., & den Noortgate, W. V. (2014). Three-level analysis of single-case experimental data: Empirical validation. The
Journal of Experimental Education, 82, 1–21. http://dx.doi.org/10.1080/00220973.2012.745470.
Moher, D., Tetzlaff, J., Tricco, A. C., Sampson, M., & Altman, D. G. (2007). Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine, 4, e78.
http://dx.doi.org/10.1371/journal.pmed.0040078.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & PRISMA Group (2010). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA
statement. International Journal of Surgery, 8, 336–341. http://dx.doi.org/10.1016/j.ijsu.2010.02.007.
Neely, L., Garcia, E., Bankston, B., & Green, A. (2018). Generalization and maintenance of functional communication training for individuals with developmental
disabilities: A systematic and quality review. Research in Developmental Disabilities. http://dx.doi.org/10.1016/j.ridd.2018.02.002.
Ninci, J., Vannest, K. J., Wilson, V., & Zhang, N. (2015). Interrater agreement between visual analysts of single-case data: A meta-analysis. Behavior Modification, 39,
510–541. http://dx.doi.org/10.1177/0145445515581327.
Nourbakhsh, M. R., & Ottenbacher, K. J. (1994). The statistical analysis of single-subject data: A comparative examination. Physical Therapy, 74, 768–776.
Odgaard, E. C., & Fowler, R. L. (2010). Confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology.
Journal of Consulting and Clinical Psychology, 78, 287–297. http://dx.doi.org/10.1037/a0019294.
Ottenbacher, K. J. (1993). Interrater agreement of visual analysis in single-subject decisions: Quantitative review and analysis. American Journal of Mental Retardation, 98, 135–142.
Parker, R. I., & Vannest, K. (2009). An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy, 40, 357–367. http://dx.doi.org/10.
1016/j.beth.2008.10.006.
Parker, R. I., & Vannest, K. J. (2012). Bottom up analysis of single-case research designs. Journal of Behavioral Education, 17, 254–265. http://dx.doi.org/10.1007/
s10864-012-9153-1.
Parker, R. I., Cryer, J., & Byrns, G. (2006). Controlling baseline trend in single-case research. School Psychology Quarterly, 21(4), 418–444.
Parker, R. I., Vannest, K. J., & Brown, L. (2009). The improvement rate difference for single-case research. Exceptional Children, 75, 135–150. http://dx.doi.org/10.
1177/0145445511399147.
Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-U. Behavior Therapy, 42, 284–299.
http://dx.doi.org/10.1016/j.beth.2010.08.006.
Parsonson, B. S., & Baer, D. M. (1978). The analysis and presentation of graphic data. Single subject research101–165.
Parsonson, B., & Baer, D. (1992). Visual analysis of data, and current research into the stimuli controlling it. In T. Kratochwill, & J. Levin (Eds.). Single-case research
design and analysis: New directions for psychology and education (pp. 15–40). Hillsdale, NJ: Erlbaum.
Parsonson, B. S., Baer, D. M., Kratochwill, T. R., & Levin, J. R. (1992). The visual analysis of data, and current research into the stimuli controlling it. Single-case research design and analysis: New directions for psychology and education (pp. 15–40).
Peng, C. Y. J., Chen, L. T., Chiang, H. M., & Chiang, Y. C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25,
157–209. http://dx.doi.org/10.1007/s10648-013-9218-2.
Perone, M. (1999). Statistical inference in behavior analysis: Experimental control is better. The Behavior Analyst, 22, 109–116.
Rindskopf, D. (2013). Fully Bayesian estimation of data from single case designs. Society for Research on Educational Effectiveness.
Rotta, I., Salgado, T. M., Silva, M. L., Correr, C. J., & Fernandez-Llimos, F. (2015). Effectiveness of clinical pharmacy services: An overview of systematic reviews
(2000–2010). International Journal of Clinical Pharmacy, 37, 687–697. http://dx.doi.org/10.1007/s11096-015-0137-9.
Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods, 43, 971–980.
http://dx.doi.org/10.3758/s13428-011-0111-y.
Shadish, W. R., Hedges, L. V., & Pustejovsky, J. E. (2014). Analysis and meta-analysis of single-case designs with a standardized mean difference statistic: A primer and
applications. Journal of School Psychology, 52, 123–147.
Shadish, W. R., Hedges, L. V., Pustejovsky, J. E., Boyajian, J. G., Sullivan, K. J., Andrade, A., et al. (2014). A d-statistic for single-case designs that is equivalent to the
usual between-groups d-statistic. Neuropsychological Rehabilitation, 24, 528–553. http://dx.doi.org/10.1080/09602011.2013.819021.
Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods, 17, 510–550. http://dx.
doi.org/10.1037/a0029312.
Snodgrass, M. R., Chung, M. Y., Meadan, H., & Halle, J. W. (2018). Social validity in single-case research: A systematic literature review of prevalence and application.
Research in Developmental Disabilities, 74, 160–173. http://dx.doi.org/10.1016/j.ridd.2018.01.007.
Solanas, A., Manolov, R., & Onghena, P. (2010). Estimating slope and level change in N = 1 designs. Behavior Modification, 34, 195–218. http://dx.doi.org/10.1177/
0145445510363306.
Swaminathan, H., Rogers, H. J., Horner, R. H., Sugai, G., & Smolkowski, K. (2014). Regression models and effect size measures for single case designs.
Neuropsychological Rehabilitation, 24, 554–571. http://dx.doi.org/10.1080/09602011.2014.887586.
Talbott, E., Maggin, D. M., Van Acker, E. Y., & Kumm, S. (2017). Quality indicators for reviews of research in special education. Exceptionality, 1–21. http://dx.doi.org/
10.1080/09362835.2017.1283625.
Tarlow, K. R. (2017). An improved rank correlation effect size statistic for single-case designs: Baseline corrected Tau. Behavior Modification, 41, 427–467. http://dx.
doi.org/10.1177/0145445516676750.
Tate, R., McDonald, S., Perdices, M., Togher, L., Schultz, R., & Savage, S. (2008). Rating the methodological quality of single subject designs and n-of-1 trials:
Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychological Rehabilitation, 18, 385–401. http://dx.doi.org/10.1080/09602010802009201.
Tate, R. L., Perdices, M., Rosenkoetter, U., Wakim, D., Godbee, K., Togher, L., et al. (2013). Revision of a method quality rating scale for single-case experimental
designs and n-of-1 trials: The 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychological Rehabilitation, 23, 619–638. http://dx.doi.org/10.1080/
09602011.2013.824383.
Tate, R. L., Perdices, M., Rosenkoetter, U., McDonald, S., Togher, L., Shadish, W., et al. (2016). The single-case reporting guideline In behavioural interventions
(SCRIBE) 2016: Explanation and elaboration. Archives of Scientific Psychology, 4, 10–31.
Thompson, C. K. (2006). Single subject controlled experiments in aphasia: The science and the state of the science. Journal of Communication Disorders, 39, 266–291.
http://dx.doi.org/10.1016/j.jcomdis.2006.02.003.
Tincani, M., & Travers, J. (2017). Publishing single-case research design studies that do not demonstrate experimental control. Remedial and Special Education, 39,
118–128. http://dx.doi.org/10.1177/0741932517697447.
U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse. (2017, October). What works clearinghouse: Procedures and standards
handbook (Version 4.0). Retrieved from https://ies.ed.gov/ncee/wwc/.
Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473–481. http://dx.doi.org/10.1037/
0022-0167.51.4.473.
Vannest, K. J., & Ninci, J. (2016). Evaluating intervention effects in single case research. Journal of Counseling and Development, 93, 403–411. http://dx.doi.org/10.
1002/jcad.12038.
Watson, J. B. (1919). Psychology: From the standpoint of a behaviorist. Philadelphia, PA: J. B. Lippincott Company.
Wells, C., Kolt, G. S., Marshall, P., Hill, B., & Bialocerkowski, A. (2013). Effectiveness of Pilates exercise in treating people with chronic low back pain: A systematic
review of systematic reviews. BMC Medical Research Methodology, 13, 7. http://dx.doi.org/10.1186/1471-2288-13-7.
What Works Clearinghouse. (2017a). Procedures handbook (Version 4.0). Retrieved from https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_procedures_
handbook_v4.pdf.
What Works Clearinghouse. (2017b). Standards handbook (Version 4.0). Retrieved from https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_standards_
handbook_v4.pdf.
White, O., & Haring, N. (1980). Exceptional teaching (2nd ed.). Columbus, OH: Charles E. Merrill.
Wilkinson, L., & the Task Force on Statistical Inference, American Psychological Association, Science Directorate (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604. http://dx.doi.org/10.1037/0003-066X.54.8.594.
Wolery, M., Busick, M., Reichow, B., & Barton, E. (2008). Quantitative synthesis of single subject research. Conference on research innovations in early intervention.
Wolery, M., Busick, M., Reichow, B., & Barton, E. E. (2010). Comparison of overlap methods for quantitatively synthesizing single-subject data. The Journal of Special
Education, 44, 18–28.
Wolery, M., Dunlap, G., & Ledford, J. R. (2011). Single-case experimental methods: Suggestions for reporting. Journal of Early Intervention, 33, 103–109. http://dx.doi.
org/10.1177/1053815111418235.
Wood, T. W. (2017). Does the What Works Clearinghouse really work? Investigations into issues of policy, practice, and transparency (A National Institute for Direct Instruction White Paper). Office of Research and Evaluation. Retrieved from https://www.nifdi.org/docman/research/white-papers/1431-does-the-what-works-clearinghouse-really-work-investigations-into-issues-of-policy-practice-and-transparency/file.
Zientek, L. R., Capraro, M. M., & Capraro, R. M. (2008). Reporting practices in quantitative teacher education research: One look at the evidence cited in the AERA
panel report. Educational Researcher, 37, 208–216. http://dx.doi.org/10.3102/0013189X08319762.