Halperin (2022) Accuracy in Predicting Repetitions To Task Failure in Resistance Exercise - A Scoping Review and Exploratory Meta Analysis
https://doi.org/10.1007/s40279-021-01559-x
SYSTEMATIC REVIEW
Abstract
Background Prescribing repetitions relative to task failure is an emerging approach to resistance training. Under this
approach, participants terminate the set based on their prediction of the remaining repetitions left to task failure. While this
approach holds promise, an important step in its development is to determine how accurate participants are in their predictions.
That is, what is the difference between the predicted and actual number of repetitions remaining to task failure, which
ideally should be as small as possible.
Objective The aim of this study was to examine the accuracy in predicting repetitions to task failure in resistance exercises.
Design Scoping review and exploratory meta-analysis.
Search and Inclusion A systematic literature search was conducted in January 2021 using the PubMed, SPORTDiscus, and
Google Scholar databases. Inclusion criteria included studies with healthy participants who predicted the number of repetitions
they can complete to task failure in various resistance exercises, before or during an ongoing set, which was performed
to task failure. Sixteen publications were eligible for inclusion, of which 13 publications covering 12 studies, with a total of
414 participants, were included in our meta-analysis.
Results The main multilevel meta-analysis model including all effect sizes (262 across 12 clusters) revealed that participants
tended to underpredict the number of repetitions to task failure by 0.95 repetitions (95% confidence interval [CI] 0.17–1.73),
but with considerable heterogeneity (Q(261) = 3060, p < 0.0001, I² = 97.9%). Meta-regressions showed that prediction accuracy
slightly improved when the predictions were made closer to set failure (β = − 0.025, 95% CI − 0.05 to 0.0014) and when the
number of repetitions performed to task failure was lower (≤ 12 repetitions: β = 0.06, 95% CI 0.04–0.09; > 12 repetitions:
β = 0.47, 95% CI 0.44–0.49). Set number trivially influenced prediction accuracy with slightly increased accuracy in later
sets (β = − 0.07 repetitions, 95% CI − 0.14 to − 0.005). In contrast, participants’ training status did not seem to influence
prediction accuracy (β = − 0.006 repetitions, 95% CI − 0.02 to 0.007) and neither did the implementation of upper or lower
body exercises (upper body – lower body = − 0.58 repetitions; 95% CI − 2.32 to 1.16). Furthermore, there was minimal
between-participant variation in predictive accuracy (standard deviation 1.45 repetitions, 95% CI 0.99–2.12).
Conclusions Participants were imperfect in their ability to predict proximity to task failure independent of their training
background. It remains to be determined whether the observed degree of inaccuracy should be considered acceptable. Despite
this, prediction accuracy can be improved if predictions are provided closer to task failure, when using heavier loads, or in later
sets. To reduce the heterogeneity between studies, future studies should include a clear and detailed account of how task
failure was explained to participants and how it was confirmed.
* Israel Halperin
[email protected]
1 School of Public Health, Sackler Faculty of Medicine, Tel-Aviv University, Tel‑Aviv, Israel
2 Sylvan Adams Sports Institute, Tel Aviv University, Tel‑Aviv, Israel
3 Faculty of Sport, Health, and Social Sciences, Solent University, Southampton, UK
1 Introduction
Prescribing the number of repetitions to complete per set of exercise is a key variable in resistance-training programs. Traditionally, the number of repetitions is prescribed in a predetermined and fixed manner (e.g., three sets of 10 repetitions), using a certain percentage of one repetition maximum (1RM) [1–4]. While the traditional prescription approach
378 I. Halperin et al.
was to investigate the prediction accuracy estimates across studies. Additionally, we examined if the following variables influence prediction accuracy: training status, timing of prediction, repetition range (indicative of relative load), set number, and upper or lower body exercise.
2 Methods
2.1 Search Strategy
This systematic search and review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We based our search criteria on our familiarity with common RIR terminology alongside the use of search filters containing Medical Subject Headings (MeSH). Two reviewers (IHN and TM) performed electronic searches using the Google Scholar, PubMed/MEDLINE and SPORTDiscus databases, harvesting any data record until 5 January 2021. The search included the following terms: (‘resistance-train*’ OR ‘resistance-exercise*’ OR ‘strength-train*’ OR ‘strength-exercise*’ OR ‘weight-train*’) AND (‘estim*’ OR ‘evaluate*’ OR ‘predict*’ OR ‘assess*’ AND ‘proximity-to-failure’ OR ‘rep*-in-reserve’ OR ‘rir’ OR ‘rep*-to-failure’ OR ‘failure’ OR ‘musc*-exhaus*’ OR ‘musc*-fatigue’). We note that Google Scholar has a 256-character search limit, which forced us to reduce the number of included terms.
Studies were included if they met the following criteria: (1) the study was published in English and was either published in a peer-reviewed journal or as an MSc or PhD thesis; (2) participants had no known medical conditions or injuries; (3) the implemented modality was resistance exercise; (4) participants had to predict proximity to task failure before or during a set; and (5) participants had to reach task failure in all sets that they provided a prediction for. Two reviewers (IHN and TM) assessed relevant records, which were downloaded into Endnote (version 20; Clarivate Analytics, Philadelphia, PA, USA). All duplicates were removed before screening. To enable simultaneous screening of titles and abstracts by the reviewers, all potential records were uploaded to abstrackr [37]. When an abstract indicated inclusion and agreement was reached by both reviewers, the full-text article was assessed for eligibility. Any disagreement regarding eligibility that arose between the reviewers was settled by IH or by consensus reached following further discussion. This search strategy was also duplicated internally to check for consistency by two additional reviewers (PAK and MW), who did so independently.
The following data were extracted from eligible studies: title; participant characteristics regarding sample size, sex, age, exercises, sets, loads, and timing of task failure prediction (before or during the set); and set endpoint definition, instructions, and prediction error (actual minus predicted repetitions to task failure). The main datum we were looking to extract was prediction error and this was extracted for all groups and conditions within each study; thus, there were multiple predictions extracted for each included study in this analysis. In cases where they were not reported, one author (JS) emailed the authors of the manuscripts requesting the raw or mean values. If the authors did not reply within 1 month, we resorted to calculating the prediction errors based on the figures (data were digitized using WebPlotDigitizer; v4.3, Ankit Rohatgi; https://apps.automeris.io/wpd/) and tables. Data were extracted to a csv file for meta-analysis by JS, and to a Word table that was edited by IH.
2.2 Statistical Analysis
The exploratory meta-analysis was performed using the ‘metafor’ package in R (v 4.0.2; R Core Team, https://www.r-project.org/) [38]. All analysis code used is presented in the electronic supplementary material (ESM; https://osf.io/grynu/).
Studies were grouped according to their design and reporting (i.e., whether they reported the paired actual and predicted repetitions or the paired difference) for appropriate calculation of raw mean change score sizes using the ‘escalc’ function in ‘metafor’ (see analysis code). We opted to analyze using the raw, as opposed to standardized, mean change scores, given that all effects were of the same construct and measurement, i.e., number of repetitions. We examined the difference between the actual repetition number performed to task failure and the predicted repetition number (actual minus predicted number of repetitions). Scores were calculated such that positive values indicated that underprediction had occurred. That is, the number of repetitions predicted was smaller than the number of repetitions actually performed to momentary failure. For example, if a trainee predicted to have two more repetitions prior to task failure, but then completed six, the person underpredicted by four repetitions. Correlations between predicted and actual repetitions were reported for most studies, and, when absent, we were usually able to obtain access to the raw data to enable their calculation. For those studies where we were unable to obtain these, we imputed the mean correlation from across those studies where these data were available as a reasonable estimate.
Because of the nested structure of the effect sizes calculated from the studies included (i.e., studies often had multiple groups and reported effects within these groups for multiple conditions), multilevel mixed-effects meta-analyses were performed, with both study and intrastudy groups (i.e., where there were multiple groups within a given study) included as random effects in the model. Cluster robust point estimates and precision of those estimates using 95%
compatibility (confidence) intervals (CIs) were produced [39], weighted by inverse sampling variance to account for the within- and between-study variance (tau-squared). Restricted maximum likelihood estimation was used in all models. A main model was produced including all effects; that is to say, for each condition performed by each group within each study. Thus, the models included all predictions made by all groups in each of the included studies. Considering the heterogeneity of study methods used in terms of the specific resistance training protocols for which predictions were made, as well as the experience of participants, this main model was merely interpreted generally as to whether it seemed people tended to over- or underestimate repetitions to task failure or whether they were fairly accurate in their predictions. Several exploratory meta-regression and subgroup analyses of moderators (i.e., predictors of effects) were also conducted to explore study protocols and participant characteristics. Moderators examined using meta-regression included mean resistance training history of participants in months, when the prediction was made as a percentage of the total number of repetitions performed to task failure, the mean number of repetitions performed to task failure, the set number for which the prediction was made, and whether upper or lower body exercise was used. We observed a non-linear effect of the mean number of repetitions performed to task failure, and therefore to model this in an interpretable manner, we employed linear splines with a knot selected at 12 repetitions using the ‘lspline’ package [40]. A knot position of 12 repetitions was chosen as this is historically considered the upper end of the ‘hypertrophy’ repetition range before moving into the ‘endurance’ range [41]. This in essence meant separate regression models were fit between 0 and 12 repetitions and > 12 repetitions. Subgroup analyses consisted of a comparison between upper and lower body exercises. Multilevel models with robust estimates were produced for each subgroup, and a fixed-effects model with moderators was used to compare the models. Note, we were not able to obtain data to permit all studies to be included in all meta-regression or subgroup models, and therefore indicate the number included when reporting this. Furthermore, a small number of effects (n = 5) reported in studies had zero variances. In a supplementary analysis, we re-ran models with these included after imputing a small constant variance to them (3 × 10⁻⁷). This was to check that the findings were not unduly influenced by their exclusion. The results of these models were not materially different (see https://osf.io/9dhzq/) and thus the main findings reported here are those with zero variance effects excluded to not unduly overwhelm the weighting of other effects in the models and meta-analytic scatterplots.
Lastly, we included additional exploratory models examining the between-participant variation in predictive accuracy. For this, we fit the same models described above to estimates of the log transformed standard deviation (ln σ̂) for prediction accuracy and its variance [42]. These models were then exponentiated back to their raw scales (i.e., standard deviations) for visualization purposes.
For all models, we opted to avoid dichotomizing the existence of an effect for the main results, and therefore did not employ traditional null hypothesis significance testing, which has been extensively criticized [43, 44]. Instead, we considered the implications of all results compatible with these data, from the lower limit to the upper limit of the interval estimates, with the greatest interpretive emphasis placed on the point estimate.
The risk of small study bias was examined visually through contour-enhanced funnel plots. Q and I² statistics were also produced and reported [45]. A significant Q statistic is typically considered indicative of effects likely not being drawn from a common population. I² values indicate the degree of heterogeneity in the effects: 0–40% indicates unimportant heterogeneity, 30–60% indicates moderate heterogeneity, 50–90% indicates substantial heterogeneity, and 75–100% indicates considerable heterogeneity [46]. Furthermore, by way of a sensitivity analysis, we also replicated all models omitting any predictions made prior to the set initiating, the results of which were not materially different and are included in the supplementary material (see https://osf.io/9yahn/ and https://osf.io/tnuwz/).
3 Results
3.1 Included Studies
After initial searches and screening, 13 publications that covered 12 studies that met the inclusion criteria were identified. Specifically, two publications reported the same data on some of the outcomes [20, 27] but one of these publications [20] included additional data that were not reported in the other publication [27]. The same data were only used once for the analysis. Additional search approaches identified no further studies that met the inclusion criteria. Thus, the final number of publications included was 13 [13, 20, 22, 27–36], covering 12 studies. Details of the search and inclusion process are shown in the PRISMA flow chart (Fig. 1). Details of the studies are reported in Table 1. The authors’ description of the prediction process and the set endpoint definitions are listed in Table 2 in the ESM (https://osf.io/2fwue/). The pooled number of participants in the included studies was 414 across 25 groups within studies and with sample sizes ranging from 6 to 53 participants (median 14) per group within each study. Full details of all included studies can be seen in the data extraction table (https://osf.io/6sc72/).
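To make the effect-size definition in Section 2.2 concrete, the following is a minimal Python sketch of the sign convention (actual minus predicted, so positive values mean underprediction) and of a raw mean change with its sampling variance for paired data. The variance formula is the standard one for a paired raw mean difference and is assumed here to correspond to what ‘escalc’ computes for raw mean change; the authors' own R code is in the ESM and may differ in detail. The function names and the group-level numbers are hypothetical and chosen only for illustration.

```python
def prediction_error(actual_reps: float, predicted_reps: float) -> float:
    """Actual minus predicted repetitions; positive values mean underprediction."""
    return actual_reps - predicted_reps


def raw_mean_change(mean_actual, sd_actual, mean_pred, sd_pred, r, n):
    """Raw mean change (actual - predicted) and its sampling variance.

    r is the correlation between actual and predicted repetitions within the
    group; the text notes that a mean correlation was imputed when r was
    unavailable.
    """
    yi = mean_actual - mean_pred
    # Variance of the paired difference, divided by the group size.
    var_diff = sd_actual**2 + sd_pred**2 - 2 * r * sd_actual * sd_pred
    vi = var_diff / n
    return yi, vi


# Worked example from the text: predicted 2 more repetitions, completed 6.
single = prediction_error(actual_reps=6, predicted_reps=2)
print(single)  # 4 -> underpredicted by four repetitions

# Hypothetical group summary (not taken from any included study).
yi, vi = raw_mean_change(mean_actual=12.0, sd_actual=3.0,
                         mean_pred=11.0, sd_pred=2.5, r=0.8, n=20)
print(yi, vi)
```

A higher correlation between actual and predicted repetitions shrinks the sampling variance, which is why the choice of imputed correlation affects the weight a study receives.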
Fig. 1 PRISMA flow chart illustrating the different phases of the search and study selection. PRISMA Preferred Reporting Items for Systematic
Reviews and Meta-Analyses, TF task failure
3.2 Main Model: All Effects
The main model including all effect sizes (262 across 12 clusters [median 11.5, range 3–60 effects per cluster]) suggests that participants, on average, underpredicted, with a point and interval estimate of 0.95 repetitions (95% CI 0.17–1.73). There was however considerable heterogeneity (Q(261) = 3060.90, p < 0.0001, I² = 97.89%). Figure 2 presents all effect sizes and interval estimates in an ordered caterpillar plot, while Fig. 3 presents the funnel plot of all effects.
3.3 Exploratory Meta‑regression Analyses
3.3.1 Training History
Meta-regression suggested that prediction accuracy was not moderated much by the mean training history (in months) of participants in the samples (β = − 0.0061 repetitions, 95%
Study | Participants: n, sex (age) | Training experience | Timing of prediction | Sets | Load | Exercises
Hackett et al. 2012 [13] | 17 M (32 ± 5) | 8.2 ± 3.2 years | 10th rep | 5 | 70% 1RM | Bench press, Back squat
Lemos et al. 2017 [30] | 11 F (22 ± 1) | Unclear | 2nd rep | 1 | 50% 1RM, 70% 1RM, 90% 1RM | Chest fly, Leg extension, Pull-down, Leg curl, Biceps curl, Triceps extension, Military press
Servais 2015 [32] | 12 F (20 ± 1); 12 M (22 ± 1) | 4 ± 2 years | Before the set | 5 | 65% 1RM | Bench press
Hackett et al. 2017 [27] | 28 F (28 ± 9.5); 53 M (27.3 ± 9) | 3.6 ± 4.6 years; 5.5 ± 6.1 years | 10th rep | 10 | 70% 1RM, 80% 1RM | Chest press, Leg press
Steele et al. 2017 [28] | 69 F (25 ± 8); 72 M (29 ± 10) | 1.5 months to 3 years | Before the set | 1 | Self-estimated 10RM | Seated row, Bench press, Leg press, Elbow flexion, Pull-down, Sit-up
Hackett et al. 2018 [20] | 21 F (30.4); 27 M (26.6) | 4 years; 6.7 years | 10th rep | 3 | 70% 1RM, 80% 1RM | Chest press, Leg press
Odgers 2020 [33] | 13 F (30 ± 5.4); 14 M (29 ± 5.7) | ≥ 6 months | 6 RPE, 9 RPE | 4 | 80% 1RM | Front squat, Hex deadlift
Sousa 2018 [29] | 10 M (25 ± 4) | 6 ± 4 years | 4 RIR, 1 RIR | 4 | 80% 1RM | Back squat, Bench press, Deadlift; Back squat, Bench press, Deadlift
Ratto 2019 [31] | 20 M (20 ± 2) | 4.7 years | Pre-set, 4th rep, 8th rep, 12th rep | 1 | 100 kg | Bench press
Zourdos et al. 2021 [22] | 25 M (25 ± 3) | 5 ± 3 years | 5 RIR, 3 RIR, 1 RIR | 1 | 70% 1RM | Back squat
Emanuel et al. 2020 [35] | 10 M (29.5 ± 4) | ≥ 1 year | Before the set | 2 | 70% 1RM, 83% 1RM | Bench press, Back squat; Bench press, Back squat
Mansfield et al. 2020 [34] | 20 M (26 ± 4) | 6 ± 4 years | 8th rep, 3rd rep | 3 | 60% 1RM, 80% 1RM | Bench press, Prone row; Bench press, Prone row
Hackett 2021 [36] | 20 M (26.3 ± 9) | 7 ± 4.7 years | 10th rep | 5 | 70% 1RM | Bench press, Back squat
Fig. 2 Ordered caterpillar plot presenting all effect sizes and 95% interval estimates from all included studies
CI − 0.0195 to 0.0073; 262 across 12 clusters [median 11.5, range 3–60 effects per cluster]). There was however considerable heterogeneity (I² = 97.64%). Figure 4 shows the meta-analytic scatter plot for this analysis.
3.3.2 When Prediction was Made
Meta-regression suggested that prediction accuracy was moderated by how close to task failure participants were when they made their prediction (expressed as a percentage of total repetitions performed to task failure). Accuracy increased slightly as predictions were made with closer proximity to task failure (β = − 0.025 repetitions, 95% CI − 0.051 to − 0.001; 238 across 11 clusters [median 11, range 3–60 effects per cluster]). There was however considerable heterogeneity (I² = 97.90%). Figure 5 shows the meta-analytic scatter plot for this analysis.
3.3.3 Repetition Range
Meta-regression suggested that prediction accuracy was trivially moderated by the repetition ranges performed up to 12 repetitions (first spline), but was strongly moderated by repetition ranges that included more than 12 repetitions (second spline). For the first linear spline, accuracy did not change much when performing fewer repetitions, but the second linear spline revealed that accuracy decreased as predictions were made in sets composed of a higher repetition range (first linear spline [≤ 12 repetitions], β = 0.06 repetitions, 95% CI − 0.04 to 0.16; second linear spline [> 12 repetitions], β = 0.47 repetitions, 95% CI 0.35–0.58; 238 across 11 clusters [median 11, range 3–60 effects per cluster]). There was however considerable heterogeneity (I² = 88.25%). Figure 6 shows the meta-analytic scatter plot for this analysis.
3.3.4 Set Number
Meta-regression suggested that prediction accuracy was trivially moderated by which set number the prediction was made on (β = − 0.072 repetitions, 95% CI − 0.14 to − 0.005; 262 across 12 clusters [median 11.5, range 3–60 effects per cluster]). There was however considerable heterogeneity (I² = 97.76%). Figure 7 shows the meta-analytic scatter plot for this analysis.
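As an illustration of how the two spline slopes reported in Section 3.3.3 can be read, the sketch below builds a linear spline basis with a knot at 12 repetitions and computes the implied change in prediction error between two repetition counts. It assumes the segment-slope parameterization (each coefficient is the slope within its own segment), uses only the reported point estimates, and ignores the interval estimates; the intercept is not reported, so only differences are computed. The function names are hypothetical and this is not the authors' code.

```python
KNOT = 12.0
SLOPE_LOW = 0.06   # reported β for sets of ≤ 12 repetitions
SLOPE_HIGH = 0.47  # reported β for sets of > 12 repetitions


def spline_basis(reps: float, knot: float = KNOT):
    """Return the two linear-spline terms: reps up to the knot, reps beyond it."""
    return min(reps, knot), max(reps - knot, 0.0)


def change_in_error(reps_from: float, reps_to: float) -> float:
    """Implied change in prediction error (in repetitions) between two set lengths."""
    low_a, high_a = spline_basis(reps_from)
    low_b, high_b = spline_basis(reps_to)
    return SLOPE_LOW * (low_b - low_a) + SLOPE_HIGH * (high_b - high_a)


# Moving within the first segment changes the error only slightly,
# whereas moving beyond the knot changes it much more per repetition.
within_low = change_in_error(6, 12)
within_high = change_in_error(12, 20)
across_knot = change_in_error(8, 16)
print(within_low, within_high, across_knot)
```

Under these point estimates, six extra repetitions below the knot shift the expected error by well under one repetition, while eight extra repetitions above the knot shift it by several.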
Fig. 3 Contour-enhanced funnel plot in which the individual study effect sizes, illustrated by single dots, are plotted against the inverse standard
error, serving as an indication of precision. While slightly skewed, the results suggest little bias
3.3.5 Upper versus Lower Body Exercises
Subgroup models revealed prediction accuracy was slightly worse for lower body effects (1.51 repetitions, 95% CI − 0.38 to 3.40; 118 effects across 8 clusters [median 13, range 2–29 effects per cluster; I² = 99.48%]) compared with upper body effects (0.92 repetitions, 95% CI 0.09–1.75; 131 effects across 9 clusters [median 8, range 4–40 effects per cluster; I² = 97.29%]), but between-model comparison suggested the difference was unclear and the estimate imprecise (upper body minus lower body = − 0.59 repetitions; 95% CI − 2.30 to 1.13).
3.4 Exploratory Analysis of Between‑Participant Variation
The main model for log-transformed standard deviations in predictive accuracy revealed a relatively low between-participant variation (ln σ̂ = 0.37, 95% CI − 0.01 to 0.75) that, when exponentiated, was 1.45 repetitions (95% CI 0.99–2.12). In general, the pattern of moderator effects was similar to that found in our exploratory meta-regressions of predictive accuracy. That is to say, between-participant variation was lowest when predictions were made closer to task failure, when fewer repetitions were performed per set, as well as in later sets. All outputs for models examining log-transformed standard deviations are available in the ESM (https://osf.io/7kx9e/) in addition to plots.
4 Discussion
In this scoping review and meta-analysis, we explored participants’ prediction accuracy when following the RIR approach in resistance training. Overall, across studies, participants underpredicted proximity to task failure by approximately one repetition. Prediction accuracy improved when predictions were made closer to task failure, when fewer repetitions per set were completed, as well as in later sets. Conversely, and somewhat surprisingly, training status did not seem to influence prediction accuracy, nor was there much difference between upper or lower body exercises. Furthermore, there was relatively minimal between-participant variation in predictive accuracy, suggesting that the primary source of error is due to systematic underprediction.
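The back-transformation reported in Section 3.4 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the pooled estimate and its interval limits on the log scale are exponentiated to return to repetitions. The variable names are illustrative.

```python
import math

# Pooled log standard deviation and its 95% CI limits, as reported in the text.
ln_sd_estimate = 0.37
ln_sd_lower = -0.01
ln_sd_upper = 0.75

# Exponentiate each value to express it as a standard deviation in repetitions.
sd_estimate = math.exp(ln_sd_estimate)
sd_lower = math.exp(ln_sd_lower)
sd_upper = math.exp(ln_sd_upper)

print(round(sd_estimate, 2), round(sd_lower, 2), round(sd_upper, 2))
```

Because the interval is symmetric on the log scale, it becomes asymmetric around the point estimate once exponentiated, which matches the reported 0.99–2.12 around 1.45.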
It is not entirely clear whether the underprediction of proximity to task failure of approximately one repetition is large enough to be considered meaningful. Mainly, the prediction error of one repetition in the context of the total number of repetitions completed per set can considerably impact the interpretation. To illustrate, an underprediction error of one repetition can be considered small in a set composed of 20 repetitions (5% error) and large in a set composed of five repetitions (20% error). Most of the reviewed studies included more than 10 repetitions per set (average of 12.6 repetitions), which can partly assist in framing and interpreting this result. While it is difficult to interpret the direction and magnitude of this prediction error, developing a deeper understanding of it can help in designing, interpreting, and comparing studies. For example, future research examining the dose–response relationships of different proximities to task failure may benefit from knowing the magnitude of prediction errors [47–50].
The finding that prediction accuracy improved when the predictions were provided towards the end of a set composed of fewer repetitions, as well as in later sets, is logical. This is because predictions early in a set coupled with performing a greater number of repetitions allows for a wider range of errors to be made, in contrast to predictions later in a set coupled with completing fewer repetitions. Furthermore, prediction accuracy in later sets may improve due to either a practice element or that lingering fatigue means that the sets are performed with relatively greater loads [27, 35, 36]. Moreover, it is often assumed that predictions of proximity to task failure are made based on either remembered or presently experienced perception of effort [16, 51]. However, there is the potential for other salient experiences, such as discomfort, to be conflated with perception of effort and to influence one’s prediction [52–54]. This possibility may explain the general underprediction found across studies and also that predictions worsened with lower loads and thus higher repetition ranges. Lower loads performed to task failure typically elicit greater perceptions of discomfort [52, 53]. Practically, if using RIR, participants should predict proximity to task failure as the set unfolds, rather than before it begins, for better prediction accuracies. Sets of lower repetitions, which are commonly associated with heavier loads,
Fig. 5 Meta-analytic scatter plot of when prediction was made and prediction accuracy. reps repetitions
will also lead to better prediction accuracies. The fact that training status did not impact prediction accuracy comes as a surprise as prior studies that included participants with a range of training histories have typically shown that training background is associated with improved predictive accuracy [27, 28], although in one study this may have also been due to more trained participants utilizing heavier loads/lower repetitions [28]. The large heterogeneity between studies may partly explain these trivial effects.
In conducting this scoping review, we have identified a number of methodological issues that warrant discussion. The main issue is that a clear distinction between RM and momentary failure is not always present (see Table 2 in the ESM; https://osf.io/2fwue/). For example, in the Estimated Repetition to Failure scale used in studies by Hackett and colleagues [27], the following is stated in the scale’s instructions “… “0” is where the subject estimated no additional repetitions could be completed (concentric failure reached)”. Considering the operational definitions of RM and momentary failure provided by Steele et al. [15], this explanation does not clearly differentiate between RM (the sentence before the parentheses) and momentary failure (the part in parentheses). If participants predict that no additional repetitions can be completed at the point of task failure, then it can be assumed that the last repetition was successful. As such, the last repetition should be defined as RM. However, if concentric failure was achieved, then the final repetition should be defined as momentary failure. If the definition of task failure that they are trying to predict is not clearly explained to participants, larger prediction errors can be expected.
There are also inconsistencies between studies in which of the two set endpoints was used to represent task failure. This is the case with both scale instructions and the criteria used to define task failure. Ideally, in order to achieve higher prediction accuracies, sets that end with momentary failure are superior to those ending with RM. This is because reaching momentary failure is, by definition, the point at which no more repetitions can be completed, whereas RM is, by definition, an unverified prediction that the subsequent repetition cannot be completed. For example, a trainee who assumes to have reached the point of RM may be able to complete three
Fig. 6 Meta-analytic scatter plot of the repetition ranges performed to task failure, and prediction accuracy
more repetitions before reaching momentary failure. Hence, there is more room for prediction error unless momentary failure is achieved. We acknowledge that requesting and ensuring that participants reach momentary failure is not a simple task. It can be argued that it is impossible to truly verify whether momentary failure was achieved and, for ethical reasons, participants cannot be forced to reach momentary failure. The inconsistent task failure anchors in the RIR studies could have biased the estimates in this meta-analysis, as we treated task failure to be similar across studies. Future studies should consider how task failure is explained to participants and should include a detailed account of the instructions and how RM or momentary failure were defined and monitored [15]. In studies that include both RM and momentary failure as task failure, including the ratio of sets that ended with task failure utilizing participant self-reports and experimenter’s observations may assist in explaining dissimilar results between studies.
Another methodological issue in the literature is the anchoring bias that arises when participants provide their task failure prediction. That is, once participants report their task failure prediction, it is possible that they set this particular number as the goal of the set; in a sense, a self-fulfilling prophecy. It is possible that if participants did not provide a prediction they would have completed more or fewer repetitions. Since all of the analyzed studies included in our meta-analysis suffered from this anchoring bias, the observed estimates may be smaller than what they truly are. To overcome the anchoring bias, Armes et al. [55] used a deception design where participants completed sets of knee extension to both RM and momentary failure, and were told that the purpose of the study was to inspect the reliability of their performance across trials; however, the true purpose of this study was to examine their predictive ability. By bypassing the effects of the anchoring bias, the authors observed an average underprediction error of two repetitions in an internal meta-analysis of their experiments. Hence, implementing such deception designs may reveal that predictive abilities are in fact worse than the present estimate suggests.
Despite the methodological limitations of RIR literature discussed above, the RIR approach has benefits. Mainly, prescribing repetitions relative to task failure may help to
ensure that a consistent effort is reached in a given set, even if the number of repetitions differs between and within participants, sets, and exercises [16, 18, 21, 56]. In contrast, prescribing a fixed and predetermined number of repetitions using a specific percentage of 1RM accounts for considerably less variability in one’s abilities [57, 58]. As such, the prediction errors identified in the present study, coupled with methodological concerns of RIR approaches, should be viewed and weighed relative to the alternative prescription approaches. Moreover, studies comparing sets taken and not taken to task failure on various outcomes can benefit from implementing RIR approaches in their designs, because such studies use a binary approach (task failure vs. not to task failure) that does not account for proximity to task failure [49, 50]. By comparing groups that follow different RIR set endpoints (e.g., momentary failure vs. 1RIR vs. 2RIR), richer and more insightful comparisons can be made. To strengthen RIR designs, future studies should consider the task failure definition they are using and provide a clear and detailed account of how it was explained and confirmed. Additionally, including the ratio of sets that ended due to momentary failure and RM may assist in explaining different study results. Lastly, the use of deception-based designs may overcome the anchoring biases and result in better estimates of participants’ prediction accuracy.
5 Conclusion
We found that participants typically underpredict proximity to task failure by approximately one repetition; however, it is unclear whether this degree of underprediction represents acceptable prediction accuracy. Practitioners and trainees choosing to apply RIR-based techniques can improve prediction accuracy by providing the prediction towards the end of the set, in sets composed of fewer repetitions, and in later sets. Participants’ background in resistance training did not seem to meaningfully impact prediction accuracy, nor were there differences between upper and lower body exercises.
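To make the roughly one-repetition underprediction concrete, the following minimal sketch computes a signed prediction error per set as actual minus predicted repetitions to task failure, so that positive values indicate underprediction. The function name, the sign convention, and the sample numbers are illustrative assumptions; they are not data or code from the included studies.

```python
from statistics import mean


def prediction_errors(predicted, actual):
    """Signed error per set: actual minus predicted repetitions to task failure.

    Positive values mean the participant underpredicted (more repetitions
    were completed than predicted). This sign convention is an assumption
    made for illustration only.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [a - p for p, a in zip(predicted, actual)]


# Hypothetical sets: predicted vs. actually completed repetitions remaining.
predicted = [2, 3, 1, 4]
actual = [3, 4, 1, 6]

errors = prediction_errors(predicted, actual)
mean_error = mean(errors)                       # average signed error (bias)
mean_abs_error = mean(abs(e) for e in errors)   # average error magnitude
```

The signed mean captures the direction of the error (under- vs. overprediction), whereas the absolute mean captures its size regardless of direction; reporting both avoids opposite errors cancelling out.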
Supplementary Information The online version of this article (https://doi.org/10.1007/s40279-021-01559-x) contains supplementary material, which is available to authorized users.
Declarations
Funding No financial support was received for the conduct of this article or for the preparation of this manuscript.
Conflict of interest Israel Halperin, Tomer Malleron, Itai Har-Nir, Patroklos Androulakis-Korakakis, Milo Wolf, James Fisher, and James Steele declare they had no conflicts of interest.
Ethical approval Not applicable.
Consent to participate Not applicable.
Consent for publication Not applicable.
Availability of data and material All data are available in the Open Science Framework by accessing https://osf.io/jzuwq/
Author contributions IH and JS wrote the first draft of the manuscript. TM, IHN, PAK and MW performed the literature search, and JS performed the meta-analyses. All authors were involved in the interpretation of the meta-analyses, and read, revised, and approved the final manuscript.
References
1. Wilkins LW. ACSM’s health-related physical fitness assessment manual. 4th ed. Philadelphia: American College of Sports Medicine; 2013.
2. Sands WA, Wurth JJ, Hewit JK. Basics of strength and conditioning manual. Colorado Springs: National Strength and Conditioning Association; 2012.
3. Ratamess NA, Alvar BA, Evetoch TE, Housh TJ, Ben Kibler W, Kraemer WJ, et al. Progression models in resistance training for healthy adults. Med Sci Sports Exerc. 2009;41(3):687–708.
4. Hass CJ, Feigenbaum MS, Franklin BA. Prescription of resistance training for healthy populations. Sports Med. 2001;31(14):953–64.
5. Westcott WL. Resistance training is medicine: effects of strength training on health. Curr Sports Med Rep. 2012;11(4):209–16.
6. Kraemer WJ, Ratamess NA, French DN. Resistance training for health and performance. Curr Sports Med Rep. 2002;1(3):165–71.
7. Winett RA, Carpinelli RN. Potential health-related benefits of resistance training. Prev Med. 2001;33(5):503–13.
8. Richens B, Cleather DJ. The relationship between the number of repetitions performed at given intensities is different in endurance and strength trained athletes. Biol Sport. 2014;31(2):157–61.
9. Shimano T, Kraemer WJ, Spiering BA, Volek JS, Hatfield DL, Silvestre R, et al. Relationship between the number of repetitions and selected percentages of one repetition maximum in free weight exercises in trained and untrained men. J Strength Cond Res. 2006;20(4):819–23.
10. Hoeger WW, Barette SL, Hale DF, Hopkins DR. Relationship between repetitions and selected percentages of one repetition maximum. J Strength Cond Res. 1987;1(1):11–3.
11. Grgic J, Trexler ET, Lazinica B, Pedisic Z. Effects of caffeine intake on muscle strength and power: a systematic review and meta-analysis. J Int Soc Sports Nutr. 2018;15(1):1–10.
12. Knowles OE, Drinkwater EJ, Urwin CS, Lamon S, Aisbett B. Inadequate sleep and muscle strength: Implications for resistance training. J Sci Med Sport. 2018;21(9):959–68.
13. Hackett DA, Johnson NA, Halaki M, Chow CM. A novel scale to assess resistance-exercise effort. J Sports Sci. 2012;30(13):1405–13.
14. Zourdos MC, Klemp A, Dolan C, Quiles JM, Schau KA, Jo E, et al. Novel resistance training-specific rating of perceived exertion scale measuring repetitions in reserve. J Strength Cond Res. 2016;30(1):267–75.
15. Steele J, Fisher J, Giessing J, Gentil P. Clarity in reporting terminology and definitions of set endpoints in resistance training. Muscle Nerve. 2017;56(3):368–74.
16. Helms ER, Cronin J, Storey A, Zourdos MC. Application of the repetitions in reserve-based rating of perceived exertion scale for resistance training. Strength Cond J. 2016;38(4):42–9.
17. Helms ER, Byrnes RK, Cooke DM, Haischer MH, Carzoli JP, Johnson TK, et al. RPE vs. percentage 1RM loading in periodized programs matched for sets and repetitions. Front Physiol. 2018;9:247.
18. Graham T, Cleather DJ. Autoregulation by “Repetitions in reserve” leads to greater improvements in strength over a 12-week training program than fixed loading. J Strength Cond Res. 2021;35(9):2451–6. https://doi.org/10.1519/JSC.0000000000003164.
19. Shattock K, Tee JC. Autoregulation in resistance training: a comparison of subjective versus objective methods. J Strength Cond Res. 2020. https://doi.org/10.1519/JSC.0000000000003530.
20. Hackett DA, Cobley SP, Halaki M. Estimation of repetitions to failure for monitoring resistance exercise intensity: building a case for application. J Strength Cond Res. 2018;32(5):1352–9.
21. Kemmler W, Kohl M, Fröhlich M, Jakob F, Engelke K, von Stengel S, et al. Effects of high-intensity resistance training on osteopenia and sarcopenia parameters in older men with osteosarcopenia: one-year results of the randomized controlled franconian osteopenia and sarcopenia trial (FrOST). J Bone Miner Res. 2020;35(9):1634–44.
22. Zourdos MC, Goldsmith JA, Helms ER, Trepeck C, Halle JL, Mendez KM, et al. Proximity to failure and total repetitions performed in a set influences accuracy of intraset repetitions in reserve-based rating of perceived exertion. J Strength Cond Res. 2021;35:S158–65.
23. Halperin I, Emanuel A. Rating of perceived effort: methodological concerns and future directions. Sports Med. 2020;50(4):679–87.
24. Steele J. What is (perception of) effort? Objective and subjective effort during task performance. PsyArXiv. Epub 6 Jun 2020. doi: https://doi.org/10.31234/osf.io/kbyhm.
25. Arede J, Vaz R, Gonzalo-Skok O, Balsalobre-Fernandéz C, Varela-Olalla D, Madruga-Parera M, et al. Repetitions in reserve vs maximum effort resistance training programs in youth female athletes. J Sports Med Phys Fitness. 2020;60(9):1231–9.
26. Buskard ANLJK, Eltoukhy MM, Strand KL, Villanueva L, Desai PP, Signorile JF. Optimal approach to load progressions during strength training in older adults. Med Sci Sports Exerc. 2019;51(11):2224–33.
27. Hackett DA, Cobley SP, Davies TB, Michael SW, Halaki M. Accuracy in estimating repetitions to failure during resistance exercise. J Strength Cond Res. 2017;31(8):2162–8.
28. Steele J, Endres A, Fisher J, Gentil P, Giessing J. Ability to predict repetitions to momentary failure is not perfectly accurate, though improves with resistance training experience. PeerJ. 2017;5:4105.
29. Sousa CA. Assessment of accuracy of intra-set rating of perceived exertion in the squat, bench press, and deadlift [Master’s thesis]. Boca Raton: Florida Atlantic University; 2018.
30. Lemos EA, Caldas LC, Leopoldo APL, Leopoldo AS, Ferreira LG, Lunz W. The perception of effort is not a valid tool for establishing the strength-training zone. J Hum Sport Exerc. 2017;12(3):593–606.
31. Ratto AG. Application of the predicted repetitions-to-failure perceived exertion scale for the NFL-225lb bench press test [Master’s thesis]. Arcata: Humboldt State University; 2019.
32. Servais B. Regulating resistance exercise intensity using perceptual response and the “anticipatory feedback” model [Master’s thesis]. Arcata: Humboldt State University; 2015.
33. Odgers JB. Exertion/velocity profiling and assessment of accuracy of intra-set rating of perceived exertion in the front squat and hexagonal bar deadlift [Master’s thesis]. Regina: Faculty of Graduate Studies and Research, University of Regina; 2020.
34. Mansfield SK, Peiffer JJ, Hughes LJ, Scott BR. Estimating repetitions in reserve for resistance exercise: an analysis of factors which impact on prediction accuracy. J Strength Cond Res. 2020. https://doi.org/10.1016/j.jshs.2021.01.007.
35. Emanuel A, Rozen S II, Halperin I. The effects of lifting lighter and heavier loads on subjective measures. Int J Sports Physiol Perform. 2020;16(2):176–83.
36. Hackett DA. Influence of movement velocity on accuracy of estimated repetitions to failure in resistance-trained men. J Strength Cond Res. 2021. https://doi.org/10.1519/JSC.0000000000003978.
37. Wallace BC, Small K, Brodley CE, Lau J, Trikalinos TA. Deploying an interactive machine learning system in an evidence-based practice center: abstrackr. In: Proceedings of the 2nd ACM SIGHIT international health informatics symposium; 2012. pp. 819–24.
38. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.
39. Hedges LV, Tipton E, Johnson MC. Robust variance estimation in meta-regression with dependent effect size estimates. Res Synth Methods. 2010;1(1):39–65.
40. Bojanowski M. lspline: Linear splines with convenient parametrisations. R package version 1.0-0. 2017. https://CRAN.R-project.org/package=lspline.
41. Schoenfeld BJ, Grgic J, Van Every DW, Plotkin DL. Loading Recommendations for Muscle Strength, Hypertrophy, and Local Endurance: A Re-Examination of the Repetition Continuum. Sports (Basel). 2021;9(2).
42. Nakagawa S, Poulin R, Mengersen K, Reinhold K, Engqvist L, Lagisz M, et al. Meta-analysis of variation: ecological and evolutionary applications and beyond. Methods Ecol Evol. 2015;6(2):143–52.
43. McShane BB, Gal D, Gelman A, Robert C, Tackett JL. Abandon statistical significance. Am Stat. 2019;73(Suppl 1):235–45.
44. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305–7.
45. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.
46. Higgins J, Green S. Cochrane handbook for systematic reviews of interventions 5.1.0 (updated March 2011). London: The Cochrane Collaboration; 2011.
47. Gießing J, Fisher J, Steele J, Rothe F, Raubold K, Eichmann B. The effects of low-volume resistance training with and without advanced techniques in trained subjects. J Sports Med Phys Fitness. 2016;56(3):249–58.
48. Giessing J, Eichmann B, Steele J, Fisher J. A comparison of low volume “high-intensity-training” and high volume traditional resistance training methods on muscular performance, body composition, and subjective assessments of training. Biol Sport. 2016;33(3):241–9.
49. Grgic J, Schoenfeld BJ, Orazem J, Sabol F. Effects of resistance training performed to repetition failure or non-failure on muscular strength and hypertrophy: a systematic review and meta-analysis. J Sport Health Sci. 2021. https://doi.org/10.1016/j.jshs.2021.01.007.
50. Vieira AF, Umpierre D, Teodoro JL, Lisboa SC, Baroni BM, Izquierdo M, et al. Effects of resistance training performed to failure or not to failure on muscle strength, hypertrophy, and power output: a systematic review with meta-analysis. J Strength Cond Res. 2021;35(4):1165–75.
51. Coquart JB, Eston RG, Noakes TD, Tourny-Chollet C, L’Hermette M, Lemaître F, et al. Estimated time limit: a brief review of a perceptually based scale. Sports Med. 2012;42(10):845–55.
52. Stuart C, Steele J, Gentil P, Giessing J, Fisher JP. Fatigue and perceptual responses of heavier- and lighter-load isolated lumbar extension resistance exercise in males and females. PeerJ. 2018;6:4523.
53. Fisher JP, Farrow J, Steele J. Acute fatigue, and perceptual responses to resistance exercise. Muscle Nerve. 2017;56(6):E141–6.
54. Emanuel A, Smukas IIR, Halperin I. An analysis of the perceived causes leading to task-failure in resistance-exercises. PeerJ. 2020;8:e9611.
55. Armes C, Standish-Hunt H, Androulakis-Korakakis P, Michalopoulos N, Georgieva T, Hammond A, et al. “Just one more rep!” – Ability to predict proximity to task failure in resistance trained persons. Front Psychol. 2020;11:3760.
56. Fairman CM, Zourdos MC, Helms ER, Focht BC. A scientific rationale to improve resistance training prescription in exercise oncology. Sports Med. 2017;47(8):1457–65.
57. Phillips SM, Winett RA. Uncomplicated resistance training and health-related outcomes: evidence for a public health mandate. Curr Sports Med Rep. 2010;9(4):208–13.
58. Steele J, Fisher J, Skivington M, Dunn C, Arnold J, Tew G, et al. A higher effort-based paradigm in physical activity and exercise for public health: making the case for a greater emphasis on resistance training. BMC Public Health. 2017;17(1):1–8.