Center for Cybersecurity Research and Analysis, Capitol Technology University, 11301 Springfield Road, Laurel,
MD 20708, USA; [email protected]
Abstract: Security analysts working in the modern threat landscape face excessive events and alerts,
a high volume of false-positive alerts, significant time constraints, innovative adversaries, and a
staggering volume of unstructured data. Organizations thus risk data breach, loss of valuable human
resources, reputational damage, and impact to revenue when excessive security alert volume and
a lack of fidelity degrade detection services. This study examined tactics to reduce security data
fatigue, increase detection accuracy, and enhance security analysts’ experience using security alert
output generated via data science and machine learning models. The research determined if security
analysts utilizing this security alert data perceive a statistically significant difference in usability
between security alert output that is visualized versus that which is text-based. Security analysts
benefit two-fold: the efficiency of results derived at scale via ML models, with the additional benefit
of quality alert results derived from these same models. This quantitative, quasi-experimental,
explanatory study conveys survey research performed to understand security analysts’ perceptions
via the Technology Acceptance Model. The population studied was security analysts working in
a defender capacity, analyzing security monitoring data and alerts. The more specific sample was
security analysts and managers in Security Operation Center (SOC), Digital Forensic and Incident
Response (DFIR), Detection and Response Team (DART), and Threat Intelligence (TI) roles. Data
analysis indicated a significant difference in security analysts’ perception of usability in favor of
visualized alert output over text alert output. The study’s results showed how organizations can
more effectively combat external threats by emphasizing visual rather than textual alerts.

Keywords: user acceptance; user experience; security alert; detection; data science; visualization; visual alert output; text alert output

Citation: McRee, G.R. Improved Detection and Response via Optimized Alerts: Usability Study. J. Cybersecur. Priv. 2022, 2, 379–401. https://doi.org/10.3390/jcp2020020
1.1. Background
Many organizations must deal with a high volume of security alert and event data
derived from security devices and detective capabilities [3]. A Dimensional Research study
found that these organizations face a large burden due to alert overload, where 99% of
security professionals surveyed acknowledge that high volumes of security alerts are prob-
lematic. The Dimensional Research study also determined that primary challenges include
many minor problems or noise (68%), wasted time chasing false positives (66%), team
members who feel overwhelmed (50%), excessive time spent triaging alerts (47%), and an
increased overall security risk (42%) [4]. Bartos found that one of the core issues an analyst
faces is the large number of alerts generated by numerous cybersecurity tools. When con-
sidering additional data received via various sharing and collaborative platforms, the issue
is further amplified. As such, for security analysts, data prioritization and summarization
are essential to reduce the excessive amount of information presented. Prioritization is
consistently identified as a core tenet of security incident handling in numerous studies [5].
A lack of prioritization can result in security data fatigue, analyst burnout, and ineffective
or insufficient incident response [6]. Organizations face increased risk and liability if their
capacity to respond to high-fidelity detections is reduced by excessive alert noise [7]. As in-
dicated by FireEye data, in organizations that receive 17,000 alerts weekly, more than 51% of
the alerts are false positives, and only 4% of the alerts are thoroughly investigated [8]. More
narrowly, Seals found that 80% of organizations who receive 500 or more severe/critical
alerts per day investigate fewer than 1% of them [9]. The issue is exacerbated by data
volumes. Oltsik reported that, as part of security operations, 38% of organizations collect,
process, and analyze more than 10 terabytes monthly. As of 2017, 28% of organizations
collect, process, and analyze substantially more data than in the two years prior, while
another 49% of organizations collect, process, and analyze somewhat more data today
than the two years prior [10]. A recent survey of 50 SOC professionals, Managed Security
Services Providers (MSSP), and Managed Detection and Response (MDR) providers evalu-
ated the state of incident response within SOCs and found numerous causes for concern.
Nearly half of respondents reported a false-positive rate of 50% or higher, attributed to security information and event management (SIEM) and incident response tools that are improperly tuned and alert on known-good activity [11]. Respondents reported that when their SOC had too
many alerts for analysts to process, 38% either turn off high-volume alerting features or
hire more analysts. Additionally, respondents felt that their main job responsibility was less
to analyze and remediate security threats and more to reduce alert investigation time or
the volume of alerts [11]. All of this results in significant security analyst turnover. A large
majority (80%) of respondents indicated that their SOC had experienced at least 10% analyst
turnover. The largest pool of respondents (45%) indicated a 10–25% turnover, and more
than a third (35%) lost a quarter or more of their SOC analysts in less than 12 months [11].
Slatman’s research focused on data-driven security operations and security analytics to
investigate and address the investigation challenges security analysts face [3]. The chal-
lenges are categorized into four main categories: an increasingly complex IT environment,
limited business alignment, ever-evolving adversaries and corresponding attacks, and
inadequate resources with respect to people and technology. The concept of data-driven
security operations is the seminal starting point for this research. A focus on data-driven
security operations addresses and enables discussions related to challenges that security
analysts face, as well as opportunities for improvements such as applied machine learning
and visualization.
The specific business problem is: organizations risk data breach, loss of valuable
human resources, reputation, and revenue due to excessive security alert volume and a lack
of fidelity in security event data. A Cloud Security Alliance survey illuminated the problem
further. With an average of two billion transactions a month at the average enterprise,
IT security professionals say that 40.4% of alerts received lack actionable intelligence to
investigate, and another 31.9% report ignored alerts due to false positives [12]. Chickowski
stated that as much as 25% of a security analyst’s time is spent processing false-positive alerts, commonly erroneous security alerts or false indicators of compromise, before focusing on true-positive findings. For every hour an analyst spends on the job, 15 min are wasted on false positives, leading the typical organization to waste between 286 and 424 h per week on false positives [13]. In addressing this problem, improving the efficiency of security analysts
can be helpful. In a survey that examines specific areas where high- and low-performing
SOCs diverge, with a focus on the challenges both groups struggle with, Ponemon found
key data points in the differences and similarities between the two classes of SOCs. Even
highly effective SOCs suffer from job-related stress affecting security analysts, where 55%
of respondents from high-performing SOCs rated their stress level as a 9 or 10 on a 10-point
scale. Twenty-two percent of survey respondents rated their SOC as ineffective, citing a lack
of visibility into the attack surface and a lack of timely remediation as the core factors [14].
To examine opportunities for increased efficiencies, this study used a survey questionnaire
based on the Technology Acceptance Model (TAM) to test for statistical differences between
security analysts’ responses regarding perception and usability of text-based alert output
(TAO) versus visualized alert output (VAO).
Figure 1. Theoretical Framework: Technology Acceptance Model. Adapted from “Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology” [17].
Section 1 provided an overview of the study with the context and background for the research problem and statement, as well as purpose and significance. Section 2 includes details of the methodological approach used by the researcher for this study. Section 3 offers background on the research results and provides a description of the sample as well as hypothesis testing, inclusive of a summary and data analysis. Section 4 concludes the study with a discussion of the research results, coupled with its conclusions, limitations, implications for practice, and recommendations for future research.
2. Material and Methods
2.1. Design and Methodology
The researcher utilized a quantitative, quasi-experimental, explanatory methodology for the envisioned study, using survey research to better understand related phenomena. Quantitative methods are used to measure behavior, knowledge, opinions, or attitudes in business research, as is pertinent when the Technology Acceptance Model is the utilized instrument. An online survey was used to test for statistically significant differences in the level of acceptance of alert output between those choosing VAO in all scenarios and those having some or complete preference for TAO, with VAO and TAO being generated via data science/machine learning methods as predicted by the TAM. In pursuit of further insights relevant to potential differences in security analysts’ perceptions of visual and text analytics, the research question that guides this study was:
•	RQ1: Is there a difference in the level of acceptance of security alert output between those with a preference for VAO and those with a preference for TAO, with VAO and TAO generated via data science/machine learning methods, as predicted by the TAM?
	◦	Sub-questions were:
		SQ1: Does the adoption of VAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
		SQ2: Does the adoption of TAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
The online survey utilized for this study incorporated visual images as part of the questioning process, to create clarity and compel answering in full. To further minimize non-response, and to prepare data for testing, the following were included:
•	As part of this quantitative, quasi-experimental, explanatory study, the online survey for data collection utilized a 7-point Likert scale.
•	The online survey questionnaire and survey experiment, given that this research was specifically focused on visualization versus text, incorporated visual elements, which lead to a higher response quality and generate interesting interaction effects [19].
The target population for this study was global information security analysts working in a blue team (defender) capacity, analyzing security monitoring data and alerts. This is an appropriate population given the significant challenges the industry faces due to the sheer scale of security data, and the resulting difficulties security analysts face seeking precise and efficient answers to alert-related questions. Participants were solicited from this population via social media, including LinkedIn and Twitter, mailing lists, industry partners, and contact lists. The researcher ensured prequalification with a job and role-specific question. Survey participants who did not meet population requirements were disqualified.
Data analysis for this study utilized a mixed ANOVA because it enables efficiency
while keeping variability low [20]. In other words, given the within-subjects component
of this study where all participants undertook the same three scenarios, a mixed ANOVA
allowed for partitioning out variability as a function of individual differences. Additionally,
a mixed ANOVA provided the benefit of efficiency while keeping variability low, thereby
keeping the validity of the results higher yet allowing for smaller subject groups [20].
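For readers who wish to reproduce this style of analysis, the following is a minimal sketch of a mixed ANOVA in Python using the pingouin library. The long-format layout, column names (participant, scenario, vis_max, score), and generated values are illustrative assumptions, not the study's dataset.

```python
# Minimal mixed ANOVA sketch (pingouin >= 0.5); data are placeholders, not the study data.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(1, 21), 3),        # 20 participants x 3 scenarios
    "scenario": np.tile([1, 2, 3], 20),                    # within-subjects factor
    "vis_max": np.repeat(["Yes"] * 14 + ["No"] * 6, 3),    # between-subjects factor (unbalanced)
    "score": rng.normal(108, 12, size=60).round(1),        # placeholder acceptance totals
})

# Mixed ANOVA: within = Scenario, between = Maximum Visual, dv = acceptance score.
aov = pg.mixed_anova(data=df, dv="score", within="scenario",
                     between="vis_max", subject="participant")
print(aov.round(3))
```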
2.3. Instrumentation
The TAM implies that positive perception of usefulness and ease of use (perceived
usability) influence intention to use, which in turn influences the actual likelihood of
use [21]. Original construction of the TAM for measurement of PU and PEU resulted in
a 12-item instrument that was shown to be reliable [22]. It consisted of the two factors
PU and PEU and was correlated with intentions to use and self-report usage [17]. This
quantitative, quasi-experimental, explanatory study utilized a 7-point Likert scale to assess
the level of acceptance and the perceived ease of use and perceived usefulness of alerts
in three scenarios (the within-subjects independent variable). The preferred alert output
(VAO or TAO) forms the basis of the between-subjects independent variable. Likert-type
scale response anchors set the range between agreement and disagreement; as an example,
1 indicated strong disagreement and 7 indicated strong agreement with a statement.
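As an illustration only, the per-scenario acceptance totals used later in the analysis (S1_tot, S2_tot, S3_tot) can be formed by summing the 7-point Likert responses for the 18 TAM items in each scenario. The wide-format layout and item-column naming below are hypothetical, not taken from the study's survey export.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical wide-format responses: 81 participants x 18 TAM items per scenario,
# each item on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree).
cols = [f"s{s}_q{i}" for s in (1, 2, 3) for i in range(1, 19)]
responses = pd.DataFrame(rng.integers(1, 8, size=(81, len(cols))), columns=cols)

# Per-scenario acceptance totals, analogous to S1_tot, S2_tot, and S3_tot.
for s in (1, 2, 3):
    item_cols = [f"s{s}_q{i}" for i in range(1, 19)]
    responses[f"S{s}_tot"] = responses[item_cols].sum(axis=1)

# Each total ranges from 18 (all items scored 1) to 126 (all items scored 7).
print(responses[["S1_tot", "S2_tot", "S3_tot"]].describe().round(2))
```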
2.4. Hypotheses
The following research questions served to determine if a relationship exists between
the dependent variable, which is the level of acceptance of alert output, and the two
independent variables, which are Scenario (1, 2, or 3) and Maximum Visual. Maximum
Visual had two levels: one where VAO was chosen for all scenarios and one where TAO
was chosen for some or all scenarios.
• Is there a difference in the level of acceptance of alert outputs between those preferring
VAO in all scenarios and those preferring TAO in some or all scenarios, as predicted
by the TAM?
	◦	Sub-questions:
		Does the adoption of VAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
		Does the adoption of TAO have a significant impact on the four individual TAM components: PU, PEU, AU, and IU?
The following research hypotheses explored the research questions for a relationship
between the independent variable of Maximum Visual (a preference for VAO in all scenarios
versus a preference for TAO in some or all scenarios), and the dependent variable, which
is the level of acceptance of alert outputs. The dependent variable is specific to security
analysts’ perception of machine learning (ML)- and data science (DS)-generated alert
output.
The null and alternative hypotheses are stated as:
H1: There is no significant difference in the level of acceptance of alert outputs between those
preferring VAO in all scenarios and those preferring TAO in some or all scenarios, as predicted by
the TAM.
H2: There is a significant difference in the level of acceptance of alert outputs between those
preferring VAO in all scenarios and those preferring TAO in some or all scenarios, as predicted by
the TAM.
Omnibus tests are applicable to these hypotheses, where H1: R-squared is equal to
0 and H2: R-squared is greater than 0. Table 2 highlights the relationship between the
research questions and the hypotheses.
Table 2. Research question and hypotheses testing.
Both parametric and non-parametric tests were performed. Mixed ANOVA tested
whether the level of acceptance of alert outputs is influenced by the within-subjects variable
Scenario and the between-subjects variable Maximum Visual. Mixed ANOVA was also
repeated for the four sub-scales of PU, PEU, AU, and IU, with Bonferroni corrections for
multiple comparisons. Additionally, a Mann–Whitney U test was performed, comparing
the level of acceptance of alert outputs of the two levels of Maximum Visual, and a Friedman
test compared the level of acceptance across the three scenarios.
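A compact sketch of the two non-parametric tests named above, using SciPy, follows; the arrays are placeholders standing in for the per-group and per-scenario acceptance totals, not the collected data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Placeholder acceptance totals for the two Maximum Visual groups (between-subjects).
vao_all = rng.normal(105, 20, size=59)   # chose VAO in all scenarios
mixed = rng.normal(95, 25, size=22)      # chose TAO in some or all scenarios

u_stat, u_p = stats.mannwhitneyu(vao_all, mixed, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {u_p:.3f}")

# Friedman test across the three repeated scenarios (within-subjects), one value per participant.
s1 = rng.normal(109, 14, size=81)
s2 = rng.normal(108, 16, size=81)
s3 = rng.normal(100, 25, size=81)
chi2, f_p = stats.friedmanchisquare(s1, s2, s3)
print(f"Friedman chi-square(2) = {chi2:.3f}, p = {f_p:.3f}")
```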
3. Results
3.1. Background
The specific business problem that oriented this study is: organizations risk data
breach, loss of valuable human resources, reputation, and revenue due to excessive security
alert volume and a lack of fidelity in security event data. To determine means of support
for security analysts experiencing these security event-specific challenges, the study asked
if there is a difference in the level of acceptance of security alert outputs between those
preferring VAO in all scenarios, and those preferring TAO in some or all scenarios, as
predicted by the TAM. The dependent variable was participants’ level of acceptance of
security alert output: the within-subjects independent variable is Scenario, and the between-
subjects independent variable is Maximum Visual (preference for VAO in all scenarios
versus preference for TAO in some or all scenarios). SurveyMonkey was utilized to deliver
an online survey to participants, from which the collected data were analyzed. The survey
queried a population of cybersecurity analysts and managers in SOC, DFIR, DART, and
TI roles, targeted for participation via social media. Twitter and LinkedIn were utilized.
The LinkedIn campaign included the use of Linked Helper to create a list of potential
participants whose profiles matched the desired role descriptions from connections in the
researcher’s network of 1411 connections as of this writing. The final filtered list resulted in
234 potential participants to whom an invitation to participate was sent. A 7-point Likert
scale survey queried participants regarding their perspectives on perceived ease of use and
perceived usefulness of ML and DS-generated alert output across three scenarios with TAO
and VAO results [23]. Of 119 respondents, 24 disqualified themselves and 95 identified
themselves as qualified, 81 of whom completed all 3 scenarios.
Figure 2. Standardized residual normality for Scenarios 1–3.
Given that the residuals are skewed, Friedman’s test was also conducted, as a non-parametric equivalent of a within-subjects one-way ANOVA. It only considers the impact of the within-subjects variable Scenario.
Finally, reliability was assessed with Cronbach’s alpha, which measures the internal consistency of questions related to the same issues across each of the three scenarios. Cronbach’s alpha ranges from 0 to 1, and scores between 0.7 and 0.9 were expected; the results for this study represent good consistency [27]. Using a scale comprised of 18 TAM questions for each scenario, and 81 valid cases, with 14 excluded (n = 95), the reliability statistic for each scenario as indicated by Cronbach’s alpha was 0.958 for Scenario 1, 0.971 for Scenario 2, and 0.986 for Scenario 3.
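The reliability figures above can be reproduced from item-level responses with a Cronbach's alpha computation such as the minimal sketch below; the generated item matrix is a stand-in assumption, not the study's responses.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a DataFrame whose columns are the items of one scale."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 7-point Likert responses for the 18 TAM items of one scenario (81 cases).
rng = np.random.default_rng(0)
base = rng.integers(3, 8, size=(81, 1))                     # shared per-case attitude
noise = rng.integers(-1, 2, size=(81, 18))                  # small per-item variation
scenario_items = pd.DataFrame(np.clip(base + noise, 1, 7))  # correlated items -> high alpha

print(f"Cronbach's alpha = {cronbach_alpha(scenario_items):.3f}")
```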
Within-Subjects Factors
Scenarios	Dependent Variable
1	S1_tot
2	S2_tot
3	S3_tot

Between-Subjects Factors
		Value Label	N
Maximum Visual	0.00	No	22
	1.00	Yes	59
Table 4. Cont.
Descriptive Statistics
	Maximum Visual	Mean	Std. Deviation	N
S1_tot	No	107.7273	11.65856	22
	Yes	110.2034	15.15754	59
	Total	109.5309	14.26454	81
S2_tot	No	104.7727	14.91223	22
	Yes	109.9661	15.87556	59
	Total	108.5556	15.70032	81
S3_tot	No	88.6364	29.03618	22
	Yes	104.6102	21.62136	59
	Total	100.2716	24.7255	81
The Maximum Visual variable (Vis_max) defined the participants who said yes to VAO
in all three scenarios, labeled Yes (N = 59), and the participants who selected a mix of VAO
and TAO or all TAO results across all three scenarios, labeled No (N = 22). Maximum Visual
is the study’s between-subjects independent variable. It was one of the main factors in the
mixed ANOVA, as can be seen in Table 5.
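As an illustrative sketch of how this between-subjects factor can be derived, assuming each participant's per-scenario output choice is recorded in hypothetical columns s1_choice, s2_choice, and s3_choice:

```python
import pandas as pd

# Hypothetical per-scenario output choices ("VAO" or "TAO") for each participant.
responses = pd.DataFrame({
    "participant": [1, 2, 3, 4],
    "s1_choice": ["VAO", "VAO", "TAO", "VAO"],
    "s2_choice": ["VAO", "TAO", "TAO", "VAO"],
    "s3_choice": ["VAO", "VAO", "TAO", "VAO"],
})

choice_cols = ["s1_choice", "s2_choice", "s3_choice"]
# Vis_max = 1 ("Yes") only when VAO was chosen in all three scenarios; otherwise 0 ("No").
responses["vis_max"] = (responses[choice_cols] == "VAO").all(axis=1).astype(int)
print(responses)
```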
Table 5. Maximum Visual IVs (between-subjects factors).
Table 6. Independent samples Mann–Whitney U test summary.
Total N	81
Mann–Whitney U	863.500
Test Statistic	863.500
Standard Error	94.140
Standardized Test Statistic	2.279
Asymptotic Sig. (2-sided test)	0.023
The Mann–Whitney U test indicates that there is a significant difference (U = 863.5, p = 0.023) in the level of acceptance of alert output between the respondents who selected visual output across all scenarios (n = 59) and the respondents who provided mixed responses (n = 22). As such, the null hypothesis, that there is no statistically significant difference in the level of acceptance of alert output between those who preferred VAO in all scenarios and those preferring TAO in some or all scenarios, is rejected.
The effect size is calculated by dividing the standardized test statistic, Z, by the square root of the number of pairs: r = Z/√N = 2.279/√81 = 0.253. According to Cohen’s classification of effect, this is a moderate effect, given thresholds of 0.1 (small effect), 0.3 (moderate effect), and 0.5 and above (large effect).

3.7. Friedman Test
A related samples Friedman test was conducted to assess the measurements of the same dependent variable under different conditions for each participant, namely the three scenarios for this study defined by the variables S1_tot, S2_tot, and S3_tot. Rank frequencies are shown in Figure 4 and the statistical summary is represented in Table 7.
Figure 4. Related samples Friedman’s two-way ANOVA by ranks.
Table 7. Friedman test summary.
Total N	81
Test Statistic	5.496
Degree of Freedom	2
Asymptotic Sig. (2-sided test)	0.064
The Friedman test carried out to compare the score ranks for the three scenarios found no significant difference between scenarios: χ2(2) = 5.496, p = 0.064. The result indicates that scenario mean ranks did not differ significantly from scenario to scenario when not also factoring for responses based on output preference (Maximum Visual). Effect size was not applicable as no measurable significance was found.
Within-Subjects Effect	Mauchly’s W	Approx. Chi-Square	df	Sig.	Greenhouse–Geisser
Scenarios	0.625	36.652	2	0.000	0.727
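The sphericity check and Greenhouse–Geisser correction reported above can be obtained in pingouin roughly as follows; the long-format frame and column names are the same placeholder assumptions used in the earlier mixed ANOVA sketch.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Long-format placeholder data: 20 participants x 3 scenarios.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(1, 21), 3),
    "scenario": np.tile([1, 2, 3], 20),
    "score": rng.normal(105, 15, size=60),
})

# Mauchly's test of sphericity for the within-subjects factor.
spher = pg.sphericity(df, dv="score", within="scenario", subject="participant")
print(f"W = {spher.W:.3f}, chi2({spher.dof}) = {spher.chi2:.3f}, p = {spher.pval:.4f}")

# Greenhouse-Geisser epsilon, used to correct the within-subjects degrees of freedom
# when sphericity is violated (the study reports W = 0.625, GG epsilon = 0.727).
eps = pg.epsilon(df, dv="score", within="scenario", subject="participant", correction="gg")
print(f"Greenhouse-Geisser epsilon = {eps:.3f}")
```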
Figure 6. Estimated marginal means—PU.
Figure 7. Estimated marginal means—perceived ease of use (PEU).
Figure 8. Estimated marginal means—attitude toward using (AU).
3.12. Mixed ANOVA—Intention to Use (IU)
Two-way mixed ANOVA (mixed ANOVA) with Bonferroni correction, computed
using α = 0.0125, was performed for IU in isolation. α = 0.0125 was applicable as one
quarter of α = 0.05 given that the TAM measures related to IU represent one of four tests of
related measures.
Mixed ANOVA was again applied, where the within-subjects variables equating to
score totals for each of the three study scenarios were represented by Intention2Use (IUS1_tot,
IUS2_tot, and IUS3_tot), and between-subjects factors were again represented by Maximum
Visual (Vis_max), labeled as Yes (n = 59) and No (n = 22).
Participants were presented with three scenarios exhibiting security alert output for
the results of applied models, where the output was both VAO and TAO. A mixed ANOVA
computed using α = 0.0125 with a Greenhouse–Geisser correction showed that scores
varied significantly across scenarios specific to Intention to Use (IUS1_tot, IUS2_tot, and
IUS3_tot) in tests of within-subject effects, and significantly again when differentiated for
Maximum Visual:
Scenarios: (F (1.447, 114.327) = 24.493, p < 0.001, ηp2 = 0.237)
Scenarios∗Vis_max: (F (1.447, 114.327) = 5.728, p = 0.009, ηp2 = 0.068)
Post hoc tests using the Bonferroni correction revealed that favorable scores for IU de-
creased insignificantly from Scenario 1 to Scenario 2 by an average of 0.304 points (p = 0.758)
but declined significantly from Scenario 1 to Scenario 3 by an average of 2.692 points
(p < 0.001). A significant decrease was noted from Scenario 2 to Scenario 3 by an addi-
tional 2.388 points (p < 0.001). The differences in scores were not meaningful between
Scenarios 1 and 2 (IUS1_tot and IUS2_tot) and Maximum Visual (Vis_max) = No, but
were quite impactful between Scenarios 2 and 3 (IUS2_tot and IUS3_tot) and Maximum
Visual (Vis_max) = No. As is consistent throughout this analysis, there was a significant
difference noted in Scenario 3 (IUS3_tot) compared to Scenarios 1 and 2, as well as Max-
imum Visual = Yes versus Maximum Visual = No. Again, a substantial 19% decrease in
mean score for Maximum Visual = No was noted in Scenario 3 as compared to Scenario 2,
indicating a significant decrease in IU for participants selecting TAO. As is the case for AU,
no change in IU was noted for participants selecting VAO for Scenario 2 as compared to
Scenario 1. Also noteworthy was the largest percentage of decrease in mean scores of all
results recorded, specifically for Scenario 3, indicating that intention to use was low for any
aspect of Scenario 3, TAO, or VAO.
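The Bonferroni-corrected pairwise comparisons between scenarios described above can be sketched with pingouin's pairwise routine (pairwise_tests in pingouin ≥ 0.5.3); as before, the data layout and values are assumptions rather than the study's data.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Long-format placeholder IU subscale totals: 20 participants x 3 scenarios.
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(1, 21), 3),
    "scenario": np.tile([1, 2, 3], 20),
    "vis_max": np.repeat(["Yes"] * 14 + ["No"] * 6, 3),
    "iu_tot": rng.normal(24, 4, size=60),
})

# Post hoc pairwise comparisons across scenarios with Bonferroni adjustment,
# including the between-subjects factor (Maximum Visual).
posthoc = pg.pairwise_tests(data=df, dv="iu_tot", within="scenario",
                            between="vis_max", subject="participant",
                            padjust="bonf")
print(posthoc[["Contrast", "A", "B", "p-unc", "p-corr"]].round(4))
```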
Via estimated marginal means between-subjects, where Maximum Visual = Yes or
Maximum Visual = No, inclusive only of IU data with α = 0.0125 and Bonferroni correc-
tion, pairwise comparisons yielded a small 1.423 point mean difference in favor of VAO,
insignificant at p = 0.040. As such, there was not a significant main effect of Maximum
Visual scores (F (1, 79) = 4.378, p = 0.040, ηp2 = 0.053) on the level of acceptance of alert
output, as indicated by the sum of participants’ scores for IU. These results are represented
visually in Figure 9.
Figure 9. Estimated marginal means—intention to use (IU).
The mixed ANOVA using α = 0.05 with a Greenhouse–Geisser correction was signifi-
cant when differentiated for Maximum Visual: F (1.455, 114.915) = 5.634, p = 0.010.
Table 11 represents the outcomes for parametric tests of between-subjects effects.
The mixed ANOVA using α = 0.05 with Bonferroni adjustment was significant:
F (1, 79) = 4.111, p = 0.046.
In summary, the null hypothesis was rejected, as follows:
• Non-parametric: U = 863.5, p = 0.023
• Parametric:
# Within-subjects: (F (1.455, 114.915) = 5.634, p = 0.010, ηp2 = 0.067)
# Between-subjects: (F (1, 79) = 4.111, p = 0.046, ηp2 = 0.049)
As such, for RQ1 (Is there a difference in the level of acceptance of security alert output between those with a preference for VAO and those with a preference for TAO, with VAO and TAO generated via data science/machine learning methods, as predicted by the TAM?), the answer is yes.
Additional sub-questions were examined in this analysis. Specifically, the sub-questions
are stated as:
• SQ1: Does the adoption of VAO have a significant impact on the four individual TAM
components: PU, PEU, AU, and IU?
• SQ2: Does the adoption of TAO have a significant impact on the four individual TAM
components: PU, PEU, AU, and IU?
Outcomes indicate mixed results in answering the sub-questions. Table 12 states the
results of within-subjects effects per individual TAM components.
The within-subjects findings indicated that PU and PEU were not significantly influ-
enced by the adoption of VAO or TAO, while AU and IU were significantly influenced by
the adoption of VAO. Table 13 states the results of between-subjects effects per individual
TAM components.
TAM Factor	Adjustment	df	F	Sig.	Partial Eta Squared	Observed Power
PU	Bonferroni	1	7.643	0.007	0.088	0.584
PEU	Bonferroni	1	0.842	0.362	0.011	0.055
AU	Bonferroni	1	4.566	0.036	0.055	0.343
IU	Bonferroni	1	4.378	0.040	0.053	0.328
α = 0.0125.
The between-subjects findings indicate that PU was the only TAM component to be
significantly influenced by the adoption of VAO.
As a result, the answer to SQ1 is yes, in part:
• The TAM components PU and PEU were not significantly influenced by the adoption
of VAO within-subjects, while AU and IU were significantly influenced by the adoption
of VAO within-subjects.
• The TAM component PU was significantly influenced by the adoption of VAO
between-subjects.
The answer to SQ2 is universally no. No individual TAM component was significantly
influenced by TAO adoption, and TAO adoption trailed VAO in near totality.
3.14. Summary
The results indicate that there was a difference in acceptance as predicted by TAM. The
dependent variable, security analysts’ level of acceptance of security alert output, and the
two independent variables, Scenario and ML/DS-generated alert output (TAO and VAO),
were assessed with non-parametric and parametric methods. Both the Mann–Whitney U
test and the mixed ANOVA determined that there was a difference between the acceptance
of VAO and TAO in favor of VAO. The mixed ANOVA also demonstrated that two of the
TAM factors, AU and IU, were influenced by the adoption of VAO and TAO.
4. Discussion
4.1. Discussion of the Results
This study sought to determine if there is a difference between the adoption of VAO
and TAO generated via data science/machine learning methods as predicted by the TAM.
The related hypothesis tested for significant differences in the level of acceptance of alert
outputs between those preferring VAO in all scenarios and those preferring TAO in some or
all scenarios, as predicted by the TAM. The null hypothesis was rejected. A non-parametric
test, the Mann–Whitney test, indicated a significant difference in the level of acceptance
of output between those preferring visual alerts in all scenarios, and other preferences
(U = 863.5, p = 0.023). This result was repeated in the between-subjects element of a
mixed ANOVA, F (1, 79) = 4.111, p = 0.046, ηp2 = 0.049. The within-subjects element
of the mixed ANOVA, relating to different responses to each scenario, was also statisti-
cally significant, F (1.455, 114.915) = 5.634, p = 0.010, ηp2 = 0.067. These results indicate a
statistically significant difference in perception that favors VAO.
and usability. This research intentionally joined these tenets to improve security analysts’
experience with optimized alert output derived from ML/DS to address challenges of
scale, detection fidelity, and usability. This contribution to the body of knowledge enables
industry and academia to further refine security detection methods and products to reduce
risk and better protect organizations. Specific contributions follow, and are discussed
further in Section 4.4:
• Enables industry, service, and application providers to develop, deploy, and utilize
tools and capabilities that include visualizations for alert output.
• Indicates that the interface for security analysts working daily with such tools and
capabilities offers a favorable user experience that is rich in visual features.
• Clarifies that issues specific to this study’s problem statement can be rectified with
visual alert output derived from machine learning and data science intended to reduce
the burden on security analysts.
4.3. Limitations
This study’s results did not conform to expectations for normality, exhibiting a note-
worthy skew towards strongly agree, or a 7 on the Likert scale. Bias may have been
introduced in two distinct ways. First, TAM-based user experience (UX) studies are best
delivered using a left-to-right layout, where 1 = Extremely disagree and 7 = Extremely
agree [18]. Additionally, Lewis suggested that all questionnaire items have a positive tone
such that greater levels of agreement indicate a better user experience [18]. This could
explain why the normality histograms as seen in Figures 2–4 show such a strong skew to
the right (strongly agree). Second, the researcher may have introduced additional bias by
describing the VAO with a caveat stating that users who selected visual output would have
the ability to mouse over the graphical interface and interact with specific data points. No
such additional benefit or opportunity was discussed for users who preferred TAO.
Scenario 3 included a dynamic, animated visualization, where alert counts moved
through days of the month over a five-month period. The researcher asserts that this visual
was not met with positive perception and was likely viewed as low quality and difficult
to interpret as compared to the static visuals seen in Scenarios 1 and 2. Additionally,
the researcher did not randomize the scenarios as delivered to participants. As such, all
participants received the scenarios in the same order. Thus, order effects could explain
the decline in positive perception of Scenario 3 for participants. Order effects refer to
the phenomenon where different orders for the presentation of questions, or response
alternatives, may systematically influence respondents’ answers [28]. Scores may decrease
over time from fatigue, or increase due to learning, and order effects can interfere with
estimates of the effect of the treatment itself during analysis, a disadvantage of repeated
measures designs [29].
5. Conclusions
Organizations dealing with a high volume of security alert and event data, that are
also facing a high burden due to alert overload, should consider implementing features
and capabilities that incorporate visual alert output. These organizations risk data breach,
loss of valuable human resources, reputation, and revenue due to excessive security alert
volumes and a lack of fidelity in security event data. Visualization can benefit security
analysts faced with these burdens on behalf of their organizations. This quantitative, quasi-
experimental, explanatory study determined that security analysts perceive improved
usability of security alert output that is visualized rather than text-based. The related
hypothesis tested for significant differences in the level of acceptance of output between
those affirming a maximum visual preference (three out of three scenarios) and those
showing a preference for text in at least one scenario. The results determined that those
showing maximum visual preference had a significantly higher acceptance of alert output
(U = 863.5, p = 0.023). This finding was also supported by the main between-subjects effect
of a mixed ANOVA, F (1, 79) = 4.111, p = 0.046, ηp2 = 0.049. The ANOVA’s within-subjects
main effect (scenario) was also statistically significant, F (1.455, 114.915) = 5.634, p = 0.010,
ηp2 = 0.067. All supporting data are available in the Supplementary Materials, including a
literature review. These findings represent an opportunity to enhance and enable higher-
order analysis, including detection development, tuning, and validation, as well as threat
hunting and improved investigations: cut the noise, hone the signal.
Supplementary Materials: The following supporting information can be downloaded at: https:
//github.com/holisticinfosec/Optimized-Alerts-Usability-Study (accessed on 24 May 2022).
Funding: This research received no external funding.
Institutional Review Board Statement: The study was conducted in accordance with the Declaration
of Helsinki, and approved by the Institutional Review Board of Capitol Technology University
(approved 28 September 2020).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Conflicts of Interest: The author declares no conflict of interest.
References
1. Khan, M. Security Analysts Are Overworked, Understaffed and Overwhelmed—Here’s How AI Can Help. Available online: https:
//securityintelligence.com/security-analysts-are-overworked-understaffed-and-overwhelmed-heres-how-ai-can-help (accessed
on 21 September 2020).
2. Cambridge Intelligence. Visualizing Cyber Security Threats. Available online: https://cambridge-intelligence.com (accessed on 1
May 2021).
3. Slatman, H. Unboxing Security Analytics: Towards Effective Data Driven Security Operations; Computer Science, University of Twente:
Enschede, The Netherlands, 2016.
4. Dimensional Research. 2020 State of SecOps and Automation. Available online: https://www.sumologic.com/brief/state-of-
secops (accessed on 5 May 2021).
5. Bartos, V.; Zadnik, M.; Habib, S.M.; Vasilomanolakis, E. Network entity characterization and attack prediction. Future Gener.
Comput. Syst. 2019, 97, 674–686. [CrossRef]
6. Sundaramurthy, S.C.; Bardas, A.G.; Case, J.; Ou, X.; Wesch, M.; McHugh, J.; Rajagopalan, S.R. A Human Capital Model for
Mitigating Security Analyst Burnout. In Proceedings of the Symposium on Usable Privacy and Security, Ottawa, ON, Canada,
22–24 July 2015.
7. Paul, C.L.; Dykstra, J. Understanding operator fatigue, frustration, and cognitive workload in tactical cybersecurity operations. J.
Inf. Warf. 2017, 16, 1–11.
8. FireEye. The Numbers Game: How Many Alerts Is Too Many to Handle? Available online: https://www.fireeye.com/offers/rpt-
idc-numbers-game-special-report.html (accessed on 11 June 2020).
9. Seals, T. Less Than 1% of Severe/Critical Security Alerts Are Ever Investigated. Available online: https://www.infosecurity-
magazine.com/news/less-than-1-of-severe-critical (accessed on 12 July 2021).
10. Oltsik, J. The Problem with Collecting, Processing, and Analyzing More Security Data. Available online: https://www.esg-global.
com/blog/the-problem-with-collecting-processing-and-analyzing-more-security-data (accessed on 10 April 2021).
11. CriticalStart. The Impact of Security Alert Overload. Available online: https://www.criticalstart.com (accessed on 10 April 2021).
12. Kohgadai, A. Alert Fatigue: 31.9% of IT Security Professionals Ignore Alerts. Available online: https://www.skyhighnetworks.
com/cloud-security-blog/alert-fatigue-31-9-of-it-security-professionals-ignore-alerts (accessed on 10 April 2021).
13. Chickowski, E. Every Hour SOCs Run, 15 Minutes Are Wasted on False Positives. Available online: https://securityboulevard.
com/2019/09/every-hour-socs-run-15-minutes-are-wasted-on-false-positives (accessed on 2 September 2019).
14. Ponemon. 2020 Devo SOC Performance Report: A Tale of Two SOCs. Available online: https://www.devo.com (accessed on 8
February 2021).
15. Giacobe, N.A. Measuring the Effectiveness of Visual Analytics and Data Fusion Techniques on Situation Awareness in Cyber-Security;
Penn State University: State College, PA, USA, 2013.
16. Venkatesh, V.; Davis, F.D. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Manage. Sci.
2000, 46, 186–204. [CrossRef]
17. Davis, F.D. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Q. 1989, 13,
319–340. [CrossRef]
18. Lewis, J.R. Comparison of Four TAM Item Formats: Effect of Response Option Labels and Order. J. Usability Stud. 2019, 14,
224–236.
19. Deutskens, E.; De Ruyter, K.; Wetzels, M.; Oosterveld, P. Response rate and response quality of Internet-based surveys: An
experimental study. Mark. Lett. 2004, 15, 21–36. [CrossRef]
20. Lumen. Repeated-Measures ANOVA. Boundless Statistics. Available online: https://courses.lumenlearning.com/boundless-
statistics/chapter/repeated-measures-anova (accessed on 19 August 2021).
21. Lewis, J.R.; Utesch, B.S.; Maher, D.E. Measuring Perceived Usability: The SUS, UMUX-LITE, and AltUsability. Int. J. Hum. Comput.
Interact. 2015, 31, 496–505. [CrossRef]
22. Szajna, B. Software evaluation and choice: Predictive validation of the technology acceptance instrument. MIS Q. 1994, 18, 319.
[CrossRef]
23. Shahrabi, M.A.; Ahaninjan, A.; Nourbakhsh, H.; Ashlubolagh, M.A.; Abdolmaleki, J.; Mohamadi, M. Assessing psychometric
reliability and validity of Technology Acceptance Model (TAM) among faculty members at Shahid Beheshti University. Manag.
Sci. Lett. 2013, 3, 2295–2300. [CrossRef]
24. U.S. Bureau of Labor Statistics. Information Security Analysts: Occupational Outlook Handbook: U.S. Bureau of Labor
Statistics. Available online: https://www.bls.gov/ooh/computer-and-information-technology/information-security-analysts.
htm (accessed on 14 June 2019).
25. Barlett, J.E.; Kotrlik, J.W.; Higgins, C.C. Organizational research: Determining appropriate sample size in survey research. Inf.
Technol. Learn. Perform. J. 2001, 19, 43–50.
26. Hoskin, T. Parametric and Nonparametric: Demystifying the Terms. Available online: https://www.mayo.edu/research/
documents/parametric-and-nonparametric-demystifying-the-terms/doc-20408960 (accessed on 19 August 2021).
27. Lane, D.M. Online Statistics Education: A Multimedia Course of Study. Available online: https://onlinestatbook.com (accessed
on 19 August 2021).
28. Strack, F. Order Effects in Survey Research: Activation and Information Functions of Preceding Questions. In Context Effects in
Social and Psychological Research; Schwarz, N., Sudman, S., Eds.; Springer: New York, NY, USA, 1992; pp. 23–34.
29. Minitab Blog Editor. Repeated Measures Designs: Benefits, Challenges, and an ANOVA Example. Available online: https://blog.
minitab.com/en/adventures-in-statistics-2/repeated-measures-designs-benefits-challenges-and-an-anova-example (accessed
on 19 August 2021).
30. Ben-Asher, N.; Gonzalez, C. Effects of cyber security knowledge on attack detection. Comput. Hum. Behav. 2015, 48, 51–61.
[CrossRef]