Data Visualization Final
Student Name
Institution
Department
Course
Module
Lecturer
Submission date
Data Visualization
Knowledge Building
Data visualization refers to the methods used to communicate content or information visually
by encoding it as graphic elements (Tschandl et al., 2018). The topic selected is “Prevalence
of Mental Disorders and Substance Use Disorders”. Through data visualization, the selected
dataset is analyzed to provide more information on how psychological disorders prevail in
various parts of the world. The charts created also provide further information on the nature
of mental conditions over the last three decades. The dataset is made up of 9 columns and
6840 rows. The nine columns include an entity column, which holds the name of the country or
region where the data were collected, a year column, and seven columns that hold the
prevalence percentage for specific mental illnesses. These columns are schizophrenia,
alcohol use, drug use, anxiety, depression, bipolar disorder, and eating disorders.
From a glimpse of the dataset it is evident that depression as well as anxiety is the most
Datasets are often created for specific research or practical purposes, and can be
obtained from a variety of sources such as government agencies, research institutions, and
online databases (Vieira et al., 2018). In some cases, a researcher or organization may create
their own dataset by collecting data through surveys, experiments, or other methods. There
are many possible sources for datasets on mental disorders. Some examples include:
Government agencies such as the Centers for Disease Control and Prevention (CDC)
or the World Health Organization (WHO) may collect data on the prevalence and
Research institutions and universities may conduct studies on mental disorders and
Online databases such as Kaggle or the Open Science Framework may host datasets
My choice of this particular dataset on mental disorders was informed largely by the
visualization topic chosen, along with several other factors, including the research
question being addressed, the availability of the data, and the suitability of the data for the
intended analysis (Mirman, 2017). It is important to carefully consider the limitations and
biases of any dataset, and to properly cite the source of the data in any published work.
Visualizing the dataset revealed a number of observations. Firstly, the dataset contains
records from all over the world, organized by country, continent, and region. To make
visualization manageable, the data had to be filtered by continent, region, and country. For
instance, filtering for England shows that bipolar disorder has been on the increase over the
last three decades, as seen in Figure 1 below.
anxiety though anxiety seems to have emerged towards the end of the last decade. The
visualization also makes it clear that the depression factor of bipolar disorder in England
has been declining over the years. Figure 2 below shows bipolar disorder prevalence in England with other
As shown in Figure 3, the trend line shows a steady growth in bipolar cases in
England. This visualization is informed by the fact that the majority of people who are
diagnosed with bipolar disorder also suffer from an anxiety disorder. These include post-traumatic stress
disorder (PTSD), generalized anxiety disorder (GAD), panic disorder, as well as social
phobia (Lee and Yoon, 2017). Anxiety and depression, either on their own or in conjunction
with another mental health condition, have been linked to a heightened likelihood of suicidal
From the above visualizations, it is clear that depression levels were high at the
beginning of the study period. Also notable is a slight increase in depression
immediately after 2010. This could be a result of the inflation and economic meltdown of
2008.
Theoretical Framework
ASSERT Framework
The ASSERT model comprises the following six components: Ask a question to
looking for evidence. Organize this information so that it can provide a response to the query,
Imagine other ways to respond to the question using the information that is currently
accessible. Finally, after you have represented the data in a relevant visualization for the
purpose of answering the question, use it to tell a meaningful story (Rees and Laramee,
2019). The diagram below shows the levels of the ASSERT framework utilized.
Ask
The question asked for the above visualization is: “What is the relationship of bipolar
disorder to anxiety and depression disorders?” The question makes clear which specific data
are to be searched for on the internet or in the dataset being analyzed.
Search
The dataset was obtained from the Our World in Data website, https://ourworldindata.org/.
This dataset fully corresponds to the type of visualization envisioned for this task.
Structure
The data were filtered by country so that only records from England were analyzed.
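The filtering step can be sketched in a few lines. The charts in this paper were produced in R Studio, so the Python below is only an illustration of the idea, not the code actually used. The column names, the `filter_by_entity` helper, and the sample values are all hypothetical placeholders and are not taken from the real dataset.

```python
import csv
import io

# Hypothetical sample rows. Column names and values are illustrative
# placeholders only; they are NOT taken from the actual dataset.
SAMPLE_CSV = """Entity,Year,Bipolar,Anxiety,Depression
England,1990,1.0,4.0,3.5
England,1991,1.1,4.0,3.4
France,1990,0.9,5.0,4.0
"""


def filter_by_entity(csv_text, entity):
    """Return the rows whose Entity column equals `entity`.

    Year is parsed as an int and every other column as a float.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        if row["Entity"] != entity:
            continue
        parsed = {"Entity": row["Entity"], "Year": int(row["Year"])}
        for name, value in row.items():
            if name not in ("Entity", "Year"):
                parsed[name] = float(value)
        rows.append(parsed)
    return rows


england = filter_by_entity(SAMPLE_CSV, "England")
```

With the full dataset, the same function would be called on the text of the downloaded file instead of `SAMPLE_CSV`.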
Envision
I researched on the internet previous visualizations related to mental disorders,
especially in England, looking at how they were visualized and the kind of questions they answered.
Represent
Use of R Studio to design visual scatterplot charts to determine the relationship
Tell
Explained the results visualized conclusively.
Grammar of Graphics
Data: The data for this visualization consists of a table with nine columns: "Entity", “Year”,
The rows represent the population in percentage for individuals living with the psychological
Aesthetics: The variable "Anxiety" is mapped to the color aesthetic, so each anxiety level is
represented with a different color. The variable "Year" is mapped to the x-axis position, so
the populations are plotted at different points along the x-axis depending on the year. The
variable "Bipolar" is mapped to the y-axis position, so the height of each point on the plot
Geometry: The geometry used in this visualization is points, with each point representing the
Scales: The y-axis uses a linear scale, with the minimum value set to 0 and the maximum
value determined by the maximum bipolar prevalence in the data. The x-axis uses an ordinal
Coordinate systems: The visualization uses a Cartesian coordinate system, with the x-axis
Annotations: The title and axis labels provide additional context for the visualization.
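The linear y-axis scale described above can be illustrated with a small sketch. Plotting libraries handle this internally, so the code is only meant to show what a linear scale does: it maps a data value to a position on the chart. The `linear_scale` helper, the pixel range, and the sample values are assumptions made for illustration and are not part of the original analysis.

```python
def linear_scale(domain_min, domain_max, range_min, range_max):
    """Build a function mapping data values linearly onto plot positions."""
    if domain_max == domain_min:
        raise ValueError("domain must have non-zero width")
    span = domain_max - domain_min

    def scale(value):
        # Fraction of the way through the data domain, then stretched
        # onto the output range.
        fraction = (value - domain_min) / span
        return range_min + fraction * (range_max - range_min)

    return scale


# Hypothetical bipolar prevalence values (percent), for illustration only.
bipolar = [0.25, 0.5, 1.0]

# Domain runs from 0 to the largest value in the data, as described above.
# On screen, y = 400 is taken to be the bottom of the plot and y = 0 the top.
y_scale = linear_scale(0.0, max(bipolar), 400.0, 0.0)
positions = [y_scale(v) for v in bipolar]
```

Because the domain starts at 0, the distance of each point from the bottom of the plot is proportional to the prevalence it represents.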
Accessibility
Accessibility in visualization refers to the design and use of visualizations in a way
that is inclusive and usable for people with a wide range of abilities and disabilities. This
includes considerations such as visual acuity, color perception, and cognitive abilities, as well
as factors such as cultural and linguistic diversity (Linderman et al., 2019). By considering
these factors, visualizations can be made more accessible and usable for a wider audience.
These visualizations are designed to represent data in a way that is easy to understand
and interact with, even for users who may have visual impairments, hearing impairments,
cognitive impairments, or motor impairments. The figure below shows a chart designed with
There are a number of different approaches that can be taken when creating
accessible dataset visualizations. For example, designers can use high-contrast colors, large
font sizes, and clear labels to make the visualization easier to read for people with visual
impairments (Kraak and Ormeling, 2020). They can also provide audio descriptions of the
Additionally, they can simplify the visualization by using simple shapes, patterns, and colors
to represent the data, making it easier to understand for users with cognitive impairments.
Finally, they can make the visualization touch-based, allowing users with motor impairments
There are many benefits to using accessible dataset visualizations. By making data
more accessible and understandable, these visualizations can help to promote more informed
decision-making and better understanding of complex concepts. They can also help to break
down barriers to information and ensure that all users, regardless of their abilities, have equal
and understandable for all users. By designing these visualizations with accessibility in mind,
we can help to ensure that everyone has the opportunity to fully engage with and understand
Visual clarity: Visualizations should be designed to be clear and easy to read, using
Color accessibility: Colors should be chosen and used in a way that is legible and
of the data, such as text descriptions or data tables, to enable users with visual
Usability: Visualizations should be easy to use and navigate, with clear labels and
data more accessible and understandable for a wider range of users. This can be particularly
important for users who may rely on visualizations to understand and interpret data, such as
researchers, analysts, and decision-makers (Goldman et al., 2019). The visuals created
relied mainly on color to show the required trends. I used high-contrast colors, large
font sizes, and clear labels to make the visualizations easier to read for people with
visual impairments. I also simplified the visualizations by using simple shapes, patterns,
and colors to represent the data, making them easier to understand for users with
cognitive impairments.
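One way to check whether a color pair is "high-contrast" is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) 2.x. The sketch below implements that published formula. It was not part of the original analysis, and the function names are my own. As a rough guide, WCAG 2.1 asks for at least 4.5:1 for normal text and at least 3:1 for graphical objects.

```python
def _linearize(channel):
    """Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255.0
    if c <= 0.03928:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with channels in 0..255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Black text on a white background gives the maximum possible ratio.
black_on_white = contrast_ratio((0, 0, 0), (255, 255, 255))
```

A point color or label color could be checked against the chart background in the same way before the chart is finalized.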
Visualization Choice
A scatterplot is a type of data visualization that uses points to represent the values of
two different variables. It is often used to show the relationship between two variables, such
as the relationship between age and income (Murray, 2017). The choice of a scatterplot as the
visualization method can be justified based on the goal of the visualization. Since the goal is
to show the relationship between two variables, a scatterplot was an effective choice because it
allows the viewer to see the distribution of the data and how the two variables are correlated.
Scatterplots are particularly useful for identifying patterns and trends in the data, such as a
lines or regression lines to help show the strength and direction of the relationship between
the two variables (Sullivan et al., 2017). This can be useful for making predictions or for
identifying outliers in the data. The scatterplot is a flexible and effective visualization method
for showing the relationship between two variables, making it a good choice for many data
analysis tasks.
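The trend line and the strength of the relationship mentioned above can be computed with ordinary least squares and the Pearson correlation coefficient. The sketch below is a minimal illustration in plain Python. The charts in this paper were built in R Studio, so this is not the original code, and the sample values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys (-1 to 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only, not taken from the dataset.
years = [1990, 2000, 2010, 2019]
bipolar = [0.90, 0.95, 1.02, 1.05]

slope, intercept = fit_line(years, bipolar)
r = pearson_r(years, bipolar)
```

A positive slope corresponds to an upward trend line, and a value of `r` close to 1 or -1 indicates a strong linear relationship between the two variables.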
There are several alternative visualization methods that can be used to show the
relationship between two variables, depending on the characteristics and goals of the data
1. Line plots: Line plots show the relationship between two variables by connecting
data points with a line. They can be useful for showing trends over time or for
comparing multiple groups. However, line graphs can be difficult to compare when
there are multiple lines on the same graph, as the viewer has to mentally combine the
lines to compare the trends (Luo et al., 2018). This can be especially challenging
when the lines are closely spaced or have different scales. Additionally, line graphs do
not show the distribution of the data, so it can be difficult to see the underlying
2. Bar charts: Bar charts can be used to show the relationship between two variables
by placing one variable on the x-axis and the other on the y-axis. This can be useful
for comparing the distribution of the data or for showing the relationship between a
categorical variable and a continuous variable. The downside of a bar chart for this
particular visualization is its limited ability to show large amounts of data.
When there are many data points, it can be difficult to fit all of the bars on a single
chart without making the chart cluttered or hard to read. Bar charts are also typically
used to compare the distribution of a categorical variable, so they are not well-suited
for showing trends over time or the relationship between two continuous variables.
3. Heat maps: Heat maps use color to show the relationship between two variables by
placing the values of one variable on the x-axis and the values of the other variable
on the y-axis. They can be useful for showing patterns and trends in the data,
particularly when there are many data points. However, heat maps have limited ability
to show individual data points: because values are encoded as color, it can be
difficult to see the individual data points or to determine precise values from the
map. Scatterplots, on the other hand, use individual points to represent the data,
making it easier to see and interpret each one.
4. Bubble charts: Bubble charts are similar to scatterplots, but they use the size of the
points to represent a third variable. This can be useful for showing the relationship
between three variables or for adding an additional level of detail to the visualization.
Bubble charts can be difficult to interpret accurately, particularly when the data has a
wide range of values or when the bubbles are densely packed. This can make it
difficult to determine precise values from the chart. Additionally, bubble charts can be
as the size of the bubbles. This can make them harder to interpret and less suitable for
certain audiences.
The choice of visualization method will depend on the goals of the data analysis, the
characteristics of the data, and the preferences of the audience (Liu et al., 2018). It may be
helpful to experiment with different visualization methods to find the one that best
Ethical Implications
The dataset used and the topic chosen both relate to the health information of people
around the world (Martin, 2020). Using visualizations to represent health data can have
important ethical implications, as the way that the data is represented can significantly affect
how it is perceived and understood by the viewer (Pu and Kay, 2020). It is important to
consider the ethical implications of using visualizations in health datasets and to take steps to
ensure that the visualizations are used in a responsible and transparent manner (LaRossa and
Bennett, 2018). Some potential ethical considerations when using visualizations in health
datasets include:
personal or identifying information about the individuals in the dataset, as this could
represent the data, and to avoid using techniques that might mislead the viewer.
carefully designed and used. It is important to consider the potential impacts of the
visualizations on different groups of people and to ensure that they do not reinforce
harmful stereotypes.
using their health data in visualizations, and to ensure that they understand how the
Visualizations can be a powerful tool for communicating health data and insights, but
they can also be used to misinform the public or arrive at inaccurate conclusions if they are
not designed or used correctly (Nolte et al., 2018). One example is the selective presentation
of data: visualizations can be used to present only a portion of the health data, or to exclude
certain data points, in order to support a particular viewpoint or conclusion. This can give a
misleading or incomplete picture of the data and can lead to inaccurate conclusions about the
Some analysts could also use misleading scales in their visualization. The choice of
scale on an axis can significantly affect the appearance of the data and the conclusions that
are drawn from it (Sedrakyan et al., 2019). For example, using a truncated or very narrow axis
range can make small differences appear larger than they are, which could lead to overstating the importance of a
particular treatment or risk factor. The way that data is encoded in a visualization, such as the
use of color or the position of data points, can affect the conclusions that are drawn from it
(Cao, 2017). For example, using a particular color to represent a certain group or condition
can create unconscious biases in the viewer and lead to inaccurate conclusions.
The titles and labels on a visualization can shape the viewer's interpretation of the data (Qin
et al., 2020). Using misleading or biased titles or labels can lead to inaccurate conclusions
about the significance of the data or the implications for public health.
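The effect of the axis range on how large a difference looks can be shown with simple arithmetic. The sketch below compares how tall two values are drawn when the axis starts at zero and when it starts just below the smaller value. The `apparent_ratio` helper and the numbers are hypothetical and serve only to illustrate the point.

```python
def apparent_ratio(smaller, larger, axis_min):
    """Ratio of the drawn heights of two values above the axis baseline."""
    if axis_min >= smaller:
        raise ValueError("axis_min must lie below both values")
    return (larger - axis_min) / (smaller - axis_min)


# Hypothetical prevalence values (percent), for illustration only.
group_a, group_b = 4.0, 5.0

# Axis starting at zero: drawn heights match the true ratio of 1.25.
honest = apparent_ratio(group_a, group_b, axis_min=0.0)

# Axis starting at 3.0: the same data now looks like a twofold difference.
truncated = apparent_ratio(group_a, group_b, axis_min=3.0)
```

The data are identical in both cases. Only the choice of axis minimum changes, which is why the axis range should be stated and chosen with care.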
Proposals
The visualizations can be improved by making them clearer and easier to read, such
They can also be improved by adding context or background information that helps
These visualizations can be made more engaging and interactive by adding elements
These visualizations can be made more accessible and inclusive by adding alternative
Accuracy and transparency can be improved by ensuring that the visualizations accurately and fairly
represent the data, and by being transparent about the methods and sources used to
References
Cao, L. (2017). Data science: a comprehensive overview. ACM Computing Surveys
(CSUR), 50(3), 1-42.
Goldman, M., Craft, B., Hastie, M., Repečka, K., McDade, F., Kamath, A., ... & Haussler, D.
(2019). The UCSC Xena platform for public and private cancer genomics data
Press.
LaRossa, R., & Bennett, L. A. (2018). Ethical dilemmas in qualitative family research. In The
Lee, C. H., & Yoon, H. J. (2017). Medical big data: promise and challenges. Kidney research
Linderman, G. C., Rachh, M., Hoskins, J. G., Steinerberger, S., & Kluger, Y. (2019). Fast
Liu, J., Tang, T., Wang, W., Xu, B., Kong, X., & Xia, F. (2018). A survey of scholarly data
Luo, Y., Qin, X., Tang, N., & Li, G. (2018, April). Deepeye: Towards automatic data
Martin, K. E. (2020). Ethical issues in the big data industry. In Strategic Information
Mirman, D. (2017). Growth curve analysis and visualization using R. Chapman and
Hall/CRC.
Murray, S. (2017). Interactive data visualization for the web: an introduction to designing
Nolte, H., MacVicar, T. D., Tellkamp, F., & Krüger, M. (2018). Instant clue: a software suite
for interactive data visualization and analysis. Scientific reports, 8(1), 1-8.
Pu, X., & Kay, M. (2020, April). A probabilistic grammar of graphics. In Proceedings of the
Qin, X., Luo, Y., Tang, N., & Li, G. (2020). Making data visualization more efficient and
Rees, D., & Laramee, R. S. (2019, February). A survey of information visualization books. In
Sedrakyan, G., Mannens, E., & Verbert, K. (2019). Guiding the choice of learning dashboard
Sullivan, B. L., Phillips, T., Dayer, A. A., Wood, C. L., Farnsworth, A., Iliff, M. J., ... &
Kelling, S. (2017). Using open access observational data for conservation action: A
Tschandl, P., Rosendahl, C., & Kittler, H. (2018). The HAM10000 dataset, a large collection
data, 5(1), 1-9.
Vieira, C., Parsons, P., & Byrd, V. (2018). Visual learning analytics of educational data: A
systematic literature review and research agenda. Computers & Education, 122, 119-
135.