Chapter 10 - Data at Scale

Data at Scale Human Computer interaction
Data at Scale

Sara Muneeb
[email protected]
Computer Science Department
Introduction
Data at scale (big data) describes all kinds of data, including:
• databases of numbers
• images of people, things, and places
• recorded footage of conversations
• videos
• texts
• environmentally sensed data
Data at Scale
• Data at scale has huge potential for grounding and elucidating problems.
• It can be collected, used, and communicated in a wide variety of ways.
Big Data and Issues
However, beyond its societal benefits, data can also be used in potentially harmful ways:

◦ Security Issues
Data collected by multiple organizations contains sensitive information that is prone to being attacked by hackers and leaked online.

◦ Privacy Risk
The collected information can be used for malicious activities and can lead to the loss of anonymity.

◦ Unintentional Data Discrimination
Inaccurate big data may create analytical biases and unintentionally discriminate against individuals based on age, race, gender, and ethnicity.
Approaches for Collecting and Analyzing Data
1. Scraping and Second Source Data
2. Collecting Personal Data
3. Crowdsourcing Data
4. Sentiment Analysis
5. Social Network Analysis
6. Combining Multiple Sources of Data
Scraping and Second Source Data
Scraping
• One way to extract data is by “scraping” it from websites (assuming that this is allowed by the application).
• Once the data is scraped, it can be entered into a spreadsheet (human readable) for study and analyzed using data science tools.
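
The scrape-then-spreadsheet step can be sketched as follows. This is a minimal illustration, assuming the page has already been obtained with permission and that its data sits in an HTML table. It uses only Python's standard library. The sample HTML and the names TableScraper, scrape_table, and rows_to_csv are hypothetical and are not from the slides.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; in practice the HTML would come from a site
# whose terms allow scraping.
SAMPLE_HTML = """
<table>
  <tr><th>topic</th><th>visits</th></tr>
  <tr><td>alpha</td><td>120</td></tr>
  <tr><td>beta</td><td>95</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table in html as a list of rows (lists of strings)."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.rows


def rows_to_csv(rows):
    """Return the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(scrape_table(SAMPLE_HTML)))
```

The CSV text can be saved to a file and opened in a spreadsheet for study, or loaded into a data science tool for analysis.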
Scraping and Second Source Data
Second Source
The openly available big data that Google and other companies now provide for researchers to mine offers a “second source” methodology.
Examples: search terms, Facebook posts, Instagram comments, and so on.
Analytical tool: Google Trends
Scraping and Second Source Data
• Analysis of this data can indirectly reveal new insights about users’ concerns, desires, behaviors, and habits.
• The important question is what is done with the newly available data.
Collecting Personal Data
Nowadays, many apps and wearable devices that people can buy off the shelf collect all sorts of personal data and visualize it, for example, to quantify health, screen time, and sleep.

Self-tracking is also increasingly being used by people who have a condition or disease as a form of self-care.
Collecting Personal Data
Quantified-self projects generate lots of data.
Crowdsourcing Data
People crowdsource information or work together using online technologies to collect and share data.

In crowd research, many researchers from all over the world come together to work on large problems, for example, in climate science.
Crowdsourcing Data
Conducting research on a massive scale enables potentially hundreds or thousands of people to work on a single project.
Examples: iSpotNature, eBird, iNaturalist, and the Zooniverse.

Crowd projects raise a number of issues as to who owns and manages the data.
Sentiment Analysis
Sentiment analysis is a technique that is used to infer the effect of what a group of people or a crowd is feeling or saying.

The phrases that people use when offering their opinions or views are scored as being negative, positive, or neutral.
For example:
◦ anger, sadness, or fear (negative feelings)
◦ happiness, joy, or enthusiasm (positive feelings)
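
The scoring of phrases as negative, positive, or neutral can be sketched with a small word list. This is a toy, lexicon-based illustration only. The LEXICON entries and the function names score_phrase and classify are hypothetical, and real tools use far larger lexicons or trained models. The sketch ignores negation and sarcasm.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "happy": 1, "joy": 1, "love": 1, "great": 1, "enthusiastic": 1,
    "angry": -1, "sad": -1, "fear": -1, "hate": -1, "terrible": -1,
}


def score_phrase(text):
    """Sum the lexicon values of the words found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def classify(text):
    """Label a phrase as positive, negative, or neutral from its score."""
    score = score_phrase(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for phrase in ("I love this great phone",
                   "Angry and sad about the delay",
                   "The parcel arrived on Monday"):
        print(phrase, "->", classify(phrase))
```

Words that are missing from the lexicon contribute nothing to the score, so a phrase with no known words is labeled neutral.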
Sentiment Analysis
Sources
◦ Tweets
◦ Text
◦ Online reviews
◦ Social media
◦ Facial expressions
Tools
◦ DisplayR
◦ CrowdFlower
◦ MonkeyLearn
Sentiment Analysis
Sentiment analysis is commonly used by marketing and advertising companies to decide on what types of ads to design and place.

Sentiment analysis as a technique is not an exact science and should be viewed more as a heuristic than as an objective evaluation method.
Social Network Analysis
• Social network analysis (SNA) is a method based on social network theory for analyzing and evaluating the strength of social ties within a network.
• It is used to understand the relationships that form among people and groups within and across different social media platforms, as well as in offline social networks.
Social Network Analysis
Sources
1. Weibo
2. Tencent
3. Baidu
4. Facebook
5. Twitter
6. Instagram
7. YouTube.
Social Network Analysis
Two main entities make up a social network.
1. Nodes, which are also sometimes called entities or vertices, represent people and topics.
2. Edges, which are also known as links or ties, are the connections between the nodes, for example, among the members of a family.
Directional and Nondirectional Edges
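
Nodes and directional or nondirectional edges can be sketched with a small graph structure. This is a minimal illustration, not a full SNA toolkit. The class SocialNetwork, the function build_example, and the node names are hypothetical. In-degree and out-degree are used here as a simple stand-in for how connected a node is.

```python
from collections import defaultdict


class SocialNetwork:
    """Nodes are people or topics; edges are the ties between them."""

    def __init__(self):
        self.nodes = set()
        self.out_edges = defaultdict(set)  # node -> nodes it points to
        self.in_edges = defaultdict(set)   # node -> nodes pointing to it

    def add_edge(self, source, target, directed=True):
        self.nodes.update((source, target))
        self.out_edges[source].add(target)
        self.in_edges[target].add(source)
        if not directed:
            # A nondirectional tie is stored as two opposite directed edges.
            self.out_edges[target].add(source)
            self.in_edges[source].add(target)

    def in_degree(self, node):
        """Number of ties that point to node."""
        return len(self.in_edges[node])

    def out_degree(self, node):
        """Number of ties that start at node."""
        return len(self.out_edges[node])


def build_example():
    """Build a small hypothetical network with both kinds of edges."""
    net = SocialNetwork()
    net.add_edge("ana", "cy")                  # directional: ana follows cy
    net.add_edge("ben", "cy")                  # directional: ben follows cy
    net.add_edge("cy", "dee", directed=False)  # nondirectional: mutual tie
    return net


if __name__ == "__main__":
    net = build_example()
    for node in sorted(net.nodes):
        print(node, net.in_degree(node), net.out_degree(node))
```

A directional edge might model one account following another, while a nondirectional edge might model a mutual tie such as family membership.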
Combining Multiple Sources of Data
Data can be collected from multiple sources by combining automatic sensing and subjective reporting.

The goal is to obtain a more comprehensive picture of a domain, such as a population’s mental health.
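
Combining automatically sensed data with subjective reports can be sketched as a join on a shared key. This is a minimal illustration, assuming both sources record a participant identifier and a day. The field names, the sample values, and the function combine_sources are hypothetical and are not taken from any real study.

```python
# Hypothetical automatically sensed records (for example, from a phone).
SENSED = [
    {"participant": "p1", "day": 1, "sleep_hours": 7.5},
    {"participant": "p1", "day": 2, "sleep_hours": 5.0},
]

# Hypothetical self-reported records (for example, from a daily survey).
REPORTED = [
    {"participant": "p1", "day": 1, "stress": 2},
    {"participant": "p1", "day": 3, "stress": 4},
]


def combine_sources(sensor_rows, survey_rows):
    """Join sensed and self-reported records on (participant, day).

    Days that appear in only one source are kept, so gaps stay visible.
    """
    combined = {}
    for row in list(sensor_rows) + list(survey_rows):
        key = (row["participant"], row["day"])
        combined.setdefault(key, {}).update(row)
    return [combined[key] for key in sorted(combined)]


if __name__ == "__main__":
    for record in combine_sources(SENSED, REPORTED):
        print(record)
```

Keeping the unmatched days, rather than dropping them, makes it easier to see where one source is missing before any conclusions are drawn.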
Combining Multiple Sources of Data
Example: StudentLife (Harari et al., 2017)

Students’ activity, sleep, and attendance levels against deadlines during a term. Source: StudentLife study.
Visualizing and Exploring Data
• Making sense of a visualization includes being able to see the data and understand the way that it is represented and its context (the data should be meaningful):
1. What kind of data is it?
2. What is the data about?
3. Why was it collected?
4. Why was it analyzed and represented in a particular way?
• The skills needed to understand and interpret visualizations are referred to as visual literacy.
Visualizing and Exploring Data

A simplified path for data to be meaningful
Visualizing and Exploring Data
• The goal of data visualization tools is to amplify human cognition so that users can see patterns, trends, correlations, and anomalies in the data that lead them to gain new insights and make new discoveries.
• Data visualization tools can help users change and manipulate variables to see what happens.
“Overview first, zoom and filter, and then details on demand.” Ben Shneiderman (1996)
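
The “overview first, zoom and filter, and then details on demand” sequence can be sketched as three small functions over a data set. This is an illustration of the idea only, not a visualization tool. The RECORDS sample and the function names overview, zoom_and_filter, and details are hypothetical.

```python
from collections import Counter

# Hypothetical records standing in for a much larger data set.
RECORDS = [
    {"id": 1, "sector": "tech", "change": 1.2},
    {"id": 2, "sector": "tech", "change": -0.4},
    {"id": 3, "sector": "energy", "change": -2.1},
    {"id": 4, "sector": "health", "change": 0.3},
]


def overview(records):
    """Overview first: count how many records fall in each sector."""
    return Counter(record["sector"] for record in records)


def zoom_and_filter(records, sector=None, min_change=None):
    """Zoom and filter: keep only the records that match the criteria."""
    return [
        record for record in records
        if (sector is None or record["sector"] == sector)
        and (min_change is None or record["change"] >= min_change)
    ]


def details(records, record_id):
    """Details on demand: return one full record, or None if absent."""
    return next((r for r in records if r["id"] == record_id), None)


if __name__ == "__main__":
    print(overview(RECORDS))
    print(zoom_and_filter(RECORDS, sector="tech", min_change=0))
    print(details(RECORDS, 3))
```

In an interactive tool, each step would be driven by the user, for example by clicking a region of the overview to filter and then selecting one item to see its details.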
Example

A market map of the S&P 500, which is a financial index for stocks. Green indicates stocks that increased in value, and red indicates stocks that decreased in value that day.
Visualizing and Exploring Data - Dashboard
A dashboard is an interactive panel of control widgets, such as sliders, checkboxes, and radio buttons, together with coordinated multiple-window displays of different kinds of graphical representations, such as bar and line graphs, heat maps, tree maps, infographics, word clouds, scatterplots, and other kinds of visualizations.
Example

A dashboard that was created to show changes in sales information.
Ethical Design Concerns
“Ethics” is usually taken to mean “the standards of conduct that distinguish between right and wrong, good and bad, and so on.”

Privacy by design is a way to avoid collecting excessive data that might be sensitive but is not needed.
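
One way to picture privacy by design is to drop unneeded fields before anything is stored. This is a minimal sketch of that single idea, assuming the study needs only two fields. The field names, NEEDED_FIELDS, and the function minimize are hypothetical, and privacy by design covers more than field filtering.

```python
# Hypothetical list of the only fields the study actually needs.
NEEDED_FIELDS = ("step_count", "sleep_hours")


def minimize(record, needed=NEEDED_FIELDS):
    """Keep only the needed fields so sensitive extras are never stored."""
    return {field: record[field] for field in needed if field in record}


if __name__ == "__main__":
    raw = {
        "step_count": 8200,
        "sleep_hours": 6.5,
        "home_address": "(sensitive, not needed)",
        "contacts": ["(sensitive, not needed)"],
    }
    print(minimize(raw))
```

Filtering at the point of collection means the sensitive fields never reach storage, so they cannot later be leaked or misused.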
Data Ethics Principles
1. Fairness refers to impartial and just treatment or behavior without favoritism or discrimination.

2. Accountability refers to whether an intelligent or automated system that uses AI algorithms can explain its decisions in ways that enable people to believe they are accurate and correct.

3. Transparency refers to the extent to which a system makes its decisions visible and shows how they were derived.

4. Explainability refers to a growing expectation in HCI and AI that systems, especially those that collect data and make decisions about people, provide explanations that laypeople can understand.
Conclusion
Introduction
Approaches to collecting and analyzing data
Visualizing and exploring data
Ethical design concerns