
Week 4 PSOSM - NPTEL

The document discusses key concepts related to privacy and security in online social media, including different forms of privacy, risks of public self-disclosure, re-identification, personally identifiable information (PII), and k-anonymity. It also covers advantages of NumPy arrays over Python lists, methods for creating NumPy arrays, and random number generation in NumPy. The content is structured as a series of questions and answers, providing insights into both privacy issues and programming techniques.

Uploaded by

mrdankberozgar
Copyright
© © All Rights Reserved

Week 4 PSOSM

Got it! Please provide the first 5 questions from your assignment on "Privacy and Security in Online Social Media," and I'll provide you with detailed
explanations and insights on the concepts behind each one.

Question 1: What are the different forms of Privacy?

Accepted Answer:
Information

Communication
Territorial
Bodily

Explanation:

The four main forms of privacy are essential to understanding how personal information is protected in both digital and real-world contexts. Here's a
breakdown:

1. Information Privacy:
Refers to the protection of personal data or information. This involves ensuring that individuals' private data (e.g., browsing history,
financial records) is not exposed, misused, or accessed without consent.
Related Concepts: Data protection laws like GDPR (General Data Protection Regulation) are focused on information privacy, ensuring users
have control over their personal data.
2. Communication Privacy:
Involves the security and confidentiality of communication, such as phone calls, emails, or social media messages. Communication privacy
ensures that conversations remain private and protected from surveillance or unauthorized interception.
Related Concepts: Encryption technologies like SSL/TLS or end-to-end encryption in apps (e.g., WhatsApp) are used to secure
communications.
3. Territorial Privacy:
Refers to privacy in relation to physical space or location. This can include privacy in one's home, office, or even public spaces, protecting
individuals from unwanted physical intrusion or surveillance.
Related Concepts: Surveillance systems, geolocation tracking, and issues related to “smart” homes are relevant to territorial privacy.
4. Bodily Privacy:

Concerns the protection of one's body and personal physical attributes. This can involve biometric data protection (e.g., fingerprints, facial
recognition) and protecting one's personal health data.
Related Concepts: Issues with biometric security systems (e.g., facial recognition at airports), healthcare data privacy (HIPAA in the U.S.).

These forms of privacy are interlinked, and a violation in one can often lead to implications in the others. For example, a data breach (information
privacy) could expose private communications or location information, leading to broader concerns about communication and territorial privacy.

Question 2: "Please Rob Me" highlighted issues specifically stemming from:

Accepted Answer:

Public Self-Disclosure of Information on Social Media by users

Explanation:

"Please Rob Me" was a website that showcased how people publicized their location on social media, making them easy targets for burglars. The
website took advantage of location check-ins (e.g., on Twitter and Foursquare) and demonstrated that by publicly revealing where someone was (e.g.,
posting they are on vacation), it could give others the information needed to rob their home.

Public Self-Disclosure of Information:

People often share excessive personal information on social media, such as their current location, travel plans, or activities, without
realizing the potential risks. This kind of information can be used to track their routines, habits, or identify gaps in security.
Related Concepts: Social engineering tactics, location tracking, and the importance of controlling privacy settings on social media
platforms.

This incident highlighted the need for awareness around oversharing and the dangers of posting real-time personal data online.

Question 3: Using an unidentified source of information to extract an individual’s sensitive information is called:

Accepted Answer:

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/8
Re-identification

Explanation:

Re-identification is the process of matching anonymous or de-identified data with an individual, allowing their sensitive information to be uncovered.
This can happen when seemingly anonymous data is cross-referenced with other datasets, leading to the identification of individuals.

Re-identification:

This is commonly used in data analytics, where data is anonymized for privacy reasons but is later re-identified through linking it with other
datasets that contain identifiable information.

Related Concepts: De-identification, pseudonymization, and data anonymization are all strategies to protect privacy. However, re-
identification is a significant risk when such strategies fail.

Re-identification attacks highlight why even anonymized datasets can sometimes reveal sensitive details about individuals, making data privacy a
crucial concern in research and data sharing.
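The linking step described above can be sketched with a pandas join. This is only an illustration: both tables, every name, and every value below are made up, and the choice of ZIP code, birth date, and sex as the shared attributes is an assumption, not something taken from the lecture.

```python
import pandas as pd

# Hypothetical "de-identified" release: names removed, other attributes kept.
medical = pd.DataFrame({
    "zip": ["02138", "02139", "02141"],
    "birth_date": ["1945-07-31", "1980-01-15", "1972-03-02"],
    "sex": ["M", "F", "F"],
    "diagnosis": ["hypertension", "asthma", "diabetes"],
})

# Hypothetical public dataset (for example, a voter list) that carries names.
voters = pd.DataFrame({
    "name": ["A. Kumar", "B. Singh", "C. Rao"],
    "zip": ["02138", "02139", "02150"],
    "birth_date": ["1945-07-31", "1980-01-15", "1990-09-09"],
    "sex": ["M", "F", "M"],
})

# Joining on the shared attributes re-attaches names to diagnoses.
reidentified = medical.merge(voters, on=["zip", "birth_date", "sex"], how="inner")
print(reidentified[["name", "diagnosis"]])
```

Two of the three "anonymous" records end up with a name attached, even though the released table never contained one.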

Question 4: Considering each attribute in isolation, which of the following are examples of Personally Identifiable
Information (PII):

Accepted Answer:
Aadhaar Number

College Student ID

Explanation:

Personally Identifiable Information (PII) refers to any data that can be used to identify a specific individual. This includes direct identifiers like names
or identification numbers.

Aadhaar Number:

In India, the Aadhaar number is a unique identification number assigned to each resident. Since it is tied directly to an individual's identity,
it is classified as PII.

College Student ID:

Similar to Aadhaar, a student ID is unique to an individual in a specific institution and can identify them. Thus, it also qualifies as PII.

However, other attributes like race and age can be considered PII only when combined with other identifiers (like name or address). Alone, they might
not be directly used to identify a person.

Related Concepts: PII vs. Non-PII. For example, general demographic information (age, race) can become PII when linked with other identifiers.
Laws such as the GDPR specifically regulate PII and its usage.

Question 5: What is the purpose of k-anonymity as discussed in the lecture?

Accepted Answer:

Protect PIIs of re-identified individuals

Anonymise data without losing utility

Explanation:

k-anonymity is a concept in data privacy aimed at protecting individuals’ identities when sharing data for research, analysis, or any other purposes.
The primary goal is to ensure that any given individual in a dataset cannot be uniquely identified.

Protecting PIIs of re-identified individuals:

k-anonymity requires every record in a released dataset to be indistinguishable from at least k-1 other records with respect to its quasi-identifiers (attributes such as age, ZIP code, or gender that can be linked to outside data). An attacker who matches the release against an external dataset can therefore narrow a target down only to a group of at least k people, not to one individual.

Anonymize data without losing utility:

k-anonymity balances data privacy against the usability of data. Values are generalized or suppressed only as far as needed to reach the chosen k, so the anonymized data must still be useful for analysis. The challenge is to modify the dataset in such a way that it meets both criteria.

Related Concepts:

k-anonymity is part of a broader family of privacy models like l-diversity and t-closeness, which aim to ensure data utility while
maintaining privacy.
k-anonymity can be compromised if attackers possess background knowledge or access to additional datasets, which is why newer
models are often used alongside k-anonymity.
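As a rough sketch of the idea, the snippet below measures k for a small table before and after generalization. The helper name `k_of`, the column names, and all values are hypothetical, and treating `age` and `zip` as the quasi-identifiers is an assumption made for the example.

```python
import pandas as pd

def k_of(df, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    return int(df.groupby(quasi_identifiers).size().min())

# Hypothetical records; 'disease' is the sensitive attribute.
records = pd.DataFrame({
    "age": [23, 27, 24, 35, 38, 36],
    "zip": ["110001", "110002", "110003", "110045", "110046", "110047"],
    "disease": ["flu", "asthma", "flu", "diabetes", "flu", "asthma"],
})

# Generalize: bucket ages into decades and keep only the first three ZIP digits.
generalised = records.assign(
    age=(records["age"] // 10 * 10).astype(str) + "s",
    zip=records["zip"].str[:3] + "***",
)

print(k_of(records, ["age", "zip"]))      # 1: every row is unique
print(k_of(generalised, ["age", "zip"]))  # 3: each row hides among three
```

The generalized table is 3-anonymous, yet the `disease` column is untouched, which is the utility side of the trade-off.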

Summary of Concepts:

Privacy Forms: Information, Communication, Territorial, Bodily.

Self-Disclosure on Social Media: Sharing personal data can lead to security risks.

Re-identification: Process of identifying an individual from anonymized data.


Personally Identifiable Information (PII): Information that can identify an individual.

k-anonymity: Technique to anonymize data while maintaining utility and privacy.

This framework of concepts should help you understand the topics and answer similar questions in the future.

Question 1: Latanya Sweeney’s experiments aimed to:

Accepted Answer:

Re-identify individuals from Match.com using their Facebook profiles

Identification of individuals walking on a campus with data from Facebook

Predict social security number from publicly available data

Explanation:

Latanya Sweeney is a well-known privacy researcher whose work showed that seemingly anonymous datasets can be re-identified by linking them with public records; her best-known result is that ZIP code, date of birth, and gender together uniquely identify most of the U.S. population. The three experiments listed in the accepted answer match the face-recognition study by Alessandro Acquisti, Ralph Gross, and Fred Stutzman at Carnegie Mellon University, in which publicly available Facebook photos served as the external dataset.

1. Re-identify individuals from Match.com using their Facebook profiles:

This experiment compared profile photos from a dating site, where members use pseudonyms, against publicly visible Facebook profile photos using face recognition. A match attaches a real name to a pseudonymous dating profile.

2. Identification of individuals walking on a campus with data from Facebook:

Students on a campus were photographed, and the photos were matched against Facebook profile photos. The experiment showed that a stranger's face alone can be enough to find their name and profile, compromising personal privacy.

3. Predict social security number from publicly available data:

Once a face is linked to a Facebook profile, details such as date and place of birth can be combined with other public data to predict digits of the person's Social Security number. This shows the vulnerability of sensitive data even when it is never published alongside direct identifiers.

Related Concepts:

Anonymization: The process of removing identifying information from data, which, as Sweeney's experiments show, can be undone through
data mining techniques.

Re-identification attacks: These are the risks associated with matching anonymized data to identifiable individuals, often by correlating with
external datasets.

Question 2: What are the advantages of NumPy Arrays over Lists in Python?

Accepted Answer:

NumPy arrays are faster and more compact than Python lists.
NumPy uses much less memory to store data and it provides a mechanism of specifying the data types.

This allows the code to be optimized even further.

Explanation:

NumPy is a powerful numerical computing library in Python that offers significant advantages over standard Python lists, especially when dealing with
large datasets or performing numerical computations.

1. Faster and more compact:

NumPy arrays are implemented in C and optimized for numerical operations, which makes them significantly faster than Python lists. Lists
in Python are general-purpose containers that hold objects, which can lead to overhead for numerical operations.
2. Memory efficiency:

NumPy arrays store data more efficiently by using contiguous memory blocks, whereas Python lists are arrays of pointers to objects, which
require more memory. This makes NumPy arrays more space-efficient, especially when handling large datasets.
3. Data type specification:
NumPy arrays allow you to specify the type of data they store (e.g., integers, floats). This helps optimize memory usage and speeds up
computations by eliminating the overhead of type-checking that occurs in Python lists, which can store mixed data types.
4. Optimization:

Because NumPy arrays are more compact and type-specific, they enable better optimization during mathematical and scientific
computations. This includes vectorized operations that can be applied element-wise, avoiding Python's slower loops.

Related Concepts:

Pandas: Another powerful library in Python used for data manipulation and analysis. While NumPy focuses on numerical arrays, Pandas is better
suited for handling structured data (like data frames).

Memory management in Python: When dealing with large amounts of data, using NumPy arrays over Python lists can significantly reduce
memory usage and improve performance.
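The memory and vectorization points above can be checked with a short sketch. The array size and the chosen dtypes are arbitrary assumptions for illustration, and the list size estimate relies on CPython's `sys.getsizeof`, so exact list figures vary by interpreter.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)   # 8 bytes per element
small = arr.astype(np.int16)             # 2 bytes per element

print(arr.nbytes)    # 8000
print(small.nbytes)  # 2000

# A list stores pointers plus one full Python int object per element.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes)

# Vectorized arithmetic replaces an explicit Python loop.
doubled_loop = [v * 2 for v in values]
doubled_vec = arr * 2
```

Choosing a narrower dtype such as `int16` is only safe when every value fits in its range.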

Question 3: Write Python code to convert the list [1, 2, 3, 4] to a NumPy array.

Accepted Answer:

```python
import numpy as np
arr = np.array([1, 2, 3, 4])
```

Explanation:

To convert a list to a NumPy array, you use the `np.array()` function from the NumPy library. This function takes a Python list as input and converts it
to a NumPy array.

Correct Answer:

```python
import numpy as np
arr = np.array([1, 2, 3, 4])
```

This is the correct way to convert a list into a NumPy array. The `np.array()` function takes a list as an argument and returns a NumPy array.

Related Concepts:

NumPy array creation: There are other methods to create NumPy arrays, such as `np.zeros()`, `np.ones()`, and `np.arange()` for generating
arrays with specific patterns or values.
Array dimensions: The resulting array will have a single dimension (1D) in this case, but you can create multidimensional arrays using nested
lists or specifying the shape explicitly.
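A brief sketch of the other creation routes mentioned above; the variable names and the particular shapes are illustrative choices.

```python
import numpy as np

zeros = np.zeros(3)                  # 1D array of three 0.0 values
ones = np.ones((2, 3))               # 2x3 array of 1.0 values, shape given as a tuple
steps = np.arange(0, 10, 2)          # start, stop (exclusive), step
grid = np.array([[1, 2], [3, 4]])    # nested lists give a 2D array
as_float = np.array([1, 2, 3, 4], dtype=np.float64)  # explicit data type

print(steps)        # [0 2 4 6 8]
print(grid.shape)   # (2, 2)
print(as_float.dtype)
```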

Question 4: What is the output of the following code?

```python
import numpy as np
a = np.array([range(i, i + 3) for i in [1, 2, 3]])
print(a)
```

Accepted Answer:

```
[[1 2 3]
 [2 3 4]
 [3 4 5]]
```

Explanation:

This code snippet uses a list comprehension combined with the `range()` function to create a 2D NumPy array. Let's break it down:

1. `range(i, i+3)` produces the integers from `i` to `i+2`. For example, when `i = 1`, `range(1, 4)` yields 1, 2, 3.
2. The list comprehension `[range(i, i+3) for i in [1, 2, 3]]` iterates over the values `[1, 2, 3]`, creating a list of three ranges:
For `i = 1`, the range yields `1, 2, 3`
For `i = 2`, the range yields `2, 3, 4`
For `i = 3`, the range yields `3, 4, 5`
3. `np.array()` treats each range as one row of the result.

This results in the array:

```
[[1 2 3]
 [2 3 4]
 [3 4 5]]
```

Related Concepts:

List comprehensions: A concise way to generate lists in Python. This approach can be applied to create NumPy arrays as well.
NumPy 2D arrays: The resulting array is a 2D NumPy array. In this case, it is a 3x3 matrix.
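For comparison, the same matrix can be built without a Python-level loop. The broadcasting version below is an alternative added for illustration, not the approach used in the question.

```python
import numpy as np

# Version from the question: one range per row.
a = np.array([range(i, i + 3) for i in [1, 2, 3]])

# Broadcasting alternative: a column of row starts plus a row of offsets.
b = np.arange(1, 4)[:, None] + np.arange(3)

print(a.shape)               # (3, 3)
print(np.array_equal(a, b))  # True
```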

Question 5: Which of the following codes can be used to make a NumPy array of 10 random numbers?

Accepted Answer:

```python
import numpy as np
random_array = np.random.rand(10)
```

```python
import numpy as np
random_array = np.random.random(10)
```

Explanation:

1. Using `np.random.rand(10)`:
`np.random.rand(10)` generates an array of 10 random numbers sampled from a uniform distribution over [0, 1). This method is
straightforward and widely used when you need random numbers.

2. Using `np.random.random(10)`:
`np.random.random(10)` also generates 10 random numbers drawn uniformly from [0, 1). The main difference is the call signature: `rand` takes each dimension as a separate argument (`np.random.rand(2, 5)`), while `random` takes a single size, which may be a tuple (`np.random.random((2, 5))`).

3. Using a for loop and `random.random()`:


This method first creates an empty list, then fills it with random values generated by Python's built-in `random` module. Although this
approach works, it's slower than using NumPy's built-in functions, which are optimized for performance.

Related Concepts:

Random number generation in NumPy: `np.random` provides a suite of functions for generating random numbers from various distributions
(e.g., `np.random.randn()` for normal distribution).
Efficiency: Using `np.random` is much more efficient than using Python's `random` module, especially when dealing with large arrays or complex
numerical simulations.
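The three approaches can be put side by side as follows. The loop version is a reconstruction from the description above (the option's original code is not shown in these notes), and the `default_rng` lines are an extra illustration of NumPy's newer Generator interface, not part of the question.

```python
import random
import numpy as np

a = np.random.rand(10)        # dimensions passed as separate arguments
b = np.random.random(10)      # size passed as one argument
c = np.random.rand(2, 5)      # 2x5 array
d = np.random.random((2, 5))  # same shape, size given as a tuple

# Loop-based version using Python's built-in random module.
values = []
for _ in range(10):
    values.append(random.random())
loop_array = np.array(values)

# Seeded generator for reproducible results.
rng = np.random.default_rng(seed=0)
e = rng.random(10)

print(a.shape, c.shape, loop_array.shape)  # (10,) (2, 5) (10,)
```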

Summary of Concepts:

1. Privacy Experiments: Re-identification, privacy risks in social media.


2. NumPy Arrays: Advantages over lists include speed, memory efficiency, and type control.

3. NumPy Array Creation: Convert lists to arrays using `np.array()`.


4. List Comprehensions and 2D Arrays: Using list comprehension to create multi-dimensional arrays in NumPy.
5. Random Number Generation: Efficient ways to generate random numbers using NumPy.

These notes cover the key concepts behind the answers, helping you understand both the specific question and related topics for better application in
future questions.

Question 1: Which of the following NumPy functions can be used to generate an NxN diagonal matrix?

Options:
1. np.eye(5)
2. np.diagonal(np.eye(5))

3. np.diagonal(5)
4. None of the above

Correct Answer:

np.eye(5)

Concept:

Diagonal Matrix: A matrix in which the entries outside the main diagonal are all zero. In other words, a square matrix where only the diagonal
elements can have values, and all other elements are zero.

NumPy's `np.eye()` function: This function is used to create a 2D array with ones on the diagonal and zeros elsewhere. The argument `5`
indicates the size of the matrix, making it a 5x5 matrix. Thus, `np.eye(5)` generates a 5x5 identity matrix (a type of diagonal matrix where the
diagonal elements are ones).
`np.diagonal()`: This function extracts the diagonal elements of an existing array; it does not create a diagonal matrix. `np.diagonal(np.eye(5))` therefore returns a 1D array of five ones, and `np.diagonal(5)` raises an error because a scalar has no diagonal.

Explanation for Similar Topics:

Identity Matrix: The output of `np.eye()` is an identity matrix, which is a specific type of diagonal matrix where all diagonal elements are 1.

Other ways to create diagonal matrices: You can also use `np.diag()` to create a diagonal matrix from a given array of values. For example,
`np.diag([1, 2, 3])` will create a 3x3 diagonal matrix with values 1, 2, and 3 on the diagonal.
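A small sketch contrasting the three functions discussed above, using a 3x3 size chosen only to keep the output short.

```python
import numpy as np

identity = np.eye(3)               # ones on the diagonal, zeros elsewhere
custom = np.diag([1, 2, 3])        # builds a diagonal matrix from values
extracted = np.diagonal(custom)    # reads the diagonal back out as a 1D array

print(identity)
print(custom)
print(extracted)  # [1 2 3]
```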

Question 2: Which of the following Python code can be used to read a CSV file using Pandas?

Options:
1. import pandas as pd
data = pd.read_excel('file.csv')

2. import pandas as pd
data = pandas.readcsv('file.csv')

3. import pandas as pd
data = pd.read_csv('file.csv')

4. None of the above.

Correct Answer:

import pandas as pd
data = pd.read_csv('file.csv')

Concept:

Pandas `read_csv()` function: This function is used to read CSV (Comma Separated Values) files into a Pandas DataFrame. A CSV file is a text file
where each line represents a data record, and the fields within each record are separated by commas.

Example: `data = pd.read_csv('file.csv')` reads the CSV file 'file.csv' and returns a DataFrame where each column corresponds to a column in
the CSV file.
Common Mistakes:

`read_excel()` is for reading Excel files, not CSV. It's used with `.xls` or `.xlsx` file extensions, not `.csv`.

`pandas.readcsv()` is incorrect for two reasons: the function is named `read_csv()` (with an underscore), and after `import pandas as pd` the library is bound to the name `pd`, so `pandas` is undefined and the call raises a `NameError`.

Explanation for Similar Topics:

`read_excel()`: This method is used when working with Excel files. You can use it to read `.xls` and `.xlsx` file formats into a DataFrame.

Other I/O functions in Pandas:

`read_json()`: To read JSON files.

`read_html()`: To parse HTML files and extract tables into DataFrames.


`read_sql()`: To read from SQL databases into a DataFrame.
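To try `read_csv()` without a file on disk, the sketch below feeds it an in-memory string through `io.StringIO`. The column names and rows are made up for the example; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'.
csv_text = "name,amount\nAsha,120\nRavi,80\n"

data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)          # (2, 2)
print(list(data.columns))  # ['name', 'amount']
print(data["amount"].sum())
```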

Question 3: Consider the below structure of a dataframe. Which of the following Python snippets can be used to plot a
histogram of the 'amount' column?

Options:

1. df['amount'].plot.hist()
2. df[df.columns[4]].plot.hist()

3. import matplotlib.pyplot as plt


plt.hist(df['amount'])
plt.xlabel('Amount')
plt.ylabel('Frequency')

plt.title('Histogram of Amount')
plt.show()
4. None of the Above.

Correct Answer:

df['amount'].plot.hist()
df[df.columns[4]].plot.hist()

import matplotlib.pyplot as plt


plt.hist(df['amount'])
plt.xlabel('Amount')
plt.ylabel('Frequency')
plt.title('Histogram of Amount')
plt.show()

Concept:

Histogram: A histogram is a graphical representation of the distribution of numerical data. It shows the frequency of data points in specified
ranges (bins).

Pandas Plotting: Pandas provides built-in support for basic plotting using `plot()` and `plot.hist()` for histograms. For instance,
`df['amount'].plot.hist()` will plot a histogram of the `amount` column directly from the DataFrame.

Matplotlib: Matplotlib is a powerful plotting library in Python. The function `plt.hist(df['amount'])` from Matplotlib can also be used to plot a
histogram of the `amount` column.

Explanation for Similar Topics:

Other types of plots in Pandas:

`df['amount'].plot.line()`: To plot a line graph.


`df['amount'].plot.bar()`: To plot a bar graph.

Customization with Matplotlib: You can customize the histogram using additional Matplotlib features like `bins`, `color`, `alpha`, etc.
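The pandas and Matplotlib routes can be compared on a small made-up DataFrame. The data, the `bins=5` setting, and the colors are assumptions for illustration; the `Agg` backend line is there only so the sketch runs without a display and should be removed for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open plot windows
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the DataFrame in the question.
df = pd.DataFrame({"amount": [10, 12, 15, 20, 22, 22, 30, 45]})

# Route 1: pandas plotting, which returns a Matplotlib Axes.
ax = df["amount"].plot.hist(bins=5, alpha=0.7, color="steelblue")
ax.set_xlabel("Amount")
ax.set_title("Histogram of Amount (pandas)")

# Route 2: Matplotlib directly, on a fresh figure.
plt.figure()
counts, edges, _ = plt.hist(df["amount"], bins=5, alpha=0.7, color="darkorange")
plt.xlabel("Amount")
plt.ylabel("Frequency")
plt.title("Histogram of Amount (Matplotlib)")

print(counts)  # number of values falling in each bin
```

In an interactive session, finish with `plt.show()` to display both figures.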

Question 4: What is the output of the following code?

Code:

```python
pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
```

Correct Answer:

1    A
2    B
3    C

Concept:

Pandas Series: A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, etc.). The labels are called the index;
in the example above, `1, 2, 3` are the index labels and `A, B, C` are the values.
The `pd.Series()` function creates a new Series where the first argument is the data (a list of strings) and the `index` argument supplies the labels (a
list of integers). Each label is paired with the value in the same position.

Explanation for Similar Topics:

Series vs. DataFrame: A Series is essentially a single column, while a DataFrame is a two-dimensional table. You can think of a DataFrame as a
collection of Series that share the same index.

Custom Indexing: You can specify custom indices while creating a Series, and Pandas will respect those custom labels for accessing the data.
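A short sketch of how the custom labels behave when accessing data, using the Series from the question. The distinction between `.loc` (label-based) and `.iloc` (position-based) is added here for illustration.

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])

print(s.loc[2])       # B  (label 2)
print(s.iloc[0])      # A  (first position)
print(list(s.index))  # [1, 2, 3]
```

Because the labels start at 1, label-based and position-based access differ by one here.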

Question 5: Select the correct option for plotting the below function using Matplotlib.

Options:

1. plt.plot(np.cos(x))
plt.show()

2. plt.scatter(x, np.sin(x))
plt.show()

3. plt.scatter(np.cos(x), x)
plt.show()

4. None of the above.

Correct Answer:

plt.plot(np.cos(x))
plt.show()

Concept:

Matplotlib: Matplotlib is a popular plotting library used to create static, animated, and interactive visualizations in Python. The `plt.plot()`
function is used to create line plots.

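`np.cos(x)`: This computes the cosine of each value in `x`. `plt.plot(np.cos(x))` draws those values as a curve; because only one array is passed, Matplotlib uses the array index (0, 1, 2, ...) for the horizontal axis. Pass `x` explicitly, as in `plt.plot(x, np.cos(x))`, to plot against the actual `x` values.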

Line Plot vs. Scatter Plot: `plt.plot()` is used for line plots, while `plt.scatter()` is used for creating scatter plots, where individual points are
plotted. In this case, the question is asking for a line plot of the cosine function, so `plt.plot(np.cos(x))` is the correct choice.

Explanation for Similar Topics:

Other types of plots in Matplotlib:

Line Plot: `plt.plot(x, y)` where `x` and `y` are arrays of data.
Scatter Plot: `plt.scatter(x, y)` where `x` and `y` represent the coordinates of the points.

Bar Plot: `plt.bar(x, height)` where `x` is the position of bars and `height` is the value represented by the bars.
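The sketch below draws the cosine curve both ways and adds a few scatter points. The definition of `x` is an assumption, since the question's figure and its `x` are not reproduced in these notes; the `Agg` backend line only keeps the example runnable without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open plot windows
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)  # assumed sample points

fig, (ax_index, ax_x) = plt.subplots(1, 2, figsize=(8, 3))

# One argument: the horizontal axis is the array index 0..99.
(line_index,) = ax_index.plot(np.cos(x))
ax_index.set_title("plt.plot(np.cos(x))")

# Two arguments: the horizontal axis is x itself.
(line_x,) = ax_x.plot(x, np.cos(x))
ax_x.scatter(x[::10], np.cos(x[::10]), color="red")  # every 10th point as a marker
ax_x.set_title("plt.plot(x, np.cos(x))")
```

Both panels show the same curve shape; only the horizontal scale differs. Call `plt.show()` in an interactive session to display the figure.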

These explanations give a clear understanding of the concepts, and the reasoning behind why each answer is correct. Let me know if you'd like to go
through more questions!
