Assignment Index Format Guidelines

The document sets out 14 data science assignment problems covering data analysis, data visualization, and Python programming. Key tasks include computing average salaries by experience level from a dataset of salaries and tenures, generating random passwords, analyzing character frequencies in text files, and plotting scatter plots, bar charts, and histograms from the sample data provided.

Uploaded by

Chirantan Sahoo

Centre for Data Science

Institute of Technical Education & Research, SOA, Deemed to be University

Introduction to Data Science using Python (CSE 3054)

MINOR ASSIGNMENT-1

1. An anonymous dataset containing each user’s salary (in dollars) and tenure as a data scientist (in
years) is given.

salaries_and_tenures = [(83000, 8.7), (88000, 8.1), (48000, 0.7),
(76000, 6), (69000, 6.5), (76000, 7.5), (60000, 2.5), (83000, 10),
(48000, 1.9), (63000, 4.2)]

Find the average salary for each tenure and print a message according to its value, i.e. “less than
two”, “between two and five” or “more than five” years of tenure, and group together the salaries
corresponding to each bucket. Compute the average salary for each group.

2. For the above data there seems to be a correspondence between years of experience and paid accounts.
Users with very few and very many years of experience tend to pay; users with average amounts of
experience don’t. Find the condition for this correspondence and print it.

3. Write a Python script to generate random alphanumeric passwords. Ask the user to enter the length of
each password and the number of passwords to generate, then save all the generated passwords
in a text file named “[Link]”.

4. Given a file named “[Link]” containing several lines or paragraphs, find all unique characters (ignoring
spaces, commas, full stops, brackets, quotes, etc.) present in the file. Capital and small letters are
counted as the same.
Find the frequency ($f_i$) of each character in the file and print the output as follows.
The character “a” is present times in the document.
The character “t” is present times in the document.

5. Use the above program as a function and use it to write another function to compare the contents of two
files “[Link]” and “[Link]”.

a. The output must also give the following information.
File MyText1 contains more (or less or equal) characters than MyText2.
b. The output must be printed in the following format depending on the content of the file.
File MyText1 contains more (or less or equal) unique characters than MyText2.
c. The frequency of each character must be summarised.
The frequency of character “x” in file MyText1 is more than (or less than or equal to)
that in MyText2.
d. The relative frequency of each character must also be summarised.
The relative frequency of character “x” in file MyText1 is more than (or less than or
equal to) that in MyText2.

The input files should be nonempty.

6. Read a list named StringList1 containing strings from the keyboard. Generate a list MStringList1
that contains all items of StringList1 that are repeated two or more times and print this
list. By observing the contents of MStringList1, perform the following tasks:

a. Check whether an item of MStringList1 occurs an even or an odd number of times in
StringList1.
b. Remove the ith (i ≥ 2) occurrence of a given word in StringList1.

7. From the file “[Link]”, count the frequencies of the letters of the alphabet (convert upper case to lower
case) and plot the results as a bar chart with the letter on the x-axis and the corresponding
frequency on the y-axis.

8. Use the following data to plot the number of applications per year as a scatter plot.

year = [2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012]
no_application_per_year = [921261, 929198, 1043739, 1186454,
1194938, 1304495, 1356805, 1282000, 479651]

9. Plot $x\sin x$, $x^2\sin x$, $x^3\sin x$ and $x^4\sin x$ in a single plot over the range $x \in [-10, 10]$.

10. Plot histograms of the male and female ages in separate plots for the following data.

male_age = [53,51,71,31,33,39,52,27,54,30,64,26,21,54,52,20,59,32]

female_age = [53,65,68,21,75,46,24,63,61,24,49,41,39,40,25,54,42,
32,48,23,23]

11. Plot the monthly temperature extremes (in degrees Celsius) for a certain region of India, starting in
January, which are given below.

max: 17, 19, 21, 28, 33, 38, 37, 37, 31, 23, 19, 18
min: -62, -59, -56, -46, -32, -18, -9, -13, -25, -46, -52, -58

12. Write a Python program to find all numbers in a range (given by the user) that are perfect squares and
whose digits sum to less than 10.

13. Plot a bar chart with axis labels for the given data:

mentions = [500, 505]
years = [2017, 2018]

Do not set any extra conditions on the x-axis or the y-axis. Then plot the bar chart for this data again
with the y-axis starting from 0.
State the difference between the two bar charts.

14. Plot the scatter plot for the following data with unequal axes and then with equal axes. Also state the
difference between the two.

test_1_grades = [ 99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]

Common questions

First, compile the items that appear more than once into a new list, keeping a count of how often each one occurs. Check whether the occurrence count for each item in this list is odd or even. To modify the list, remove only the i-th occurrence of the given word, where i ≥ 2, so that the first occurrence is never affected. This requires tracking occurrences while iterating, for example with a running counter or a dictionary, so that the order of the remaining items is preserved.
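The steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the function names are hypothetical, the sample list stands in for keyboard input, and rejecting i < 2 is one reading of the "i ≥ 2" condition.

```python
from collections import Counter


def repeated_items(string_list):
    """Return items occurring two or more times, in order of first appearance."""
    counts = Counter(string_list)
    # dict.fromkeys keeps one copy of each item and preserves input order.
    return [item for item in dict.fromkeys(string_list) if counts[item] >= 2]


def occurrence_parity(string_list):
    """Map each repeated item to 'even' or 'odd' according to its count."""
    counts = Counter(string_list)
    return {
        item: "even" if counts[item] % 2 == 0 else "odd"
        for item in repeated_items(string_list)
    }


def remove_ith_occurrence(string_list, word, i):
    """Return a copy of the list without the i-th (1-based) occurrence of word."""
    if i < 2:
        raise ValueError("i must be at least 2")  # assumption taken from 'i >= 2'
    result = []
    seen = 0
    for item in string_list:
        if item == word:
            seen += 1
            if seen == i:
                continue  # skip exactly this occurrence
        result.append(item)
    return result


# Hypothetical sample input standing in for values read from the keyboard.
StringList1 = ["a", "b", "a", "c", "b", "a"]
MStringList1 = repeated_items(StringList1)
print(MStringList1)                                # ['a', 'b']
print(occurrence_parity(StringList1))              # {'a': 'odd', 'b': 'even'}
print(remove_ith_occurrence(StringList1, "a", 2))  # ['a', 'b', 'c', 'b', 'a']
```

If the word occurs fewer than i times, this sketch returns the list unchanged.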

To compare frequencies, write a program that reads both files, converts characters to lowercase, and counts character occurrences while ignoring spaces and punctuation. Compare the absolute counts to establish which file has more or fewer occurrences of each character. For relative frequency, divide each character count by the total number of counted characters in its file and compare the proportions. Summarize the results as 'more', 'less', or 'equal' for both frequency types.
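A minimal sketch of this comparison is shown below. It works on strings so that it stays self-contained; in the assignment the two strings would come from reading the files. The function names and sample texts are hypothetical, and keeping only alphanumeric characters (`str.isalnum`) is an assumption about what "ignoring punctuation" should mean.

```python
from collections import Counter


def char_frequencies(text):
    """Count characters case-insensitively, skipping spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalnum())


def compare(a, b):
    """Describe a relative to b as 'more', 'less' or 'equal'."""
    if a > b:
        return "more"
    if a < b:
        return "less"
    return "equal"


def compare_texts(text1, text2):
    """Summarise total, unique, absolute and relative character frequencies."""
    freq1, freq2 = char_frequencies(text1), char_frequencies(text2)
    total1, total2 = sum(freq1.values()), sum(freq2.values())
    if total1 == 0 or total2 == 0:
        raise ValueError("both inputs must contain at least one counted character")
    summary = {
        "total": compare(total1, total2),
        "unique": compare(len(freq1), len(freq2)),
        "frequency": {},
        "relative": {},
    }
    for ch in sorted(set(freq1) | set(freq2)):
        summary["frequency"][ch] = compare(freq1[ch], freq2[ch])
        # Cross-multiplying compares freq1/total1 with freq2/total2 exactly,
        # without floating-point rounding.
        summary["relative"][ch] = compare(freq1[ch] * total2, freq2[ch] * total1)
    return summary


# Hypothetical sample texts standing in for the contents of the two files.
result = compare_texts("Data, data!", "A tidy dataset.")
print(result["total"], result["unique"])  # less less
print(result["frequency"]["a"])           # more
print(result["relative"]["t"])            # equal
```

Returning a dictionary rather than printing directly keeps the comparison logic separate from the sentence format the assignment asks for.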

Using unequal axis scales in scatter plots can distort the relationship between test grades, making differences appear more significant than they are. Equal scales ensure accurate plot representation, providing a realistic view of grade correlation. Distorted axes might suggest false trends or outperformances, while equal axes represent actual data relationships, aiding proper interpretation.
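The effect can be seen by drawing the same points twice, once with matplotlib's default scaling and once with equal axis scaling. The sketch below assumes matplotlib is installed; the output file name and the variable names with underscores are illustrative choices.

```python
import matplotlib.pyplot as plt

test_1_grades = [99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]

fig, (ax_default, ax_equal) = plt.subplots(1, 2, figsize=(10, 4))
for ax in (ax_default, ax_equal):
    ax.scatter(test_1_grades, test_2_grades)
    ax.set_xlabel("test 1 grade")
    ax.set_ylabel("test 2 grade")

ax_default.set_title("Default (unequal) axes")
ax_equal.set_title("Equal axes")
# One unit on the x-axis now covers the same length as one unit on the y-axis.
ax_equal.axis("equal")

fig.savefig("grades_scatter.png")  # hypothetical file name; plt.show() also works
```

With equal scaling, the wider spread of the test 2 grades (60 to 100) compared with the test 1 grades (80 to 99) becomes visible directly in the plot.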

When plotting a bar chart, starting the y-axis from a value greater than 0 can exaggerate differences, because the visible bar heights are then no longer proportional to the values; this may mislead viewers about the magnitude of changes. Starting the y-axis from 0 provides a true scale for comparing data visually, ensuring an accurate representation of proportions and differences without distortion.
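A small sketch of the two variants follows, assuming matplotlib is installed. The truncated range 499 to 506 and the upper limit 550 are hypothetical values chosen only to make the contrast visible; they are not given in the assignment. The truncated range is set explicitly because matplotlib's default bar chart may already start the y-axis at 0.

```python
import matplotlib.pyplot as plt

mentions = [500, 505]
years = [2017, 2018]

fig, (ax_truncated, ax_zero) = plt.subplots(1, 2, figsize=(10, 4))
for ax in (ax_truncated, ax_zero):
    ax.bar(years, mentions, width=0.8)
    ax.set_xticks(years)  # label only the two years that have data
    ax.set_xlabel("year")
    ax.set_ylabel("mentions")

# Truncated axis (illustrative range): the 1% increase looks very large.
ax_truncated.set_ylim(499, 506)
ax_truncated.set_title("y-axis starts at 499")

# Axis starting at 0: bar heights are proportional to the values.
ax_zero.set_ylim(0, 550)
ax_zero.set_title("y-axis starts at 0")

fig.savefig("mentions_bar.png")  # hypothetical file name
```

In the left panel the 2018 bar appears several times taller than the 2017 bar, while in the right panel the two bars are almost the same height, which matches the actual difference of 5 mentions out of 500.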

To determine the average salary for different experience groups, the dataset should be grouped into three categories based on tenure: 'less than two' years, 'between two and five' years, and 'more than five' years. For each group, filter the salaries and compute the average by summing the salaries within each group and dividing by the number of entries in the group.
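The grouping and averaging can be sketched as below. The helper name `tenure_bucket` is hypothetical, and the handling of the boundaries (exactly 2 years counted as "between two and five", exactly 5 years as "more than five") is an assumption, since the problem statement does not say where boundary values belong.

```python
from collections import defaultdict

salaries_and_tenures = [(83000, 8.7), (88000, 8.1), (48000, 0.7),
                        (76000, 6), (69000, 6.5), (76000, 7.5),
                        (60000, 2.5), (83000, 10), (48000, 1.9), (63000, 4.2)]


def tenure_bucket(tenure):
    """Map a tenure in years to its bucket label (boundary handling assumed)."""
    if tenure < 2:
        return "less than two"
    elif tenure < 5:
        return "between two and five"
    else:
        return "more than five"


# Group the salaries by bucket label.
salary_by_bucket = defaultdict(list)
for salary, tenure in salaries_and_tenures:
    salary_by_bucket[tenure_bucket(tenure)].append(salary)

# Average = sum of the salaries in a bucket divided by the number of entries.
average_salary_by_bucket = {
    bucket: sum(salaries) / len(salaries)
    for bucket, salaries in salary_by_bucket.items()
}

for bucket in ("less than two", "between two and five", "more than five"):
    print(f"{bucket}: {average_salary_by_bucket[bucket]:.2f}")
# less than two: 48000.00
# between two and five: 61500.00
# more than five: 79166.67
```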

According to the problem statement, users with very few (less than two) and very many (more than five) years of experience tend to pay for accounts, suggesting that those starting their careers and those with considerable experience value or need paid resources. It appears there is a perceived benefit or necessity for paid accounts at these career stages.

Iterate through the user-defined range and calculate perfect squares. For each perfect square, compute the sum of its digits. Check if the digit sum is less than 10, and if so, include the number in the result list. Utilize mathematical operations and control structures within a loop to efficiently implement this logic.
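One way to sketch this is to iterate over the integer roots rather than over every number in the range, which produces the same result with fewer iterations. The function names are hypothetical, the range is treated as inclusive at both ends (an assumption), and the bounds are passed as arguments here instead of being read from the user. `math.isqrt` requires Python 3.8 or later.

```python
import math


def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(digit) for digit in str(n))


def perfect_squares_with_small_digit_sum(low, high):
    """Perfect squares n with low <= n <= high whose digit sum is below 10."""
    if high < 0:
        return []
    low = max(low, 0)
    first_root = math.isqrt(low)
    if first_root * first_root < low:
        first_root += 1  # round up to the first root whose square is >= low
    last_root = math.isqrt(high)
    return [
        root * root
        for root in range(first_root, last_root + 1)
        if digit_sum(root * root) < 10
    ]


print(perfect_squares_with_small_digit_sum(1, 100))
# [1, 4, 9, 16, 25, 36, 81, 100]
```

In the example, 49 (digit sum 13) and 64 (digit sum 10) are perfect squares but are excluded by the digit-sum condition. For interactive use, the two bounds could be read with `int(input(...))`.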

Scatter plots, with application years on the x-axis and number of applications on the y-axis, show the trend of application numbers over time. This visual format helps identify growth, decline, or variability in applications, with each point representing a specific year's data. Patterns can be recognized as clusters or lines, indicating trends and anomalies.

To generate and store random passwords, use Python's 'random' and 'string' libraries to create alphanumeric passwords. Prompt users for password length and count, generate accordingly, and write each password to a file named 'MyPasswords.txt' using file handling operations to persist the data.
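A minimal sketch using the `random` and `string` modules is given below. The function names are hypothetical, and the length and count are fixed values here rather than user input. Note that `random` is not designed for security purposes; for real credentials the standard-library `secrets` module is the safer choice.

```python
import random
import string

# Alphanumeric alphabet: upper-case letters, lower-case letters and digits.
ALPHABET = string.ascii_letters + string.digits


def generate_passwords(length, count):
    """Return `count` random alphanumeric passwords of `length` characters each."""
    return ["".join(random.choices(ALPHABET, k=length)) for _ in range(count)]


def save_passwords(passwords, path="MyPasswords.txt"):
    """Write one password per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for password in passwords:
            handle.write(password + "\n")


# Illustrative values; in the assignment these come from the user.
passwords = generate_passwords(length=12, count=5)
for password in passwords:
    print(password)
```

To complete the task, the length and count could be read with `int(input(...))` and the result passed to `save_passwords(passwords)`.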

To visualize alphabet frequency, first process the text file to convert all letters to lowercase and count each letter's occurrence while ignoring other characters. Using matplotlib or a similar library, plot these frequencies on a bar chart with letters on the x-axis and their corresponding frequencies on the y-axis, highlighting the distribution visually.
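The counting and plotting steps might look like the following sketch, assuming matplotlib is installed. The sample string is a hypothetical stand-in for the file contents, and restricting the count to the 26 ASCII letters is an assumption.

```python
import string
from collections import Counter

import matplotlib.pyplot as plt


def letter_frequencies(text):
    """Count ASCII letters case-insensitively, returned in alphabetical order."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return {letter: counts[letter] for letter in sorted(counts)}


# Hypothetical sample text; in the assignment this is read from the file.
sample_text = "Data Science using Python"
frequencies = letter_frequencies(sample_text)

fig, ax = plt.subplots()
ax.bar(list(frequencies.keys()), list(frequencies.values()))
ax.set_xlabel("letter")
ax.set_ylabel("frequency")
ax.set_title("Letter frequencies")
fig.savefig("letter_frequencies.png")  # hypothetical file name
```

For a real file, `sample_text` would be replaced by the result of `open(path, encoding="utf-8").read()`.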
