PSNA COLLEGE OF ENGINEERING & TECHNOLOGY, DINDIGUL -624622.
(An Autonomous Institution, Affiliated to Anna University Chennai)
Department of Computer Science and Engineering
Academic Year 2024 – 2025 (ODD Semester)
Assignment – I          SET: I

Course Code : OCS353
Course Name : Data Science Fundamentals
Degree : B.E
Programme : CSE
Semester & Sec : VII & B
Type of Assignment : Common
Max. Marks : 40
Faculty In-charge : Mrs. P. Anitha Christy Angelin
Date : 28.10.2024

Course Objectives
CO 1 Gain knowledge on data science process.
CO 2 Perform data manipulation functions using Numpy and Pandas.
CO 3 Understand different types of machine learning approaches.
CO 4 Perform data visualization using tools.
CO 5 Handle large volumes of data in practical scenarios.

Answer All questions. (4*10=40 Marks)

Q.No. 1 (Marks: 15, CO: 1, BL: 3)
You are a data analyst intern at a healthcare facility that has been experiencing a rise in patient wait times, leading to lower patient satisfaction scores. The management team has asked you to analyze hospital operational data to determine the factors contributing to these long wait times. The data includes patient appointment schedules, staff availability, department capacities, and patient feedback.

1. Define the Problem: Clearly articulate the problem the healthcare facility faces due to increasing patient wait times and its potential impact on patient satisfaction and operational efficiency.
2. Data Collection and Preparation: Identify the types of data you would need to analyze to understand the causes of long patient wait times and describe how you would prepare this data for analysis, including any data cleaning steps.
3. Model Selection: Discuss at least two analytical methods you might use to identify the root causes of increased wait times. Justify your choices based on the data available.
4. Evaluation: What metrics would you use to assess the effectiveness of your analysis and the solutions implemented? Explain why these metrics are significant for the healthcare facility.

Q.No. 2 (Marks: 15, CO: 2, BL: 3)
You are analyzing the monthly rainfall (in mm) in three regions for the past year:

Region A: [120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230]
Region B: [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
Region C: [30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140]

Perform the following operations using NumPy array functions:

1. Create NumPy Arrays: Create a NumPy array for each region's monthly rainfall data.
2. Find the Month with Maximum Rainfall: Determine and print the month with the highest rainfall for each region.
3. Calculate the average monthly difference in rainfall between Region A and Region B and print the result.

Q.No. 4 (Marks: 10, CO: 3, BL: 3)
You are a real estate agent trying to predict the sale price of houses in a neighborhood based on various features, such as the size of the house, number of bedrooms, location, and age of the property. You have historical data on houses sold in the past with their respective features and prices. Explain how you will build a supervised learning model for predicting house prices.
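
For reference, the three operations asked for in the rainfall question (Q.No. 2) could be sketched as below. This is one possible approach, not a model answer. The month labels and the helper name wettest_month are assumptions for illustration: the question does not say which calendar month the data starts in, so January is assumed here.

```python
import numpy as np

# Assumption: the first value in each list corresponds to January.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# 1. Create a NumPy array for each region's monthly rainfall (mm).
region_a = np.array([120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230])
region_b = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160])
region_c = np.array([30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140])

rainfall = {"Region A": region_a, "Region B": region_b, "Region C": region_c}


def wettest_month(values):
    """Return (month label, rainfall in mm) for the maximum entry (hypothetical helper)."""
    idx = int(np.argmax(values))  # index of the first occurrence of the maximum
    return MONTHS[idx], int(values[idx])


# 2. Month with the highest rainfall for each region.
for name, values in rainfall.items():
    month, mm = wettest_month(values)
    print(f"{name}: highest rainfall in {month} ({mm} mm)")

# 3. Average monthly difference between Region A and Region B.
avg_diff_ab = float(np.mean(region_a - region_b))
print(f"Average monthly difference (A - B): {avg_diff_ab} mm")
```

np.argmax returns only the first index when the maximum occurs more than once, which is sufficient here because each region's values are strictly increasing.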