Course Code: NAS1001
Course Title: NASSCOM Future Skills - Associative Data Analyst
Course Type: LTP
Job Role: SSC/Q2101
Credits: 4
Pre-requisite: NIL
Course Objectives:
1. To establish clearly the objectives and scope of the predictive analysis
2. To use the R programming language to identify suitable data sources and agree on the methodological approach
3. To validate and review data accurately and identify anomalies
4. To appreciate current trends in data analysis procedures
5. To carry out rule-based analysis of the data in line with the analysis plan
6. To apply statistical models to perform regression analysis, clustering and classification
7. To present the results and inferences from the analysis using R
8. To improve document management and teamwork
Expected Course Outcome:
Students will be able to:
1. Understand R in the context of Business Intelligence, Business Analytics, Data and Information
2. Contextually integrate and correlate information automatically to gain faster insights
3. Implement statistical analysis techniques to solve practical problems
4. Interpret data graphically and find meaningful patterns in data
5. Perform statistical analysis on a variety of data
Student Learning Outcomes (SLO): 1, 2, 5, 9, 12
[1] Having an ability to apply mathematics and science in engineering applications
[2] Having a clear understanding of the subject related concepts and of contemporary issues
[5] Having design thinking capability
[9] Having problem-solving ability: solving social issues and engineering problems
[12] Having adaptive thinking and adaptability
Unit 1: Introduction to Analytics and R Programming 9 hours
Introduction to R and RStudio (GUI): the R Windows environment; data types and structures such as numeric, character, date, data frame, array and matrix. Working with datasets and files: reading datasets, working with different file types (.txt, .csv), files and datasets in RStudio, extracting datasets, preparing datasets. Data cleaning, data imputation, data conversion.
Unit 2: Summarizing Data & Revisiting Probability 9 hours
Introduction to statistical learning and R programming: outliers, combining datasets in R, functions and loops. Summary statistics: summarizing data with R.
Unit 3: Document Creation and Knowledge Sharing 9 hours
Access existing documents, language standards, templates and documentation tools from their organization’s knowledge base. Confirm the content and structure of the documents with appropriate people. Create documents using standard templates and agreed language standards. Review documents with appropriate people and incorporate their inputs.
Unit 4: Self and Work Management 9 hours
Establish and agree their work requirements with appropriate people; keep their immediate work area clean and tidy; utilize their time effectively; use resources correctly and efficiently; treat confidential information correctly; work in line with the organization’s policies and procedures; work within the limits of their job role.
Unit 5: Teamwork and Communication 9 hours
Communicate with colleagues clearly, concisely and accurately; work with colleagues to integrate their work effectively; pass on essential information to colleagues in line with organizational requirements; work in ways that show respect for colleagues; carry out commitments they have made to colleagues; let colleagues know in good time if they cannot carry out their commitments, explaining the reasons; identify any problems they have working with colleagues and take the initiative to solve these problems.
Total Lecture Hours: 45 hours
Mode of Teaching and Learning:
Flipped classroom, activity-based teaching/learning, and digital/computer-based models wherever possible to augment lectures for practice/tutorials, plus a minimum of 2 hours of lectures by industry experts on contemporary topics.
Mode of Evaluation and Assessment:
The assessment and evaluation components may consist of unannounced open-book examinations, quizzes, student portfolio generation and assessment, and any other innovative assessment practices followed by faculty, in addition to the Continuous Assessment Tests and Term End Examinations.
Text Book(s)
1. Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer, 2017.
2. Mark van der Loo and Edwin de Jonge, “Learning RStudio for R Statistical Computing”, Packt Publishing, 2012.
3. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2014.
Reference Books
1. Hadley Wickham and Garrett Grolemund, “R for Data Science: Import, Tidy, Transform, Visualize, and Model Data”, O’Reilly, 2017.
2. Garrett Grolemund, “Hands-On Programming with R”, O’Reilly Media, 2014.
3. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, “Introduction to Information Retrieval”, Cambridge University Press, First South Asian Edition, 2008.
4. Trevor Hastie, Robert Tibshirani and Jerome Friedman, “The Elements of Statistical Learning”, Springer, Second Edition, 2011.
5. https://www.sscnasscom.com/qualification-pack/SSC/Q2101/
List of Challenging Experiments (Indicative) SLO: 1,2,5,9,12
1. Understanding the R system; installation and configuration of the R environment and RStudio; understanding R packages, their installation and management
2. Understanding the nuts and bolts of R:
a. R program structure
b. R data types, command syntax and control structures
c. File Operations in R
3. Excel and R integration using an R connector
4. Preparing Data in R
a. Data Cleaning
b. Data imputation
c. Data conversion
5. Outlier detection using R
6. Correlation and Regression Analysis in R
7. Clustering Algorithms implementation using R
8. Classification algorithm implementation using R (e.g., spam/not-spam classification)
9. Case study on stock market analysis and its applications. Stock data can be obtained from Yahoo! Finance or Google Finance. A team of students can apply statistical modelling to the stock data to uncover hidden patterns. R provides tools for moving averages, autoregression and time-series analysis, which form the crux of financial applications.
10. Detecting fraudulent credit card transactions: the dataset can be obtained from Kaggle. The team will use a variety of machine learning algorithms to discern fraudulent from non-fraudulent transactions.
Total Laboratory Hours: 30 hours
Recommended by Board of Studies 24.06.2020
Approved by Academic Council Date