Unit I

Mtech 1-1 syllabus JNTU GV

Uploaded by

MA Dhu

M.Tech I Year - I Semester
ESSENTIAL FOR MACHINE LEARNING (MTDS1101)
L T P C: 3 0 0 3

Course Objectives:
• Provide knowledge of the machine learning tools that rest on a rich mathematical theory.
• Help students develop new machine learning and deep learning algorithms in Python, which
requires knowledge of all these mathematical concepts.
• Focus on topics from matrix algebra, calculus, optimization, and probability theory that have
strong links with machine learning.
• Introduce applications of these topics in ML with the help of real-life examples.
Course Outcomes:
After the completion of the course, the student will be able to:
• Explain how to implement matrix algebra, calculus, optimization, and probability theory, which
are used in machine learning algorithms.
• Understand the key machine learning concepts that rest on a rich mathematical theory.

UNIT-I:
CALCULUS
Calculus: Derivatives and Integrals, Introduction to Derivatives, Mathematical Definition of
Derivatives, Derivatives of Linear and Nonlinear Functions, Derivative Rules, Partial
Derivatives and Gradients, Integrals and the Area Under the Curve, Example, Riemann Sum,
Mathematical Definition,
Hands-On Project: Gradient Descent- Cost function, Derivative of the Cost Function,
Implementing Gradient Descent.
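Illustration (not part of the prescribed syllabus): the hands-on project topics above can be sketched in a few lines of plain Python. The sketch assumes a mean squared error cost for a straight-line model y = w*x + b. The function name gradient_descent, the learning rate, the step count, and the sample data are illustrative assumptions only.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimizing the mean squared error cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost J = (1/n) * sum(error^2)
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(w, b)
```

With a learning rate that is too large the updates diverge, so the value used here is a choice that happens to suit this small dataset, not a general recommendation.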

UNIT-II:
STATISTICS AND PROBABILITY
Introduction, Descriptive Statistics: Mean, Variance and Standard Deviation, Covariance and
Correlation, Covariance Matrix, random Variables, Random Variable: Definitions and Notation,
Discrete and Continuous Random Variables, Probability Distributions: Probability Mass
Functions, Probability Density Functions, Joint Probability, Marginal Probability, Conditional
Probability, Expectation and Variance of Random Variables: Cumulative Distribution
Functions, Expectation and Variance of Random Variables,
Hands-On Project: The Central Limit Theorem.
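Illustration (not part of the prescribed syllabus): a minimal simulation of the Central Limit Theorem using only the Python standard library. It assumes samples drawn from a uniform distribution on [0, 1), whose mean is 0.5 and variance is 1/12. The function name sample_means, the sample size, the number of trials, and the seed are illustrative assumptions.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Return the means of `trials` independent samples of size `n`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(trials)
    ]


means = sample_means(n=30, trials=5000)
mu = statistics.fmean(means)
sigma = statistics.stdev(means)

# The theorem predicts that the sample means cluster around 0.5
# with a standard deviation close to sqrt((1/12) / 30).
predicted_sigma = ((1.0 / 12.0) / 30) ** 0.5
print(mu, sigma, predicted_sigma)
```

Plotting a histogram of means would show the bell shape, even though the underlying uniform distribution is flat.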

UNIT-III:
LINEAR ALGEBRA
Scalars and Vectors: What are Vectors? Geometric and Coordinate Vectors, Vector Spaces,
Special Vectors, Operations and Manipulations on Vectors: Scalar Multiplication, Vector
Addition, Transposition, Norms: Definitions, Common Vector Norms, Norm Representations,
The Dot Product: Definition, Geometric interpretation: Projections, Properties,
Hands-on Project: Regularization.
UNIT-IV:
MATRICES AND TENSORS
Introduction: Matrix Notation, Shapes, Indexing, Main Diagonal, Tensors, Frobenius Norm,
Operations and Manipulations on Matrices: Addition and Scalar Multiplication, Transposition,
Matrix Product: Matrices with Vectors, Matrices Product. Transpose of a Matrix Product,
Special Matrices: Square Matrices, Diagonal Matrices, Identity Matrices, Inverse Matrices,
Orthogonal Matrices, Symmetric Matrices, And Triangular Matrices.
Hands-on Project: Image Classifier.
Systems of Linear Equations
System of linear equations: Row Picture, Column Picture, Number of Solutions, Representation
of Linear Equations With Matrices, System Shape: Overdetermined Systems of Equations,
Underdetermined Systems of Equations, Projections: Solving Systems of Equations, Projections
to Approximate Unsolvable Systems, Projections Onto a Line, Projections Onto a Plane.
Hands-on Project: Linear Regression Using Least Squares Approximation
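Illustration (not part of the prescribed syllabus): a small sketch of least squares for an overdetermined system, fitting a line to four points in plain Python. It solves the 2x2 normal equations in closed form. The function name least_squares_line and the data points are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Solve the normal equations (A^T A) p = A^T y for A = [x, 1]."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the line is not unique")
    slope = (n * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept


# Four equations, two unknowns: no line passes through every point
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 2.0, 4.0]
slope, intercept = least_squares_line(xs, ys)

# The residual is orthogonal to both columns of A, which is the
# projection view of the least squares solution.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```

The orthogonality of the residual to the columns of A is what links this computation to the projection topics listed in the unit.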

UNIT-V:
EIGENVECTORS, EIGENVALUES, AND EIGENDECOMPOSITION
Eigenvectors and Eigenvalues, Change of Basis, Linear Combinations of the Basis Vectors, The
Change of Basis Matrix, Example: Changing the Basis of a Vector, Linear Transformations in
Different Bases: Transformations, Transformation Matrix in Another Basis, Interpretation,
Eigendecomposition: First Step: Change of Basis,
Eigenvectors and Eigenvalues, Diagonalization, Eigendecomposition of Symmetric Matrices
Hands-On Project: Principal Component Analysis

SINGULAR VALUE DECOMPOSITION
Non-square Matrices: Different Input and Output Spaces, Specifying the Bases, Expression of
the SVD: Notation, Singular Vectors and Singular Values, Finding the Singular Vectors and the
Singular Values, Summary, Geometry of the SVD: Two-Dimensional Example, Comparison
with Eigendecomposition, Three-Dimensional Example, Summary, Low-Rank Matrix
Approximation: Full SVD, Thin SVD and Truncated SVD, Decomposition into Rank One
Matrices.
Hands-On Project: Image Compression

TEXT BOOKS
1. Hadrien Jean, Essential Math for Data Science: Take Control of Your Data with Fundamental
Calculus, Linear Algebra, Probability, and Statistics. Haliotis Publishing, 2020.
2. Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Mathematics for Machine
Learning, Cambridge University Press, 2020.
3. Gilbert Strang, Linear Algebra and Learning from Data. Wellesley Publishers, 2019.
4. Otto Bretscher, Linear Algebra with Applications (Fifth Edition). Pearson Education, 2013.
M.Tech I Year - I Semester
DATA SCIENCE METHODOLOGIES AND PROGRAMMING (MTDS1102)
L T P C: 3 0 0 3

Course Objectives:
• Provide you with the knowledge and expertise to become a proficient data scientist.
• Demonstrate an understanding of statistics and machine learning concepts that are vital
for data science.
• Produce Python code to statistically analyze a dataset.
• Critically evaluate data visualizations based on their design and use for communicating
stories from data.

Course Outcomes:
After the completion of the course, the student will be able to:
• Explain how data is collected, managed and stored for data science.
• Understand the key concepts in data science, including their real-world applications and
the toolkit used by data scientists.
• Implement data collection and management scripts using Python Pandas.

UNIT I:
PYTHON Basics and Programming Concepts: Introducing Python, Types and Operations -
Numbers, Strings, Lists, Tuples, Dictionaries, Files, Numeric Types, Dynamic Typing;
Statements and Syntax - Assignments, Expressions, Statements, Loops, iterations,
comprehensions; Functions - Function Basics, Scopes, Arguments, Advanced Functions;
Modules - Module Coding Basics, Module Packages, Advanced Module Topics; Classes and
OOP - Class, Operator Overloading, Class Designing; Exceptions and Tools - Exception Basics,
Exception Coding Details, Exception Objects, Designing With Exceptions, Parallel System
Tools

UNIT II:
GUI Programming: Graphical User Interface - Python GUI development options, Adding
Widgets, GUI Coding Techniques, Customizing Widgets; Internet Programming - Network
Scripting, Client-Side scripting, PyMailGUI client, server-side scripting, PyMailCGI server; Tools
and Techniques - databases and persistence, data structures, text and language, Python/C
integration

UNIT III:
Pandas and NumPy: NumPy Basics - Fast Element-wise array functions, Multidimensional
Array, Data Processing using arrays, file I/O with arrays; Pandas - Data Structures, Essential
Functionality, Summarizing and Computing Descriptive Statistics, Handling Missing Data,
Hierarchical Indexing

UNIT IV:
Data Preprocessing: Data Loading, Storage, and File Formats - Reading and Writing data in
text format, binary data formats, interacting with HTML and Web APIs, interacting with databases;
Data Wrangling: Clean, Transform, Merge, Reshape - Combining and Merging Data Sets,
Reshaping and Pivoting, Data Transformation, String Manipulation; Data Aggregation and
Group Operations – GroupBy Mechanics, Data Aggregation, Group-wise Operations and
Transformations, Pivot Tables and Cross-Tabulation
UNIT V:
Data Visualization: A Brief matplotlib API Primer, Plotting Functions in pandas, Time Series,
Financial and Economic Data Applications

TEXT BOOKS:

1. Learning Python, 5th Edition, Mark Lutz, O'Reilly, 2013.
2. Python Programming: A Modern Approach, 1st Edition, Vamsi Kurama, Pearson.
3. Programming Python, 4th Edition, Mark Lutz, O'Reilly, 2010.
4. Python for Data Analysis, 2nd Edition, Wes McKinney, O'Reilly, 2017.

REFERENCE BOOKS:

1. Python: The Complete Reference, 1st Edition, Martin C. Brown, McGraw Hill Education,
2018.
2. Head First Python, 2nd Edition, Paul Barry, O'Reilly, 2016.
M.Tech I Year - I Semester
DATA MINING (MTDS1103)
L T P C: 3 0 0 3

Course Objectives:
1. Students will be enabled to understand and implement classical models and algorithms in
data mining.
2. They will learn how to analyze the data, identify the problems, and choose the relevant
models and algorithms to apply.
3. They will further be able to assess the strengths and weaknesses of various methods and
algorithms and to analyze their behavior
Course Outcomes:
After the completion of the course, student will be able to:
1. Compare types of data, quality of data, and suitable measures required to perform data analysis
(UNIT I)
2. Choose an appropriate classification technique to perform classification, model building and
evaluation (UNIT II)
3. Make use of association rule mining techniques on categorical and continuous data (UNIT III)
4. Identify and apply a clustering algorithm (with open source tools), then interpret, evaluate and
report the result (UNIT IV)
5. Analyze and compare anomaly detection techniques (UNIT V)

UNIT I:
Introduction to Data mining, types of Data, Data Quality, Data Processing, Measures of
Similarity and Dissimilarity, Exploring Data: Data Set, Summary Statistics, Visualization,
OLAP and multi-dimensional data analysis.

UNIT II:
Classification: Basic Concepts, Decision Trees and model evaluation: General approach for
solving a classification problem, Decision Tree induction, Model overfitting: due to presence of
noise, due to lack of representative samples, Evaluating the performance of a classifier. Nearest
Neighbor classifier, Bayesian Classifier, Support Vector Machines: Linear SVM, Separable
and Non-Separable case.

UNIT III:
Association Analysis: Problem Definition, Frequent Itemset generation, rule generation,
compact representation of frequent itemsets, FP-Growth Algorithm. Handling Categorical and
Continuous attributes, Concept hierarchy, Sequential patterns, Subgraph patterns

UNIT IV:
Clustering: Overview, K-means, Agglomerative Hierarchical clustering, DBSCAN, Cluster
evaluation: overview, Unsupervised Cluster Evaluation using cohesion and separation, using
proximity matrix, Scalable Clustering algorithm

UNIT V:
Anomaly Detection: Characteristics of Anomaly Detection Problems and Methods, Statistical
Approaches, Proximity-based Approaches, Clustering-based Approaches and Reconstruction-
based Approaches
TEXT BOOKS:
1. Introduction to Data Mining: Pang-Ning Tan; Michael Steinbach; Anuj Karpatne; Vipin
Kumar, 2nd edition.
2. Data Mining: The Textbook, Charu C. Aggarwal , Springer, May 2015

REFERENCE BOOKS:

1. Fundamentals of Data Warehouses, 2nd Edition, Jarke, Lenzerini, Vassiliou, Vassiliadis,
Springer.
2. Data Mining: Concepts and Techniques, 2nd Edition, Jiawei Han, Micheline Kamber, Elsevier,
2006.
3. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Ian H. Witten,
Eibe Frank, Elsevier, 2005.

Suggested NPTEL Course and other Useful Websites:


1. https://nptel.ac.in/courses/106105174/
2. http://cse20-iiith.vlabs.ac.in/
M.Tech I Year - I Semester
ARTIFICIAL INTELLIGENCE (MTDS1103)
L T P C: 3 0 0 3

Course Objectives:
• Gain a historical perspective of Artificial Intelligence (AI) and its foundations.
• Become familiar with basic principles of AI toward problem solving, inference,
perception, knowledge representation, and learning.
• Investigate applications of AI techniques in intelligent agents, expert systems, artificial
neural networks and other machine learning models.
• Experience AI development tools such as an 'AI language', expert system shell, and/or
data mining tool. Experiment with a machine learning model for simulation and analysis.
• Explore the current scope, potential, limitations, and implications of intelligent systems.

Course Outcomes:
At the end of the course, the student will be able to:
• Demonstrate knowledge of the building blocks of AI as presented in terms of intelligent
agents.
• Analyze and formalize a problem as a state space or graph, design heuristics, and select
among different search or game-based techniques to solve it.
• Develop intelligent algorithms for constraint satisfaction problems and also design
intelligent systems for Game Playing.
• Attain the capability to represent various real-life problem domains using logic-based
techniques and use this to perform inference or planning.
• Solve problems with uncertain information using Bayesian approaches.

UNIT-I:
Introduction to artificial intelligence: Introduction, history, intelligent systems, foundations
of AI, applications, tic-tac-toe game playing, development of AI languages, current trends in
AI, Problem solving: state-space search and control strategies: Introduction, general
problem solving, characteristics of problem, exhaustive searches, heuristic search techniques,
iterative-deepening A*, constraint satisfaction

UNIT-II:
Problem reduction and game playing: Introduction, problem reduction, game playing,
alpha-beta pruning, two-player perfect information games, Logic concepts: Introduction,
propositional calculus, propositional logic, natural deduction system, axiomatic system,
semantic tableau system in propositional logic, resolution refutation in propositional logic,
predicate logic

UNIT-III:
Knowledge representation: Introduction, approaches to knowledge representation,
knowledge representation using semantic network, extended semantic networks for KR,
knowledge representation using frames, advanced knowledge representation techniques:
Introduction, conceptual dependency theory, script structure, CYC theory, case grammars,
semantic web.

UNIT-IV:
Uncertainty measure: probability theory: Introduction, probability theory, Bayesian belief
networks, certainty factor theory, Dempster-Shafer theory
UNIT-V:
Fuzzy sets and fuzzy logic: Introduction, fuzzy sets, fuzzy set operations, types of
membership functions, multi valued logic, fuzzy logic, linguistic variables and hedges, fuzzy
propositions, inference rules for fuzzy propositions, fuzzy systems.

TEXT BOOKS:
1. Artificial Intelligence: A Modern Approach, 2nd Edition, Stuart Russell, Peter Norvig,
Prentice Hall
2. Artificial Intelligence, Saroj Kaushik, 1st Edition, CENGAGE Learning, 2011.

REFERENCE BOOKS:
1. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th Edition,
George F. Luger, PEA
2. Introduction to Artificial Intelligence, Wolfgang Ertel, Springer, 2017
3. Artificial Intelligence: A New Synthesis, 1st Edition, Nils J. Nilsson, Elsevier, 1998
4. Artificial Intelligence, 3rd Edition, Rich, Kevin Knight, Shiv Shankar B Nair, TMH
5. Introduction to Artificial Intelligence and Expert Systems, 1st Edition, Patterson, Pearson
India, 2015
M.Tech I Year - I Semester
PREDICTIVE ANALYTICS (MTDS1103)
L T P C: 3 0 0 3

Course Objectives:
Students develop skills in predictive analytics that will allow them to:
• Develop and use advanced predictive analytics methods;
• Develop expertise in the use of popular tools and software for predictive analytics;
• Learn how to develop predictive analytics questions, identify and select the most
appropriate predictive analytics methods and tools, apply these methods to answer the
respective questions, and present data-driven solutions.

Course Outcomes:
After completing this course, you will be able to:
• Design effective experiments and analyze the results.
• Use resampling methods to make clear and bulletproof statistical arguments without
invoking esoteric notation.
• Explain and apply a core set of classification methods of increasing complexity
(rules, trees, random forests), and associated optimization methods (gradient descent
and variants).
• Explain and apply a set of unsupervised learning concepts and methods.

UNIT-1
INTRODUCTION TO MACHINE LEARNING
Introduction to analytics and machine learning, why machine learning?, framework for
developing machine learning models, why Python?, Python stack for data science, getting
started with the Anaconda platform
Introduction to Python: declaring variables, conditional statements, generating sequence
numbers, control flow statements, functions, working with collections: lists, tuples, sets,
dictionaries; functional programming: map, filter; modules and packages, other features.

UNIT-2
DESCRIPTIVE ANALYTICS
Working with DataFrames in Python: IPL dataset description using DataFrame in Python,
loading dataset into pandas DataFrame, displaying first few records of the DataFrame,
finding summary of the DataFrame, slicing and indexing of DataFrame, value counts and cross
tabulations, sorting DataFrames by column value, creating new columns, grouping and
aggregating, joining DataFrames, renaming columns, applying operations to multiple
columns, filtering records based on conditions, removing a column or a row from a dataset;
handling missing values.
Exploration of data using visualization: drawing plots, bar chart, histogram, distribution or
density plot, box plot, comparing distributions, scatter plot, pair plot, correlation and heatmap.

UNIT- 3
PROBABILITY DISTRIBUTIONS AND HYPOTHESIS TESTS
Overview, Probability theory – terminology: random experiment, sample space, event;
random variables, binomial distribution: example of binomial distribution, exponential
distribution: example of exponential distribution, normal distribution: example of normal
distribution, mean and variance, confidence interval, cumulative probability distribution, other
important distributions; central limit theorem, Hypothesis tests: z-test, one-sample t-test,
two-sample t-test, paired sample t-test, chi-square goodness of fit test; Analysis of variance
(ANOVA): example of one-way ANOVA

UNIT 4:
LINEAR REGRESSION:
Simple Linear Regression, Steps in Building a Regression Model,
Building Simple Linear Regression Model: Creating Feature Set (X) and Outcome Variable (Y),
Splitting the Dataset into Training and Validation Sets, Fitting the Model: Printing Estimated
Parameters and Interpreting Them, Complete Code for Building Regression Model.
Model Diagnostics: Coefficient of Determination (R-Squared or R2), Hypothesis Test for the
Regression Coefficient, Analysis of Variance (ANOVA) in Regression Analysis, Regression
Model Summary Using Python.
Residual Analysis: Check for Normal Distribution of Residuals, Test of Homoscedasticity.
Outlier Analysis: Z-score, Cook's Distance, Leverage Values
Making Prediction and Measuring Accuracy: Prediction Using the Validation Set, Finding
R-Squared and RMSE, Calculating Prediction Intervals.
Multiple Linear Regression: Predicting the SOLD PRICE (Auction Price) of players,
Developing Multiple Linear Regression Model Using Python: Loading the Dataset, Displaying
the First Five Records,
Encoding Categorical Features, Splitting the Dataset into Train and Validation Sets, Building the
model on the training Dataset,
Multi-Collinearity and Handling Multi-Collinearity: Variance Inflation Factor, Checking
Correlation of columns with Large VIFs, Building a New Model after Removing
Multi-Collinearity.
Residual Analysis in Multiple Linear Regression: Test for Normality of Residuals (P-P Plot),
Residual Plot for Homoscedasticity and Model Specification.
Detecting Influencers, Transforming Response Variable, Making Prediction on the validation
set: Measuring RMSE, Measuring R-Squared Value, Auto Correlation between Error Terms

UNIT 5:
CLASSIFICATION PROBLEMS:
Classification Review, Binary Logistic Regression, and Credit Classification: Encoding
Categorical Features, Splitting dataset into training and test sets, Building Logistic Regression
Model, Printing Model summary, Model Diagnostics, Predicting Test data, creating a Confusion
matrix, Receiver Operating Characteristics (ROC) and Area under Curve (AUC), Finding
Optimal Classification Cut-off: Youden's Index, Cost-Based Approach.
Gain Chart and Lift Chart: Loading and preparing the dataset, Building the logistic Regression
Model
Classification Tree (Decision Tree Learning): Splitting the dataset, Building Decision Tree
Classifier using Gini Criteria, Measuring Test Accuracy, Displaying the Tree, Understanding
Gini Impurity, Building Decision Tree using Entropy Criteria, Finding Optimal Criteria and
Max Depth, Benefits of Decision Tree.
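Illustration (not part of the prescribed syllabus): a short sketch of Gini impurity, the splitting criterion named above, in plain Python. For a node with class proportions p_k the impurity is 1 - sum(p_k^2). The function names gini_impurity and weighted_gini and the example labels are illustrative assumptions.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k^2) over the class proportions p_k in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical labels for a parent node and one candidate split
parent = ["good", "good", "bad", "bad"]
parent_gini = gini_impurity(parent)
split_gini = weighted_gini(["good", "good"], ["bad", "bad"])
print(parent_gini, split_gini)
```

A tree builder would compare split_gini across candidate splits and prefer the one with the lowest value. Libraries such as scikit-learn implement this search internally.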

TEXT BOOKS:
1. Manaranjan Pradhan and U Dinesh Kumar, “Machine Learning Using Python”, Wiley, 2019
2. U Dinesh Kumar, “Business Analytics – The Science of Data Driven Decision Making”,
Wiley, 2017.
3. Dinesh Kumar, U., Crocker, J., Chitra, T., and Saranga, H., (2006), Reliability and Six
Sigma, Springer, USA
4. Dinesh Kumar, U., Knezevic, J., Crocker J., and El-Haram, M., (2000), Reliability,
Maintenance and Logistic Support - A Life Cycle Approach, Kluwer Academic Publishers,
USA
M.Tech I Year - I Semester
INTERNET OF EVERYTHING (MTDS1104)
L T P C: 3 0 0 3

Course Objectives:
• Learn about various types of sensors, actuators and different network protocols.
• Construct wireless sensor networks and communicate using different
connectivity technologies.
• Know how M2M communication works, including communication between the user
and the device.
• Know the programming platforms used to implement IoT.
• Learn how data generated by IoT applications is handled.
• Understand how IoT is used for industrial purposes and be able to build IoT applications.

Course Outcomes:
• Be aware of how sensors and actuators are connected using different network
protocols.
• Understand node behavior in wireless sensor networks and know which connectivity
technology is used for a given application.
• Know about Arduino boards and their connection with sensors and actuators.
• Learn how Pi OS is installed and how code is embedded into the board.
• Know how data is stored using cloud computing and know about sensor
clouds.
• Construct various IoT applications using various sensors and actuators.

UNIT I:
Introduction: Sensing & actuation, Communication-Part I, Part II, Networking-Part I, Part II,
Industry 4.0: Globalization and Emerging Issues, The Fourth Revolution, LEAN Production
Systems, Smart and Connected Business Perspective, Smart Factories.

UNIT II:
Industry 4.0: Cyber Physical Systems and Next Generation Sensors, Collaborative Platform and
Product Lifecycle Management, Augmented Reality and Virtual Reality, Artificial Intelligence,
Big Data and Advanced Analysis

UNIT III:
Cybersecurity in Industry 4.0, Basics of Industrial IoT: Industrial Processes-Part I, Part II,
Industrial Sensing & Actuation, Industrial Internet Systems. IIoT-Introduction, Industrial IoT:
Business Model and Reference Architecture: IIoT-Business Models, IIoT Reference
Architecture

UNIT IV:
Industrial IoT- Layers: IIoT Sensing, IIoT Processing, IIoT Communication, IIoT Networking,
Industrial IoT: Big Data Analytics and Software Defined Networks: IIoT Analytics –
Introduction
UNIT V:
Industrial IoT- Application Domains: Healthcare, Power Plants, Inventory
Management & Quality Control, Plant Safety and Security (Including AR and VR safety
applications), Facility Management. Industrial IoT- Application Domains: Oil, chemical and
pharmaceutical industry, Applications of UAVs in Industries.

TEXT BOOKS:

1. "Industry 4.0: The Industrial Internet of Things", by Alasdair Gilchrist (Apress)
2. "Industrial Internet of Things: Cybermanufacturing Systems", by Sabina Jeschke,
Christian Brecher, Houbing Song, Danda B. Rawat (Springer).
3. Internet of Things: Architecture, Design Principles and Applications, Raj Kamal,
McGraw Hill Higher Education
4. Internet of Things, A. Bahga and V. Madisetti, Universities Press, 2015.

REFERENCE BOOKS:
1. Designing the Internet of Things, Adrian McEwen and Hakim Cassimally, Wiley, 2013
2. Getting Started with the Internet of Things (Make: Projects), Cuno Pfister, O'Reilly, 2011
M.Tech I Year - I Semester
SOCIAL MEDIA ANALYTICS (MTDS1104)
L T P C: 3 0 0 3

Course Objectives:
The learning objective of the course Social Media Analytics is to provide students with
essential knowledge of network analysis applicable to real world data

Course Outcomes:
After the completion of the course, the student will be able to:
• Demonstrate social network analysis and measures.
• Analyze random graph models and navigate social network data.
• Experiment with and analyze small-world models and clustering models.
• Compare application-driven virtual communities from social network structure.

UNIT - I:
INTRODUCTION: Social Networks: Preliminaries and properties, Homophily, Triadic
Closure and Clustering Coefficient, Dynamics of Network Formation, Power-Law Degree
Distributions, Measures of Centrality and Prestige, Degree Centrality, Closeness Centrality,
Betweenness Centrality, Rank Centrality

UNIT - II:
COMMUNITY DISCOVERY IN SOCIAL NETWORKS: Introduction, Communities in
Context, Core Methods, Quality Functions. The Kernighan-Lin (KL) algorithm,
Agglomerative/Divisive Algorithms, Spectral Algorithms, Multi-level Graph Partitioning,
Markov Clustering

UNIT – III:
LINK PREDICTION IN SOCIAL NETWORKS: Introduction, Feature based Link
Prediction, Feature Set Construction, Classification Models, Bayesian Probabilistic Models,
Link Prediction by Local Probabilistic Models, Network Evolution based Probabilistic Model,
Hierarchical Probabilistic Model, Probabilistic Relational Models, Relational Bayesian
Network, Relational Markov Network, Linear Algebraic Methods

UNIT- IV:
SOCIAL INFLUENCE ANALYSIS: Introduction, Influence Related Statistics, Edge
Measures, Node Measures, Social Similarity and Influence, Homophily, Existential Test for
Social Influence, Influence and Actions, Influence and Interaction, Influence Maximization in
Viral Marketing, Influence Maximization

UNIT – V:
OPINION MINING AND SENTIMENT ANALYSIS: The Problem of Opinion Mining,
Document Sentiment Classification, Sentence Subjectivity and Sentiment Classification,
Opinion Lexicon Expansion, Aspect-Based Sentiment Analysis, Mining Comparative Opinions

TEXT BOOKS:
1. Social Network Data Analytics, Charu C. Aggarwal, Springer, 2011
2. Data Mining: The Textbook, 1st Edition, Charu C. Aggarwal, Springer Publications, 2015
3. Mining Text Data, Charu C. Aggarwal, ChengXiang Zhai, Springer Publications, 2012
REFERENCE BOOKS:
1. Networks, Crowds, and Markets: Reasoning about a Highly Connected World, David
Easley, Jon Kleinberg, Cambridge University Press, 2010.
2. Stanley Wasserman, Katherine Faust. Social network analysis: methods and applications.
Cambridge University Press, 1994
3. Networks: An Introduction, M. E. J. Newman, Oxford University Press, March 2010
4. Analyzing the Social Web, Jennifer Golbeck, Morgan Kaufmann Elsevier Publishers,
2014
M.Tech I Year - I Semester
BIG DATA ANALYTICS (MTDS1104)
L T P C: 3 0 0 3

Course Objectives:
This course aims to:
• Provide an overview of the exciting and growing field of big data analytics.
• Introduce the tools required to manage and analyze big data, such as Hadoop, NoSQL,
MapReduce, Hive, Cassandra, and Spark.
• Teach the fundamental techniques and principles for achieving big data analytics with
scalability and streaming capability.
• Show how to optimize business decisions and create competitive advantage with big data analytics.

Course Outcomes:
After the completion of the course, the student will be able to:
• Illustrate big data and its use cases from selected business domains.
• Interpret and summarize NoSQL and Cassandra.
• Analyze the Hadoop and MapReduce technologies associated with big data
analytics and explore big data applications using Hive.
• Make use of Apache Spark, RDDs etc. to work with datasets.
• Assess real-time processing with Spark Streaming.

UNIT I:
What is big data, why big data, convergence of key trends, unstructured data, industry examples of
big data, web analytics, big data and marketing, fraud and big data, risk and big data, credit risk
management, big data and algorithmic trading, big data and healthcare, big data in medicine,
advertising and big data, big data technologies, introduction to Hadoop, open source technologies,
cloud and big data, mobile business intelligence, crowdsourcing analytics, inter and trans firewall
analytics.

UNIT II:
Introduction to NoSQL, aggregate data models, aggregates, key-value and document data models,
relationships, graph databases, schemaless databases, materialized views, distribution models,
sharding, master-slave replication, peer-to-peer replication, sharding and replication, consistency,
relaxing consistency, version stamps, Working with Cassandra, Table creation, loading and reading
data.

UNIT III:
Data formats, analyzing data with Hadoop, scaling out, Architecture of Hadoop distributed file
system (HDFS), fault tolerance with data replication, High availability, Data locality,
MapReduce Architecture, Process flow, Java interface, data flow, Hadoop I/O, data integrity,
compression, serialization. Introduction to Hive, data types and file formats, HiveQL data
definition, HiveQL data manipulation, Logical joins, Window functions, Optimization, Table
partitioning, Bucketing, Indexing, Join strategies.

UNIT IV:
Apache Spark- Advantages over Hadoop, lazy evaluation, In-memory processing, DAG, Spark
context, Spark Session, RDD, Transformations- Narrow and Wide, Actions, Data frames, RDD to
Data frames, Catalyst optimizer, Data Frame Transformations, Working with Dates and
Timestamps, Working with Nulls in Data, Working with Complex Types, Working with JSON,
Grouping, Window Functions, Joins, Data Sources, Broadcast Variables, Accumulators, Deploying
Spark- On-Premises Cluster Deployments, Cluster Managers- Standalone Mode, Spark on YARN ,
Spark Logs, The Spark UI- Spark UI History Server, Debugging and Spark First Aid
UNIT V:
Spark-Performance Tuning, Stream Processing Fundamentals, Event-Time and Stateful Processing
- Event Time, Stateful Processing, Windows on Event Time- Tumbling Windows, Handling Late
Data with Watermarks, Dropping Duplicates in a Stream, Structured Streaming Basics - Core
Concepts, Structured Streaming in Action, Transformations on Streams, Input and Output.

TEXT BOOKS:
1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging
Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013
2. Bill Chambers and Matei Zaharia, "Spark: The Definitive Guide", O'Reilly, 2018 Edition
3. P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of
Polyglot Persistence", Addison-Wesley Professional, 2012
4. Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly, 2012

REFERENCE BOOKS:

1. "Hadoop Operations", O'Reilly, Eric Sammer, 2012
2. "Programming Hive", O'Reilly, E. Capriolo, D. Wampler, and J. Rutherglen, 2012
3. "HBase: The Definitive Guide", O'Reilly, Lars George, 2011
4. "Cassandra: The Definitive Guide", O'Reilly, Eben Hewitt, 2010
5. "Programming Pig", O'Reilly, Alan Gates, 2011.
M.Tech I Year - I Semester
RESEARCH METHODOLOGY AND IPR (MTDS1105)
L T P C: 2 0 0 2

Course Objectives:
• To give an overview of the research methodology and explain the technique of defining a
research problem.
• To explain the functions of the literature review in research.
• To explain carrying out a literature search, its review, developing theoretical and conceptual
frameworks and writing a review.
• To explain various research designs and their characteristics.
• To explain the details of sampling designs, measurement and scaling techniques and also
different methods of data collection.
• To discuss leading International Instruments concerning Intellectual Property Rights.

Course Outcomes:
At the end of the course, students will be able to –
1. Formulate a research problem for a given engineering domain.
2. Analyze the available literature for given research problem.
3. Develop technical writing and presentation skills.
4. Comprehend concepts related to patents, trademark and copyright.

UNIT I:
Meaning of research problem, Sources of research problem, Criteria and characteristics of a good
research problem, Errors in selecting a research problem, Scope and objectives of research problem.
Approaches of investigation of solutions for research problem, data collection, analysis,
interpretation, necessary instrumentation.

UNIT II:
Effective literature studies approaches, analysis, Plagiarism, Research ethics, Effective technical
writing, how to write a report or paper, Developing a Research Proposal, Format of research proposal,
a presentation and assessment by a review committee.

UNIT III:
Nature of Intellectual Property: Patents, Designs, Trade and Copyright. Process of Patenting and
Development: technological research, innovation, patenting, development. International Scenario:
International cooperation on Intellectual Property. Procedure for grants of patents, Patenting under
PCT.

UNIT IV:
Patent Rights: Scope of Patent Rights. Licensing and transfer of technology. Patent information and
databases. Geographical Indications.

UNIT V:
New Developments in IPR: Administration of Patent System. New developments in IPR; IPR of
Biological Systems, Computer Software etc. Traditional knowledge, Case Studies, IPR and IITs.
CASE STUDY:
1. Prepare a prediction model for the profit in the 50_startups data. Apply transformations to obtain
better predictions of profit, and make a table containing the R^2 value for each prepared model.
(Multiple Linear Regression)

R&D Spend -- Research and development spend in the past few years
Administration -- spend on administration in the past few years
Marketing Spend -- spend on Marketing in the past few years
State -- states from which data is collected
Profit -- profit of each state in the past few years
2. Let's consider a Company dataset with around 10 variables and 400 records.
The attributes are as follows:
 Sales -- Unit sales (in thousands) at each location
 Competitor Price -- Price charged by competitor at each location
 Income -- Community income level (in thousands of dollars)
 Advertising -- Local advertising budget for company at each location (in thousands of dollars)
 Population -- Population size in region (in thousands)
 Price -- Price Company charges for car seats at each site
 Shelf Location at stores -- A factor with levels Bad, Good and Medium indicating the quality
of the shelving location for the car seats at each site
 Age -- Average age of the local population
 Education -- Education level at each location
 Urban -- A factor with levels No and Yes to indicate whether the store is in an urban or rural
location
 US -- A factor with levels No and Yes to indicate whether the store is in the US or not
[Figure: sample rows of the company dataset]

Problem Statement:
A cloth manufacturing company wants to know which segments or attributes lead to high sales.
(Decision Tree)

3. Perform clustering (both hierarchical and K-means clustering) for the airlines data to obtain the
optimum number of clusters.
Draw inferences from the clusters obtained.
Data Description: The file EastWestAirlines contains information on passengers who belong to an
airline's frequent flier program. For each passenger, the data include information on their mileage
history and on the different ways they accrued or spent miles in the last year. The goal is to identify
clusters of passengers that have similar characteristics for the purpose of targeting different segments
with different types of mileage offers.

ID -- Unique ID
Balance -- Number of miles eligible for award travel
Qual_mile -- Number of miles counted as qualifying for Topflight status
cc1_miles -- Number of miles earned with freq. flyer credit card in the past 12 months
cc2_miles -- Number of miles earned with Rewards credit card in the past 12 months
cc3_miles -- Number of miles earned with Small Business credit card in the past 12 months
The cc1_miles, cc2_miles and cc3_miles attributes are coded in bins:
1 = under 5,000
2 = 5,000 - 10,000
3 = 10,001 - 25,000
4 = 25,001 - 50,000
5 = over 50,000
Bonus_miles -- Number of miles earned from non-flight bonus transactions in the past 12 months
Bonus_trans -- Number of non-flight bonus transactions in the past 12 months
Flight_miles_12mo -- Number of flight miles in the past 12 months
Flight_trans_12 -- Number of flight transactions in the past 12 months
Days_since_enrolled -- Number of days since enrolled in flier program
Award -- Whether the passenger had an award flight (free flight) or not

TEXT BOOKS:
1. Stuart Melville and Wayne Goddard, “Research methodology: an introduction for science &
engineering students” Juta Education, 1996.

REFERENCE BOOKS:
1. Ranjit Kumar, 2nd Edition, “Research Methodology: A Step by Step Guide for beginners”
2. Halbert, “Resisting Intellectual Property”, Taylor & Francis Ltd, 2007.
3. Mayall, “Industrial Design”, McGraw Hill, 1992.
4. Niebel, “Product Design”, McGraw Hill, 1974.
5. Asimov, “Introduction to Design”, Prentice Hall, 1962.
6. Robert P. Merges, Peter S. Menell, Mark A. Lemley, “Intellectual Property in New Technological
Age”, 2016.
7. T. Ramappa, “Intellectual Property Rights Under WTO”, S. Chand, 2008.
I Year - I Semester DATA SCIENCE PROGRAMMING LAB L T P C
M.Tech (MTDS1106) 0 0 4 2

Course Objectives:
After the completion of the course, the student will be able to
 Implement data science operations like data collection, management and storing.
 Apply Python programming concepts in data science, including their real-world applications.
 Implement data collection and management scripts using Python Pandas.

List of Experiments:

Experiment 1:
Installation of anaconda.
Write a Python Program to Find the Sum of the Series: 1 + 1/2 + 1/3 + .. + 1/N
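
One possible approach to the series in Experiment 1 is sketched below. The function name harmonic_sum is an illustrative choice, not something prescribed by the syllabus, and the sketch assumes N is a positive integer.

```python
def harmonic_sum(n: int) -> float:
    """Return 1 + 1/2 + 1/3 + ... + 1/n as a float."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Add the reciprocal of every integer from 1 to n.
    return sum(1 / k for k in range(1, n + 1))
```

Calling harmonic_sum(4) adds 1 + 1/2 + 1/3 + 1/4, which is 25/12.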

Experiment 2:
Write a Python Program to Split the array and add the first part to the end
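
A minimal sketch for Experiment 2, assuming the "array" is a Python list and the split position k is supplied by the caller. The name split_and_append is hypothetical.

```python
def split_and_append(items, k):
    """Return a new list with the first k elements moved to the end."""
    if not 0 <= k <= len(items):
        raise ValueError("k must be between 0 and len(items)")
    # Slicing keeps the original list unchanged.
    return items[k:] + items[:k]
```

For example, splitting [12, 10, 5, 6, 52, 36] at position 2 moves 12 and 10 behind 36.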

Experiment 3:
Write a Python Program to Create a List of Tuples with the First Element as the Number and Second
Element as the Square of the Number

Experiment 4:
Write a Python program to count the number of vowels in a given string using sets
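
The sketch below shows one way to use a set for Experiment 4. It assumes only the five English vowels a, e, i, o, u count and that case is ignored; the names VOWELS, count_vowels and distinct_vowels are illustrative.

```python
VOWELS = set("aeiou")

def count_vowels(text):
    """Count every vowel occurrence in text, ignoring case."""
    return sum(1 for ch in text.lower() if ch in VOWELS)

def distinct_vowels(text):
    """Return the set of different vowels that appear in text."""
    # Set intersection keeps only the characters that are vowels.
    return set(text.lower()) & VOWELS
```

Membership tests against a set take constant time on average, which is the reason for preferring a set over a list or string here.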

Experiment 5:
Write a program to generate the permutations of a given string using a built-in function
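
For Experiment 5, the standard library function itertools.permutations is one built-in option. The wrapper name string_permutations is hypothetical, and removing duplicates for strings with repeated letters is an assumption about the desired output.

```python
from itertools import permutations

def string_permutations(text):
    """Return the distinct permutations of text in sorted order."""
    # permutations() yields tuples of characters; join each back into a string.
    # The set removes duplicates that arise when text has repeated letters.
    return sorted({"".join(p) for p in permutations(text)})
```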

Experiment 6:
Write a Python program to sort a list of dictionaries by value using a lambda function.
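
A minimal sketch for Experiment 6, assuming every dictionary contains the key being sorted on. The function name sort_by_key and the sample field names in the usage note are made up for illustration.

```python
def sort_by_key(records, key, reverse=False):
    """Return a new list of dictionaries ordered by the value stored under key."""
    # The lambda extracts the value that sorted() compares.
    return sorted(records, key=lambda record: record[key], reverse=reverse)
```

With records such as {"name": "Asha", "marks": 82}, calling sort_by_key(records, "marks") orders the list from the lowest to the highest marks without modifying the original list.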

Experiment 7:
Write a Python Program for the following sorting algorithms:
i. Quick Sort
ii. HeapSort
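
The two algorithms in Experiment 7 can be sketched as follows. This is one of several valid formulations: the quick sort shown builds new lists around a middle pivot rather than partitioning in place, and the heap sort builds a max-heap by hand on a copy of the input. The function names are illustrative.

```python
def quick_sort(values):
    """Return a sorted copy of values using a simple (not in-place) quick sort."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)

def heap_sort(values):
    """Return a sorted copy of values using a binary max-heap."""
    data = list(values)
    n = len(data)

    def sift_down(root, end):
        # Push data[root] down until both children (below index end) are smaller.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and data[child + 1] > data[child]:
                child += 1
            if data[root] >= data[child]:
                return
            data[root], data[child] = data[child], data[root]
            root = child

    # Build the max-heap from the last parent node up to the root.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    # Repeatedly move the current maximum to the end and restore the heap.
    for end in range(n - 1, 0, -1):
        data[0], data[end] = data[end], data[0]
        sift_down(0, end)
    return data
```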

Experiment 8:
Write a Python Program to Reverse a String Using Recursion
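
A short recursive sketch for Experiment 8. The function name is illustrative, and because Python limits recursion depth, this approach is only suitable for strings shorter than the interpreter's recursion limit.

```python
def reverse_string(text):
    """Return text reversed, using recursion."""
    # Base case: empty and one-character strings are their own reverse.
    if len(text) <= 1:
        return text
    # Reverse everything after the first character, then append that character.
    return reverse_string(text[1:]) + text[0]
```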

Experiment 9:
Write a Python Program to Count the Number of Words in a Text File

Experiment 10:
Write a Python Program to Read the Contents of a File in Reverse Order

Experiment 11:
Write a program to Merge and Join Data Frames with Pandas in Python

Experiment 12:
Write a program to implement Merge and Join Data Frames with Python Pandas

Experiment 13:
Write a Python Program to Append the Contents of One File to Another File

Experiment 14:
Install Python Pandas and load CSV files with it

Experiment 15:
Write a program to implement Data analysis and Visualization with Python using pandas.

Experiment 16:
Write a program to Implement Plotting Functions in python pandas.

Text Books:
1. Learning Python, 5th Edition, Mark Lutz, O'Reilly, 2013.
2. Programming Python, 4th Edition, Mark Lutz, O'Reilly, 2010.
3. Python for Data Analysis, 2nd Edition, Wes McKinney, O'Reilly, 2017.
I Year - I Semester ADVANCED COMPUTING LAB L T P C
M.Tech (MTDS1107) 0 0 4 2
Course Objectives:

 Implement various heuristic search techniques.
 Solve problems with uncertain information using Bayesian approaches.
 Implement data summarization, query, and analysis.
 Apply data modelling techniques to large datasets.
 Create applications for Big Data analytics.
 Build a complete business data analytic solution.

List of Experiments:

Experiment 1:
Write a Python program to implement the Best First heuristic search in artificial intelligence.

Experiment 2:
Write a Python program to implement the A* heuristic search in artificial intelligence.
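
A* search can be sketched on a small grid as shown below. Several things here are assumptions made for illustration rather than requirements of the experiment: the grid is a list of lists where 0 is a free cell and 1 is a wall, moves are up, down, left and right with unit cost, and the heuristic is the Manhattan distance. The name a_star is hypothetical.

```python
import heapq

def a_star(grid, start, goal):
    """Return a shortest path from start to goal as a list of (row, col), or None."""
    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(heuristic(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(
                        open_heap,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None
```

The priority queue orders cells by f = g + h, where g is the cost so far and h is the heuristic estimate, which is what distinguishes A* from a plain best-first search ordered by h alone.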

Experiment 3:
Write a Python program to implement the Hill Climbing heuristic search in artificial intelligence.

Experiment 4:
Write a Python program to implement the Bidirectional heuristic search in artificial intelligence.

Experiment 5:
Do the following case study:
i) For the Bayesian network given in the figure below and the corresponding probabilities,
generate the conditional probability table.
ii) Also compute the following probabilities:
a) Joint probability P(A, B, C, D)
b) P(A|B)
c) P(A|C)
d) P(A|B,C)

Experiment 6:
(a) Set up and install Hadoop in its two operating modes:
i. Pseudo distributed,
ii. Fully distributed.
(b) Use web-based tools to monitor your Hadoop setup.

Experiment 7:
(a) Implement the following file management tasks in Hadoop:
i. Adding files and directories
ii. Retrieving files
iii. Deleting files
(b) Benchmark and stress test an Apache Hadoop cluster

Experiment 8:
(a) Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
i. Find the number of occurrences of each word appearing in the input file(s)
ii. Perform a MapReduce job for word search count (look for specific keywords in a
file)
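
Before running the job on a cluster, the map, shuffle and reduce steps can be imitated in plain Python. The sketch below is an in-memory simulation only and does not use the Hadoop API; the function names, the tokenising regular expression and the lower-casing of words are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reducer: add up the counts collected for one word."""
    return word, sum(counts)

def word_count(lines, keywords=None):
    """Count every word, or only the given keywords when keywords is supplied."""
    pairs = (pair for line in lines for pair in map_phase(line))
    counts = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
    if keywords is not None:
        # Word search count: report only the requested keywords (0 if absent).
        return {k: counts.get(k.lower(), 0) for k in keywords}
    return counts
```

Passing a list of keywords covers task ii, while calling word_count without keywords covers task i.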

Experiment 9:
Stop word elimination problem:
Input:
i. A large textual file containing one sentence per line
ii. A small file containing a set of stop words (One stop word per line)
Output:
iii. A textual file containing the same sentences of the large input file without
the words appearing in the small file.
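
The filtering logic of Experiment 9 can be tried out locally before it is written as a MapReduce job. The sketch below is plain Python, not Hadoop code. It assumes words are separated by whitespace, that matching ignores case, and that punctuation attached to a word is left untouched; the name remove_stop_words is hypothetical.

```python
def remove_stop_words(sentences, stop_words):
    """Yield each sentence with the stop words removed."""
    # Load the small stop-word list into a set for fast lookups.
    stop = {w.strip().lower() for w in stop_words if w.strip()}
    for sentence in sentences:
        kept = [w for w in sentence.split() if w.lower() not in stop]
        yield " ".join(kept)
```

Both arguments can be open file objects, since iterating over a file yields one line at a time, matching the one-sentence-per-line and one-stop-word-per-line input format.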

Experiment 10:
Write a MapReduce program that mines weather data. Weather sensors collecting data
every hour at many locations across the globe gather a large volume of log data, which is a
good candidate for analysis with MapReduce, since it is semi-structured and record-oriented.
Data available at: https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all.
(a) Find the average, maximum and minimum temperature for each year in the NCDC dataset.
(b) Filter the readings based on the value of the measurement: output the lines of the
input files associated with a temperature value greater than 30.0 and store them in a
separate file.

Experiment 11:
Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.

Experiment 12:
Install and run Hive, then use Hive to create, alter, and drop databases, tables, views,
functions, and indexes.

Experiment 13:
Install, deploy and configure an Apache Spark cluster. Run Apache Spark applications using Scala.

Experiment 14:
Perform Data analytics using Apache Spark on Amazon food dataset, find all the pairs of
items frequently reviewed together.
Write a single Spark application that:
(a) Transposes the original Amazon food dataset, obtaining a Pair RDD of the type:
<user_id> → <list of the product_ids reviewed by user_id>
(b) Counts the frequencies of all the pairs of products reviewed together;
(c) Writes on the output folder all the pairs of products that appear more than once and
their frequencies. The pairs of products must be sorted by frequency.

Experiment 15:
Write a Python program to implement the following: breadth-first search and depth-first
search.
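
Both traversals in Experiment 15 can be sketched on a graph stored as an adjacency list. The representation (a dictionary mapping each node to a list of neighbours) and the function names are assumptions for illustration; the depth-first version uses an explicit stack instead of recursion.

```python
from collections import deque

def bfs(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()  # first in, first out
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited

def dfs(graph, start):
    """Return the nodes reachable from start in depth-first order."""
    visited = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()  # last in, first out
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Reverse so the first listed neighbour is explored first.
        stack.extend(reversed(graph.get(node, [])))
    return visited
```

The only structural difference between the two is the container: a queue explores level by level, while a stack follows one branch to its end before backtracking.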

Text Books:
1. Artificial Intelligence with Python - Heuristic Search, Prateek Joshi, Packt, 2017.
2. Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's
Businesses, Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, Wiley, 2013.
3. Spark: The Definitive Guide, Bill Chambers & Matei Zaharia, O'Reilly, 2018 Edition.
