Pipelines, Functions, Oops

DATA SCIENCE

Uploaded by

Buvanesh Nallaperumal

Pipelines, Functions, Oops

(Designing Scalable Data Pipelines with Functional Programming and Object-Oriented Principles in Python)

Agenda

1. Introduction to Python Functions

2. Types of Functions

3. User Defined Functions

4. Generators

5. Classes, Objects & OOPS

6. Python Pipelines

7. Advantages of using Pipelines

8. Pipeline @ various Stages

9. Dunder Methods & Usages

Python Functions: TYPES

Built-in Functions

    print(), len(), type()

User Defined Functions

    def greet(name):
        return f"Hello, {name}!"

Anonymous (Lambda) Functions

    add = lambda x, y: x + y
    print(add(5, 3))  # Outputs: 8

Recursive Functions

    def factorial(n):
        if n <= 1:  # base case; also stops the recursion for n == 0
            return 1
        else:
            return n * factorial(n - 1)

Generator Functions

    def count_up_to(max):
        count = 1
        while count <= max:
            yield count
            count += 1

Higher-Order Functions

    def square(x):
        return x * x

    nums = [1, 2, 3, 4]
    sq_nos = map(square, nums)

Asynchronous Functions

    import asyncio

    async def say_hello():
        await asyncio.sleep(1)
        print("Hello!")

    asyncio.run(say_hello())

Decorators

    def my_decorator(func):
        def wrapper():
            func()
        return wrapper

    @my_decorator
    def say_hello():
        print("Hello!")

    say_hello()

Static and Class Methods

    class MyClass:
        @staticmethod
        def static_method():
            print("This is a static method.")

        @classmethod
        def class_method(cls):
            print("This is a class method.")

    MyClass.static_method()
    MyClass.class_method()

Types of Functions

1. Built-in Functions : Pre-defined in Python and always available to use.
2. User-Defined Functions : The user defines and creates them using the def keyword.
3. Anonymous (Lambda) Functions : Small, one-line functions created with the lambda keyword.
They don’t require a def keyword or a name.
4. Recursive Functions : Functions that call themselves to solve smaller instances of the
same problem.
5. Higher-Order Functions : Functions that take other functions as arguments or return
functions as their result.
6. Generator Functions : Special functions that return a generator object. They use yield
instead of return to produce a series of values lazily, one at a time.
7. Decorators : Functions that modify the behavior of other functions. They take a function as
an argument and return a new function with additional or altered behavior.
8. Static and Class Methods
•Static Methods: Defined with @staticmethod decorator, they don’t require access to the
instance or class and behave like regular functions but belong to the class's namespace.
•Class Methods: Defined with @classmethod decorator, they take the class (cls) as the
first parameter and can modify class state.
9. Asynchronous Functions (Async/Await)
These are functions defined using async def, allowing for asynchronous programming. They
can perform non-blocking operations with the use of await.
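
Item 7 describes decorators as adding or altering behavior. A minimal sketch of that idea follows; it is an illustration only, and the names log_calls and add are hypothetical (they do not come from the slides).

```python
import functools

def log_calls(func):
    """Hypothetical decorator: records the arguments of every call."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))  # the added behavior
        return func(*args, **kwargs)          # then delegate to the original
    wrapper.calls = []
    return wrapper

@log_calls
def add(x, y):
    return x + y

print(add(5, 3))     # 8
print(add.calls)     # [((5, 3), {})]
print(add.__name__)  # add
```

The wrapper accepts *args and **kwargs so the same decorator can wrap functions with any signature.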
User Defined Functions

User-Defined Functions :
• Enhance maintainability, modularity, reusability, and readability during development.

__init__ : A special method used as a constructor to initialize objects.

__name__ : A special variable that indicates whether a script is run directly or imported.

• __name__ == "__main__" <<= If the script is being run directly.

• __name__ == name of the module <<= If the script is imported into another script.
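
The __name__ check is usually written as a guard at the bottom of a script. A minimal sketch, assuming a hypothetical entry-point function called main:

```python
def main():
    """Hypothetical entry point; returns an exit status."""
    print("Running as a script")
    return 0

# __name__ is "__main__" when this file is executed directly,
# and the module's own name when the file is imported elsewhere.
if __name__ == "__main__":
    main()
```

When the file is imported, the guard is false, so main() is defined but not run.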
Generators

Generator :
A special type of iterator that lazily produces values one at a time using the yield statement,
allowing efficient memory usage by generating items on the fly as they are needed.

Applications :
• Efficient Data Processing:
Reading Large Files: Read large files line by line without loading the entire file into memory.
Streaming Data: Process streams of data, such as log files or network responses, incrementally.
• Lazy Evaluation: Infinite Sequences: Generate Fibonacci numbers or prime numbers.
• Memory Efficiency: Large Datasets: Generate values on the fly without storing them all in memory.
• Pipelines: Data Pipelines: Each stage yields data to the next stage.
• Stateful Iteration: Implement custom iterators that maintain their state between iterations, allowing
complex iteration logic.
• Backtracking Algorithms: Search Problems: Solve problems that require backtracking, such as
generating permutations or combinations, where the generator can pause and resume.
• Concurrency: Asynchronous Programming: Use asynchronous generators (async def with yield) with
asyncio to handle tasks like I/O operations without blocking the event loop.
• Caching and Memoization: Compute expensive results lazily and yield them as needed, optionally
caching the values already produced.
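
Two of the applications above, generator pipelines and infinite sequences, can be sketched as follows. The stage names and the sample data are invented for illustration.

```python
from itertools import islice

def read_lines(lines):
    """Stage 1: yield one raw line at a time (stands in for a large file)."""
    for line in lines:
        yield line

def strip_blank(lines):
    """Stage 2: trim whitespace and drop empty lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line

def parse_ints(lines):
    """Stage 3: convert each remaining line to an integer."""
    for line in lines:
        yield int(line)

def fibonacci():
    """Infinite sequence: values are produced only when requested."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

raw = ["10\n", "\n", " 20 \n", "30\n"]  # sample data
pipeline = parse_ints(strip_blank(read_lines(raw)))

print(next(pipeline))                # 10 (only the first item was processed)
print(sum(pipeline))                 # 50 (the remaining items: 20 + 30)
print(list(islice(fibonacci(), 8)))  # [0, 1, 1, 2, 3, 5, 8, 13]
```

No stage builds an intermediate list; each item flows through all three stages before the next one is read.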
Class, Objects & OOPS

Class :
A blueprint for creating objects.
It defines a set of attributes and methods that the created objects (instances) will have.

Object :
An instance of a class.
A self-contained entity that consists of attributes (variables) and methods (functions) defined by its
class.

Inheritance :
A mechanism by which one class (child or subclass) can inherit attributes and methods from
another class (parent or superclass).
This allows for code reuse and the creation of a hierarchical relationship between classes.

Polymorphism:
The ability of objects of different classes to be used through a common interface, typically one
inherited from a shared parent class.
It allows a single method call to behave differently based on the object that it is acting upon.

Encapsulation:
The practice of bundling the data (attributes) and the methods that operate on the data into a single
unit, or class, and restricting access to some of the object's components. In Python this is usually
done by marking attributes as non-public by convention (a leading underscore _; a double underscore __
additionally triggers name mangling) and providing public methods to access or modify them.
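
The four ideas above can be seen together in one small sketch. The Account and SavingsAccount classes are hypothetical examples, not taken from the slides.

```python
class Account:
    """Class: bundles a balance with the methods that operate on it."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance  # leading underscore: non-public by convention

    @property
    def balance(self):
        """Encapsulation: read-only public access to _balance."""
        return self._balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def describe(self):
        return f"Account of {self.owner}"


class SavingsAccount(Account):
    """Inheritance: reuses deposit() and balance, overrides describe()."""

    def describe(self):
        return f"Savings account of {self.owner}"


accounts = [Account("Asha", 100), SavingsAccount("Ravi", 50)]  # objects
for acct in accounts:
    acct.deposit(10)
    # Polymorphism: the same call runs a different describe() per class.
    print(acct.describe(), acct.balance)
```

The loop prints "Account of Asha 110" and then "Savings account of Ravi 60".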
Introduction to Pipelines

What is a Pipeline ?
• A series of data processing steps that are connected together, where the output of one step
becomes the input for the next.

Why Pipeline ?
• Automation of workflows (for example, with Apache Airflow).

• Need for efficient, repeatable, and scalable processes in data science.

• Avoiding manual intervention in repetitive tasks to reduce errors and increase productivity.

Real-world example :
Scenario:

Imagine a company that needs to regularly analyze customer data to predict future purchasing
trends. Without a pipeline, this process would involve manually cleaning the data, selecting
features, and running models each time new data is available.
Pipeline Solution:

By creating a data pipeline, the company can automate the entire process: data cleaning, feature
selection, model training, and evaluation are all done automatically whenever new data is added.
This not only saves time but also ensures that the process is consistent and repeatable.
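
A minimal sketch of such a pipeline, using plain functions as steps. The step names, the record fields, and the sample data are assumptions made for illustration; a simple aggregation stands in for model training and evaluation.

```python
from functools import reduce

def clean(rows):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in rows if r["amount"] is not None]

def select_features(rows):
    """Feature selection: keep only the fields later steps need."""
    return [{"customer": r["customer"], "amount": r["amount"]} for r in rows]

def total_per_customer(rows):
    """Stand-in for modelling: total amount spent per customer."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals

def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda result, step: step(result), steps, data)

STEPS = [clean, select_features, total_per_customer]

new_data = [
    {"customer": "A", "amount": 20, "channel": "web"},
    {"customer": "B", "amount": None, "channel": "shop"},
    {"customer": "A", "amount": 5, "channel": "shop"},
]
print(run_pipeline(new_data, STEPS))  # {'A': 25}
```

Whenever new data arrives, the same run_pipeline call repeats every step in the same order, which is what makes the process consistent and repeatable.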
Advantages of using Pipelines

• Automation
Reduces manual intervention, making the process more efficient and less error-prone.

• Consistency
Ensures that the same transformations are applied to training and test data.

• Modularity
Simplifies modifying individual components without affecting the entire pipeline.

• Reusability
Pipelines can be reused across different projects or datasets.

• Scalability
Facilitates scaling the process for large datasets and more complex models.
Pipeline @ various stages

I. Data Collection:
• The initial step involves gathering data from various sources. This could include databases,
files, APIs, or web scraping.
II. Data Preprocessing:
• Data Cleaning: Handling missing values, outliers, and correcting inconsistencies.
• Feature Engineering: Creating new features or transforming existing ones to improve model
performance.
• Feature Selection: Choosing relevant features for model training and reducing dimensionality if
needed.
III. Data Transformation:
• Scaling/Normalization: Standardizing or normalizing data to ensure that all features contribute
equally to model training.
• Encoding: Converting categorical variables into numerical format, often using techniques like
one-hot encoding or label encoding.
IV. Model Building:
• Choosing Algorithms: Selecting appropriate machine learning algorithms based on the
problem (e.g., regression, classification).
• Training: Fitting the model to the training data.
• Hyperparameter Tuning: Optimizing the model's hyperparameters to improve performance.
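
The scaling and encoding steps of stage III can be sketched in plain Python. The function names and sample values are hypothetical; in practice a library would normally provide these transformations. The scaling parameters are learned from the training data and then reused on the test data.

```python
def fit_min_max(values):
    """Learn scaling parameters (min, max) from the training data only."""
    return min(values), max(values)

def apply_min_max(values, params):
    """Scale values using previously learned parameters."""
    lo, hi = params
    span = (hi - lo) or 1  # avoid division by zero for a constant column
    return [(v - lo) / span for v in values]

def one_hot(values, categories):
    """Encode each value as a 0/1 vector over a fixed category order."""
    return [[1 if v == c else 0 for c in categories] for v in values]

train_age = [20, 30, 40]  # sample data
test_age = [25, 40]
params = fit_min_max(train_age)
print(apply_min_max(train_age, params))  # [0.0, 0.5, 1.0]
print(apply_min_max(test_age, params))   # [0.25, 1.0]

cities = ["Pune", "Delhi", "Pune"]
categories = sorted(set(cities))         # ['Delhi', 'Pune']
print(one_hot(cities, categories))       # [[0, 1], [1, 0], [0, 1]]
```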
V. Evaluation:
• Validation: Assessing model performance using a validation dataset.
• Metrics: Measuring performance using metrics like accuracy, precision, recall, F1 score, or
ROC AUC, depending on the problem.

VI. Model Deployment:
• Integration: Deploying the trained model into a production environment where it can make
predictions on new data.
• Monitoring: Continuously monitoring model performance and retraining as necessary to
handle concept drift or changes in data distribution.
VII. Result Interpretation:
• Visualization: Creating plots and graphs to interpret the results and communicate findings.
• Reporting: Documenting the results and insights gained from the analysis.
VIII. Pipeline Management:
• Automation: Implementing workflows to automate repetitive tasks and ensure consistency.
• Version Control: Keeping track of changes in the pipeline and models for reproducibility and
debugging.
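
For the evaluation stage (V), accuracy, precision, recall, and F1 score for a binary classifier can be computed from counts of true and false positives and negatives. A minimal sketch, with a hypothetical helper name and invented labels:

```python
def classification_metrics(y_true, y_pred):
    """Binary classification metrics; label 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0]  # sample labels
y_pred = [1, 0, 0, 1, 1, 0]  # sample predictions
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in metrics.items()})
# {'accuracy': 0.67, 'precision': 0.67, 'recall': 0.67, 'f1': 0.67}
```

Here there are 2 true positives, 2 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 2/3.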
Usages of Dunder Methods

Dunder Methods : Used to implement or customize behavior for built-in operations and to
support Python's data model.

Object Initialization and Representation

•__init__(self, ...): Initializes a new instance of a class.
•__del__(self): Destructor method, called when an object is about to be destroyed.
•__repr__(self): Returns a string that represents the object for debugging and development.
•__str__(self): Returns a user-friendly string representation of the object.
•__format__(self, format_spec): Defines custom formatting for the format() function and
formatted string literals.

Comparison and Ordering

•__eq__(self, other): Defines behavior for equality comparison (==).
•__ne__(self, other): Defines behavior for inequality comparison (!=).
•__lt__(self, other): Defines behavior for less-than comparison (<).
•__le__(self, other): Defines behavior for less-than-or-equal comparison (<=).
•__gt__(self, other): Defines behavior for greater-than comparison (>).
•__ge__(self, other): Defines behavior for greater-than-or-equal comparison (>=).
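
A minimal sketch of the representation and comparison methods, using a hypothetical Version class. functools.total_ordering is used here as a shortcut so that only __eq__ and __lt__ need to be written by hand.

```python
from functools import total_ordering

@total_ordering  # derives __le__, __gt__, __ge__ from __eq__ and __lt__
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"

    def __str__(self):
        return f"v{self.major}.{self.minor}"

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) < (other.major, other.minor)

a, b = Version(1, 2), Version(1, 10)
print(repr(a))                # Version(1, 2)
print(str(a))                 # v1.2
print(a == Version(1, 2))     # True
print(a < b, b >= a)          # True True
print(max([a, b]))            # v1.10
```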

Arithmetic Operations

•__add__(self, other): Defines behavior for addition (+).

•__sub__(self, other): Defines behavior for subtraction (-).
•__mul__(self, other): Defines behavior for multiplication (*).
•__truediv__(self, other): Defines behavior for division (/).
•__floordiv__(self, other): Defines behavior for floor division (//).
•__mod__(self, other): Defines behavior for modulus (%).
•__pow__(self, other): Defines behavior for exponentiation (**).
Unary Operations

•__neg__(self): Defines behavior for unary negation (-).
•__pos__(self): Defines behavior for unary positive (+).
•__abs__(self): Defines behavior for the abs() function.
•__invert__(self): Defines behavior for bitwise negation (~).
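
A few of the arithmetic and unary methods, sketched on a hypothetical Vector2D class:

```python
import math

class Vector2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector2D({self.x}, {self.y})"

    def __eq__(self, other):
        if not isinstance(other, Vector2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):   # self + other
        return Vector2D(self.x + other.x, self.y + other.y)

    def __sub__(self, other):   # self - other
        return Vector2D(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar):  # self * scalar
        return Vector2D(self.x * scalar, self.y * scalar)

    def __neg__(self):          # -self
        return Vector2D(-self.x, -self.y)

    def __abs__(self):          # abs(self): the vector's length
        return math.hypot(self.x, self.y)

a, b = Vector2D(3, 4), Vector2D(1, 2)
print(a + b)   # Vector2D(4, 6)
print(a - b)   # Vector2D(2, 2)
print(a * 2)   # Vector2D(6, 8)
print(-a)      # Vector2D(-3, -4)
print(abs(a))  # 5.0
```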

Container Methods

•__len__(self): Defines behavior for the len() function.

•__getitem__(self, key): Defines behavior for indexing (self[key]).
•__setitem__(self, key, value): Defines behavior for setting item values (self[key] = value).
•__delitem__(self, key): Defines behavior for deleting items (del self[key]).
•__contains__(self, item): Defines behavior for membership tests (in).
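
The container methods can be sketched with a hypothetical Inventory class that wraps a dictionary:

```python
class Inventory:
    def __init__(self):
        self._items = {}

    def __len__(self):                  # len(inv)
        return len(self._items)

    def __getitem__(self, key):         # inv[key]
        return self._items[key]

    def __setitem__(self, key, value):  # inv[key] = value
        self._items[key] = value

    def __delitem__(self, key):         # del inv[key]
        del self._items[key]

    def __contains__(self, item):       # item in inv
        return item in self._items

inv = Inventory()
inv["apple"] = 3
inv["pear"] = 5
print(len(inv))       # 2
print(inv["apple"])   # 3
print("pear" in inv)  # True
del inv["pear"]
print("pear" in inv)  # False
```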

Iteration and Context Management

•__iter__(self): Returns an iterator object for iteration.

•__next__(self): Returns the next item from the iterator.
•__enter__(self): Defines behavior for entering a context manager (with statement).
•__exit__(self, exc_type, exc_value, traceback): Defines behavior for exiting a context manager
(with statement).
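
A minimal sketch of both protocols. Countdown and ManagedResource are hypothetical classes written for illustration.

```python
class Countdown:
    """Iterator protocol: __iter__ returns the iterator, __next__ the items."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of the iteration
        value = self.current
        self.current -= 1
        return value


class ManagedResource:
    """Context manager protocol: __enter__ and __exit__ wrap a with block."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # the value bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not suppress exceptions raised in the block


print(list(Countdown(3)))  # [3, 2, 1]

resource = ManagedResource()
with resource as r:
    r.events.append("work")
print(resource.events)     # ['enter', 'work', 'exit']
```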

Callable Objects

•__call__(self, ...): Allows an instance of a class to be called as if it were a function.
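
A callable object can carry configuration and still be passed wherever a function is expected. A minimal sketch, with a hypothetical Multiplier class:

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Runs when the instance is called like a function.
        return value * self.factor

double = Multiplier(2)
print(double(21))                    # 42
print(list(map(double, [1, 2, 3])))  # [2, 4, 6]
print(callable(double))              # True
```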

Object Copying

•__copy__(self): Defines behavior for copying objects using the copy module.
•__deepcopy__(self, memo): Defines behavior for deep copying objects.
Object Construction and Destruction

•__new__(cls, ...): Defines behavior for creating a new instance of a class, called before __init__.
•__hash__(self): Defines behavior for hashing an object (used in hash-based collections like sets
and dictionaries).
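
A minimal sketch of __new__ and __hash__ on a hypothetical Point class. The instance counter exists only to show that __new__ runs for every object created; __hash__ is paired with __eq__ so that equal points collapse to one entry in a set.

```python
class Point:
    instances_created = 0

    def __new__(cls, x, y):
        # __new__ creates the object and runs before __init__.
        instance = super().__new__(cls)
        cls.instances_created += 1
        return instance

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must produce equal hashes.
        return hash((self.x, self.y))

points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))              # 2
print(Point.instances_created)  # 3
```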

Special Methods for Collections

•__reversed__(self): Defines behavior for the reversed() function.

•__contains__(self, item): Defines behavior for membership testing (in).

Miscellaneous

•__call__(self, ...): Allows an instance of a class to be called as if it were a function.

•__eq__(self, other): Defines behavior for equality comparison (==).
•__ne__(self, other): Defines behavior for inequality comparison (!=).
