Pipelines, Functions, Oops

DATA SCIENCE
(Designing Scalable Data Pipelines with Functional Programming and Object-Oriented Principles in Python)
Agenda

1. Introduction to Python Functions
2. Types of Functions
3. User Defined Functions
4. Generators
5. Classes, Objects & OOPS
6. Python Pipelines
7. Advantages of using Pipelines
8. Pipeline @ various Stages
9. Dunder Methods & Usages

Python Functions: Types

Built-in Functions:
    print(), len(), type()

User-Defined Functions:
    def greet(name):
        return f"Hello, {name}!"

Anonymous (Lambda) Functions:
    add = lambda x, y: x + y
    print(add(5, 3))  # Outputs: 8

Recursive Functions:
    def factorial(n):
        if n <= 1:  # base case; also stops the recursion for n == 0
            return 1
        else:
            return n * factorial(n - 1)

Generator Functions:
    def count_up_to(max):
        count = 1
        while count <= max:
            yield count
            count += 1

Higher-Order Functions:
    def square(x):
        return x * x

    nums = [1, 2, 3, 4]
    sq_nos = map(square, nums)

Asynchronous Functions:
    import asyncio

    async def say_hello():
        await asyncio.sleep(1)
        print("Hello!")

    asyncio.run(say_hello())

Decorators:
    def my_decorator(func):
        def wrapper():
            func()
        return wrapper

    @my_decorator
    def say_hello():
        print("Hello!")

    say_hello()

Static and Class Methods:
    class MyClass:
        @staticmethod
        def static_method():
            print("This is a static method.")

        @classmethod
        def class_method(cls):
            print("This is a class method.")

    MyClass.static_method()
    MyClass.class_method()
Types of Functions

1. Built-in Functions : Predefined in Python and always available to use.
2. User-Defined Functions : The user defines and creates them using the def keyword.
3. Anonymous (Lambda) Functions : Small, single-expression functions created with the lambda keyword.
They don't require a def keyword or a name.
4. Recursive Functions : Functions that call themselves to solve smaller instances of the
same problem.
5. Higher-Order Functions : Functions that take other functions as arguments or return
functions as their result.
6. Generator Functions : Special functions that return a generator object. They use yield
instead of return to produce a series of values lazily, one at a time.
7. Decorators : Functions that modify the behavior of other functions. They take a function as
an argument and return a new function with additional or altered behavior.
8. Static and Class Methods
•Static Methods: Defined with the @staticmethod decorator, they don't require access to the
instance or class and behave like regular functions but belong to the class's namespace.
•Class Methods: Defined with the @classmethod decorator, they take the class (cls) as the
first parameter and can modify class state.
9. Asynchronous Functions (Async/Await) : Functions defined using async def, allowing for
asynchronous programming. They can perform non-blocking operations with the use of await.
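
As a small illustration of item 7, here is a sketch of a decorator that actually adds behavior to the function it wraps. The names log_calls and add are hypothetical and chosen only for this example; functools.wraps is a standard-library helper that keeps the wrapped function's name and docstring.

```python
import functools


def log_calls(func):
    """Hypothetical decorator: records every call, then delegates to func."""

    @functools.wraps(func)  # preserve func.__name__ and func.__doc__
    def wrapper(*args, **kwargs):
        # Remember the arguments of each call before running the real function.
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []  # call history stored on the wrapper itself
    return wrapper


@log_calls
def add(x, y):
    """Return the sum of x and y."""
    return x + y


result = add(2, 3)
```

After the call, result holds the sum and add.calls holds one entry describing the call, while add.__name__ is still "add".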
User Defined Functions

User-Defined Functions :
• Enhance maintainability, modularity, reusability, and readability during development.

__init__ : A special method that initializes a newly created object (commonly called the constructor).

__name__ : A special variable that indicates whether a script is run directly or imported.
• __name__ == "__main__" : the script is being run directly.
• __name__ == name of the module : the script is imported into another script.
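
A minimal sketch of how the __name__ check is commonly used. The functions greet and main are illustrative names, not part of the original slides.

```python
def greet(name):
    # A reusable function that other modules can import.
    return f"Hello, {name}!"


def main():
    # Entry point: runs only when this file is executed directly.
    print(greet("World"))


if __name__ == "__main__":
    # True when run as a script; False when the file is imported,
    # because __name__ is then set to the module's name.
    main()
```

Importing this file from another script makes greet available without triggering main().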
Generators

Generator :
A special type of iterator that lazily produces values one at a time using the yield statement,
allowing efficient memory usage by generating items on the fly as they are needed.

Applications :
• Efficient Data Processing:
Reading Large Files: Read large files line by line without loading the entire file into memory.
Streaming Data: Process streams of data, such as log files or network responses, incrementally.
• Lazy Evaluation: Infinite Sequences: Generate Fibonacci numbers or prime numbers on demand.
• Memory Efficiency: Large Datasets: Generate values on the fly without storing them all in memory.
• Pipelines: Data Pipelines: Each stage yields data to the next stage.
• Stateful Iteration: Implement custom iterators that maintain their state between iterations, allowing
complex iteration logic.
• Backtracking Algorithms: Search Problems: Solve problems that require backtracking, such as
generating permutations or combinations, where the generator can pause and resume.
• Concurrency: Asynchronous Programming: Use asynchronous generators (async def with yield) with
asyncio to handle tasks like I/O operations without blocking the main thread.
• Caching and Memoization: Use generators to cache results of expensive computations and yield
them as needed.
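
Two of the applications above (infinite sequences and generator pipelines) can be sketched as follows. The function names and the sample log lines are made up for illustration; a real pipeline would iterate over an open file instead of a list.

```python
from itertools import islice


def fibonacci():
    # Infinite sequence: values are produced only when requested.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def read_lines(lines):
    # Stage 1: strip the trailing newline from each incoming line.
    for line in lines:
        yield line.rstrip("\n")


def only_errors(lines):
    # Stage 2: pass through only the lines that mention an error.
    for line in lines:
        if "ERROR" in line:
            yield line


# Take just the first eight Fibonacci numbers from the infinite generator.
first_eight = list(islice(fibonacci(), 8))

# Chain the stages: each one pulls items lazily from the previous one.
log = ["INFO start\n", "ERROR disk full\n", "INFO done\n", "ERROR timeout\n"]
errors = list(only_errors(read_lines(log)))
```

No stage builds an intermediate list; items flow through one at a time until list() consumes the final generator.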
Class, Objects & OOPS

Class :
A blueprint for creating objects.
It defines a set of attributes and methods that the created objects (instances) will have.

Object :
An instance of a class.
A self-contained entity that consists of attributes (variables) and methods (functions) defined by its
class.

Inheritance :
A mechanism by which one class (child or subclass) can inherit attributes and methods from
another class (parent or superclass).
This allows for code reuse and the creation of a hierarchical relationship between classes.

Polymorphism :
The ability of objects of different classes to be used through a common interface, typically one
inherited from a shared parent class.
It allows a single method call to behave differently based on the object that it is acting upon.

Encapsulation :
The practice of bundling the data (attributes) and the methods that operate on the data into a single
unit, or class, and restricting access to some of the object's components. In Python this is done by
convention: a leading underscore (_) marks an attribute as internal, a double leading underscore (__)
triggers name mangling, and public methods or properties are provided to access or modify the data.
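
The four ideas above can be seen together in a short sketch. The Shape, Rectangle, and Square classes are hypothetical examples, not taken from the slides.

```python
class Shape:
    def __init__(self, name):
        self._name = name  # leading underscore: internal by convention

    @property
    def name(self):
        # Encapsulation: read-only public access to the internal attribute.
        return self._name

    def area(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError


class Rectangle(Shape):  # Inheritance: Rectangle is a Shape
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self):
        return self._width * self._height


class Square(Rectangle):  # A Square reuses Rectangle's area()
    def __init__(self, side):
        super().__init__(side, side)
        self._name = "square"


# Polymorphism: the same calls work on every Shape subclass.
shapes = [Rectangle(2, 3), Square(4)]
areas = [s.area() for s in shapes]
names = [s.name for s in shapes]
```

The loop does not need to know which concrete class each object belongs to; it relies only on the shared interface.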
Introduction to Pipelines

What is a Pipeline ?
• A series of data processing steps that are connected together, where the output of one step
becomes the input for the next.

Why Pipeline ?
• Automation of workflows (for example, with an orchestrator such as Apache Airflow).
• Need for efficient, repeatable, and scalable processes in data science.
• Avoiding manual intervention in repetitive tasks to reduce errors and increase productivity.

Real-world example :
Scenario:
Imagine a company that needs to regularly analyze customer data to predict future purchasing
trends. Without a pipeline, this process would involve manually cleaning the data, selecting
features, and running models each time new data is available.

Pipeline Solution:
By creating a data pipeline, the company can automate the entire process: data cleaning, feature
selection, model training, and evaluation are all done automatically whenever new data is added.
This not only saves time but also ensures that the process is consistent and repeatable.
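
The idea that each step's output feeds the next step can be sketched with plain functions. This is a toy illustration under assumed data: the step functions, the run_pipeline helper, and the customer records are all hypothetical, and the final step is a simple aggregation rather than real model training.

```python
from functools import reduce


def clean(rows):
    # Drop records with a missing amount.
    return [r for r in rows if r.get("amount") is not None]


def select_features(rows):
    # Keep only the fields needed by later steps.
    return [{"customer": r["customer"], "amount": r["amount"]} for r in rows]


def total_per_customer(rows):
    # Aggregate the spend per customer.
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals


def run_pipeline(data, steps):
    # Feed the output of each step into the next one, in order.
    return reduce(lambda acc, step: step(acc), steps, data)


raw = [
    {"customer": "a", "amount": 10, "note": "x"},
    {"customer": "b", "amount": None, "note": "y"},
    {"customer": "a", "amount": 5, "note": "z"},
]
result = run_pipeline(raw, [clean, select_features, total_per_customer])
```

Whenever new data arrives, calling run_pipeline again repeats exactly the same steps in the same order.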
Advantages of using Pipelines

• Automation
Reduces manual intervention, making the process more efficient and less error-prone.

• Consistency
Ensures that the same transformations are applied to training and test data.

• Modularity
Simplifies modifying individual components without affecting the entire pipeline.

• Reusability
Pipelines can be reused across different projects or datasets.

• Scalability
Facilitates scaling the process for large datasets and more complex models.
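
The consistency point can be made concrete with a small sketch: parameters are learned once from the training data and then reused, unchanged, on the test data. SimpleScaler is a hypothetical hand-written class for illustration only, not a library API.

```python
class SimpleScaler:
    """Hypothetical min-max scaler for a list of numbers."""

    def fit(self, values):
        # Learn the scaling parameters from the training data only.
        self.lo = min(values)
        self.hi = max(values)
        return self

    def transform(self, values):
        # Apply the learned parameters; "or 1" avoids division by zero
        # when all training values are identical.
        span = (self.hi - self.lo) or 1
        return [(v - self.lo) / span for v in values]


train = [10, 20, 30]
test = [15, 40]

scaler = SimpleScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)  # same parameters as for train
```

Test values outside the training range map outside 0..1, which is expected: the scaler is never refitted on test data.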
Pipeline @ various stages

I. Data Collection:
• The initial step involves gathering data from various sources. This could include databases,
files, APIs, or web scraping.
II. Data Preprocessing:
• Data Cleaning: Handling missing values, outliers, and correcting inconsistencies.
• Feature Engineering: Creating new features or transforming existing ones to improve model
performance.
• Feature Selection: Choosing relevant features for model training and reducing dimensionality if
needed.
III. Data Transformation:
• Scaling/Normalization: Standardizing or normalizing data to ensure that all features contribute
equally to model training.
• Encoding: Converting categorical variables into numerical format, often using techniques like
one-hot encoding or label encoding.
IV. Model Building:
• Choosing Algorithms: Selecting appropriate machine learning algorithms based on the
problem (e.g., regression, classification).
• Training: Fitting the model to the training data.
• Hyperparameter Tuning: Optimizing model parameters to improve performance.
V. Evaluation:
• Validation: Assessing model performance using a validation dataset.
• Metrics: Measuring performance using metrics like accuracy, precision, recall, F1 score, or
ROC AUC, depending on the problem.
VI. Model Deployment:
• Integration: Deploying the trained model into a production environment where it can make
predictions on new data.
• Monitoring: Continuously monitoring model performance and retraining as necessary to
handle concept drift or changes in data distribution.
VII. Result Interpretation:
• Visualization: Creating plots and graphs to interpret the results and communicate findings.
• Reporting: Documenting the results and insights gained from the analysis.
VIII. Pipeline Management:
• Automation: Implementing workflows to automate repetitive tasks and ensure consistency.
• Version Control: Keeping track of changes in the pipeline and models for reproducibility and
debugging.
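
For the Evaluation stage, the metrics named above can be computed by hand for a binary classifier. This is a sketch with made-up labels; classification_metrics is a hypothetical helper, and in practice a library implementation would normally be used.

```python
def classification_metrics(y_true, y_pred):
    # Count outcomes, treating label 1 as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

With these labels all four metrics come out to 0.75, which makes the arithmetic easy to check by hand.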
Usages of Dunder Methods

Dunder Methods : Used to implement or customize behavior for built-in operations and to
support Python's data model.

Object Initialization and Representation

•__init__(self, ...): Initializes a new instance of a class.
•__del__(self): Destructor method, called when an object is about to be destroyed.
•__repr__(self): Returns a string that represents the object for debugging and development.
•__str__(self): Returns a user-friendly string representation of the object.
•__format__(self, format_spec): Defines custom formatting for the format() function and
formatted string literals.

Comparison and Ordering

•__eq__(self, other): Defines behavior for equality comparison (==).
•__ne__(self, other): Defines behavior for inequality comparison (!=).
•__lt__(self, other): Defines behavior for less-than comparison (<).
•__le__(self, other): Defines behavior for less-than-or-equal comparison (<=).
•__gt__(self, other): Defines behavior for greater-than comparison (>).
•__ge__(self, other): Defines behavior for greater-than-or-equal comparison (>=).
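
A sketch of the representation and comparison methods on a hypothetical Version class. functools.total_ordering is a standard-library helper (not mentioned in the slides) that derives the remaining ordering methods from __eq__ and __lt__.

```python
from functools import total_ordering


@total_ordering  # fills in __le__, __gt__, __ge__ from __eq__ and __lt__
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __repr__(self):
        # Unambiguous form, useful when debugging.
        return f"Version({self.major}, {self.minor})"

    def __str__(self):
        # Friendly form, used by print() and str().
        return f"v{self.major}.{self.minor}"

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) < (other.major, other.minor)


# sorted() relies on __lt__ to order the objects.
versions = sorted([Version(1, 10), Version(1, 2), Version(0, 9)])
```

Comparing tuples of integers makes 1.10 sort after 1.2, which a plain string comparison would get wrong.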

Arithmetic Operations

•__add__(self, other): Defines behavior for addition (+).
•__sub__(self, other): Defines behavior for subtraction (-).
•__mul__(self, other): Defines behavior for multiplication (*).
•__truediv__(self, other): Defines behavior for division (/).
•__floordiv__(self, other): Defines behavior for floor division (//).
•__mod__(self, other): Defines behavior for modulus (%).
•__pow__(self, other): Defines behavior for exponentiation (**).

Unary Operations

•__neg__(self): Defines behavior for unary negation (-).
•__pos__(self): Defines behavior for unary positive (+).
•__abs__(self): Defines behavior for the abs() function.
•__invert__(self): Defines behavior for bitwise negation (~).
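
A few of the arithmetic and unary methods, sketched on a hypothetical Vector2D class. Only a subset of the operators is implemented, and __mul__ is assumed here to mean scaling by a number.

```python
import math


class Vector2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector2D({self.x}, {self.y})"

    def __eq__(self, other):
        if not isinstance(other, Vector2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):  # v + w
        return Vector2D(self.x + other.x, self.y + other.y)

    def __sub__(self, other):  # v - w
        return Vector2D(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar):  # v * 3 (scaling by a number)
        return Vector2D(self.x * scalar, self.y * scalar)

    def __neg__(self):  # -v
        return Vector2D(-self.x, -self.y)

    def __abs__(self):  # abs(v): the vector's length
        return math.hypot(self.x, self.y)


total = Vector2D(1, 2) + Vector2D(3, 4)
length = abs(Vector2D(3, 4))
```

Each operator returns a new Vector2D instead of modifying its operands, which matches how built-in numbers behave.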

Container Methods

•__len__(self): Defines behavior for the len() function.
•__getitem__(self, key): Defines behavior for indexing (self[key]).
•__setitem__(self, key, value): Defines behavior for setting item values (self[key] = value).
•__delitem__(self, key): Defines behavior for deleting items (del self[key]).
•__contains__(self, item): Defines behavior for membership tests (in).
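
The container methods can be sketched with a hypothetical Inventory class that wraps a dictionary and exposes it through the usual bracket syntax.

```python
class Inventory:
    def __init__(self):
        self._items = {}  # internal storage, keyed by item name

    def __len__(self):  # len(stock)
        return len(self._items)

    def __getitem__(self, key):  # stock[key]
        return self._items[key]

    def __setitem__(self, key, value):  # stock[key] = value
        self._items[key] = value

    def __delitem__(self, key):  # del stock[key]
        del self._items[key]

    def __contains__(self, item):  # item in stock
        return item in self._items


stock = Inventory()
stock["apples"] = 3
stock["pears"] = 5
del stock["pears"]
```

After these statements stock holds a single entry, and the in operator reports "pears" as absent.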

Iteration and Context Management

•__iter__(self): Returns an iterator object for iteration.
•__next__(self): Returns the next item from the iterator.
•__enter__(self): Defines behavior for entering a context manager (with statement).
•__exit__(self, exc_type, exc_value, traceback): Defines behavior for exiting a context manager
(with statement).
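
A sketch of both protocols, using two hypothetical classes: Countdown implements the iterator methods and Recorder implements the context manager methods.

```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # The object is its own iterator.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        value = self.current
        self.current -= 1
        return value


class Recorder:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # the value bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not suppress exceptions


counts = list(Countdown(3))

rec = Recorder()
with rec as r:
    r.events.append("work")
```

The with statement guarantees that __exit__ runs after the block, even if the block raises an exception.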

Callable Objects

•__call__(self, ...): Allows an instance of a class to be called as if it were a function.
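
A callable object is a convenient way to build a configurable pipeline step. The Scale class below is a hypothetical example.

```python
class Scale:
    def __init__(self, factor):
        self.factor = factor  # configuration stored on the instance

    def __call__(self, value):
        # Invoked when the instance is called like a function.
        return value * self.factor


double = Scale(2)
doubled = list(map(double, [1, 2, 3]))  # usable wherever a function is expected
```

Unlike a plain function, the instance keeps its configuration (factor) as state and can be inspected or changed later.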


Object Copying

•__copy__(self): Defines behavior for shallow copying objects using the copy module (copy.copy()).
•__deepcopy__(self, memo): Defines behavior for deep copying objects (copy.deepcopy()).

Object Construction and Hashing

•__new__(cls, ...): Defines behavior for creating a new instance of a class, called before __init__.
•__hash__(self): Defines behavior for hashing an object (used in hash-based collections like sets
and dictionaries).
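
A sketch of copying and hashing on a hypothetical Point class. __hash__ is defined together with __eq__ so that equal points collapse to one entry in a set.

```python
import copy


class Point:
    def __init__(self, x, y, tags=None):
        self.x = x
        self.y = y
        self.tags = tags if tags is not None else []

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must produce equal hashes.
        return hash((self.x, self.y))

    def __copy__(self):
        # Shallow copy: the new Point shares the same tags list.
        return Point(self.x, self.y, self.tags)

    def __deepcopy__(self, memo):
        # Deep copy: the tags list is duplicated as well.
        return Point(self.x, self.y, copy.deepcopy(self.tags, memo))


p = Point(1, 2, ["a"])
shallow = copy.copy(p)
deep = copy.deepcopy(p)
unique = {Point(1, 2), Point(1, 2), Point(3, 4)}
```

Changing p.tags would also affect shallow.tags but not deep.tags, and the set keeps only two distinct points.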

Special Methods for Collections

•__reversed__(self): Defines behavior for the reversed() function.

