Shell or Notebook? Launching the IPython Shell - Launching the Jupyter Notebook - Help and Documentation in IPython - Exploring Modules with Tab Completion - Keyboard Shortcuts in the IPython Shell - IPython Magic Commands.
Providing "full notes" equivalent to a textbook is beyond the scope of a single response, but the following gives a detailed and extensive explanation of each topic in the syllabus, providing a strong foundation for your learning.
● IPython: An enhanced interactive Python shell that Jupyter Notebooks run on top of. It
provides many features that make interactive computing more powerful.
● ? (Introspection):
○ Append ? to a variable, function, method, or object to get quick access to its
documentation (docstring).
○ Example:
Python
import numpy as np
np.array?
○ This will open a pager at the bottom of the screen with information like signature,
docstring, type, and file location.
● ?? (Source Code):
○ Append ?? to a function or method to view its full source code (if available, not
for compiled C extensions).
○ Example:
Python
def my_function(x):
    """This is a docstring."""
    return x * 2

my_function??
● Tab Completion:
○ Object Methods/Attributes: Type object_name. and press Tab to see
available methods and attributes.
○ Module Contents: Type module_name. and press Tab to see functions,
classes, and variables within that module.
○ File Path Completion: In string literals, press Tab to complete file paths.
○ Function Signature: After typing function_name(, pressing Shift + Tab
(once, twice, thrice) can bring up parameter information/docstring (especially
useful in notebooks).
● help() function:
○ Standard Python built-in function. help(object_name) provides a more
verbose, pager-based help.
○ Example: help(np.mean)
○ Example (tab completion on an object):
Python
import pandas as pd
pd.DataFrame.  # Press Tab here to see all methods like .head, .describe, .iloc, etc.
● This significantly reduces the need to constantly look up documentation manually,
making coding faster and more efficient.
● IPython Magic Commands: Special commands that start with % (line magics) or %% (cell magics). They extend the functionality of IPython and Jupyter, offering convenient shortcuts for common tasks.
● Line Magics (%): Apply to a single line.
○ %run script.py: Run a Python script.
○ %timeit expression: Time the execution of a single line of Python code (runs multiple times for accuracy).
Python
%timeit [i**2 for i in range(1000)]
○ %debug: Enter the interactive debugger after an exception.
○ %who / %whos: List variables defined in the current namespace (with details).
○ %lsmagic: List all available magic commands.
○ %pwd: Print current working directory.
○ %cd path/to/directory: Change current working directory.
○ %env: List environment variables.
○ %matplotlib inline (or %matplotlib notebook): Render Matplotlib plots
directly within the notebook output. Essential for visualization.
○ %load_ext autoreload: Load the autoreload extension.
○ %autoreload 2: Automatically reload modules before executing code (useful
during development).
● Cell Magics (%%): Apply to the entire cell. Must be the first line of the cell.
○ %%timeit: Time the execution of the entire cell (runs it multiple times for accuracy).
○ %%time: Report the wall clock time and CPU time for the cell.
○ %%bash / %%sh: Execute the cell content as a bash/shell command.
○ %%html: Render the cell content as HTML.
○ %%writefile filename.py: Write the cell content to a file.
○ %%file filename.txt: (Deprecated, %%writefile is preferred).
○ %%latex: Render the cell content as LaTeX.
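A short illustration of two common cell magics; this is a hedged sketch assuming each magic is the first line of its own notebook cell (the filename hello.py is arbitrary):
Python
%%writefile hello.py
# Everything below the magic line is written to hello.py instead of being executed
print("Hello from a generated script")

Python
%%time
# Reports wall time and CPU time for this whole cell
total = sum(i**2 for i in range(100_000))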
Unit – II: NumPy (15 Hrs)
NumPy (Numerical Python) is the fundamental package for numerical computation in Python,
providing powerful N-dimensional array objects and tools for integrating C/C++ and Fortran
code.
1. Introduction to NumPy
● Why NumPy?
○ Python lists are general-purpose, but for numerical operations, they are slow and
inefficient for large datasets.
○ NumPy arrays (ndarray) are designed for efficient numerical operations on
large amounts of data.
○ They are homogeneous (all elements are of the same data type), which allows
for highly optimized, vectorized operations.
○ Under the hood, NumPy operations are often implemented in C or Fortran,
making them much faster than pure Python loops.
● Key Features:
○ ndarray: A fast and efficient multi-dimensional array object.
○ Mathematical functions for operating on arrays (linear algebra, Fourier
transforms, random number generation).
○ Tools for integrating C/C++ and Fortran code.
● Installation: Usually comes with Anaconda. Otherwise: pip install numpy
● Import Convention: import numpy as np
● Creating Arrays:
○ Fixed-Size Arrays:
Python
np.zeros(5, dtype=int) # array([0, 0, 0, 0, 0])
np.ones((3, 5), dtype=float) # 3x5 array of ones
np.full((2, 2), 7) # 2x2 array with all 7s
np.empty(3) # Uninitialized values
○ Sequences:
Python
np.arange(0, 10, 2) # Like range(), but returns an array: array([0, 2, 4, 6, 8])
np.linspace(0, 1, 5) # 5 evenly spaced numbers between 0 and 1: array([0., 0.25, 0.5, 0.75, 1.])
○ Random Arrays:
Python
np.random.rand(3, 3) # Uniform distribution [0, 1)
np.random.randn(3, 3) # Standard normal distribution
np.random.randint(0, 10, size=(3, 3)) # Random integers
○ Identity Matrix:
Python
np.eye(3) # 3x3 identity matrix
● Array Attributes:
○ ndim: Number of dimensions.
○ shape: Tuple indicating the size of each dimension.
○ size: Total number of elements.
○ dtype: Data type of the elements (e.g., int64, float64).
○ itemsize: Size of each element in bytes.
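A quick illustrative sketch of these attributes on a small (hypothetical) array:
Python
import numpy as np
x = np.random.randint(10, size=(3, 4))
x.ndim      # 2
x.shape     # (3, 4)
x.size      # 12
x.dtype     # e.g. dtype('int64'); platform dependent
x.itemsize  # 8 bytes per element for int64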
● Array Indexing and Slicing:
○ Multi-dimensional Arrays:
Python
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[0, 0] # 1
arr2d[2, 1] # 8
arr2d[:2, :2] # Slice rows and columns
# array([[1, 2],
# [4, 5]])
arr2d[1, :] # Second row: array([4, 5, 6])
arr2d[:, 0] # First column: array([1, 4, 7])
○ Fancy Indexing: Using arrays of integers or booleans to select arbitrary subsets of data.
Python
arr = np.array([10, 20, 30, 40, 50])
indices = [0, 2, 4]
arr[indices] # array([10, 30, 50])
● Reshaping Arrays:
○ reshape(): Returns a new array with a different shape, without changing the data.
Python
arr = np.arange(1, 10)
arr.reshape((3, 3))
# array([[1, 2, 3],
#        [4, 5, 6],
#        [7, 8, 9]])
○ ravel() / flatten(): Convert multi-dimensional array to 1D. flatten()
returns a copy, ravel() returns a view (if possible).
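A minimal sketch of the copy-vs-view distinction (assuming a small contiguous array, where ravel() can return a view):
Python
arr2d = np.arange(6).reshape(2, 3)
flat = arr2d.flatten()  # always a copy
rav = arr2d.ravel()     # a view when possible
rav[0] = 99             # also modifies arr2d, because rav is a view here
arr2d[0, 0]             # 99
flat[1] = -1            # does not affect arr2d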
● Concatenation and Splitting:
○ np.concatenate(): Join a sequence of arrays along an existing axis.
○ np.vstack(): Stack arrays vertically (row-wise).
○ np.hstack(): Stack arrays horizontally (column-wise).
○ np.dstack(): Stack arrays depth-wise (3D).
○ np.split(), np.vsplit(), np.hsplit(), np.dsplit(): Split arrays into
multiple sub-arrays.
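A compact sketch of stacking and splitting (small illustrative arrays):
Python
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7], [6, 5, 4]])
np.vstack([x, grid])                 # stack x as a new top row -> shape (3, 3)
np.hstack([grid, [[99], [99]]])      # append a column -> shape (2, 4)
upper, lower = np.vsplit(np.vstack([x, grid]), [1])  # split after the first row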
● Universal Functions (UFuncs): Vectorized functions that operate element-wise on arrays.
○ Arithmetic Operators: +, -, *, /, //, **, %. Applied element-wise.
○ Comparison Operators: >, <, ==, !=, >=, <=. Return boolean arrays.
○ Trigonometric Functions: np.sin(), np.cos(), np.tan().
○ Exponentials and Logarithms: np.exp(), np.log(), np.log2(),
np.log10().
○ Other UFuncs: np.abs(), np.sqrt(), np.ceil(), np.floor(),
np.round().
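A short sketch of a few ufuncs applied element-wise:
Python
x = np.array([1, 2, 3, 4])
x + 5        # array([6, 7, 8, 9])
x > 2        # array([False, False,  True,  True])
np.exp(x)    # e**x for each element
np.sqrt(x)   # element-wise square root
np.round(np.sin(x), 2)  # element-wise sine, rounded to 2 decimals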
● Broadcasting:
○ A powerful mechanism that allows NumPy to perform operations on arrays of
different shapes.
○ It effectively "stretches" the smaller array across the larger array so that they
have compatible shapes.
○ Rules:
1. If the arrays have different numbers of dimensions, prepend 1s to the
shape of the smaller array until both shapes have the same length.
2. Two dimensions are compatible when they are equal, or one of them is 1.
3. If the dimensions are incompatible, an error is raised.
Example:
Python
a = np.array([0, 10, 20, 30]) # shape (4,)
b = np.array([0, 1, 2]) # shape (3,)
# Cannot directly add.
# But:
a = np.arange(3)[:, np.newaxis] # shape (3, 1)
b = np.arange(3) # shape (3,)
a+b
# array([[0, 1, 2],
# [1, 2, 3],
# [2, 3, 4]])
○ The b array (shape (3,)) is broadcast across the columns of a (shape (3,1)).
● Aggregations: Operations that collapse an array (or parts of it) into a single value.
● Common Aggregations:
○ np.sum(), np.min(), np.max(), np.mean(), np.median(), np.std()
(standard deviation), np.var() (variance).
● Axis Argument: Most aggregation functions accept an axis argument to specify along
which dimension the aggregation should occur.
○ axis=0: Aggregate down the columns (collapses the rows, giving one value per column).
○ axis=1: Aggregate across the rows (collapses the columns, giving one value per row).
○ If axis is not specified, the aggregation is performed over the entire array.
Python
M = np.random.randint(0, 10, (3, 4))
# array([[8, 2, 5, 0],
# [7, 6, 8, 8],
# [0, 2, 4, 7]])
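# Illustrative continuation: aggregating M along each axis
M.sum(axis=0)  # sum down each column -> shape (4,); array([15, 10, 17, 15]) for the example above
M.sum(axis=1)  # sum across each row  -> shape (3,); array([15, 29, 13]) for the example above
M.min()        # minimum over the entire array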
● np.nansum(), np.nanmean(), etc.: Versions of aggregation functions that ignore NaN
(Not a Number) values.
● Sorting Arrays:
○ np.sort(arr): Returns a sorted copy of the array.
○ arr.sort(): Sorts the array in-place.
○ np.argsort(arr): Returns the indices that would sort the array.
Python
x = np.array([2, 1, 4, 3, 5])
np.sort(x) # array([1, 2, 3, 4, 5])
i = np.argsort(x) # array([1, 0, 3, 2, 4])
x[i] # array([1, 2, 3, 4, 5])
○ Sorting along an axis for multi-dimensional arrays.
● Partial Sorts:
○ np.partition(arr, K): Returns a copy in which the value that would sit at index K in a fully sorted array is in that position, with all smaller values (in arbitrary order) to its left and all larger values to its right.
○ np.argpartition(): Returns the indices.
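A minimal partial-sort sketch:
Python
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)     # e.g. array([2, 1, 3, 4, 6, 5, 7]); 4 is in its sorted spot, each side is unordered
np.argpartition(x, 3)  # indices that produce the same partition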
● Structured Arrays:
○ Arrays with compound data types, allowing elements to have multiple fields (like
a C struct or database row).
Python
data = np.zeros(3, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data[0] = ('Alice', 25, 55.5)
data['name']  # Access a whole field by name
● Masking and Filtering:
○ Using boolean arrays created by comparison operators to select elements.
Python
x = np.arange(10)
x[x % 2 == 0] # Select even numbers: array([0, 2, 4, 6, 8])
● Linear Algebra:
○ np.dot(a, b) or a @ b (Python 3.5+): Dot product of two arrays (matrix
multiplication).
○ np.linalg.inv(matrix): Inverse of a matrix.
○ np.linalg.det(matrix): Determinant of a matrix.
○ np.linalg.eig(matrix): Eigenvalues and eigenvectors.
● Broadcasting in more complex scenarios (e.g., adding a 1D array to a 2D array,
column-wise).
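A short linear-algebra sketch illustrating the calls listed above (small example matrix):
Python
A = np.array([[1., 2.], [3., 4.]])
b = np.array([5., 6.])
A @ b                          # matrix-vector product: array([17., 39.])
np.linalg.inv(A)               # inverse of A
np.linalg.det(A)               # determinant: -2.0
vals, vecs = np.linalg.eig(A)  # eigenvalues and eigenvectors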
1. Introduction to Pandas
● Why Pandas?
○ NumPy is great for numerical arrays, but lacks labels for rows/columns and
handles heterogeneous data poorly.
○ Pandas introduces two primary data structures: Series (1D labeled array) and
DataFrame (2D labeled table).
○ Makes data cleaning, transformation, analysis, and visualization much easier and
more intuitive.
○ Excellent for handling tabular data, time series data, and heterogeneous data.
● Key Features:
○ DataFrame objects for data manipulation with integrated indexing.
○ Tools for reading and writing data between in-memory data structures and
different formats: CSV, text files, SQL databases, HDF5 format.
○ Intelligent data alignment and integrated handling of missing data.
○ Flexible groupby functionality for performing split-apply-combine operations.
○ High performance merging and joining of datasets.
○ Time series functionality.
● Installation: Usually comes with Anaconda. Otherwise: pip install pandas
● Import Convention: import pandas as pd
● Definition: A one-dimensional labeled array capable of holding any data type (integers,
strings, floats, Python objects, etc.).
● Creation:
○ From List/Array:
Python
s = pd.Series([0.25, 0.5, 0.75, 1.0])
● Attributes: s.values (NumPy array of values), s.index (Pandas Index object).
● Indexing and Slicing:
○ Explicit Index: s['a']
○ Implicit (Integer) Index: s[0]
○ Slicing by Explicit Index: s['a':'c'] (inclusive of end)
○ Slicing by Implicit Index: s[0:2] (exclusive of end)
○ Fancy Indexing: s[['a', 'c']]
○ Boolean Masking: s[s > 0.5]
○ loc (label-location based indexer): s.loc['a'], s.loc['a':'c']
○ iloc (integer-location based indexer): s.iloc[0], s.iloc[0:2]
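A small sketch tying these indexers together (assuming a Series with an explicit string index):
Python
s = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
s['a']             # 0.25 (explicit label)
s['a':'c']         # includes 'c' (label slicing is inclusive of the end)
s.iloc[0:2]        # positions 0 and 1 (integer slicing excludes the end)
s.loc[['a', 'c']]  # fancy indexing by label
s[s > 0.5]         # boolean masking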
● DataFrame:
○ Definition: A two-dimensional labeled data structure (rows and columns), like a table or spreadsheet, where each column is a Series.
● Creation:
○ From a dictionary of lists/Series: pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
○ From CSV, Excel, etc.: pd.read_csv('file.csv'),
pd.read_excel('file.xlsx')
● Attributes: df.values (NumPy array), df.index, df.columns.
● Indexing and Selection:
○ Column Selection: df['col1'] (returns Series), df[['col1', 'col2']]
(returns DataFrame).
○ Row Selection:
■ df.iloc[0] (first row by integer position)
■ df.loc['row_label'] (row by label)
■ Slicing rows: df[1:3] (by integer position, iloc preferred)
■ df.loc['label1':'label3'] (by label, loc preferred)
○ Combined Selection (loc, iloc):
■ df.loc[row_label, column_label]
■ df.iloc[row_pos, column_pos]
■ df.loc[df['col1'] > 1, ['col2']] (Boolean indexing rows, label
indexing columns)
● Adding/Modifying Columns:
Python
df['new_col'] = df['col1'] * 2
df['another_col'] = ['X', 'Y', 'Z']
● Dropping Columns/Rows:
○ df.drop('column_name', axis=1, inplace=True) (inplace modifies
DataFrame)
○ df.drop(['row_label1', 'row_label2'], axis=0)
● rename(): Renaming columns or index labels.
● set_index() / reset_index(): Changing the DataFrame index.
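A brief sketch of these manipulation methods (hypothetical DataFrame and column names):
Python
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df = df.rename(columns={'col2': 'renamed'})  # rename a column
df = df.drop('renamed', axis=1)              # drop a column (returns a new DataFrame)
df = df.set_index('col1')                    # use 'col1' values as the row index
df = df.reset_index()                        # move the index back into a regular column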
● Representation: Pandas uses NaN (Not a Number) for floating-point missing values and
None for object-type missing values (Python None). NumPy's NaN is used internally.
● Detection:
○ df.isnull(): Returns a boolean DataFrame of the same shape, True where
null.
○ df.notnull(): Opposite of isnull().
○ df.isna() and df.notna() are aliases for isnull() and notnull().
● Counting Nulls:
○ df.isnull().sum(): Counts nulls per column.
○ df.isnull().sum().sum(): Total nulls in DataFrame.
● Handling Missing Data:
○ dropna() (Dropping):
■ df.dropna(): Drops any row containing any NaN value.
■ df.dropna(axis='columns') or df.dropna(axis=1): Drops any
column containing any NaN value.
■ df.dropna(how='all'): Drops rows/columns only if all values are
NaN.
■ df.dropna(thresh=N): Requires at least N non-null values for a
row/column to be kept.
○ fillna() (Filling):
■ df.fillna(0): Fill all NaN values with 0.
■ df.fillna(method='ffill') or df.fillna(method='pad'):
Forward-fill (propagate last valid observation forward).
■ df.fillna(method='bfill') or
df.fillna(method='backfill'): Backward-fill (propagate next valid
observation backward).
■ df.fillna(df.mean()): Fill NaN in each column with that column's
mean.
■ df['column'].fillna(df['column'].median()): Fill specific
column's NaN with its median.
■ df.fillna(value=dictionary_of_column_values): Fill with
different values per column.
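A compact sketch of detecting and handling missing values (hypothetical data):
Python
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [np.nan, 5.0, 6.0]})
df.isnull().sum()     # nulls per column: A -> 1, B -> 1
df.dropna()           # keeps only the last row (the only one with no NaN)
df.fillna(df.mean())  # fill each column's NaN with that column's mean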
● Concept: Allows you to have multiple index levels on an axis (row or column), enabling
data to be stored and manipulated in higher dimensional space.
● Creation:
○ Using pd.MultiIndex.from_arrays, from_product, or from_tuples.
● Indexing and Slicing MultiIndex:
○ Accessing by Outer Level: multi_index_series['California']
○ Accessing by Inner Level: Requires slicing with loc or iloc.
○ xs() (Cross-section): For selecting data at a particular level.
○ Partial Indexing:
Python
multi_index_series['California', 2000] # Single element
multi_index_series.loc['California': 'New York'] # Slice on outer
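A minimal sketch of building and indexing a MultiIndex Series (hypothetical state/year data):
Python
import pandas as pd
index = pd.MultiIndex.from_tuples([('California', 2000), ('California', 2010),
                                   ('New York', 2000), ('New York', 2010)])
pop = pd.Series([33.9, 37.3, 19.0, 19.4], index=index)  # populations in millions (illustrative)
pop['California']        # all rows for the outer level
pop['California', 2000]  # a single element
pop.xs(2000, level=1)    # cross-section: the value for every state in 2000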
5. Combining Datasets
● pd.concat() (Concatenation):
○ Joins DataFrame or Series objects along a particular axis (rows or columns).
○ pd.concat([df1, df2]): By default, stacks vertically (rows). Aligns columns.
○ pd.concat([df1, df2], axis=1): Stacks horizontally (columns). Aligns
rows.
○ join argument: 'outer' (default) keeps the union of the other-axis labels; 'inner' keeps only their intersection.
○ ignore_index=True: Resets the resulting index.
○ keys argument: Creates a hierarchical index to identify the origin of each chunk.
● pd.merge() (Merging/Joining):
○ Combines DataFrames based on common columns or indices (like SQL JOINs).
○ on: Column name(s) to join on.
○ left_on, right_on: Column names if different in left/right DataFrames.
○ left_index, right_index: Join on index.
○ how argument (Join Types):
■ 'inner' (default): Intersection of keys.
■ 'outer': Union of keys (includes all rows, fills with NaN).
■ 'left': Includes all rows from left DataFrame, matching from right.
■ 'right': Includes all rows from right DataFrame, matching from left.
● df.join(): A convenience method for joining DataFrames on their indexes (or a key
column to index). Simpler for index-based joins than merge.
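A small sketch of concat and merge (hypothetical employee data):
Python
import pandas as pd
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'], 'group': ['Accounting', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake'], 'hire_date': [2004, 2008, 2012]})
pd.merge(df1, df2, on='employee')         # inner join on the shared 'employee' column
pd.concat([df1, df1], ignore_index=True)  # stack rows and renumber the index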
● GroupBy (split-apply-combine): df.groupby('key') splits the data into groups by the key column, applies a function to each group, and combines the results.
○ Common Applications:
Python
df.groupby('category_col')['value_col'].mean() # Mean of 'value_col' for each 'category_col'
df.groupby('category_col').size() # Count of items in each category
df.groupby('category_col').describe() # Full statistics for each category
○ transform(): Return a Series/DataFrame with the same shape as the original, where values are group-wise transformations.
Python
df['normalized_val'] = df.groupby('category_col')['value_col'].transform(
    lambda x: (x - x.mean()) / x.std())
○ apply(): Apply an arbitrary function to each group. Very flexible but can be
slower.
1. Introduction to Matplotlib
● Why Matplotlib?
○ Provides a highly flexible and powerful way to create a wide variety of static,
animated, and interactive plots.
○ Can produce publication-quality figures in a variety of hardcopy formats and
interactive environments.
○ Often integrated with NumPy and Pandas.
● Key Concepts:
○ Figures: The top-level container for all plot elements. You can have multiple
figures.
○ Axes: The actual plotting area where the data is drawn. A figure can contain
multiple axes (subplots). Each axes has x-axis, y-axis, titles, labels, etc.
○ pyplot module: matplotlib.pyplot is a collection of functions that make
Matplotlib work like MATLAB. It's the most common way to use Matplotlib.
● Installation: Usually comes with Anaconda. Otherwise: pip install matplotlib
● Import Convention: import matplotlib.pyplot as plt
● Displaying Plots in Jupyter:
○ %matplotlib inline: Displays plots statically embedded within the notebook
output.
○ %matplotlib notebook: Displays interactive plots within the notebook (zoom,
pan).
● MATLAB-style Interface: Stateful plt.* functions (plt.plot(), plt.title(), plt.xlabel()) act on the "current" figure and axes; convenient for quick, simple plots.
● Object-Oriented Interface (Recommended for complex plots):
○ Explicitly create Figure and Axes objects.
○ Allows for more fine-grained control.
Python
x = np.linspace(0, 10, 100)
fig = plt.figure() # Create a new figure
ax = fig.add_subplot(1, 1, 1) # Create axes (1 row, 1 col, 1st subplot)
ax.plot(x, np.cos(x))
ax.set_title("Cosine Wave")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
plt.show()
● Saving Plots: plt.savefig('my_plot.png') or fig.savefig('my_plot.pdf')
4. Scatter Plots
● plt.scatter(): Draws points whose color and size can vary point by point.
Python
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar() # show the color scale
● vs. plt.plot(x, y, 'o'):
○ plt.plot() is optimized for plotting points along a line (even if markers are
used). It's faster for large datasets of uniform properties.
○ plt.scatter() offers more flexibility in controlling individual point properties
(color, size) from data arrays.
5. Visualizing Errors
● Error Bars (plt.errorbar()): Display the uncertainty of each measurement as a bar around the plotted point.
Python
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='o', color='black', ecolor='lightgray', elinewidth=3, capsize=0);
6. Histograms and Density Plots
● Histograms (plt.hist()):
○ Shows the distribution of a single numerical variable by dividing data into bins
and counting observations in each bin.
○ bins: Number of bins or sequence of bin edges.
○ density=True: Normalize to form a probability density (area under histogram
sums to 1).
○ alpha: Transparency.
○ histtype: 'bar' (default), 'barstacked', 'step', 'stepfilled'.
Python
data = np.random.randn(1000)
plt.hist(data, bins=30, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none');
● Two-Dimensional Histograms (plt.hist2d(), plt.hexbin()):
○ For visualizing the joint distribution of two variables.
○ plt.hist2d(): Rectangular bins, color intensity indicates density.
○ plt.hexbin(): Hexagonal bins, often more visually appealing.
● Kernel Density Estimation (KDE):
○ A non-parametric way to estimate the probability density function of a random
variable. Smoothed version of a histogram.
○ Often done using seaborn.kdeplot or scipy.stats.gaussian_kde.
○ plt.plot(density_x, density_y) after calculating KDE.
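A minimal KDE sketch using scipy.stats.gaussian_kde (as mentioned above; the data is illustrative):
Python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
data = np.random.randn(1000)
kde = gaussian_kde(data)
xs = np.linspace(-4, 4, 200)
plt.plot(xs, kde(xs))                             # smooth estimate of the density
plt.hist(data, bins=30, density=True, alpha=0.3)  # histogram for comparison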
7. Customizing Plots
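A short sketch of the customizations typically covered under this topic (axis limits, labels, line styles, legends; all values are illustrative):
Python
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), linestyle='--', color='green', label='sin(x)')
plt.plot(x, np.cos(x), linestyle='-', color='blue', label='cos(x)')
plt.xlim(0, 10)
plt.ylim(-1.5, 1.5)
plt.title("Customized Plot")
plt.xlabel("x")
plt.ylabel("y")
plt.legend(loc='upper right')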
8. Multiple Subplots
● plt.subplots(): A convenient helper function that creates a figure and a grid of subplots in a single call.
Python
fig, axes = plt.subplots(2, 2, figsize=(8, 6)) # 2 rows, 2 columns
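# Illustrative continuation: axes is a 2x2 NumPy array of Axes objects
axes[0, 0].plot(np.random.randn(50))
axes[1, 1].hist(np.random.randn(100), bins=20)
fig.tight_layout()  # avoid overlapping labels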
● plt.subplot(): MATLAB-style interface for creating subplots (less flexible for
complex layouts).
9. Text Annotation
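A minimal annotation sketch (ax.text and ax.annotate are the usual tools; the coordinates are illustrative):
Python
x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.text(5, 0.5, "local text", fontsize=10)       # place text at data coordinates
ax.annotate("peak", xy=(np.pi / 2, 1), xytext=(4, 1.3),
            arrowprops=dict(arrowstyle="->"))    # arrow from the label to the point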
1. Introduction to Scikit-Learn
● Why Scikit-Learn?
1. Provides a consistent API for a wide range of machine learning algorithms.
2. Built on NumPy, SciPy, and Matplotlib.
3. Emphasis on ease of use, robust implementation, and good documentation.
4. Does not handle data loading, data manipulation (use Pandas), or highly
customized deep learning (use TensorFlow/PyTorch).
● Key Principles:
1. Estimators: All objects in scikit-learn that learn from data are called estimators.
They typically have fit() and predict() methods.
2. Consistency: All estimators share a common API.
3. Inspection: All parameters and learned attributes are public attributes.
4. Sensible Defaults: Most parameters have reasonable default values.
● Installation: Usually comes with Anaconda. Otherwise: pip install scikit-learn
● Common Workflow:
1. Choose a Model Class: Import the appropriate estimator (e.g., from
sklearn.linear_model import LinearRegression).
2. Choose Model Hyperparameters: Instantiate the model with desired settings
(e.g., model = LinearRegression(fit_intercept=True)).
3. Arrange Data: Prepare data into feature matrix X (2D NumPy array or Pandas
DataFrame) and target vector y (1D NumPy array or Pandas Series).
4. Fit the Model: Use the fit() method to train the model on your data
(model.fit(X, y)).
5. Predict/Transform: Use the predict() method to make predictions on new
data (model.predict(X_new)) or transform() for feature transformation.
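A sketch of the five-step workflow using a simple linear regression (hypothetical data):
Python
import numpy as np
from sklearn.linear_model import LinearRegression  # 1. choose a model class
model = LinearRegression(fit_intercept=True)        # 2. choose hyperparameters
X = np.arange(10).reshape(-1, 1)                     # 3. arrange data: X is 2D, y is 1D
y = 3 * X.ravel() + np.random.randn(10)
model.fit(X, y)                                      # 4. fit the model
model.predict(np.array([[10], [11]]))                # 5. predict on new data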
2. Data Representation
Example:
Python
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data # Feature matrix: sepal length, sepal width, petal length, petal width
y = iris.target # Target vector: species (0, 1, 2)
X.shape # (150, 4) -> 150 samples, 4 features
y.shape # (150,) -> 150 target values
● Feature Engineering: The process of creating new features from existing raw data to
improve model performance.
3. Hyperparameters and Model Validation
● Hyperparameters:
○ Parameters of the learning algorithm that are set prior to training (e.g.,
n_estimators in a Random Forest, alpha in Ridge regression, C in SVM).
○ They are not learned from the data during training.
○ Crucial for controlling model complexity and preventing overfitting/underfitting.
● Model Validation:
○ The process of evaluating how well a trained model generalizes to unseen data.
○ Why not just use training error? Training error is usually overly optimistic; a
complex model might memorize the training data but perform poorly on new data
(overfitting).
○ Common Validation Approaches:
■ Train/Test Split: Divide data into training set (e.g., 70-80%) and test set
(remaining). Train on training, evaluate on test.
■ from sklearn.model_selection import train_test_split
■ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
■ Cross-Validation: A more robust method for estimating model
performance and selecting hyperparameters. Divides the data into
multiple "folds."
■ K-Fold Cross-Validation: Data is split into K equal-sized folds. In
K iterations, one fold is used for testing, and the remaining K-1
folds for training. The performance is averaged across K
iterations.
■ Stratified K-Fold: Ensures that each fold has a similar proportion
of class labels as the original dataset (important for imbalanced
datasets).
■ from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
■ scores = cross_val_score(model, X, y, cv=5)
● Hyperparameter Tuning:
○ Grid Search: Systematically explores a predefined set of hyperparameter values
for a given model.
■ from sklearn.model_selection import GridSearchCV
■ from sklearn.neighbors import KNeighborsClassifier
■ param_grid = {'n_neighbors': [3, 5, 7], 'weights': ['uniform', 'distance']}
■ grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
■ grid.fit(X_train, y_train)
■ grid.best_params_, grid.best_score_, grid.best_estimator_
○ Randomized Search: Randomly samples a fixed number of hyperparameter
combinations from a specified distribution. Often more efficient than Grid Search
for large search spaces.
■ from sklearn.model_selection import RandomizedSearchCV
4. Learning Curves
● Concept: Plots that show how a model's performance (e.g., score) changes as the
amount of training data increases.
● Interpretation:
○ High Bias (Underfitting): Both training and validation scores converge to a low
value. The model is too simple and cannot learn the underlying patterns. Getting
more data won't help much.
○ High Variance (Overfitting): Training score is high, but validation score is low,
and there's a significant gap between them. The model is too complex and has
memorized noise in the training data. Getting more data might help.
○ Good Fit: Training and validation scores converge to a high value, with a small
gap.
● Usage: Helps diagnose whether adding more training data, reducing model complexity,
or increasing model complexity would be beneficial.
● from sklearn.model_selection import learning_curve
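A hedged sketch of generating learning-curve data (the plotting step is omitted; the iris data is just a convenient built-in example):
Python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5))
train_scores.mean(axis=1), valid_scores.mean(axis=1)  # average score per training-set size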
5. Correlation
● Definition: A statistical measure that expresses the extent to which two variables are
linearly related.
● Pearson Correlation Coefficient (r):
○ Ranges from -1 to +1.
○ +1: Perfect positive linear relationship.
○ -1: Perfect negative linear relationship.
○ 0: No linear relationship.
● Usage in Data Science:
○ Feature Selection: Identify highly correlated features (multicollinearity can be a
problem for some models like linear regression).
○ Exploratory Data Analysis (EDA): Understand relationships between variables.
Calculation (Pandas):
Python
import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [2,4,5,4,5], 'C': [5,4,3,2,1]})
df.corr() # Returns a correlation matrix
df['A'].corr(df['B']) # Correlation between two specific columns
● Correlation vs. Causation: Correlation does not imply causation.
6. Linear Regression
● Goal: To model the relationship between a dependent variable (target y) and one or
more independent variables (features X) by fitting a linear equation to observed data.
● Assumptions of Linear Regression: Linearity, independence of errors,
homoscedasticity, normality of residuals.
● Model: y = β₀ + β₁x₁ + ε
○ y: Dependent variable.
○ x₁: Independent variable (single feature).
○ β₀: Y-intercept.
○ β₁: Coefficient (slope).
○ ε: Error term.
● Minimizing Residual Sum of Squares (RSS): The model finds the β₀ and β₁ that minimize the sum of the squared differences between the observed values and the values predicted by the linear model.
Implementation (Scikit-Learn):
Python
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])   # feature matrix: shape (n_samples, n_features)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # target vector (illustrative values)
model = LinearRegression(fit_intercept=True)
model.fit(X, y)
y_pred = model.predict(X)
● Evaluation Metrics (for Regression):
○ Mean Squared Error (MSE): Average of the squared differences between
predicted and actual values.
○ Root Mean Squared Error (RMSE): Square root of MSE (in the same units as
y).
○ Mean Absolute Error (MAE): Average of the absolute differences. Less
sensitive to outliers than MSE.
○ R-squared (R²): Proportion of the variance in the dependent variable that is predictable from the independent variables. Higher is better (max 1.0).
■ model.score(X, y) gives R².
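A quick sketch computing these metrics with sklearn.metrics (continuing the fitted model above):
Python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)  # same value as model.score(X, y)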
6.3. Regularization
● Problem: In complex linear models (e.g., with many features or high-degree polynomial
basis functions), the model can overfit the training data by assigning very large
coefficients to specific features.
● Solution: Regularization adds a penalty term to the loss function (the function being
minimized during training) that discourages large coefficients. This effectively shrinks
coefficients towards zero.
● Types of Regularization:
○ Ridge Regression (L2): Adds a penalty proportional to the sum of squared coefficients; strength controlled by alpha.
○ Lasso Regression (L1): Adds a penalty proportional to the sum of absolute coefficient values; can shrink some coefficients exactly to zero.
● Benefits:
○ Reduces overfitting.
○ Improves generalization performance on unseen data.
○ Lasso can perform automatic feature selection.
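A hedged sketch of Ridge and Lasso in scikit-learn (assuming a feature matrix X and target y as in the regression example above):
Python
from sklearn.linear_model import Ridge, Lasso
ridge = Ridge(alpha=1.0)  # alpha controls the regularization strength
ridge.fit(X, y)
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
lasso.coef_               # some coefficients may be driven exactly to zero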
This is a comprehensive set of notes covering the topics outlined in your syllabus. To truly
master these concepts and tools, remember to:
● Practice extensively: The best way to learn is by coding. Use Jupyter Notebooks to
follow along with examples and experiment.
● Refer to documentation: The official documentation for NumPy, Pandas, Matplotlib,
and Scikit-learn is excellent.
● Work on projects: Apply these skills to real-world datasets.
● Read relevant books/tutorials: "Python for Data Analysis" by Wes McKinney (Pandas
creator) and "Python Data Science Handbook" by Jake VanderPlas are highly
recommended resources.