Principles of AI Laboratory
Name : VARSHA DR
Number : 21ETIS411052
Department : Computer Science and Engineering
List of Experiments
1. Working with NumPy Library
4. Search Algorithm (Uninformed Search)
5. Search Algorithm (Informed Search)
6. Game Algorithm
8. Classification Algorithms
9. Classification Algorithms
No. | Lab Experiment                                                                                   | Viva (6) | Results (7) | Documentation (7) | Total Marks (20)
1   | Working with NumPy Library                                                                       |          |             |                   |
4   | Search Algorithm (Uninformed Search)                                                             |          |             |                   |
5   | Search Algorithm (Informed Search)                                                               |          |             |                   |
6   | Game Algorithm                                                                                   |          |             |                   |
8   | Classification Algorithms                                                                        |          |             |                   |
9   | Classification Algorithms                                                                        |          |             |                   |
10  | Lab Internal Test conducted along the lines of SEE, valued for 50 marks and reduced to 20 marks |          |             |                   |
Total Marks
Laboratory 1
Title of the Laboratory Exercise: Working with NumPy
Introduction and Purpose of Experiment
Students will be able to compute the mean, median, mode, standard deviation, variance and
percentiles for a given dataset using the NumPy library.
Objectives
At the end of this lab, the student will be able to compute the mean, median, mode, standard
deviation, variance and percentiles for a given dataset using the NumPy library.
Experimental Procedure
1. Write an algorithm to solve the given problem
2. Translate the algorithm into Python code
3. Execute the code
4. Create a laboratory report documenting the work
Questions
Develop a program to compute the following:
– mean, median, mode, standard deviation, variance and percentiles for the given
dataset using the NumPy library
Presentation of Results
Sample Result:
a. Python code for mean calculation
b. Python code for median calculation
c. Python code for mode calculation
d. Standard deviation expresses how spread out the values are; a smaller value means most
of the numbers are close to the mean
e. Variance computation
f. A percentile describes the value below which a given percentage of the values fall
g. The five-number summary includes the minimum, first quartile, second quartile (median),
third quartile and maximum
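A minimal combined sketch of items (a) to (g), using only NumPy; the sample data values below are illustrative and not taken from the actual dataset:

import numpy as np

data = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

mean = np.mean(data)
median = np.median(data)
# NumPy has no mode function, so the mode is taken from value counts
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]
std_dev = np.std(data)                 # spread of the values around the mean
variance = np.var(data)                # square of the standard deviation
p75 = np.percentile(data, 75)          # value below which 75% of the data fall
# Five-number summary: min, Q1, Q2 (median), Q3, max
five_num = np.percentile(data, [0, 25, 50, 75, 100])

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Standard deviation:", std_dev)
print("Variance:", variance)
print("75th percentile:", p75)
print("Five-number summary:", five_num)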
Analysis and Discussions
Mean Values: The mean values for the columns give us an understanding of the average FSIQ,
VIQ, PIQ, Weight, Height, and MRI_Count across all samples. These values are crucial for
understanding the central tendency of the data.
Median Values: The median values are important for understanding the central point of the data
distribution, particularly in cases where the data might be skewed or have outliers.
Mode Values: The mode values indicate the most frequently occurring values in each column,
which can be insightful in identifying common characteristics in the data.
Standard Deviation: The standard deviation values provide insights into the spread or variability
of the data. A higher standard deviation means that the data points are spread out over a wider
range of values.
Variance: The variance values, which are the square of the standard deviation, further
emphasize the degree of spread in the data.
Percentiles: The 25th, 50th (median), and 75th percentiles give a detailed understanding of the
data distribution, showing how the data is spread around the mean and highlighting potential
outliers.
Conclusions
The mean values indicate the average metrics across the dataset.
The median values, being close to the mean in most cases, suggest a relatively symmetrical
distribution for most columns.
The mode values, while not as critical for continuous data, can still offer insights into
common values.
The standard deviation and variance values suggest the degree of variability within the
dataset, with some columns potentially having more spread than others.
The percentiles give a deeper understanding of the data spread and are particularly useful in
understanding the distribution tails.
Comments
1. Limitations of Experiments
Sample Size: If the sample size is small, the results may not be generalizable to a larger
population.
Data Quality: Errors in data collection or entry (e.g., non-numeric values in numeric columns)
can affect the analysis.
Bias: There might be inherent biases in the sample that could skew the results.
2. Limitations of Results
Outliers: Presence of outliers can significantly affect mean and standard deviation.
Non-Normal Distribution: If the data is not normally distributed, certain statistical measures
(e.g., mean and standard deviation) might not be the best descriptors.
Missing Values: Handling of NaN values can impact the accuracy of statistical measures.
3. Learning happened
Data Preprocessing: Converting data to numeric and handling errors is crucial for accurate
analysis.
Statistical Analysis: Understanding different statistical measures (mean, median, mode,
standard deviation, variance, percentiles) and their implications on the data.
Using Libraries: Utilizing libraries like NumPy and pandas for data manipulation and analysis.
4. Recommendations
Increase Sample Size: For more generalizable results, a larger and more diverse sample size
is recommended.
Data Cleaning: Ensure thorough data cleaning to handle missing or erroneous data.
Advanced Analysis: Consider additional statistical tests or machine learning models for
deeper insights.
Regular Updates: Regularly update the dataset to reflect any new data and re-evaluate the
statistical measures to ensure they remain relevant.
4. Questions
Develop a program to perform the following:
– Visualize the data in the dataset using various Matplotlib and Seaborn plots
5. Presentation of Results
Sample Result:
a. Matplotlib is a library for creating static, animated and interactive visualizations in Python
j. A scatter plot is used to understand the relationship between two features
k. A pair plot lets many variables be compared at once; the height parameter is used to change
the height of each subplot
l. A heatmap shows the correlations between numerical variables as a colour-coded matrix
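A minimal sketch of the plots discussed below, assuming the brain-size data is loaded into a pandas DataFrame with columns Gender, FSIQ, VIQ, PIQ, Weight, Height and MRI_Count; the file name and separator are assumptions and should be adjusted to the actual dataset:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Assumed file name and columns; adjust to the dataset used in the lab
df = pd.read_csv("brain_size.csv", sep=";", na_values=".")

# Bar chart of mean FSIQ per gender
sns.barplot(data=df, x="Gender", y="FSIQ")
plt.show()

# Histogram of Height
sns.histplot(data=df, x="Height", bins=10)
plt.show()

# Filled density plot of FSIQ
sns.kdeplot(data=df, x="FSIQ", fill=True)
plt.show()

# Box plot of FSIQ by gender
sns.boxplot(data=df, x="Gender", y="FSIQ")
plt.show()

# Scatter plot of FSIQ vs VIQ, coloured by gender
sns.scatterplot(data=df, x="FSIQ", y="VIQ", hue="Gender")
plt.show()

# Pair plot of all variables; height sets the size of each subplot
sns.pairplot(df, hue="Gender", height=2)
plt.show()

# Heatmap of correlations between the numeric columns
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.show()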
6. Analysis and Discussions
The bar chart shows the FSIQ scores distributed across genders. From the visualization, it is
observable if there are any significant differences in FSIQ scores between males and
females.
This bar chart includes an additional dimension (PIQ) and provides insights into how PIQ
varies within the gender groups. This can help to identify any relationships or patterns
between FSIQ, PIQ, and gender.
Histogram of Height:
The histogram illustrates the distribution of heights within the dataset. This can help to
understand the central tendency and variability of the height measurements.
The density plot offers a smoothed visualization of the FSIQ distribution, highlighting the
most common FSIQ values and the spread of the data.
This filled density plot further emphasizes the distribution of FSIQ, providing a clear visual of
areas with higher densities of data points.
The box plot shows the median, quartiles, and potential outliers in FSIQ scores for both
genders. This allows for a comparative analysis of the central tendency and variability
between males and females.
Scatter plot of FSIQ vs. VIQ with Gender hue:
The scatter plot examines the relationship between FSIQ and VIQ while differentiating the
data points by gender. This can reveal correlations and potential differences in cognitive
abilities between genders.
The pair plot provides a comprehensive view of the relationships between all variables,
including scatter plots for pairwise relationships and histograms for individual distributions.
The heatmap visualizes the correlation coefficients between numerical variables, showing
how strongly pairs of variables are related. This helps to identify any significant correlations
that might warrant further investigation.
7. Conclusions:
The analysis suggests that there are observable differences in FSIQ scores between genders.
PIQ scores vary within gender groups, indicating that gender may influence the relationship
between FSIQ and PIQ.
Height data shows a normal distribution, suggesting a typical variation in the dataset.
The density plots and box plots reveal the central tendencies and variabilities in FSIQ scores,
with potential outliers highlighted.
The scatter plot indicates a correlation between FSIQ and VIQ, with some differences based
on gender.
The pair plot provides a holistic view of relationships between all variables, supporting the
findings from individual plots.
The heatmap indicates significant correlations between several variables, suggesting potential
areas for further study.
8. Comments
1. Limitations of Experiments
Sample Size: The dataset may not be large enough to generalize the findings to a broader
population.
Data Quality: Missing or erroneous values in the dataset can affect the accuracy of the
analysis.
Variable Selection: The dataset may not include all relevant variables that could influence
FSIQ, PIQ, and other measurements.
2. Limitations of Results
Causation vs. Correlation: The analysis identifies correlations but cannot establish causation
between variables.
Gender Differences: The observed differences between genders may be influenced by
factors not accounted for in the dataset.
External Validity: The results may not be applicable to populations or contexts outside the
scope of the dataset.
3. Learning happened
How to handle and clean data with missing or erroneous values.
Visualization techniques using Matplotlib and Seaborn to uncover patterns and relationships
in the data.
Interpreting and discussing results from various types of plots and correlation matrices.
The importance of considering limitations and potential biases in data analysis.
4. Recommendations
Data Collection: Ensure a larger and more diverse sample size to improve the generalizability
of the findings.
Additional Variables: Include more variables that could influence cognitive abilities and other
measured traits.
Longitudinal Studies: Conduct studies over time to observe changes and causal relationships.
Further Research: Investigate the underlying factors contributing to the observed differences
between genders and other variables.
c) Tic-tac-toe:
START
Step 1: Initialize the game board with numbers representing positions.
Step 2: Define winning combinations of board positions.
Step 3: Define functions for drawing the board, player 1's turn (placing "X"), player 2's turn
(placing "O"), and choosing a valid board position.
Step 4: Define a function to check if there is a winner or if the game ends in a tie.
Step 5: Implement a loop to handle the game flow:
a. Draw the current board.
b. Check if there is a winner or if the game ends in a tie.
c. If the game is over, print the result and end the game loop.
d. If the game continues, prompt player 1 to choose a position, update the board, and
repeat.
e. Repeat the process for player 2.
Step 6: After the game ends, prompt the user if they want to play again.
Step 7: If yes, reset the board and start over from step 5.
STOP
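A minimal sketch of the game loop described above; the function names and prompts are illustrative and not taken from the original program, and the play-again loop in Steps 6-7 is omitted for brevity:

board = [str(i) for i in range(1, 10)]          # positions 1-9
wins = [(0, 1, 2), (3, 4, 5), (6, 7, 8),        # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),        # columns
        (0, 4, 8), (2, 4, 6)]                   # diagonals

def draw_board():
    for r in range(0, 9, 3):
        print(" | ".join(board[r:r + 3]))

def winner():
    # Return "X" or "O" if a winning combination is filled, else None
    for a, b, c in wins:
        if board[a] == board[b] == board[c]:
            return board[a]
    return None

def choose_position(mark):
    # Prompt the player until a free position on the board is chosen
    while True:
        pos = int(input(f"Player {mark}, choose a position (1-9): ")) - 1
        if 0 <= pos <= 8 and board[pos] not in ("X", "O"):
            board[pos] = mark
            return
        print("Invalid position, try again.")

turn = "X"
while True:
    draw_board()
    if winner():
        print(f"Player {winner()} wins!")
        break
    if all(cell in ("X", "O") for cell in board):
        print("It's a tie!")
        break
    choose_position(turn)
    turn = "O" if turn == "X" else "X"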
3. Presentation of Results
Sample Result:
a) 4-Queens:
b) 8-Puzzle:
4. Analysis and Discussions
Tic-Tac-Toe
8 Puzzle
Explain various algorithms used to solve the puzzle (e.g., A* search, BFS).
Discuss the impact of puzzle complexity on solving algorithms.
Analyze the effectiveness of different heuristic functions.
4-Queens Problem
5. Conclusions
Tic-Tac-Toe:
Conclusion: Minimax algorithm with alpha-beta pruning is effective for optimal gameplay,
balancing computational cost and decision-making.
Explanation: Despite its complexity, minimax ensures optimal play and can be enhanced
with heuristic evaluation functions for faster decision-making.
8-Puzzle:
4-Queens Problem:
Conclusion: Backtracking algorithms are suitable for solving the 4-Queens problem due to
their ability to systematically explore all potential solutions.
Explanation: Despite exponential time complexity in the worst-case scenario, optimizations
like constraint propagation can enhance performance on larger boards.
6. Comments
1. Limitations of Experiments
Tic-Tac-Toe: Limited scope in simulating human-like gameplay behaviors and psychological
elements.
8-Puzzle: Sensitivity to heuristic choice; some heuristics may lead to suboptimal solutions.
4-Queens Problem: Difficulty in scaling to larger board sizes due to exponential growth in
search space.
2. Limitations of Results
Tic-Tac-Toe: Results heavily influenced by initial conditions and opponent strategies.
8-Puzzle: Complexity increases with puzzle size, affecting algorithm performance and
solution optimality.
4-Queens Problem: Backtracking may struggle with larger board sizes due to increased
branching factor and computational demands.
3. Learning happened
Tic-Tac-Toe: Understanding the importance of balancing exploration and exploitation in
game theory.
8-Puzzle: Appreciation for heuristic design and its impact on algorithm efficiency.
4-Queens Problem: Insight into constraint satisfaction and backtracking algorithms in
combinatorial problems.
4. Recommendations
Tic-Tac-Toe: Explore machine learning techniques for adaptive gameplay strategies.
8-Puzzle: Experiment with advanced heuristic functions to further optimize A* performance.
4-Queens Problem: Investigate parallel and distributed computing approaches to handle
larger problem instances efficiently.
3. Experimental Procedure
1. Write an algorithm to solve the given problem
2. Translate the algorithm into Python code
3. Execute the code
4. Create a laboratory report documenting the work
4. Questions
Develop a program to implement the following uninformed search strategies in Python
– Breadth First Search
– Depth First Search
– Uniform Cost Search
– Depth limited Search
5. Algorithms
5.1: Algorithm for Breadth First Search:
START
Step 1: Initialize an empty set visited to keep track of visited nodes.
Step 2: Initialize a queue and enqueue the start node. Mark the start node as visited.
Step 3: While the queue is not empty, do the following:
Step 3.1: Dequeue a node from the queue and call it current_node.
Step 3.2: Process the current_node (e.g., print it or add it to the result path).
Step 3.3: For each neighbour of the current_node that has not been visited, do the
following:
Step 3.3.1: Mark the neighbour as visited.
Step 3.3.2: Enqueue the neighbour.
STOP
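A minimal sketch of the BFS steps above; the example graph is illustrative:

from collections import deque

def bfs(graph, start):
    visited = {start}                     # Steps 1-2: visited set with the start node
    queue = deque([start])                # Step 2: enqueue the start node
    order = []
    while queue:                          # Step 3
        current = queue.popleft()         # Step 3.1
        order.append(current)             # Step 3.2: process the node
        for neighbour in graph[current]:  # Step 3.3
            if neighbour not in visited:
                visited.add(neighbour)    # Step 3.3.1
                queue.append(neighbour)   # Step 3.3.2
    return order

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
         "D": [], "E": ["F"], "F": []}
print(bfs(graph, "A"))   # ['A', 'B', 'C', 'D', 'E', 'F']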
5.2: Algorithm for Depth First Search:
START
Step 1: Initialize an empty set visited to keep track of visited nodes.
Step 2: Define a recursive function dfs(node, visited):
Step 2.1: Mark the node as visited.
Step 2.2: Process the node (e.g., print it or add it to the result path).
Step 2.3: For each neighbor of the node that has not been visited, call dfs(neighbor, visited).
Step 3: Call dfs(start_node, visited) to start the DFS from the start node.
STOP
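A minimal sketch of the recursive DFS steps above, reusing the same illustrative graph as the BFS sketch:

def dfs(graph, node, visited=None, order=None):
    if visited is None:                    # Step 1: start with an empty visited set
        visited, order = set(), []
    visited.add(node)                      # Step 2.1: mark the node as visited
    order.append(node)                     # Step 2.2: process the node
    for neighbour in graph[node]:          # Step 2.3: recurse into unvisited neighbours
        if neighbour not in visited:
            dfs(graph, neighbour, visited, order)
    return order

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
         "D": [], "E": ["F"], "F": []}
print(dfs(graph, "A"))   # Step 3: ['A', 'B', 'D', 'E', 'F', 'C']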
Breadth First Search:
Performance: BFS explores all nodes at the present depth level before moving on to nodes at
the next depth level. It guarantees the shortest path in an unweighted graph.
Time Complexity: O(V+E) where V is the number of vertices and E is the number of edges.
Space Complexity: O(V) due to the queue and visited set.
Usage: Suitable for finding the shortest path in an unweighted graph, level-order traversal of
trees, and in scenarios where the shortest path is required.
Depth First Search:
Performance: DFS explores as far down one branch as possible before backtracking. It can be
implemented using recursion or a stack.
Time Complexity: O(V+E)
Space Complexity: O(V) due to the recursion stack.
Usage: Useful for tasks such as topological sorting, detecting cycles in a graph, and solving
puzzles like mazes.
Uniform Cost Search:
Performance: UCS expands the least-cost node first and guarantees finding the least-cost
path in weighted graphs.
Time Complexity: O(E+VlogV) where the priority queue operations dominate the complexity.
Space Complexity: O(V) due to the priority queue and visited set.
Usage: Suitable for finding the shortest path in weighted graphs, especially when edge
weights are not uniform.
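Since no step listing is given above for Uniform Cost Search or Depth Limited Search, the following is a minimal sketch of both in their usual formulations; the example graphs are illustrative:

import heapq

def ucs(graph, start, goal):
    # Expand the least-cost node first using a priority queue
    frontier = [(0, start, [start])]          # (path cost, node, path so far)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph[node]:
            if neighbour not in visited:
                heapq.heappush(frontier, (cost + weight, neighbour, path + [neighbour]))
    return None

def dls(graph, node, goal, limit, path=None):
    # Depth Limited Search: DFS that stops expanding once the depth limit is reached
    if path is None:
        path = [node]
    if node == goal:
        return path
    if limit <= 0:
        return None
    for neighbour in graph[node]:
        result = dls(graph, neighbour, goal, limit - 1, path + [neighbour])
        if result is not None:
            return result
    return None

weighted = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)],
            "C": [("D", 1)], "D": []}
unweighted = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
              "D": [], "E": ["F"], "F": []}
print(ucs(weighted, "A", "D"))          # (4, ['A', 'B', 'C', 'D'])
print(dls(unweighted, "A", "F", 2))     # ['A', 'C', 'F']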
8. Conclusions
BFS: Effective for unweighted graphs to find the shortest path. Inefficient for deep
graphs due to high memory usage.
DFS: Suitable for tasks that require exploring all possibilities or finding paths in deep
graphs. May get stuck in deep or infinite loops.
UCS: Ideal for finding the least-cost path in weighted graphs. More computationally
intensive due to priority queue operations.
DLS: Useful for limiting search depth to prevent infinite loops. Effective in controlled
environments where the maximum depth is known.
9. Comments
1. Limitations of Experiments:
BFS: High memory usage for large or deep graphs.
DFS: Risk of infinite recursion in graphs with cycles if not handled properly.
UCS: Higher computational overhead due to priority queue management.
DLS: Requires prior knowledge of an appropriate depth limit, which may not always be
possible.
2. Limitations of Results
BFS: Not suitable for weighted graphs where path cost is a concern.
DFS: Does not guarantee the shortest path.
UCS: May be slow for large graphs with many nodes and edges.
DLS: Effectiveness heavily depends on the chosen depth limit; too shallow may miss the goal,
too deep may cause inefficiency.
3. Learning happened
Understanding the trade-offs between different graph traversal algorithms.
Recognizing the importance of choosing the right algorithm based on the graph properties
and the specific problem requirements.
Developing strategies to handle cycles and infinite paths in graph traversal.
4. Recommendations
Use BFS for unweighted graphs or when the shortest path in terms of edge count is
required.
Apply DFS for problems requiring exhaustive search or involving deep recursion, such as
topological sorting or detecting cycles.
Opt for UCS when dealing with weighted graphs and needing the least-cost path.
Consider DLS for scenarios with known depth constraints or to prevent infinite loops in cyclic
graphs.
3. Experimental Procedure
1. Write an algorithm to solve the given problem
2. Translate the algorithm into Python code
3. Execute the code
4. Create a laboratory report documenting the work
4. Questions
Develop a program to implement the following informed search strategies in Python
– Best First Search
– A* Algorithm
A*:
Performance: A* Search expands nodes based on both the cost to reach the node (g-cost)
and an estimate of the cost to reach the goal (h-cost). It guarantees finding the shortest path
under certain conditions.
Time Complexity: Depends on the heuristic used; generally O(E + V log V) due to priority
queue operations, where V is the number of vertices and E is the number of edges.
Space Complexity: O(V) due to the priority queue and visited set.
Usage: Ideal for finding the shortest path in weighted graphs when an admissible heuristic is
provided.
Best-First Search:
Performance: Best FS expands nodes based only on the heuristic estimate (h-cost) without
considering the actual cost (g-cost) to reach the node.
Time Complexity: O(E + V log V) in the worst case due to priority queue operations.
Space Complexity: O(V) due to the priority queue and visited set.
Usage: Useful when only an estimate of the path cost is available or when exploring based
on heuristic knowledge.
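A minimal sketch contrasting the two strategies on the same illustrative weighted graph; the graph and the heuristic values h are assumptions, chosen so that h is admissible:

import heapq

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)],
         "C": [("D", 1)], "D": []}
h = {"A": 3, "B": 2, "C": 1, "D": 0}    # assumed admissible heuristic estimates

def a_star(start, goal):
    # Priority is f = g + h: actual cost so far plus the estimated cost to the goal
    frontier = [(h[start], 0, start, [start])]
    visited = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph[node]:
            if nxt not in visited:
                heapq.heappush(frontier, (g + w + h[nxt], g + w, nxt, path + [nxt]))
    return None

def best_first(start, goal):
    # Priority is h only: the heuristic estimate, ignoring the cost already paid
    frontier = [(h[start], start, [start])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, _w in graph[node]:
            if nxt not in visited:
                heapq.heappush(frontier, (h[nxt], nxt, path + [nxt]))
    return None

print(a_star("A", "D"))      # (4, ['A', 'B', 'C', 'D']) -- the least-cost path
print(best_first("A", "D"))  # ['A', 'C', 'D'] -- found quickly, but its cost is 5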
8. Conclusions
A*: Effective for finding the shortest path in weighted graphs using an admissible heuristic.
Provides optimal solutions when the heuristic is consistent.
Best-First Search (Best FS): Provides a heuristic-driven approach to explore paths, but does
not guarantee optimality or completeness in all cases.
9. Comments
1. Limitations of Experiments
A*: Relies heavily on the quality of the heuristic function provided. An inaccurate or non-admissible
heuristic can lead to suboptimal paths.
Best-First Search: Similar to A* Search, the effectiveness heavily depends on the heuristic's accuracy
and admissibility.
2. Limitations of Results
A*: While generally optimal with an admissible heuristic, can still be computationally
expensive for large graphs due to priority queue operations.
Best-First Search: Does not guarantee finding the optimal solution or even a feasible solution
in all cases, as it may get stuck in local optima.
3. Learning happened
Understanding the role and importance of heuristics in informed search algorithms like A*
and Best FS.
Recognizing the trade-offs between optimality, completeness, and computational efficiency
in pathfinding algorithms.
Gaining insights into designing and evaluating heuristic functions for different types of
problems.
4. Recommendations
Use A* Search when the shortest path in terms of edge cost is crucial and when an
admissible heuristic can be defined.
Consider Best-First Search for heuristic-guided explorations where the exact path cost is less
critical, but heuristic knowledge is available.
Ensure the heuristic used is both accurate and admissible to maximize the effectiveness of
these algorithms.
Algorithms
b) Alpha-beta-pruning:
START
Step 1: Initialize constants MAX and MIN with large positive and negative values,
respectively.
Step 2: Define the Minimax function with parameters: current depth, node index,
maximizing player flag, values array, alpha (best value for maximizing player), and
beta (best value for minimizing player).
Step 3: If the depth equals 3 (base case), return the value of the node.
Step 4: If it's the turn for the maximizing player:
- Initialize best to MIN.
- Iterate through the two child nodes:
- Recursively call Minimax for each child node, toggling to minimizing player's
turn.
- Update best to the maximum of current best and the value from the child
node.
- Update alpha to the maximum of alpha and best.
- If beta is less than or equal to alpha, break out of the loop (pruning).
- Return best.
Step 5: Otherwise (minimizing player's turn):
- Initialize best to MAX.
- Iterate through the two child nodes:
- Recursively call Minimax for each child node, toggling to maximizing player's
turn.
- Update best to the minimum of current best and the value from the child node.
- Update beta to the minimum of beta and best.
- If beta is less than or equal to alpha, break out of the loop (pruning).
- Return best.
Step 6: In the main driver code:
- Define an example values array.
- Print the optimal value found using Minimax starting from the root node with
initial alpha and beta values.
STOP
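A minimal sketch of the steps above for a depth-3 binary game tree; the values array is an illustrative example for the driver code:

MAX, MIN = float("inf"), float("-inf")

def minimax(depth, node_index, maximizing, values, alpha, beta):
    if depth == 3:                          # Step 3: leaf of the depth-3 tree
        return values[node_index]
    if maximizing:                          # Step 4: maximizing player's turn
        best = MIN
        for i in range(2):                  # two children per node
            val = minimax(depth + 1, node_index * 2 + i, False, values, alpha, beta)
            best = max(best, val)
            alpha = max(alpha, best)
            if beta <= alpha:               # prune the remaining children
                break
        return best
    else:                                   # Step 5: minimizing player's turn
        best = MAX
        for i in range(2):
            val = minimax(depth + 1, node_index * 2 + i, True, values, alpha, beta)
            best = min(best, val)
            beta = min(beta, best)
            if beta <= alpha:               # prune the remaining children
                break
        return best

values = [3, 5, 6, 9, 1, 2, 0, -1]          # Step 6: eight leaves of the example tree
print("Optimal value:", minimax(0, 0, True, values, MIN, MAX))   # prints 5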
c) Min-Max:
START
Step 1: Define the Minimax function with parameters: current depth, node index,
maximizing turn flag, scores array, and target depth.
Step 2: If current depth equals target depth, return the score of the node.
Step 3: If it's the turn for the maximizing player:
- Return the maximum value of recursively calling Minimax for the left and right
child nodes.
Step 4: Otherwise (minimizing player's turn):
- Return the minimum value of recursively calling Minimax for the left and right
child nodes.
Step 5: Initialize scores array with game state values.
Step 6: Calculate the depth of the Minimax tree based on the length of the scores
array.
Step 7: Print the optimal value found using Minimax starting from the root node.
STOP
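A minimal sketch of the plain Minimax steps above; the scores array is an illustrative example:

import math

def minimax(depth, node_index, is_max, scores, target_depth):
    if depth == target_depth:               # Step 2: leaf node reached
        return scores[node_index]
    if is_max:                               # Step 3: maximizing player
        return max(minimax(depth + 1, node_index * 2, False, scores, target_depth),
                   minimax(depth + 1, node_index * 2 + 1, False, scores, target_depth))
    return min(minimax(depth + 1, node_index * 2, True, scores, target_depth),   # Step 4
               minimax(depth + 1, node_index * 2 + 1, True, scores, target_depth))

scores = [3, 5, 2, 9, 12, 5, 23, 23]        # Step 5: leaf scores of the game tree
depth = int(math.log2(len(scores)))         # Step 6: tree depth from the number of leaves
print("Optimal value:", minimax(0, 0, True, scores, depth))   # Step 7: prints 12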
Presentation of Results
a) Alpha-beta pruning:
b) Min-Max:
Minimax Algorithm
Performance: Minimax explores the entire game tree recursively, making decisions at each
level based on maximizing or minimizing players. Without pruning, it evaluates all possible
moves.
Time Complexity: O(b^d), where b is the branching factor and d is the depth of the game
tree. Pruning with alpha-beta reduces this significantly in practice.
Space Complexity: O(d) due to the recursive stack depth, where d is the depth of the game
tree.
Usage: Effective for deterministic games with perfect information, where all possible
outcomes can be explored.
Alpha-Beta Pruning
Performance: Alpha-Beta Pruning optimizes Minimax by pruning branches of the game tree
that cannot influence the final decision, significantly reducing the number of nodes
evaluated.
Time Complexity: O(b^(d/2)) in the best case, when move ordering allows maximal pruning.
However, the worst-case complexity remains O(b^d).
Space Complexity: O(d) due to the recursive stack depth, similar to Minimax.
Usage: Essential for improving the efficiency of Minimax in larger game trees, enabling
deeper search within the same computational limits.
Conclusions
Minimax Algorithm:
Alpha-Beta Pruning:
Enhances Minimax by reducing the number of nodes evaluated, allowing deeper exploration
of game trees within practical time limits.
Provides significant performance gains in scenarios where the game tree is large.
Comments
1. Limitations of Experiments
Minimax Algorithm: Faces scalability issues in large game trees due to its exhaustive nature.
Alpha-Beta Pruning: While it improves efficiency, its effectiveness depends on the order of
node evaluation and may not always achieve maximum possible pruning.
2. Limitations of Results
Minimax Algorithm: Does not adapt well to games with uncertain outcomes or imperfect
information.
Alpha-Beta Pruning: May not achieve optimal pruning in all cases, especially with poorly
ordered node evaluations.
3. Learning happened
Understanding the trade-offs between completeness (Minimax) and efficiency (Alpha-Beta
Pruning) in game tree search algorithms.
Implementing strategies to optimize decision-making processes in deterministic game
scenarios.
Gaining insights into the impact of algorithmic improvements on computational feasibility and
performance.
4. Recommendations
Minimax Algorithm: Use in scenarios where game complexity allows exhaustive search or as
a baseline for evaluating more advanced algorithms.
Alpha-Beta Pruning: Implement for enhancing Minimax in larger game trees to achieve deeper
search within practical time constraints.
Experimental Procedure
1. Write an algorithm to solve the given problem
2. Translate the algorithm into Python code
3. Execute the code
4. Create a laboratory report documenting the work
Questions
Perform the following:
Sample Programs:
– To retrieve a particular column from the dataset
– Record filter
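A minimal sketch of the sample programs above, assuming the small dataset with columns ID, Name, Age, Gender and Salary described in the analysis; all names, values and the file name are illustrative:

import pandas as pd

data = {"ID": [1, 2, 3, 4, 5],
        "Name": ["Asha", "Bharat", "Chitra", "Deepak", "Esha"],
        "Age": [24, 31, 28, 45, 36],
        "Gender": ["F", "M", "F", "M", "F"],
        "Salary": [30000, 45000, 38000, 60000, 52000]}

df = pd.DataFrame(data)
df.to_csv("employees.csv", index=False)   # save the dataset as a CSV file
df = pd.read_csv("employees.csv")         # load it back with Pandas

print(df["Salary"])                       # retrieve a particular column
print(df[df["Age"] > 30])                 # record filter: rows where Age > 30
print(df.iloc[1:4])                       # slicing a range of rows
print(df.drop(columns=["ID"]))            # drop a column
print(df.describe())                      # summary statistics
print(df.shape)                           # dimensions of the dataset
print(df.T)                               # transpose the dataset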
Presentation of Results
Analysis and Discussions
In this analysis, a sample dataset consisting of five entries was created with columns representing ID,
Name, Age, Gender, and Salary. Using Pandas, various operations were performed to understand
and manipulate the dataset. Here are the key points from the analysis:
Conclusions
Data Manipulation: The use of Pandas makes data manipulation straightforward and
efficient. Various operations such as slicing, dropping, and filtering can be easily performed.
Data Summary: Descriptive statistics provide valuable insights into the dataset, helping to
understand the distribution and central tendency of numerical features.
Visualization of Data: Although not included in this specific code, visualizations complement
the analysis and provide a clearer understanding of the data.
Comments
1. Limitations of Experiments
Sample Size: The dataset consists of only five entries, which is too small to draw meaningful
conclusions or perform advanced statistical analysis.
Synthetic Data: The dataset is artificially created and may not accurately reflect real-world
scenarios.
2. Limitations of Results
Overfitting: Conclusions based on this dataset may not be generalizable to larger datasets
due to the limited sample size.
Lack of Variability: The dataset lacks variability, which limits the scope of analysis and the
ability to identify patterns or trends.
3. Learning happened
Pandas Operations: Learned how to create a dataset, save it as a CSV file, and load it back
using Pandas.
Data Manipulation: Gained skills in slicing, dropping rows/columns, retrieving specific data,
and filtering records.
Descriptive Statistics: Understood how to obtain summary statistics and the dimensions of
the dataset.
Data Transposition: Learned to transpose the dataset for a different perspective on the data.
4. Recommendations
Expand Dataset: Collect more data to enhance the reliability and validity of the analysis.
Real-World Data: Use real-world datasets to perform the analysis for more accurate and
meaningful results.
Include More Features: Adding more features such as job title, department, and years of
experience could provide deeper insights.
Advanced Analysis: Explore advanced data analysis techniques and machine learning models
to uncover hidden patterns and make predictions.