OJT-Field Report - Research Project Format 2025
Signature
Name: Om Nitin Borkar
2. Project Overview
Title: Business Analytics and Data Analytics
https://www.ybifoundation.org
Objectives:
Data Collection & Cleaning – Learn how to gather, preprocess, and clean large
datasets for analysis.
Exploratory Data Analysis (EDA) – Use statistical techniques to uncover
patterns, trends, and anomalies.
Predictive Modeling – Apply machine learning algorithms to make data-driven
predictions.
Data Visualization – Create meaningful visualizations to communicate findings
effectively.
Understanding Business Problems – Learn to identify business challenges and
translate them into analytical problems.
Data-Driven Decision Making – Use analytics to support strategic decision-
making processes.
Market & Customer Analysis – Analyze customer behavior, sales trends, and
market conditions.
Reporting & Visualization – Develop dashboards and reports using tools like Excel,
Tableau, or Power BI.
Theoretical framework:
Data Science: The interdisciplinary field that combines techniques from
statistics, computer science, and domain expertise to extract insights
from data.
On-the-Job Training (OJT): Practical learning within a workplace environment
Data: Raw information collected for analysis.
Variables: Characteristics or attributes measured in data.
Google Colab: Google Colab is a cloud-based Jupyter Notebook environment that
allows you to write and execute Python code with free GPU/TPU support, ideal for
machine learning and data science.
Pandas: Pandas is a powerful Python library for data manipulation and analysis,
providing easy-to-use data structures like DataFrames and Series (a minimal usage
sketch follows this list).
Models: Mathematical representations of data patterns.
Power BI: A Microsoft business intelligence tool for analyzing and visualizing raw
data to present actionable information.
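As a minimal sketch of the Pandas structures mentioned above (the values and column
names here are invented purely for illustration, not taken from the project data):

import pandas as pd

# A Series is a one-dimensional labelled array.
scores = pd.Series([200, 230, 210], index=["John", "Taha", "Smith"])

# A DataFrame is a two-dimensional table of labelled columns.
df = pd.DataFrame({
    "Name": ["John", "Taha", "Smith"],
    "Score": [200, 230, 210],
})

print(scores)
print(df.describe())  # quick summary statistics for the numeric column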
Scope of Work:
The scope of work for data analysis includes data collection, cleaning, exploration,
visualization, statistical analysis, predictive modeling, and deriving actionable
business insights.
3. Methodology:
Data Structures: I delved into lists, dictionaries, tuples, and sets.
Understanding these structures is crucial for efficient data manipulation.
Pandas Library: Pandas became my go-to library for data manipulation and
analysis. I learned how to load, clean, transform, and analyze data using
Pandas DataFrames.
Matplotlib Library: Visualizing data is essential. Matplotlib allowed me to
create insightful plots, charts, and graphs.
SQL Server Queries: Structured Query Language (SQL) is the backbone of
database management. I explored SQL Server and mastered the art of
querying databases (a runnable sketch follows the key areas below).
Key areas included:
SELECT Statements: Retrieving data from tables.
JOIN Operations: Combining data from multiple tables.
Aggregation Functions: Calculating sums, averages, and other aggregate
values.
Subqueries: Navigating complex data relationships.
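The report does not reproduce the actual SQL Server queries, so the sketch below
illustrates the key areas using Python's built-in sqlite3 module; the students and
scores tables, their columns, and the inserted rows are all assumptions made for
illustration.

import sqlite3

# In-memory database with two hypothetical tables (assumed schema).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE scores (student_id INTEGER, subject TEXT, score INTEGER)")
cur.executemany("INSERT INTO students VALUES (?, ?)",
                [(1, "John"), (2, "Taha"), (3, "Smith")])
cur.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                [(1, "Maths", 200), (2, "Maths", 230), (3, "Science", 210)])

# SELECT with a JOIN: combine names with their scores.
cur.execute("""
    SELECT s.name, sc.subject, sc.score
    FROM students s
    JOIN scores sc ON sc.student_id = s.id
""")
print(cur.fetchall())

# Aggregation: average score per subject.
cur.execute("SELECT subject, AVG(score) FROM scores GROUP BY subject")
print(cur.fetchall())

# Subquery: students who scored above the overall average.
cur.execute("""
    SELECT name FROM students
    WHERE id IN (SELECT student_id FROM scores
                 WHERE score > (SELECT AVG(score) FROM scores))
""")
print(cur.fetchall())

conn.close()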
4. Process:
Advanced Excel Functions:
Master advanced Excel functions such as VLOOKUP, INDEX-MATCH, and pivot tables.
Learn to manipulate data efficiently and create dynamic reports (a rough Pandas analogue is sketched below).
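Excel itself is not scripted in this report; as a rough Pandas analogue (an assumption,
not part of the training material), merge() plays the role of VLOOKUP/INDEX-MATCH and
pivot_table() the role of a pivot table. The column names and values below are invented
for illustration.

import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "product": ["A", "B", "A"],
                       "amount": [100, 250, 175]})
products = pd.DataFrame({"product": ["A", "B"],
                         "category": ["Retail", "Wholesale"]})

# VLOOKUP / INDEX-MATCH analogue: look up each order's category.
enriched = orders.merge(products, on="product", how="left")

# Pivot-table analogue: total amount per category.
summary = enriched.pivot_table(values="amount", index="category", aggfunc="sum")
print(summary)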
Python for Data Science:
Familiarize yourself with Python, a powerful tool for data science.
Study data structures (lists, dictionaries, tuples, and sets); a brief sketch follows this sub-section.
Dive into the Pandas library for data manipulation and analysis using DataFrames.
Explore the Matplotlib library for creating insightful visualizations.
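A brief sketch of the four core Python data structures listed above; the example values
are illustrative only.

# List: ordered, mutable sequence.
subjects = ["Maths", "Science", "English"]

# Dictionary: key-value mapping, e.g. name -> score.
test_results = {"John": 40, "Taha": 70}

# Tuple: ordered, immutable sequence, useful for fixed records.
record = ("John", 40)

# Set: unordered collection of unique items.
unique_subjects = set(subjects)

print(subjects, test_results, record, unique_subjects)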
SQL Server Queries:
Understand the basics of SQL (Structured Query Language).
Learn to write SELECT statements to retrieve data from tables.
Explore JOIN operations for combining data from multiple tables.
Practice aggregation functions (e.g., SUM, AVG) and subqueries.
Apply Your Learnings:
Work on real-world projects to apply your knowledge.
Analyze data, draw insights, and optimize strategies.
● Observation 1:
We have a dictionary called test_results containing names as keys and their
corresponding scores as values. The goal is to iterate over the scores in the
dictionary and print the names of people who received less than 45 points. The
logic proceeds as follows (a minimal sketch is given after the steps):
1. Initialize the dictionary with the provided data.
2. Iterate over each key-value pair in the dictionary.
3. Check if the score (value) is less than 45.
4. If the score is less than 45, print the associated name (key).
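The notebook code itself is not reproduced in the report, so the following is a minimal
sketch of the logic described above; the scores in test_results are illustrative values
chosen so that Joe and John fall below 45, matching Finding 1.

# Illustrative data: the actual values are not given in the report.
test_results = {"John": 40, "Taha": 68, "Smith": 55, "Joe": 42, "Mary": 90}

# Print the names of people who received less than 45 points.
for name, score in test_results.items():
    if score < 45:
        print(name)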
● Observation 2:
The code provided is a Python script that uses Matplotlib to plot the scores of the
three score series (Scores1, Scores2, and Scores3) for each student. The script is
organized into the following steps (a reconstruction is sketched after this list):
a. Importing Libraries
b. Creating the Plot
c. Data Preparation
d. Plotting the Data
e. Customizing the Plot
f. Saving and Displaying the Plot
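The original plotting script is not included in the report; the sketch below
reconstructs it from the data table in Section 6 and the styling described in Finding 2
(red circles with dashed lines, magenta triangles with dash-dot lines, green diamonds
with solid lines). The student names come from Finding 2, and the output file name is
an assumption.

import matplotlib.pyplot as plt

# Data from the table in Section 6 and the names in Finding 2.
names = ["John", "Taha", "Smith", "Joe", "Mary"]
scores1 = [200, 230, 210, 220, 205]
scores2 = [180, 200, 110, 222, 190]
scores3 = [182, 205, 190, 220, 170]

# Plot each score series with its own colour, marker, and line style.
plt.plot(names, scores1, "ro--", label="Scores1")  # red circles, dashed
plt.plot(names, scores2, "m^-.", label="Scores2")  # magenta triangles, dash-dot
plt.plot(names, scores3, "gD-", label="Scores3")   # green diamonds, solid

# Customize the plot: grid, axis labels, legend.
plt.grid(True)
plt.xlabel("Students Name")
plt.ylabel("Scores")
plt.legend()

# Save and display the plot (the file name is assumed).
plt.savefig("student_scores.png")
plt.show()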
2. Specific Findings:
● Finding 1
When you run this code, it outputs the following names:
Joe
John
These are the individuals who received scores of less than 45 points based on the given
dictionary.
● Finding 2
1. Visualization of Student Scores:
The plot displays the scores of three student groups (Scores1, Scores2, and Scores3) based
on the given data.
Scores1: Red circles with dashed lines.
Scores2: Magenta triangles with dash-dot lines.
Scores3: Green diamonds with solid lines.
2. Customizations:
The plot includes grid lines for better readability.
The x-axis label is "Students Name," and the y-axis label is "Scores."
A legend is added to identify the three student groups.
3. Data Points:
John scored around 200.
Taha scored around 230.
Smith scored around 210.
Joe scored around 220.
Mary scored around 205.
6. Data Analysis
1. Data Tables
Parameter   Value                        Description
Scores1     [200, 230, 210, 220, 205]    The resulting plot shows the scores of the three student groups (Scores1, Scores2, and Scores3) for the given names.
Scores2     [180, 200, 110, 222, 190]    Each score series is drawn with its own line style and marker.
Scores3     [182, 205, 190, 220, 170]    Each student is represented by their name on the x-axis.
2. Graphs and Charts
● Graph/Chart 1: Output of the dictionary script (names with scores below 45)
Joe
John
These are the individuals who received scores of less than 45 points based on the given
dictionary.
● Graph/Chart 2: Matplotlib plot of the three score series
The Python script uses Matplotlib to plot the scores of the three score series
(Scores1, Scores2, and Scores3) for each student, as described in Observation 2.
8. Conclusions:
Advanced Excel Functions:
Excel functions play a crucial role in data cleansing, financial modeling, and dashboard
creation.
They enhance data analysis and reporting capabilities.
Python (Pandas and Matplotlib):
Pandas and Matplotlib are powerful tools for data wrangling, feature engineering, and
visualization. Python enables efficient data manipulation and exploration.
SQL Server Queries:
SQL queries are essential for data retrieval, transformation, and creating views.
Stored procedures simplify complex data processing.