INFORMATICS PRACTICES PROJECT
ON
IPL ANALYSIS
Submitted by
Aethen Paul Mathew
Class XII C
INDEX
• DECLARATION
• ACKNOWLEDGMENT
• HEADER FILES USED
• INTRODUCTION ABOUT PYTHON
• INTRODUCTION ABOUT MYSQL
• SOFTWARE AND HARDWARE REQUIREMENTS
• WORKING DESCRIPTION
• DATA COLLECTION
• DATA VISUALIZATION
• SOURCE CODE
• OUTPUT
• CONCLUSION
• BIBLIOGRAPHY
DECLARATION
I declare that the project work entitled
"IPL ANALYSIS", submitted to the
Department of INFORMATICS
PRACTICES, ST. PHILOMENA'S PUBLIC
SCHOOL, ELANJI, was prepared by me. All
the code is the result of my personal
efforts.
Submitted by
Aethen Paul Mathew
Class XII C
ACKNOWLEDGMENT
First, I thank God for enabling me to complete this
project successfully. I would also like to thank my
Informatics Practices teacher, Mrs. Jasmine Jacob, whose
valuable guidance helped me refine the project and make it
a complete success. Her suggestions and instructions were
the major contributors towards the completion of this project.
I also express my gratitude to our senior principal, Rev. Dr. John
Erniakulathil, and our principal, Joju Joseph, for their
encouragement and for all the facilities provided for the
completion of this project.
Finally, I would like to thank my parents and friends, whose
valuable suggestions and guidance were helpful in the
various phases of the completion of this project.
HEADER FILES USED
• CSV Connectivity
INTRODUCTION ABOUT PYTHON
Python is a high-level, general-purpose,
open-source programming language. It
supports both object-oriented and
procedural programming, and it is an
extremely powerful language.
FEATURES OF PYTHON
• Python is a high-level, open-source, general-purpose
programming language.
• It is object-oriented, procedural and functional.
• It has libraries to support GUI development.
• It is extremely powerful and easy to learn.
• It is open source, so it is freely available to everyone.
• It runs on Windows, Linux and macOS.
• Python enables us to write clear, logical applications for small
and large tasks.
• It has high-level built-in data types: strings, lists, dictionaries, etc.
• It encourages us to write clear and well-structured code.
APPLICATIONS OF PYTHON
• Machine Learning
• Data Analysis
• Web Development
• Console based authentication
• 3D CAD Applications
INTRODUCTION ABOUT MYSQL
MySQL is an open-source and freely available
Relational Database Management System that uses
Structured Query Language (SQL). It provides excellent
features for creating, storing, maintaining and
accessing data stored in the form of databases and
their respective tables.
The MySQL database system works on a client-server
architecture. It consists of a MySQL server, which runs
on the machine containing the databases, and MySQL
clients, which connect to the server machine over a
network.
ADVANTAGES OF MYSQL
• Reliability and performance
• Modifiable
• Multi platform support
• Powerful Processing Capabilities
• Integrity
• Authorization
SOFTWARE AND HARDWARE REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Python 3.6.x or a higher version
• Pandas library preinstalled
• Matplotlib library preinstalled
• NumPy and Seaborn libraries preinstalled (both are imported in the source code)
HARDWARE REQUIREMENTS:
• A computer or a laptop with Windows 7 or above
as the operating system
• x86 64-bit CPU (Intel/AMD architecture)
• 4 GB RAM
• 5 GB free disk space
WORKING DESCRIPTION
INTRODUCTION ABOUT PROJECT
Cricket is one of the most popular games in India.
After the start of the IPL, Indian cricket standards
reached a new level, and many talented
players got a chance to prove themselves on a
platform where many international
cricketers play together. The IPL is one of the
leading cricket tournaments in the world.
The Indian Premier League (IPL) is a
professional Twenty20 cricket league
in India. It was initiated by the
Board of Control for Cricket in India (BCCI),
headquartered in Mumbai, and is supervised by BCCI
Vice President Rajeev Shukla, who serves as the
league's chairman and commissioner. The IPL
works on a franchise system based on the American
style of hiring players and transfers.
Cricket, as you can imagine, is rich
in data points. It is a battle between bat and
ball played across different formats and different
levels. Ball-by-ball analysis of matches can
reveal some surprising hidden insights, such as
batting partnerships and who the best batting
partner is.
THE MAIN OBJECTIVES OF THIS PROJECT ARE:
• To find the team that won by the maximum runs.
• To find the team that won by the maximum wickets.
• To find the team that won by the minimum runs.
• To find the team that won by the minimum wickets.
• To find the season that had the most matches.
• To find the most successful IPL team.
• To find the players who won Man of the Match the most times.
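The margin-based objectives above can be sketched on a small toy table. This is only an illustration: the column names winner, win_by_runs and win_by_wickets follow the ones used in the source code of this project, while the team names and numbers are made up and are not taken from the IPL data.

```python
import pandas as pd

# Toy results (made-up values, NOT real IPL data).
matches = pd.DataFrame({
    "winner": ["Team A", "Team B", "Team C", "Team D"],
    "win_by_runs": [146, 0, 1, 0],
    "win_by_wickets": [0, 10, 0, 3],
})

# Largest margins: idxmax() returns the row label of the biggest value.
max_runs_team = matches.loc[matches["win_by_runs"].idxmax(), "winner"]
max_wickets_team = matches.loc[matches["win_by_wickets"].idxmax(), "winner"]

# Smallest margins: drop the zeros first, because a 0 only means the
# match was decided the other way (by wickets instead of runs, or vice versa).
by_runs = matches[matches["win_by_runs"] >= 1]
by_wickets = matches[matches["win_by_wickets"] >= 1]
min_runs_team = by_runs.loc[by_runs["win_by_runs"].idxmin(), "winner"]
min_wickets_team = by_wickets.loc[by_wickets["win_by_wickets"].idxmin(), "winner"]

print(max_runs_team, max_wickets_team, min_runs_team, min_wickets_team)
```

Filtering out the zero margins before calling idxmin() is the important step; without it, the "minimum runs" answer would simply be the first match that was won by wickets.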
DATA COLLECTION
Data has been collected from
www.iplt20.com and www.cricsheet.org. The data
consists of ball-by-ball details for a total
of 696 matches from 2008-2018. Ball-by-ball
data gives in-depth detail of every
ball bowled in a particular over. A
ball could be a wide, a dead ball or a no-ball, or
the batsman could score a single, a double, a triple,
a four or a six off it. The dataset comes as two CSV
files. Matches.csv gives the details of each match:
venue, location, season, contesting
teams, toss winner and toss decision,
match result, whether the win was by runs or by wickets,
player of the match, all three
umpires, match winner, etc.
Deliveries.csv is the ball-by-ball data,
combining all the deliveries from
2008-18.
It has attributes such as Match_id,
bowling team, batting team, batsman,
bowler, non-striker, no-ball runs, penalty
runs, extra runs, over, total runs, etc.
Innings tells whether the first or the second
team was batting. Over gives the
current over number. Ball gives the
current ball number within the current over.
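How the two files relate to each other can be sketched as follows. This is a minimal illustration, assuming that the Match_id attribute of Deliveries.csv refers to the id column of Matches.csv; the tiny tables and every value in them are made up, and the exact column spellings (match_id, inning, batting_team, total_runs) are assumptions.

```python
import pandas as pd

# Toy stand-ins for Matches.csv and Deliveries.csv (all values are made up).
matches = pd.DataFrame({
    "id": [1, 2],
    "season": [2008, 2008],
    "winner": ["Team A", "Team B"],
})
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "inning": [1, 1, 2, 1, 2],
    "over": [1, 1, 1, 1, 1],
    "ball": [1, 2, 1, 1, 1],
    "batting_team": ["Team A", "Team A", "Team C", "Team B", "Team D"],
    "total_runs": [4, 1, 6, 0, 2],
})

# Add up the runs scored in each innings of each match.
innings_totals = deliveries.groupby(
    ["match_id", "inning", "batting_team"], as_index=False
)["total_runs"].sum()

# Attach the match-level details by matching deliveries.match_id to matches.id.
combined = innings_totals.merge(matches, left_on="match_id", right_on="id", how="left")
print(combined[["match_id", "season", "inning", "batting_team", "total_runs", "winner"]])
```

Grouping first and merging afterwards keeps the merged table small, since the ball-by-ball file has many rows for every row in the match file.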
DATA VISUALIZATION
The most important part of
data visualization and predictive analysis is
representing the data in the form of charts and
graphs, so that it can be understood visually.
The collected data is visualized to get a
better and clearer understanding of all
the parameters of the seasons, the teams,
all-rounders, batsmen and bowlers, so that
it will be helpful to team selectors,
captains and managers at the next auction.
Different packages are used to get
proper analysis and visualization for players
and teams.
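As a small illustration of turning counts into a chart, the sketch below draws a bar chart with pandas and Matplotlib. The player names are placeholders and the awards are made up; the Agg backend is chosen only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up "player of the match" awards, one entry per match.
awards = pd.Series(
    ["Player X", "Player Y", "Player X", "Player Z", "Player X", "Player Y"],
    name="player_of_match",
)

# Count the awards per player; value_counts() sorts from most to fewest.
counts = awards.value_counts()

# One bar per player, with the height equal to the number of awards.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(counts.index, counts.values)
ax.set_xlabel("Player")
ax.set_ylabel("Awards")
ax.set_title("Player of the match awards (toy data)")
fig.tight_layout()
```

In an interactive session, calling plt.show() at the end displays the chart.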
SOURCE CODE
import numpy as np # numerical computing
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt #visualization
import seaborn as sns #modern visualization
plt.rcParams['figure.figsize'] = (14, 8)
sns.set_style("darkgrid")
df = pd.read_csv(r"E:\ipl1.csv") # raw string keeps the backslash literal
print('--------------------------------------------')
print(' --------------------------------------------')
print(df.info())
print()
print(' --------------------------------------------')
print('--------------------------------------------')
print('Total Matches are::::',df['id'].max())
print()
print('-------------------------------------------- ')
print('--------------------------------------------')
print("How many seasons' data have we got in the dataset?")
print(df['season'].unique())
print()
print('--------------------------------------------')
print('--------------------------------------------')
print('Which Team had won by maximum runs?')
print(df.loc[df['win_by_runs'].idxmax()]) # idxmax() returns an index label, so use .loc
print()
print('--------------------------------------------')
print('--------------------------------------------')
print('Which Team had won by maximum wickets?')
print(df.loc[df['win_by_wickets'].idxmax()]['winner'])
print()
print('--------------------------------------------')
print('--------------------------------------------')
print('Which Team had won by (closest margin) minimum runs?')
# ge(1) drops matches with a 0 run margin (those were won by wickets)
print(df.loc[df[df['win_by_runs'].ge(1)].win_by_runs.idxmin()]['winner'])
print()
print('--------------------------------------------')
print('--------------------------------------------')
print('Which Team had won by minimum wickets?')
print(df.loc[df[df['win_by_wickets'].ge(1)].win_by_wickets.idxmin()])
print()
print('--------------------------------------------')
print('--------------------------------------------')
print('Which season had most number of matches?')
sns.countplot(x='season', data=df)
plt.show()
print()
print('--------------------------------------------')
print('--------------------------------------------')
print('The Most Successful IPL Team is:::')
data = df.winner.value_counts()
sns.barplot(y=data.index, x=data.values, orient='h')
plt.show()
print()
print('--------------------------------------------')
print('--------------------------------------------')
print('Players who got max times Man of Match are:::')
top_players = df.player_of_match.value_counts()[:10]
fig, ax = plt.subplots()
sns.barplot(x=top_players.index, y=top_players.values, orient='v', ax=ax)
# Set the limits and labels after plotting so they apply to the finished chart
ax.set_ylim([0, 20])
ax.set_ylabel("Count")
ax.set_title("Top player of the match Winners")
plt.show()
Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 637 entries, 0 to 636
Data columns (total 17 columns):
Match_SK        637 non-null int64
match_id        637 non-null int64
Team1           637 non-null object
Team2           637 non-null object
match_date      637 non-null object
Season_Year     637 non-null int64
Venue_Name      636 non-null object
City_Name       637 non-null object
Country_Name    637 non-null object
Toss_Winner     636 non-null object
match_winner    634 non-null object
Toss_Name       636 non-null object
Win_Type        635 non-null object
Outcome_Type    637 non-null object
ManOfMach       633 non-null object
Win_Margin      628 non-null float64
Country_id      637 non-null int64
dtypes: float64(1), int64(4), object(12)
memory usage: 84.7+ KB
df.groupby('Season_Year')['match_winner'].value_counts()
Season_Year  match_winner
2008         Rajasthan Royals               13
             Kings XI Punjab                10
             Chennai Super Kings             9
             Delhi Daredevils                7
             Mumbai Indians                  7
             Kolkata Knight Riders           6
             Royal Challengers Bangalore     4
             Deccan Chargers                 2
2009         Delhi Daredevils               10
             Deccan Chargers                 9
             Royal Challengers Bangalore     9
             Chennai Super Kings             8
             Kings XI Punjab                 7
             Rajasthan Royals                6
             Mumbai Indians                  5
             Kolkata Knight Riders           3
2010         Mumbai Indians                 11
             Chennai Super Kings             9
             Deccan Chargers                 8
             Royal Challengers Bangalore     8
             Delhi Daredevils                7
             Kolkata Knight Riders           7
             Rajasthan Royals                6
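The per-season win counts printed above can be reduced to one "best team" per season. The sketch below is only an illustration: it copies the top two rows of each season from the output above into a small table (the wins column name is an assumption) and picks the row with the most wins in every season.

```python
import pandas as pd

# Top two rows of each season, copied from the printed output above.
wins = pd.DataFrame({
    "Season_Year": [2008, 2008, 2009, 2009, 2010, 2010],
    "match_winner": [
        "Rajasthan Royals", "Kings XI Punjab",
        "Delhi Daredevils", "Deccan Chargers",
        "Mumbai Indians", "Chennai Super Kings",
    ],
    "wins": [13, 10, 10, 9, 11, 9],
})

# For every season, idxmax() gives the row label of the team with the most wins.
best_rows = wins.loc[wins.groupby("Season_Year")["wins"].idxmax()]

# Turn the result into a plain {season: team} dictionary.
best_by_season = dict(
    zip(best_rows["Season_Year"].tolist(), best_rows["match_winner"].tolist())
)
print(best_by_season)
```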
df['Season_Year'].value_counts()
2013 76
2012 74
2011 73
2017 60
2016 60
2014 60
2010 60
2015 59
2008 58
2009 57
Name: Season_Year, dtype: int64
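The season with the most matches can also be read from these counts without a chart. The sketch below rebuilds the counts printed above as a pandas Series and asks for the largest entry; the variable names are chosen only for this illustration.

```python
import pandas as pd

# Matches per season, copied from the value_counts() output above.
matches_per_season = pd.Series(
    {2013: 76, 2012: 74, 2011: 73, 2017: 60, 2016: 60,
     2014: 60, 2010: 60, 2015: 59, 2008: 58, 2009: 57},
    name="Season_Year",
)

# idxmax() returns the index label (the season) that has the largest count.
busiest_season = int(matches_per_season.idxmax())

# The counts should add up to the number of rows reported by df.info().
total_matches = int(matches_per_season.sum())

# sort_index() puts the seasons in chronological order for reading or plotting.
chronological = matches_per_season.sort_index()

print(busiest_season, total_matches)
```

The total of 637 agrees with the "637 entries" line in the df.info() output.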
Which season had most number of matches?
[Figure: count plot of the number of matches per season]
Most successful IPL Teams is:::
[Figure: horizontal bar chart of the number of wins per team]
CONCLUSION
In this paper, the performance of cricket players (batsmen)
and toss-related analysis in the IPL from seasons 2008-2018 have
been visualized. Finding the hidden parameters, patterns
and attributes that lead to the outcome of a cricket match
helps team owners and selectors to recognize better
players. The salary of an IPL player is decided through the
auction process. Thus, each franchise has to decide which
players to bid for, and at what cost, based on their past
performance in the IPL. Every
selector needs young and dynamic players who can handle
pressure calmly and take the team over the winning line.
This paper highlights player performance, especially that of
batsmen, and presents the analysis done for Maximum
Man of the Match awards, Maximum Centuries Scored by Batsmen,
Top Batsmen, Batsmen with Top Strike Rate, and Top 10 Players
with Maximum Runs. Statistics of 696 matches have been used
in this experiment, including for toss-related analysis such as
Count of Toss Wins, Decision Taken by Each Team after Winning
the Toss, Toss Decision Season Wise, and Toss Decision Team Wise.
SK Raina, considered the finest batsman, is second in
the top list of batsmen with maximum runs, maximum Man
of the Match awards and maximum centuries scored, while V Kohli is in
first position for maximum runs and is also in the list for
maximum centuries. The other Indian star batsmen, MS Dhoni
(best captain, maximum runs and maximum Man of the
Match awards), Rishabh Pant (second-best strike rate and maximum
centuries), RG Sharma, S Dhawan, G Gambhir, YK Pathan and
M Vijay, performed very well in the last five overs.
Selectors have a clear choice to give preference to Indian
players first, as they performed very well in the seasons from
2008-2018.
We also presented toss-related analysis, in which MS Dhoni is
the best captain for CSK, who won the toss the maximum number of
times, with a count of 77, and elected to bat first. Their choice to bat
first mostly resulted in a win. Most of the time, captains elect to
field first so that they can plan and perform well
while chasing. RCB, KKR, MI and KXIP elected to field first most of
the time, with counts of 57 and 49. Selectors have a clear
choice to select batsmen from Mumbai Indians and Kings XI
Punjab, as these two teams handled the pressure very well
during all the seasons from 2008-2018. By considering all
these visualizations and the toss-related analysis, team management
can select the right players and the right teams at the time of
auction. A good and strong cricket team can be formed within
a given budget, which will have the highest chance of winning.
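The toss-related counts mentioned above (toss wins per team, decision after winning the toss, and how often the toss winner also wins the match) can be sketched as below. This is only an illustration on made-up rows: the column names Toss_Winner, Toss_Name and match_winner are taken from the df.info() listing in the Output section, and it is an assumption that Toss_Name holds the toss decision ("bat" or "field").

```python
import pandas as pd

# Made-up toss records (NOT real IPL data).
toss = pd.DataFrame({
    "Toss_Winner": ["Team A", "Team A", "Team B", "Team B", "Team A"],
    "Toss_Name": ["bat", "field", "field", "field", "bat"],
    "match_winner": ["Team A", "Team B", "Team B", "Team A", "Team A"],
})

# Count of toss wins per team.
toss_wins = toss["Toss_Winner"].value_counts()

# Decision taken by each team after winning the toss (teams x decisions table).
decisions = pd.crosstab(toss["Toss_Winner"], toss["Toss_Name"])

# Share of matches won by the toss winner, split by the decision taken.
toss["won_match"] = toss["Toss_Winner"] == toss["match_winner"]
win_rate = toss.groupby("Toss_Name")["won_match"].mean()

print(toss_wins)
print(decisions)
print(win_rate)
```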
BIBLIOGRAPHY
1. Informatics Practices with Python by Preeti Arora
2. http://en.wikipedia.org
3. http://www.botskOOl.com