Forecasting Number of Indian Startups Using Supervised Learning Regression Models
Application, Silver Oak University, Engineering, Sigma Baroda, Vadodara, Gujarat, India
Gota Ahmedabad, Gujarat University, Vadodara, Gujarat
Abstract: For regulators and investors, estimating the potential of the Indian market requires accurately predicting the number of startups in the Indian ecosystem. Supervised Learning Regression models can predict startup growth by taking into account a wide range of variables, including financing, market demand, and competition. The purpose of this research is to use Supervised Learning Regression models to make predictions about the future of the startup scene in India. Information from the Startup Database, official papers, and scholarly journals all factored into the analysis. Supervised Learning Regression models are then used to make predictions about future growth based on the identified variables, using training data taken from the past. Factors including finance availability, government regulations, and market demand are identified in the report as having a substantial influence on the number of startups in India. The potential expansion of the startup sector in India is foreseen by using Supervised Learning Regression models to forecast the future number of companies in the Indian ecosystem. The findings of this research support the use of linear models for estimating future startup activity in India. Policymakers and investors may benefit from this study's results by learning more about the forces that are propelling India's startup scene forward.
Keywords: Indian Startups, Forecasting, Linear Models, Growth, Factors.
I. INTRODUCTION
The number of Indian startups operating in a wide range of industries has increased dramatically in recent years. The prospects for these new businesses depend on a number of elements, such as access to capital, regulatory environment, consumer demand, and level of competition. Policy makers and investors in the Indian market may benefit from accurate projections of startup growth.
Predicting a startup's growth from a variety of inputs using a Supervised Learning Regression model has proven to be a valuable tool. In this study, Supervised Learning Regression models are used to predict the total number of Indian startup companies. The startup database, government papers, and scholarly publications will all be used to determine what variables influence the development of startups in India.
The study will focus on identifying the significant factors that drive the growth of startups in India, such as the availability of funding, government policies, market demand, and competition. In order to forecast future growth based on these characteristics, linear models will be developed using past data.
Fig. 1. Indian Startups [15]
Policy makers and investors may utilise the findings of this study to better understand the Indian market and the startup industry's potential for development. The results of this research can add to what is already known about utilising Supervised Learning Regression models to predict the expansion of new businesses.
II. RELATED WORK
This literature analysis draws from fourteen publications that explore the potential of machine learning and other data analytics tools for gauging the prospects of new businesses. The high percentage of startup failure makes this a significant topic of study, as does the need to isolate the most promising new ventures.
Savin et al. (2023) employ a topic-based categorization system to identify worldwide patterns among new businesses. The authors provide a process for spotting new developments and anticipating how they may affect new businesses. Social media, news stories, and company reports
were among the many data sets analysed for this research. The results demonstrate the efficiency of the suggested strategy in spotting tendencies and foreseeing their prospective effects on new businesses.
Farahani et al. [2] investigate how AI algorithms and econometric models can be used to predict investment returns for new businesses. The authors also suggest portfolio optimization utilising VaR and C-VaR. Financial documents, stock market data, and news articles are only some of the data sources that were analysed for this research. The results verify the efficacy of the suggested approach in estimating ROI for new businesses.
In [3], Bangdiwala et al. predict the success rates of new businesses using machine learning methods. The authors present a strategy for discovering what makes a business successful and then utilising that knowledge to train a machine learning model. Financial documents, industry reports, and news articles were among the data sets analysed for this research. The results verify the validity of the suggested approach for forecasting the long-term viability of new businesses.
In [4], Castle et al. (2021) zero in on the lessons learned from forecasting contests as a basis for their forecasting guidelines. The authors discuss multiple models, expert opinion, and data visualisation as aspects that contribute to reliable forecasting. The research was conducted by analysing results from many prediction contests. The results demonstrate the efficacy of the suggested forecasting principles in improving forecast accuracy.
CapitalVX is a machine learning model proposed by Ross et al. (2021) for use in selecting startups and predicting when they would fail. Financial documents, industry studies, and news stories are only some of the materials the authors utilise to train the model. The research demonstrates that the CapitalVX model accurately predicts the success rate of startups and pinpoints the optimal exit option.
In [6], Varma (2021) reviews the state of the art in predicting the success of new businesses using machine learning. Data quality and model interpretability are only two of the issues the author highlights as major obstacles to the advancement of machine learning-based prediction methods. This article is helpful since it summarises current research on using machine learning to predict the success of new businesses.
Using Crunchbase data, Żbikowski and Antosiuk (2021) [7] offer a bias-free machine learning method. The authors forecast the success of businesses using a variety of machine learning approaches, such as decision trees and logistic regression. The research demonstrates that the suggested method accurately identifies the most important determinants for startup success and accurately predicts the chance of success.
Bonaventura et al. (2020) [8] explore the use of network centrality indicators for predicting success in the global start-up network. The authors analysed the ties between startups and investors using a database of 80,000 companies, which represents more than 3 million contacts. Their findings indicate that start-ups with ties to a few powerful investors do better than their peers. Results show that metrics of network centrality are useful for forecasting startups' fortunes.
Arroyo et al. (2019) examine the usefulness of machine learning algorithms for assisting with VC investment decisions [9]. Five different machine learning algorithms were evaluated for their ability to predict the success of 272 startups. Based on their findings, the Random Forest and Support Vector Machine (SVM) algorithms are the most effective at forecasting the long-term viability of new businesses. The research shows that machine learning might be useful in assisting VC investment choices.
By combining conventional econometric models with machine learning algorithms, Krishna et al. (2019) established a new framework for foreseeing the success of new businesses [10]. The authors created a prototype system that takes into account details including the startup's physical location, industry, and financing history. They compared the results of their method to those of conventional econometric models by applying it to a dataset consisting of 2,000 startups. When compared to conventional econometric models, the results demonstrate that the machine learning-based method is superior at forecasting the long-term performance of startups.
Dellermann et al. (2018) provide a hybrid intelligence approach to forecasting early-stage start-up performance in their study [11]. The authors created a prototype approach to forecast the success of startups that blends human assessment with machine learning algorithms. Using data from 600 startups, the system demonstrated that its hybrid intelligence approach provided more accurate predictions of startup success than did conventional machine learning techniques.
The accuracy of linear regression and support vector regression (SVR) algorithms for forecasting the success of startups was compared by Kavitha et al. (2017) in [12]. The authors examined the accuracy of the two algorithms in forecasting the return on investment using a dataset including information on one thousand different startups. Their findings demonstrate that the SVR algorithm is superior to linear regression for forecasting the long-term viability of new businesses.
In [13], Cassar (2014) looked at how prior business and industry expertise might foretell a company's future performance. Using data from 657 new businesses, the author discovered that individuals having previous start-up experience had a better chance of success than those without. Furthermore, the author discovered that prior expertise in the sector is more essential than prior start-up experience in determining the success of new businesses.
In [14], Shalabh (2013) revisits the problem of accurate prediction using linear regression models. To boost the performance of linear regression models, the author suggested a strategy that combines principal component analysis with ridge regression. The research emphasises the need for accurate prediction methodologies in the startup industry.
In conclusion, the literature study emphasises the value of machine learning and other data analytics techniques for gauging the potential of new businesses. Topic-based categorization, econometric modelling, and various machine learning algorithms are only some of the methods proposed
for prediction in the papers reviewed in this overview. The findings demonstrate the efficacy of these methods in determining the most important criteria for a startup's success and in making accurate predictions of that success.
III. METHODOLOGY
A. Dataset
The top 300 Indian startups are represented in the given dataset, which contains the following categories of information:
Company - Name of the startup.
City - The city where the startup is headquartered.
Starting Year - The year in which the startup was founded.
Founders - Names of the startup's founders.
Industries - The industry sector in which the startup operates.
No. of Employees - The total number of employees working for the startup.
Funding Amount in USD - The total amount of funding received by the startup in US dollars.
Funding Rounds - The number of times the startup has raised funds from the market. Each funding round requires the founders to trade equity in their business for capital to advance their companies to the next level.
No. of Investors - The total number of investors who have invested in the startup.
the separate trees. Random forest regression mitigates the overfitting problem of decision tree regression, leading to improved performance in most cases. Predicting real estate and stock market values, as well as the results of medical procedures, are just a few of its many uses.
In conclusion, linear regression is a straightforward technique, but its underlying assumption of a linear relationship between variables limits its ability to capture complex non-linear patterns in data. The more flexible decision tree regression and random forest regression can capture such patterns. Decision tree regression is a single-tree approach, whereas random forest regression uses many decision trees to boost model performance. There are many different types of regression models available, and selecting the most appropriate one will depend on the nature of the issue at hand and the data being analysed.
IV. NUMBER OF STARTUP FORECASTING
Fig. 2 depicts the workflow used to forecast the number of Indian startups between 1984 and 2022 with linear, random forest, and decision tree regression.
[Fig. 2: workflow diagram. Recoverable steps: Data Reading (Indian Startup, 1984-2022); Pre-Processing (Null-removal and Duplicate Removal).]
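The data reading and pre-processing steps named above (null removal and duplicate removal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sample rows are made up, the helpers `clean_rows` and `startups_per_year` are hypothetical, and using `Company` as the duplicate key is an assumption. Only the column names come from the dataset description. Aggregating by `Starting Year` to obtain a per-year count as the forecasting target is likewise an assumption about how the target was built.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape of the dataset described above;
# the rows are made up for illustration only.
SAMPLE_CSV = """Company,City,Starting Year
Alpha,Bengaluru,2015
Beta,Mumbai,2015
Alpha,Bengaluru,2015
Gamma,,2016
Delta,Pune,
Epsilon,Delhi,2016
"""


def clean_rows(rows):
    """Drop rows with any empty field, then drop repeated companies."""
    seen = set()
    cleaned = []
    for row in rows:
        # Null removal: skip a row if any field is missing or blank.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        # Duplicate removal (assumed key): keep the first row per company name.
        if row["Company"] in seen:
            continue
        seen.add(row["Company"])
        cleaned.append(row)
    return cleaned


def startups_per_year(rows):
    """Count how many startups were founded in each starting year."""
    counts = Counter(int(row["Starting Year"]) for row in rows)
    return dict(sorted(counts.items()))


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean_rows(rows)
yearly_counts = startups_per_year(cleaned)
print(yearly_counts)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened CSV file; the resulting year-to-count mapping is the kind of series a regression model could then be fitted on.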
V. RESULT ANALYSIS
Here, the performance of linear, random forest, and decision tree regression is examined on data spanning the founding years of Indian startups (1984-2022).
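A comparison of this kind can be sketched in a few lines. The following is an illustrative, standard-library-only sketch under stated assumptions, not the study's implementation: the per-year counts are made-up toy values, the hold-out split and the MAE and R² metrics are assumptions (the text does not name the evaluation metrics), and the helper names are hypothetical. Because there is a single feature (the year), the forest here reduces to bagging, that is, averaging trees fitted on bootstrap samples.

```python
import random
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares with one feature; returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def sse(part):
    """Sum of squared errors of the targets in `part` around their mean."""
    centre = mean(y for _, y in part)
    return sum((y - centre) ** 2 for _, y in part)


def fit_tree(xs, ys, min_leaf=2):
    """Regression tree on one feature; each split minimises squared error."""
    def build(part):
        best = None
        for i in range(min_leaf, len(part) - min_leaf + 1):
            if part[i - 1][0] == part[i][0]:
                continue  # cannot split between equal feature values
            cost = sse(part[:i]) + sse(part[i:])
            if best is None or cost < best[0]:
                best = (cost, i)
        if best is None:  # too few points to split: predict the leaf mean
            value = mean(y for _, y in part)
            return lambda x: value
        i = best[1]
        threshold = (part[i - 1][0] + part[i][0]) / 2
        left, right = build(part[:i]), build(part[i:])
        return lambda x: left(x) if x <= threshold else right(x)
    return build(sorted(zip(xs, ys)))


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Average of trees fitted on bootstrap samples (bagging)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)


def mae(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy per-year startup counts: made-up values, NOT the paper's data.
years = list(range(2005, 2021))
counts = [1, 1, 2, 2, 3, 4, 6, 8, 11, 15, 19, 24, 28, 30, 27, 22]
test_idx = [i for i in range(len(years)) if i % 4 == 3]  # assumed hold-out
train_idx = [i for i in range(len(years)) if i % 4 != 3]
x_train, y_train = [years[i] for i in train_idx], [counts[i] for i in train_idx]
x_test, y_test = [years[i] for i in test_idx], [counts[i] for i in test_idx]

models = {
    "linear": fit_linear(x_train, y_train),
    "decision_tree": fit_tree(x_train, y_train),
    "random_forest": fit_forest(x_train, y_train),
}
results = {}
for name, predict in models.items():
    preds = [predict(x) for x in x_test]
    results[name] = {"mae": mae(y_test, preds), "r2": r2(y_test, preds)}
    print(f"{name}: MAE={results[name]['mae']:.2f}, R2={results[name]['r2']:.2f}")
```

The numbers printed for the toy series say nothing about the real dataset; the sketch only shows how the three model families can be scored on the same held-out years.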
REFERENCES
[1] I. Savin, K. Chukavina, and A. Pushkarev, “Topic-based classification
and identification of global trends for startup companies,” vol. 60, no.
2. Springer US, 2023. doi: 10.1007/s11187-022-00609-6.
[2] M. Farahani, M. Shahvaroughi Farahani, and A. Esfahani,
“Forecasting Startup Return using Artificial Intelligence Methods and
Econometric Models and Portfolio Optimization Using VaR and C-VaR,”
International Journal of Innovation in Engineering, vol. 2, no.
1, pp. 78–109, 2022. [Online]. Available:
https://www.researchgate.net/publication/362073994