Project Report Sample
INDUSTRY INTERNSHIP INSIGHTS
For private circulation only
CONTENTS
1 Introduction
2 Development of dashboard for Amazon sponsored products at Selvitate Technologies Pvt Ltd
3 Data visualization and analytics using Qlik Sense at Greencube Global
4 Analysis of online food ordering apps and their growth opportunities in India
5 Binary regression model on online order cancellation using machine learning algorithm at Voylla Fashions Pvt Ltd
6 Last mile operations visibility at Grofers
7 Exploratory Data Analysis and Visualization of Client Data at CIEL HR Services
8 Fraud detection in e-commerce industry at Kilobyte Tech Solutions
9 Analysis of customer sentiment towards digital marketing and modelling of topics often discussed
10 Implementation of machine learning algorithm in optimization of service request allocation in service desk query optimization at Unisys India Pvt Ltd
11 Revenue estimation at Wipro Technologies
12 Material requirement planning using SAP at Roots India Pvt Ltd
13 Corporate Connect
14 Crossword
INTRODUCTION
Industry internships, done in the summer months after the first year, are a key requirement for completing the MBA programme. Internships are expected to give students an opportunity to apply their classroom learning to real-life business situations. In the process, students are exposed to the real-time working environment of an organization. Internships place students in an on-the-job setting with well-defined targets and timelines, and give them a feel for the challenges associated with specific roles that they might face in an organization. Internships are expected to improve critical problem-solving and analytical skills before students take up jobs in industry. Students also develop the interpersonal skills needed to excel in the competitive corporate world.
Students identify and get in touch with reputed organizations in their area of specialization, with a clear focus on the learning process. Business Analytics students work on specific business problems and find possible solutions by applying various machine learning tools and techniques. It is essential for a management student to identify the problem, apply an appropriate methodology and find a solution with an analytical mindset. Business understanding, or domain knowledge, plays a major role in the problem identification step. The next step is identifying and understanding the variables that have an impact on the problem at hand. Designing the methodology, applying appropriate algorithms to build a model and validating it is the crucial step that takes the model close to solving the identified problem. Students are evaluated on their final report and on their ability to speak about their project in a viva. Overall, internships are a valuable intervention for MBA students as they build a successful career path towards industry.
The Business Analytics Specialization proudly presents briefs of selected Summer Internship Projects taken up by the current batch of students. Needless to say, all of them had a great learning experience.
Process Flow
The entire project can be divided into the five stages mentioned above, each of which fulfils one of the objectives.
The final outcome of the project is a report generation template that quickly generates analysis reports for multiple clients, together with a classification algorithm that categorizes products as critical, important or additional based on their performance over a period of three months. Instruction manuals explaining how to operate the report templates are another outcome of the project.
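The brief does not say how the critical/important/additional classification was computed. As an illustration only, the sketch below assumes a Pareto-style (ABC) rule: products are ranked by total sales over the three months, and each product is labelled by where its slice of cumulative sales starts. The 70% and 90% cut-offs, the function name and the input layout (three monthly sales figures per product) are all hypothetical.

```python
from typing import Dict, List


def classify_products(
    monthly_sales: Dict[str, List[float]],
    critical_cutoff: float = 0.70,   # hypothetical threshold
    important_cutoff: float = 0.90,  # hypothetical threshold
) -> Dict[str, str]:
    """Label products 'critical', 'important' or 'additional' (ABC-style sketch)."""
    # Total sales per product over the analysis window (e.g. three months).
    totals = {product: sum(values) for product, values in monthly_sales.items()}
    grand_total = sum(totals.values())

    labels: Dict[str, str] = {}
    cumulative = 0.0
    # Walk products from best- to worst-selling.
    for product, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        start = cumulative  # share of sales already covered by better sellers
        cumulative += total / grand_total
        if start < critical_cutoff:
            labels[product] = "critical"
        elif start < important_cutoff:
            labels[product] = "important"
        else:
            labels[product] = "additional"
    return labels


if __name__ == "__main__":
    # Made-up monthly sales for five products.
    sales = {
        "P1": [200, 150, 150],
        "P2": [100, 100, 100],
        "P3": [40, 40, 40],
        "P4": [20, 15, 15],
        "P5": [10, 10, 10],
    }
    print(classify_products(sales))
```

Labelling by the start of each product's slice guarantees that the single best seller is always "critical", even if it alone exceeds the first cut-off.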
LEARNINGS
1. An understanding of the various processes and the flow of business in the e-commerce industry, and of the techniques applied to power data analytics at Selvitate.
2. Learnt how to record and execute macros and use them to standardize data, and how to work with analytical tools, namely Google Data Studio and Microsoft Power BI.
PRATHISH SHOBI M
1827723
the resulting data back to the Qlik Engine. Lastly, the resulting data is combined with the
existing in-memory data and immediately visualized for the user.
For the study, data from a retail apparel store was used to build a Qlik Sense visualization application covering a dashboard, region-wise sales, product-wise sales, customer analysis, and time series analysis and forecasting. In the application, clustering and time series analysis and forecasting were done using R integration: the K-means algorithm was used for clustering and the Holt-Winters algorithm for the time series. With the time series forecast, the business will be able to make better investment and production decisions based on which product categories could fetch more sales. From the forecast, it was inferred that sales of children's wear would rise and sales of sportswear would fall in the future.
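To show what the Holt-Winters step does, the sketch below implements additive Holt-Winters smoothing (level + trend + seasonality) in plain Python. The original analysis ran in R through Qlik Sense's R integration, where HoltWinters() estimates the smoothing parameters from the data; here alpha, beta and gamma are fixed, hypothetical values, the simple initialisation from the first two seasons is one common choice among several, and the sales figures are made up.

```python
from typing import List, Sequence


def holt_winters_additive(
    y: Sequence[float],
    season_length: int,
    alpha: float,
    beta: float,
    gamma: float,
    horizon: int,
) -> List[float]:
    """Additive Holt-Winters point forecasts for the next `horizon` periods."""
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialisation from the first two seasons.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    seasonal = [y[i] - level for i in range(m)]

    # Update level, trend and the matching seasonal index for each later point.
    for t in range(m, len(y)):
        s = seasonal[t % m]
        previous_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    # h-step-ahead forecast: level + h * trend + seasonal effect of that period.
    n = len(y)
    return [level + h * trend + seasonal[(n + h - 1) % m] for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Made-up quarterly sales for one product category.
    quarterly_sales = [12.0, 8.0, 15.0, 20.0, 13.0, 9.0, 16.0, 22.0, 14.0, 10.0, 18.0, 23.0]
    print(holt_winters_additive(quarterly_sales, season_length=4,
                                alpha=0.4, beta=0.1, gamma=0.3, horizon=4))
```

Comparing such forecasts across product categories is what supports statements like "children's wear will rise, sportswear will fall".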
Another visualization application was built on a healthcare data set. The objective was to analyze the parameters that lead to a heart attack and to predict whether a patient is likely to have one. The model has an accuracy of 77.55% and a precision of 60.31% in predicting whether a person would have a heart attack. A prediction template was built into a Qlik Sense application to give real-time output.
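Accuracy and precision measure different things, which is why the two figures above differ. Accuracy is the share of all predictions that are correct; precision is the share of patients predicted to have a heart attack who actually do. A minimal sketch, with invented labels (1 = heart attack):

```python
from typing import Sequence, Tuple


def accuracy_and_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, precision) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # correct predictions / all predictions
    precision = tp / (tp + fp) if tp + fp else 0.0  # true positives / predicted positives
    return accuracy, precision


if __name__ == "__main__":
    actual = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 1, 0, 1, 1, 0, 1, 0]
    print(accuracy_and_precision(actual, predicted))  # (0.625, 0.6)
```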
This work presents an advanced approach to Business Intelligence in which not only are pre-computed questions visualized, but machine learning models are also built to provide predictive insights for the business, supporting better data-driven decisions.
LEARNINGS
NAMRATA MANGALGI
1827648
The internship has definitely given me a better understanding of my skill sets and of industry expectations. It has also provided practical exposure to the theoretical concepts learnt in classroom courses.
LEARNINGS
1. In-depth understanding of the food industry and the importance of analytics in decision making
2. Understanding of classification models, the various algorithms, and how to choose the most accurate model
MONICA CHOWDARY
1827042
Data Preparation
• Data preprocessing
➢ Removing NAs: The model cannot be run if there are NAs in the dataset. Deleting rows with NAs would lead to a lot of data loss, so the NAs were replaced instead: NAs in character variables were assigned a new "NA" category, and NAs in numeric variables were replaced by the variable's mean.
➢ Converting the class of variables: As required, the character variables were converted into categorical variables (factors) using as.factor().
➢ Dividing the data into train and test sets: For a machine learning model, the data has to be divided into training and testing sets. The training data helps the model learn the patterns, and the test data is used to check the accuracy of the model.
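The preprocessing steps above were done in R; the sketch below restates them in plain Python as an illustration. It assumes rows are dictionaries with None marking a missing value, and the 70:30 split ratio, the seed and the column names are hypothetical, since the brief does not state them.

```python
import random
from statistics import mean
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def impute(rows: List[Row], numeric: Sequence[str], categorical: Sequence[str]) -> List[Row]:
    """Replace None: numeric columns -> column mean, categorical columns -> 'NA' level."""
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in numeric}
    cleaned = []
    for r in rows:
        new = dict(r)  # leave the input rows untouched
        for c in numeric:
            if new[c] is None:
                new[c] = means[c]
        for c in categorical:
            if new[c] is None:
                new[c] = "NA"  # missing values become their own category
        cleaned.append(new)
    return cleaned


def train_test_split(rows: List[Row], test_fraction: float = 0.3,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle reproducibly, then hold out `test_fraction` of the rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    orders = [
        {"amount": 100.0, "device": "mobile"},
        {"amount": None, "device": None},
        {"amount": 300.0, "device": "desktop"},
    ]
    train, test = train_test_split(impute(orders, ["amount"], ["device"]))
    print(len(train), len(test))
```

Imputing before splitting mirrors the order described above; a stricter variant would compute the means on the training rows only.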
Modeling
The model was built in R using the Extreme Gradient Boosting (XGBoost) algorithm. The ROC curve and the AUC (area under the curve) were used to choose the cut-off value and to check the model's predictive ability.
The model was built using the following steps:
After preprocessing the data:
1. The Boruta algorithm was used to select the important variables out of 47 variables.
2. A sparse model matrix was created as input to the XGBoost model.
3. The desired parameters were defined manually.
4. The XGBoost model was built after finding the best number of iterations.
5. Accuracy was checked using the prediction (confusion) matrix.
6. A feature importance graph was built.
7. An ROC curve was plotted to find the cut-off value giving the highest accuracy.
8. The area under the curve (AUC) was computed, which measures the discriminative power of the model.
9. An XGBoost model was built that automatically sets the parameters giving higher accuracy.
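Step 2 feeds XGBoost a sparse model matrix. In R this is typically built with Matrix::sparse.model.matrix(), which also applies R's contrast coding; the sketch below is a simplified Python analogue that one-hot encodes every category level and stores only the non-zero entries of each row as {column index: value}. Column names and data are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def sparse_one_hot(rows: Sequence[Row], numeric: Sequence[str],
                   categorical: Sequence[str]) -> Tuple[List[str], List[Dict[int, float]]]:
    """Encode rows as sparse vectors {column index: value}; zero entries are omitted."""
    # Fixed column order: numeric columns first, then one column per category level.
    columns: Dict[str, int] = {name: i for i, name in enumerate(numeric)}
    for col in categorical:
        for level in sorted({str(r[col]) for r in rows}):
            columns[f"{col}={level}"] = len(columns)

    matrix: List[Dict[int, float]] = []
    for r in rows:
        entries: Dict[int, float] = {}
        for col in numeric:
            if r[col] != 0:  # store non-zero numeric values only
                entries[columns[col]] = float(r[col])
        for col in categorical:
            entries[columns[f"{col}={r[col]}"]] = 1.0  # indicator for this row's level
        matrix.append(entries)
    return list(columns), matrix


if __name__ == "__main__":
    names, matrix = sparse_one_hot(
        [{"amount": 250.0, "device": "mobile"}, {"amount": 0.0, "device": "desktop"}],
        numeric=["amount"], categorical=["device"])
    print(names)
    print(matrix)
```

With many categorical variables (such as state, device group and source), most entries of the full matrix are zero, which is why a sparse layout saves memory.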
Findings
• The company is facing heavy losses because many products are returned by customers.
• A lot of cost is involved in packaging and dispatching products and then returning them.
• In the case of RTO (return to origin), the product is usually in good condition and is sent back to inventory.
• In the first case, when the product is received back at the retailer's warehouse, a quality check is done and, if the product is good, it is sold again. If the product cannot be sent back to the vendor for any reason, it goes for liquidation, where the retailer tries to sell it to someone who can buy it, generally at a lower price.
• The RTO percentage depends mainly on the following features: Amount Total, Previous RTO percentage, Device Group, State & Tier, Address Length, Gender, Source and TOD.
• The ROC curve gives the cut-off value at which accuracy is maximum. In this model the cut-off is 4.8 and the maximum accuracy is 72%.
• The area under the curve (AUC) measures how well the model separates RTO orders from non-RTO orders. Here the AUC is 0.79, which means that for a randomly chosen RTO order and a randomly chosen non-RTO order, the model ranks the RTO order as more likely to be returned 79% of the time.
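Both ROC-based quantities in the findings can be computed directly from predicted scores. The sketch below uses the pairwise definition of AUC (the share of positive/negative pairs ranked correctly, with ties counted as half) and scans candidate cut-offs for the highest accuracy. Scores and labels are made up, and the quadratic pair loop is chosen for clarity rather than speed.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored higher than a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def best_cutoff(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (cut-off, accuracy) maximising accuracy when predicting 1 if score >= cut-off."""
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best = (candidates[0], -1.0)
    for c in candidates:
        correct = sum(1 for s, y in zip(scores, labels) if (s >= c) == (y == 1))
        accuracy = correct / len(labels)
        if accuracy > best[1]:
            best = (c, accuracy)
    return best


if __name__ == "__main__":
    is_rto = [1, 1, 0, 1, 0, 0]
    rto_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
    print(auc(is_rto, rto_score))          # 8 of 9 pairs ranked correctly
    print(best_cutoff(is_rto, rto_score))  # first cut-off reaching 5 of 6 correct
```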
LEARNINGS
SACHIN S KATIYAR
1827862
The summer internship project was done in last mile logistics, which is the movement of goods from the delivery points to the customer and the last leg of order fulfilment. The last mile logistics team provides technical solutions and support for smooth operations. The company is looking to optimize its last mile logistics operations by reducing the time and cost of order fulfilment, and has therefore undertaken several projects towards this goal.
The different levels of management do not have good visibility into all the activities performed during last mile operations. The company operates in 18 cities, and to optimize these activities it needs a clear view of current operations.
Operations data is captured in various application databases, with each team recording data in a different system. The primary task during the internship was to extract the data from the application databases, preprocess the different data sets and present the results on a Tableau dashboard. Different levels of management require different levels of visibility: lower-level management needs a detailed report of all the activities performed at their stations, while top-level management needs an overview of operations across cities. Different reports therefore need to be generated to provide these views.
SQL queries in the Redash application were used to extract the data from the application database and to preprocess it. Reliability data was derived for reschedules, cancellations, on-time deliveries and failed deliveries; this data will be used to track the reliability of last mile operations. Further analysis of this data can aid time and cost optimization by identifying pain points.
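The reliability figures are simple rates per city, which a single SQL aggregation can produce. The report does not describe the application database, so the table, its columns and the status values below are hypothetical, and the query runs against an in-memory SQLite database from Python's standard library instead of Redash, purely to keep the example self-contained.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema; the real application database is not described in the report.
SCHEMA = """
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    city         TEXT NOT NULL,
    status       TEXT NOT NULL,      -- delivered, cancelled or failed
    rescheduled  INTEGER NOT NULL,   -- 1 if the delivery slot was rescheduled
    promised_at  TEXT NOT NULL,      -- ISO timestamps compare correctly as text
    delivered_at TEXT
)
"""

# Each rate is the average of a 0/1 indicator over all orders in the city.
RELIABILITY_SQL = """
SELECT city,
       COUNT(*) AS orders,
       AVG(rescheduled) AS reschedule_rate,
       AVG(CASE WHEN status = 'cancelled' THEN 1.0 ELSE 0.0 END) AS cancellation_rate,
       AVG(CASE WHEN status = 'failed' THEN 1.0 ELSE 0.0 END) AS failed_rate,
       AVG(CASE WHEN status = 'delivered' AND delivered_at <= promised_at
                THEN 1.0 ELSE 0.0 END) AS on_time_rate
FROM orders
GROUP BY city
ORDER BY city
"""


def reliability_by_city(conn: sqlite3.Connection) -> List[Tuple]:
    return conn.execute(RELIABILITY_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few made-up orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Bangalore", "delivered", 0, "2019-06-01 10:00", "2019-06-01 09:30"),
            (2, "Bangalore", "delivered", 1, "2019-06-01 10:00", "2019-06-01 11:00"),
            (3, "Bangalore", "cancelled", 0, "2019-06-01 12:00", None),
            (4, "Bangalore", "failed", 1, "2019-06-01 12:00", None),
            (5, "Delhi", "delivered", 0, "2019-06-02 10:00", "2019-06-02 10:00"),
            (6, "Delhi", "delivered", 0, "2019-06-02 11:00", "2019-06-02 10:45"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in reliability_by_city(demo_connection()):
        print(row)
```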
An automatic mailer was set up to send reports to different stakeholders, giving visibility into the number of orders successfully delivered by field executives. This enables managers to track the performance of field executives. Another automatic mailer was set up to send a report on the items settled by GSPs, so that the station manager can deduct the pendency from the GSP's salary.
A customer sentiment analysis was carried out so that the company can better understand customers' sentiment towards Grofers. Data was extracted from Twitter, and the tweets were classified as positive or negative. A word cloud was generated to identify what customers are talking about.
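A word cloud is essentially a picture of term frequencies. The sketch below shows the counting step behind one: lower-case each tweet, split it into words, drop stop words and count the rest. The stop-word list and the tweets are hypothetical, and drawing the cloud itself would need a separate plotting library.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Tiny illustrative stop-word list; real analyses use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "was", "my", "and", "to", "of", "again", "for"}


def word_frequencies(tweets: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Return the `top_n` most frequent non-stop-words across all tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "My order was rescheduled again",
        "Love the fresh vegetables, fast delivery",
        "Delivery rescheduled and late",
    ]
    print(word_frequencies(sample, top_n=3))
```

Words that dominate such counts (here, "rescheduled") are the ones that appear largest in the cloud.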
Analysis of the reports together with the sentiment analysis showed that reschedules were causing distress to customers. Grofers needs to minimize the number of reschedules from its end, which can be done by improving capacity planning at the stations.
LEARNINGS
POOJA KUMAR PAIKERA
1828149
CIEL HR Services operates in the HR services industry, using technology and analytics to deliver a full range of recruiting services, from executive search and recruitment process outsourcing to staffing (permanent as well as temporary roles), through its offices in India.
In recent years, the global HR and recruitment services industry has grown by 5.3%, reaching revenue of $523 billion in 2018. The HR industry is witnessing a lot of development due to the increased adoption of technology in hiring. This includes the smart use of people analytics in the hiring process and software systems that make the HR team's work easier.
Cloud deployment is increasing at a rapid pace in the industry.
Problem statement
CIEL is competing with other firms to expand its client base in this industry. It has 150 clients across various industries, for whom it fills various roles, and these clients are CIEL's only source of income. The better CIEL understands how its clients are performing, the more revenue it can generate. Using the past three years' client data, the main problem was therefore to analyze each client's performance over the years, how frequently the company is losing clients and how much revenue it can make from each client in the next few years. The problem statement hence focuses on the clients' past performance and on forecasting.
Objectives
The main objectives of the study at CIEL HR Services are as follows:
1) To understand the pattern of client behaviour and performance, and identify the top clients at CIEL HR Services Pvt. Ltd.
2) To enhance and report web analytics insights using Google Trends and Google Analytics.
Process Flow
1) Gather data
2) Clean the data and make it ready for further analysis
3) Explore the data and identify key columns
4) Prepare dashboards and create visualizations of clients and the profits made by them
5) Forecast the revenues made by the top clients, deploying the 80-20 rule
6) Get business insights
Findings
The client base of CIEL is 150 and the occurrence of the clients from the past 3 years is very
low. We can say the company fits for 80-20 rule i.e. 80% of the profits are made by the 20% of
clients. So, retaining the valuable clients helps them to stay in profits.
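Checking the 80-20 rule comes down to one number: the share of total profit earned from the top 20% of clients. The sketch below computes it; the client names and profit figures are hypothetical. A result close to 0.8 would support the observation above.

```python
from typing import Dict


def top_client_share(profit_by_client: Dict[str, float], top_fraction: float = 0.2) -> float:
    """Share of total profit earned from the top `top_fraction` of clients."""
    values = sorted(profit_by_client.values(), reverse=True)
    k = max(1, round(len(values) * top_fraction))  # e.g. 30 of 150 clients
    return sum(values[:k]) / sum(values)


if __name__ == "__main__":
    # Made-up profits for ten clients.
    profits = {f"Client{i}": p for i, p in
               enumerate([400, 300, 100, 50, 40, 40, 30, 20, 10, 10], start=1)}
    print(top_client_share(profits))
```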
As for website visitors, the company is not getting many clients through its website, and its ranking in terms of visitors is falling drastically. Website visibility should therefore be boosted.
Recommendations
• CIEL should concentrate on its top 10 most valuable clients.
• CIEL need not focus on retaining clients from whom it is not making any revenue.
• The chatbot on CIEL's website should be made more interactive by training it to answer all user queries.
• CIEL should concentrate on website visibility and boost its digital marketing.
LEARNINGS
1. During the internship at CIEL HR Services as a Data Analyst, the main learnings were how to handle data and how to maintain and clean it thoroughly so that the end result is as accurate as possible.
2. Apart from this, the internship gave exposure to tools and techniques that helped expedite various business processes.
SUNIL
1827111
LEARNINGS
1. In-depth knowledge of data cleaning and analyzing unstructured datasets
2. Basic understanding of classification models and the various algorithms used to build them
SHWETHA SP
1827048
visualization by constructing a word cloud. Sentiment analysis using the AFINN lexicon and topic modeling by Latent Dirichlet Allocation (LDA) were the modeling techniques applied. However, the business objective of this project did not require deploying any model, so the project was completed at the evaluation stage.
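AFINN is a word list in which each word carries an integer score from -5 (very negative) to +5 (very positive), and a response's sentiment score is the sum of the scores of its words. The sketch below applies that idea with a tiny hand-made lexicon; its values are illustrative and not taken from the published AFINN list, and the responses are invented.

```python
import re
from typing import Sequence, Tuple

# Illustrative word scores on the AFINN scale (-5 .. +5); NOT the published AFINN values.
LEXICON = {"useful": 2, "great": 3, "attractive": 2, "fake": -3, "doubt": -1, "unclear": -2}


def afinn_style_score(text: str) -> int:
    """Sum of lexicon scores of the words in `text`; unknown words score 0."""
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z]+", text.lower()))


def negative_share(responses: Sequence[str]) -> Tuple[int, float]:
    """Count responses with a negative total score and their share of all responses."""
    negatives = sum(1 for r in responses if afinn_style_score(r) < 0)
    return negatives, negatives / len(responses)


if __name__ == "__main__":
    survey = [
        "Great poster, very useful",
        "Is this company fake? No website",
        "Details unclear",
        "Looks attractive",
    ]
    print([afinn_style_score(r) for r in survey])
    print(negative_share(survey))
```

Applied to the survey, the same count is what yields a figure such as 29 negative responses out of 300.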
Findings:
Sentiment analysis indicated that out of a total of 300 responses, only 29 (approx. 10%) were classified as negative. The three most often discussed topics were extracted through topic modeling, and mapping the negative responses to these topics gave the following findings:
• Topic 1: Customers talked about jobs and workshops.
There were questions about which companies would offer jobs at Skill Velocity.
• Topic 2: Customers talked about the attractiveness of the poster.
A few customers questioned the credibility of the company, as no website details were mentioned in the poster.
• Topic 3: Customers talked about the information provided and its usefulness to the student community with regard to workshops and jobs.
Customers would like to know what skills the workshop offers to benefit students and freshers, and would like these to be mentioned in the promotional poster.
Based on the observations recorded, implementable suggestions were provided to make the company's digital marketing campaign more effective at attracting its target customers.
LEARNINGS
SHYALAJA S
1827956
Industry profile
The Information Technology & Information Technology Enabled Services (IT-ITeS) sector is evolving rapidly and changing the shape of Indian business standards. The sector includes software development, consultancies, software management, online services and business process outsourcing (BPO).
Market size of IT industry:
India's IT & ITeS industry grew to US$ 181 billion in 2018-19. Exports from the industry increased to US$ 137 billion in FY19, while domestic revenues (including hardware) advanced to US$ 44 billion. According to NASSCOM's report, 2018 is considered a year of growth because of new technologies that are changing businesses. The image below shows the IT industry's contribution to the global economy, the Indian economy and global tech spending.
Problem Statement
The service desk at Unisys handles all queries related to hardware, software and network issues received from across the globe. It must resolve issues quickly so as not to cause work delays that affect day-to-day business operations. Currently, when a query is raised it is redirected to the team in general, which is time-consuming.
Objectives
• To analyse the various service requests received from customers across the globe.
• To categorize the service requests and to optimally allocate the service request queries to
the appropriate support team.
Business Understanding
Unisys is a service-based company that provides service-centric solutions to businesses and governments. When customers have queries about a service being provided, they contact the service desk department to have them addressed. The service desk receives customer queries from various parts of the globe, nearly three thousand a month from Bangalore alone, and each query is addressed within 30 minutes. This is a very critical process in which each query has to be directed to the appropriate support team for resolution. The time taken to resolve queries is optimized by allocating them to the correct department.
The project deals with developing a model that automatically classifies service desk queries into their appropriate categories.
Modeling
The model was built using Microsoft Azure. Microsoft Azure Machine Learning Studio has a specific experiment workflow, which was followed to build a classification model. The data was imported and the text was pre-processed: stop words were removed, words were lemmatized, sentences were detected, text was normalized to lower case, and numbers, special characters and email addresses were removed. Feature hashing was then used to convert the variable-length text into equal-length numeric feature vectors. The data was split in the ratio 80:20 for training and testing the model, and the algorithm chosen was a multiclass neural network.
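Feature hashing maps each token to one of a fixed number of buckets through a hash function, so every query, however long, becomes a vector of the same length. The sketch below mimics the pre-processing and hashing steps in plain Python. It is not Azure Machine Learning Studio's implementation: it skips lemmatization and sentence detection, and it uses MD5 only because Python's built-in hash() for strings is randomized between runs. The stop-word list and the 4-bit (16-bucket) size are hypothetical.

```python
import hashlib
import re
from typing import List

# Tiny illustrative stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "to", "my", "i", "on", "please", "for", "and"}
EMAIL = re.compile(r"\S+@\S+")


def preprocess(text: str) -> List[str]:
    """Lower-case, strip emails, digits and special characters, drop stop words."""
    text = EMAIL.sub(" ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)  # removes digits and special characters
    return [t for t in text.split() if t not in STOP_WORDS]


def hash_features(tokens: List[str], n_bits: int = 4) -> List[int]:
    """Count tokens into 2**n_bits buckets chosen by a stable hash."""
    size = 1 << n_bits
    vector = [0] * size
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "little") % size] += 1
    return vector


if __name__ == "__main__":
    query = "Password reset for john.doe@example.com on laptop 42!"
    tokens = preprocess(query)
    print(tokens)
    print(hash_features(tokens))
```

Different tokens can land in the same bucket (a collision); larger bit sizes make this rarer at the cost of longer vectors.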
Findings
With respect to the analysis of the service desk queries, some of the findings are:
• Most of the queries the service desk received fell under the Application, Hardware and Network categories.
• The most queried subcategories were access connectivity and password reset.
LEARNINGS
• Learned most aspects of R programming for text analytics, such as text mining with the tm (text mining) package and word clouds.
• Learned how to work with Microsoft Azure Machine Learning Studio, where the model was deployed.
KAVYA S
1827943
✓ The model was automated in Excel, as recommended by the team, and achieved 90% accuracy when tested on previous quarters. The model based on project-type revenue generation was strongly recommended because of its accuracy. A dashboard was created to identify valuable customers by slicing and dicing them by business unit (BU) and service line (SL), highlighting the top 10% by revenue generated so that staff can be deployed without any delay. The dashboard also shows revenue region-wise, BU-wise, SL-wise, etc.
The following recommendations were given:
✓ All business units and service lines must be consulted every time to learn of additions and deletions in the process, such as pass-throughs and project removals.
✓ This estimation process gives 90% accuracy and should be deployed for all other departments to make estimation more convenient.
✓ The data collected must be at a granular level so that accuracy increases; this can be achieved by using grade-wise data.
✓ As the business grows, staff for deployment must be ready in hand to avoid a high turnover ratio.
LEARNINGS
HARITHA PONNALAGI R
1827742
order and documentation were different from the invoice and the SAP data for about 10% of the data. Therefore, the lead time data is not accurate. The collected data showed that suppliers send stock irregularly, based on their own production. The lead time of the components varies because the purchase orders (POs) are not returned against the original PO numbers.
The accuracy of the previously available MRP is 70% and the accuracy of the calculated MRP is 90%, an improvement of 20 percentage points. The recommendations given are to complete the MRP using SAP and to concentrate more on the factors behind rejection so that the cost of rework can be reduced. Since PO dates and 101 dates are entered into SAP manually, the lead time is inaccurate, and since the finished goods (FG) plan is based entirely on data from SAP, there is a drastic difference between the schedule and dispatch.
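The core MRP calculation nets gross requirements against stock on hand and scheduled receipts, rounds each shortfall up to the lot size, and offsets the planned order by the lead time, which is why inaccurate lead times distort the whole plan. The sketch below is a textbook single-item version, not SAP's MRP run; the function name, quantities, lot size and lead time are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mrp_plan(gross: Sequence[int], on_hand: int, scheduled: Sequence[int],
             lead_time: int, lot_size: int = 1) -> Tuple[List[int], List[int], List[int]]:
    """Return (net requirements, planned receipts, planned order releases) per period."""
    periods = len(gross)
    net = [0] * periods
    planned_receipts = [0] * periods
    releases = [0] * periods
    available = on_hand
    for t in range(periods):
        available += scheduled[t]  # stock arriving from open purchase orders
        shortfall = gross[t] - available
        if shortfall > 0:
            qty = math.ceil(shortfall / lot_size) * lot_size  # round up to the lot size
            release = t - lead_time  # the order must be placed lead_time periods earlier
            if release < 0:
                raise ValueError(f"period {t}: order would have been due before period 0")
            net[t] = shortfall
            planned_receipts[t] = qty
            releases[release] = qty
            available += qty
        available -= gross[t]  # consume this period's demand
    return net, planned_receipts, releases


if __name__ == "__main__":
    net, receipts, releases = mrp_plan(
        gross=[0, 40, 60, 30, 50], on_hand=50, scheduled=[0, 0, 20, 0, 0],
        lead_time=2, lot_size=25)
    print(net, receipts, releases)
```

If the real lead time is longer than the value used here, every planned release arrives late, which is the kind of gap between schedule and dispatch described above.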
LEARNINGS
1. How to implement material requirement planning in SAP so that production is forecast more accurately.
2. How operations management concepts are implemented in the company for a more efficient product line.
MADHUMITHA S
1827546
2. Typing can be specific in one go. But a technology like Alexa, which is a back-and-forth interaction, can be time-consuming. Don’t you think this is a disadvantage?
It actually depends on the use case. For example, in search, one needs to open a laptop, type the query out, where there might be errors, and then hit enter. Instead, you can just say something like “Hey Alexa, how many centuries did Virat Kohli make?”, which makes it easy. For a lot of things, like ordering food when you are a picky eater, maybe people say that screens are better, and I agree that screens are better. But say you have to reorder something: “Alexa, I want to repeat my last order”. In such cases you can go with Alexa.
3. In what way and how fast do you think Alexa’s technology will impact the
businesses, and which industries will it affect?
It is actually interesting. Last year we launched something that is right now available only in the US. It is called ‘Alexa for Business’, and it is going to revolutionize work. Today, you go to a meeting room and you have to set up the TV, dial people and so on. Right now, there are offices in the US where they use Alexa in the meeting room. I walk into the room and say “Alexa, start the meeting”, and it figures out that it is 3 pm right now and dials by itself. All of it happens automatically. In future, it might even be able to take notes and give you the minutes of the meeting. And this is just one way Alexa adds value across the chain: from a fresher joining who wants an employee orientation, to someone filing a ticket by asking Alexa at their desk, to the CEO who wants a daily sales report. Instead of opening the system and drawing three Tableau reports, which takes around 10 minutes, one can just say “What are my top sales today?” or “What are the sales in quarter 4 looking like?”. I think there is some sort of value added throughout the entire organization.
4. What are some of the security aspects that one has to consider while
talking about such a technology?
We take security and privacy very seriously. If anyone builds a skill for something specific, at no point will they get the transcript of what the user has said. This is something we take very seriously, and it is not going to change. To be a bit technical, there are things called intents and slot values. An intent is the action a user wants to perform. So, the user says “What’s the weather?”, the intent is to get the weather, and the slot value is any variable within that sentence. So, I say “Get me the weather in New York” or “Get me the weather in Bangalore”. The intent is the action, and the slot value is either Bangalore or New York. You will never get the full sentence of what the user is saying. That’s just one step. The other is that all these devices have something called a wake word, and Alexa won’t process anything until you say the word ‘Alexa’, which is the wake word. Only once Alexa hears it does it process what follows. Without this, it might record everything in the background, which is not what we are doing.
5. How does Alexa understand that two sentences mean the same even when
all of the words are different in both the sentences?
Essentially, in any conversational interface, there are different ways of saying things. There is something called Natural Language Processing (NLP), a subset of AI, which is how a computer understands human speech. Human speech is largely unstructured, whereas computers understand structured data, and by structured I mean arrays and such things. So what NLP does is take the sentence that the human says and convert it into something structured which a computer can understand. That structure is essentially what the intent and the slot values are. So, for any chatbot or conversational interface, this is the transaction that is happening: a human sentence is taken, an NLP algorithm pulls out the intent and the slot values, builds a structure and feeds it into a program which can then do something.
6. How important is it to understand the emotion in a sentence?
That’s a tricky part, because it is very important; it is what distinguishes a human from a computer or a laptop, for instance. It is important, as recent news reports about Alexa being able to understand emotions show. There is something in computing called sentiment analysis, which looks at the words you are using and the way you are using them, and can recognize the sentiment. It will give you a percentage, so it will tell you that something is 100% positive or 50% positive. This is something we are working on. It is also very difficult because two different people can express the same emotion slightly differently; just as they can say the same thing in different ways, they can also express emotions differently. One person being angry versus someone else being angry is going to be a different sort of interaction. It is something we are working on.
7. Which stage according to you is the most difficult stage in the development
of the voice component in Alexa?
I don’t know which is the most complex, but I can tell you which one is the most important: the voice designing phase. This is because the technology aspect is something we are good at, and we can all figure it out. But unfortunately, people don’t pay much attention to the voice designing stage. Designing conversations is very different from what we have been doing till now, which is designing websites and apps. Those are for screen-based devices, which are designed to appeal to the eye. So I think you should spend more time on this; that is what leads to good skills, and that is where the best skills come from.
8. Can you give some advice to students who are aspiring to reach great heights in the field of business analytics?
At present, technology and analytics are in the process of becoming big, and the playing field is completely open. Sitting in India, you can build a system for your client in the US, thanks to technology. So there is tremendous scope here, and there is always a first-mover advantage; figuring out the right use case will really pay off in the future. That’s my advice to you.
9. As you are working in a field where creativity is of prime importance, what priority must we give to it in the field of analytics?
I am not an analytics expert, but I feel there is always room for creativity. That is what differentiates the insights in business analytics. There are creative ways of getting data and of what you do with the data. Now, access to data is not the problem; it is about what you do with it, and that’s where creativity plays a big role.
INTERVIEWERS
CROSSWORD
ANSWERS
1. BLOCKCHAIN
2. COMPUTER VISION
3. DARK DATA
4. TEXT MINING
5. SELF SERVICE ANALYTICS
6. SPARK
7. SEMANTIC SEGMENTATION
8. AUGMENTED ANALYTICS
9. DASHBOARD
10. EDGE COMPUTING
DATA GEEK CREW