
Nielsen receives transaction-level scanning data (POS data) from its partner stores on a regular basis. Stores sharing POS data include bigger format store types such as supermarkets and hypermarkets, as well as smaller traditional trade grocery stores (kirana stores), medical stores, etc. that use a POS machine.

While bigger format stores scan all items for all transactions using a POS machine, smaller and more localized shops do not have a 100% compliance rate in scanning and entering information into the POS machine for every transaction.

A transaction involving a single packet of chips or a single piece of candy may not be scanned and recorded, either to spare the customer the inconvenience or during rush hours when the store is crowded.

Thus, the data received from such stores is often incomplete and lacks information on all transactions completed within a day.

Additionally, apart from incomplete transaction data within a day, it is observed that certain stores do not share data for all active days. Stores share data for anywhere from 2 to 28 days in a month. While it is possible to impute/extrapolate the 2 missing days of a month using 28 days of actual historical data, the reverse (estimating a full month from only 2 days) is not recommended.
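
To make that asymmetry concrete, below is a minimal per-day scaling sketch in Python/pandas, one straightforward way to extrapolate a partial month. The STORE, MONTH, DAY and VALUE column names are hypothetical placeholders; the actual names are documented in Hackathon_Mapping_File.

import pandas as pd

def extrapolate_month(pos: pd.DataFrame, days_in_month: int = 30) -> pd.Series:
    """Scale each store's observed daily sales up to a full month.

    STORE, MONTH, DAY and VALUE are assumed column names, not the
    dataset's real ones.
    """
    grouped = pos.groupby(["STORE", "MONTH"])
    observed_days = grouped["DAY"].nunique()   # 28 days -> stable; 2 days -> noisy
    observed_total = grouped["VALUE"].sum()
    # Average sales per observed day, scaled to the whole month.
    return observed_total / observed_days * days_in_month

With 28 observed days the per-day average is stable, so filling 2 missing days is safe; with only 2 observed days, any atypical day dominates the estimate, which is why the reverse direction is discouraged.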

Today, a blanket call is taken on whether or not to include a store's data, given its compliance rate and data quality. Inactivity for a couple of hours currently disqualifies a store for the whole day. While these limits are effective, they lead to high wastage of available data.

Nielsen expects you to develop an automated process to filter usable stores and fill the data gaps for those stores. There are two key areas that need to be addressed in the proposed solution:

Primary objective - Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determining which factors/variables/features best predict store sales.

Note: You will be requested to present and defend the logic in your analysis.
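
As one possible baseline (not the prescribed method), the ideal data can be used to learn each product group's typical share of a store's monthly sales; a working store's observed groups are then scaled up to an estimated total, and a missing group is allocated its learned share. The STORE, MONTH and GRP column names are assumptions; TOTALVALUE is the column named in the validation file.

import pandas as pd

def fit_group_shares(ideal: pd.DataFrame) -> pd.Series:
    """Learn each product group's average share of a store's monthly
    sales from the complete (ideal) stores."""
    monthly = ideal.groupby(["STORE", "MONTH", "GRP"])["TOTALVALUE"].sum()
    totals = monthly.groupby(level=["STORE", "MONTH"]).transform("sum")
    return (monthly / totals).groupby(level="GRP").mean()

def impute_group(store_month: pd.DataFrame, shares: pd.Series,
                 missing_grp: str) -> float:
    """Scale the observed groups up to an estimated store total, then
    give the missing group its learned share of that total."""
    observed = store_month.groupby("GRP")["TOTALVALUE"].sum()
    observed_share = shares.reindex(observed.index).sum()
    estimated_total = observed.sum() / observed_share
    return estimated_total * shares[missing_grp]

Seasonality, store size, and brand mix are obvious candidate features for refining such a baseline.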

Optional objective - An outlier detection system to ensure selected stores do not skew the sales at a category level.
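
A minimal sketch of such a check, assuming the same hypothetical columns: compute each store's total per category and flag stores whose totals sit far from their peers, using a robust (median/MAD) z-score so that extreme stores cannot inflate the spread and mask themselves.

import pandas as pd

def flag_outlier_stores(df: pd.DataFrame, z_thresh: float = 3.5) -> pd.Index:
    """Flag stores whose category-level totals deviate strongly from
    their peers; GRP, STORE and TOTALVALUE are assumed column names."""
    per_store = df.groupby(["GRP", "STORE"])["TOTALVALUE"].sum()
    med = per_store.groupby(level="GRP").transform("median")
    mad = (per_store - med).abs().groupby(level="GRP").transform("median")
    robust_z = 0.6745 * (per_store - med) / mad  # 0.6745 rescales MAD to ~sigma
    flagged = robust_z.abs() > z_thresh
    return flagged[flagged].index.get_level_values("STORE").unique()

Flagged stores could then be excluded or down-weighted before category totals are reported.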

Task: Impute/extrapolate the total value of the product groups for the respective stores in the respective months.

About the Dataset:
In this hackathon, you are provided with a dataset that contains store-level data by brand and category for select stores.

Dataset Description:

Hackathon_Ideal_Data - This file contains brand-level data for 10 stores for the last 3 months. This can be referred to as the ideal data.

Hackathon_Working_Data - This file contains data for selected stores that is missing and/or incomplete.

Hackathon_Mapping_File - This file is provided to help understand the column names in the dataset.

Hackathon_Validation_Data - This file contains the stores and product groups for which you have to predict the TOTALVALUE.

Sample Submission - This file represents what needs to be uploaded as output by the candidate, in the same format. Sample data is provided in the file to help understand the required columns and values.

Hi Candidates, PFB FAQs based on different queries on the Problem Statement:

1) What is the difference between Data_Ideal and Data_Working? --> The Ideal data file contains data for stores that are complete. It can be used as a reference to understand how complete stores behave. On the contrary, the Data_Working file is the actual working data that needs to be worked upon. This file has incomplete transactions that need to be imputed/predicted.

2) What does it mean by ideal data? --> Ideal data refers to the truth set, or data from a set of stores that are complete. This is how the end data should look at a store level.

3) What is the difference between N and P? --> The difference in naming is just to ensure that stores in Data_Working and Data_Validation are distinct and independent of each other. It has no other relevance.

4) What is the objective of this problem? --> The objective is to learn store behaviour across categories from Ideal_Data, use that on Working_Data to predict/impute missing values, and then enter those values in the Validation_Data file as the final submission.
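
Putting the pieces together, a skeleton of that workflow might look like the sketch below. It reuses fit_group_shares and impute_group from the earlier sketch, assumes the files are CSVs with the same hypothetical STORE/MONTH/GRP columns plus TOTALVALUE, and glosses over how validation stores correspond to working-data stores (the mapping file and the N/P naming govern that in the actual dataset).

import pandas as pd

ideal = pd.read_csv("Hackathon_Ideal_Data.csv")          # complete reference stores
working = pd.read_csv("Hackathon_Working_Data.csv")      # incomplete stores to fill
validation = pd.read_csv("Hackathon_Validation_Data.csv")

shares = fit_group_shares(ideal)  # learned from the ideal stores

predictions = []
for store, month, grp in validation[["STORE", "MONTH", "GRP"]].itertuples(index=False):
    rows = working[(working["STORE"] == store) & (working["MONTH"] == month)]
    predictions.append(impute_group(rows, shares, grp))

validation["TOTALVALUE"] = predictions
validation.to_csv("submission.csv", index=False)  # format per Sample Submission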
