VARUN KAPOOR
Data Engineer - 3 YoE | +91 9999254593 | [email protected]
TECHNICAL SKILLS
▪ Programming & Scripting: SQL, Python, Unix
▪ Web Development: HTML, CSS
▪ Databases & Connectors: MySQL, Oracle DB
▪ Data Warehouses: AWS Redshift, Snowflake, BigQuery
▪ Distributed Computing & ETL: Apache Hadoop, Apache Spark, IBM DataStage
▪ Dashboarding & Monitoring Tools: Splunk
▪ Workflow Management: Airflow, AutoSys
▪ AWS Services: S3, Lambda, Redshift, Athena, Glue, SNS, Step Functions
▪ Control Systems and Documentation: JIRA
WORK EXPERIENCE
Data Engineer, Bank of America | Dec 2021 – Present
Developed and maintained ETL processes by extracting data from HDFS and Hive, processing it in Apache Spark and IBM DataStage,
and loading it into Oracle databases.
Collaborated with cross-functional teams to understand business requirements and design efficient data pipelines, contributing to
data-driven decision-making.
Automated data workflows using AutoSys and Airflow, reducing manual intervention and data processing time.
Assisted in data quality assessment and implemented data validation checks to ensure data accuracy and consistency.
Led efforts in continuous integration and continuous delivery (CI/CD) for data pipelines, streamlining the deployment process and
reducing errors.
Troubleshot data-related issues, performed root cause analysis, and resolved them in line with the change management
process.
Reduced costs to the company by 50% by migrating processing from IBM DataStage to Apache Spark.
Collaborated with data analysts to provide data support for various business projects.
PROJECTS
Incremental Ingestion Pipeline – Airline Data
Tech stack – PySpark, AWS S3, Glue ETL, Glue Crawler, Redshift, EventBridge, SNS, Step Functions
Built a generic, optimized ETL pipeline that processes ~200 GB of airline data per day from AWS S3 and loads it incrementally into a
Redshift table.
Enabled the business to predict flight delays, optimize schedules, and enhance customer satisfaction and profitability through robust
data insights.
Quality Movie Data Analysis
Tech stack – AWS S3, Glue, Redshift, ETL, EventBridge, SNS
Implemented an end-to-end data pipeline to process and ingest only high-quality movies into the Redshift data warehouse.
Leveraged a variety of AWS services, including S3, Glue Crawler, Glue Catalog, Glue Catalog Data Quality Checks, Glue ETL,
Redshift, EventBridge, and SNS, for seamless orchestration and monitoring.
Utilized S3 buckets for storing raw movie data. Employed Glue Crawler to automatically discover and catalog metadata from the raw
data.
Ensured data quality with Glue Catalog Data Quality checks and automated notifications for seamless operation.
EDUCATION
B.Tech in Computer Science Engineering
SRM Institute of Science and Technology, U.P. | Jul 2017 – May 2021