Varun Kapoor: Data Engineer - 3 Yoe - +91 9999254593

Varun Kapoor is a Data Engineer with 3 years of experience, skilled in SQL, Python, and various data technologies including AWS Redshift and Apache Spark. He has worked at Bank of America, developing ETL processes, automating workflows, and leading CI/CD efforts, while successfully reducing costs by migrating to more efficient systems. Varun holds a B.Tech in Computer Science Engineering from SRM Institute and has completed significant projects involving data pipelines for airline data and quality movie data analysis.

Uploaded by

Shobhit

VARUN KAPOOR

Data Engineer - 3 YoE | +91 9999254593 | [email protected]

TECHNICAL SKILLS

▪ Programming Languages: SQL, Python, Unix
▪ Web Development: HTML, CSS
▪ Databases & Connectors: MySQL, Oracle DB
▪ Data Warehouses: AWS Redshift, Snowflake, BigQuery
▪ Distributed Computation Frameworks: Apache Hadoop, Apache Spark, ETL Processes, IBM DataStage
▪ Dashboarding & Monitoring Tools: Splunk
▪ Workflow Management: Airflow, AutoSys
▪ AWS Services: S3, Lambda, Redshift, Athena, Glue, SNS, Step Functions
▪ Control Systems and Documentation: JIRA

WORK EXPERIENCE

Data Engineer, Bank of America | Dec 2021 – Present

▪ Developed and maintained ETL processes that extract data from HDFS and Hive, process it in Apache Spark and IBM DataStage, and load it into Oracle databases.
▪ Collaborated with cross-functional teams to understand business requirements and design efficient data pipelines, contributing to data-driven decision-making.
▪ Automated data workflows using AutoSys and Airflow, reducing manual intervention and data processing time.
▪ Assisted in data quality assessment and implemented data validation checks to ensure data accuracy and consistency.
▪ Led continuous integration and continuous delivery (CI/CD) efforts for data pipelines, streamlining deployment and reducing errors.
▪ Troubleshot data-related issues, performed root cause analysis, and resolved issues in line with the change management process.
▪ Reduced costs by 50% by migrating processing from IBM DataStage to Apache Spark.
▪ Collaborated with data analysts to provide data support for various business projects.

PROJECTS

Incremental Ingestion Pipeline – Airline Data
Tech stack: PySpark, AWS S3, Glue ETL, Glue Crawler, Redshift, EventBridge, SNS, Step Functions
▪ Built a generic, optimized ETL pipeline that processes ~200 GB of airline data per day from AWS S3 and loads it incrementally into a Redshift table.
▪ Enabled the business to predict flight delays, optimize schedules, and improve customer satisfaction and profitability through robust data insights.

Quality Movie Data Analysis
Tech stack: AWS S3, Glue, Redshift, ETL, EventBridge, SNS
▪ Implemented an end-to-end data pipeline to process and ingest only high-quality movies into the Redshift data warehouse.
▪ Leveraged AWS services including S3, Glue Crawler, Glue Catalog, Glue Catalog Data Quality checks, Glue ETL, Redshift, EventBridge, and SNS for seamless orchestration and monitoring.
▪ Stored raw movie data in S3 buckets and used Glue Crawler to automatically discover and catalog metadata from the raw data.
▪ Ensured data quality with Glue Catalog Data Quality checks and automated notifications for seamless operation.

EDUCATION

B.Tech in Computer Science Engineering
SRM Institute of Science and Technology, U.P. | Jul 2017 – May 2021
