Rahul Reddy Resume
SUMMARY
Data Engineer with 5+ years of experience across the data pipeline lifecycle, from acquiring and validating large structured and
unstructured datasets to building data models, developing reports, and using visualization tools to deliver impactful insights.
Leveraged Spark SQL and Spark APIs to perform large-scale analytics on distributed datasets, achieving faster processing than
traditional MapReduce jobs. Proficient in deploying and monitoring data pipelines with AWS services and the Big Data ecosystem,
ensuring consistent delivery and improved reliability. Automated data pipeline monitoring with Apache Airflow, reducing manual
intervention and improving operational efficiency. Skilled at fostering cross-functional collaboration to drive data-driven
decisions and achieve business goals more efficiently.
EDUCATION
Southern Arkansas University
Master of Science in Computer and Information Science. Jan 2022 – Jan 2023
SKILLS
Technical: Python, Java, SQL (MySQL, PostgreSQL, Hive, MongoDB), Spark (PySpark, Spark SQL), Flink, Databricks, HDFS,
MapReduce, Kafka, REST APIs
Cloud & DevOps: AWS (Lambda, Redshift, S3, RDS, CloudWatch, EC2, EventBridge, VPC, IAM), Azure (ADLS, Synapse
Analytics, ADF, Blob Storage, Cosmos DB), GCP (BigQuery, Dataproc, Dataflow, Cloud Composer, KMS, IAM), Azure DevOps,
Apache Airflow, Terraform, Kibana, Jenkins, Maven, GitHub, Linux, CI/CD
Data Engineering & Analytics: Snowflake, Power BI, DBT, SSIS, Machine Learning (AI/ML), Data Warehousing, Tableau, ETL,
Unit Testing, Data Pipelines, Data Lineage, Data Lakes, Data Modeling, Data Quality & Observability, Dynatrace.
WORK EXPERIENCE
Travelers Insurance - Data Engineer, Analytics Jan 2023 - Present
Data Acquisition & Storage Framework
● Built and orchestrated scalable data pipelines using Apache Spark (PySpark, Spark SQL, RDD, DataFrame APIs) on
Databricks, integrating with Airflow, Fivetran, DBT, and Snowflake for efficient ETL.
● Leveraged AWS EMR, Glue, S3, Athena, and Lambda to develop serverless data processing solutions, enhancing scalability
and reducing operational overhead.
● Built real-time streaming pipelines using Kafka, AWS Kinesis, and Apache Flink, reducing processing latency by 40% and
implementing checkpointing for fault tolerance.
● Performed advanced data transformations and SQL performance tuning using Hive and Impala, handling terabytes of data
with optimized queries.
● Developed Kibana dashboards and integrated OpenSearch (Elasticsearch) for real-time log analytics and observability.
● Reduced ETL execution time by 50% through Spark optimization techniques and effective partitioning strategies, including
broadcast joins where applicable.
● Automated infrastructure provisioning and CI/CD workflows using Terraform, Jenkins, and Unix Shell scripting, doubling
deployment speed and reducing manual overhead by 70%.
● Designed and maintained streaming producers/consumers using Kafka, ensuring reliable data flow and schema validation in
real-time systems.
● Applied data governance best practices including metadata tracking, lineage documentation, and cost optimization
(saving $10K+ annually).
● Designed and scheduled batch ETL workflows with Hive, NiFi, and Redshift, achieving up to a 50% improvement in
processing time.
● Set up automated CI/CD pipelines using GitLab and Jenkins, with infrastructure provisioning handled through
Terraform and shell scripts.
● Modeled and optimized relational databases in MySQL, MS SQL, and Oracle, improving query efficiency and supporting
reporting needs.
● Containerized ETL components using Docker and deployed them via Kubernetes, ensuring scalable and consistent
environments across stages.
● Implemented observability dashboards using Prometheus, Grafana, and Kibana, helping proactively detect and resolve
pipeline issues while reducing cloud costs by over $10K annually.
● Collaborated with cross-functional teams to deliver data-driven solutions, improving analytics insights and increasing
stakeholder engagement by 28%.
Data Visualization | Storytelling with Data
● Presented insights to leadership through data-driven storytelling, translating complex dashboard analysis to action items.
● Provided data support to the Marketing, Engagement & Finance teams, building dashboards for compliance, sales, and accounting.
Amazon - Data Engineer I Aug 2018 - Nov 2021
● Optimized ETL pipelines, reducing Prime Sales Core’s data processing time by 10%.
● Boosted campaign conversion rates by 7% through targeted data enhancements and analytics delivery.
● Built automated KPI onboarding with a Scala-based framework, improving time-to-insight by 5x.
● Scaled infrastructure to handle 3x more data via Hive, Spark, and distributed storage systems.
● Tuned SQL queries and Snowflake datasets, cutting execution time by 15% in reporting workflows.
● Enabled near real-time streaming pipelines using Apache Kafka, HBase, and Spark Streaming.
● Managed producers/consumers and checkpointing to support high-throughput streaming data pipelines.
● Reduced AWS costs through S3 storage optimization and Glue compute efficiency improvements.
● Built data models and schemas to support scalable, transactional analytics for business teams.
● Used Postgres, Oracle, SQL Server, and MongoDB in hybrid database environments.
● Automated ETL jobs and system processes using Unix Shell scripting and scheduled with Control-M.
● Supported big data pipelines in production, troubleshooting MapReduce, Sqoop, and Oozie workflows.
● Used AWS EMR, Athena, Lambda, IAM, and CloudWatch for distributed job processing and monitoring.
Data Engineering | Ensuring High Data Quality
● Developed the Engagement foundational data model and designed its transactional schema, enabling scalable and rapid analysis.
● Resolved discrepancies between backend configuration data and Amplitude, increasing predictability by 10%.
PROJECTS
User Testing/Co-Design of Current PIVOT Features:
● Developed a tool that displays tweets related to local attributes, mapped to their geographical locations, using Python,
HTML, CSS, JavaScript, and SQL.
● Designed an evaluation of the prototype by local stakeholders and engaged them in discussing the critical information needed
during an emergency.