Naresh DE
9+ years of professional experience as a Data Engineer, developing and implementing software applications using Python, Flask, and Django.
Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, IAM, Elastic Load Balancing, CloudWatch, SQS, Lambda, EMR, and other AWS services.
Seasoned professional with extensive expertise in data engineering, proficient in PySpark, SQL, and Python, ensuring
efficient data processing and analysis.
Experienced in designing and implementing data warehousing solutions, including Snowflake and Azure Data
services, optimizing storage and retrieval processes.
Adept at ETL development using tools like Informatica, Apache NiFi, and Airflow, ensuring seamless data
extraction, transformation, and loading across various platforms.
Proven track record in managing data on cloud platforms, including Azure and AWS, with hands-on experience in
Azure Blob Storage, AWS S3, and Azure SQL Database.
Deep understanding of the Hadoop ecosystem, encompassing HDFS, MapReduce, Hive, HBase, Kafka, and Apache
Spark, facilitating large-scale data processing and analytics.
Proficient in various databases, including Cosmos DB, MongoDB, DynamoDB, Oracle, and Teradata SQL, ensuring
optimal data storage and retrieval strategies.
Skilled in data visualization tools such as Tableau and Power BI, transforming complex data sets into insightful
visualizations for effective decision-making.
Adept at using collaboration tools like Jira and Confluence, with experience as a Scrum Master, ensuring smooth
project workflows and timely deliverables.
Hands-on experience in DevOps practices, including CI/CD pipelines, Jenkins, Docker, and Kubernetes,
streamlining the development and deployment of data solutions.
Proficient in Git and GitHub for version control, ensuring collaborative and organized development workflows.
Skilled in Informatica Data Quality, ensuring data integrity, accuracy, and consistency across the entire data lifecycle.
Proficient in workflow automation using tools like Apache Oozie and Apache Airflow, optimizing data processing
pipelines for efficiency.
Experienced in handling NoSQL databases, including Cassandra and MongoDB, adapting to diverse data storage
needs.
Advanced skills in scripting languages such as Python and Shell, automating data processes and enhancing overall
efficiency.
Strong Excel skills for data analysis and reporting, complementing technical expertise with user-friendly data insights.
Knowledgeable in implementing robust data security measures to ensure compliance and safeguard sensitive
information.
Expertise in advanced SQL for query optimization, enhancing database performance and response times.
Proficient in data movement technologies like Sqoop, Flume, and Kafka, ensuring smooth and efficient data transfer
between systems.
Skilled in containerization with Docker and orchestration with Kubernetes, optimizing scalability and resource
utilization.
TECHNICAL SKILLS
Big Data Technologies: PySpark, Hadoop Ecosystem (HDFS, MapReduce, Hive), HBase, Kafka, Apache Spark, Cassandra, AWS S3, AWS Lambda, DynamoDB, MongoDB, Apache NiFi
Data Warehousing: Snowflake, Azure Data Lake Storage, Azure Blob Storage, Azure SQL Database, Cosmos DB, Redshift, Oracle Databases
SQL and Databases: SQL, Teradata SQL, SQL Server, Advanced SQL, Snowflake, Oracle Databases, Spark SQL, PL/SQL
Data Visualization: Power BI, Tableau, Excel
PROFESSIONAL EXPERIENCE
Client: AbbVie, Vernon Hills, IL Dec 2020 - Present
Role: Sr. Data Engineer
Responsibilities:
Implemented and managed data storage solutions on Azure, leveraging Azure Data Lake Storage and Blob Storage.
Designed data migration steps and deployed and optimized SQL databases on Azure.
Designed and implemented scalable data processing solutions in Azure, utilizing services like Azure Data Factory and Azure Databricks. Developed and optimized ETL pipelines on Azure for efficient data extraction, transformation, and loading.
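For illustration, a minimal PySpark sketch of this kind of Databricks ETL step; the storage account, container paths, and column names below are hypothetical placeholders rather than actual project values.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession already exists; getOrCreate() simply reuses it.
spark = SparkSession.builder.appName("adls_etl_sketch").getOrCreate()

# Hypothetical ADLS Gen2 source path (abfss://<container>@<account>.dfs.core.windows.net/...).
source_path = "abfss://raw@exampleaccount.dfs.core.windows.net/sales/2024/"

# Extract: read raw CSV files from Azure Data Lake Storage.
raw_df = spark.read.option("header", True).csv(source_path)

# Transform: basic typing, date parsing, and de-duplication.
clean_df = (
    raw_df
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .dropDuplicates(["order_id"])
)

# Load: write curated data back to the lake as partitioned Parquet.
(clean_df.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("abfss://curated@exampleaccount.dfs.core.windows.net/sales/"))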
Managed and administered Azure databases, including Azure SQL Database and Cosmos DB, ensuring high performance and reliability.
Managed NoSQL databases on Azure using Cosmos DB, ensuring scalability and performance.
Integrated Informatica Data Quality processes to ensure high data quality standards. Collaborated with database
administrators and developers to integrate Informatica solutions with various database systems.
Built robust ETL pipelines on Snowflake for seamless data extraction, transformation, and loading, adhering to best
practices.
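A minimal load-step sketch using the Spark-Snowflake connector; the connection options, staging path, and target table are hypothetical placeholders, and the connector and JDBC jars are assumed to be on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake_load_sketch").getOrCreate()

# Hypothetical Snowflake connection options for the Spark-Snowflake connector.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

# The cleansed DataFrame; kept to a simple Parquet read here for brevity.
staged_df = spark.read.parquet("/data/staging/sales")   # placeholder path

# Load into a Snowflake target table.
(staged_df.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "SALES_FACT")                     # placeholder table
    .mode("append")
    .save())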
Created PySpark scripts using Spark transformations and actions to effectively load data from various sources to destination systems. Responsible for data migration between NDW and MinIO.
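A simplified sketch of this source-to-destination pattern, reading over JDBC and landing Parquet in an S3-compatible store such as MinIO; the endpoint, credentials, JDBC URL, and table names are placeholders, and the hadoop-aws and JDBC driver jars are assumed to be available.

from pyspark.sql import SparkSession

# S3A settings point Spark at an S3-compatible object store such as MinIO.
spark = (SparkSession.builder.appName("ndw_to_minio_sketch")
         .config("spark.hadoop.fs.s3a.endpoint", "http://minio.example.local:9000")
         .config("spark.hadoop.fs.s3a.access.key", "minio_access_key")
         .config("spark.hadoop.fs.s3a.secret.key", "minio_secret_key")
         .config("spark.hadoop.fs.s3a.path.style.access", "true")
         .getOrCreate())

# Transformation: reading a source table over JDBC stays lazy until an action runs.
source_df = (spark.read.format("jdbc")
             .option("url", "jdbc:teradata://ndw.example.local/DATABASE=SALES")
             .option("dbtable", "SALES.ORDERS")
             .option("user", "etl_user")
             .option("password", "********")
             .load())

# Action: writing triggers execution and lands the data in MinIO as Parquet.
source_df.write.mode("overwrite").parquet("s3a://migrated-data/orders/")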
Leveraged Python for integrating diverse data sources and systems, ensuring data consistency and accuracy across the
organization. Actively worked on migrating dashboards from SQLO3 to NDW.
Designed, developed, and maintained big data processing solutions using Hadoop ecosystem tools like HDFS,
MapReduce, and Hive. Managed and administered Hadoop clusters, ensuring optimal performance and scalability.
Implemented data storage and retrieval mechanisms in Hadoop, utilizing HBase, HDFS, and other storage solutions.
Defined and implemented data retention policies in Kafka for efficient storage management.
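Retention of this kind is typically applied as topic-level configuration (retention.ms / retention.bytes); a small sketch using the kafka-python admin client, with the broker address and topic name as placeholders.

from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

admin = KafkaAdminClient(bootstrap_servers="broker:9092")   # placeholder broker

# Keep 7 days of data and cap each partition at roughly 50 GB.
retention = ConfigResource(
    ConfigResourceType.TOPIC,
    "clickstream-events",                                    # placeholder topic
    configs={"retention.ms": str(7 * 24 * 60 * 60 * 1000),
             "retention.bytes": str(50 * 1024 ** 3)},
)
admin.alter_configs([retention])
admin.close()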
Interacted with business clients to understand requirements, developed Spark Python code, and designed pipelines for data migration, validation, and transformation, utilizing Python for scripting data processing tasks.
Extensively worked with tools like DBeaver, Teradata SQL, PuTTY, and WinSCP for day-to-day requirements.
Developed end-to-end pipelines using Spark with Python and triggered those jobs in the cluster.
Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against the source system.
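A representative reconciliation check of the kind described above, expressed through Spark SQL; the database and table names are hypothetical and assumed to be registered in the metastore.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dw_validation_sketch").getOrCreate()

# Find source keys that never arrived in the warehouse (placeholder table names).
missing = spark.sql("""
    SELECT s.order_id
    FROM source_db.orders s
    LEFT JOIN warehouse_db.orders_fact w
           ON s.order_id = w.order_id
    WHERE w.order_id IS NULL
""")
print("Rows missing in warehouse:", missing.count())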
Worked on tools like Tableau and Microsoft Excel for data analysis and generating data reports for proof of concept.
Effectively used DBeaver tool for writing SQL queries with subqueries, joins, windowing functions, and aggregate
functions.
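For illustration, the style of windowed query referenced above, shown here through Spark SQL against a hypothetical schema (in practice these queries ran from DBeaver against the warehouse).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window_query_sketch").getOrCreate()

# Latest order per customer using ROW_NUMBER over a window (placeholder schema).
latest = spark.sql("""
    SELECT order_id, customer_id, amount, order_date
    FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date DESC) AS rn
        FROM warehouse_db.orders_fact o
    ) ranked
    WHERE rn = 1
""")
latest.show(10)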
Designed and implemented efficient data warehousing solutions on Snowflake, ensuring optimal performance for
analytical queries. Developed and maintained data models on Snowflake.
Implemented row-level security in Power BI, set up DirectQuery connections for real-time access, and optimized
Power BI reports and dashboards for performance.
Worked with other developers to create reports and dashboard designs in Tableau. Created Teradata objects like
Tables and Views for data analysis.
Worked extensively on Tableau for creating and monitoring dashboards. Monitored daily jobs, provided assistance to
the offshore team, and maintained daily job status updates in Excel.
Coordinated with clients to understand requirements, assisted the team, and reviewed and committed code using
GitHub.
Worked as a Scrum Master, obtaining daily status updates from the team and providing assistance to complete tasks on a sprint-to-sprint schedule.
Analyzed requirements and created designs using Jira and Confluence pages. Worked closely with the QA team to perform validation and resolved conflicts accordingly.
Environment: PySpark, DBeaver, SQL, GitHub, Spark, Python, Data Warehousing, Snowflake, ETL, Azure Data Lake Storage, Azure Blob Storage, Azure SQL Database, Cosmos DB, Informatica Data Quality, Hadoop Ecosystem (HDFS, MapReduce, Hive), HBase, Kafka, Power BI, Python Integration, Jira, Confluence, Tableau, Excel, Scrum Master, Teradata SQL, PuTTY, WinSCP.