Srikanth Gottimukkula Professional Summary
________________________________________________________________________________________
Professional Summary:
• 8+ years of professional experience in information technology, with expertise in Big Data, Hadoop, Spark, Hive, Sqoop, SQL tuning, ETL development, report development, database development, and data modeling across various IT projects.
• Strong knowledge of and experience with the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), data pipelines, and data analysis and processing with Hive SQL, Spark, and Spark SQL. Hands-on experience with GCP, BigQuery, GCS buckets, Cloud Dataflow, and the gsutil and bq command-line utilities.
• Experience with Amazon AWS services such as EMR, EC2, S3, CloudFormation, and Redshift for fast and efficient processing of Big Data.
• Developed Spark applications using Scala for easy Hadoop transitions. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive, and developed Spark and Spark SQL/Streaming code for faster testing and processing of data (a brief Scala sketch follows this list).
• Data ingestion into Hadoop (HDFS): ingested data into Hadoop from sources such as Oracle and MySQL using Sqoop, and created Sqoop jobs with incremental loads to populate Hive external tables. Imported real-time data into Hadoop using Kafka and also worked with Flume. Exported analyzed data to relational databases using Sqoop for visualization and for generating reports for the BI team.
• File formats: ran Hadoop Streaming jobs to process terabytes of text data and worked with file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
• Strong knowledge of and experience with Spark architecture and components; proficient with Spark Core, Spark SQL, and Spark Streaming; implemented Spark Streaming jobs by developing RDDs (Resilient Distributed Datasets), using PySpark and spark-shell as appropriate.
• Experience working with Teradata, Oracle, and MySQL databases.
• Experience working in an onsite-offsite-offshore model.
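A minimal Scala sketch of the Spark-on-Hive analytics described above; the table, column names, and output path are hypothetical placeholders, and it assumes a reachable Hive metastore.

    // Minimal Spark SQL over Hive sketch (Scala). Table, columns, and the
    // output path are illustrative only.
    import org.apache.spark.sql.SparkSession

    object HiveAnalyticsSketch {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support so Spark SQL can read metastore tables
        val spark = SparkSession.builder()
          .appName("hive-analytics-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Aggregate a (hypothetical) Hive table with Spark SQL
        val dailySales = spark.sql(
          """SELECT store_id, SUM(amount) AS total_amount
            |FROM sales
            |GROUP BY store_id""".stripMargin)

        // Persist the result as Parquet for downstream consumers
        dailySales.write.mode("overwrite").parquet("/data/curated/daily_sales")

        spark.stop()
      }
    }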
Skill Sets:
• Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Flume, Sqoop, Impala, Spark, Parquet, Snappy, ORC, Ambari and Tez
• Hadoop Distributions: Cloudera, Hortonworks, MapR
• Languages: Java, SQL, Scala, PySpark and C/C++
• Development / Build Tools: Eclipse, Maven and IntelliJ
• Scheduling and Automation: Shell scripts, Oozie and Automic workflows for automation and job scheduling (Automic Scheduler)
• DB Languages: MySQL and PL/SQL
• RDBMS: Oracle, MySQL and DB2
• Operating Systems: UNIX, Linux and Windows variants
Work Experience:
Walmart, Arkansas Dec 2018 – Present
Role: Data Engineer
Responsibilities:
• Build multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinate tasks among the team.
• Design the various layers of the data lake.
• Design star schemas in BigQuery.
• Load SAP transactional data incrementally every 30 minutes into the BigQuery raw and UDM layers using SQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.
• Build a configurable Scala and Spark based framework to connect to common data sources such as MySQL, Oracle, SQL Server, and SAP HANA and load them into BigQuery (a brief sketch appears at the end of this section).
• Monitor BigQuery and Cloud Dataflow jobs via the Automic scheduler across all environments.
• Open SSH tunnels to Google Dataproc to access the YARN Resource Manager and monitor Spark jobs.
• Submit Spark jobs using gsutil and spark-submit for execution on the GCP cluster.
• Understand the business needs and objectives of the system and gather requirements from a reporting perspective.
• Develop HQL queries to perform DDL and DML operations.
• Develop transformation scripts in HiveQL to implement business logic on the data lake.
• Develop Automic workflows between Hadoop, Hive, and Teradata using the Aorta framework.
• Schedule jobs on Automic to automate Aorta framework workflows.
• Design and develop data pipelines using YAML scripts and the Aorta framework.
• Ingest data from Teradata into Hadoop using the Aorta framework.
• Create, validate, and maintain Aorta scripts to load data from various data sources into HDFS.
• Create Hive external tables to read data ingested from RDBMS sources (see the external-table sketch at the end of this section).
• Develop shell scripts to build a framework between Hive and Teradata.
• Unit test Aorta jobs and Automic workflows.
• Develop data quality (DQ) scripts to validate and maintain data quality for downstream applications.
• Validate ThoughtSpot data against base tables in the data lake.
• Use Git for version control.
Environment: GCP, Hadoop, MapReduce, Sqoop, Hive, Automic, Spark, Teradata.
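A brief Scala sketch of the configurable source-to-BigQuery load described above; it assumes the Google spark-bigquery connector and a JDBC driver are on the classpath, and the connection URL, credentials, table names, and staging bucket are hypothetical placeholders.

    // Sketch of a configurable JDBC-to-BigQuery load (Scala). All connection
    // details, table names, and the staging bucket below are placeholders.
    import org.apache.spark.sql.SparkSession

    object JdbcToBigQuerySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("jdbc-to-bigquery-sketch")
          .getOrCreate()

        // Read a source table over JDBC (MySQL shown; Oracle/SQL Server/HANA
        // differ only in the URL and driver)
        val source = spark.read.format("jdbc")
          .option("url", "jdbc:mysql://source-host:3306/salesdb")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Write to BigQuery, staging the data through a GCS bucket
        source.write.format("bigquery")
          .option("temporaryGcsBucket", "example-staging-bucket")
          .mode("append")
          .save("raw_layer.orders")

        spark.stop()
      }
    }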
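A short Scala sketch of the HiveQL DDL used for external tables over ingested data; the database, table, columns, and HDFS location are hypothetical placeholders.

    // Sketch of creating a Hive external table over ingested files via
    // Spark SQL (Scala). Names and the HDFS path are placeholders.
    import org.apache.spark.sql.SparkSession

    object ExternalTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("external-table-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // External table: Hive manages metadata only; data stays at the HDFS path
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS raw_db.customers (
            |  customer_id BIGINT,
            |  name STRING,
            |  updated_at TIMESTAMP
            |)
            |STORED AS PARQUET
            |LOCATION '/data/raw/customers'""".stripMargin)

        spark.stop()
      }
    }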
Education:
Master’s in Information Security and Intelligence
• Ferris State University, Big Rapids, Michigan. Jan 2016 – May 2017
Bachelor’s in Electrical and Electronics Engineering
• Jawaharlal Nehru Technological University, Hyderabad. June 2007 – July 2011