Anshul Yadav Resume ML
PROFESSIONAL EXPERIENCE
inn4smart solution Gurugram, Haryana, India
Intern Data Analyst August 2022 - August 2023
• Implemented an end-to-end data pipeline from user input to a Snowflake data warehouse, processing 10,000+ records daily for seamless visualization,
with Python for EDA.
• Optimized ETL workflows with PySpark and Apache Airflow, reducing processing time by 30% and enhancing data accuracy for analytics using data
models.
• Specialized in NLP and machine learning techniques to enhance user interaction with data-driven applications, implementing sentiment analysis models
to gauge consumer response with over 85% accuracy.
• Developed and tuned LLMs for chatbot applications in a smart home environment, improving the efficiency of natural language understanding.
• Developed NLP algorithms to automate the extraction and interpretation of unstructured data from diverse sources, enhancing data accuracy and speed
of insights in business intelligence tools.
U.S. Climatological data analysis in Google Cloud Platform San Jose, CA, USA
MySQL, GCP, Airflow, ETL pipeline, KNN September 2023 - December 2023
• Automated ETL framework for U.S. Climatological data analysis using Apache Airflow, ensuring efficient data maintenance and updates with Archive
DAG and Daily DAG cycles.
• Integrated DAG cycle workflows with Google Cloud Platform, constructing a BigQuery star schema that streamlined real-time and historical data analytics,
culminating in the effective visualization of key performance indicators.
• Analyzed data from 756 California weather stations using independent t-tests, Shapiro-Wilk normality tests, and modeling techniques such as k-means
clustering to identify regional climate change trends.
• Utilized Python for data extraction, EDA, and analysis in the Google Cloud Platform project, demonstrating proficiency in data analytics, data
visualization, and machine learning techniques.
Technical Skills
• Programming Languages: Python, SQL, MySQL, R, Java.
• Machine Learning Libraries: Pandas, Parallel-Pandas, NumPy, scikit-learn, Modin, spaCy, NLTK, TensorFlow, PyTorch.
• Data Analysis: Machine Learning, NLP, LLM, Statistical Analysis, Reinforcement Learning, Statistical Models, Regression Analysis, Predictive Models,
Bayesian Analysis.
• Data Engineering: AWS (Athena, Glue, Redshift), Google Data Studio, Databricks, GCP (BigQuery), Spark, Microsoft Excel (VLOOKUP and pivot tables),
GitHub, Visual Studio.
• Data Visualization: Tableau, Looker, Power BI, MATLAB, Seaborn, Matplotlib, Plotly.
Certification
• Google Data Analytics, Google Analytics (Coursera) - (Data-driven decision making, MS Excel, Machine Learning).
• Microsoft Certified: Azure Fundamentals.
• IBM Data Analyst, IBM (Coursera) - (Data Visualization, Predictive Models).
• Data Science and Advanced Analytics, VIT-AP & Advanced Data Analytics Tools, VIT-AP (NLP, Optimization Techniques).
• Tableau Essential Training, LinkedIn Learning - (Dashboard Development, Tableau Public, Advanced Calculations, Trend Analysis).