Brain Teaser #11: What mathematical arguments explain how **Layer Normalization** enhances training stability in large language models (LLMs) and Transformer architectures, particularly in mitigating vanishing and exploding gradients?

Answer: Training instability in neural networks, notably gradient explosion or vanishing, typically arises when the weight matrices encountered during backpropagation have high condition numbers. Ill-conditioned matrices scale gradients unevenly, amplifying them in some directions and attenuating them in others, which hinders effective learning. How Layer Normalization overcomes this and stabilizes training is briefly discussed here: https://lnkd.in/gmWm42tQ

#MachineLearning #DeepLearning #Transformers #LargeLanguageModels #LayerNormalization #GradientStability #AIResearch #DataScience #NeuralNetworks #AIInsights
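A rough NumPy sketch of this scaling effect (illustrative only; the depth, width, and scale factors below are arbitrary choices, not from the post). It pushes a unit gradient through a stack of layers whose weight spectra are scaled above or below the well-conditioned regime and watches the norm grow or shrink geometrically:

```python
# Illustrative sketch: how poorly scaled weight matrices amplify or shrink
# gradients as they are backpropagated through many layers.
import numpy as np

rng = np.random.default_rng(0)

def backprop_norm(scale, depth=30, dim=64):
    """Push a unit gradient through `depth` layers whose weights are scaled
    to have larger (scale > 1) or smaller (scale < 1) singular values."""
    grad = rng.standard_normal(dim)
    grad /= np.linalg.norm(grad)
    for _ in range(depth):
        W = scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
        grad = W.T @ grad          # chain rule: the gradient picks up W^T at each layer
    return np.linalg.norm(grad)

print("well-scaled :", backprop_norm(scale=1.0))   # stays roughly O(1)
print("exploding   :", backprop_norm(scale=1.5))   # grows geometrically with depth
print("vanishing   :", backprop_norm(scale=0.5))   # shrinks geometrically with depth
```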
• LayerNorm normalizes the activations within each layer, which stabilizes the network's dynamics by reducing sensitivity to extreme values.
• By keeping activations at zero mean and unit variance throughout the network, it counters vanishing gradients; by rescaling overly large layer outputs, it likewise tames exploding gradients.
• It yields a stable distribution of activations regardless of batch size or sequence length.
• It lets each weight update contribute more evenly to learning (a minimal forward-pass sketch follows this list).
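A minimal sketch of the LayerNorm forward pass in NumPy, illustrating the first two bullets. The `[tokens, features]` activation shape and the `gamma`/`beta` parameters are assumptions for the example, not taken from the post:

```python
# Minimal LayerNorm forward pass (NumPy sketch).
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Statistics are taken over the feature dimension of each token,
    # so the result is independent of batch size or sequence length.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

x = np.random.randn(4, 8) * 50 + 10           # activations with extreme scale
d = x.shape[-1]
y = layer_norm(x, gamma=np.ones(d), beta=np.zeros(d))
print(y.mean(axis=-1), y.var(axis=-1))        # ~0 and ~1 per token
```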
Praveen Kumar Pokala, PhD
Layer normalization standardizes the inputs within a layer, ensuring that activations maintain a consistent range across different layers and mini-batches. By doing so, it reduces internal covariate shift and keeps gradients in a stable range, indirectly addressing the vanishing and exploding gradient problem. An interesting point: applying layer normalization in all layers is common because it maximizes gradient stability and maintains consistency across the model. However, normalization can also be applied selectively; monitoring gradient variance across layers helps identify the specific layers where normalization will have the most impact (see the sketch after this comment). Selective normalization may be worth exploring in applications where computational constraints are a priority.
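One possible way to act on the "monitor gradient variance across layers" idea, sketched in PyTorch. The toy model, data, and the choice of reading variance off `param.grad` after a backward pass are assumptions for illustration, not a prescribed recipe:

```python
# Hypothetical sketch: inspect per-layer gradient variance to find layers that
# would benefit most from (selective) normalization.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))

x, target = torch.randn(32, 64), torch.randn(32, 1)
loss = F.mse_loss(model(x), target)
loss.backward()

# Layers with outlying gradient variance are candidates for LayerNorm
# when the compute budget does not allow normalizing everywhere.
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name:15s} grad var = {param.grad.var().item():.3e}")
```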
Video walkthrough of the answer: https://www.youtube.com/watch?v=Pe9tvgXPLoE