We are thrilled to announce our latest paper, now available on [arXiv](https://lnkd.in/grgvEftD)! 📚 🔬

Title: DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement

In this work, we tackle a crucial issue for speech-enhancement SNN solutions: latency. While neuromorphic hardware is designed for low latency, algorithms and implementations can still introduce significant delays. In speech enhancement, the Short-Time Fourier Transform (STFT), a common preprocessing step in frequency-domain approaches, is a major source of latency. Inspired by the success of high-performance, low-latency deep learning models, we developed a novel time-domain SNN framework that achieves the very low latency required by applications such as hearing aids.

Key contributions of our paper:

1. Innovative solution: We introduce a novel two-phase, time-domain streaming SNN framework that addresses latency while maintaining high accuracy and power efficiency.
2. Latency optimization: Traditional frequency-domain methods suffer latency from long analysis windows, such as 32 ms. Our time-domain approach significantly reduces this latency, meeting the stringent requirements of real-time applications such as hearing aids, which demand latencies under 5 ms (see the latency sketch below).
3. Competitive performance: Our framework not only reduces latency but also achieves performance competitive with current SNN models, pushing the boundaries of what's possible in speech enhancement.

Explore the full details of our work on [arXiv](https://lnkd.in/grgvEftD) and discover how these innovations advance the practical application of neuromorphic computing in this vital field. We look forward to your feedback and discussions!

#SpeechEnhancement #SNN #LatencyReduction #DeepLearning #NeuralNetworks #RealTimeProcessing #AI #TechInnovation #arXiv #HearingAids
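To make the latency arithmetic concrete, here is a minimal Python sketch (an illustration, not code from the paper; the 16 kHz sample rate and 4 ms block size are assumptions):

```python
# Buffering latency: how long a system must wait for input samples before it
# can produce any output. A frame-based (STFT) front end must wait for a full
# analysis window; a time-domain streaming model only waits for one block.

SAMPLE_RATE = 16_000  # Hz; a common rate for speech, assumed for illustration

def buffering_latency_ms(n_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Milliseconds of input that must be buffered before processing starts."""
    return 1_000 * n_samples / sample_rate

stft_window = 512       # a 32 ms analysis window, typical of frequency-domain systems
streaming_block = 64    # a hypothetical 4 ms time-domain block

print(f"STFT front end:    {buffering_latency_ms(stft_window):.1f} ms")      # 32.0 ms
print(f"Time-domain block: {buffering_latency_ms(streaming_block):.1f} ms")  # 4.0 ms
```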
More Relevant Posts
Abstract

Neural Matrix Synaptic Resonance Networks (NM-SRNs) represent a departure from traditional artificial neural network (ANN) architectures. This paper introduces the fundamental concepts underlying NM-SRNs, highlighting their unique architectural features and learning mechanisms. The core building blocks, Synaptic Resonance Vectors (SRVs) and Synaptic Resonance Tensors (SRTs), enable flexible representation and computation, while Neural Cubes (NCs) introduce modularity and parallelism. NM-SRNs show potential for Fast-Forward Learning (FFL/FFBL), distributed regularization to combat overfitting, and meta-learning for system-level optimization. An example application in image processing illustrates the potential of NM-SRNs for adaptable and efficient learning. Despite open challenges in theoretical formalization and large-scale optimization, NM-SRNs offer a promising path toward more powerful and flexible AI systems. This paper invites researchers to collaborate in overcoming these challenges and unlocking the full potential of this new approach.

Introduction

Every day that I see my favorite Die Hard actor suffering, I am reminded of why it is so important to never give up on this work. One day, with this technology and "dry" AGI (non-sentient) artificial neurons and synapses, we will be able to give people with degenerative brain diseases their lives back.

Neural Matrix Synaptic Resonance Networks (NM-SRNs) offer a novel approach to machine learning (ML) and artificial intelligence (AI), introducing a network architecture and learning mechanisms that differ significantly from traditional ANNs. This paper explores the fundamental concepts behind NM-SRNs, their advanced features, and the learning paradigms they employ.

New AI-related science paper: https://lnkd.in/gssZUGu4

#AI #Science #SRNs #NMSRN #deeplearning #neuralnetworks
Excited to share that our latest research paper, "Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising," has been accepted to APSIPA ASC 2024 and is now available on arXiv!

Abstract: We tackle the challenges of speech enhancement for real-time automatic speech recognition (ASR) in noisy, reverberant environments. While neural beamforming has been a go-to approach, it often struggles under mismatched conditions. Our work introduces a run-time adaptation technique for neural networks using pseudo ground-truth data generated by blind dereverberation and separation methods, such as Weighted Prediction Error (WPE) and Fast Multichannel Nonnegative Matrix Factorization (FastMNMF).

To improve robustness, we introduce an adaptive beamforming method: Weighted Power Minimization Distortionless Response (WPD) beamforming. By unifying WPE and Minimum Power Distortionless Response (MPDR) beamforming, our approach integrates dereverberation and denoising capabilities, enabling real-time fine-tuning. We evaluated this method across diverse settings, exploring the impact of run-time adaptation in scenarios with multiple speakers, varying reverberation times, and a range of signal-to-noise ratios (SNRs).

Check out the full paper on arXiv to learn more about our methodology, experiments, and insights: https://lnkd.in/gq5y_JqH

Proud to be sharing this work with the community at APSIPA ASC 2024!

#SpeechRecognition #ASR #MachineLearning #NeuralNetworks #Beamforming #DNN #Research #APSIPA2024
Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising (arxiv.org)
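For readers new to the components named above, here is a toy NumPy sketch of the classic MPDR beamformer that WPD generalizes. This is the textbook closed form for a single frequency bin, not the authors' run-time-adaptive implementation, and the covariance and steering vector below are synthetic:

```python
import numpy as np

def mpdr_weights(R: np.ndarray, d: np.ndarray) -> np.ndarray:
    """MPDR weights for one frequency bin.

    R: (M, M) spatial covariance of the observed mixture.
    d: (M,) steering vector toward the target speaker.
    Solves min_w w^H R w subject to w^H d = 1, giving w = R^{-1} d / (d^H R^{-1} d).
    """
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Toy usage with random multichannel data (M = 4 microphones).
rng = np.random.default_rng(0)
M = 4
X = rng.standard_normal((M, 1000)) + 1j * rng.standard_normal((M, 1000))
R = X @ X.conj().T / X.shape[1]             # sample spatial covariance
d = np.exp(-2j * np.pi * np.arange(M) / M)  # hypothetical steering vector
w = mpdr_weights(R, d)
enhanced = w.conj() @ X                     # beamformed output, shape (1000,)
print(abs(w.conj() @ d))                    # ~1.0: the distortionless constraint holds
```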
https://lnkd.in/gteWgQad A parallel flash-memory synaptic architecture for high-speed, real-time AI inference, approximating the function of the human brain.
Novel, parallel and differential synaptic architecture based on NAND flash memory for high-density and highly-reliable binary neural networks (researchgate.net)
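As background on why dense parallel memory arrays suit binary neural networks (a generic sketch, unrelated to the linked paper's circuit), a dot product between {-1, +1} vectors reduces to XNOR followed by a popcount:

```python
import numpy as np

def binarize(x: np.ndarray) -> np.ndarray:
    """Map real values to {-1, +1}, the standard BNN quantization."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dot(a_bits: np.ndarray, b_bits: np.ndarray) -> int:
    """Dot product of {-1, +1} vectors via XNOR + popcount.

    Encoding -1 as bit 0 and +1 as bit 1, the number of matching positions is
    popcount(XNOR(a, b)), and the dot product equals 2 * matches - n.
    """
    a = a_bits > 0
    b = b_bits > 0
    matches = int(np.sum(~(a ^ b)))     # XNOR, then popcount
    return 2 * matches - a_bits.size

rng = np.random.default_rng(1)
a = binarize(rng.standard_normal(128))
b = binarize(rng.standard_normal(128))
# Agrees with the full-precision dot product (cast to avoid int8 overflow):
assert binary_dot(a, b) == int(a.astype(np.int32) @ b.astype(np.int32))
```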
Datature is excited to unveil support for Vision Transformers (ViT) on our Nexus platform, advancing the capabilities of segmentation tasks with cutting-edge technology.

ViTs leverage self-attention mechanisms, which allow them to focus on different parts of an image and capture contextual relationships better than conventional convolutional neural networks (CNNs). This improves the model's ability to handle complex, high-resolution images, enhancing accuracy in critical applications such as medical imaging and defect detection.

Teams can now fine-tune ViTs on their own datasets on Datature Nexus 🌟

Read The Guide 👉 https://lnkd.in/gEgf5rZi
_____
#visiontransformers #transformers #selfattention #medicalimaging #instancesegmentation
Introducing Vision Transformers for Robust Segmentation (datature.io)
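As a refresher on the mechanism the post describes, here is a generic single-head scaled dot-product self-attention sketch in NumPy. It is illustrative only (identity query/key/value projections, random patch embeddings) and is not Datature's implementation:

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over patch embeddings.

    X: (n_patches, d) matrix of patch embeddings. For brevity the query, key,
    and value projections are the identity; a real ViT learns them.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise patch similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X                              # context-mixed embeddings

patches = np.random.default_rng(2).standard_normal((196, 64))  # e.g. a 14x14 patch grid
out = self_attention(patches)
print(out.shape)  # (196, 64): every patch embedding now carries global context
```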
Multi-Modal Multi-Channel American Sign Language Recognition

In this paper, we propose a machine-learning-based multi-stream framework to recognize American Sign Language (ASL) manual signs and nonmanual gestures (face and head movements) in real time from RGB-D videos. Our approach is based on 3D Convolutional Neural Networks (3D CNNs), fusing multi-modal features including hand gestures, facial expressions, and body poses from multiple channels (RGB, Depth, Motion, and Skeleton joints). To capture the overall temporal dynamics of a video, a proxy video is generated by selecting a subset of frames from each video; these proxy videos are then used to train the proposed 3D CNN model.

We collected a new ASL dataset, ASL-100-RGBD, which contains 42 RGB-D videos captured by a Microsoft Kinect V2 camera. Each video contains 100 ASL manual signs, along with the RGB channel, depth maps, skeleton joints, face features, and HD face. The dataset is fully annotated for each semantic region (i.e., the time span of each sign the human signer performs).

Our proposed method achieves 92.88% accuracy for recognizing 100 ASL sign glosses on our newly collected ASL-100-RGBD dataset. The effectiveness of our framework for recognizing hand gestures from RGB-D videos is further demonstrated on a large-scale dataset, ChaLearn IsoGD, achieving state-of-the-art results. By Elahe Vahdani, Ph.D., Longlong Jing, Matt Huenerfauth, and Yingli Tian.

To download for free, visit: https://lnkd.in/gEuxc5B9

#AmericanSignLanguagerecognition #handgesturerecognition #RGBDvideoanalysis #multimodality #3DConvolutionalNeuralNetworks #proxyvideo

Xiaorui Zhu Dong Xu Dr. Chi Wai (Rick) Lee Yu Sun
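The proxy-video step can be pictured as fixed-length temporal sampling. The sketch below uses uniform frame selection as a plausible stand-in; the paper's exact selection rule may differ:

```python
import numpy as np

def proxy_video(frames: np.ndarray, n_proxy: int = 32) -> np.ndarray:
    """Build a fixed-length proxy clip by sampling frames uniformly in time.

    frames: (T, H, W, C) array for a variable-length video.
    Returns (n_proxy, H, W, C), preserving the clip's overall temporal dynamics
    while giving the 3D CNN a constant-size input.
    """
    T = frames.shape[0]
    idx = np.linspace(0, T - 1, num=n_proxy).round().astype(int)
    return frames[idx]

clip = np.zeros((247, 112, 112, 3), dtype=np.uint8)  # hypothetical video of 247 frames
print(proxy_video(clip).shape)  # (32, 112, 112, 3)
```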
Day 10/20 #20DaysinAIwithAA

In the ever-evolving landscape of healthcare, the pursuit of accurate diagnoses, personalized treatment plans, and improved patient outcomes is a relentless endeavor. Enter the captivating realm of neural networks, a powerful branch of Artificial Intelligence (AI) that is revolutionizing the way we approach medical challenges. As I delve into this field, I am awe-inspired by the remarkable capacity of neural networks to mimic the neural pathways of the human brain, enabling them to tackle intricate medical problems with striking precision.

On this transformative journey, I have watched neural networks navigate the labyrinth of medical data, extracting invaluable insights and patterns that were once hidden from view. By leveraging advanced machine learning algorithms, neural networks can analyze vast troves of medical images, patient records, and clinical data, revealing critical correlations that empower healthcare professionals to make more informed decisions.

#GIT20DayChallenge #GITbootcamp #GirlsinTech #AfricaAgility #AfricanGirlsinTechBootcamp #GITBootCohort7
Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning
Andy Li, Aiden Durrant, Milan Markovic, Lu Yin, Georgios Leontidis
University of Aberdeen, 2024
https://lnkd.in/gUZ6PXjY

How to Shrink Neural Networks Without Losing Their Edge: A Breakthrough in Extreme Pruning

Imagine a world where your smartphone could run advanced AI models that today require supercomputers. This vision hinges on drastically shrinking deep neural networks while preserving their accuracy, a process known as pruning. Scientists have made strides in this area, but things get tricky at extreme sparsity levels, such as leaving just 0.01% of the original weights intact. At such extremes, networks often crumble, losing their ability to learn and perform effectively.

Enter Extreme Adaptive Sparse Training (EAST), a new approach to training neural networks under such challenging conditions. By combining several techniques, EAST keeps these ultralight models sharp and accurate even when reduced to a tiny fraction of their original size. Here's how it works:

1. Dynamic ReLU phasing: Neural networks use activation functions to decide which neurons "fire." EAST starts with a more flexible function, Dynamic ReLU, to give the network room to explore and adapt, before switching to a standard ReLU for stability.
2. Weight sharing: Instead of assigning unique parameters to every part of the network, EAST reuses them within layers, significantly reducing the memory footprint without losing expressiveness.
3. Cyclic sparsity: EAST periodically varies which parts of the network are pruned and by how much, keeping the network on its toes and encouraging it to find efficient pathways for learning (see the sketch after this post).

The researchers tested EAST on well-known architectures such as ResNet-34 and ResNet-50 using CIFAR-10, CIFAR-100, and ImageNet. The results were remarkable: even at a staggering 99.99% sparsity, EAST outperformed previous state-of-the-art techniques, showing that extreme pruning need not come at the cost of accuracy.

This innovation opens the door to powerful AI systems that can thrive on low-power, memory-constrained devices, from smartphones to edge computing in remote or resource-limited environments.

#AI #computervision #patternrecognition #deeplearning #ModelCompression #neuralnetworkpruning #EdgeAI #ResNet #dynamicrelu
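To ground the cyclic-sparsity idea from the list above, here is a small NumPy sketch of magnitude pruning driven by an oscillating sparsity target. The schedule shape and constants are assumptions for illustration, not the authors' code:

```python
import numpy as np

def cyclic_sparsity(step: int, period: int = 1000,
                    s_min: float = 0.98, s_max: float = 0.9999) -> float:
    """Sparsity target that oscillates between s_min and s_max once per period."""
    phase = 2 * np.pi * (step % period) / period
    return s_min + (s_max - s_min) * 0.5 * (1 - np.cos(phase))

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

W = np.random.default_rng(3).standard_normal((256, 256))
for step in (0, 250, 500):  # the target rises toward s_max across the half-period
    s = cyclic_sparsity(step)
    pruned = magnitude_prune(W, s)
    print(f"step {step}: target {s:.4f}, actual {(pruned == 0).mean():.4f}")
```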
Discover the future of AI with Neuromorphic Optical Neural Networks (Neuromorphic ONNs)! This groundbreaking fusion of optical neural networks (ONNs) and neuromorphic computing (NC) promises unparalleled speed, efficiency, and scalability: https://lnkd.in/gKmpZvQ7. Learn how ONNs leverage light for rapid data processing, while NC mimics the brain's neural architecture for parallel processing.

Stay updated on cutting-edge AI developments with the SuperDataScience weekly newsletter: https://lnkd.in/gtRktaY. Subscribe now to explore the future of AI.

#AI #NeuralNetworks #TechInnovation
Illuminating AI: The Transformative Potential of Neuromorphic Optical Neural Networks (unite.ai)
See how AI will empower sign language recognition at IJAIRR
At this point, most would agree that it is better to build a general foundation model and then fine-tune it for specific problems than to train from scratch. Nevertheless, it's quite funny when a scientist demonstrates this idea by showing how a foundation model trained mostly on cat videos, when fine-tuned to learn the dynamics of multiple heterogeneous physical systems, outperforms complex models trained from scratch.

The following is an amazing talk by Miles Cranmer, in which he shares other, more serious ideas such as interpretability via symbolic modeling and discovering new scientific theories using neural networks! https://lnkd.in/dyR9s34M
The Next Great Scientific Theory is Hiding Inside a Neural Network (simonsfoundation.org)
Comment (Director of Machine Learning, PhD, 6 months ago): Congratulations Tao Sun, Ph.D.!