Michael Canesche's paper, which describes the new kernel fusion algorithm used in the Cadence XNNC Tensor Compiler, has been accepted at the International Conference on Compiler Construction. Tensor compilers like XLA, TVM, and TensorRT operate on computational graphs, where vertices represent operations, and edges denote data flow between these operations. Operator fusion is an optimization technique that combines multiple operators into a single, more efficient operation. The paper "Fusion of Operators of Computational Graphs via Greedy Clustering: The XNNC Experience" introduces the operator fusion algorithm recently implemented in the Xtensa Neural Network Compiler (XNNC). XNNC is a toolchain designed for deploying machine learning models on Cadence's Tensilica processors. These edge-device processors are widely used in applications such as automotive systems, consumer electronics, communications, LiDAR, and radar technologies. First released in 2017 to complement Tensilica’s Vision 7 processors, XNNC has since evolved significantly. Now in version 3.0, its codebase spans hundreds of thousands of lines of C++ code. XNNC has been used to compile thousands of neural networks for a broad range of Xtensa architectures, and its design and implementation continue to advance, as this paper demonstrates. Read the paper: https://lnkd.in/dsQJtSkz #compilers #research #university #education #gradschool
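The core idea of greedy clustering for operator fusion can be illustrated with a toy sketch. This is not XNNC's actual algorithm (the paper describes the real one); it only shows the greedy idea on a simplified linear chain of operators, where cheap elementwise ops are absorbed into the cluster opened by the preceding heavy op. The `FUSIBLE` set and operator names are made up for illustration.

```python
# Toy sketch of greedy operator clustering, NOT XNNC's algorithm.
# Elementwise ops are cheap to fuse; heavy ops (conv, matmul)
# open a new cluster.
FUSIBLE = {"relu", "add", "mul", "bias"}

def greedy_fuse(ops):
    """Greedily group a linear chain of ops into fused clusters."""
    clusters = []
    for op in ops:
        # Absorb an elementwise op into the current cluster;
        # otherwise start a new cluster with this op.
        if clusters and op in FUSIBLE:
            clusters[-1].append(op)
        else:
            clusters.append([op])
    return clusters
```

A real fusion pass works on a DAG rather than a chain and must also check that merging two clusters does not create a cycle; the greedy flavor is the same.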
Compilers Lab’s Post
-
Hi all, I recently worked on a small but fun CV project to build a real-time drone detection system using the YOLO11n deep learning model. With a dataset from Roboflow (https://lnkd.in/eGHrCSfi), I trained the model on my NVIDIA GTX 1660 Super, leveraging CUDA and cuDNN for GPU acceleration.

Results:
- Achieved 90%+ mAP50 for drone detection accuracy.
- Processed live camera feeds at ~20 FPS.
- Successfully tested the model on videos and live streams with minimal false positives!

Challenges:
- Formatting the dataset for YOLO.
- Managing dependencies like PyTorch and CUDA.
- Balancing real-time performance with model accuracy.

Key Takeaways:
- Proper dataset structure and GPU acceleration were game-changers.
- YOLO11n’s lightweight architecture made real-time inference possible.

This project showcased the potential for surveillance, monitoring, and even edge deployment!
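The mAP50 metric mentioned above counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of the IoU computation (boxes as `(x1, y1, x2, y2)` corners; not tied to any particular framework's API):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For example, two 10x10 boxes overlapping on half their width have IoU 1/3, which would not count as a match at the 0.5 threshold.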
-
Hello everyone,

Today, we've decided to postpone our next demo presentation to delve into some remarkable images from IMEC showcasing High-NA EUV work for Logic and DRAM. A big thank you to Subhash KM for sharing these.

While we don't currently have customers utilizing High-NA EUV tools, I took the initiative to run these images through our Measurement Utility (YieldPro 5.1.1). My goal was to see how our software handles the latest metrology challenges.

In about 20 minutes, I was able to create a 22nm-pitch 2D feature recipe for full cell segmentation. It wasn’t too challenging, but certainly not trivial. I used some deep learning pre-filtering along with simple gradient-based contour extraction to perform the segmentation. It's worth noting that the image had high SNR, making the use of deep learning somewhat excessive, but it was an interesting exercise nonetheless.

I've attached a video showcasing the process - please take a look and share your thoughts!

#gazadelendaest #deeplearning #resolution #beam #EHAR #DeepStructures #metrology #Fab #problemsolving #LVTailoring #innovation #measurement #SecondaryElectronDetection #BackScatteredElectronDetection #CriticalDimentionMetrology #noisereduction #artificialneuralnetworks #CAD2SEM #Die2DB #EPE #PSD #precision #accuracy #lithography #tmu #NoiselessPSD #TargetDesign #ai #BlindDenoising #neuralnetworks #weave #MassMeas #defects #Overlay #labview #python #noisereduction #appliedmaterials #hitachi #kla
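Gradient-based contour extraction, as mentioned in the post, amounts to thresholding the local gradient magnitude of the intensity image. A minimal pure-Python sketch of that idea on a 2D grid (the real YieldPro pipeline is of course far more involved; this only illustrates the central-difference gradient step):

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude for a 2D intensity grid
    (list of equal-length rows). Border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def contour_mask(img, thresh):
    """Mark pixels whose gradient magnitude exceeds the threshold."""
    return [[v > thresh for v in row] for row in gradient_magnitude(img)]
```

On a high-SNR image like the one described, a fixed threshold already traces feature edges well, which is why the deep learning pre-filtering felt excessive.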
-
This is how Gaussian Splatting can leapfrog - big 🤘 #gaussiansplatting #3DGS #colmap
COLMAP-Free 3D Gaussian Splatting
UC San Diego, NVIDIA, UC Berkeley
CVPR 2024, Seattle ✨ Highlight ✨
page: https://lnkd.in/eNQMXtnS
arxiv: https://lnkd.in/ergUrMdP
video: https://lnkd.in/e363juUW
code: https://lnkd.in/exDrDdUJ
-
Expanding my skillset! Proud to share that I’ve completed the Image Processing Onramp course by MathWorks. Excited to apply these image processing techniques to real-world challenges. #MATLAB #ImageProcessing #Innovation #ContinuousGrowth
-
🚨 Latest group preprint: 🚨

FeNNol: an Efficient and Flexible Library for Building Force-field-enhanced Neural Network Potentials.

👉 : https://lnkd.in/enz4vrcb

A new #GPU-accelerated #opensource library for building, training, and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, making it easy to combine state-of-the-art embeddings with ML-parameterized physical interaction terms without explicit programming.

FeNNol shrinks the performance gap between ML potentials and standard force fields. It can be used standalone or via Deep-HP within Tinker-HP; heavy #HPC optimization is underway for multi-node/multi-GPU runs.

Available at https://lnkd.in/e4MCDRp2

Great work by Thomas Plé, Olivier ADJOUA, Louis Lagardère. Funding: European Research Council (ERC) (project EMC2). Supercomputer time: GENCI.

#drugdesign #NeuralNetworks #GPU #supercomputing #HPC NVIDIA #machinelearning Sorbonne Université CNRS
-
🚨 CVPR 2024 Paper Alert 🚨

➡️ Paper Title: 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

🌟 Few pointers from the paper:

🎯 To achieve real-time dynamic scene rendering with high training and storage efficiency, the authors propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes, rather than applying 3D-GS to each frame individually.

🎯 4D-GS introduces a novel explicit representation containing both 3D Gaussians and 4D neural voxels.

🎯 A decomposed neural voxel encoding algorithm inspired by HexPlane efficiently builds Gaussian features from the 4D neural voxels; a lightweight MLP then predicts Gaussian deformations at novel timestamps.

🎯 4D-GS achieves real-time rendering at high resolutions: 82 FPS at 800×800 on an RTX 3090 GPU, while maintaining comparable or better quality than previous state-of-the-art methods.

🏢 Organization: Huazhong University of Science and Technology, Huawei Inc.
🧙 Paper Authors: Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang
1️⃣ Read the Full Paper here: https://lnkd.in/gxKg-5xn
2️⃣ Project Page: https://lnkd.in/gxMKFBas
3️⃣ Code: https://lnkd.in/gGvwsSCw
🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊
Music by Sergio Prosvirini from Pixabay

Find this valuable 💎? ♻️ REPOST and teach your network something new. Follow me, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#cvpr2024
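The "lightweight MLP predicting Gaussian deformations" step can be sketched as a tiny one-hidden-layer network mapping a sampled 4D voxel feature to a 3D position offset. This is only an illustration of the data flow, with placeholder weights; the real 4D-GS deformation head is learned and also predicts rotation and scale changes.

```python
import math

def mlp_deform(feat, w1, b1, w2, b2):
    """Tiny 1-hidden-layer MLP: voxel feature -> (dx, dy, dz) offset.

    feat: feature vector sampled from the 4D voxel grid at (x, y, z, t).
    w1/b1, w2/b2: hidden and output layer weights (placeholders here;
    in 4D-GS these are trained end-to-end).
    """
    hidden = [math.tanh(sum(w * f for w, f in zip(row, feat)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]
```

At render time, each Gaussian's canonical mean is shifted by the predicted offset for the queried timestamp, which is what makes a single set of Gaussians cover a whole dynamic sequence.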
-
Sharing my implementation of the Neural Radiance Fields (NeRF) model! NeRF reconstructs and renders 3D scenes from a few 2D images, opening up new possibilities in 3D visualization. In this project, I utilized GPU acceleration with CUDA and PyTorch, and it was a great learning experience.

Special thanks to Maxime Vandegar and Quei-An Chen for their insightful repositories, which greatly assisted this implementation.

You can refer to these papers for NeRF:
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis: https://lnkd.in/duUz5qJd
NeRF: https://lnkd.in/d5-jZdD6

Or check out the project on my GitHub for more details and to explore the code: https://lnkd.in/dVJH27xR

Note: this model takes approximately 8 hours to train on an RTX 2080 Ti GPU (also the reason for my keyboard interrupt).

#MachineLearning #NeRF #CUDA #PyTorch #ResearchImplementation #CNN #deeplearning
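A small, self-contained piece of NeRF worth highlighting is the positional encoding from the original paper: each input coordinate is lifted to a vector of sines and cosines at exponentially growing frequencies so the MLP can represent high-frequency scene detail. A minimal scalar-coordinate sketch (real implementations apply this per component of position and view direction, typically as a batched tensor op):

```python
import math

def positional_encoding(x, num_freqs=4):
    """NeRF-style positional encoding of a scalar coordinate:
    [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0 .. num_freqs-1."""
    out = []
    for k in range(num_freqs):
        freq = (2 ** k) * math.pi  # frequencies double at each level
        out.append(math.sin(freq * x))
        out.append(math.cos(freq * x))
    return out
```

Without this encoding, the MLP tends to produce blurry reconstructions, since plain coordinates bias it toward low-frequency functions.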
-
Despite extensive research on jamming attacks, the potential of machine learning for amplifying the threat of such attacks, or our ability to mitigate them, remains untapped. A key obstacle to this kind of research has been the absence of a suitable framework.

To resolve this obstacle, we released PyJama, a fully-differentiable open-source library that adds jamming and anti-jamming functionality to NVIDIA Sionna. The accompanying paper, which will be presented at SPAWC 2024, demonstrates the utility of PyJama (i) for realistic MIMO simulations, with examples involving forward error correction, OFDM waveforms in the time and frequency domain, realistic channel models, and mobility; and (ii) for learning to jam. Specifically, we use stochastic gradient descent to optimize jamming power allocation over an OFDM resource grid. The learned strategies are non-trivial, intelligible, and effective.

PyJama was developed by Fabian Ulbricht during his Master's thesis in our research group, supervised by Gian Marti and Reinhard W.. The paper is co-authored by Fabian Ulbricht, Gian Marti, Reinhard Wiesmayr, and myself.

A preprint of our paper is available on arXiv https://lnkd.in/dwDi3xHn, and the code is available on GitHub https://lnkd.in/ds9BGQhP. Please also check out the PyJama project website http://pyjama.ethz.ch!
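The idea of gradient-based optimization of a jamming power allocation can be shown with a toy example. This is not PyJama's API (PyJama is built on Sionna/TensorFlow and optimizes through a full differentiable link simulation); here the per-subcarrier "damage" is a made-up log(1 + p·g) objective with invented channel gains, and the power budget is enforced by rescaling after each gradient step.

```python
# Toy sketch (NOT PyJama's API): gradient ascent on jamming power
# allocation across OFDM subcarriers, with a made-up objective
# sum_i log(1 + p_i * g_i) and a total power budget.

def optimize_allocation(gains, budget=1.0, lr=0.05, steps=200):
    n = len(gains)
    p = [budget / n] * n  # start from a uniform allocation
    for _ in range(steps):
        # d/dp_i log(1 + p_i * g_i) = g_i / (1 + p_i * g_i)
        grad = [g / (1.0 + pi * g) for pi, g in zip(p, gains)]
        p = [max(0.0, pi + lr * gi) for pi, gi in zip(p, grad)]
        total = sum(p)
        p = [pi * budget / total for pi in p]  # rescale onto the budget
    return p

alloc = optimize_allocation([2.0, 1.0, 0.1])
```

Even this toy version learns the intuitive strategy: power concentrates on subcarriers with stronger jammer-to-victim channel gains, which mirrors the "intelligible" learned strategies the paper reports.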
-