“Fast learner, hard worker, and supportive colleague are the terms that come to mind when I think about Hassen. I had the pleasure of working with him as his manager for a year and a half, during which he worked on several NLP and machine learning projects. As a junior, Hassen proved himself a strong technical fit here at Allianz: he displayed great talent with tools such as R, Python, and several data visualization applications. It isn't just his technical skills that impress me, however. Hassen was a joy to work with because of his amazingly positive attitude and his eagerness to meet deadlines. I highly recommend him!”
Hassen Bazarbacha
Greater Paris Area
2K followers
500+ connections
Experience
Education
Licenses & certifications
Languages
- French: Native or bilingual proficiency
- English: Full professional proficiency
- Arabic: Native or bilingual proficiency
- German: Limited working proficiency
Recommendations received
1 person has recommended Hassen
Other similar profiles
- Hamza FILALI
United Kingdom
- Codreanu Dana
Bucharest
- Souhail Toumdi
Machine Learning Engineer @ Criteo
Paris
- Cristian David Rodriguez
Greater Paris Area
- Wissem Hamidou
Paris
- Tram Ngoc DINH
Data Scientist
Paris
- Maria Cane
Paris
- Fatima Ezzahrae Malki
Paris
- Aladin SABBAGH
Cachan
- Michael MOUTEI
Levallois-Perret
- Hicham KERKRI
France
- fatima gasmi
France
- Osheen Mohan
INSEAD MBA'25D Candidate
Berlin
- Yuan Zhang
Tokyo, Japan
- Oumaima Sabir
Data Scientist at AXA
Paris
- Ali Bahou
Paris
- Abhinaba Banerjee
Python | AI Learner | MSc from IESEG, France | Technical Writer | Ex-Operanka
Vijayawada Centrale
- Alexandre DO
Greater Paris Area
- Victor Journé
Data Scientist at the French Ministry of the Interior
Greater Paris Area
- Marine Gosselin
Tech advocate
France
Discover more posts
-
Sauradeep Debnath
Hi everyone, I wanted to share my brief write-up about the basic optimization algorithms used in deep learning. I start with the convex optimization background in which these algorithms were conceived. Then, following the amazing ML book by Shai Shalev-Shwartz and Shai Ben-David, I discuss the intuitive interpretation of gradient descent and SGD. Lastly, I do a deep dive into the absolutely amazing ICML 2013 paper by Ilya Sutskever and Hinton (“On the importance of initialization and momentum in deep learning”), which gives a wonderful reinterpretation of the classical momentum algorithm, re-derives the Nesterov Accelerated Gradient (NAG, 1983) as a form of momentum, and reports their experiments on how these algorithms, SGD, and second-order approximate Newton methods fare for deep learning and how they deal with high curvature. Overall, this blog is meant as a refresher on the basics before we dive into the newer, faster algorithms like Adam and other adaptive methods in the next article. I would like to thank my professors at the Department of CSE, IIT Hyderabad, especially Vineeth N Balasubramanian Sir, who taught us two courses (ML and deep learning), and Saketha Nath Jagarlapudi Sir, who taught us convex optimization theory and convex optimization algorithms, without whom I wouldn't have the foundations needed to undertake this exploration. #deeplearning #optimization #gradientdescent #sgd #momentum #nesterov #nag #hinton #paperreview
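To make the two momentum variants concrete, here is a minimal numpy sketch of the classical momentum and NAG updates discussed in that paper; the quadratic objective and all hyperparameter values are illustrative assumptions, not taken from the write-up.

```python
# Classical momentum vs. Nesterov Accelerated Gradient (NAG) on a toy quadratic.
import numpy as np

A = np.diag([1.0, 100.0])          # ill-conditioned quadratic: high curvature on one axis
grad = lambda w: A @ w             # gradient of f(w) = 0.5 * w^T A w

def momentum_step(w, v, lr=0.01, mu=0.9):
    # classical momentum: accumulate velocity from the gradient at w, then step
    v = mu * v - lr * grad(w)
    return w + v, v

def nag_step(w, v, lr=0.01, mu=0.9):
    # NAG: evaluate the gradient at the look-ahead point w + mu * v
    v = mu * v - lr * grad(w + mu * v)
    return w + v, v

w_m = w_n = np.array([1.0, 1.0]); v_m = v_n = np.zeros(2)
for _ in range(100):
    w_m, v_m = momentum_step(w_m, v_m)
    w_n, v_n = nag_step(w_n, v_n)
print("momentum:", w_m, "NAG:", w_n)   # NAG typically oscillates less along the stiff axis
```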
14
2 comments -
Paul Iusztin
You must know these 𝟯 𝗺𝗮𝗶𝗻 𝘀𝘁𝗮𝗴𝗲𝘀 of 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗮𝗻 𝗟𝗟𝗠 to train your own 𝗟𝗟𝗠 on your 𝗽𝗿𝗼𝗽𝗿𝗶𝗲𝘁𝗮𝗿𝘆 𝗱𝗮𝘁𝗮. # 𝗦𝘁𝗮𝗴𝗲 𝟭: 𝗣𝗿𝗲𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗳𝗼𝗿 𝗰𝗼𝗺𝗽𝗹𝗲𝘁𝗶𝗼𝗻 You start with a bare, randomly initialized LLM. This stage aims to teach the model to spit out tokens. More concretely, based on previous tokens, the model learns to predict the next token with the highest probability. For example, your input to the model is "The best programming language is ___", and it will answer, "The best programming language is Rust." Intuitively, at this stage, the LLM learns to speak. 𝘋𝘢𝘵𝘢: >1 trillion tokens (~15 million books). The data quality doesn't have to be great. # 𝗦𝘁𝗮𝗴𝗲 𝟮: 𝗦𝘂𝗽𝗲𝗿𝘃𝗶𝘀𝗲𝗱 𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴 (𝗦𝗙𝗧) 𝗳𝗼𝗿 𝗱𝗶𝗮𝗹𝗼𝗴𝘂𝗲 You start with the pretrained model from stage 1. This stage teaches the model to respond to the user's questions. For example, without this step, when prompted, "What is the best programming language?", it has a high probability of creating a series of questions such as: "What is MLOps? What is MLE? etc." As the model mimics the training data, you must fine-tune it on Q&A data to align the model to respond to questions instead of merely predicting the following tokens. After the fine-tuning step, it will respond to "What is the best programming language?" with "Rust". 𝘋𝘢𝘵𝘢: 10K - 100K Q&A examples 𝘕𝘰𝘵𝘦: After aligning the model to respond to questions, you can further single-task fine-tune the model to specialize the LLM for a specific use case. # 𝗦𝘁𝗮𝗴𝗲 𝟯: 𝗥𝗟𝗛𝗙 Demonstration data tells the model what kind of responses to give but doesn't tell the model how good or bad a response is. The goal is to align your model with user feedback (what users liked or didn't like) to increase the probability of generating answers that users find helpful. 𝘙𝘓𝘏𝘍 𝘪𝘴 𝘴𝘱𝘭𝘪𝘵 𝘪𝘯 2: 1. Using the LLM from stage 2, train a reward model to act as a scoring function using (prompt, winning_response, losing_response) samples (= comparison data). The model will learn to maximize the difference between these 2. After training, this model outputs rewards for (prompt, response) tuples. 𝘋𝘢𝘵𝘢: 100K - 1M comparisons 2. Use an RL algorithm (e.g., PPO) to fine-tune the LLM from stage 2. Here, you will use the reward model trained above to give a score for every (prompt, response). The RL algorithm will align the LLM to generate responses with higher rewards, increasing the probability of generating answers that users liked. 𝘋𝘢𝘵𝘢: 10K - 100K prompts #machinelearning #mlops #datascience 💡 Follow me for daily content on production ML and MLOps engineering.
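As a concrete companion to step 1 of stage 3, here is a minimal PyTorch sketch (my own illustration, not the author's code) of the pairwise reward-model objective that maximizes the score gap between winning and losing responses; the embedding inputs and model shape are dummy assumptions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    # stand-in scorer: in practice this is the stage-2 LLM with a scalar head
    def __init__(self, hidden=16):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, prompt_response_embedding):
        return self.score(prompt_response_embedding).squeeze(-1)

model = RewardModel()
win = torch.randn(8, 16)    # embeddings of (prompt, winning_response) -- dummy data
lose = torch.randn(8, 16)   # embeddings of (prompt, losing_response)  -- dummy data

# pairwise (Bradley-Terry) loss: -log sigmoid(r_win - r_lose)
loss = -torch.nn.functional.logsigmoid(model(win) - model(lose)).mean()
loss.backward()
print(loss.item())
```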
128
4 comments -
Giorgio Lazzarinetti, Ph.D.
These days I’ve been in Montpellier, France, to present our latest research results on combinatorial optimization problems over temporal graphs at the 31st Symposium on Temporal Representation and Reasoning (TIME 2024). Check out our paper at https://lnkd.in/da66ref4 Here we discuss a new heuristic-based approach to minimum timeline cover over temporal graphs. The paper provides theoretical results for a problem widely adopted in social network analysis as an interpretable network summarization framework for finding important events. In particular, we used it to analyze job postings, to identify the skills companies require most and to understand how the market is evolving. We are already working on evolutions of this approach with deep learning. Stay tuned for upcoming publications. #knowledgegraph #combinatorialoptimization #networksummarization
15
1 comment -
Trevis Litherland
Causality: Chapter 2 – Post 2 ...But what if there are hidden (latent) variables? Pearl extends the IC algorithm to the latent-structure case (the IC* algorithm). The resulting “marked diagram” outputs have four different types of edges (described in further detail in the next section): 1) “Marked” directed edges (genuine causality) 2) Unmarked directed edges (potential causality) 3) Bi-directed edges (spurious association) 4) Undirected edges (unknown relationship) The notion of genuine causation becomes a little simpler once he introduces temporal information, and Pearl explores temporality further via the intriguing topic of “statistical time.” The chapter ends with a defense of the approach described above, in both practical and philosophical terms. Whereas minimality is relatively uncontroversial, DAGs’ inherently Markovian structure and Pearl’s notion of stability have both been challenged. Pearl finds these challenges answerable, and so it seems to me. I very much enjoyed this chapter. After all, it’s a very natural question to ask when you begin playing with diagrams: “What sort of DAGs should I be writing down, based on this probability distribution?” While the IC* algorithm doesn’t always provide a unique answer, it does provide a very clear picture of which causal relationships we can know “for certain” (genuine causation), which ones we can feel good about (potential causation), which ones involve a latent variable (spurious association), and which ones exhibit a dependence we can’t elucidate further. Overall, this is quite an achievement, one I’m still pondering as I move on to Chapter Three.
-
David Hundley
New blog post! One of the most popular applications of generative AI is retrieval-augmented generation (RAG), a mechanism for incorporating your own data. Given that we want our RAG pipelines to remain performant and effective over time, this post takes a deep dive into Ragas, a framework for performing this evaluation. In this post, we cover how to synthesize a dataset to experiment with Ragas, how to calculate the Ragas metrics in Python code, and how each of the individual metrics is calculated “under the hood.” #RAG #GenAI #AI #Ragas #DataScience https://lnkd.in/gsvJrCj9
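For readers who want a feel for what such an evaluation looks like, here is a hedged sketch of scoring a tiny RAG sample with Ragas; the exact imports and column names vary across Ragas versions, and the metrics themselves call an LLM judge, so an API key is assumed.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# a toy evaluation set: question, generated answer, retrieved contexts
data = Dataset.from_dict({
    "question": ["What is RAG?"],
    "answer": ["RAG augments an LLM with retrieved documents."],
    "contexts": [["Retrieval-augmented generation (RAG) grounds LLM answers in retrieved text."]],
})

# each metric is computed by an LLM judge under the hood, so credentials are required
result = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(result)
```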
19
1 comment -
Sauradeep Debnath
Hey everyone, in my previous article I covered the basics of gradient descent, SGD, momentum, and NAG. Here we cover a new set of optimizers, called adaptive learning algorithms. For the first algorithm, ADAGRAD (2011), we first discuss the more mathematical presentation from the original paper and then give a simpler, more intuitive explanation from the Adadelta (2012) paper, which tries to fix ADAGRAD's issue of prematurely slowing down. Next, we do an in-depth coverage of RProp (1993) and how RMSProp (2012) was created to adapt RProp to stochastic settings. We then discuss Adam (2015) in detail. Adam has been one of the most popular optimization algorithms, but what we tend to overlook is that, while it can accelerate the initial phases of training (in terms of lowering the loss), its generalization capabilities are often far worse than those of SGD+Momentum. There have been multiple attempts to improve its shortcomings, like Nadam (2016), AMSGRAD (2018), ADABOUND and AMSBOUND (2019), RAdam (2019/2020), and Adan (2023/2024); we go through them very briefly. Last but not least, we do a deep dive into the highly successful AdamW (2017/19) algorithm, which is very popular in LLM training these days. #deeplearning #optimization #rmsprop #Adam #AdamW
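To anchor the discussion, here is a minimal numpy sketch of the Adam update rule from Kingma & Ba (2015), with AdamW's decoupled weight decay noted inline; the toy objective and hyperparameter values are my own assumptions.

```python
import numpy as np

def adam(grad, w, steps=1000, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.0):
    m = np.zeros_like(w)   # first-moment (mean) estimate
    v = np.zeros_like(w)   # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat = m / (1 - b1**t)          # bias correction for the zero init
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        if wd:                           # AdamW: weight decay applied directly to the
            w = w - lr * wd * w          # weights, decoupled from the adaptive step
    return w

print(adam(lambda w: 2 * w, np.array([3.0, -2.0])))  # minimizes ||w||^2, heads toward 0
```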
10
1 comment -
Ali Nemati
🚀 Mistral AI: Pixtral Large and Le Chat Platform Introducing Pixtral Large: Mistral AI, the French AI startup, has launched Pixtral Large, a groundbreaking 124-billion parameter multimodal model designed to process both text and visual data. - 🧠 Core Architecture: Combines a 123-billion parameter text decoder with a 1-billion parameter vision encoder. - 📖 Expanded Context Window: Supports up to 128K tokens, enabling it to handle 30 high-resolution images or a 300-page book. - 🏆 Benchmark Leader: Outperforms GPT-4o, Gemini 1.5 Pro, and Claude-3.5 Sonnet on key benchmarks like MathVista (69.4% accuracy), ChartQA, and DocVQA. Le Chat Platform: Unlocking New Possibilities Mistral’s Le Chat platform integrates Pixtral Large, introducing innovative features for diverse applications: - 🔍 Web Search with Citations: Real-time data retrieval with source transparency. - 🖋️ Canvas: Tools for live document creation, collaborative editing, and version control. - 📄 Advanced OCR: Efficiently processes PDFs, tables, and equations to extract insights. - 🎨 Image Generation: Powered by Flux Pro from Black Forest Labs. - 🤖 Task Agents: Automates tasks like summarization and invoice processing. 🌟 Performance Highlights: Pixtral Large sets new standards on benchmarks like ChartQA, DocVQA, and VQAv2, showcasing cutting-edge capabilities in document analysis and visual data interpretation. The model is available under the Mistral Research License for non-commercial use, with commercial licenses for enterprise adoption. 📢 Additional Updates: Mistral AI also unveiled Mistral Large 2.1, a 123-billion parameter model optimized for general-purpose tasks, along with enhanced APIs to streamline developer integration.
6
-
Ubajaka Chijioke
[11/42] My second project idea for @_buildspace: USE AI TO SOLVE MATH QUESTIONS EFFECTIVELY. Current AI models are not very effective at solving difficult math problems; their accuracy is not up to 90%. We want to build this. Excited about the challenge! Please give your feedback below: @FarzaTV @_nightsandweekends
5
2 comments -
Ganesh Jagadeesan
This is an exciting comparison! 🚀 Both Claude 3.5 Sonnet and o1 pro represent cutting-edge advancements in AI, showcasing unique strengths tailored for different use cases. Let’s dive into the details: 🤖 Versatility in Task Handling Claude 3.5 Sonnet is a powerhouse for creative tasks, everyday coding, and long-form conversations. Its larger context window makes it ideal for handling extensive documents and multi-turn dialogues, offering speed and flexibility for general-purpose AI needs. 🔍 Advanced Reasoning with o1 pro o1 pro, on the other hand, is designed for deep reasoning and multi-step problem-solving. It excels in complex areas like advanced mathematics, scientific research, and intricate programming challenges, delivering precise and well-reasoned outputs. 💡 Specialized Strengths 1️⃣ Coding and Debugging: Claude is efficient for straightforward coding, but o1 pro’s advanced reasoning skills make it a better fit for challenging algorithmic tasks. 2️⃣ Mathematical Problem-Solving: o1 pro leads here, thanks to its ability to handle complex calculations and logical sequences. 3️⃣ Image-Based Reasoning: While Claude is capable, o1 pro provides enhanced performance, particularly for tasks requiring visual and contextual understanding. 🌟 Scenarios for Each Model Claude 3.5 Sonnet is perfect for content creation, marketing, and lightweight coding. Meanwhile, o1 pro shines in research-heavy environments, academic explorations, and tasks demanding analytical depth. Choosing between them depends on your specific requirements. 🔧 Innovation in AI Models The integration of advanced capabilities like image-based reasoning and multi-level problem-solving in o1 pro signals a step forward in AI evolution. These models are shaping the future of AI applications across industries. Thank you for sharing this comparison! 🙌 It’s exciting to see how models like Claude and o1 pro cater to diverse needs, pushing the boundaries of what’s possible with AI. #ClaudeAI #o1Pro #AIComparison #AdvancedAI #OpenAI #AIInnovation #FutureOfAI
3
-
Kumari Sweta
I'm excited to share my latest project: a 𝗥𝗔𝗚 𝗖𝗵𝗮𝘁𝗯𝗼𝘁 𝗹𝗲𝘃𝗲𝗿𝗮𝗴𝗶𝗻𝗴 𝗯𝗼𝘁𝗵 𝗔𝗺𝗮𝘇𝗼𝗻 𝗕𝗲𝗱𝗿𝗼𝗰𝗸 𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗯𝗮𝘀𝗲/𝗦𝟯 𝗯𝘂𝗰𝗸𝗲𝘁 and user-uploaded documents to provide accurate, context-aware responses. It also 𝗽𝗿𝗼𝘃𝗶𝗱𝗲𝘀 𝘁𝗵𝗲 𝘀𝗼𝘂𝗿𝗰𝗲𝘀 𝗼𝗳 𝘁𝗵𝗲 𝗿𝗲𝘀𝗽𝗼𝗻𝘀𝗲 and 𝗺𝗮𝗶𝗻𝘁𝗮𝗶𝗻𝘀 𝗰𝗵𝗮𝘁 𝗵𝗶𝘀𝘁𝗼𝗿𝘆. 🔍 Key Features: - 𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗨𝗽𝗹𝗼𝗮𝗱 𝗖𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝘆: Support for PDF and TXT files allows users to enrich the chatbot's knowledge base with personal or proprietary documents. - 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 𝗠𝗲𝗰𝗵𝗮𝗻𝗶𝘀𝗺: Utilizes FAISS for vector-based similarity searches. - 𝗦𝗼𝘂𝗿𝗰𝗲𝘀 𝗳𝗼𝗿 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗲𝘀: Ensures transparency and reliability by providing references to the original sources from which information was retrieved. This feature allows users to verify and delve deeper into the provided answers. - 𝗖𝗵𝗮𝘁 𝗛𝗶𝘀𝘁𝗼𝗿𝘆 𝗠𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁: Maintains a well-organized chat history. - 𝗖𝗹𝗲𝗮𝗿 𝗖𝗵𝗮𝘁 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝗮𝗹𝗶𝘁𝘆: Users can easily reset their conversation history. 🛠️ 𝗗𝗶𝗿𝗲𝗰𝘁 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗕𝗮𝘀𝗲 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗶𝗼𝗻 𝗘𝘅𝗽𝗹𝗮𝗶𝗻𝗲𝗱: One of the standout features of my RAG Chatbot is its ability to dynamically retrieve and integrate information directly from the Amazon Bedrock knowledge base. Here's how it works: 𝗤𝘂𝗲𝗿𝘆 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴: When a user submits a question, the system first converts the query into a high-dimensional vector using Amazon Bedrock's fully managed bedrock-runtime Embedding API. Leveraging this fully managed API ensures seamless scalability and reliability without the need for manual infrastructure management. 𝗩𝗲𝗰𝘁𝗼𝗿-𝗕𝗮𝘀𝗲𝗱 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹: This vector is then used to perform a similarity search against the knowledge base stored in FAISS, identifying the most relevant documents or passages based on cosine similarity. This is achieved through the 𝗯𝗲𝗱𝗿𝗼𝗰𝗸_𝗮𝗴𝗲𝗻𝘁_𝗿𝘂𝗻𝘁𝗶𝗺𝗲.𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗲 method, which takes the 𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲𝗕𝗮𝘀𝗲𝗜𝗱 and a 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹𝗤𝘂𝗲𝗿𝘆 containing the user's question. By utilizing this method, the chatbot efficiently fetches pertinent information directly from the knowledge base, thereby enhancing performance and reducing latency. 𝗖𝗼𝗻𝘁𝗲𝘅𝘁𝘂𝗮𝗹 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗲 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻: The retrieved information serves as contextual input for the large language model (LLM), specifically Amazon Bedrock's fully managed Titan-TG1 Large LLM service, which generates a coherent and accurate response tailored to the user's query. If you're interested in seeing this in action or exploring collaboration opportunities, feel free to reach out or try it out yourself! 🔗 Live App: https://lnkd.in/gVEJzM9c 🔗 Github: https://lnkd.in/gKFpjYwB References: https://lnkd.in/gJ88fwwS #GenerativeAI #RAG #AmazonBedrock #FAISS #LangChain #Innovation #NLP #MachineLearning
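The retrieval call described above maps to the boto3 bedrock-agent-runtime client; here is a minimal hedged sketch of that step (the knowledge base ID, query text, and result count are placeholder assumptions, not values from the project).

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve(
    knowledgeBaseId="YOUR_KB_ID",                      # placeholder ID
    retrievalQuery={"text": "What does the refund policy say?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 4}},
)

# each result carries the retrieved passage and its source location,
# which is what lets the chatbot cite its sources
for result in response["retrievalResults"]:
    print(result["content"]["text"], result.get("location"))
```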
47
4 comments -
Sauradeep Debnath
Hi everyone, in this video I present some basic first- and second-order optimization methods (SGD, Newton, momentum, Nesterov) and the challenges they face in deep learning. I talk about the important academic papers and intuitions from Ian Goodfellow and Bengio's book and Christopher Bishop and Hugh Bishop's book, as well as Stanford CSC421/2516 class notes, which discuss the complex attributes of deep learning loss functions such as local minima, saddle points, and plateaus. In contrast to my DL blogs, which focus more on the implementation of different algorithms, here I try to *motivate why we need those complex algorithms in the first place*. Here is the list of papers discussed in detail: 1. 2014 NIPS – “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization”, co-authored by Yoshua Bengio 2. 2014 JMLR – “The Loss Surfaces of Multilayer Networks”, co-authored by Yann LeCun 3. 2017 NIPS – “Visualizing the Loss Landscape of Neural Nets” Feel free to watch at 1.25X or 1.5X speed if required; I speak a bit slowly. Instead of giving vague intuitions, I tried to give some *mathematical perspective*. With that goal in mind, I also give a brief refresher on Hessians, eigenvalues, Taylor series, and spectral decomposition, as and when we need their intuition, to help us clearly understand concepts like why Newton's method does well on plateaus and why SGD learning rates need a lot of fine-tuning and depend on the maximal eigenvalue of the Hessian. This way, I try to motivate the need for momentum and adaptive learning methods (like RMSProp and Adam). Specifically, I spend the last 15 minutes explaining the mathematical derivations from the Stanford class notes, showing why the algorithm will diverge if the learning rate is too big, and I even show some examples in Excel with data points to explain the deck. In retrospect, I could also have done this using diagrams like many other resources do, but I totally loved the approach the Stanford class notes took, so that's fine I guess. Please note, while the above is one motivation behind adaptive learning, which, as I cover in part 2 of my DL blog, helps make good progress in the initial part of training (in terms of lowering the training cost fast), I forgot to mention another, which appears in the ADAGRAD paper: taking inspiration from the concept of TF-IDF, they theorized that sparse features have more predictive power, hence features that are sparse need higher learning rates. For a more in-depth implementation of these algorithms, please check out my blogs on deep learning. #deeplearning #lossfunction #optimization #gradientdescent #sgd #momentum #adaptivelearning #learningrate #math #linearalgebra #hessian #eigenvalue #spectraldecomposition #taylorseries
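The learning-rate point is easy to verify numerically: for a quadratic whose Hessian has largest eigenvalue lambda_max, plain gradient descent diverges once the step size exceeds 2 / lambda_max. A small worked example (my own illustration, with assumed numbers):

```python
# GD on f(w) = 0.5 * lam_max * w^2 contracts w by (1 - lr * lam_max) each step,
# so it diverges as soon as lr > 2 / lam_max.
lam_max = 100.0                        # curvature along the sharpest direction
f_grad = lambda w: lam_max * w         # gradient of the 1-D quadratic

for lr in (0.019, 0.021):              # the threshold here is 2 / lam_max = 0.02
    w = 1.0
    for _ in range(50):
        w -= lr * f_grad(w)
    print(f"lr={lr}: |w| after 50 steps = {abs(w):.3e}")
# lr just below the threshold shrinks |w|; just above it, |w| blows up
```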
18
2 comments -
Kodjo Mawuena Amekoe
Our latest preprint on adaptive ML is now available on arXiv 💥! In this paper, entitled “Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection”, we question the superiority of instance-incremental algorithms (in terms of predictive performance and interpretability) once we take real-world conditions into account, such as label/feedback delay and class imbalance 😑. Our results on real-world fraud data (as well as generated data widely used in academia) clearly invite researchers to rethink the design and evaluation of supervised incremental machine learning algorithms, with the aim of narrowing the gap between pure academic research and industrial applications (for example, fraud detection). link: https://lnkd.in/eJdQYgQM code: https://lnkd.in/eWrRvHP9 Thanks to my PhD advisors and co-authors Mustapha Lebbah, Hanene Azzag, Grégoire Jaffre and Zaineb Chelly Dagdia. #FraudDetection #DataStreams #IncrementalLearning
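To illustrate the delayed-label setting the paper studies, here is a hedged sketch (not the paper's code) of an instance-incremental loop using the river library, where each label only becomes available a fixed number of events after the prediction; the delay length, model choice, and toy stream are all assumptions.

```python
from collections import deque
import random

from river import linear_model

model = linear_model.LogisticRegression()
DELAY = 100            # assumed feedback delay: the label arrives 100 events later
pending = deque()      # instances whose labels have not arrived yet

def stream_step(x, y_true):
    y_pred = model.predict_one(x)     # must predict before the label is known
    pending.append((x, y_true))
    if len(pending) > DELAY:          # the oldest label only becomes available now
        x_old, y_old = pending.popleft()
        model.learn_one(x_old, y_old)
    return y_pred

# toy stream: two features, label tied to the first one
for _ in range(1000):
    x = {"amount": random.random(), "hour": random.random()}
    stream_step(x, y_true=x["amount"] > 0.5)
```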
44
-
Firoj Paudel
𝐃𝐚𝐲 7 𝐈𝐧 𝐆𝐞𝐧𝐀𝐈: 𝐄𝐱𝐩𝐥𝐨𝐫𝐢𝐧𝐠 𝐁𝐄𝐑𝐓'𝐬 𝐂𝐨𝐫𝐞 𝐚𝐧𝐝 𝐅𝐢𝐧𝐞-𝐓𝐮𝐧𝐢𝐧𝐠 Day 7 of my Generative AI journey was packed with insights as I wrapped up the BERT paper, "𝐁𝐄𝐑𝐓: 𝐏𝐫𝐞-𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐨𝐟 𝐃𝐞𝐞𝐩 𝐁𝐢𝐝𝐢𝐫𝐞𝐜𝐭𝐢𝐨𝐧𝐚𝐥 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 𝐟𝐨𝐫 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐔𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠" by Devlin et al. I also delved into Umair Jamil’s video, "𝐁𝐄𝐑𝐓 𝐞𝐱𝐩𝐥𝐚𝐢𝐧𝐞𝐝: 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠, 𝐈𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞, 𝐁𝐄𝐑𝐓 𝐯𝐬 𝐆𝐏𝐓/𝐋𝐋𝐌, 𝐅𝐢𝐧𝐞-𝐭𝐮𝐧𝐢𝐧𝐠, [𝐂𝐋𝐒] 𝐭𝐨𝐤𝐞𝐧", which added another layer of clarity to my understanding. (Check it out: https://lnkd.in/dGZg6mQw) 𝐔𝐧𝐩𝐚𝐜𝐤𝐢𝐧𝐠 𝐭𝐡𝐞 𝐁𝐄𝐑𝐓 𝐏𝐚𝐩𝐞𝐫 The architecture and pre-training strategies in BERT are a masterclass in optimization and versatility. Here’s what stood out: → [𝐂𝐋𝐒] 𝐓𝐨𝐤𝐞𝐧: Used for sentence-level tasks, the token aggregates information from the input sequence and is pivotal for classification models. → 𝐌𝐚𝐬𝐤𝐞𝐝 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠 (𝐌𝐋𝐌): This clever approach of masking random tokens teaches BERT to predict context-sensitive words, enabling a bidirectional understanding of language. → 𝐍𝐞𝐱𝐭 𝐒𝐞𝐧𝐭𝐞𝐧𝐜𝐞 𝐏𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐨𝐧 (𝐍𝐒𝐏): A task that helps BERT understand sentence relationships, making it effective for tasks like Q&A and summarization. 𝐈𝐧𝐬𝐢𝐠𝐡𝐭𝐬 𝐟𝐫𝐨𝐦 𝐔𝐦𝐚𝐢𝐫 𝐉𝐚𝐦𝐢𝐥’𝐬 𝐕𝐢𝐝𝐞𝐨 Umair Jamil’s explanation bridged the gap between theory and practice. He not only demystified BERT’s architecture but also touched on: → 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐓𝐞𝐜𝐡𝐧𝐢𝐪𝐮𝐞𝐬: How large-scale datasets and compute are utilized for pre-training. → 𝐅𝐢𝐧𝐞-𝐓𝐮𝐧𝐢𝐧𝐠: The ease of adapting BERT for tasks like sentiment analysis and named entity recognition by tweaking the final layers. → 𝐂𝐨𝐦𝐩𝐚𝐫𝐢𝐬𝐨𝐧𝐬: Distinguishing BERT from GPT and other LLMs, emphasizing its pre-training methodology and bidirectional nature. And that was it for today 🤷♂️ 𝐖𝐡𝐚𝐭’𝐬 𝐍𝐞𝐱𝐭? Now it’s time to put theory into practice. I'll try implementing BERT tomorrow. #GenAI #BERT_learnings
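Here is a minimal sketch of the MLM behavior described above, using Hugging Face transformers (my own illustration; any BERT checkpoint with an MLM head would do):

```python
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# MLM in action: the model fills the masked position using context from *both* sides
inputs = tok("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
print(tok.decode(logits[0, mask_pos].argmax().item()))  # expected: "capital"
```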
10
3 comments -
Pavan Purohit
🤖 𝐇𝐨𝐰 𝐀𝐈 𝐋𝐋𝐌𝐬 𝐂𝐚𝐧 𝐁𝐞 𝐔𝐬𝐞𝐝 𝐢𝐧 𝐂𝐨𝐧𝐯𝐞𝐫𝐬𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐀𝐈 𝐔𝐬𝐚𝐠𝐞 𝐢𝐧 𝐂𝐨𝐧𝐯𝐞𝐫𝐬𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐀𝐈: Large Language Models (LLMs) like GPT (from OpenAI), BERT (from Google), or similar models developed by other companies offer significant advancements over traditional chatbot technologies. Their capacity to understand and generate natural language allows LLMs to maintain more coherent and contextually relevant conversations than older rule-based systems 📈. 𝐑𝐞𝐩𝐥𝐚𝐜𝐢𝐧𝐠 𝐎𝐥𝐝 𝐂𝐡𝐚𝐭𝐛𝐨𝐭𝐬: Traditional chatbots often rely on predefined scripts and simple decision-tree logic 🌳. They struggle with unexpected queries and can only handle a narrow set of interactions effectively. In contrast, LLMs can generate responses based on a vast range of inputs, providing more flexibility and a better user experience. 𝐂𝐮𝐬𝐭𝐨𝐦𝐞𝐫 𝐒𝐞𝐫𝐯𝐢𝐜𝐞: LLMs can be used to enhance customer interaction, providing responses to inquiries about products, services, or policies without human intervention. They can handle a broader array of questions with a more nuanced understanding than traditional models 🛍️. 𝐇𝐞𝐚𝐥𝐭𝐡𝐜𝐚𝐫𝐞: In the healthcare sector, conversational AI can assist with patient triage by interpreting symptoms described in natural language and guiding patients accordingly. However, due to the critical nature of medical advice, LLMs should be closely monitored or limited to providing general information while deferring critical advice to human professionals 🏥. 𝐄𝐝𝐮𝐜𝐚𝐭𝐢𝐨𝐧: Educational bots can use LLMs to tutor students by answering questions related to course content, providing explanations, and even generating practice questions 📚. 𝐋𝐢𝐦𝐢𝐭𝐚𝐭𝐢𝐨𝐧𝐬 𝐚𝐧𝐝 𝐂𝐨𝐧𝐜𝐞𝐫𝐧𝐬 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐌𝐢𝐬𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧: One of the significant risks with LLMs is their tendency to "hallucinate" information — generating plausible but factually incorrect responses. This can be particularly problematic in fields like healthcare or legal advice where accuracy is crucial ❗. 𝐋𝐚𝐜𝐤 𝐨𝐟 𝐄𝐦𝐨𝐭𝐢𝐨𝐧𝐚𝐥 𝐈𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞: While LLMs can process and generate text based on patterns, they lack true emotional intelligence and the ability to understand human feelings beyond what is explicitly stated. This makes them less effective in situations requiring empathy and emotional nuance, such as counseling or deep personal conversations 💔. 𝐏𝐫𝐢𝐯𝐚𝐜𝐲 𝐚𝐧𝐝 𝐃𝐚𝐭𝐚 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲: Using LLMs in sectors with strict privacy regulations (like healthcare and finance) can be challenging because these models often need large amounts of data to train and operate, raising concerns about data security and user privacy 🔒. Check this out for a more in-depth explanation. Thank you so much for your time! #ai #genai
8
-
Alina Ghani
Want to test Meta, Google, and Mistral models all on one platform? **Groq.com** Why Groq? Because speed matters. GroqCloud supports a diverse array of models, enabling you to select the best one for your needs: ➡ llama3-8b-8192 by Meta (context window: 8,192 tokens) ➡ llama3-70b-8192 by Meta (context window: 8,192 tokens) ➡ mixtral-8x7b-32768 by Mistral (context window: 32,768 tokens) ➡ gemma-7b-it by Google (context window: 8,192 tokens) Groq's commitment to GenAI inference speed is transforming real-time AI applications with its LPU. What is an LPU? ➡ A Language Processing Unit is a processing system designed specifically for LANGUAGE applications. Unlike traditional CPUs (Central Processing Units) and GPUs (Graphics Processing Units), which handle a broad range of tasks, LPUs are optimized for the rapid processing demands of LLMs. This specialized architecture allows for more efficient handling of sequential data, significantly reducing the time per word processed. ➡ That means you get faster outputs. You can run Llama-2 70B at over 300 tokens per second per user!
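Trying one of these models takes a few lines with the groq Python SDK (a hedged sketch; it assumes the groq package is installed and GROQ_API_KEY is set in the environment, and uses a model name from the list above):

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment
resp = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "Explain what an LPU is in one sentence."}],
)
print(resp.choices[0].message.content)
```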
24
-
Ganesh Jagadeesan
Forget about LLMs Ranking Leaderboards🥇, Welcome to the Era of Model Routing 🔀! This shift in the market is truly transformative. The idea of dynamically selecting the most suitable model for each specific task is a game-changer. It's like having a garage full of specialized vehicles, each ready to take on a different terrain. Whether it's the speed of a Ferrari 🏎 for complex technical analyses or the reliability of a Range Rover 🚗 for varied tasks, model routing ensures the optimal performance for every scenario. Amazon Q for Business is a perfect example, dynamically choosing the best model based on query complexity to maximize user experience. This approach heralds a future where model selection is flexible, task-specific, and efficient, marking a significant departure from static model use. Exciting times ahead in the world of generative AI! #AI #MachineLearning #ModelRouting #Innovation #TechRevolution #FutureOfAI #GenerativeAI #SmartTech 🚀🔍🧠
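To make the routing idea concrete, here is a deliberately toy sketch; the model names and the keyword heuristic are entirely hypothetical stand-ins for the learned, cost- and quality-aware policies real routers use.

```python
# Toy model router: pick a model per query based on a crude complexity heuristic.
def route(query: str) -> str:
    hard_markers = ("prove", "derive", "optimize", "debug", "complexity")
    if len(query) > 500 or any(m in query.lower() for m in hard_markers):
        return "deep-reasoning-model"   # the "Ferrari": slower, stronger reasoning
    return "fast-general-model"         # the "Range Rover": cheap, reliable default

print(route("Summarize this memo"))                  # -> fast-general-model
print(route("Prove the bound and derive its cost"))  # -> deep-reasoning-model
```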
1
-
Quentin Lhoest
It works on any subject/topic! 😳 Who said synthetic data isn't easy or useful? 💥 Search + save datasets generated by an LLM in real time 💥 Try my new app on Hugging Face: https://lnkd.in/epvgZcvD The LLM invents dataset names and themes and generates full datasets in just one click! You can use this app to bootstrap all sorts of machine learning projects, whether it is to train a language model or small machine learning models. Let me know what you think! (btw, of course it's all free and open source) Thanks for the Gradio tips Abubakar Abid Freddy Boulton and Vishrav Chaudhary and team for Phi-3 mini #synthetic #data #datasets #llm #opensource
442
25 comments -
Djohra IBERRAKEN
📚 Another conference, another article! The International Conference on Machine Learning (ICML) took place last week, from July 21 to 27, 2024, in Vienna. In this article I explore some of the exciting advances in machine learning presented at this year's edition: 🔍 For knowledge transfer, I highlight LLM distillation, 2-bit quantization, and transfer learning, which aim to create smaller, more efficient models. 📈 In time series analysis, I explore zero-shot forecasting and metadata-enhanced time series generation, which is especially useful in the energy field. Discover more in my Medium article!
21