Frontier models are saturating reasoning benchmarks such as ARC-AGI. But is that enough to call these models intelligent? 🧠 We need more robust ways to evaluate intelligence. I believe linguistics may be a powerful tool here - it requires exceptional reasoning skills, and current frontier models perform poorly on even easy problems in the domain. In my latest blog post, I explore why linguistics offers a unique and meaningful way to test intelligence. Check it out here: https://lnkd.in/gMVQJiuh I’d also love to hear thoughts on this - what other domains do you think could push the boundaries for evaluating AI?
Advaith Sridhar’s Post
The nuances and context of language are incredibly important in prompt engineering, where even the smallest changes in prompts impact the accuracy of the responses from GenAI chatbots. In this blog, Hélène Sauvé explores some of the challenges of prompt engineering from a Linguistics/Pragmatics angle and shares techniques to weave context into LLMs to avoid ambiguous outputs. https://buff.ly/3y7xiaD #LLMs #AI #PromptEngineering
In conversation with AI - when Prompt Engineering meets Linguistics
blog.scottlogic.com
🧠 Exploring the Future of Natural Language Planning! 🧠

Don't miss this compelling article by DailyAI on "NATURAL PLAN: Benchmarking LLMs on Natural Language Planning." Discover how the latest advancements in Large Language Models (LLMs) are revolutionizing the way we approach planning and problem-solving through natural language.

🔍 Key Takeaways:
🔷 Benchmarking Breakthroughs: The article delves into the latest benchmarks for evaluating LLMs' effectiveness in natural language planning.
🔷 Performance Insights: Gain insights into how different models perform in various planning scenarios and the potential implications for AI development.
🔷 Future Directions: Learn about the challenges and opportunities in enhancing LLMs for more sophisticated and accurate natural language planning.

For an in-depth look at these exciting developments, read the full article here: https://lnkd.in/ea2dz685 #AI #LLMs #NaturalLanguageProcessing #Planning #TechInnovation #DailyAI
NATURAL PLAN: Benchmarking LLMs on natural language planning | DailyAI
https://dailyai.com
Are LLMs capable of Artificial General Intelligence (AGI)? I came across this article in the NY Times: https://lnkd.in/geXXbi8B (for non-subscribers, here's the paper it talks about: https://lnkd.in/gP-m7sxA). The upshot is that researchers are finding strong evidence that language is a very distinct part of the human brain whose purpose is to communicate, and that it is not intrinsically involved in reasoning. This gets at a very fundamental question about the use of LLMs as the basis for Artificial General Intelligence: can a model based on language serve as the basis for AGI? Time will tell, I'm sure, but when I've put this question to researchers I know, they generally say yes. However, I'm not sure what the basis is for this belief. CAVEAT: I'm not in the AI field, nor have I delved deeply into the scholarly articles. I have a fairly rudimentary understanding of LLMs, but would be happy to learn from those who know more. One of the fundamental articles shaping my personal views on LLM-based AI is Ted Chiang's excellent piece "ChatGPT Is a Blurry JPEG of the Web" (https://lnkd.in/gmkRBe6A). For me, it captured the idea that LLMs are just summarizing what we know. To be sure, they are capable of really cool things, but are they really doing any thinking? More importantly, are they even capable of thinking? Again, time will tell, but I'm increasingly skeptical. The kicker for me is that they have no common sense. One of the reasons we find LLMs so amusing is the crazy ways they go off the rails (aka hallucinations). I think these are largely because there's no common-sense 'boundary'. When a human's thoughts start to get carried away, we have ways of reining them in by asking ourselves, "Is this actually a reasonable thought?"
Indeed, it can be quite alarming to encounter people who communicate ideas that we find to be outside the bounds of common reason. It feels like this 'framework of rationality' is very important, but I don't think it's something you can just bolt on as guard rails. It seems like you need some other kind of AI to generate that framework. What do you think? Are LLMs capable of AGI? Or are they like our own language-processing center, just a small piece of a much larger 'intelligence framework'? [Edited to use AGI instead of GenAI so as not to be confused with Generative AI] [Edit: Thanks to Tareq Alkhatib, here is a direct link to the paper in question: https://lnkd.in/g9b7Q66c]
Language is primarily a tool for communication rather than thought - Nature
nature.com
The journey of artificial intelligence from understanding to generating human language is a complex dance of technology, linguistics, and psychology. It’s about making machines hear, listen, speak, and communicate, turning the once-distant dream of seamless human-AI interaction into our everyday reality. In this piece, I will take you through the fascinating trip of how language is generated and understood by AI. https://lnkd.in/dxWVRPgC
A dummy's guide to how AI is trained to understand human languages
thecable.ng
Do you work in #wine #communications? Try a metaphor check with ChatGPT 4.0 👀

✍ Here is the prompt I used, relying on MIP(VU):

🗣️ Identify the conceptual metaphors in this wine review using the following procedure:
1. Read the wine review.
2. Inspect each lexical unit to determine its contextual meaning within the surrounding context.
3. Check for a more basic, concrete, or bodily action-related sense using a corpus-based dictionary.
4. Mark as metaphorical if: the meaning involves a more concrete sense than the contextual one, and both basic and contextual senses can be understood in contrast or comparison to one another.

🥂 Wine Review: "Aromas of soft herbs and ripe white-fleshed fruits, some whispers of tropical fruits and a distinctive soft-soil quality. Sage and apple, fresh cape gooseberry and lemon, some peach and white pineapple flavours. A lovely wine with acidity for freshness and crunch, and a youthful vibrant texture with a sense of place stony mineral quality."

↘ Here is the conceptual metaphor analysis:
- Aromas of soft herbs and ripe white-fleshed fruits: "Soft" is metaphorical, describing a gentle quality, not a physical softness.
- Some whispers of tropical fruits: "Whispers" is metaphorical, suggesting subtle hints or undertones, comparing sensory perception to a quiet sound.
- A distinctive soft-soil quality: "Soft-soil" is metaphorical, describing the wine's earthy characteristics; "soft" here is used metaphorically to describe texture.
- A lovely wine with acidity for freshness and crunch: "Freshness" is metaphorical, using a taste quality to describe a sensory experience; "crunch" is metaphorical, comparing the sensory experience of drinking to the texture of a solid food.
- A youthful vibrant texture with a sense of place stony mineral quality: "Youthful" and "vibrant" are metaphorical, describing the wine's feel as young and lively; "sense of place" is metaphorical, suggesting the wine's ability to convey the characteristics of its origin; "stony" is metaphorical, comparing the taste to the texture and flavor associated with stones and minerals.

🗝 Key Metaphorical Lexical Units:
- Whispers (contextual meaning: subtle hints; basic meaning: quiet sound)
- Soft-soil (contextual meaning: earthy texture; basic meaning: physically soft soil)
- Crunch (contextual meaning: sensation of crispness; basic meaning: the sound/feel of breaking something hard)
- Youthful (contextual meaning: fresh and lively; basic meaning: characteristic of young age)
- Vibrant (contextual meaning: lively and bright; basic meaning: full of energy and life)
- Sense of place (contextual meaning: origin characteristics; basic meaning: perception of a location)
- Stony mineral (contextual meaning: taste characteristics; basic meaning: relating to stones/minerals)

#language #AI #linguistics #winereview #winecommunication #wineeducation #sensoryevaluation
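The MIP(VU)-style check in the prompt above can also be sketched as a small program. This is a toy illustration only: the `BASIC_SENSES` and `CONTEXTUAL_SENSES` tables below are hand-built stand-ins for a real corpus-based dictionary, and the lexical units are taken from the review analysed above.

```python
# Toy sketch of the MIP(VU)-style metaphor check: a lexical unit is marked
# metaphorical when its contextual sense departs from a more basic, concrete
# sense (steps 2-4 of the procedure). The dictionaries are illustrative
# stand-ins for a corpus-based dictionary, not real lexicographic data.

BASIC_SENSES = {
    "whispers": "quiet sound",
    "crunch": "sound/feel of breaking something hard",
    "youthful": "characteristic of young age",
}

CONTEXTUAL_SENSES = {
    "whispers": "subtle hints",
    "crunch": "sensation of crispness",
    "youthful": "fresh and lively",
}

def is_metaphorical(unit: str) -> bool:
    """Return True when the unit's contextual sense differs from a more
    basic, concrete sense listed in the dictionary."""
    basic = BASIC_SENSES.get(unit)
    contextual = CONTEXTUAL_SENSES.get(unit)
    if basic is None or contextual is None:
        return False  # no dictionary entry: the test cannot be applied
    return basic != contextual

review_units = ["whispers", "crunch", "youthful", "sage"]
flagged = [u for u in review_units if is_metaphorical(u)]
print(flagged)
```

A real pipeline would replace the hand-built tables with dictionary lookups and word-sense disambiguation, which is exactly the part the LLM handles in the prompt.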
AI L&D Expert | Language Scientist🍷Simplifying Wine | Coach ICF ACC Empowering individuals and organisations to navigate the future of learning with AI, communication, and coaching.
Information-packed read for #language lovers and #metaphor enthusiasts interested in #AI. University of Nottingham's Professor Brigitte Nerlich’s recent blog post takes you on a journey from manual to AI-driven metaphor identification (so many metaphors already!).

🏫 Drawing on the foundational work of George Lakoff and Mark Johnson's "Metaphors We Live By," and referencing the extensive metaphor corpora collected by the Metaphor Lab at VU Amsterdam and University of Amsterdam, the blog delves into the development of datasets and machine learning techniques for automating metaphor analysis, from recognition to identification to interpretation. The advances in #AI, particularly with large language models (LLMs), have significantly enhanced our ability to recognise and interpret metaphors, transforming both linguistic research and practical applications.

🤔 What are your thoughts? ✍ 💻 👾

🥡 Here are my key takeaways from Nerlich’s blog:
➡ Automation Advancement: AI and machine learning are revolutionising metaphor identification, making it faster and more accurate.
➡ Large Language Models (LLMs): These models, trained on huge datasets, can detect subtle and complex metaphors that were previously challenging.
➡ Interdisciplinary Research: Combining linguistics, AI, and cognitive science research will continue to enrich metaphor analysis.
➡ Practical Applications: Automated metaphor identification can enhance fields like marketing, education, and psychology.
➡ Continuous Improvement: Ongoing research and dataset refinement will further enhance AI's metaphor detection capabilities.

Read more here: https://lnkd.in/gppkUU8Z

#research #linguistics #cognitivelinguistics #communication #GenAI #LLM #scienceeducation #RaAM #MetaphorLab Vrije Universiteit Amsterdam (VU Amsterdam) UvA
Metaphor identification: From manual to automatic - Making Science Public
https://blogs.nottingham.ac.uk/makingsciencepublic
The field of AI is moving somewhat too fast, and researchers have started to forget about the past. The latest example is the "Super Weight". In the past week, I've seen discussion of this paper (https://lnkd.in/e5aU59cG) from multiple sources, and many people said it's "surprising" that changing a single weight can cripple an LLM's behavior. My first impression was: didn't we KNOW this already? Deep neural networks have tremendous capacity and can easily overfit, and overfitting leads to neuronal co-adaptation. We addressed this with dropout over 10 years ago. How did the LLM researchers not see this coming? Prof. Dan Roy on Twitter nailed it with a five-word comment: "Somebody didn't train with dropout" (https://lnkd.in/eq_6cjGk). However, this paper uncovered an even more serious problem: LLMs today no longer live up to the promise of fault tolerance. The ML community has to fix this; it is not a feature. Please don't try to make it one.
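For readers who haven't met it, inverted dropout (the regularizer the post and Dan Roy's comment refer to) is only a few lines of code. A minimal NumPy sketch: each unit is randomly zeroed during training and the survivors are rescaled, so the network cannot come to rely on any single unit or weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x: np.ndarray, p_drop: float = 0.5, training: bool = True) -> np.ndarray:
    """Inverted dropout: randomly zero activations during training and
    rescale the survivors by 1/(1 - p_drop), so the expected activation
    is unchanged and no unit can be depended on individually."""
    if not training or p_drop == 0.0:
        return x  # at inference time, activations pass through unchanged
    mask = rng.random(x.shape) >= p_drop  # keep each unit with prob 1 - p_drop
    return x * mask / (1.0 - p_drop)

a = np.ones(10_000)
h = dropout(a, p_drop=0.5)
print(h.mean())  # close to 1.0 in expectation, despite ~half the units being zeroed
```

The connection to the post: a network trained this way must spread its function across many weights, which is exactly the redundancy that makes zeroing a single "super weight" survivable.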
2411.07191
arxiv.org
EUtopia: Human-centred AI - Beyond Words

Our department, together with EUTOPIA, hosted the workshop Beyond Words: Human-centred AI this week. The main idea of this EUTOPIA collaboration is to encourage research and teaching activities between the Universities of Gothenburg, Ljubljana and Barcelona. We had guests from Slovenia, Belgium, Germany, France, and the UK, as well as from the local industry (Volvo) and charities, who presented a variety of research projects and opportunities for students and researchers in Computational Linguistics.

Voices from the participants:

"One of the standout aspects for me was the focus on cutting-edge technologies, particularly Large Language Models (LLMs), which are driving significant advances in the field. I was particularly interested to learn about recent research into how to assess linguistic differences between human and machine-generated text, and whether these differences are still detectable. This is a critical area as we continue to refine the capabilities of AI systems." - Pierluigi Cassotti, Researcher and part of the program Change is Key!

"The entire workshop was so relevant for our times, where machines talk and perceive more and more complex language. The take-home message for me was that human interaction is extremely messy, full of incomplete and repaired exchanges, and much more organic than an idealized system of exchanges. I think that is what makes human dialogue challenging to model but also interesting to explore." - Talha Bedir, PhD student in computational linguistics

"It was inspiring to see how different traditions and themes could come together, both on a theoretical and practical level. I was also given new names and faces to associate with different aspects of linguistics. In the future it will be easier to find people to collaborate with thanks to the interactions at this workshop." - Erik Lagerstedt, Postdoc in the research project DivCon
🌟 Thrilled to share that our paper, "Argument Relation Classification through Discourse Markers and Adversarial Training" by Luca Contalbo, Francesco Guerra, and Matteo Paganelli, has been accepted to EMNLP 2024 (main track)! Our research focuses on Argument Relation Classification (ARC), a task essential for identifying supportive, contrasting, and neutral relationships between argumentative units. In this work, we introduce DISARM, a model that pushes ARC performance forward by integrating discourse marker detection and adversarial training to achieve a unified, robust embedding space. With this approach, DISARM surpasses the accuracy of current ARC methods, combining multi-task and adversarial strategies. Looking forward to presenting our work at EMNLP 2024 in Miami from November 12–16! Excited to connect with other researchers and share our findings in person. Paper link: https://lnkd.in/d5u5bfev #DTALab #NLP #AI #EMNLP2024 #ArgumentMining #AdversarialLearning #DiscourseMarkers
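The paper's own code isn't shown here, but the adversarial ingredient mentioned above is commonly implemented with a gradient reversal layer: identity on the forward pass, negated gradient on the backward pass, so the shared encoder is pushed to learn features the adversarial head cannot exploit. Below is a minimal NumPy sketch of that generic trick (an assumption for illustration, not necessarily DISARM's exact mechanism).

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: passes activations through unchanged on the
    forward pass; on the backward pass the incoming gradient is scaled by
    -lam, so the layers below are updated to *increase* the adversary's
    loss rather than decrease it."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # strength of the adversarial signal

    def forward(self, x: np.ndarray) -> np.ndarray:
        return x  # identity: the adversarial head sees the raw features

    def backward(self, grad_out: np.ndarray) -> np.ndarray:
        return -self.lam * grad_out  # flip the gradient's sign, scale by lam

grl = GradReverse(lam=0.5)
x = np.array([1.0, -2.0])
g = np.array([0.3, 0.3])
print(grl.forward(x))   # identical to x
print(grl.backward(g))  # sign-flipped and scaled by 0.5
```

In a full multi-task setup this layer would sit between the shared encoder and the adversarial classifier, while the main ARC head backpropagates through the encoder normally.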
2024.emnlp-main.1054.pdf
aclanthology.org
A new paper out of Russia (December 2024) says AI is still having difficulty with metaphorical language: https://lnkd.in/gadCBw_H

Abstract: In recent years, numerous studies have pointed to the ability of artificial intelligence (AI) to generate and analyze expressions of natural language. However, the question of whether AI is capable of actually interpreting human language, rather than imitating its understanding, remains open. Metaphors, being an integral part of human language, as both a common figure of speech and the predominant cognitive mechanism of human reasoning, pose a considerable challenge to AI systems. Based on an overview of existing studies' findings in computational linguistics and related fields, the paper identifies a number of problems associated with the interpretation of non-literal expressions of language by large language models (LLMs). It reveals that there is still no clear understanding of the methods for training language models to automatically recognize and interpret metaphors that would bring them closer to the level of human "interpretive competencies". The purpose of the study is to identify possible reasons that hinder the understanding of figurative language by artificial systems and to outline possible directions for solving this problem. The study suggests that the main barriers to AI's human-like interpretation of figurative natural language are the absence of a physical body and the inability to reason by analogy and make inferences based on common sense, the latter being both the result of and a cognitive process in extracting and processing information. The author concludes that further improvement of AI systems' creative skills should be at the top of the research agenda in the coming years.

"It is the east, and Juliet is the sun." Whatever thumped in Romeo's chest hasn't quite found its way to ClaudeAI just yet.
(PDF) Interpreting Metaphorical Language: A Challenge to Artificial Intelligence
researchgate.net
Co-founder & CPO @ Coheso (legaltech) | AI @ CMU | x-HP Inc
Very insightful, and the cited sources help build conviction. High quality stuff. I'm surprised that o1 outperformed the competition so drastically. Would you like me to run some prompts on o1-pro for further testing?