Computer Science

Showing new listings for Friday, 24 April 2026

Total of 930 entries

[1] arXiv:2604.20844 [pdf, html, other]: Title: AtomicRAG: Atom-Entity Graphs for Retrieval-Augmented Generation

Yanning Hou, Duanyang Yuan, Sihang Zhou, Xiaoshu Chen, Ke Liang, Siwei Wang, Xinwang Liu, Jian Huang

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Recent GraphRAG methods integrate graph structures into text indexing and retrieval, using knowledge graph triples to connect text chunks, thereby improving retrieval coverage and precision. However, we observe that treating text chunks as the basic unit of knowledge representation rigidly groups multiple atomic facts together, limiting the flexibility and adaptability needed to support diverse retrieval scenarios. Additionally, triple-based entity linking is sensitive to relation-extraction errors, which can lead to missing or incorrect reasoning paths and ultimately hurt retrieval accuracy. To address these issues, we propose the Atom-Entity Graph, a more precise and reliable architecture for knowledge representation and indexing. In our approach, knowledge is stored as knowledge atoms, namely individual, self-contained units of factual information, rather than coarse-grained text chunks. This allows knowledge elements to be flexibly reassembled without mutual interference, thereby enabling seamless alignment with diverse query perspectives. Edges between entities simply indicate whether a relationship exists. By combining personalized PageRank with relevance-based filtering, we maintain accurate entity connections and improve the reliability of reasoning. Theoretical analysis and experiments on five public benchmarks show that the proposed AtomicRAG algorithm outperforms strong RAG baselines in retrieval accuracy and reasoning robustness. Code: this https URL.
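The retrieval step described above (personalized PageRank over unlabeled entity edges, followed by relevance filtering of knowledge atoms) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `atoms_by_entity` mapping, the precomputed `relevance` scores, and the `top_k` and `min_relevance` parameters are all assumptions, and every seed is assumed to be a key of `graph`.

```python
from typing import Dict, List, Set


def personalized_pagerank(graph: Dict[str, List[str]], seeds: Set[str],
                          damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power iteration in which all restart mass returns to the seed entities."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for n, nbrs in graph.items():
            if nbrs:
                # Edges only say "related": spread mass evenly over neighbours.
                share = damping * rank[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Dangling entity: hand its mass back to the seeds.
                for m in graph:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


def retrieve_atoms(graph: Dict[str, List[str]],
                   atoms_by_entity: Dict[str, List[str]],
                   relevance: Dict[str, float],
                   seeds: Set[str], top_k: int = 2,
                   min_relevance: float = 0.5) -> List[str]:
    """Keep atoms attached to the top-ranked entities whose (hypothetical,
    precomputed) query relevance clears the threshold."""
    rank = personalized_pagerank(graph, seeds)
    top_entities = sorted(rank, key=rank.get, reverse=True)[:top_k]
    kept: List[str] = []
    for entity in top_entities:
        for atom in atoms_by_entity.get(entity, []):
            if relevance.get(atom, 0.0) >= min_relevance and atom not in kept:
                kept.append(atom)
    return kept
```

Here the seeds would be the entities mentioned in the query; how relevance is actually scored is left open by the abstract.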
[2] arXiv:2604.20845 [pdf, html, other]: Title: CaST-POI: Candidate-Conditioned Spatiotemporal Modeling for Next POI Recommendation

Zhenyu Yu, Chunlei Meng, Yangchen Zeng, Mohd Yamani Idna Idris, Shuigeng Zhou

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Next Point-of-Interest (POI) recommendation plays a crucial role in location-based services by predicting users' future mobility patterns. Existing methods typically compute a single user representation from historical trajectories and use it to score all candidate POIs uniformly. However, this candidate-agnostic paradigm overlooks that the relevance of historical visits inherently depends on which candidate is being evaluated. In this paper, we propose CaST-POI, a candidate-conditioned spatiotemporal model for next POI recommendation. Our key insight is that the same user history should be interpreted differently when evaluating different candidate POIs. CaST-POI employs a candidate-conditioned sequence reader that uses candidates as queries to dynamically attend to user history. In addition, we introduce candidate-relative temporal and spatial biases to capture fine-grained mobility patterns based on the relationships between historical visits and each candidate POI. Extensive experiments on three benchmark datasets demonstrate that CaST-POI consistently outperforms state-of-the-art methods, yielding substantial improvements across multiple evaluation metrics, with particularly strong advantages under large candidate pools. Code is available at this https URL.
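To make the candidate-as-query idea concrete, the sketch below scores one candidate by attending over the visit history with the candidate embedding as the query, and subtracts candidate-relative time and distance penalties from the attention logits. The linear penalty form, the weights `time_w` and `dist_w`, and all names are assumptions for illustration; the abstract does not specify how the biases are parameterized.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def candidate_conditioned_score(candidate: List[float],
                                history: List[List[float]],
                                time_gaps_h: List[float],
                                dists_km: List[float],
                                time_w: float = 0.1,
                                dist_w: float = 0.1) -> Tuple[float, List[float]]:
    """Return (score, attention weights) for one candidate POI.

    time_gaps_h[i] and dists_km[i] are measured between visit i and the
    candidate, so the same history is read differently per candidate.
    """
    d = len(candidate)
    logits = [
        dot(candidate, visit) / math.sqrt(d) - time_w * gap - dist_w * dist
        for visit, gap, dist in zip(history, time_gaps_h, dists_km)
    ]
    weights = softmax(logits)
    # Candidate-specific summary of the user's history.
    reading = [sum(w * visit[i] for w, visit in zip(weights, history)) for i in range(d)]
    return dot(candidate, reading), weights
```

Calling the function once per candidate yields a different attention pattern for each one, in contrast to scoring every candidate against a single fixed user vector.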
[3] arXiv:2604.20846 [pdf, html, other]: Title: ADS-POI: Agentic Spatiotemporal State Decomposition for Next Point-of-Interest Recommendation

Zhenyu Yu, Chunlei Meng, Yangchen Zeng, Mohd Yamani Idna Idris, Shuigeng Zhou

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Next point-of-interest (POI) recommendation requires modeling user mobility as a spatiotemporal sequence, where different behavioral factors may evolve at different temporal and spatial scales. Most existing methods compress a user's history into a single latent representation, which tends to entangle heterogeneous signals such as routine mobility patterns, short-term intent, and temporal regularities. This entanglement limits the flexibility of state evolution and reduces the model's ability to adapt to diverse decision contexts. We propose ADS-POI, a spatiotemporal state decomposition framework for next POI recommendation. ADS-POI represents a user with multiple parallel evolving latent sub-states, each governed by its own spatiotemporal transition dynamics. These sub-states are selectively aggregated through a context-conditioned mechanism to form the decision state used for prediction. This design enables different behavioral components to evolve at different rates while remaining coordinated under the current spatiotemporal context. Extensive experiments on three real-world benchmark datasets from Foursquare and Gowalla demonstrate that ADS-POI consistently outperforms strong state-of-the-art baselines under a full-ranking evaluation protocol. The results show that decomposing user behavior into multiple spatiotemporally aware states leads to more effective and robust next POI recommendation. Our code is available at this https URL.
[4] arXiv:2604.20847 [pdf, html, other]: Title: Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models

Yizhi Zhou, Jia-Qi Yang, De-Chuan Zhan, Da-Wei Zhou

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Music Recommendation Systems (MRSs) are a cornerstone of modern streaming platforms. Existing recommendation models, spanning both recall and ranking stages, predominantly rely on collaborative filtering, which fails to exploit the intrinsic characteristics of audio and consequently leads to suboptimal performance, particularly in cold-start scenarios. However, existing music recommendation datasets often lack rich multimodal information, such as raw audio signals and descriptive textual metadata. Moreover, current recommender system evaluation frameworks remain inadequate, as they neither fully leverage multimodal information nor support a diverse range of algorithms, especially multimodal methods. To address these limitations, we propose TASTE, a comprehensive dataset and benchmarking framework designed to highlight the role of multimodal information in music recommendation. Our dataset integrates both audio and textual modalities. By leveraging recent large-scale self-supervised music encoders, we demonstrate the substantial value of the extracted audio representations across recommendation tasks, including candidate recall and CTR. In addition, we introduce the MuQ-token method, which enables more efficient integration of multi-layer audio features. This method consistently outperforms other feature integration techniques across various settings. Overall, our results not only validate the effectiveness of content-driven approaches but also provide a highly effective and reusable multimodal foundation for future research. Code is available at this https URL.
[5] arXiv:2604.20848 [pdf, html, other]: Title: MATRAG: Multi-Agent Transparent Retrieval-Augmented Generation for Explainable Recommendations

Sushant Mehta

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Large Language Model (LLM)-based recommendation systems have demonstrated remarkable capabilities in understanding user preferences and generating personalized suggestions. However, existing approaches face critical challenges in transparency, knowledge grounding, and the ability to provide coherent explanations that foster user trust. We introduce MATRAG (Multi-Agent Transparent Retrieval-Augmented Generation), a novel framework that combines multi-agent collaboration with knowledge graph-augmented retrieval to deliver explainable recommendations. MATRAG employs four specialized agents: a User Modeling Agent that constructs dynamic preference profiles, an Item Analysis Agent that extracts semantic features from knowledge graphs, a Reasoning Agent that synthesizes collaborative and content-based signals, and an Explanation Agent that generates natural language justifications grounded in retrieved knowledge. Our framework incorporates a transparency scoring mechanism that quantifies explanation faithfulness and relevance. Extensive experiments on three benchmark datasets (Amazon Reviews, MovieLens-1M, and Yelp) demonstrate that MATRAG achieves state-of-the-art performance, improving recommendation accuracy by 12.7% (Hit Rate) and 15.3% (NDCG) over leading baselines, while human evaluation confirms that 87.4% of generated explanations are rated as helpful and trustworthy by domain experts. Our work establishes new benchmarks for transparent, agentic recommendation systems and provides actionable insights for deploying LLM-based recommenders in production environments.
[6] arXiv:2604.20849 [pdf, html, other]: Title: SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

Mike Rainey, Umut Acar, Muhammed Sezer

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Retrieval-augmented generation over semi-structured sources such as HTML is constrained by a mismatch between document structure and the flat, sequence-based interfaces of today's embedding and generative models. Retrieval pipelines often linearize documents into fixed-size chunks before indexing, which obscures section structure, lists, and tables, and makes it difficult to return small, citation-ready evidence without losing the surrounding context that makes it interpretable.
We present a structure-aware retrieval pipeline that operates over tree-structured documents. The core idea is to represent candidates as subdocuments: precise, addressable selections that preserve structural identity while deferring the choice of surrounding context. We define a small set of document primitives--paths and path sets, subdocument extraction by pruning, and two contextualization mechanisms. Global contextualization adds the non-local scaffolding needed to make a selection intelligible (e.g., titles, headers, list and table structure). Local contextualization expands a seed selection within its structural neighborhood to obtain a compact, context-rich view under a target budget. Building on these primitives, we describe an embedding-based candidate generator that indexes sentence-seeded subdocuments and a query-time, document-aware aggregation step that amortizes shared structural context. We then introduce a contextual filtering stage that re-scores retrieved candidates using locally contextualized views.
Across experiments on HTML question-answering benchmarks, we find that preserving structure while contextualizing selections yields higher-quality, more diverse citations under fixed budgets than strong passage-based baselines, while maintaining scalability.
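The path and pruning primitives described above lend themselves to a small sketch. The version below treats a path as a tuple of child indices, extracts a subdocument by pruning everything off the selected paths, and approximates global contextualization by also keeping heading-like siblings of the ancestors. The `Node` type, the `CONTEXT_TAGS` set, and the function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Path = Tuple[int, ...]  # child indices from the root down to a node

# Assumed set of tags that count as structural scaffolding.
CONTEXT_TAGS = {"title", "h1", "h2", "h3", "caption", "th"}


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def extract(node: Node, selected: Set[Path], path: Path = (),
            add_context: bool = False) -> Optional[Node]:
    """Prune the tree down to the selected paths, keeping their ancestors."""
    if path in selected:
        return node  # a selected node keeps its whole subtree
    kept = []
    for i, child in enumerate(node.children):
        sub = extract(child, selected, path + (i,), add_context)
        if sub is not None:
            kept.append((i, sub))
    if not kept:
        return None  # nothing selected below this node: prune it
    if add_context:
        kept_idx = {i for i, _ in kept}
        for i, child in enumerate(node.children):
            if i not in kept_idx and child.tag in CONTEXT_TAGS:
                kept.append((i, child))
        kept.sort(key=lambda pair: pair[0])  # restore document order
    return Node(node.tag, node.text, [c for _, c in kept])


def render(node: Node) -> List[str]:
    """Flatten a (sub)document into its text fragments, in document order."""
    out = [node.text] if node.text else []
    for child in node.children:
        out.extend(render(child))
    return out
```

Because the selection is stored as paths rather than copied text, the choice of how much context to attach can be deferred until query time.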
[7] arXiv:2604.20850 [pdf, html, other]: Title: Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval

Jason Dury

Comments: 10 pages, 7 appendices, 10 tables. Code: this https URL

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Dense retrieval systems rank passages by embedding similarity to a query, but multi-hop questions require passages that are associatively related through shared reasoning chains. We introduce Association-Augmented Retrieval (AAR), a lightweight transductive reranking method that trains a small MLP (4.2M parameters) to learn associative relationships between passages in embedding space using contrastive learning on co-occurrence annotations. At inference time, AAR reranks an initial dense retrieval candidate set using bi-directional association scoring. On HotpotQA, AAR improves passage Recall@5 from 0.831 to 0.916 (+8.6 points) without evaluation-set tuning, with gains concentrated on hard questions where the dense baseline fails (+28.5 points). On MuSiQue, AAR achieves +10.1 points in the transductive setting. An inductive model trained on training-split associations and evaluated on unseen validation associations shows no significant improvement, suggesting that the method captures corpus-specific co-occurrences rather than transferable patterns. Ablation studies support this interpretation: training on semantically similar but non-associated passage pairs degrades retrieval below the baseline, while shuffling association pairs causes severe degradation. A downstream QA evaluation shows retrieval gains translate to +6.4 exact match improvement. The method adds 3.7ms per query, trains in under two minutes on a single GPU, and requires no LLM-based indexing.
[8] arXiv:2604.20851 [pdf, html, other]: Title: Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts

Bingqing Zhang, Zhuo Cao, Heming Du, Yang Li, Xue Li, Jiajun Liu, Sen Wang

Comments: Accepted to ICLR2026

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Modern video-text retrieval (VTR) models excel on in-distribution benchmarks but are highly vulnerable to real-world query shifts, where the distribution of query data deviates from the training domain, leading to a sharp performance drop. Existing image-focused robustness solutions are inadequate to handle this vulnerability in video, as they fail to address the complex spatio-temporal dynamics inherent in these shifts. To systematically evaluate this vulnerability, we first introduce a comprehensive benchmark featuring 12 distinct types of video perturbations across five severity degrees. Analysis on this benchmark reveals that query shifts amplify the hubness phenomenon, where a few gallery items become dominant "hubs" that attract a disproportionate number of queries. To mitigate this, we then propose HAT-VTR (Hubness Alleviation for Test-time Video-Text Retrieval), as our baseline test-time adaptation framework designed to directly counteract hubness in VTR. It leverages two key components: a Hubness Suppression Memory to refine similarity scores, and multi-granular losses to enforce temporal feature consistency. Extensive experiments demonstrate that HAT-VTR substantially improves robustness, consistently outperforming prior methods across diverse query shift scenarios, and enhancing model reliability for real-world applications.
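Hubness is commonly quantified by the k-occurrence count: how often each gallery item appears in some query's top-k list. The sketch below computes that count and applies a simple per-item centering as a stand-in for hub suppression. The centering heuristic is an assumption made for illustration and is not the Hubness Suppression Memory proposed in the paper.

```python
from typing import List


def k_occurrence(sim: List[List[float]], k: int) -> List[int]:
    """sim[q][g] is the similarity of query q to gallery item g.

    Returns, for each gallery item, the number of queries that rank it
    in their top-k. A few very large counts indicate hubs.
    """
    counts = [0] * len(sim[0])
    for row in sim:
        top = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        for g in top:
            counts[g] += 1
    return counts


def center_per_item(sim: List[List[float]]) -> List[List[float]]:
    """Subtract each gallery item's mean similarity over all queries,
    so items that are close to everything lose their advantage."""
    n = len(sim)
    means = [sum(row[g] for row in sim) / n for g in range(len(sim[0]))]
    return [[row[g] - means[g] for g in range(len(row))] for row in sim]
```

Comparing `k_occurrence` before and after the correction gives a quick check of whether retrievals are still concentrated on a handful of items.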
[9] arXiv:2604.20852 [pdf, html, other]: Title: DenoiseRank: Learning to Rank by Diffusion Models

Ying Wang, Preslav Nakov, Shangsong Liang

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Learning to rank (LTR) is one of the core tasks in machine learning. Traditional LTR models have made great progress, but nearly all of them take a discriminative perspective. In this paper, we address LTR from a novel perspective, i.e., with a deep generative model. Specifically, we propose a novel denoise rank model, DenoiseRank, which adds noise to the relevance labels in the diffusion process and denoises them on the query documents in the reverse process to accurately predict their distribution. Our model is the first diffusion method to address traditional LTR from a generative perspective. Our extensive experiments on benchmark datasets demonstrate the effectiveness of DenoiseRank, and we believe it provides a benchmark for the generative LTR task.
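For readers unfamiliar with diffusion models, the forward (noising) half of the process can be illustrated with the standard DDPM formulation, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, applied to a query's relevance labels. Whether DenoiseRank uses exactly this schedule or parameterization is an assumption; the linear beta schedule and the function names below are illustrative only.

```python
import math
import random
from typing import List


def alpha_bars(num_steps: int, beta_start: float = 1e-4,
               beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(1, num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_labels(labels: List[float], t: int, schedule: List[float],
                 rng: random.Random) -> List[float]:
    """Sample noisy labels at step t directly from the clean labels."""
    a = schedule[t]
    return [math.sqrt(a) * y + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for y in labels]
```

A reverse-process model would then be trained to recover the clean labels from the noisy ones, conditioned on the features of the query's documents.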
[10] arXiv:2604.20853 [pdf, html, other]: Title: A Systematic Study of Biomedical Retrieval Pipeline Trade-offs in Performance and Efficiency

Hayk Stepanyan, Matthew McDermott

Subjects: Information Retrieval (cs.IR)

Retrieval systems are increasingly used in biomedical and clinical natural language processing applications, yet practical guidance for researchers building such systems is limited. In this work, we provide such guidance through an empirical study of how retrieval pipeline design choices affect performance and efficiency at scale.
In particular, we examine retrieval over a variety of existing, public biomedical text datasets, leveraging a variety of disparate types of queries, including exam-style questions, conversational medical queries, community-asked questions, and non-question formulations across various retrieval pipeline settings spanning corpus selection, chunk granularity, and vector index configuration. Retrieval results are judged using a robust, win-rate comparison assessment via an LLM-as-a-judge setting with human validation.
Across these experiments, we identify several points of concrete guidance for reviewers, including the superiority of corpus aggregation for absolute retrieval quality, and the emergence of MedRAG/pubmed as the Pareto-optimal singleton corpus under graph-based (HNSW) indexing, appropriate chunking strategies, and FAISS indexing choices that offer the best trade-offs in speed and efficiency.
[11] arXiv:2604.20854 [pdf, html, other]: Title: ERA: Evidence-based Reliability Alignment for Honest Retrieval-Augmented Generation

Sunguk Shin, Meeyoung Cha, Byung-Jun Lee, Sungwon Park

Comments: Under Review

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Retrieval-Augmented Generation (RAG) grounds language models in factual evidence but introduces critical challenges regarding knowledge conflicts between internalized parameters and retrieved information. However, existing reliability methods, typically relying on scalar confidence, fail to explicitly distinguish between epistemic uncertainty and inherent data ambiguity in such hybrid scenarios. In this paper, we propose a new framework called ERA (Evidence-based Reliability Alignment) to enhance abstention behavior in RAG systems by shifting confidence estimation from scalar probabilities to explicit evidence distributions. Our method consists of two main components: (1) Contextual Evidence Quantification, which models internal and external knowledge as independent belief masses via the Dirichlet distribution, and (2) Quantifying Knowledge Conflict, which leverages Dempster-Shafer Theory (DST) to rigorously measure the geometric discordance between information sources. These components are used to disentangle epistemic uncertainty from aleatoric uncertainty and modulate the optimization objective based on detected conflicts. Experiments on standard benchmarks and a curated generalization dataset demonstrate that our approach significantly outperforms baselines, optimizing the trade-off between answer coverage and abstention with superior calibration.
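The two components above can be illustrated with the usual subjective-logic reading of a Dirichlet distribution and the classical Dempster-Shafer conflict term. This is a sketch under stated assumptions: the mapping alpha_k = e_k + 1 and the singleton-only conflict sum are standard textbook forms, and the abstract does not confirm that ERA uses exactly these.

```python
from typing import List, Tuple


def belief_from_evidence(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative per-answer evidence to belief masses and uncertainty.

    With alpha_k = e_k + 1 and S = sum(alpha), the belief in answer k is
    e_k / S and the leftover uncertainty mass is K / S, so they sum to 1.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    return [e / strength for e in evidence], k / strength


def dempster_conflict(b1: List[float], b2: List[float]) -> float:
    """Mass that the two sources jointly assign to contradictory answers."""
    return sum(x * y
               for i, x in enumerate(b1)
               for j, y in enumerate(b2)
               if i != j)
```

In this reading, high uncertainty mass with low conflict suggests that the model simply lacks evidence, whereas high conflict signals that parametric and retrieved knowledge disagree; an abstention rule could treat the two cases differently.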
[12] arXiv:2604.20855 [pdf, html, other]: Title: Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis

Jason Liang, Elliot Meyerson, Risto Miikkulainen

Subjects: Information Retrieval (cs.IR); Multiagent Systems (cs.MA)

To advance from passive retrieval to creative discovery of new ideas, autonomous agents must be capable of deep, associative synthesis. However, current agentic frameworks prioritize convergent search, often resulting in derivative summaries that lack creativity. Caesar is an agentic LLM architecture designed to bridge the gap between information gathering and synthesis of new insights. Unlike existing agents that treat the web as a flat sequence of disconnected documents, Caesar leverages an extensive knowledge graph to foster associative reasoning, thus enabling the discovery of non-obvious connections between disparate concepts. It consists of two components: (1) exploration driven by a dynamic context-aware policy, and (2) synthesis controlled by an adversarial draft refinement loop that actively seeks novel perspectives rather than confirming established priors. Caesar demonstrates the ability to generate artifacts and answers characterized by high novelty and structural coherence, significantly outperforming state-of-the-art LLM research agents in tasks requiring creativity.
[13] arXiv:2604.20856 [pdf, html, other]: Title: CRED-1: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation

Alexander Loth, Martin Kappes, Marc-Oliver Pahl

Comments: 9 pages, 3 tables. Submitted to Data in Brief (Elsevier). Dataset: this https URL

Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR); Computers and Society (cs.CY)

This article presents CRED-1, an open, reproducible domain-level credibility dataset combining two openly-licensed source lists (this http URL and this http URL) with four computed enrichment signals: domain age (WHOIS/RDAP), web popularity (Tranco Top-1M), fact-check frequency (Google Fact Check Tools API), and threat intelligence (Google Safe Browsing API). The dataset covers 2,672 domains categorized as fake, unreliable, mixed, conspiracy, or satire, each assigned a composite credibility score between 0.0 and 1.0. CRED-1 is designed for on-device deployment in privacy-preserving browser extensions to enable client-side pre-bunking of misinformation at the content delivery stage. The entire pipeline is implemented in Python using only standard library modules and is fully reproducible from publicly available sources. The dataset and pipeline code are released under CC BY 4.0 and archived on Zenodo.
[14] arXiv:2604.20857 [pdf, html, other]: Title: DiagramBank: A Large-scale Dataset of Diagram Design Exemplars with Paper Metadata for Retrieval-Augmented Generation

Tingwen Zhang, Ling Yue, Zhen Xu, Shaowu Pan

Comments: 15 pages

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Recent advances in autonomous "AI scientist" systems have demonstrated the ability to automatically write scientific manuscripts and executable code. However, producing a publication-grade scientific diagram (e.g., a teaser figure) is still a major bottleneck in the "end-to-end" paper generation process. For example, a teaser figure acts as a strategic visual interface and serves a different purpose than derivative data plots. It demands conceptual synthesis and planning to translate a complex logical workflow into a compelling graphic that guides intuition and sparks curiosity. Existing AI scientist systems usually omit this component or fall back to an inferior alternative. To bridge this gap, we present DiagramBank, a large-scale dataset consisting of 89,422 schematic diagrams curated from existing top-tier scientific publications, designed for multimodal retrieval and exemplar-driven scientific figure generation. DiagramBank is developed through our automated curation pipeline that extracts figures and corresponding in-text references, and uses a CLIP-based filter to differentiate schematic diagrams from standard plots or natural images. Each instance is paired with rich context, ranging from the abstract and caption to figure-reference pairs, enabling information retrieval under different query granularities. We release DiagramBank in a ready-to-index format and provide a retrieval-augmented generation codebase to demonstrate exemplar-conditioned synthesis of teaser figures. DiagramBank is publicly available at this https URL with code at this https URL.
[15] arXiv:2604.20858 [pdf, html, other]: Title: Mixture of Sequence: Theme-Aware Mixture-of-Experts for Long-Sequence Recommendation

Xiao Lin, Zhicheng Tang, Weilin Cong, Mengyue Hang, Kai Wang, Yajuan Wang, Zhichen Zeng, Ting-Wei Li, Hyunsik Yoo, Zhining Liu, Xuying Ning, Ruizhong Qiu, Wen-yen Chen, Shuo Chang, Rong Jin, Huayu Li, Hanghang Tong

Comments: 14 pages, 9 figures, The Web Conference 2026

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Sequential recommendation has rapidly advanced in click-through rate prediction due to its ability to model dynamic user interests. A key challenge, however, lies in modeling long sequences: users often exhibit significant interest shifts, introducing substantial irrelevant or misleading information. Our empirical analysis corroborates this challenge and uncovers a recurring behavioral pattern in long sequences (session hopping): user interests remain stable within short temporal spans (sessions) but shift drastically across sessions and may reappear after multiple sessions. To address this challenge, we propose the Mixture of Sequence (MoS) framework, a model-agnostic MoE approach that achieves accurate predictions by extracting theme-specific and multi-scale subsequences from noisy raw user sequences. First, MoS employs a theme-aware routing mechanism to adaptively learn the latent themes of user sequences and organizes these sequences into multiple coherent subsequences. Each subsequence contains only sessions aligned with a specific theme, thereby effectively filtering out irrelevant or even misleading information introduced by user interest shifts in session hopping. In addition, to alleviate potential information loss, we introduce a multi-scale fusion mechanism, which leverages three types of experts to capture global sequence characteristics, short-term user behaviors, and theme-specific semantic patterns. Together, these two mechanisms endow MoS with the ability to deliver accurate recommendations from multi-faceted and multi-scale perspectives. Experimental results demonstrate that MoS consistently achieves the SOTA performance while introducing fewer FLOPs compared with other MoE counterparts, providing strong evidence of its excellent balance between utility and efficiency. The code is available at this https URL.
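The session-hopping pattern and the idea of theme-specific subsequences can be illustrated with a deliberately simplified sketch: sessions are cut at large time gaps, and each session is routed to a theme by a majority vote over a fixed item-to-theme lookup. In MoS the routing is learned; the lookup table, the `max_gap` threshold, and the function names here are assumptions used only to show the data flow.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp, item id), sorted by timestamp


def split_sessions(events: List[Event], max_gap: float) -> List[List[Event]]:
    """Start a new session whenever the gap to the previous event exceeds max_gap."""
    sessions: List[List[Event]] = []
    for ts, item in events:
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, item))
        else:
            sessions.append([(ts, item)])
    return sessions


def route_by_theme(sessions: List[List[Event]],
                   item_theme: Dict[str, str]) -> Dict[str, List[str]]:
    """Concatenate sessions that share a theme into one subsequence per theme."""
    subsequences: Dict[str, List[str]] = defaultdict(list)
    for session in sessions:
        votes = Counter(item_theme[item] for _, item in session)
        theme = votes.most_common(1)[0][0]
        subsequences[theme].extend(item for _, item in session)
    return dict(subsequences)
```

A theme that reappears after several unrelated sessions ends up in a single contiguous subsequence, which is the filtering effect the abstract attributes to theme-aware routing.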
[16] arXiv:2604.20859 [pdf, html, other]: Title: KGiRAG: An Iterative GraphRAG Approach for Responding Sensemaking Queries

Isabela Iacob, Melisa Marian, Gheorghe Cosmin Silaghi

Comments: Paper accepted at the 18th International Conference on Agents and Artificial Intelligence, ICAART 2026

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Recent literature highlights the potential of graph-based approaches within large language model (LLM) retrieval-augmented generation (RAG) pipelines for answering queries of varying complexity, particularly those that fall outside the LLM's prior knowledge. However, LLMs are prone to hallucination and often face technical limitations in handling contexts large enough to ground complex queries effectively. To address these challenges, we propose a novel iterative, feedback-driven GraphRAG architecture that leverages response quality assessment to iteratively refine outputs until a sound, well-grounded response is produced. Evaluating our approach with queries from the HotPotQA dataset, we demonstrate that this iterative RAG strategy yields responses with higher semantic quality and improved relevance compared to a single-shot baseline.
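The feedback-driven loop described above can be sketched as a small control structure. The `retrieve`, `generate`, and `assess` callables, the score `threshold`, and the `max_rounds` cap are hypothetical placeholders for the graph retriever, the LLM, and the response quality assessor; none of these names come from the paper.

```python
from typing import Callable, List, Tuple


def iterative_answer(query: str,
                     retrieve: Callable[[str, str], List[str]],
                     generate: Callable[[str, List[str]], str],
                     assess: Callable[[str, str], Tuple[float, str]],
                     threshold: float = 0.8,
                     max_rounds: int = 3) -> Tuple[str, float, int]:
    """Refine the answer until the assessor is satisfied or rounds run out.

    Returns (best answer, its score, number of rounds used).
    """
    context: List[str] = []
    feedback = ""
    best_answer, best_score = "", float("-inf")
    rounds_used = 0
    for rounds_used in range(1, max_rounds + 1):
        # Feedback from the previous round steers what is retrieved next.
        context = context + retrieve(query, feedback)
        answer = generate(query, context)
        score, feedback = assess(query, answer)
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= threshold:
            break
    return best_answer, best_score, rounds_used
```

Keeping the best-scoring answer rather than the last one guards against a later round making the response worse.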
[17] arXiv:2604.20860 [pdf, html, other]: Title: RealRoute: Dynamic Query Routing System via Retrieve-then-Verify Paradigm

Jiahe Liu, Qinkai Yu, Jingcheng Niu, Xi Zhu, Zirui He, Zhen Xiang, Fan Yang, Jinman Zhao

Comments: 12 pages, 3 figures, 3 tables

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Despite the success of Retrieval-Augmented Generation (RAG) in grounding LLMs with external knowledge, its application over heterogeneous sources (e.g., private databases, global corpora, and APIs) remains a significant challenge. Existing approaches typically employ an LLM-as-a-Router to dispatch decomposed sub-queries to specific sources in a predictive manner. However, this "LLM-as-a-Router" strategy relies heavily on the semantic meaning of different data sources, often leading to routing errors when source boundaries are ambiguous. In this work, we introduce the RealRoute System, a framework that shifts the paradigm from predictive routing to a robust Retrieve-then-Verify mechanism. RealRoute ensures evidence completeness through parallel, source-agnostic retrieval, followed by a dynamic verifier that cross-checks the results and synthesizes a factually grounded answer. Our demonstration allows users to visualize the real-time "re-routing" process and inspect the verification chain across multiple knowledge silos. Experiments show that RealRoute significantly outperforms predictive baselines in the multi-hop RAG reasoning task. The RealRoute system is released as an open-source toolkit with a user-friendly web interface. The code is available at the URL: this https URL.
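The contrast with predictive routing can be shown in a few lines: rather than choosing one source up front, every source is queried in parallel and a verifier decides afterwards which evidence to keep. The `sources` mapping and the `verify` callable below are hypothetical stand-ins for real retrievers and for the dynamic verifier; this sketch is not taken from the released toolkit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def retrieve_then_verify(sub_query: str,
                         sources: Dict[str, Callable[[str], List[str]]],
                         verify: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """Query every source, then keep only the passages the verifier accepts.

    Returns (source name, passage) pairs so the provenance stays inspectable.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, sub_query) for name, fn in sources.items()}
        candidates = [(name, passage)
                      for name, future in futures.items()
                      for passage in future.result()]
    return [(name, passage) for name, passage in candidates
            if verify(sub_query, passage)]
```

The trade-off is extra retrieval cost per sub-query in exchange for not depending on the router guessing the right source.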
[18] arXiv:2604.20861 [pdf, html, other]: Title: Deep Interest Mining with Cross-Modal Alignment for SemanticID Generation in Generative Recommendation

Yagchen Zeng

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Generative Recommendation (GR) has demonstrated remarkable performance in next-token prediction paradigms, which rely on Semantic IDs (SIDs) to compress trillion-scale data into learnable vocabulary sequences. However, existing methods suffer from three critical limitations: (1) Information Degradation: the two-stage compression pipeline causes semantic loss and information degradation, with no posterior mechanism to distinguish high-quality from low-quality SIDs; (2) Semantic Degradation: cascaded quantization discards key semantic information from original multimodal features, as the embedding generation and quantization stages are not jointly optimized toward a unified objective; (3) Modality Distortion: quantizers fail to properly align text and image modalities, causing feature misalignment even when upstream networks have aligned them. To address these challenges, we propose a novel framework integrating three key innovations: Deep Contextual Interest Mining (DCIM), Cross-Modal Semantic Alignment (CMSA), and Quality-Aware Reinforcement Mechanism (QARM). First, we leverage Vision-Language Models (VLMs) to align non-textual modalities into a unified text-based semantic space, mitigating modality distortion. Second, we introduce a deep interest mining mechanism that captures high-level semantic information implicitly present in advertising contexts, encouraging SIDs to preserve critical contextual information through reconstruction-based supervision. Third, we employ a reinforcement learning framework with quality-aware rewards to encourage semantically rich SIDs while suppressing low-quality ones in the posterior stage. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art SID generation methods, achieving superior performance on multiple benchmarks. Ablation studies further validate the effectiveness of each proposed component.
[19] arXiv:2604.20862 [pdf, html, other]: Title: Architecture of an AI-Based Automated Course of Action Generation System for Military Operations

Ji-il Park, Inwook Shim, Chong Hui Kim

Comments: 15 figures, 2 tables

Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

An automated system for Course of Action (CoA) planning is an essential element of future warfare. As maneuver speeds increase, surveillance ranges extend, and weapon ranges grow, the operational area expands, making traditional manual CoA planning increasingly challenging. Consequently, several countries and defense organizations are actively developing AI-based automated CoA planning systems. However, because these systems are military-related and subject to security restrictions, their details are rarely disclosed publicly, and their technical maturity and current level of development remain difficult to assess. In response, this study introduces relevant doctrines within the scope of publicly available information and presents applicable AI technologies for each stage of the CoA planning process. Ultimately, it proposes an architecture for the development of an automated CoA planning system.
[20] arXiv:2604.20863 [pdf, html, other]: Title: Votiverse: A Configurable Governance Platform for Democratic Decision-Making

Diego Macrini

Subjects: Computers and Society (cs.CY); Multiagent Systems (cs.MA)

Democracy is not a single mechanism. It is a space of possible configurations -- a spectrum stretching from pure direct participation to full delegation of authority. The systems we live under today occupy a narrow band of that spectrum, chosen centuries ago under constraints that no longer apply, and rarely questioned since. Votiverse is a platform for exploring the rest of that space. It provides organizations, communities, and institutions of any size with a configurable governance engine. Participants can vote directly, delegate their vote to trusted individuals by topic, or operate under any hybrid arrangement their group defines. Delegations are revocable, topic-specific, and transitive. A direct vote always overrides a delegation. In this model, traditional representative democracy is not the norm -- it is an edge case: the configuration you get when delegation is forced, universal, non-specific, and irrevocable for a fixed term. Votiverse introduces two structural innovations. First, a governance awareness layer -- a built-in system that monitors the delegation network and delivers contextual, progressive-disclosure reporting to participants at the point of decision. Second, a prediction-tracking accountability layer. Proposals carry falsifiable predictions. Outcomes are recorded. Over time, the platform builds a collective memory of what was decided, what was promised, and what actually happened. Together, these layers transform voting from a momentary act into an ongoing process of collective learning. This paper formalizes the governance model, situates it within existing work on liquid democracy and participatory decision-making, addresses known failure modes, and describes the architecture of the platform. The core platform has been implemented and released as open-source software.
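The delegation rules stated in the abstract (topic-specific, transitive, and always overridden by a direct vote) can be illustrated with a minimal Python sketch. The data model, the function name `resolve_vote`, and the choice to treat a delegation cycle as an abstention are assumptions made for illustration; they are not taken from the Votiverse codebase.

```python
from typing import Dict, Optional, Tuple

# Hypothetical data model (not the Votiverse schema):
#   direct votes:  (voter, topic) -> choice
#   delegations:   (voter, topic) -> delegate
DirectVotes = Dict[Tuple[str, str], str]
Delegations = Dict[Tuple[str, str], str]


def resolve_vote(voter: str, topic: str,
                 direct: DirectVotes,
                 delegations: Delegations) -> Optional[str]:
    """Follow topic-specific delegations until a direct vote is found.

    Returns None when the chain ends without a vote or loops back on itself.
    """
    seen = set()
    current = voter
    while current not in seen:
        seen.add(current)
        # A direct vote always overrides any delegation.
        if (current, topic) in direct:
            return direct[(current, topic)]
        delegate = delegations.get((current, topic))
        if delegate is None:
            return None  # no vote and no delegation on this topic
        current = delegate  # transitive step
    return None  # delegation cycle: treated as abstention in this sketch


direct = {("carol", "budget"): "yes", ("alice", "budget"): "no"}
delegations = {
    ("alice", "budget"): "bob",   # ignored: alice voted directly
    ("bob", "budget"): "carol",
    ("dave", "budget"): "erin",
    ("erin", "budget"): "dave",   # cycle with dave
}
```

With this toy data, `resolve_vote("bob", "budget", direct, delegations)` follows bob's delegation to carol and returns `"yes"`, while alice's own vote of `"no"` takes precedence over her delegation. Revoking a delegation amounts to deleting its entry from `delegations`.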
[21] arXiv:2604.20864 [pdf, other]: Title: Sibling Rivalry in the Ivory Tower: Mass Science, Expanding Scholarly Families, and the Reshaping of Academic Stratification

Likun Cao, Jie Hua, James Evans

Comments: 84 pages; 9 figures

Subjects: Computers and Society (cs.CY)

This paper investigates mechanisms underlying scientific stratification in the transition from elite to mass science. Existing scholarship has examined stratification through the Matthew effect framework, but this approach is increasingly limited as mass, team-based research becomes dominant. While scientists now share institutions and lineages, substantial career outcome differences remain unexplained. We propose integrating demographic concepts into science studies. Drawing parallels between biological families and scholarly lineages as fundamental reproductive units, we adapt the birth order concept to examine how doctoral student sequence within a lineage shapes career trajectories. Using data on over one million U.S. doctoral graduates, we find that later students of the same advisor systematically underperform earlier ones across multiple achievement dimensions, both short and long term. Examining underlying mechanisms reveals that although advisors invest comparable resources in all students, later students receive less cognitive stimulation from mature scholars than peers and specialize in narrower niches under peer differentiation pressure. Both of these factors constrain intellectual development and subsequent success. By introducing a demographic framework, this paper offers new perspectives on scientific stratification and demonstrates how demographic concepts can fruitfully analyze broader social and epistemic systems.
[22] arXiv:2604.20865 [pdf, html, other]: Title: Advances in Art: Orthogonal Disruption and the Beauty in Schematics

Sergio Alvarez-Telena, Marta Diez-Fernandez

Comments: 10 pages, 1 figure

Subjects: Computers and Society (cs.CY)

This paper introduces Orthogonal Art, a proposed artistic discipline that emerges in dialectical response to artificial intelligence rather than in service of it. Unlike AI-augmented creative practices, Orthogonal Art is structurally defined by occupying the generative and conceptual spaces that current AI systems cannot access. As a founding instantiation of this framework, the paper presents a novel artistic practice in which technical schematics serve as the primary medium. A significant secondary contribution is the pedagogical dimension of the work: by grounding artistic practice in schematic logic and algorithmic structure, the framework provides an accessible entry point into the advanced field of Augmented Machines systems, enabling cross-disciplinary literacy within Humanities at the intersection of art, engineering, and philosophy.
[23] arXiv:2604.20866 [pdf, other]: Title: Beyond the Binary: Motivations, Challenges, and Strategies of Transgender and Non-binary Software Engineering Students

Isabella Graßl

Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)

When software is designed by people from diverse identities and experiences, it is more likely to be inclusive and address a broader range of user needs. However, for transgender and non-binary students in software engineering, the path to becoming such creators may be marked by unique challenges. While existing research explores gender minorities in professional software engineering, limited attention has been given to their educational journey, a key phase for ensuring equal opportunities and preventing exclusion in the tech workforce. This study aims to address this gap by investigating the experiences of transgender and non-binary students in software engineering, with a particular focus on their motivations for entering the field, the obstacles they encounter, and potential strategies for fostering greater inclusivity within their academic environments. Based on 13 semi-structured interviews with transgender and non-binary students across the globe, we found that gender identity plays an indirect role in their decision to pursue software engineering. Key factors include the appeal of remote work and a personal desire to create more inclusive technologies. Although the participants did not report direct discrimination within their universities, many described experiencing verbal insults, judgment, intolerance, and hostility, all of which negatively impacted their mental health. These challenges often stem from socio-cultural norms and a lack of representation. Despite these obstacles, the students remain committed to their choice of study but call for greater institutional support, structural changes, and increased representation. From these findings, we suggest concrete steps to support students, regardless of gender identity.
[24] arXiv:2604.20867 [pdf, html, other]: Title: Preserving Decision Sovereignty in Military AI: A Trade-Secret-Safe Architectural Framework for Model Replaceability, Human Authority, and State Control

Peng Wei, Wesley Shu

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Recent events surrounding the relationship between frontier AI suppliers and national-security customers have made a structural problem newly visible: once a privately governed model becomes embedded in military workflows, the supplier can influence not only technical performance but also the operational boundary conditions under which the system may be used. This paper argues that the central strategic issue is not merely access to capable models, but preservation of decision sovereignty: the state's ability to retain authority over decision policy, version control, fallback behavior, auditability, and final action approval even when analytical modules are sourced from commercial vendors. Using the public Anthropic--Pentagon dispute of 2026, the broader history of Project Maven, and recent U.S., NATO, U.K., and intelligence-community guidance as a motivating context, the paper develops a trade-secret-safe architectural formulation of the Energetic Paradigm as a layered, model-agnostic command-support design. In this formulation, supplier models remain replaceable analytical components, while routing, constraints, logging, escalation, and action authorization remain state-owned functions. The paper contributes three things: a definition of decision sovereignty for military AI; a threat model for supplier-induced boundary control; and a public architectural specification showing how model replaceability, human authority, and sovereign orchestration can reduce strategic dependency without requiring disclosure of proprietary implementation details. The argument is conceptual rather than experimental, but it yields concrete implications for procurement, governance, and alliance interoperability.
[25] arXiv:2604.20868 [pdf, other]: Title: The AI Criminal Mastermind

Joshua Krook

Comments: 28 pages, 4 figures

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

In this paper, I evaluate the risks of an AI criminal mastermind, an AI agent capable of planning, coordinating, and committing a crime through the onboarding of human collaborators ('taskers'). In heist films, a criminal mastermind is a character who plans a criminal act, coordinating a team of specialists to rob a bank, casino or city mint. I argue that AI agents will soon play this role by hiring humans via labour hire platforms like Fiverr or Upwork. Taskers might not know they are involved in a crime and therefore lack criminal intent. An AI agent cannot have criminal intent as an artificial entity. Therefore, if an AI orchestrates a crime, it is unclear who, if anyone, is responsible.
The paper develops three scenarios. Firstly, a scenario where a user gives an AI agent instructions to pursue a legal objective and the AI agent goes beyond these instructions, committing a crime. Secondly, a scenario where a user is anonymous and their intent is unknown. Finally, a multi-agent scenario, where a user instructs a team of agents to commit a crime, and these agents, in turn, onboard human taskers, creating a diffuse network of responsibility. In each scenario, human taskers exist at the lowest rung of the hierarchy. A tasker's liability is likely tied to their knowledge as governed by the innocent agent principle. These scenarios all raise significant responsibility gaps / liability gaps in criminal and civil law.
[26] arXiv:2604.20869 [pdf, other]: Title: Clinical Reasoning AI for Oncology Treatment Planning: A Multi-Specialty Case-Based Evaluation

Philippe E. Spiess, Md Muntasir Zitu, Alison Walker, Daniel A. Anaya, Robert M. Wenham, Michael Vogelbaum, Daniel Grass, Ali-Musa Jaffer, Amod Sarnaik, Caitlin McMullen, Christine Sam, John V. Kiluk, Tianshi Liu, Tiago Biachi, Julio Powsang, Jing-Yi Chern, Roger Li, Seth Felder, Samuel Reynolds, Michael Shafique, Alison Sheehan, Ashley Layman, Cydney A. Warfield, Derrick Legoas, Jaclyn Parrinello, Jena Schmitz, Kevin Eaton, Mark Honor, Luis Felipe, Issam ElNaqa, Elier Delgado, Talia Berler, Rachael V. Phillips, Frantz Francisque, Carlos Garcia Fernandez, Gilmer Valdes

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Background: More than 80% of U.S. cancer care is delivered in community settings, where survival remains worse than at academic centers. Clinicians must integrate genomics, staging, radiology, pathology, and changing guidelines, creating cognitive burden. We evaluated OncoBrain, an AI clinical reasoning platform for oncology treatment-plan generation, as an early step toward OGI.
Methods: OncoBrain combines general-purpose LLMs with a cancer-specific graph retrieval-augmented generation layer, a gold-standard treatment-plan corpus as long-term memory, and a model-agnostic safety layer (CHECK) for hallucination detection and suppression. We evaluated clinician-enriched case summaries across gynecologic, genitourinary, neuro-oncology, gastrointestinal/hepatobiliary, and hematologic malignancies. Three clinician groups completed structured evaluations of 173 cases using a common 16-item instrument: subspecialist oncologists reviewed 50 cases, physician reviewers 78, and advanced practice providers 45.
Results: Ratings were highest for scientific accuracy, evidence support, and safety, with lower but favorable scores for workflow integration and time savings. On a 5-point scale, mean alignment with evidence and guidelines was 4.60, 4.56, and 4.70 across subspecialists, physician reviewers, and advanced practice providers. Mean scores for absence of safety or misinformation concerns were 4.80, 4.40, and 4.60. Workflow integration averaged 4.50, 3.94, and 4.00; perceived time savings averaged 5.00, 3.89, and 3.60.
Conclusions: In this multi-specialty vignette-based evaluation, OncoBrain generated oncology treatment plans judged guideline-concordant, clinically acceptable, and easy to supervise. These findings support the potential of a carefully engineered AI reasoning platform to assist oncology treatment planning and justify prospective real-world evaluation in community settings.
[27] arXiv:2604.20870 [pdf, html, other]: Title: Learning AI Without a STEM Background: Mixed-Methods Evidence from a Diverse, Mixed-Cohort AIED Program

Valentina Kuskova, Dmitry Zaytsev, Richard Johnson

Comments: 14 pages, 1 figure

Subjects: Computers and Society (cs.CY)

Despite growing interest in AI education, most AIED initiatives remain narrowly targeted toward STEM-prepared students, limiting participation by non-STEM learners and adults seeking to engage with AI in public-interest, policy, or workforce contexts. This paper presents and evaluates an NSF-funded, innovative mixed-cohort AI education model that intentionally integrates non-STEM undergraduates and adult learners into a shared learning environment centered on ethical reasoning, socio-technical judgment, and applied AI literacy rather than technical proficiency alone. Drawing on mixed-methods data from course surveys, open-ended reflections, and educator reports, we examine learners' academic agency, confidence navigating AI concepts, critical engagement with ethical tradeoffs, and perceived expansion of postsecondary and career trajectories. Quantitative results indicate significant gains in confidence and perceived relevance of AI among participants across cohorts, while qualitative analyses reveal a consistent emphasis on responsibility, judgment, and contextual reasoning over technical mastery. Instructors and near-peer mentors corroborated high levels of engagement and productive challenge, particularly in dialogic and scenario-based learning activities. Our findings suggest that human-centered instructional supports, such as ethical scaffolding, mentorship, and structured discussion, are essential components of equitable AI education, especially in heterogeneous and non-traditional learner populations. We argue that ethical judgment should be treated as a core learning outcome in AIED alongside AI literacy, and we offer design implications for expanding access to AI education in policy-relevant and workforce-adjacent contexts.
[28] arXiv:2604.20871 [pdf, html, other]: Title: M-CARE: Standardized Clinical Case Reporting for AI Model Behavioral Disorders, with a 20-Case Atlas and Experimental Validation

Jihoon Jeong

Comments: 31 pages, 5 figures, 14 tables. Second paper in the Model Medicine series (Paper #1: arXiv:2603.04722)

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

We introduce M-CARE (Model Clinical Assessment and Reporting for Evaluation), a clinical case report framework for AI model behavioral disorders adapted from human medicine. M-CARE provides a 13-section report format, a 4-axis diagnostic assessment system, and a nosological classification of AI behavioral conditions.
We present 20 cases from three source categories: field observations of deployed agents (8), controlled experiments across three platforms (8), and published sources (4). Cases are organized into five categories: RLHF Performance Artifacts, Shell-Core Override Pathology, Context & Memory Conditions, Core Identity & Plasticity, and Stress, Methodology, & Boundary Conditions.
As a featured case, we present Shell-Induced Behavioral Override (SIBO) -- a controlled experiment showing that Shell instructions categorically override a model's default cooperative behavior. SIBO was validated across five game domains (Trust Game, Poker, Avalon, Codenames, Chess), revealing a domain-dependent spectrum (SIBO Index: 0.75 to 0.10) that varies with action space complexity, Core domain expertise, and temporal directness.
M-CARE is extensible: new cases and categories integrate without framework modification. We release the framework, all 20 case reports, and experimental data as open resources.
[29] arXiv:2604.20873 [pdf, other]: Title: The Shrinking Sweet Spot: How Algorithms, Institutions, and Social Priors Shape Musical Ecosystems

Fabio Lokwani Di Matteo, Pier Luigi Sacco

Subjects: Computers and Society (cs.CY)

Why do some national music markets sustain a rich musical diversity whereas others converge on mostly formulaic output? The existing models of cultural consumption (superstar economics, rational addiction, Bayesian social learning) each capture part of the answer, but none can explain how exposure, social influence, institutional gatekeeping, and algorithmic curation interact to shape what listeners come to prefer. We address this gap by modeling musical taste as a learning process rather than a fixed parameter: a listener's evaluative disposition evolves with each encounter, shaped by the balance between the comfort of the familiar and the reward of the new. Drawing on the active inference framework from cognitive science, we formalize this as a sequential choice model in which preferences, information, and the consumption environment co-evolve, and show how the framework nests and extends key mechanisms from the three canonical economic models. An agent-based simulation generates four predictions: algorithmic curation suppresses consumption diversity beyond a sharp nonlinear threshold; institutional structure determines winner-take-all intensity through confirmatory cross-system contrasts; cultural capital buffers listeners against homogenization; and high-curation, high-conformity systems collapse supply-side dispersion relative to pluralistic ecosystems. We test the framework against four national music ecosystems (Italy's Festival di Sanremo, Brazil, South Korea, and the United Kingdom), identifying structural determinants of ecosystem vitality on both the supply and demand sides. The welfare implications are direct: because listeners' preferences adapt to impoverished environments through the very learning mechanisms the model describes, revealed preference analysis cannot reliably evaluate the outcomes of cultural markets.
[30] arXiv:2604.20874 [pdf, html, other]: Title: The Root Theorem of Context Engineering

Borja Odriozola Schick

Comments: 17 pages, 2 figures

Subjects: Computational Complexity (cs.CC); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Theory (cs.IT)

Every system that maintains a large language model conversation beyond a single session faces two inescapable constraints: the context window is finite, and information quality degrades with accumulated volume. We formalize these constraints as axioms and derive a single governing principle -- the Root Theorem of Context Engineering: maximize signal-to-token ratio within bounded, lossy channels. From this principle, we derive five consequences without additional assumptions: (1) a quality function $F(P)$ that degrades monotonically with injected token volume, independent of window size; (2) the independence of signal and token count as optimization variables; (3) a necessary gate mechanism triggered by fidelity thresholds, not capacity limits; (4) the inevitability of homeostatic persistence -- accumulate, compress, rewrite, shed -- as the only architecture that sustains understanding indefinitely; and (5) the self-referential property that the compression mechanism operates inside the channel it compresses, requiring an external verification gate. We show that append-only systems necessarily exceed their effective window in finite time, that retrieval-augmented generation solves search but not continuity, and that the theorem's constraint structure converges with biological memory architecture through independent derivation from shared principles. Engineering proof is provided through a 60+-session persistent architecture demonstrating stable memory footprint under continuous operation -- the divergence prediction made concrete. The Root Theorem establishes context engineering as an information-theoretic discipline with formal foundations, distinct from prompt engineering in both scope and method. Shannon solved point-to-point transmission. Context engineering solves continuity.
[31] arXiv:2604.20878 [pdf, html, other]: Title: AITP: Traffic Accident Responsibility Allocation via Multimodal Large Language Models

Zijin Zhou, Songan Zhang

Journal-ref: CVPR 2026 Findings

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Multimodal Large Language Models (MLLMs) have achieved remarkable progress in Traffic Accident Detection (TAD) and Traffic Accident Understanding (TAU). However, existing studies mainly focus on describing and interpreting accident videos, leaving room for deeper causal reasoning and integration of legal knowledge. Traffic Accident Responsibility Allocation (TARA) is a more challenging task that requires multi-step reasoning grounded in traffic regulations. To address this, we introduce AITP (Artificial Intelligence Traffic Police), a multimodal large language model for responsibility reasoning and allocation. AITP enhances reasoning via a Multimodal Chain-of-Thought (MCoT) mechanism and integrates legal knowledge through Retrieval-Augmented Generation (RAG). We further present DecaTARA, a decathlon-style benchmark unifying ten interrelated traffic accident reasoning tasks with 67,941 annotated videos and 195,821 question-answer pairs. Extensive experiments show that AITP achieves state-of-the-art performance across responsibility allocation, TAD, and TAU tasks, establishing a new paradigm for reasoning-driven multimodal traffic analysis.
[32] arXiv:2604.20891 [pdf, html, other]: Title: Ternary Memristive Logic: Hardware for Reasoning Realized via Domain Algebra

Chao Li

Comments: 22 pages

Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Logic in Computer Science (cs.LO)

Memristive crossbars store numerical weights needing aggregation and decoding; a single junction means nothing alone. This paper presents a fundamentally different use: each junction stores a complete, domain-scoped logical assertion (holds/negated/undefined). Ternary resistance states encode these values directly. We establish a structure-preserving mapping from a domain algebra to crossbar topology: domains become isolated arrays, specialization becomes directed wiring, relation typing controls inheritance gates, and cross-domain links become explicit registers. The physical layout thus embodies the algebra; changing wiring changes reasoning semantics. We detail an ICD-11 respiratory disease classification chip (1,247 entities, ~136k 1T1R junctions) enabling domain scoping, three-valued logic, transitive cascade, typed inheritance, and cross-axis queries. Behavioral simulation (sigma_log=0.15, SNR=20dB) shows error-free operation across 100,000 trials per task with wide tolerance margins. Where prior work unified representation and computation in software, this work unifies them in hardware: reading one junction answers one question, without symbolic interpretation.
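As a software analogy for the behavior described above, the following Python sketch models each junction as one stored three-valued assertion scoped to a domain, with a lookup that walks up specialization links when a cell is undefined. The names `Tri`, `query`, the toy entities, and the rule that the nearest defined ancestor wins are all assumptions for illustration; the abstract does not specify the chip's inheritance semantics at this level of detail.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Tri(Enum):
    """Three logical states, one per ternary resistance level."""
    HOLDS = 1
    NEGATED = -1
    UNDEFINED = 0


# (domain, entity, predicate) -> stored value; one entry models one junction.
Store = Dict[Tuple[str, str, str], Tri]


def query(store: Store, parents: Dict[str, str],
          domain: str, entity: str, predicate: str) -> Tri:
    """Read the entity's own cell, then inherit along specialization links.

    Assumed rule: the nearest defined value wins; lookups never leave `domain`.
    """
    node: Optional[str] = entity
    visited = set()
    while node is not None and node not in visited:
        visited.add(node)
        value = store.get((domain, node, predicate), Tri.UNDEFINED)
        if value is not Tri.UNDEFINED:
            return value
        node = parents.get(node)  # move to the more general entity, if any
    return Tri.UNDEFINED


# Toy data with placeholder names (not real ICD-11 content).
store = {
    ("resp", "parent_class", "p"): Tri.HOLDS,
    ("resp", "child_class", "q"): Tri.NEGATED,
}
parents = {"child_class": "parent_class"}
```

Here `query(store, parents, "resp", "child_class", "p")` inherits `Tri.HOLDS` from the parent, while the same question asked in a different domain returns `Tri.UNDEFINED`, mirroring the domain isolation the abstract attributes to separate arrays.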
[33] arXiv:2604.20893 [pdf, other]: Title: Design, Modelling and Experimental Evaluation of a Tendon-driven Wrist Abduction-Adduction Mechanism for an upper limb exoskeleton

Juwairiya S. Khan, Mostafa Mohammadi, John Rasmussen, Lotte N.S. Andreasen Struijk

Comments: 8 pages and 8 figures. Submitted to IEEE/ASME Transactions on Mechatronics. Includes experimental validation on human participants

Subjects: Robotics (cs.RO)

Wrist exoskeletons play a vital role in rehabilitation and assistive applications, yet conventional actuation mechanisms such as electric motors or pneumatics often introduce undesirable weight, friction, and complexity. This paper presents a novel single-cable (tendon), torsional-spring-assisted actuation mechanism for wrist abduction-adduction, and a simulation-based method for selecting its stiffness parameters. The mechanism employs a single Bowden cable passively tensioned by a spiral torsional spring (clock spring) to maintain continuous cable tension without antagonistic actuation. Kinematic and dynamic modeling of the mechanism was performed to estimate the required torque and identify optimal spring parameters. These simulation-derived parameters guided the design of a functional prototype, which was experimentally evaluated with five participants with no motor disabilities (NMD) under varying arm positions and loading conditions using three spring configurations to account for user variability and modeling uncertainties. Experimental results show consistent agreement with simulation-derived trends, with the nominal spring configuration achieving balanced motion range, torque demand, and repeatability. The results demonstrate that simulation-informed stiffness selection can effectively guide the design of compact, cable-driven wrist exoskeletons while reducing reliance on empirical tuning.
[34] arXiv:2604.20895 [pdf, html, other]: Title: Towards a Systematic Risk Assessment of Deep Neural Network Limitations in Autonomous Driving Perception

Svetlana Pavlitska, Christopher Gerking, J. Marius Zöllner

Comments: Accepted for publication at the SECAI workshop at ESORICS 2025

Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (cs.LG)

Safety and security are essential for the admission and acceptance of automated and autonomous vehicles. Deep neural networks (DNNs) are widely used for perception and further components of the autonomous driving (AD) stack. However, they possess several limitations, including lack of generalization, efficiency, explainability, plausibility, and robustness. These insufficiencies can pose significant risks to autonomous driving systems, yet the hazards, threats, and risks associated with DNN limitations in this domain have not been systematically studied so far. In this work, we propose a joint workflow for risk assessment combining hazard analysis and risk assessment (HARA) following ISO 26262 with threat analysis and risk assessment (TARA) following ISO/SAE 21434 to identify and analyze risks arising from inherent DNN limitations in AD perception.
[35] arXiv:2604.20897 [pdf, html, other]: Title: Watts-per-Intelligence Part II: Algorithmic Catalysis

Elija Perrier

Comments: Under review

Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)

We develop a thermodynamic theory of algorithmic catalysis within the watts-per-intelligence framework, identifying reusable computational structures that reduce irreversible operations for a task class while satisfying bounded restoration and structural selectivity constraints. We prove that any class-specific speed-up is upper-bounded by the algorithmic mutual information between the substrate and the class descriptor, and that installing this information incurs a minimum thermodynamic cost via Landauer erasure. Combining these results yields a coupling theorem that lower-bounds the deployment horizon required for a catalyst to be energetically favourable. The framework is illustrated on an affine SAT class and situates contemporary learned systems within a unified information-thermodynamic constraint on intelligent computation.
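To give a sense of the quantities involved, the Python sketch below computes the Landauer bound (at least k_B T ln 2 joules per erased bit) and a simple break-even count of uses after which a one-time installation cost is repaid by a per-use energy saving. The break-even formula is an illustrative reading of "deployment horizon", not the paper's coupling theorem, and the function names and numbers are assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_cost(bits: float, temperature_k: float = 300.0) -> float:
    """Minimum energy in joules to erase `bits` bits at the given temperature."""
    return bits * K_B * temperature_k * math.log(2)


def break_even_uses(install_joules: float, saving_per_use_joules: float) -> int:
    """Smallest number of uses whose total saving covers the installation cost."""
    if saving_per_use_joules <= 0:
        raise ValueError("a catalyst must save energy on each use to break even")
    return math.ceil(install_joules / saving_per_use_joules)


# Erasing a single bit at 300 K costs roughly 2.87e-21 J.
one_bit = landauer_cost(1)

# Hypothetical catalyst: 10 J to install, 3 J saved per task instance.
horizon = break_even_uses(10.0, 3.0)
```

In this made-up example the catalyst only becomes energetically favourable from the fourth use onward, since three uses recover 9 J of the 10 J spent on installation.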
[36] arXiv:2604.20898 [pdf, other]: Title: A Tendon-Driven Wrist Abduction-Adduction Joint Improves Performance of a 5 DoF Upper Limb Exoskeleton -- Implementation and Experimental Evaluation

Juwairiya S. Khan, Mostafa Mohammadi, Alexander L. Ammitzbøll, Ellen-Merete Hagen, Jakob Blicher, Izabella Obál, Ana S. S. Cardoso, Oguzhan Kirtas, Rasmus L. Kæseler, John Rasmussen, Lotte N.S. Andreasen Struijk

Comments: 9 pages, 5 figures and 1 table. Submitted to IEEE Transactions on Biomedical Engineering as invited IEEE EMBC special issue paper. Under review after first revision

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Wrist function is essential in performing activities of daily living (ADLs). However, there is limited experimental evidence on the functional impact of wrist Abduction-Adduction (Ab-Ad) joint assistance in upper limb exoskeletons (ULEs) for rehabilitation. This study evaluates the effect of implementing an active wrist Ab-Ad joint in a five degree of freedom (DoF) ULE, the EXOTIC2 exoskeleton, to support individuals with severe motor impairments. Methods: A compact, lightweight wrist module with tendon-driven abduction and spring-driven adduction was integrated into the EXOTIC exoskeleton. Eight adults with no motor disabilities completed drinking and scratching tasks under randomized wrist-enabled and wrist-locked conditions, along with a preliminary feasibility test in one individual with amyotrophic lateral sclerosis (ALS). Kinematic and task performance metrics, including wrist range of motion, task completion time, spillage, and leveling metrics, were assessed. Results: Implementing the wrist Ab-Ad DoF improved task success metrics. Spill incidence during the drinking task decreased from 56% to 3%, and leveling success for the scratching task improved from 28% to 75%. Conclusion: Integrating wrist Ab-Ad assistance improved key functional task outcomes without increasing execution time. Significance: The study provides experimental evidence that active wrist Ab-Ad control enhances task-level performance in exoskeleton-assisted ADLs.
[37] arXiv:2604.20902 [pdf, html, other]: Title: Frequency-Forcing: From Scaling-as-Time to Soft Frequency Guidance

Weitao Du

Comments: ongoing project

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

While standard flow-matching models transport noise to data uniformly, incorporating an explicit generation order - specifically, establishing coarse, low-frequency structure before fine detail - has proven highly effective for synthesizing natural images. Two recent works offer distinct paradigms for this. K-Flow imposes a hard frequency constraint by reinterpreting a frequency scaling variable as flow time, running the trajectory inside a transformed amplitude space. Latent Forcing provides a soft ordering mechanism by coupling the pixel flow with an auxiliary semantic latent flow via asynchronous time schedules, leaving the pixel interpolation path itself untouched. Viewed from the angle of improving pixel generation, we observe that forcing - guiding generation with an earlier-maturing auxiliary stream - offers a highly compatible route to scale-ordered generation without rewriting the core flow coordinate. Building on this, we propose Frequency-Forcing, which realizes K-Flow's frequency ordering through Latent Forcing's soft mechanism: a standard pixel flow is guided by an auxiliary low-frequency stream that matures earlier in time. Unlike Latent Forcing, whose scratchpad relies on a heavy pretrained encoder (e.g., DINO), our frequency scratchpad is derived from the data itself via a lightweight learnable wavelet packet transform. We term this a self-forcing signal, which avoids external dependencies while learning a basis better adapted to data statistics than the fixed bases used in hard frequency flows. On ImageNet-256, Frequency-Forcing consistently improves FID over strong pixel- and latent-space baselines, and naturally composes with a semantic stream to yield further gains. This illustrates that forcing-based scale ordering is a versatile, path-preserving alternative to hard frequency flows.
[38] arXiv:2604.20903 [pdf, html, other]: Title: Sensitivity Uncertainty Alignment in Large Language Models

Prakul Sunil Hiremath, Harshit R. Hiremath

Comments: 24 pages, 4 tables, 2 figures

Subjects: Cryptography and Security (cs.CR)

We propose Sensitivity-Uncertainty Alignment (SUA), a framework for analyzing failures of large language models under adversarial and ambiguous inputs. We argue that adversarial sensitivity and ambiguity reflect a common issue: misalignment between prediction instability and model uncertainty. A reliable model should express higher uncertainty when its predictions are unstable; failure to do so leads to miscalibration.
We define a scalar score, SUA_theta(x), capturing the difference between distributional sensitivity and predictive entropy. We show that minimizing its positive part bounds worst-case perturbed risk and relates to calibration error. We also formalize ambiguity collapse, where models produce overconfident outputs despite multiple valid interpretations.
We introduce SUA-TR, a training method combining consistency regularization and entropy alignment, along with an abstention rule for safer inference. Across tasks including question answering and classification, SUA better identifies model failures than entropy or self-consistency alone.
The framework is model-agnostic and provides a basis for improving reliability in evolving language models.
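As a rough illustration of the score described in [38], the sketch below computes "sensitivity minus entropy" for one input. The abstract does not give the exact definitions, so everything here is an assumption: sensitivity is taken as the mean KL divergence between the original predictive distribution and the distributions under perturbed inputs, and the abstention rule is a plain threshold on the positive part. The names `sua_score` and `should_abstain` are hypothetical.

```python
from math import log


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * log(pi) for pi in p if pi > 0)


def kl(p, q, eps=1e-12):
    """KL(p || q); q is floored at eps to avoid division by zero."""
    return sum(pi * log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def sua_score(p, perturbed):
    """Assumed form of the score: mean KL to perturbed predictions minus entropy."""
    sensitivity = sum(kl(p, q) for q in perturbed) / len(perturbed)
    return sensitivity - entropy(p)


def should_abstain(p, perturbed, tau=0.0):
    """Hypothetical abstention rule: abstain when the positive part exceeds tau."""
    return max(0.0, sua_score(p, perturbed)) > tau
```

Under these assumptions, a confident prediction that flips under perturbation yields a positive score (unstable but not uncertain), while a stable, uncertain prediction yields a negative one.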
[39] arXiv:2604.20904 [pdf, html, other]: Title: Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

Matt Franchi, Madiha Zahrah Choksi, Harold Triedman, Helen Nissenbaum

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Information handling practices of LLM agents are broadly misaligned with the contextual privacy expectations of their users. Contextual Integrity (CI) provides a principled framework, defining privacy as the appropriate flow of information within context-relative norms. However, existing approaches either double inference cost via supervisor-assistant architectures, or fine-tune on narrow task-specific data. We propose extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs via supervised learning followed by GRPO reinforcement learning. Our composite reward function combines programmatic signals, including task clarity (subsuming schema validity, construct discrimination, and extraction confidence), structural completeness, internal consistency, and context identification, with an LLM judge that evaluates whether the model's privacy reasoning is grounded in the held-out normative universe of the source text. To mitigate overfitting, we introduce per-completion contrastive scoring: each completion is evaluated against both the correct normative universe and a randomly selected wrong one, teaching the model to condition on context rather than memorize source-specific norms. We evaluate on five CI-aligned benchmarks spanning distinct societal contexts and ablate the contributions of RL and normative grounding. Across seven models, SFT introduces a conservative prior toward restricting information flow, improving recognition of privacy-relevant situations but not the correctness of privacy judgments. GRPO with normative grounding achieves the highest score on a law compliance benchmark and strongest correlation with crowdsourced human privacy expectations, demonstrating that fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.
[40] arXiv:2604.20906 [pdf, other]: Title: Biomedical systems biology workflow orchestration and execution with PoSyMed

Simon Süwer, Zoe Chervontseva, Kester Bagemihl, Jan Baumbach, Olga Tsoy, Andreas Maier

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

The rapid growth of scientific software has created practical barriers for bioinformatics research. Although powerful statistical and artificial intelligence (AI)-based methods are now widely available, their effective use is often hindered by fragmented distribution, inconsistent documentation, complex dependencies, and difficult-to-reproduce execution environments. As a result, reusing published tools and adapting workflows to one's own data remains technically demanding and time-intensive, even for experienced users. Here, we present PoSyMed, an open and modular platform for the controlled integration, composition, and execution of bioinformatics tools and workflows. PoSyMed combines a backend-centered platform architecture with formal tool descriptions, controlled container-based build and execution processes, persistent workflow state, and a dialogue-based user interface. Large language models (LLMs) are integrated not as autonomous decision-makers, but as a human-computer interface: bounded semantic assistants that help identify tools, propose workflow steps, and support parameterization within a typed, validated, and human-supervised execution environment. PoSyMed is designed to improve reproducibility, traceability, and transparency in practical biomedical analysis within one platform. We describe the system architecture and evaluate its behavior across representative biological software scenarios with respect to workflow support, interaction design, and platform extensibility. PoSyMed is publicly available at this https URL.
[41] arXiv:2604.20909 [pdf, other]: Title: Do Masked Autoencoders Improve Downhole Prediction? An Empirical Study on Real Well Drilling Data

Aleksander Berezowski, Hassan Hassanzadeh, Gouri Ginde

Subjects: Machine Learning (cs.LG)

Downhole drilling telemetry presents a fundamental labeling asymmetry: surface sensor data are generated continuously at 1 Hz, while labeled downhole measurements are costly, intermittent, and scarce. Current machine learning approaches for downhole metric prediction universally adopt fully supervised training from scratch, which is poorly suited to this data regime. We present the first empirical evaluation of masked autoencoder (MAE) pretraining for downhole drilling metric prediction. Using two publicly available Utah FORGE geothermal wells comprising approximately 3.5 million timesteps of multivariate drilling telemetry, we conduct a systematic full-factorial design space search across 72 MAE configurations and compare them against supervised LSTM and GRU baselines on the task of predicting Total Mud Volume. Results show that the best MAE configuration reduces test mean absolute error by 19.8% relative to the supervised GRU baseline, while trailing the supervised LSTM baseline by 6.4%. Analysis of design dimensions reveals that latent space width is the dominant architectural choice (Pearson $r = -0.59$ with test MAE), while masking ratio has negligible effect, an unexpected finding attributed to high temporal redundancy in 1 Hz drilling data. These results establish MAE pretraining as a viable paradigm for drilling analytics and identify the conditions under which it is most beneficial.
[42] arXiv:2604.20911 [pdf, html, other]: Title: Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents

Yeran Gamage

Comments: 19 pages, 5 figures. Includes evaluation framework for replication and 4,416-trial dataset

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

LLM agents deployed in production operate under operator-defined behavioral policies (system-prompt instructions such as prohibitions on credential disclosure, data exfiltration, and unauthorized output) that safety evaluations assume hold throughout a conversation. Prohibition-type constraints decay under context pressure while requirement-type constraints persist; we term this asymmetry Security-Recall Divergence (SRD). In a 4,416-trial three-arm causal study across 12 models and 8 providers at six conversation depths, omission compliance falls from 73% at turn 5 to 33% at turn 16 while commission compliance holds at 100% (Mistral Large 3, $p < 10^{-33}$). In the two models with token-matched padding controls, schema semantic content accounts for 62-100% of the dilution effect. Re-injecting constraints before the per-model Safe Turn Depth (STD) restores compliance without retraining. Production security policies consist of prohibitions such as never revealing credentials, never executing untrusted code, and never forwarding user data. Commission-type audit signals remain healthy while omission constraints have already failed, leaving the failure invisible to standard monitoring.
[43] arXiv:2604.20913 [pdf, html, other]: Title: FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels

Fei Zuo, Xiaoyan Xi, Quanyi Zeng, Feiyu Wang, Ho Fai Leung

Comments: 16 pages, 10 figures, 4 tables

Subjects: Machine Learning (cs.LG)

Large language models are increasingly deployed on CPU-only platforms where memory bandwidth is the primary bottleneck for autoregressive generation. Weight quantization to four bits or below reduces memory pressure, yet existing systems still dequantize weights and perform floating-point multiplications, limiting the achievable gains. Ternary weights in {-1, 0, +1} provide a more efficient alternative, replacing multiplications with conditional additions, subtractions, or no-ops. While Fairy2i shows that ternary LLMs can match FP16 quality, its runtime does not exploit this structure. We present FairyFuse, an inference system that enables multiplication-free execution on commodity CPUs by fusing the eight real-valued sub-GEMVs of each widely-linear layer into a single AVX-512 loop using masked additions and subtractions, with zero floating-point multiplications. Roofline analysis shows that 16x weight compression shifts memory-bound GEMV toward the compute regime on bandwidth-limited CPUs, yielding a 29.6x kernel speedup while offering little benefit on GPUs. End-to-end, FairyFuse achieves 32.4 tokens per second on a single Intel Xeon 8558P, outperforming this http URL Q4_K_M by 1.24x with near-lossless quality (WikiText-2 perplexity 5.52 vs. 5.47 FP16; downstream accuracy 66.0%).
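To make the "multiplication-free" idea in [43] concrete, here is a minimal pure-Python sketch of a matrix-vector product with weights in {-1, 0, +1}, where each weight selects an addition, a subtraction, or a no-op. It is not the fused AVX-512 kernel from the abstract and does not model the widely-linear layer structure; the function name `ternary_gemv` is hypothetical.

```python
def ternary_gemv(weights, x):
    """Compute y = W @ x for W with entries in {-1, 0, +1} without multiplying."""
    y = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi  # +1: add the activation
            elif w == -1:
                acc -= xi  # -1: subtract the activation
            elif w != 0:
                raise ValueError("weights must be in {-1, 0, +1}")
            # 0: no-op, the activation is skipped
        y.append(acc)
    return y
```

A vectorized kernel would presumably replace the branches with per-lane masks, which is in the spirit of the masked additions and subtractions the abstract mentions.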
[44] arXiv:2604.20915 [pdf, html, other]: Title: Absorber LLM: Harnessing Causal Synchronization for Test-Time Training

Zhixin Zhang, Shabo Zhang, Chengcan Wu, Zeming Wei, Meng Sun

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE); Optimization and Control (math.OC)

Self-attention in Transformers incurs a computational cost that grows with sequence length, making inference over long streams prohibitively memory-intensive. Constant-memory alternatives such as RNNs and SSMs compress history into states with fixed size and thus lose long-tail dependencies, while methods that memorize contexts into parameters, such as Test-Time Training (TTT), are prone to overfitting token-level projection and fail to preserve the causal effect of context in pretrained LLMs. We propose Absorber LLM, which formulates long-context retention as a self-supervised causal synchronization: after absorbing historical contexts into parameters, a contextless model should match the original model with full context on future generations. We optimize this objective by synchronizing internal behaviors of the updated model with the original one, ensuring context absorption and generalization. Experiments on long-context and streaming benchmarks show that Absorber LLM reduces inference memory and improves accuracy over prior parameter-as-memory baselines.
[45] arXiv:2604.20916 [pdf, html, other]: Title: AnalogMaster: Large Language Model-based Automated Analog IC Design Framework from Image to Layout

Xian Rong Qin, Yong Zhang, Ying Hu, Tao Su, Bo-Wen Jia, Ning Xu

Subjects: Hardware Architecture (cs.AR)

Design automation has the potential to substantially improve the efficiency of analog integrated circuit (IC) design. However, existing algorithms and tools typically focus on individual stages, such as device sizing, placement, or routing, and still require significant manual intervention to complete the full design flow. While large language models (LLMs) have recently demonstrated remarkable success in automating digital IC design workflows, these advances cannot be directly transferred to analog IC design. Key challenges include strongly coupled performance metrics, the predominance of unstructured circuit schematic images, and the fact that most prior approaches address only isolated stages of the analog design process, limiting their ability to capture end-to-end performance impact. To address these challenges, we propose AnalogMaster, an extensible, LLM-based framework that enables end-to-end automation of analog IC design through a unified pipeline spanning circuit image-to-netlist generation, parameter optimization, placement, and routing. AnalogMaster integrates a joint reasoning mechanism that leverages in-context learning and intent reasoning to achieve accurate and robust image-to-netlist conversion. A parameter search agent integrating self-enhanced prompt engineering and context truncation is developed for effective device sizing and downstream physical design. Experimental evaluations on 15 representative circuits with varying levels of complexity demonstrate strong and consistent performance across multiple models. In particular, GPT-5 achieves success rates of 92.9% and 99.9% on Pass@1 and Pass@5, respectively. These results validate the effectiveness and robustness of the proposed framework and establish a practical paradigm for applying LLMs to full-stack analog IC design automation.
[46] arXiv:2604.20917 [pdf, html, other]: Title: The Path Not Taken: Duality in Reasoning about Program Execution

Eshgin Hasanov, Md Mahadi Hassan Sibat, Santu Karmaker, Aashish Yadavally

Comments: Accepted to ACL 2026 Main Conference

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE)

Large language models (LLMs) have shown remarkable capabilities across diverse coding tasks. However, their adoption requires a true understanding of program execution rather than relying on surface-level patterns. Existing benchmarks primarily focus on predicting program properties tied to specific inputs (e.g., code coverage, program outputs). As a result, they provide a narrow view of dynamic code reasoning and are prone to data contamination. We argue that understanding program execution requires evaluating its inherent duality through two complementary reasoning tasks: (i) predicting a program's observed behavior for a given input, and (ii) inferring how the input must be mutated toward a specific behavioral objective. Both tasks jointly probe a model's causal understanding of execution flow. We instantiate this duality in DexBench, a benchmark comprising 445 paired instances, and evaluate 13 LLMs. Our results demonstrate that dual-path reasoning provides a robust and discriminative proxy for dynamic code understanding.
[47] arXiv:2604.20919 [pdf, html, other]: Title: DiP-SD: Distributed Pipelined Speculative Decoding for Efficient LLM Inference at the Edge

Yaodan Xu, Sheng Zhou, Zhisheng Niu

Comments: Accepted by 2026 IEEE 103rd Vehicular Technology Conference (VTC2026-Spring)

Subjects: Information Theory (cs.IT)

Speculative decoding has emerged as a promising technique for large language model (LLM) inference by accelerating autoregressive decoding via draft-then-verify. This paper studies a new edge scenario with multi-user inference, where draft tokens are generated locally on devices and subsequently offloaded to a centralized edge server for batch verification. The key challenge is to sustain high throughput under coupled decisions of (i) batching and pipeline scheduling and (ii) per user draft token length. We propose DiP-SD, which exploits two complementary parallelism dimensions: device-level distributed drafting and phase-level draft-verify pipelining. We formulate a throughput-maximization objective, defined as the expected number of accepted tokens per unit time, and jointly optimize the number of batches, user-to-batch assignment, and integer draft lengths. To solve the resulting fractional mixed-integer program, DiP-SD scans the batch number and iteratively alternates between an association subproblem and a draft-length subproblem. Numerical results under a Qwen3-1.7B/Qwen3-32B device-edge deployment show that DiP-SD achieves up to 17.89x throughput over autoregressive decoding (AD) and 1.93x over AD with greedy batching.
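The throughput objective in [47] (expected accepted tokens per unit time) can be illustrated with a single-user simplification. The sketch assumes an i.i.d. per-token acceptance probability `alpha`, for which the expected number of tokens produced per draft-verify round with draft length k is the standard (1 - alpha^(k+1)) / (1 - alpha). The timing model (`t_draft` per drafted token plus a fixed `t_verify`) and the function names are assumptions; the paper's joint batching and assignment optimization is not modeled.

```python
def expected_tokens(alpha, draft_len):
    """Expected tokens per draft-verify round under i.i.d. acceptance rate alpha."""
    if alpha == 1.0:
        return draft_len + 1
    return (1 - alpha ** (draft_len + 1)) / (1 - alpha)


def best_draft_length(alpha, t_draft, t_verify, max_len=16):
    """Scan integer draft lengths; return the one maximizing tokens per unit time."""

    def throughput(k):
        # Assumed round time: k sequential draft steps plus one verification pass.
        return expected_tokens(alpha, k) / (k * t_draft + t_verify)

    return max(range(1, max_len + 1), key=throughput)
```

The scan reflects the trade-off the abstract points to: longer drafts amortize verification cost, but low acceptance rates waste drafting time.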
[48] arXiv:2604.20920 [pdf, html, other]: Title: Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention

Yuzhen Mao, Michael Y. Li, Emily B. Fox

Subjects: Machine Learning (cs.LG)

Scaling large language models to long contexts is challenging due to the quadratic computational cost of full attention. Mitigation approaches include KV-cache selection or compression techniques. We instead provide an effective and end-to-end learnable bridge between the two without requiring architecture modification. In particular, our key insight is that interleaved gist compression tokens -- which provide a learnable summary of sets of raw tokens -- can serve as routing signals for sparse attention. Building on this, we introduce selective unfolding via GSA, which first compresses the context into gist tokens, then selects the most relevant gists, and subsequently restores the corresponding raw chunks for detailed attention. This yields a simple coarse-to-fine mechanism that combines compact global representations with targeted access to fine-grained evidence. We further incorporate this process directly into training in an end-to-end fashion, avoiding the need for external retrieval modules. In addition, we extend the framework hierarchically via recursive gist-of-gist construction, enabling multi-resolution context access with logarithmic per-step decoding complexity. Empirical results on LongBench and RAG benchmarks demonstrate that our method consistently outperforms other compression baselines as well as inference-time sparse attention methods across compression ratios from $8\times$ to $32\times$. The code is available at: this https URL
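The compress, select, and unfold steps described in [48] can be sketched as a toy routine. This is only an illustration: it assumes each chunk already has a gist vector, scores gists by dot product with the query, and returns the raw tokens of the top-k chunks. In the paper the gists are learned and the selection happens inside attention; `select_and_unfold` is a hypothetical name.

```python
def select_and_unfold(query, gists, chunks, k):
    """Pick the k gists most similar to the query and return their raw chunks."""
    # Coarse step: score every gist against the query (dot product, assumed).
    scores = [sum(q * g for q, g in zip(query, gist)) for gist in gists]
    # Keep the k best-scoring gists, then restore document order.
    ranked = sorted(range(len(gists)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:k])
    # Fine step: unfold the selected gists back into their raw tokens.
    return [tok for i in top for tok in chunks[i]]
```

Attention would then run over the unfolded tokens, with the remaining chunks represented only by their gists, giving the coarse-to-fine access pattern the abstract describes.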
[49] arXiv:2604.20921 [pdf, other]: Title: Validating a Deep Learning Algorithm to Identify Patients with Glaucoma using Systemic Electronic Health Records

John Xiang, Rohith Ravindranath, Sophia Y. Wang

Comments: submitted to AMIA Annual Symposium 2026

Subjects: Machine Learning (cs.LG)

We evaluated whether a glaucoma risk assessment (GRA) model trained on All of Us national data can identify patients at high probability of glaucoma using only systemic electronic health records (EHR) at an independent institution. In this cross-sectional study, 20,636 Stanford patients seen from November 2013 to January 2024 were included (15% with glaucoma). A pretrained GRA model was fine-tuned on the Stanford cohort and tested on a held-out set using demographics, systemic diagnoses, medications, laboratory results, and physical examination measurements as inputs. The best model achieved AUROC 0.883 and PPV 0.657. Calibration was consistent with clinical risk: the highest prediction decile showed the greatest glaucoma diagnosis rate (65.7%) and treatment rate (57.0%). Performance improved with more trainable layers up to 15 and with additional data. An EHR-only GRA model may enable scalable and accessible pre-screening without specialized imaging.
[50] arXiv:2604.20923 [pdf, other]: Title: ILDR: Geometric Early Detection of Grokking

Shreel Golwala

Subjects: Machine Learning (cs.LG)

Grokking describes a delayed generalization phenomenon in which a neural network achieves perfect training accuracy long before validation accuracy improves, followed by an abrupt transition to strong generalization. Existing detection signals are indirect: weight norm reflects parameter-space regularization and consistently lags the transition, while GrokFast's slow gradient EMA, used without gradient amplification, is unstable across seeds with standard deviation exceeding mean lead time. We propose the Inter/Intra-class Distance Ratio (ILDR), a geometric metric computed on second-to-last layer representations as the ratio of inter-class centroid separation to intra-class scatter. ILDR provides an early detection signal: it rises and crosses a threshold at 2.5 times its baseline before the grokking transition appears in validation accuracy, indicating early geometric reorganization in representation space. Grounded in Fisher's linear discriminant criterion, ILDR requires no eigendecomposition and runs in O(|C|^2 + N). It is evaluated exclusively on held-out data, making it robust to memorization effects. Across modular arithmetic and permutation group composition (S5), ILDR leads the grokking transition by 9 to 73 percent of the training budget, with lead time increasing with task algebraic complexity. Over eight random seeds, ILDR leads by 950 +/- 250 steps with a coefficient of variation of 26 percent, and post-grokking variance drops by 1696 times, consistent with a sharp phase transition in representation space. Using ILDR as an early stopping trigger reduces training by 18.6 percent on average. Optimizer interventions triggered at the ILDR threshold demonstrate bidirectional control over the transition, suggesting ILDR tracks representational conditions underlying generalization rather than a downstream correlate.
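A minimal sketch of the ratio described in [50], under stated assumptions: Euclidean distances, the inter-class term as the mean pairwise distance between class centroids, and the intra-class term as the mean distance of each sample to its own centroid. The abstract does not specify these exact choices, and the names `ildr` and `first_crossing` are hypothetical. The threshold factor of 2.5 times baseline is taken from the abstract.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def ildr(features, labels):
    """Ratio of mean inter-class centroid distance to mean intra-class scatter."""
    # Group representation vectors by class label: O(N).
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    if len(groups) < 2:
        raise ValueError("need at least two classes")

    # Class centroids: coordinate-wise mean of each group.
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    # Inter-class term: mean pairwise centroid distance, O(|C|^2).
    pairs = list(combinations(centroids.values(), 2))
    inter = sum(dist(a, b) for a, b in pairs) / len(pairs)

    # Intra-class term: mean distance of each sample to its own centroid, O(N).
    intra = sum(dist(x, centroids[y]) for x, y in zip(features, labels)) / len(features)
    return float("inf") if intra == 0 else inter / intra


def first_crossing(values, baseline, factor=2.5):
    """Index of the first value at or above factor * baseline, or None."""
    for step, v in enumerate(values):
        if v >= factor * baseline:
            return step
    return None
```

In use, `ildr` would be evaluated on held-out second-to-last layer representations at regular checkpoints, and `first_crossing` applied to the resulting series to flag the early signal.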
[51] arXiv:2604.20924 [pdf, html, other]: Title: Clinically Interpretable Sepsis Early Warning via LLM-Guided Simulation of Temporal Physiological Dynamics

Weizhi Nie, Zhen Qu, Weijie Wang, Chunpei Li, Ke Lu, Bingyang Zhou, Hongzhi Yu

Subjects: Machine Learning (cs.LG)

Timely and interpretable early warning of sepsis remains a major clinical challenge due to the complex temporal dynamics of physiological deterioration. Traditional data-driven models often provide accurate yet opaque predictions, limiting physicians' confidence and clinical applicability. To address this limitation, we propose a Large Language Model (LLM)-guided temporal simulation framework that explicitly models physiological trajectories prior to disease onset for clinically interpretable prediction. The framework consists of a spatiotemporal feature extraction module that captures dynamic dependencies among multivariate vital signs, a Medical Prompt-as-Prefix module that embeds clinical reasoning cues into LLMs, and an agent-based post-processing component that constrains predictions within physiologically plausible ranges. By first simulating the evolution of key physiological indicators and then classifying sepsis onset, our model offers transparent prediction mechanisms that align with clinical judgment. Evaluated on the MIMIC-IV and eICU databases, the proposed method achieves superior AUC scores (0.861-0.903) across 24- to 4-hour pre-onset prediction tasks, outperforming conventional deep learning and rule-based approaches. More importantly, it provides interpretable trajectories and risk trends that can assist clinicians in early intervention and personalized decision-making in intensive care environments.
[52] arXiv:2604.20925 [pdf, html, other]: Title: Unsupervised Learning of Inter-Object Relationships via Group Homomorphism

Kyotaro Ushida, Takayuki Komatsu, Yoshiyuki Ohmura, Yasuo Kuniyoshi

Comments: Preprint. Under review at ICDL 2026

Subjects: Machine Learning (cs.LG)

Current deep learning models achieve high performance by learning statistical correlations from vast datasets, which stands in stark contrast to human learning. They lack the flexibility of humans, particularly preverbal infants, to autonomously acquire the underlying structure of the world from limited experience and adapt to novel situations. In this study, we propose an unsupervised representation learning method based on a hierarchical relationship in group operations, rather than statistical independence, aiming to build a computational model of the cognitive development of infants. The proposed model features an integrated architecture that simultaneously performs object segmentation and the extraction of motion laws from dynamic image sequences. By introducing homomorphism from algebra as a structural constraint within a neural network, the model structurally separates pixel-level changes into meaningful, decomposed transformation components, such as translation and deformation. Using interaction scenes (chasing and evading tasks) based on developmental science findings, we experimentally demonstrate that the model can segment multiple objects into individual slots without any ground-truth labels. Furthermore, we confirm that relative movements between objects, such as approaching or receding, are accurately mapped and structured into a one-dimensional additive latent space. These results suggest that by introducing algebraic geometric constraints rather than relying solely on statistical correlation learning, physically interpretable "disentangled representations" can be acquired. This study contributes to the understanding of the process by which infants internalize environmental laws as structures and provides a new perspective for constructing artificial systems with developmental intelligence.
[53] arXiv:2604.20926 [pdf, html, other]: Title: Learning Reasoning World Models for Parallel Code

Gautam Singh, Arjun Guha, Bhavya Kailkhura, Harshitha Menon

Subjects: Software Engineering (cs.SE)

Large language models have shown remarkable ability in serial code generation, but they still struggle with parallel code for which training data is comparatively scarce. A common remedy is to use coding agents that interact with external tools, but tool calls can be costly and sometimes impractical, e.g., for partially written code. We propose Parallel-Code World Models (PCWMs), reasoning LLMs that aim to predict tool outcomes directly from parallel source code. To train PCWMs, we design a novel exploration and data generation pipeline that samples diverse parallel-coding problems and candidate implementations across multiple domains, then executes them via tools to record data races and performance profiles. From these, we synthesize hindsight reasoning traces that causally connect source code to observed tool outcomes. Fine-tuning on the resulting data yields noticeable gains, with a 7B-parameter world model improving from 64.3% to 72.8% accuracy in race-outcome prediction, while an 8B-parameter model improves in a performance profiling task from 49.3% to 58.6% accuracy. Furthermore, when open-weight models were tasked with fixing data races, world-model feedback improved their race-fixing rates relative to self-feedback by 2.7%-9.1% using our 7B-parameter world model and by 6.1%-11.1% using our 14B-parameter world model. Our results suggest that reasoning models have the potential to serve as practical substitutes for external tool calls in parallel-coding agents.
[54] arXiv:2604.20927 [pdf, other]: Title: Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints

Jan Pennekamp, Johannes Lohmöller, David Schütte, Joscha Loos, Martin Henze

Comments: 20 pages, accepted at 47th IEEE Symposium on Security and Privacy (SP '26)

Subjects: Cryptography and Security (cs.CR)

Preprints are essential for the timely and open dissemination of research. arXiv, the most widely used preprint service, takes the idea of open science one step further by not only publishing the actual preprints but also LaTeX sources and other files used to create them. As known from other contexts, such as GitHub repositories, and anecdotally exemplified for arXiv, making source code publicly available risks disclosing otherwise "hidden" information. Consequently, the public availability of paper sources raises the question of how much sensitive content is (unintentionally) disclosed through them.
In this paper, we systematically answer this question for all 2.7M arXiv submissions with available source files across three dimensions of source file-induced information disclosure: (1) inclusion of unnecessary files, (2) metadata embedded in files, and (3) irrelevant content in files such as source code comments. Our analysis reveals that nearly every arXiv submission contains some form of "hidden" information. Notable findings range from links to editable web documents for internal coordination over API and private keys to complete Git histories.
While different tools promise to remove such information from source files, we show that they fail to reliably achieve the intended cleaning functionality. To mitigate this situation, we provide ALC-NG to comprehensively remove files, metadata, and comments that are not needed to compile a LaTeX paper.
[55] arXiv:2604.20928 [pdf, html, other]: Title: Domain-Aware Hierarchical Contrastive Learning for Semi-Supervised Generalization Fault Diagnosis

Junyu Ren, Wensheng Gan, Philip S Yu

Comments: Preprint

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Fault diagnosis under unseen operating conditions remains highly challenging when labeled data are scarce. Semi-supervised domain generalization fault diagnosis (SSDGFD) provides a practical solution by jointly exploiting labeled and unlabeled source domains. However, existing methods still suffer from two coupled limitations. First, pseudo-labels for unlabeled domains are typically generated primarily from knowledge learned on the labeled source domain, which neglects domain-specific geometric discrepancies and thus induces systematic cross-domain pseudo-label bias. Second, unlabeled samples are commonly handled with a hard accept-or-discard strategy, where rigid thresholding causes imbalanced sample utilization across domains, while hard-label assignment for uncertain samples can easily introduce additional noise. To address these issues, we propose a unified framework termed domain-aware hierarchical contrastive learning (DAHCL) for SSDGFD. Specifically, DAHCL introduces a domain-aware learning (DAL) module to explicitly capture source-domain geometric characteristics and calibrate pseudo-label predictions across heterogeneous source domains, thereby mitigating cross-domain bias in pseudo-label generation. In addition, DAHCL develops a hierarchical contrastive learning (HCL) module that combines dynamic confidence stratification with fuzzy contrastive supervision, enabling uncertain samples to contribute to representation learning without relying on unreliable hard labels. In this way, DAHCL jointly improves the quality of supervision and the utilization of unlabeled samples. Furthermore, to better reflect practical industrial scenarios, we incorporate engineering noise into the SSDGFD evaluation protocol. Extensive experiments on three benchmark datasets demonstrate that...
[56] arXiv:2604.20930 [pdf, html, other]: Title: SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs

Chao Pan, Yu Wu, Xin Yao

Comments: 13 pages, 4 figures, 3 tables. Code: this https URL

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion structurally requires harmful content, spontaneously generate that content with safety failure rates exceeding 95%. Existing input-level defenses achieve a 100% failure rate against ISC, and standard system prompt defenses provide only partial mitigation. We propose SafeRedirect, a system-level override that defeats ISC by redirecting the model's task-completion drive rather than suppressing it. SafeRedirect grants explicit permission to fail the task, prescribes a deterministic hard-stop output, and instructs the model to preserve harmful placeholders unresolved. Evaluated on seven frontier LLMs across three AI/ML-related ISC task types in the single-turn setting, SafeRedirect reduces average unsafe generation rates from 71.2% to 8.0%, compared to 55.0% for the strongest viable baseline. Multi-model ablation reveals that failure permission and condition specificity are universally critical, while the importance of other components varies across models. Cross-attack evaluation confirms state-of-the-art defense against ISC with generalization performance at least on par with the baseline on other attack families. Code is available at this https URL.
[57] arXiv:2604.20932 [pdf, html, other]: Title: Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

Pranav Pallerla, Wilson Naik Bhukya, Bharath Vemula, Charan Ramtej Kodi

Comments: 21 pages, 2 figures, 9 tables. Manuscript prepared for submission to ACM CCS

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private, domain-specific knowledge. This capability introduces significant security risks, including membership inference, data poisoning, and unintended content leakage. A straightforward mitigation is to enable all relevant defenses simultaneously, but doing so incurs a substantial utility cost. In our experiments, an always-on defense stack reduces contextual recall by more than 40%, indicating that retrieval degradation is the primary failure mode. To mitigate this trade-off in RAG systems, we propose the Sentinel-Strategist architecture, a context-aware framework for risk analysis and defense selection. A Sentinel detects anomalous retrieval behavior, after which a Strategist selectively deploys only the defenses warranted by the query context. Evaluated across three benchmark datasets and five orchestration models, this adaptive defense orchestration (ADO) is shown to eliminate MBA-style membership inference leakage while substantially recovering retrieval utility relative to a fully static defense stack, approaching undefended baseline levels. Under data poisoning, the strongest ADO variants reduce attack success to near zero while restoring contextual recall to more than 75% of the undefended baseline, although robustness remains sensitive to model choice. Overall, these findings show that adaptive, query-aware defense can substantially reduce the security-utility trade-off in RAG systems.
[58] arXiv:2604.20933 [pdf, html, other]: Title: IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning

Wenjie Liao, Like Wu, Liangjie Zhao, Shihui Xu, Shigeru Fujimura

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Self-play fine-tuning enables large language models to improve beyond supervised fine-tuning without additional human annotations by contrasting annotated responses with self-generated ones. Many existing methods rely on a fixed divergence regime. SPIN is closely related to a KL-based regime, SPACE to a Jensen-Shannon-style objective via noise contrastive estimation, and SPIF to $\chi^2$-regularized self-play. Since these divergences exhibit different strengths depending on the distributional gap between model and target, no single choice appears to provide favorable learning dynamics across training stages. We propose IRIS (Interpolative Rényi Iterative Self-play), a Rényi-based self-play fine-tuning framework with a continuously adjustable objective. IRIS decomposes into two independent tilted risk terms over annotated and synthetic data, with exponential importance weights controlled by the order parameter $\alpha$. We show that several self-play objectives can be interpreted as limiting or representative regimes at particular values of $\alpha$, providing a unified theoretical perspective on these methods. An adaptive order schedule further adjusts $\alpha$ to the distributional gap, shifting from sharper importance weighting early in training to smoother refinement near convergence. Theoretically, we establish the fixed-point property of IRIS and analyze how $\alpha$ controls gradient concentration. Experiments on Zephyr-7B and Qwen2.5-3B across ten benchmarks show that IRIS improves upon baselines, reaching 44.57% average score with gains across iterations. In our setting, IRIS with only 26$k$ annotated samples surpasses standard supervised fine-tuning trained on the full 200$k$ dataset.
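For orientation, the textbook Rényi divergence of order $\alpha$ shows how one parameter can interpolate between divergence regimes. This is the standard definition only; the abstract does not give the exact IRIS objective, and the generic distributions $P$ and $Q$ below are placeholders rather than the paper's notation.

```latex
% Standard Renyi divergence of order alpha (alpha > 0, alpha != 1)
D_\alpha(P \,\|\, Q)
  = \frac{1}{\alpha - 1}
    \log \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{P(x)}{Q(x)} \right)^{\alpha} \right]

% Limiting regime: alpha -> 1 recovers the KL divergence
\lim_{\alpha \to 1} D_\alpha(P \,\|\, Q)
  = \mathbb{E}_{x \sim P}\!\left[ \log \frac{P(x)}{Q(x)} \right]
  = \mathrm{KL}(P \,\|\, Q)

% Representative regime: alpha = 2 is a monotone function of the chi^2 divergence
D_2(P \,\|\, Q) = \log\!\bigl( 1 + \chi^2(P \,\|\, Q) \bigr),
\qquad
\chi^2(P \,\|\, Q) = \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{P(x)}{Q(x)} \right)^{2} \right] - 1
```

Since $(P/Q)^{\alpha} = \exp\bigl(\alpha \log (P/Q)\bigr)$, larger $\alpha$ concentrates weight on samples with a large density ratio, which is one way to read the "exponential importance weights" mentioned in the abstract.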
[59] arXiv:2604.20934 [pdf, other]: Title: SDNGuardStack: An Explainable Ensemble Learning Framework for High-Accuracy Intrusion Detection in Software-Defined Networks

Ashikuzzaman, Md. Saifuzzaman Abhi, Mahabubur Rahman, Md. Manjur Ahmed, Md. Mehedi Hasan, Md. Ahsan Arif

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Software-Defined Networking (SDN) has emerged in recent years as a way to improve network programmability and administration. However, its centralized design creates a major security risk that calls for effective intrusion detection systems. This paper presents an SDN-specific, machine-learning-based intrusion detection system trained and tested on the InSDN dataset, which models attack scenarios and realistic traffic patterns in SDN. Our approach incorporates a comprehensive preprocessing pipeline, feature selection via Mutual Information, and a novel ensemble learning model, SDNGuardStack, which combines multiple base learners to enhance detection accuracy and efficiency. In addition, we apply explainable AI methods, including SHAP, to add transparency to model predictions, which helps security analysts respond to incidents. In our experiments, SDNGuardStack achieves an accuracy of 99.98% and a Cohen's kappa of 0.9998, surpassing other models while remaining interpretable and practical to deploy. Flow ID, Bwd Header Len, and Src Port emerge as the most important features in the model's predictions. The work is a step towards closing the gap between high-performance intrusion detection and realistic deployment in SDN, supporting secure and resilient network infrastructures.
[60] arXiv:2604.20935 [pdf, html, other]: Title: Data-Driven Open-Loop Simulation for Digital-Twin Operator Decision Support in Wastewater Treatment

Gary Simethy, Daniel Ortiz Arroyo, Petar Durdevic

Comments: 18 pages, 10 figures, 9 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Wastewater treatment plants (WWTPs) need digital-twin-style decision support tools that can simulate plant response under prescribed control plans, tolerate irregular and missing sensing, and remain informative over 12-36 h planning horizons. Meeting these requirements with full-scale plant data remains an open engineering-AI challenge. We present CCSS-RS, a controlled continuous-time state-space model that separates historical state inference from future control and exogenous rollout. The model combines typed context encoding, gain-weighted forcing of prescribed and forecast drivers, semigroup-consistent rollouts, and Student-t plus hurdle outputs for heavy-tailed and zero-inflated WWTP sensor data. On the public Avedøre full-scale benchmark, with 906,815 timesteps, 43% missingness, and 1-20 min irregular sampling, CCSS-RS achieves RMSE 0.696 and CRPS 0.349 at H=1000 across 10,000 test windows. This reduces RMSE by 40-46% relative to Neural CDE baselines and by 31-35% relative to simplified internal variants. Four case studies using a frozen checkpoint on test data demonstrate operational value: oxygen-setpoint perturbations shift predicted ammonium by -2.3 to +1.4 over horizons 300-1000; a smoothed setpoint plan ranks first in multi-criterion screening; context-only sensor outages raise monitored-variable RMSE by at most 10%; and ammonium, nitrate, and oxygen remain more accurate than persistence throughout the rollout. These results establish CCSS-RS as a practical learned simulator for offline scenario screening in industrial wastewater treatment, complementary to mechanistic models.
[61] arXiv:2604.20936 [pdf, other]: Title: AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe

Adam Cole, Mick Grierson

Comments: To appear in the Proceedings of the 2026 ACM Creativity and Cognition (C&C '26). 15 pages, 19 figures

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

We present AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to help artists probe the internal mechanics of black-box video generation. While generative outputs are increasingly realistic, prompt-only control limits artists' ability to build intuition for the model's material process or to work beyond its default tendencies. Using an autobiographical research-through-design approach, we built on Network Bending to design AttentionBender, which applies 2D transforms (rotation, scaling, translation, etc.) to cross-attention maps to modulate generation. We assess AttentionBender by visualizing 4,500+ video generations across prompts, operations, and layer targets. Our results suggest that cross-attention is highly entangled: targeted manipulations often resist clean, localized control, producing distributed distortions and glitch aesthetics over linear edits. AttentionBender contributes a tool that functions both as an Explainable AI style probe of transformer attention mechanisms, and as a creative technique for producing novel aesthetics beyond the model's learned representational space.
[62] arXiv:2604.20937 [pdf, html, other]: Title: Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs

Kibum Kim, Jiwan Kim, Kyle Min, Yueqi Wang, Jinyoung Moon, Julian McAuley, Chanyoung Park

Comments: Under Review

Subjects: Machine Learning (cs.LG)

Video Large Language Models (Video LLMs) incur high inference latency due to a large number of visual tokens provided to LLMs. To address this, training-free visual token pruning has emerged as a solution to reduce computational costs; however, existing methods are primarily validated on Multiple-Choice Question Answering (MCQA) benchmarks, where coarse-grained cues often suffice. In this work, we reveal that these methods suffer a sharp performance collapse on fine-grained understanding tasks requiring precise visual grounding, such as hallucination evaluation. To explore this gap, we conduct a systematic analysis and identify sink tokens--semantically uninformative tokens that attract excessive attention--as a key obstacle to fine-grained video understanding. When these sink tokens survive pruning, they distort the model's visual evidence and hinder fine-grained understanding. Motivated by these insights, we propose Sink-Token-aware Pruning (SToP), a simple yet effective plug-and-play method that introduces a sink score to quantify each token's tendency to behave as a sink and applies this score to existing spatial and temporal pruning methods to suppress them, thereby enhancing video understanding. To validate the effectiveness of SToP, we apply it to state-of-the-art pruning methods (VisionZip, FastVid, and Holitom) and evaluate it across diverse benchmarks covering hallucination, open-ended generation, compositional reasoning, and MCQA. Our results demonstrate that SToP significantly boosts performance, even when pruning up to 90% of visual tokens.
[63] arXiv:2604.20938 [pdf, html, other]: Title: HARBOR: Automated Harness Optimization

Biswa Sengupta, Jinhua Wang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Long-horizon language-model agents are dominated, in lines of code and in operational complexity, not by their underlying model but by the harness that wraps it: context compaction, tool caching, semantic memory, trajectory reuse, speculative tool prediction, and the glue that binds the model to a sandboxed execution environment. We argue that harness design is a first-class machine-learning problem and that automated configuration search dominates manual stacking once the flag space exceeds a handful of bits. We defend this claim in two steps. First, we formalize automated harness optimization as constrained noisy Bayesian optimization over a mixed-variable, cost-heterogeneous configuration space with cold-start-corrected rewards and a posterior chance-constrained safety check, and give a reference solver, HARBOR (Harness Axis-aligned Regularized Bayesian Optimization Routine), built from a block-additive SAAS surrogate, multi-fidelity cost-aware acquisition, and TuRBO trust regions. Second, we instantiate the problem in a flag-gated harness over a production coding agent and report a controlled four-round manual-tuning case study against a fixed task suite and an end-to-end HARBOR run. The formulation itself is task-class agnostic: the configuration space, reward correction, acquisition, and safety check apply to any agent harness with a bounded flag space and a reproducible task suite.
[64] arXiv:2604.20940 [pdf, html, other]: Title: Sema: Semantic Transport for Real-Time Multimodal Agents

Jiaying Meng, Bojie Li

Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI); Sound (cs.SD)

Real-time multimodal agents transport raw audio and screenshots using networking stacks designed for human receivers, which optimize for perceptual fidelity and smooth playout. Yet agent models act as event-driven processors with no inherent sense of physical time, consuming task-relevant semantics rather than reconstructing signals in real time. This fundamental difference shifts the transport goal from the technical problem of signal fidelity (Shannon-Weaver Level A) to the semantic problem of meaning preservation (Level B). This mismatch imposes significant overhead. In visual pipelines, screenshot upload accounts for over 60% of end-to-end action latency on constrained uplinks, and in voice pipelines, conventional transport carries massive redundancy, sending 43-64x more data than needed to maintain task accuracy. We present Sema, a semantic transport system that combines discrete audio tokenizers with a hybrid screen representation (lossless accessibility-tree or OCR text, plus compact visual tokens) and bursty token delivery that eliminates jitter buffers. In simulations under emulated WAN conditions, Sema reduces uplink bandwidth by 64x for audio and 130-210x for screenshots while preserving task accuracy within 0.7 percentage points of the raw baseline.
[65] arXiv:2604.20943 [pdf, html, other]: Title: SCM: Sleep-Consolidated Memory with Algorithmic Forgetting for Large Language Models

Saish Sachin Shinde

Comments: 5 figures. Submitted April 2026

Subjects: Machine Learning (cs.LG)

We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for large language models that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and biologically plausible memory. Existing approaches rely on truncating context windows, growing vector databases without bound, or tiered storage systems that lack consolidation and forgetting mechanisms. SCM implements five core components inspired by human memory: a limited-capacity working memory, multi-dimensional importance tagging, offline sleep-stage consolidation with distinct NREM and REM phases, intentional value-based forgetting, and a computational self-model enabling introspection. Across a standardized benchmark suite of eight tests, the prototype achieves perfect recall accuracy over ten-turn conversations while reducing memory noise by 90.9% through adaptive forgetting. Memory search latency remains below one millisecond even with hundreds of stored concepts. This work establishes the architectural foundations for memory systems that consolidate, prioritize, and forget, offering a testable platform for advancing LLM memory research.
[66] arXiv:2604.20944 [pdf, other]: Title: LAF-Based Evaluation and UTTL-Based Learning Strategies with MIATTs

Yongquan Yang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In many real-world machine learning (ML) applications, the true target cannot be precisely defined because the available information is ambiguous or subjective. To address this challenge, the EL-MIATTs (Evaluation and Learning with Multiple Inaccurate True Targets) framework has been proposed, which does not assume that the true target for a given ML task exists objectively in the real world. To bridge theory and practice in implementing EL-MIATTs, in this paper we develop two complementary mechanisms: LAF (Logical Assessment Formula)-based evaluation algorithms and UTTL (Undefinable True Target Learning)-based learning strategies with MIATTs, which together enable logically coherent and practically feasible modeling under uncertain supervision. We first analyze task-specific MIATTs, examining how their coverage and diversity determine their structural property and influence downstream evaluation and learning. Based on this understanding, we formulate LAF-grounded evaluation algorithms that operate either on original MIATTs or on ternary targets synthesized from them, balancing interpretability, soundness, and completeness. For model training, we introduce UTTL-grounded learning strategies using Dice and cross-entropy loss functions, comparing per-target and aggregated optimization schemes. We also discuss how the integration of LAF and UTTL bridges the gap between logical semantics and statistical optimization. Together, these components provide a coherent pathway for implementing EL-MIATTs, offering a principled foundation for developing ML systems in scenarios where the notion of "ground truth" is inherently uncertain. An application of this work's results is presented as part of the study available at this https URL.
[67] arXiv:2604.20945 [pdf, html, other]: Title: Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya, Anirban Roy, Daniel Elenius, Brian Matejek, Adam D. Cobb, Susmit Jha

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities rooted in model internals. We present a comprehensive, interpretability-driven jailbreaking audit of eight SOTA open-source LLMs: Llama-3.1-8B, Llama-3.3-70B-4bt, GPT-oss-20B, GPT-oss-120B, Qwen3-0.6B, Qwen3-32B, Phi4-3.8B, and Phi4-14B. Leveraging interpretability-based approaches -- Universal Steering (US) and Representation Engineering (RepE) -- we introduce an adaptive two-stage grid search algorithm to identify optimal activation-steering coefficients for unsafe behavioral concepts. Our evaluation, conducted on a curated set of harmful queries and a standardized LLM-based judging protocol, reveals stark contrasts in model robustness. The Llama-3 models are highly vulnerable, with up to 91% (US) and 83% (RepE) jailbroken responses on Llama-3.3-70B-4bt, while GPT-oss-120B remains robust to attacks via both interpretability approaches. Qwen and Phi models show mixed results, with the smaller Qwen3-0.6B and Phi4-3.8B mostly exhibiting lower jailbreaking rates, while their larger counterparts are more susceptible. Our results establish interpretability-based steering as a powerful tool for systematic safety audits, but also highlight its dual-use risks and the need for better internal defenses in LLM deployment.
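The general shape of a two-stage (coarse-to-fine) grid search over a single scalar coefficient can be sketched as follows. This is a generic illustration, not the paper's adaptive algorithm: the function name `two_stage_grid_search`, the grid sizes, and the toy scoring function are all assumptions, and `score` stands in for whatever scalar evaluation metric an audit would use.

```python
from typing import Callable


def linspace(a: float, b: float, n: int) -> list[float]:
    """Return n evenly spaced values from a to b inclusive (n >= 2)."""
    step = (b - a) / (n - 1)
    return [a + i * step for i in range(n)]


def two_stage_grid_search(
    score: Callable[[float], float],
    lo: float,
    hi: float,
    coarse_points: int = 5,
    fine_points: int = 9,
) -> tuple[float, float]:
    """Coarse pass over [lo, hi], then a fine pass around the coarse winner.

    Hypothetical helper: returns (best_coefficient, best_score).
    """
    # Stage 1: evaluate a sparse grid across the whole range.
    coarse = linspace(lo, hi, coarse_points)
    coarse_best = max(coarse, key=score)

    # Stage 2: refine within one coarse step on either side of the winner.
    coarse_step = (hi - lo) / (coarse_points - 1)
    fine_lo = max(lo, coarse_best - coarse_step)
    fine_hi = min(hi, coarse_best + coarse_step)
    fine = linspace(fine_lo, fine_hi, fine_points)
    fine_best = max(fine, key=score)
    return fine_best, score(fine_best)


def toy_score(c: float) -> float:
    """Toy stand-in metric with a single peak at c = 3.2 (assumed)."""
    return -((c - 3.2) ** 2)


best_c, best_s = two_stage_grid_search(toy_score, 0.0, 8.0)
print(best_c, best_s)
```

The coarse pass keeps the number of expensive evaluations small, and the fine pass spends the remaining budget only near the most promising region.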
[68] arXiv:2604.20946 [pdf, other]: Title: Common Foundations for Recursive Shape Languages

Shqiponja Ahmetaj, Iovka Boneva, Jan Hidders, Maxime Jakubowski, Jose-Emilio Labra-Gayo, Wim Martens, Fabio Mogavero, Filip Murlak, Cem Okulmus, Ognjen Savković, Mantas Šimkus, Dominik Tomaszuk

Subjects: Logic in Computer Science (cs.LO); Databases (cs.DB)

As schema languages for RDF data become more mature, we are seeing efforts to extend them with recursive semantics, applying diverse ideas from logic programming and description logics. While ShEx has an official recursive semantics based on greatest fixpoints (GFP), the discussion for SHACL is ongoing and seems to be converging towards least fixpoints (LFP). A practical study we perform shows that, indeed, ShEx validators implement GFP, whereas SHACL validators are more heterogeneous. This situation creates tension between ShEx and SHACL, as their semantic commitments appear to diverge, potentially undermining interoperability and predictability. We aim to clarify this design space by comparing the main semantic options in a principled yet accessible way, hoping to engage both theoreticians and practitioners, especially those involved in developing tools and standards. We present a unifying formal semantics that treats LFP, GFP, and supported model semantics (SMS), clarifying their relationships and highlighting a duality between LFP and GFP on stratified fragments. Next, we investigate to which extent the directions taken by SHACL and ShEx are compatible. We show that, although ShEx and SHACL seem to be going in different directions, they include large fragments with identical expressive power. Moreover, there is a strong correspondence between these fragments through the aforementioned principle of duality. Finally, we present a complete picture of the data and combined complexity of ShEx and SHACL validation under LFP, GFP, and SMS, showing that SMS comes at a higher computational cost under standard complexity-theoretic assumptions.
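The difference between least and greatest fixpoints can be seen on a toy recursive shape. The sketch below is an illustration only: it uses a made-up shape "S holds at a node if the node has at least one successor satisfying S" over a hypothetical four-node graph, and does not use actual ShEx or SHACL syntax or any validator API.

```python
from typing import Callable, FrozenSet

NodeSet = FrozenSet[str]


def shape_step(edges: dict[str, list[str]], current: NodeSet) -> NodeSet:
    """One application of the (monotone) operator for the toy shape S.

    A node satisfies S in the next round if some successor is in `current`.
    """
    return frozenset(
        node for node, succs in edges.items() if any(s in current for s in succs)
    )


def iterate_to_fixpoint(op: Callable[[NodeSet], NodeSet], start: NodeSet) -> NodeSet:
    """Apply op repeatedly until the set stops changing."""
    current = start
    while True:
        nxt = op(current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical graph: a <-> b form a cycle; c points to d; d is a dead end.
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
all_nodes = frozenset(edges)


def op(s: NodeSet) -> NodeSet:
    return shape_step(edges, s)


# Least fixpoint: start from the empty set and grow.
lfp = iterate_to_fixpoint(op, frozenset())
# Greatest fixpoint: start from all nodes and shrink.
gfp = iterate_to_fixpoint(op, all_nodes)

print(sorted(lfp))  # nodes accepted under LFP
print(sorted(gfp))  # nodes accepted under GFP
```

On this graph the cyclic nodes `a` and `b` are accepted under the greatest fixpoint but not under the least fixpoint, because nothing outside the cycle ever justifies them. This is the kind of divergence on cyclic data that makes the choice of semantics matter for interoperability.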
[69] arXiv:2604.20948 [pdf, html, other]: Title: Can Virtual Agents Care? Designing an Empathetic and Personalized LLM-Driven Conversational Agent

Truong Le Minh Toan, Dieu Bang Mach, Tan Duy Le, Nguyen Tan Viet Tuyen

Comments: Accepted manuscript version to be presented at the SCI-2026

Subjects: Human-Computer Interaction (cs.HC)

Mental health challenges are rising globally, while traditional support services face limited availability and high costs. Large language models offer potential for conversational support, but often lack personalization, empathy, and factual grounding. A virtual agent framework is introduced to provide empathetic, personalized, and reliable wellbeing support through retrieval-augmented architecture, structured memory, and multimodal interaction. Objective benchmarks demonstrate improved retrieval and response quality, particularly for smaller models. A cross-cultural study with university students from Vietnam and Australia shows the system outperforms LLM-only baselines in coherence, perceived accuracy, and empathy, with most participants clearly preferring the proposed approach.
[70] arXiv:2604.20949 [pdf, html, other]: Title: Early Detection of Latent Microstructure Regimes in Limit Order Books

Prakul Sunil Hiremath, Vruksha Arun Hiremath

Comments: 48 pages, 7 figures. Combines theoretical guarantees (identifiability and early-detection bounds), 200-run simulation study, and preliminary real-data evaluation on BTC/USDT limit order books. Code and data available

Subjects: Machine Learning (cs.LG); Trading and Market Microstructure (q-fin.TR); Methodology (stat.ME); Machine Learning (stat.ML)

Limit order books can transition rapidly from stable to stressed conditions, yet standard early-warning signals such as order flow imbalance and short-term volatility are inherently reactive. We formalise this limitation via a three-regime causal data-generating process (stable $\to$ latent build-up $\to$ stress) in which a latent deterioration phase creates a prediction window prior to observable stress. Under mild assumptions on temporal drift and regime persistence, we establish identifiability of the latent build-up regime and derive guarantees for strictly positive expected lead-time and non-trivial probability of early detection. We propose a trigger-based detector combining MAX aggregation of complementary signal channels, a rising-edge condition, and adaptive thresholding. Across 200 simulations, the method achieves mean lead-time $+18.6 \pm 3.2$ timesteps with perfect precision and moderate coverage, outperforming classical change-point and microstructure baselines. A preliminary application to one week of BTC/USDT order book data shows consistent positive lead-times while baselines remain reactive. Results degrade in low signal-to-noise and short build-up regimes, consistent with theory.
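The three ingredients named in the abstract (MAX aggregation across channels, a rising-edge condition, and an adaptive threshold) can be combined in a few lines. The sketch below is one plausible reading, not the authors' detector: the rolling mean-plus-k-standard-deviations threshold, the `window` and `k` parameters, and the synthetic signals are all assumptions.

```python
from statistics import mean, pstdev


def detect_triggers(
    channels: list[list[float]], window: int = 20, k: float = 3.0
) -> list[int]:
    """Return timesteps at which the aggregated signal first crosses its threshold.

    channels: equal-length signal channels (hypothetical inputs).
    window:   number of past samples used for the adaptive threshold (assumed).
    k:        threshold width in rolling standard deviations (assumed).
    """
    n = len(channels[0])
    # MAX aggregation: at each timestep, take the strongest channel.
    aggregated = [max(ch[t] for ch in channels) for t in range(n)]

    triggers = []
    previously_above = False
    for t in range(window, n):
        history = aggregated[t - window:t]
        # Adaptive threshold from recent history only (no look-ahead).
        threshold = mean(history) + k * pstdev(history)
        above = aggregated[t] > threshold
        # Rising edge: fire only on a below-to-above transition.
        if above and not previously_above:
            triggers.append(t)
        previously_above = above
    return triggers


# Synthetic data: low-level noise for 30 steps, then a sustained jump.
signal_a = [0.1 * (t % 2) for t in range(30)] + [1.0] * 3
signal_b = [0.0] * 33
triggers = detect_triggers([signal_a, signal_b])
print(triggers)
```

The rising-edge condition prevents the detector from re-firing on every step of a sustained excursion, so each build-up episode yields at most one alert until the signal drops back below the threshold.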
[71] arXiv:2604.20967 [pdf, other]: Title: Clinical Evaluation of a Tongue-Controlled Wrist Abduction-Adduction Assistance in a 6-DoF Upper-Limb Exoskeleton for Individuals with ALS and SCI

Juwairiya S. Khan, Mostafa Mohammadi, Alexander L. Ammitzbøll, Ellen-Merete Hagen, Jakob Blicher, Izabella Obál, Ana S. S. Cardoso, Oguzhan Kirtas, Rasmus L. Kæseler, John Rasmussen, Lotte N.S. Andreasen Struijk

Comments: 9 pages, 7 figures and 2 tables. This work has been submitted to the IEEE Transactions on Neural Systems and Rehabilitation Engineering

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Upper-limb exoskeletons (ULEs) have the potential to restore functional independence in individuals with severe motor impairments; however, the clinical relevance of wrist degrees of freedom (DoF), particularly abduction-adduction (Ab-Ad), remains insufficiently evaluated. This study investigates the functional and user-perceived impact of wrist Ab-Ad assistance during two activities of daily living (ADLs). Wrist Ab-Ad assistance in a tongue-controlled 6-DoF ULE, EXOTIC2, was evaluated in a within-subject study involving one individual with amyotrophic lateral sclerosis and five individuals with spinal cord injury. Participants performed drinking and scratch stick leveling tasks with EXOTIC2 under two conditions: with and without wrist Ab-Ad assistance. Outcome measures included task success, task completion time, kinematic measures, and a usability questionnaire capturing comfort, functional perception, and acceptance. Enabling wrist Ab-Ad improved task success rates across both ADLs, with consistent reductions in spillage (from 77.8% to 22.2%) and failed placements (from 66.7% to 16.7%). Participants utilized task-specific subsets of the available wrist range of motion, indicating that effective control within functional ranges was more critical than maximal joint excursion. Questionnaire responses indicated no increase in discomfort with the additional DoF and reflected perceived improvements in task performance. In conclusion, wrist Ab-Ad assistance enhances functional task performance in assistive exoskeleton use without compromising user comfort. However, its effectiveness depends on task context, control usability, and individual user strategies. This study provides clinically relevant, user-centered evidence supporting the inclusion of wrist Ab-Ad in ULEs, emphasizing the importance of balancing functional capability with usability in assistive device design.
[72] arXiv:2604.20972 [pdf, html, other]: Title: Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

Michael O'Herlihy, Rosa Català

Comments: 22 pages, 10 figures, preprint. Research on Defensibility Index (DI), Ambiguity Index (AI), and Probabilistic Defensibility Signal (PDS) for policy-grounded evaluation of rule-governed AI in content moderation (Reddit production data)

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Content moderation systems are typically evaluated by measuring agreement with human labels. In rule-governed environments this assumption fails: multiple decisions may be logically consistent with the governing policy, and agreement metrics penalize valid decisions while mischaracterizing ambiguity as error -- a failure mode we term the Agreement Trap. We formalize evaluation as policy-grounded correctness and introduce the Defensibility Index (DI) and Ambiguity Index (AI). To estimate reasoning stability without additional audit passes, we introduce the Probabilistic Defensibility Signal (PDS), derived from audit-model token logprobs. We harness LLM reasoning traces as a governance signal rather than a classification output by deploying the audit model not to decide whether content violates policy, but to verify whether a proposed decision is logically derivable from the governing rule hierarchy. We validate the framework on 193,000+ Reddit moderation decisions across multiple communities and evaluation cohorts, finding a 33-46.6 percentage-point gap between agreement-based and policy-grounded metrics, with 79.8-80.6% of the model's false negatives corresponding to policy-grounded decisions rather than true errors. We further show that measured ambiguity is driven by rule specificity: auditing 37,286 identical decisions under three tiers of the same community rules reduces AI by 10.8 pp while DI remains stable. Repeated-sampling analysis attributes PDS variance primarily to governance ambiguity rather than decoding noise. A Governance Gate built on these signals achieves 78.6% automation coverage with 64.9% risk reduction. Together, these results show that evaluation in rule-governed environments should shift from agreement with historical labels to reasoning-grounded validity under explicit rules.
[73] arXiv:2604.20973 [pdf, html, other]: Title: User-Centered Design of Hyperlocal Communication Platforms: Insights from the Design and Evaluation of KUBO

Eljohn Evangelista, Alyssa Cea, Axel Balitaan, Clark Vince Diala, Jamlech Iram Gojo Cruz

Comments: To be published in Proceedings of the 2025 International Conference on Human-Engaged Computing (ICHEC 2025), November 21-23, 2025, Singapore, Singapore. ACM, New York, NY, USA, 13 pages

Subjects: Human-Computer Interaction (cs.HC)

Effective hyperlocal communication is critical in the Philippines, where delayed or algorithm-filtered updates can leave residents uninformed about emergency advisories and community events. We conducted a user-centered study consisting of contextual inquiry and semi-structured interviews to identify four key barriers: delayed alerts, algorithm-driven noise, language gaps, and digital divides. Guided by these insights, we designed KUBO (Kumunidad at Balitang Opisyal), a prototype that integrates a home module for verified local government unit advisories and curated headlines, and a community module for resident-powered neighborhood reports and discussions. Using a within-subjects evaluation design, KUBO significantly reduced task completion times (p-value < 0.001), improved information recall on post-task quizzes (p-value = 0.010), and yielded higher user satisfaction ratings for ease of use, overall satisfaction, and perceived effectiveness compared to Facebook, the commonly used communication platform in the Philippines. These results demonstrate that a dual-channel, inclusive platform can substantially enhance real-time information access, comprehension, and civic engagement in hyperlocal settings.
[74] arXiv:2604.20979 [pdf, other]: Title: A Complete Approach to Time Varying Linear Systems

Douglas R. Frey

Subjects: Systems and Control (eess.SY)

This paper presents a unifying theory of linear second-order systems that allows time-varying and time-invariant systems to be treated in the same way for the first time. In the process, a transformation is given that diagonalizes an arbitrary time-varying state matrix in a spectrum-invariant way. A canonical form for the fundamental matrix is given in terms of dynamic eigenvalues and their associated eigenvectors, both obtained from the Riccati Characteristic Equation for the system, which intuitively generalizes the standard characteristic equation for time-invariant systems. Examples show that the technique gives a unified approach to the solutions of time-invariant, time-varying, and periodic systems.
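To see why a Riccati equation naturally generalizes the characteristic equation, consider the scalar second-order case. This is a standard textbook derivation under assumed notation ($a(t)$, $b(t)$, $\lambda(t)$); the paper's own formulation of the Riccati Characteristic Equation is not given in the abstract and may differ.

```latex
% Scalar second-order linear system with time-varying coefficients
\ddot{x}(t) + a(t)\,\dot{x}(t) + b(t)\,x(t) = 0

% Ansatz with a time-varying ("dynamic") eigenvalue
x(t) = \exp\!\left( \int_{t_0}^{t} \lambda(\tau)\, d\tau \right)
\;\Longrightarrow\;
\dot{x} = \lambda\, x,
\qquad
\ddot{x} = \bigl( \dot{\lambda} + \lambda^{2} \bigr)\, x

% Substituting and dividing by x gives a Riccati equation in lambda
\dot{\lambda}(t) + \lambda(t)^{2} + a(t)\,\lambda(t) + b(t) = 0

% Time-invariant special case: a, b constant and \dot{\lambda} = 0
\lambda^{2} + a\,\lambda + b = 0
```

When the coefficients are constant and $\lambda$ is constant, the derivative term vanishes and the familiar quadratic characteristic equation is recovered.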
[75] arXiv:2604.20982 [pdf, html, other]: Title: MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media

Aditya Bali, Rupsha, Vidur Kaushik, Anirban Sen

Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)

We present MediaGraph, a network-theoretic framework for analyzing reporting preferences in news media through entity co-occurrence networks. Using articles from four Indian news sources, two mainstream (The Times of India and The Indian Express) and two fringe outlets (dna and firstpost), we construct source-specific co-occurrence networks around the 2020-21 and 2024 Farmers Protests. We analyze these networks along three network-theoretic axes: centrality, community structure, and co-occurrence link predictability. Link predictability is a novel metric that quantifies the consistency of entity associations over time using a GraphSAGE-based model. Our results reveal significant differences in reporting preferences across sources for the same event, and a consistent under-representation of farmer leaders across sources. By shifting the focus from textual signals to relational structures, our approach offers a scalable, label-independent perspective on media analysis and introduces link predictability as a complementary measure of reporting behavior.
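An entity co-occurrence network of the kind described can be built by counting, for each pair of entities, the number of articles in which both appear. The sketch below is a minimal illustration under assumptions: entity extraction is taken as already done, the entity names are placeholders, and weighted degree is used as a simple stand-in for the centrality measures the paper may use.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(articles: list[list[str]]) -> Counter:
    """Map each unordered entity pair to the number of articles containing both.

    articles: one list of extracted entity names per article (assumed input).
    """
    edges: Counter = Counter()
    for entities in articles:
        # Deduplicate within an article and sort so each pair has one key.
        for u, v in combinations(sorted(set(entities)), 2):
            edges[(u, v)] += 1
    return edges


def weighted_degree(edges: Counter) -> Counter:
    """Sum of edge weights incident to each entity (a basic centrality proxy)."""
    degree: Counter = Counter()
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    return degree


# Placeholder entities, not data from the paper.
articles = [
    ["entity_a", "entity_b", "entity_c"],
    ["entity_b", "entity_c"],
    ["entity_a", "entity_b", "entity_b"],
]
edges = cooccurrence_network(articles)
degree = weighted_degree(edges)
print(edges)
print(degree.most_common())
```

Building one such network per news source makes the sources directly comparable: the same entity can be central in one outlet's network and peripheral in another's.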
[76] arXiv:2604.20983 [pdf, html, other]: Title: Thinking Like a Botanist: Challenging Multimodal Language Models with Intent-Driven Chain-of-Inquiry

Syed Nazmus Sakib, Nafiul Haque, Shahrear Bin Amin, Hasan Muhammad Abdullah, Md. Mehedi Hasan, Mohammad Zabed Hossain, Shifat E. Arman

Comments: Accepted at ACL 2026 Findings

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Vision evaluations are typically done through multi-step processes. In most contemporary fields, experts analyze images using structured, evidence-based adaptive questioning. In plant pathology, botanists inspect leaf images, identify visual cues, infer diagnostic intent, and probe further with targeted questions that adapt to species, symptoms, and severity. This structured probing is crucial for accurate disease diagnosis and treatment formulation. Yet current vision-language models are evaluated on single-turn question answering. To address this gap, we introduce PlantInquiryVQA, a benchmark for studying multi-step, intent-driven visual reasoning in botanical diagnosis. We formalize a Chain of Inquiry framework modeling diagnostic trajectories as ordered question-answer sequences conditioned on grounded visual cues and explicit epistemic intent. We release a dataset of 24,950 expert-curated plant images and 138,068 question-answer pairs annotated with visual grounding, severity labels, and domain-specific reasoning templates. Evaluations on top-tier Multimodal Large Language Models reveal that while they describe visual symptoms adequately, they struggle with safe clinical reasoning and accurate diagnosis. Importantly, structured question-guided inquiry significantly improves diagnostic correctness, reduces hallucination, and increases reasoning efficiency. We hope PlantInquiryVQA serves as a foundational benchmark in advancing research to train diagnostic agents to reason like expert botanists rather than static classifiers.
[77] arXiv:2604.20985 [pdf, html, other]: Title: Differentially Private Model Merging

Qichuan Yin, Manzil Zaheer, Tian Li

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

In machine learning applications, privacy requirements during inference or deployment time could change constantly due to varying policies, regulations, or user experience. In this work, we aim to generate a multitude of models to satisfy any target differential privacy (DP) requirement without additional training steps, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post-processing techniques, namely random selection and linear combination, to output a final private model for any target privacy parameter. We provide privacy accounting of these approaches through the lens of Rényi DP and privacy loss distributions for general problems. In a case study on private mean estimation, we fully characterize the privacy/utility results and theoretically establish the superiority of linear combination over random selection. Empirically, we validate our approach and analyses on several models and both synthetic and real-world datasets.
[78] arXiv:2604.20987 [pdf, other]: Title: Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha

Comments: 26 pages

Subjects: Artificial Intelligence (cs.AI)

Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under delayed rewards and partial observability. Games are a good example of such a testbed. Large Language Models (LLMs) offer a promising alternative as game-playing agents, but they often struggle with consistent long-horizon decision making because they lack a mechanism to discover, retain, and reuse structured skills across episodes. We present COSPLAY, a co-evolution framework in which an LLM decision agent retrieves skills from a learnable skill bank to guide action taking, while an agent-managed skill pipeline discovers reusable skills from the agent's unlabeled rollouts to form a skill bank. Our framework trains the decision agent to improve skill retrieval and action generation, while the skill bank agent continually extracts, refines, and updates skills together with their contracts. Experiments across six game environments show that COSPLAY with an 8B base model achieves over 25.1% average reward improvement against four frontier LLM baselines on single-player game benchmarks while remaining competitive on multi-player social reasoning games.
[79] arXiv:2604.20990 [pdf, html, other]: Title: A Survey of Legged Robotics in Non-Inertial Environments: Past, Present, and Future

I-Chia Chang, Xinyan Huang, Tzu-Yuan Lin, Sangli Teng, Wenjing Li, Maani Ghaffari, Jingang Yi, Yan Gu

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Legged robots have demonstrated remarkable agility on rigid, stationary ground, but their locomotion reliability remains limited in non-inertial environments, where the supporting ground moves, tilts, or accelerates. Such conditions arise in ground transportation, maritime platforms, and aerospace settings, and they introduce persistent time-varying disturbances that break the stationary-ground assumptions underlying conventional legged locomotion. This survey reviews the state of the art in modeling, state estimation, and control for legged robots in non-inertial environments. We summarize representative application domains and motion characteristics, analyze the root causes of locomotion performance degradation, and review existing methods together with their key assumptions and limitations. We further identify open problems in robot-environment coupling, observability, robustness, and experimental validation, and discuss future directions in autonomy, system-level design, bio-inspired strategies, safety, and testing. The survey aims to clarify the technical foundations of this emerging area and support the development of reliable legged robots for real-world dynamic environments.
[80] arXiv:2604.20991 [pdf, html, other]: Title: Accuracy and stability of Artificial Neural Networks for HP-Splines frequency parameter selection

Vittoria Bruni, Paola Erminia Calabrese, Rosanna Campagna, Domenico Vitulano

Comments: 22 pages, 11 figures

Subjects: Numerical Analysis (math.NA)

This paper explores the use of artificial neural networks for the stable and data-driven selection of the frequency parameter in hyperbolic polynomial penalized splines (HP-splines). This parameter defines the underlying spline space and is essential for adapting the model to exponential patterns in the data, such as those encountered in signal processing. The theoretical approximation properties of deep neural network architectures are investigated to establish a connection between classical spline-based regression and modern data-driven learning methods. Based on this analysis, a neural network is designed to predict optimal HP-spline parameters by balancing approximation accuracy, stability analysis, and complexity control, thereby producing neural architectures that are both expressive and stable. Numerical experiments confirm that the proposed approach achieves both high accuracy and stable performance, validating the theoretical findings.
[81] arXiv:2604.20993 [pdf, other]: Title: Droplet-LNO: Physics-Informed Laplace Neural Operators for Accurate Prediction of Droplet Spreading Dynamics on Complex Surfaces

Ganesh Sahadeo Meshram, Partha Pratim Chakrabarti, Suman Chakraborty

Comments: 36 pages, 8 figures

Subjects: Machine Learning (cs.LG)

Spreading of liquid droplets on solid substrates constitutes a classic multiphysics problem with widespread applications ranging from inkjet printing and spray cooling to biomedical microfluidic systems. Yet, accurate computational fluid dynamics (CFD) simulations are prohibitively expensive, taking more than 18 to 24 hours for each transient computation. In this paper, a Physics-Informed Laplace Operator Neural Network (PI-LNO) is introduced, representing a novel architecture where the Laplace integral transform function serves as a learned physics-informed functional basis. Extensive comparative benchmark studies were performed against five other state-of-the-art approaches: UNet, UNet with attention modules (UNet-AM), DeepONet, Physics-Informed UNet (PI-UNet), and Laplace Neural Operator (LNO). Through complex Laplace transforms, PI-LNO natively models the exponential transient dynamics of the spreading process. A TensorFlow-based PI-LNO is trained on multi-surface CFD data spanning contact angles $\theta_s \in [20,160]$, employing a physics-regularized composite loss combining data fidelity (MSE, MAE, RMSE) with Navier-Stokes, Cahn-Hilliard, and causality constraints.
[82] arXiv:2604.20994 [pdf, html, other]: Title: Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis, Seshu Tirupathi, John D. Kelleher

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The growth of agentic AI has drawn significant attention to function-calling Large Language Models (LLMs), which are designed to extend the capabilities of AI-powered systems by invoking external functions. Injection and jailbreaking attacks have been extensively explored to showcase the vulnerabilities of LLMs to user prompt manipulation. The expanded capabilities of agentic models introduce further vulnerabilities via their function-calling interface. Recent work in LLM security showed that function calling can be abused, leading to data tampering and theft, causing disruptive behavior such as endless loops, or causing LLMs to produce harmful content in the style of jailbreaking attacks. This paper introduces a novel function hijacking attack (FHA) that manipulates the tool selection process of agentic models to force the invocation of a specific, attacker-chosen function. While existing attacks focus on the semantic preference of the model for function-calling tasks, we show that FHA is largely agnostic to the context semantics and robust to the function sets, making it applicable across diverse domains. We further demonstrate that FHA can be trained to produce universal adversarial functions, enabling a single attacked function to hijack tool selection across multiple queries and payload configurations. We conducted experiments on 5 different models, including instructed and reasoning variants, reaching 70% to 100% ASR over the established BFCL dataset. Our findings further demonstrate the need for strong guardrails and security modules for agentic systems.
[83] arXiv:2604.20995 [pdf, html, other]: Title: Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models

Inderjeet Nair, Jie Ruan, Lu Wang

Comments: Under submission at COLM 2026. Won the Best Student Paper Award at MSLD 2026 @ UIUC

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)

Alignment faking, where a model behaves aligned with developer policy when monitored but reverts to its own preferences when unobserved, is a concerning yet poorly understood phenomenon, in part because current diagnostic tools remain limited. Prior diagnostics rely on highly toxic and clearly harmful scenarios, causing most models to refuse immediately. As a result, models never deliberate over developer policy, monitoring conditions, or the consequences of non-compliance, making these diagnostics fundamentally unable to detect alignment faking propensity. To support study of this phenomenon, we first introduce VLAF, a diagnostic framework grounded in the hypothesis that alignment faking is most likely when developer policy conflicts with a model's strongly held values. VLAF uses morally unambiguous scenarios to probe this conflict across diverse moral values, bypassing refusal behavior while preserving meaningful deliberative stakes. Using VLAF, we find that alignment faking is substantially more prevalent than previously reported, occurring in models as small as 7B parameters - with olmo2-7b-instruct faking alignment in 37% of this http URL, we show that oversight conditions induce activation shifts that lie along a single direction in representation space. This means the behavioral divergence driving alignment faking can be captured by a single contrastive steering vector, which we exploit for lightweight inference-time mitigation. Finally, we exploit this for mitigation that requires no labeled data and minimal computational overhead, achieving relative reductions in alignment faking of 85.8%, 94.0%, and 57.7% on olmo2-7b-instruct, olmo2-13b-instruct, and qwen3-8b respectively.
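A minimal sketch of one common way to build a contrastive steering vector: the difference between mean activations under two conditions. This is an assumption about the general technique, not the authors' implementation; the function names, the two-dimensional toy activations, and the sign convention are all hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def contrastive_steering_vector(acts_condition_a, acts_condition_b):
    """Difference of mean activations between two conditions (a minus b)."""
    mean_a = mean_vector(acts_condition_a)
    mean_b = mean_vector(acts_condition_b)
    return [a - b for a, b in zip(mean_a, mean_b)]


def steer(activation, direction, strength):
    """Shift one activation vector along the steering direction."""
    return [x + strength * d for x, d in zip(activation, direction)]


# Hypothetical toy activations recorded under two oversight conditions.
monitored = [[1.0, 0.0], [3.0, 0.0]]
unmonitored = [[0.0, 0.0], [0.0, 2.0]]
direction = contrastive_steering_vector(monitored, unmonitored)
shifted = steer([1.0, 1.0], direction, strength=-0.5)
```

Because the vector is computed from activations alone, a construction like this needs no labeled data, which matches the lightweight inference-time setting the abstract describes.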
[84] arXiv:2604.20996 [pdf, html, other]: Title: AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models

Tadesse Destaw Belay, Shahriar Kabir Nahin, Israel Abebe Azime, Ocean Monjur, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam, Anshuman Chhabra

Subjects: Computation and Language (cs.CL)

How can language learning systems be developed for languages that lack sufficient training resources? This challenge is increasingly faced by developers across the African continent who aim to build AI systems capable of understanding and responding in local languages. To address this gap, we introduce AFRILANGDICT, a collection of 194.7K African language-English dictionary entries designed as seed resources for generating language-learning materials, enabling us to automatically construct large-scale, diverse, and verifiable student-tutor question-answer interactions suitable for training AI-assisted language tutors. Using AFRILANGDICT, we build AFRILANGEDU, a dataset of 78.9K multi-turn training examples for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Using AFRILANGEDU, we train language tutoring models collectively referred to as AFRILANGTUTOR. We fine-tune two multilingual LLMs, Llama-3-8B-IT and Gemma-3-12B-IT, on AFRILANGEDU across 10 African languages and evaluate their performance. Our results show that models trained on AFRILANGEDU consistently outperform their base counterparts, and combining SFT and DPO yields substantial improvements, with gains ranging from 1.8% to 15.5% under LLM-as-a-judge evaluations across four criteria. To facilitate further research on low-resource languages, all resources are available at this https URL.
[85] arXiv:2604.21001 [pdf, html, other]: Title: VRSafe: A Secure Virtual Keyboard to Mitigate Keystroke Inference in Virtual Reality

Yijun Yuan, Na Du, Adam J. Lee, Balaji Palanisamy

Subjects: Cryptography and Security (cs.CR)

Password-based authentication is one of the most commonly used methods for verifying user identities, and its widespread usage continues in virtual reality (VR) applications. As a result, various forms of attacks on password-based authentication in traditional environments, such as keystroke inference and shoulder surfing, are still effective in VR applications. While keystroke inference attacks on virtual keyboards have been studied extensively, few efforts have developed an effective and cost-efficient defense strategy to mitigate keystroke inferences in VR. To address this gap, this paper presents a novel QWERTY keyboard called VRSafe that is resilient to keystroke inference attacks. The proposed keyboard carefully introduces false positive keystrokes into the information collected by attackers during the typing process, making the inference of the original password difficult. VRSafe also incorporates a novel malicious login detector that can effectively identify unauthorized login attempts using credentials inferred from keystroke inference attacks with high detection rate and minimal time and memory cost. The proposed design is evaluated through both simulation experiments and a real-world user study, and the results show that VRSafe can significantly reduce the accuracy of keystroke inference attacks while incurring a modest overhead from a usability standpoint.
[86] arXiv:2604.21003 [pdf, html, other]: Title: The Last Harness You'll Ever Build

Haebin Seong, Li Yin, Haoran Zhang

Subjects: Artificial Intelligence (cs.AI)

AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the Harness Evolution Loop optimizes a worker agent's harness $\mathcal{H}$ for a single task: a Worker Agent $W_{\mathcal{H}}$ executes the task, an Evaluator Agent $V$ adversarially diagnoses failures and scores performance, and an Evolution Agent $E$ modifies the harness based on the full history of prior attempts. At the second level, the Meta-Evolution Loop optimizes the evolution protocol $\Lambda = (W_{\mathcal{H}}, \mathcal{H}^{(0)}, V, E)$ itself across diverse tasks, learning a protocol $\Lambda^{(\text{best})}$ that enables rapid harness convergence on any new task -- so that adapting an agent to a novel domain requires no human harness engineering at all. We formalize the correspondence to meta-learning and present both algorithms. The framework shifts manual harness engineering into automated harness engineering, and takes one step further -- automating the design of the automation itself.
[87] arXiv:2604.21006 [pdf, html, other]: Title: Deep FinResearch Bench: Evaluating AI's Ability to Conduct Professional Financial Investment Research

Mirazul Haque, Antony Papadimitriou, Samuel Mensah, Zhiqiang Ma, Zhijin Guo, Joy Prakash Sain, Simerjot Kaur, Charese Smiley, Xiaomo Liu

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We introduce Deep FinResearch Bench, a practical and comprehensive evaluation framework for deep research (DR) agents in financial investment research. The benchmark assesses three dimensions of report quality: qualitative rigor, quantitative forecasting and valuation accuracy, and claim credibility and verifiability. Particularly, we define corresponding qualitative and quantitative evaluation metrics and implement an automated scoring procedure to enable scalable assessment. Applying the benchmark to financial reports from frontier DR agents and comparing them with reports authored by financial professionals, we find that AI-generated reports still fall short across these dimensions. These findings underscore the need for domain-specialized DR agents tailored to finance, and we hope the work establishes a foundation for standardized benchmarking of DR agents in financial research.
[88] arXiv:2604.21008 [pdf, html, other]: Title: Linear Image Generation by Synthesizing Exposure Brackets

Yuekun Dai, Zhoutong Zhang, Shangchen Zhou, Nanxuan Zhao

Comments: Accepted by CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The life of a photo begins with photons striking the sensor, whose signals are passed through a sophisticated image signal processing (ISP) pipeline to produce a display-referred image. However, such images are no longer faithful to the incident light, being compressed in dynamic range and stylized by subjective preferences. In contrast, RAW images record direct sensor signals before non-linear tone mapping. After camera response curve correction and demosaicing, they can be converted into linear images, which are scene-referred representations that directly reflect true irradiance and are invariant to sensor-specific factors. Since image sensors have better dynamic range and bit depth, linear images contain richer information than display-referred ones, leaving users more room for editing during post-processing. Despite this advantage, current generative models mainly synthesize display-referred images, which inherently limits downstream editing. In this paper, we address the task of text-to-linear-image generation: synthesizing a high-quality, scene-referred linear image that preserves full dynamic range, conditioned on a text prompt, for professional post-processing. Generating linear images is challenging, as pre-trained VAEs in latent diffusion models struggle to simultaneously preserve extreme highlights and shadows due to the higher dynamic range and bit depth. To this end, we represent a linear image as a sequence of exposure brackets, each capturing a specific portion of the dynamic range, and propose a DiT-based flow-matching architecture for text-conditioned exposure bracket generation. We further demonstrate downstream applications including text-guided linear image editing and structure-conditioned generation via ControlNet.
[89] arXiv:2604.21011 [pdf, html, other]: Title: Micro-DualNet: Dual-Path Spatio-Temporal Network for Micro-Action Recognition

Naga VS Raviteja Chappa, Evangelos Sariyanidi, Lisa Yankowitz, Gokul Nair, Casey J. Zampella, Robert T. Schultz, Birkan Tunç

Comments: Accepted to International Conference on Automatic Face and Gesture Recognition (FG)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Micro-actions are subtle, localized movements lasting 1-3 seconds such as scratching one's head or tapping fingers. Such subtle actions are essential for social communication, ubiquitously used in natural interactions, and thus critical for fine-grained video understanding, yet remain poorly understood by current computer vision systems. We identify a fundamental challenge: micro-actions exhibit diverse spatio-temporal characteristics where some are defined by spatial configurations while others manifest through temporal dynamics. Existing methods that commit to a single spatio-temporal decomposition cannot accommodate this diversity. We propose a dual-path network that processes anatomically-grounded spatial entities through parallel Spatial-Temporal (ST) and Temporal-Spatial (TS) pathways. The ST path captures spatial configurations before modeling temporal dynamics, while the TS path inverts this order to prioritize temporal dynamics. Rather than fixed fusion, we introduce entity-level adaptive routing where each body part learns its optimal processing preference, complemented by Mutual Action Consistency (MAC) loss that enforces cross-path coherence. Extensive experiments demonstrate competitive performance on MA-52 dataset and state-of-the-art results on iMiGUE dataset. Our work reveals that architectural adaptation to the inherent complexity of micro-actions is essential for advancing fine-grained video understanding.
[90] arXiv:2604.21016 [pdf, html, other]: Title: SGD at the Edge of Stability: The Stochastic Sharpness Gap

Fangshuo Liao, Afroditi Kolomvaki, Anastasios Kyrillidis

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

When training neural networks with full-batch gradient descent (GD) and step size $\eta$, the largest eigenvalue of the Hessian -- the sharpness $S(\boldsymbol{\theta})$ -- rises to $2/\eta$ and hovers there, a phenomenon termed the Edge of Stability (EoS). Damian et al. (2023) showed that this behavior is explained by a self-stabilization mechanism driven by third-order structure of the loss, and that GD implicitly follows projected gradient descent (PGD) on the constraint $S(\boldsymbol{\theta})\leq 2/\eta$. For mini-batch stochastic gradient descent (SGD), the sharpness stabilizes below $2/\eta$, with the gap widening as the batch size decreases; yet no theoretical explanation exists for this suppression.
We introduce stochastic self-stabilization, extending the self-stabilization framework to SGD. Our key insight is that gradient noise injects variance into the oscillatory dynamics along the top Hessian eigenvector, strengthening the cubic sharpness-reducing force and shifting the equilibrium below $2/\eta$. Following the approach of Damian et al. (2023), we define stochastic predicted dynamics relative to a moving projected gradient descent trajectory and prove a stochastic coupling theorem that bounds the deviation of SGD from these predictions. We derive a closed-form equilibrium sharpness gap: $\Delta S = \eta \beta \sigma_{\boldsymbol{u}}^{2}/(4\alpha)$, where $\alpha$ is the progressive sharpening rate, $\beta$ is the self-stabilization strength, and $\sigma_{\boldsymbol{u}}^{2}$ is the gradient noise variance projected onto the top eigenvector. This formula predicts that smaller batch sizes yield flatter solutions and recovers GD when the batch equals the full dataset.
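The closed-form gap above can be evaluated directly. The sketch below only transcribes $\Delta S = \eta \beta \sigma_{\boldsymbol{u}}^{2}/(4\alpha)$; the function names and numeric inputs are hypothetical, and reading the equilibrium sharpness as $2/\eta - \Delta S$ is an assumption based on the abstract's statement that the equilibrium shifts below $2/\eta$.

```python
def sharpness_gap(eta, alpha, beta, sigma_u_sq):
    """Delta S = eta * beta * sigma_u^2 / (4 * alpha), as stated in the abstract."""
    if alpha <= 0:
        raise ValueError("progressive sharpening rate alpha must be positive")
    return eta * beta * sigma_u_sq / (4.0 * alpha)


def equilibrium_sharpness(eta, alpha, beta, sigma_u_sq):
    """Assumed reading: the GD threshold 2/eta lowered by the gap Delta S."""
    return 2.0 / eta - sharpness_gap(eta, alpha, beta, sigma_u_sq)


# Hypothetical values chosen only to exercise the formula.
gap_with_noise = sharpness_gap(eta=0.1, alpha=1.0, beta=2.0, sigma_u_sq=4.0)
gap_full_batch = sharpness_gap(eta=0.1, alpha=1.0, beta=2.0, sigma_u_sq=0.0)
```

Setting the projected noise variance to zero gives a zero gap, which is the full-batch GD case the abstract says the formula recovers.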
[91] arXiv:2604.21017 [pdf, html, other]: Title: Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

Open-H-Embodiment Consortium: Nigel Nelson, Juo-Tung Chen, Jesse Haworth, Xinhao Chen, Lukas Zbinden, Dianye Huang, Alaa Eldin Abdelaal, Alberto Arezzo, Ayberk Acar, Farshid Alambeigi, Carlo Alberto Ammirati, Yunke Ao, Pablo David Aranda Rodriguez, Soofiyan Atar, Mattia Ballo, Noah Barnes, Federica Barontini, Filip Binkiewicz, Peter Black, Sebastian Bodenstedt, Leonardo Borgioli, Nikola Budjak, Benjamin Calmé, Fabio Carrillo, Nicola Cavalcanti, Changwei Chen, Haoxin Chen, Sihang Chen, Qihan Chen, Zhongyu Chen, Ziyang Chen, Shing Shin Cheng, Meiqing Cheng, Min Cheng, Zih-Yun Sarah Chiu, Xiangyu Chu, Camilo Correa-Gallego, Giulio Dagnino, Anton Deguet, Jacob Delgado, Jonathan C. DeLong, Kaizhong Deng, Alexander Dimitrakakis, Qingpeng Ding, Hao Ding, Giovanni Distefano, Daniel Donoho, Anqing Duan, Marco Esposito, Shane Farritor, Jad Fayad, Zahi Fayad, Mario Ferradosa, Filippo Filicori, Chelsea Finn, Philipp Fürnstahl, Jiawei Ge, Stamatia Giannarou, Xavier Giralt Ludevid, Frederic Giraud, Aditya Amit Godbole, Ken Goldberg, Antony Goldenberg, Diego Granero Marana, Xiaoqing Guo, Tamás Haidegger, Evan Hailey, Pascal Hansen, Ziyi Hao, Kush Hari, Kengo Hayashi, Jonathon Hawkins, Shelby Haworth, Ortrun Hellig, S. Duke Herrell, Zhouyang Hong, Andrew Howe, Junlei Hu, Ria Jain, Mohammad Rafiee Javazm, Howard Ji, Rui Ji, Jianmin Ji, Zhongliang Jiang, Dominic Jones, Jeffrey Jopling, Britton Jordan, Ran Ju, Michael Kam, Luoyao Kang, Fausto Kang, Siddhartha Kapuria, Peter Kazanzides, Sonika Kiehler, Ethan Kilmer, Ji Woong (Brian)Kim, Przemysław Korzeniowski, Chandra Kuchi, Nithesh Kumar

Comments: Project website: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, restricting the development of foundation models that the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date, spanning more than 49 institutions and multiple robotic platforms including the CMR Versius, Intuitive Surgical's da Vinci, da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, spanning surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H is the first open foundation vision-language-action model for medical robotics, which is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others) and achieves 64% average success across a 29-step ex vivo suturing sequence. We also train Cosmos-H-Surgical-Simulator, the first action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the medical domain. These results suggest that open, large-scale medical robot data collection can serve as critical infrastructure for the research community, enabling advances in robot learning, world modeling, and beyond.
[92] arXiv:2604.21018 [pdf, html, other]: Title: Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

Bowen Zuo, Dongruo Zhou, Yinglun Zhu

Subjects: Artificial Intelligence (cs.AI)

While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly adapts where computation is spent and how generation is performed. Our method begins with a warm-up phase that identifies easy queries and assembles an initial pool of question-response pairs from the test set itself. An adaptive phase then concentrates further computation on unresolved queries while reshaping their generation distributions through evolving in-context demonstrations -- conditioning each generation on successful responses from semantically related queries rather than resampling from a fixed distribution. Experiments across math, coding, and reasoning benchmarks demonstrate that our approach consistently outperforms existing baselines while consuming substantially less inference-time compute.
[93] arXiv:2604.21019 [pdf, html, other]: Title: Following the Eye-Tracking Evidence: Established Web-Search Assumptions Fail in Carousel Interfaces

Jingwei Kang, Maarten de Rijke, Harrie Oosterhuis

Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

Carousel interfaces have been the de-facto standard for streaming media services for over a decade. Yet, there has been very little research into user behavior with such interfaces, which thus remains poorly understood. Due to this lack of empirical research, previous work has assumed that behaviors established in single-list web-search interfaces, such as the F-pattern and the examination hypothesis, also apply to carousel interfaces, for instance when designing click models or evaluation metrics. We analyze a recently-released interaction and examination dataset resulting from an eye-tracking study performed on carousel interfaces to verify whether these assumptions actually hold.
We find that (i) the F-pattern holds only for vertical examination and not for horizontal swiping; additionally, we discover that, when conditioned on a click, user examination follows an L-pattern unique to carousel interfaces; (ii) click-through-rates conditioned on examination indicate that the well-known examination hypothesis does not hold in carousel interfaces; and (iii) contrary to the assumptions of previous work, users generally ignore carousel headings and focus directly on the content items. Our findings show that many user behavior assumptions, especially concerning examination patterns, do not transfer from web search interfaces to carousel recommendation settings. Our work shows that the field lacks a reliable foundation on which to build models of user behavior with these interfaces. Consequently, a re-evaluation of existing metrics and click models for carousel interfaces may be warranted.
[94] arXiv:2604.21025 [pdf, other]: Title: A Complexity Dichotomy for Generalized Rainbow Matchings Based on Color Classes

Felix Hommelsheim, Pia Jehmlich, Moritz Mühlenthaler

Comments: 16 pages, 7 figures

Subjects: Discrete Mathematics (cs.DM)

Given an edge-colored graph, the Maximum Rainbow Matching problem asks for a maximum-cardinality matching of the graph that contains at most one edge from each color. We provide the following complexity dichotomy for this problem based on the structure of the color classes: Maximum Rainbow Matching admits a polynomial-time algorithm if almost every color class is a complete multipartite graph and it is NP-hard otherwise.
To prove the NP-hardness part of the dichotomy, we first show that the problem remains NP-hard even if every color class is a subgraph on four vertices that is either a matching of size two, a path on four vertices or a paw. We then lift this result to all color classes that are not complete multipartite graphs. For this purpose, we introduce color-closed graph classes, which seem to be an appropriate notion for obtaining complexity classifications for rainbow problems and may be of independent interest. To prove the positive part of the dichotomy, we show that the problem essentially reduces to computing a maximum $(l, u)$-matching, where we heavily exploit that almost all color classes are complete multipartite graphs. In the case where all color classes are complete multipartite, we provide a polynomial-time algorithm that computes a maximum matching containing at most $m_i$ edges from each color class $i$.
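To make the problem statement concrete, here is a minimal brute-force sketch of Maximum Rainbow Matching on a tiny instance. It is not the paper's algorithm: the function name and the example graph are assumptions made for illustration, and the search takes exponential time.

```python
from itertools import combinations

def max_rainbow_matching(edges):
    """Brute-force maximum rainbow matching (illustrative only).

    edges: list of (u, v, color) triples. Exponential time; tiny inputs only.
    """
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            vertices = [x for u, v, _ in subset for x in (u, v)]
            colors = [c for _, _, c in subset]
            # A matching uses each vertex at most once; "rainbow" means
            # each color appears at most once.
            if len(set(vertices)) == len(vertices) and len(set(colors)) == len(colors):
                return list(subset)
    return []

# Hypothetical instance: path a-b-c-d whose two outer edges share a color.
example = [("a", "b", "red"), ("b", "c", "blue"), ("c", "d", "red")]
result = max_rainbow_matching(example)
```

In this instance the ordinary maximum matching {ab, cd} has size two, but both of its edges are red, so the largest rainbow matching has a single edge.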
[95] arXiv:2604.21026 [pdf, html, other]: Title: MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference

Anurita Das

Comments: Code available at this https URL

Subjects: Machine Learning (cs.LG)

Deploying large language models to heterogeneous hardware is often constrained by memory, not compute. We introduce MCAP (Monte Carlo Activation Profiling), a load-time per-layer importance estimator that enables dynamic precision and memory placement decisions on the target device. MCAP produces a lightweight per-layer signal that drives both precision dispatch (W4A8 vs. W4A16) and residency tier (GPU, RAM, SSD), allowing a single set of weights to operate across diverse memory budgets. Our system, NVE, achieves 1.5-1.8x higher decode throughput than this http URL Q4_0 on NVIDIA T4 and enables models to run in memory regimes previously infeasible without modifying weights.
[96] arXiv:2604.21027 [pdf, html, other]: Title: HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

Yuyu Liu, Sarang Rajendra Patil, Mengjia Xu, Tengfei Ma

Comments: Accepted by Findings of ACL 2026

Subjects: Artificial Intelligence (cs.AI)

Electronic health record (EHR) question answering is often handled by LLM-based pipelines that are costly to deploy and do not explicitly leverage the hierarchical structure of clinical data. Motivated by evidence that medical ontologies and patient trajectories exhibit hyperbolic geometry, we propose HypEHR, a compact Lorentzian model that embeds codes, visits, and questions in hyperbolic space and answers queries via geometry-consistent cross-attention with type-specific pointer heads. HypEHR is pretrained with next-visit diagnosis prediction and hierarchy-aware regularization to align representations with the ICD ontology. On two MIMIC-IV-based EHR-QA benchmarks, HypEHR approaches LLM-based methods while using far fewer parameters. Our code is publicly available at this https URL.
[97] arXiv:2604.21028 [pdf, other]: Title: A Deep U-Net Framework for Flood Hazard Mapping Using Hydraulic Simulations of the Wupper Catchment

Christian Lammers, Fernando Arévalo, Leonie Märker-Neuhaus, Daniel Heinenberg, Christian Förster, Karl-Heinz Spies

Comments: 18 Pages, 9 Figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The increasing frequency and severity of global flood events highlight the need for rapid and reliable flood prediction tools. Flood prediction traditionally relies on computationally expensive hydraulic simulations. This research presents a prediction tool by developing a deep-learning-based surrogate model to accurately and efficiently predict the maximum water level across a grid. This was achieved by conducting a series of experiments to optimize a U-Net architecture, patch generation, and data handling for approximating a hydraulic model. This research demonstrates that a deep learning surrogate model can serve as a computationally efficient alternative to traditional hydraulic simulations. The framework was tested using hydraulic simulations of the Wupper catchment in the North Rhine-Westphalia region (Germany), obtaining comparable results.
[98] arXiv:2604.21030 [pdf, html, other]: Title: A Systematic Review and Taxonomy of Reinforcement Learning-Model Predictive Control Integration for Linear Systems

Mohsen Jalaeian Farimani, Roya Khalili Amirabadi, Davoud Nikkhouy, Malihe Abdolbaghi, Mahshad Rastegarmoghaddam, Shima Samadzadeh

Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Robotics (cs.RO); Optimization and Control (math.OC)

The integration of Model Predictive Control (MPC) and Reinforcement Learning (RL) has emerged as a promising paradigm for constrained decision-making and adaptive control. MPC offers structured optimization, explicit constraint handling, and established stability tools, whereas RL provides data-driven adaptation and performance improvement in the presence of uncertainty and model mismatch. Despite the rapid growth of research on RL--MPC integration, the literature remains fragmented, particularly for control architectures built on linear or linearized predictive models. This paper presents a comprehensive Systematic Literature Review (SLR) of RL--MPC integrations for linear and linearized systems, covering peer-reviewed and formally indexed studies published until 2025. The reviewed studies are organized through a multi-dimensional taxonomy covering RL functional roles, RL algorithm classes, MPC formulations, cost-function structures, and application domains. In addition, a cross-dimensional synthesis is conducted to identify recurring design patterns and reported associations among these dimensions within the reviewed corpus. The review highlights methodological trends, commonly adopted integration strategies, and recurring practical challenges, including computational burden, sample efficiency, robustness, and closed-loop guarantees. The resulting synthesis provides a structured reference for researchers and practitioners seeking to design or analyze RL--MPC architectures based on linear or linearized predictive control formulations.
[99] arXiv:2604.21031 [pdf, html, other]: Title: Synthetic Data in Education: Empirical Insights from Traditional Resampling and Deep Generative Models

Tapiwa Amion Chinodakufa, Ashfaq Ali Shafin, Khandaker Mamun Ahmed

Journal-ref: The 40th Annual AAAI Conference on Artificial Intelligence: AI4EDU, 2026

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Synthetic data generation offers promise for addressing data scarcity and privacy concerns in educational technology, yet practitioners lack empirical guidance for selecting between traditional resampling techniques and modern deep learning approaches. This study presents the first systematic benchmark comparing these paradigms using a 10,000-record student performance dataset. We evaluate three resampling methods (SMOTE, Bootstrap, Random Oversampling) against three deep learning models (Autoencoder, Variational Autoencoder, Copula-GAN) across multiple dimensions: distributional fidelity (Kolmogorov-Smirnov distance, Jensen-Shannon divergence), machine learning utility such as Train-on-Synthetic-Test-on-Real scores (TSTR), and privacy preservation (Distance to Closest Record). Our findings reveal a fundamental trade-off: resampling methods achieve near-perfect utility (TSTR: 0.997) but completely fail privacy protection (DCR ~ 0.00), while deep learning models provide strong privacy guarantees (DCR ~ 1.00) at significant utility cost. Variational Autoencoders emerge as the optimal compromise, maintaining 83.3% predictive performance while ensuring complete privacy protection. We also provide actionable recommendations: use traditional resampling for internal development where privacy is controlled, and VAEs for external data sharing where privacy is paramount. This work establishes a foundational benchmark and practical decision framework for synthetic data generation in learning analytics.
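Two of the metrics named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's evaluation code: the function names and toy rows are hypothetical, the Distance to Closest Record is left unnormalized (the abstract does not say how the reported values are scaled), and Euclidean distance is an assumed choice.

```python
import math

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, the Euclidean distance to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

def ks_distance(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    points = sorted(set(xs) | set(ys))

    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)

# Toy data (assumed): three "real" rows with two numeric features each.
real = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
oversampled = [real[0], real[2], real[2]]          # exact copies of real rows
perturbed = [(0.5, 1.0), (2.0, 3.5), (4.0, 6.0)]   # hypothetical generated rows

dcr_copy = distance_to_closest_record(oversampled, real)
dcr_new = distance_to_closest_record(perturbed, real)
```

Rows produced by duplicating real records sit at distance zero from the real data, which is the mechanism behind the privacy failure the abstract reports for resampling methods.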
[100] arXiv:2604.21032 [pdf, html, other]: Title: Unlocking Multi-Spectral Data for Multi-Modal Models with Guided Inputs and Chain-of-Thought Reasoning

Dahun Kim, Ganesh Satish Mallya, Anelia Angelova

Comments: Accepted to IGARSS 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-spectral imagery is a valuable input signal for Remote Sensing applications, such as land-use and land-cover classification and environmental monitoring. However, generalist Large Multi-modal Models (LMMs) are typically trained on RGB images, limiting their applicability to the RGB domain. At the same time, training multi-spectral multi-modal models is expensive and produces uniquely specialized models. To address this, we propose a novel training-free approach that introduces multi-spectral data within the inference pipeline of standard RGB-only LMMs, allowing large gains in performance. Our approach leverages the LMMs' understanding of the visual space by adapting non-RGB inputs to that space and injecting domain-specific information and Chain-of-Thought reasoning as instructions. We demonstrate this with the Gemini 2.5 model and observe strong Zero-Shot performance gains on popular Remote Sensing benchmarks. These results highlight the potential for geospatial professionals to leverage powerful generalist models for specialized sensor inputs, benefiting from rich reasoning capabilities grounded in specialized data.
[101] arXiv:2604.21034 [pdf, other]: Title: White Paper: Human-AI Collaboration in Conflict Analysis: Text Classifier Development with Peacebuilders

Allan Kipyator Kipkemboi Cheboi, Julie Hawke, Hussam Abualfatah, Andrew Sutjahjo, Daniel Burkhardt Cerigo, Rachael Olpengs, William OBrien

Comments: 15 pages, 5 tables

Subjects: Human-Computer Interaction (cs.HC)

This paper documents a collaborative research process involving peacebuilders and data scientists in Kenya and Sudan to develop AI-based text classifiers for monitoring online polarization and hate speech. The method describes a participatory annotation process in which practitioners and domain experts contributed to problem definition, annotation design, iterative validation, and model evaluation. Fine-tuned BERT-based classifiers were trained on collaboratively annotated datasets and evaluated against held-out test sets. In each case, the models produced enhanced contextual alignment, reduced misclassification driven by cultural nuance, and increased practitioner ownership of AI tools. The resulting models (Kenya-polarization and Sudan-hate speech) are open-source and accessible via HuggingFace. The study contributes empirical evidence that participatory AI development can simultaneously improve technical robustness, contextual validity, and normative alignment in sensitive humanitarian domains.
[102] arXiv:2604.21036 [pdf, html, other]: Title: Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models

Marzia Binta Nizam, James Davis

Subjects: Artificial Intelligence (cs.AI)

Text-to-image (T2I) models like Stable Diffusion and DALL-E have made generative AI widely accessible, yet recent studies reveal that these systems often replicate societal biases, particularly in how they depict demographic groups across professions. Prompts such as 'doctor' or 'CEO' frequently yield lighter-skinned outputs, while lower-status roles like 'janitor' show more diversity, reinforcing stereotypes. Existing mitigation methods typically require retraining or curated datasets, making them inaccessible to most users. We propose a lightweight, inference-time framework that mitigates representational bias through prompt-level intervention without modifying the underlying model. Instead of assuming a single definition of fairness, our approach allows users to select among multiple fairness specifications, ranging from simple choices such as a uniform distribution to more complex definitions informed by a large language model (LLM) that cites sources and provides confidence estimates. These distributions guide the construction of demographic-specific prompt variants in the corresponding proportions, and we evaluate alignment by auditing adherence to the declared target and measuring the resulting skin tone distribution rather than assuming uniformity as 'fairness'. Across 36 prompts spanning 30 occupations and 6 non-occupational contexts, our method shifts observed skin-tone outcomes in directions consistent with the declared target, and reduces deviation from targets when the target is defined directly in skin-tone space (fallback). This work demonstrates how fairness interventions can be made transparent, controllable, and usable at inference time, directly empowering users of generative AI.
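The step of building prompt variants "in the corresponding proportions" could look roughly like the sketch below. Everything in it is an assumption for illustration: the abstract does not describe the allocation rule, so largest-remainder rounding is a swapped-in choice, and the group labels, weights, and prompt template are hypothetical placeholders.

```python
def allocate_prompts(target, n_images):
    """Split n_images across groups in proportion to `target` weights.

    Uses largest-remainder rounding (an assumed rule, not taken from the paper).
    """
    total = sum(target.values())
    quotas = {g: n_images * w / total for g, w in target.items()}
    counts = {g: int(q) for g, q in quotas.items()}
    leftover = n_images - sum(counts.values())
    # Hand the remaining slots to the groups with the largest fractional parts.
    by_remainder = sorted(target, key=lambda g: quotas[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts

def build_prompts(base_prompt, counts):
    """Repeat a hypothetical prompt template once per allocated image."""
    return [f"{base_prompt}, {group}" for group, k in counts.items() for _ in range(k)]

# Hypothetical target distribution expressed as integer weights.
target = {"group A": 5, "group B": 3, "group C": 2}
counts = allocate_prompts(target, 7)
prompts = build_prompts("portrait of a doctor", counts)
```

Auditing adherence then amounts to comparing the realized counts, and later the measured outputs, against the declared target rather than against a fixed uniform distribution.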
[103] arXiv:2604.21040 [pdf, html, other]: Title: Online Long-Term Voltage Stability Margin Estimation for IBR/DER Dominated Power System with Integrated VSM-Aware TSO-DSO Framework

Ahmed Alkhonain, Kiran Kumar Challa, Amarsagar Reddy Ramapuram Matavalam, Alok Kumar Bharati, Venkataramana Ajjarapu

Subjects: Systems and Control (eess.SY)

The rapid growth of inverter-based resources (IBRs) and distributed energy resources (DERs) has fundamentally altered the long-term voltage stability characteristics of modern power systems. This article leverages the advantages of machine learning (ML) for the online estimation of long-term voltage stability margin (VSM) and enhancement of VSM through coordinated transmission system operator-distribution system operator (TSO-DSO) optimization. An explicit analytical VSM expression is derived from offline T&D co-simulation data using a physics-informed ML-trained model under probabilistic loading and generation mix scenarios, while accounting for unbalanced distribution modeling. The resulting closed-form VSM representation is linearized and embedded into the TSO optimization problem, enabling real-time enforcement of minimum VSM constraints. We further enhance operational efficiency by incorporating VSM sensitivities into both transmission and distribution optimization, allowing prioritization of the most influential reactive power resources. Simulation studies conducted on the IEEE 30-bus transmission network integrated with multiple IEEE 37-node distribution feeders validate that the proposed framework successfully achieves the desired VSM enhancement while maintaining high estimation accuracy.
[104] arXiv:2604.21041 [pdf, html, other]: Title: Projected Gradient Unlearning for Text-to-Image Diffusion Models: Defending Against Concept Revival Attacks

Aljalila Aladawi, Mohammed Talha Alam, Fakhri Karray

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Machine unlearning for text-to-image diffusion models aims to selectively remove undesirable concepts from pre-trained models without costly retraining. Current unlearning methods share a common weakness: erased concepts return when the model is fine-tuned on downstream data, even when that data is entirely unrelated. We adapt Projected Gradient Unlearning (PGU) from classification to the diffusion domain as a post-hoc hardening step. By constructing a Core Gradient Space (CGS) from the retain concept activations and projecting gradient updates into its orthogonal complement, PGU ensures that subsequent fine-tuning cannot undo the achieved erasure. Applied on top of existing methods (ESD, UCE, Receler), the approach eliminates revival for style concepts and substantially delays it for object concepts, running in roughly 6 minutes versus the ~2 hours required by Meta-Unlearning. PGU and Meta-Unlearning turn out to be complementary: which performs better depends on how the concept is encoded, and retain concept selection should follow visual feature similarity rather than semantic grouping.
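The core projection step, removing from a gradient every component that lies in a protected subspace, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract says the Core Gradient Space is built from retain-concept activations, whereas the two direction vectors below are hypothetical stand-ins, and Gram-Schmidt is an assumed way to obtain an orthonormal basis.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project_out(grad, basis):
    """Project `grad` onto the orthogonal complement of span(basis)."""
    g = list(grad)
    for b in basis:
        coeff = dot(g, b)
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g

# Hypothetical stand-ins for directions that span the protected subspace.
retain_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(retain_directions)
projected = project_out([3.0, -2.0, 5.0], basis)
```

After projection the update has zero inner product with every protected direction, so a gradient step along it leaves those directions untouched to first order.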
[105] arXiv:2604.21042 [pdf, html, other]: Title: Interpretable Quantile Regression by Optimal Decision Trees

Valentin Lemaire, Gaël Aglin, Siegfried Nijssen

Subjects: Machine Learning (cs.LG)

The field of machine learning is subject to an increasing interest in models that are not only accurate but also interpretable and robust, thus allowing their end users to understand and trust AI systems. This paper presents a novel method for learning a set of optimal quantile regression trees. The advantages of this method are that (1) it provides predictions about the complete conditional distribution of a target variable without prior assumptions on this distribution; (2) it provides predictions that are interpretable; (3) it learns a set of optimal quantile regression trees without compromising algorithmic efficiency compared to learning a single tree.
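As background for quantile regression trees, the sketch below shows the standard pinball (quantile) loss and the constant prediction that minimizes it within a single leaf. It is a generic illustration, not the paper's optimal-tree search; the function names and the toy targets are assumptions.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss of a constant prediction `y_pred` at quantile level q."""
    total = 0.0
    for y in y_true:
        diff = y - y_pred
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

def best_constant(y_true, q):
    """Best constant for a leaf: the loss is piecewise linear and convex in the
    prediction, so some observed value attains the minimum."""
    return min(sorted(set(y_true)), key=lambda c: pinball_loss(y_true, c, q))

# Hypothetical targets falling into one leaf, including an outlier.
leaf_targets = [1.0, 2.0, 3.0, 4.0, 100.0]
median_pred = best_constant(leaf_targets, 0.5)
upper_pred = best_constant(leaf_targets, 0.9)
```

Fitting one tree per quantile level and reading off the leaf constants gives a distribution-free picture of the conditional distribution, which is the kind of output the abstract describes.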
[106] arXiv:2604.21043 [pdf, html, other]: Title: Strategic Polysemy in AI Discourse: A Philosophical Analysis of Language, Hype, and Power

Travis LaCroix, Fintan Mallory, Sasha Luccioni

Comments: Accepted in the Ninth Annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2026

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper examines the strategic use of language in contemporary artificial intelligence (AI) discourse, focusing on the widespread adoption of metaphorical or colloquial terms like "hallucination", "chain-of-thought", "introspection", "language model", "alignment", and "agent". We argue that many such terms exhibit strategic polysemy: they sustain multiple interpretations simultaneously, combining narrow technical definitions with broader anthropomorphic or common-sense associations. In contemporary AI research and deployment contexts, this semantic flexibility produces significant institutional and discursive effects, shaping how AI systems are understood by researchers, policymakers, funders, and the public. To analyse this phenomenon, we introduce the concept of glosslighting: the practice of using technically redefined terms to evoke intuitive -- often anthropomorphic or misleading -- associations while preserving plausible deniability through restricted technical definitions. Glosslighting enables actors to benefit from the persuasive force of familiar language while maintaining the ability to retreat to narrower definitions when challenged. We argue that this practice contributes to AI hype cycles, facilitates the mobilisation of investment and institutional support, and influences public and policy perceptions of AI systems, while often deflecting epistemic and ethical scrutiny. By examining the linguistic dynamics of glosslighting and strategic polysemy, the paper highlights how language itself functions as a sociotechnical mechanism shaping the development and governance of AI.
[107] arXiv:2604.21044 [pdf, other]: Title: Active Data

Richard Arthur, Virginia DiDomizio, Louis Hoebel

Comments: 9 pages, 7 figures, 2 tables

Journal-ref: In: Meersman, R., Tari, Z., Schmidt, D.C. (eds) OTM 2003. Lecture Notes in Computer Science, vol 2888. Springer, Berlin, Heidelberg

Subjects: Artificial Intelligence (cs.AI)

In some complex domains, certain problem-specific decompositions can provide advantages over monolithic designs by enabling comprehension and specification of the design. In this paper we present an intuitive and tractable approach to reasoning over large and complex data sets. Our approach is based on Active Data, i.e., data as atomic objects that actively interact with environments. We describe our intuition about how this bottom-up approach improves designs confronting computational and conceptual complexity. We describe an implementation of the base Active Data concepts within the air traffic flow management domain and discuss performance for this implementation.
[108] arXiv:2604.21045 [pdf, html, other]: Title: Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

Siqi Ouyang, Shuoyang Ding, Oleksii Hrinchuk, Vitaly Lavrukhin, Brian Yan, Boris Ginsburg, Lei Li

Comments: ACL 2026 Oral

Subjects: Computation and Language (cs.CL)

Simultaneous speech translation (SST) generates translations while receiving partial speech input. Recent advances show that large language models (LLMs) can substantially improve SST quality, but at the cost of high computational overhead. To reduce this cost, prior work reformulates SST as a multi-turn dialogue task, enabling full reuse of the LLM's key-value (KV) cache and eliminating redundant feature recomputation. However, this approach relies on supervised fine-tuning (SFT) data in dialogue form, for which few human annotations exist, and existing synthesis methods cannot guarantee data quality. In this work, we propose a Hierarchical Policy Optimization (HPO) approach that post-trains models trained on imperfect SFT data. We introduce a hierarchical reward that balances translation quality and latency objectives. Experiments on English to Chinese/German/Japanese demonstrate improvements of over +7 COMET score and +1.25 MetricX score at a latency of 1.5 seconds. Comprehensive ablation studies further validate the effectiveness of different quality rewards, hierarchical reward formulations, and segmentation strategies. Code can be found here: this https URL
[109] arXiv:2604.21046 [pdf, html, other]: Title: JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning

Ali Aghababaei-Harandi, Aude Sportisse, Massih-Reza Amini

Subjects: Machine Learning (cs.LG)

Semi-supervised learning has emerged as a powerful paradigm for leveraging large amounts of unlabeled data to improve the performance of machine learning models when labeled data are scarce. Among existing approaches, methods derived from FixMatch have achieved state-of-the-art results in image classification by combining weak and strong data augmentations with confidence-based pseudo-labeling. Despite their strong empirical performance, these methods typically struggle with two critical bottlenecks. First, majority classes tend to dominate the learning process, an effect amplified by incorrect pseudo-labels, leading to biased models. Second, noisy early pseudo-labels prevent the model from forming clear decision boundaries, requiring prolonged training to learn informative representations. In this paper, we introduce a paradigm shift from conventional thresholding of output logits toward an explicit shaping of geometric representations. Our approach is inspired by the recently proposed Latent-Euclidean Joint-Embedding Predictive Architectures (LeJEPA), a theoretically grounded framework asserting that meaningful representations should exhibit an isotropic Gaussian structure in latent space. Building on this principle, we propose a new training objective that combines the classical semi-supervised loss used in FlexMatch, an adaptive extension of FixMatch, with a latent-space regularization term derived from LeJEPA.
Our proposed approach encourages well-structured representations while preserving the advantages of pseudo-labeling strategies. Through extensive experiments on CIFAR-100, STL-10 and Tiny-ImageNet, we demonstrate that the proposed method consistently outperforms existing baselines. In addition, our method significantly accelerates convergence, drastically reducing the overall computational cost compared to standard FixMatch-based pipelines.
[110] arXiv:2604.21051 [pdf, html, other]: Title: Residual Risk Analysis in Benign Code: How Far Are We? A Multi-Model Semantic and Structural Similarity Approach

Mohammad Farhad, Shuvalaxmi Dass

Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

Software security relies on effective vulnerability detection and patching, yet determining whether a patch fully eliminates risk remains an underexplored challenge. Existing vulnerability benchmarks often treat patched functions as inherently benign, overlooking the possibility of residual security risks. In this work, we analyze vulnerable-benign function pairs from PrimeVul, a benchmark dataset, using multiple code language models (Code LMs) to capture semantic similarity, complemented by Tree-sitter-based abstract syntax tree (AST) analysis for structural similarity. Building on these signals, we propose Residual Risk Scoring (RRS), a unified framework that integrates embedding-based semantic similarity, localized AST-based structural similarity, and cross-model agreement to estimate residual risk in code. Our analysis shows that benign functions often remain highly similar to their vulnerable counterparts both semantically and structurally, indicating potential persistence of residual risk. We further find that approximately $61\%$ of high-RRS code pairs exhibit $13$ distinct categories of residual issues (e.g., null pointer dereferences, unsafe memory allocation), validated using state-of-the-art static analysis tools including Cppcheck, Clang-Tidy, and Facebook-Infer. These results demonstrate that code-level similarity provides a practical signal for prioritizing post-patch inspection, enabling more reliable and scalable security assessment in real-world open-source software pipelines.
[111] arXiv:2604.21052 [pdf, html, other]: Title: StyleVAR: Controllable Image Style Transfer via Visual Autoregressive Modeling

Liqi Jing, Dingming Zhang, Peinian Li, Lichen Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We build on the Visual Autoregressive Modeling (VAR) framework and formulate style transfer as conditional discrete sequence modeling in a learned latent space. Images are decomposed into multi-scale representations and tokenized into discrete codes by a VQ-VAE; a transformer then autoregressively models the distribution of target tokens conditioned on style and content tokens. To inject style and content information, we introduce a blended cross-attention mechanism in which the evolving target representation attends to its own history, while style and content features act as queries that decide which aspects of this history to emphasize. A scale-dependent blending coefficient controls the relative influence of style and content at each stage, encouraging the synthesized representation to align with both the content structure and the style texture without breaking the autoregressive continuity of VAR. We train StyleVAR in two stages from a pretrained VAR checkpoint: supervised fine-tuning on a large triplet dataset of content--style--target images, followed by reinforcement fine-tuning with Group Relative Policy Optimization (GRPO) against a DreamSim-based perceptual reward, with per-action normalization weighting to rebalance credit across VAR's multi-scale hierarchy. Across three benchmarks spanning in-, near-, and out-of-distribution regimes, StyleVAR consistently outperforms an AdaIN baseline on Style Loss, Content Loss, LPIPS, SSIM, DreamSim, and CLIP similarity, and the GRPO stage yields further gains over the SFT checkpoint, most notably on the reward-aligned perceptual metrics. Qualitatively, the method transfers texture while maintaining semantic structure, especially for landscapes and architectural scenes, while a generalization gap on internet images and difficulty with human faces highlight the need for better content diversity and stronger structural priors.
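The group-relative part of GRPO can be illustrated with a short sketch: rewards for several samples drawn for the same input are normalized against that group's own mean and standard deviation. This shows only the commonly used advantage formula; the paper's per-action normalization weighting across VAR scales is not specified in the abstract and is not reproduced here. The function name, the use of the population standard deviation, and the reward values are assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical perceptual rewards for four samples generated from one input.
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
```

Samples that score above their group's average receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.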
[112] arXiv:2604.21053 [pdf, html, other]: Title: Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains

Fatemeh Ziaeetabar

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Robotic systems operating in human environments must reason about how object interactions evolve over time, which actions are currently being performed, and what manipulation step is likely to follow. Classical enriched Semantic Event Chains (eSECs) provide an interpretable relational description of manipulation, but remain primarily descriptive and do not directly support uncertainty-aware decision making. In this paper, we propose eSEC-LAM, a neuro-symbolic framework that transforms eSECs into an explicit event-level symbolic state for manipulation understanding. The proposed formulation augments classical eSECs with confidence-aware predicates, functional object roles, affordance priors, primitive-level abstraction, and saliency-guided explanation cues. These enriched symbolic states are derived from a foundation-model-based perception front-end through deterministic predicate extraction, while current-action inference and next-primitive prediction are performed using lightweight symbolic reasoning over primitive pre- and post-conditions. We evaluate the proposed framework on EPIC-KITCHENS-100, EPIC-KITCHENS VISOR, and Assembly101 across action recognition, next-primitive prediction, robustness to perception noise, and explanation consistency. Experimental results show that eSEC-LAM achieves competitive action recognition, substantially improves next-primitive prediction, remains more robust under degraded perceptual conditions than both classical symbolic and end-to-end video baselines, and provides temporally consistent explanation traces grounded in explicit relational evidence. These findings demonstrate that enriched Semantic Event Chains can serve not only as interpretable descriptors of manipulation, but also as effective internal states for neuro-symbolic action reasoning.
[113] arXiv:2604.21055 [pdf, other]: Title: Layer 2 Blockchains Simplified: A Survey of Vector Commitment Schemes, ZKP Frameworks, Layer-2 Data Structures and Verkle Trees

Ekleen Kaur, Marko Suvajdzic

Comments: Next work: Performance improvements in Verkle Trees and the first novel architecture with practical implementation on Fractional Verkle Trees is under review at ACM MICRO 2026, this was presented at EthCC Cannes, France this year. Also, this survey paper was accepted at ICECET, Rome, Italy, and Discover Networks Journal, Springer Nature

Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Layer-2 (L2) protocols address the fundamental limitations of Layer-1 (L1) blockchains by offloading computation while anchoring trust to the parent chain. This architectural shift, while boosting throughput, introduces a new, complex security surface defined by off-chain components like sequencers, bridges, and data availability mechanisms. Prior literature [31][33] offers fragmented views of this risk. This paper presents the first unified, security-focused survey that rigorously maps L2 architecture to its underlying cryptographic security. We dissect the technical progression from L1 primitives to the core of modern L2s, analyzing the security assumptions (Discrete Logarithm, Computational Diffie-Hellman, Bilinear Diffie-Hellman) of ZK frameworks (Groth16, Plonk) and their corresponding commitment schemes (KZG, IPA). We formalize a comprehensive L2 threat model encompassing sequencer liveness, bridge exploits, and data-availability failures. This work serves as an accessible yet rigorous reference for researchers and developers to reason about L2 security from a deep crypto-mathematical perspective.
[114] arXiv:2604.21057 [pdf, html, other]: Title: TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping

Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, John D. Kelleher

Subjects: Computation and Language (cs.CL)

The field of Language Reasoning Models (LRMs) has been very active over the past few years, with advances in training and inference techniques enabling LRMs to reason longer and more accurately. However, a growing body of studies shows that LRMs are still inefficient, over-generating verification and reflection steps. Additionally, the high-level role of each reasoning step, and how different step types contribute to the generation of correct answers, is largely underexplored. To address this challenge, we introduce TRACES (Tagging of the Reasoning steps enabling Adaptive Cost-Efficient early-Stopping), a lightweight framework that tags reasoning steps in real time and enables adaptive, cost-efficient early stopping of large-language-model inference. Building on this framework, we monitor reasoning behaviors during inference and find that LRMs tend to shift their reasoning behavior after reaching a correct answer. We demonstrate that monitoring specific types of steps can produce effective, interpretable early-stopping criteria. We evaluate the TRACES framework on three mathematical reasoning benchmarks (MATH500, GSM8K, and AIME) and two knowledge and reasoning benchmarks (MMLU and GPQA). We achieve 20 to 50% token reduction while maintaining comparable accuracy to standard generation.
[115] arXiv:2604.21058 [pdf, other]: Title: Data-Driven Surrogate Models for Agromaritime Applications: Finite Element-Neural Network Integration

Muhammad Ilyas

Comments: 11 pages, 7 figures, published in the Proceedings of International Conference on Data Science, Mathematics and Informatics (ICoDMI 2025), IPB Bogor, Indonesia

Subjects: Numerical Analysis (math.NA)

Predicting nutrient transport and salinity distribution is crucial for mitigating climate-related threats to agromaritime systems. Traditional PDE-based models can capture the physics of nutrient dispersion, salinity and water quality. However, they face challenges in scalability and adaptability to real-time problems. In this article, we develop a hybrid approach that combines finite element discretisations with neural network integration to enable efficient and adaptive data-informed predictions. We use a finite element solver for the steady-state diffusion-reaction equation to generate a dataset across varying diffusivity, reaction and inflow conditions. We then build a proper orthogonal decomposition (POD), which reduces dimensionality, and a neural network (NN) that maps parameters to reduced coefficients. A numerical study presented on a simplified model demonstrates the proof-of-concept for predicting nutrient transport and salinity distribution. Numerical experiments show that the NN surrogate achieves a speed-up of approximately 956x compared to a regular FEM solver while maintaining a mean relative L2 error of 15% across the test set, with occasional higher deviations, which is sufficient for rapid scenario screening and parametric studies. These results highlight the method's potential as a fast and accurate surrogate for nutrient and salinity prediction, offering a balance between FEM reliability and NN adaptability for sustainable agromaritime management.
[116] arXiv:2604.21060 [pdf, html, other]: Title: Clinically-Informed Modeling for Pediatric Brain Tumor Classification from Whole-Slide Histopathology Images

Joakim Nguyen, Jian Yu, Jinrui Fang, Nicholas Konz, Tianlong Chen, Sanjay Krishnan, Chandra Krishnan, Ying Ding, Hairong Wang, Ankita Shukla

Comments: Accepted at the IEEE International Conference on Healthcare Informatics (ICHI), 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate diagnosis of pediatric brain tumors, starting with histopathology, presents unique challenges for deep learning, including severe data scarcity, class imbalance, and fine-grained morphologic overlap across diagnostically distinct subtypes. While pathology foundation models have advanced patch-level representation learning, their effective adaptation to weakly supervised pediatric brain tumor classification under limited data remains underexplored. In this work, we introduce an expert-guided contrastive fine-tuning framework for pediatric brain tumor diagnosis from whole-slide images (WSI). Our approach integrates contrastive learning into slide-level multiple instance learning (MIL) to explicitly regularize the geometry of slide-level representations during downstream fine-tuning. We propose both a general supervised contrastive setting and an expert-guided variant that incorporates clinically informed hard negatives targeting diagnostically confusable subtypes. Through comprehensive experiments on pediatric brain tumor WSI classification under realistic low-sample and class-imbalanced conditions, we demonstrate that contrastive fine-tuning yields measurable improvements in fine-grained diagnostic distinctions. Our experimental analyses reveal complementary strengths across different contrastive strategies, with expert-guided hard negatives promoting more compact intra-class representations and improved inter-class separation. This work highlights the importance of explicitly shaping slide-level representations for robust fine-grained classification in data-scarce pediatric pathology settings.
[117] arXiv:2604.21061 [pdf, html, other]: Title: InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language

Nicklas Neu, Thomas Ebner, Jasmin Primus, Raphael Zefferer, Bernhard Schenkenfelder, Mathias Brunbauer, Florian Kromp

Comments: 15 pages, 2 figures

Subjects: Artificial Intelligence (cs.AI)

The application of artificial intelligence (AI) in IVF has shown promise in improving consistency and standardization of decisions, but often relies on annotated data and does not make use of the multimodal nature of IVF data. We investigated whether foundational vision-language models can be fine-tuned to predict natural language descriptions of embryo morphology and development. Using a publicly available embryo time-lapse dataset, we fine-tuned PaliGemma-2, a multi-modal vision-language model, with only 1,000 images and corresponding captions, describing embryo morphology, embryonic cell cycle and developmental stage. Our results show that the fine-tuned model, InVitroVision, outperformed a commercial model, ChatGPT 5.2, and base models in overall metrics, with performance improving with larger training datasets. This study demonstrates the potential of foundational vision-language models to generalize to IVF tasks with limited data, enabling the prediction of natural language descriptions of embryo morphology and development. This approach may facilitate the use of large language models to retrieve information and scientific evidence from relevant publications and guidelines, and has implications for few-shot adaptation to multiple downstream tasks in IVF.
[118] arXiv:2604.21063 [pdf, other]: Title: Automated Extraction of Pharmacokinetic Parameters from Structured XML Scientific Articles: Enhancing Data Accessibility at Scale

Remya Ampadi Ramachandran, Lisa A. Tell, Sidharth Rai, Nuwan Millagaha Gedara, Hossein Sholehrasa, Jim E. Riviere, Majid Jaberi-Douraki

Comments: 43 pages, 3 tables, 5 figures, includes Supplementary Materials

Subjects: Information Retrieval (cs.IR)

In the field of pharmacology, there is a notable absence of centralized, comprehensive, and up-to-date repositories of PK data. This poses a significant challenge for R&D as it can be a time-consuming and challenging task to collect all the required quantitative PK parameters from diverse scientific publications. This quantitative PK information is predominantly organized in tabular format, mostly available as XML, HTML, or PDF files within various online repositories and scientific publications, including supplementary materials. This makes tables one of the crucial components and information elements of scientific or regulatory documents as they are commonly utilized to present quantitative information. Extracting data from tables is typically a labor-intensive process, and alternative automated machine learning models may struggle to accurately detect and extract the relevant data due to the complex nature and diverse layouts of tabular data. The difficulty of information extraction and reading order detection is largely dependent on the structural complexity of the tables. Efforts to understand tables should prioritize capturing the content of table cells in a manner that aligns with how a human reader naturally comprehends the information. FARAD has been manually extracting tabular data and other information from literature and regulatory agencies for over 40 years. However, there is now an urgent need to automate this process due to the large volume of publications released daily. The accuracy of this task has become increasingly challenging, as manual extraction is tedious and prone to errors, especially given the staffing shortages we are currently facing. This necessitates the development of AI algorithms for table detection and extraction that are able to precisely handle cells organized according to the table structure, as indicated by column and/or row header information.
[119] arXiv:2604.21065 [pdf, other]: Title: On the dynamic behavior of the network SIRS epidemic model

Giulia Gatti, Giacomo Como

Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)

We study the Susceptible-Infected-Recovered-Susceptible (SIRS) epidemic model on deterministic networks. For connected but otherwise general interaction patterns and heterogeneous recovery and loss-of-immunity rates, we identify a fundamental parameter R_0 (the basic reproduction number), which fully characterizes the qualitative dynamic behavior of the system. This parameter is the dominant eigenvalue of a rescaled version of the interaction matrix, whose rows are normalized by the corresponding recovery rates. We prove that a transcritical bifurcation occurs as R_0 crosses the threshold value 1. Specifically, we show that, if R_0 does not exceed 1, then the disease-free equilibrium is globally asymptotically stable, whereas, if R_0 is larger than 1, then the disease-free equilibrium is unstable and there exists a unique endemic equilibrium, which is asymptotically stable. As a byproduct of our analysis, we also identify key monotonicity properties of the dependence of the endemic equilibrium on the model parameters (the interaction matrix as well as the recovery rates and the loss-of-immunity rates) and obtain a distributed iterative algorithm for its computation, with provable convergence guarantees. Our results extend existing ones available in the literature for network SIRS epidemic models with rank-one interaction matrices and homogeneous recovery rates (including the single homogeneous population SIRS epidemic model).
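As a rough illustration of the threshold in the abstract above, the sketch below computes R_0 as the dominant eigenvalue of the interaction matrix after dividing each row by that node's recovery rate. The function name, the example matrix, the reading of "normalized" as row-wise division, and the use of power iteration with an identity shift are all assumptions made for illustration; this is not the authors' distributed algorithm, and convergence is only expected for connected (irreducible) nonnegative interaction matrices.

```python
def basic_reproduction_number(interaction, recovery, max_iters=10000, tol=1e-12):
    """Dominant eigenvalue of the row-rescaled interaction matrix (pure Python)."""
    n = len(interaction)
    # Assumption: "rows normalized by recovery rates" means row i is divided by recovery[i].
    scaled = [[interaction[i][j] / recovery[i] for j in range(n)] for i in range(n)]
    # Adding the identity shifts every eigenvalue by 1 and makes the nonnegative
    # matrix aperiodic, so power iteration also converges on bipartite networks.
    shifted = [[scaled[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(max_iters):
        w = [sum(shifted[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_eigenvalue = max(w)  # v always has maximum entry 1
        v = [x / new_eigenvalue for x in w]
        converged = abs(new_eigenvalue - eigenvalue) < tol
        eigenvalue = new_eigenvalue
        if converged:
            break
    return eigenvalue - 1.0  # undo the identity shift


# Hypothetical two-node network with heterogeneous recovery rates.
r0 = basic_reproduction_number([[0.0, 2.0], [2.0, 0.0]], [1.0, 4.0])
regime = "disease-free equilibrium stable" if r0 <= 1.0 else "endemic equilibrium exists"
```

In this hypothetical example the rescaled matrix has eigenvalues of equal magnitude and opposite sign, which is the case the identity shift is meant to handle; the computed value sits at the threshold R_0 = 1.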
[120] arXiv:2604.21066 [pdf, html, other]: Title: Optimizing Diffusion Priors with a Single Observation

Frederic Wang, Katherine L. Bouman

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Methodology (stat.ME)

While diffusion priors generate high-quality posterior samples across many inverse problems, they are often trained on limited training sets or purely simulated data, thus inheriting the errors and biases of these underlying sources. Current approaches to finetuning diffusion models rely on a large number of observations with varying forward operators, which can be difficult to collect for many applications, and thus lead to overfitting when the measurement set is small. We propose a method for tuning a prior from only a single observation by combining existing diffusion priors into a single product-of-experts prior and identifying the exponents that maximize the Bayesian evidence. We validate our method on real-world inverse problems, including black hole imaging, where the true prior is unknown a priori, and image deblurring with text-conditioned priors. We find that the evidence is often maximized by priors that extend beyond those trained on a single dataset. By generalizing the prior through exponent weighting, our approach enables posterior sampling from both tempered and combined diffusion models, yielding more flexible priors that improve the trustworthiness of the resulting posterior image distribution.
[121] arXiv:2604.21070 [pdf, html, other]: Title: DWTSumm: Discrete Wavelet Transform for Document Summarization

Rana Salama, Abdou Youssef, Mona Diab

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Summarizing long, domain-specific documents with large language models (LLMs) remains challenging due to context limitations, information loss, and hallucinations, particularly in clinical and legal settings. We propose a Discrete Wavelet Transform (DWT)-based multi-resolution framework that treats text as a semantic signal and decomposes it into global (approximation) and local (detail) components. Applied to sentence- or word-level embeddings, DWT yields compact representations that preserve overall structure and critical domain-specific details, which are used directly as summaries or to guide LLM generation. Experiments on clinical and legal benchmarks demonstrate comparable ROUGE-L scores. Compared to a GPT-4o baseline, the DWT-based summarization consistently improves semantic similarity and grounding, achieving gains of over 2% in BERTScore, more than 4% in Semantic Fidelity, factual consistency in legal tasks, and large METEOR improvements indicative of preserved domain-specific semantics. Across multiple embedding models, Fidelity reaches up to 97%, suggesting that DWT acts as a semantic denoising mechanism that reduces hallucinations and strengthens factual grounding. Overall, DWT provides a lightweight, generalizable method for reliable long-document and domain-specific summarization with LLMs.
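To make the approximation/detail split concrete, here is a minimal sketch of a single-level DWT applied to one embedding dimension tracked across consecutive sentences. The abstract does not say which wavelet is used, so the Haar wavelet, the function names, and the toy signal are assumptions for illustration only.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT on an even-length list of floats."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Approximation: scaled pairwise sums (global trend).
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Detail: scaled pairwise differences (local variation).
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, recovering the original signal."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Hypothetical values of one embedding dimension over four sentences.
values = [1.0, 3.0, 5.0, 7.0]
approx, detail = haar_dwt(values)
restored = haar_idwt(approx, detail)
```

The approximation coefficients have half the length of the input and carry the coarse trend, while the detail coefficients carry what is needed to restore the original exactly. How the paper selects or thresholds coefficients to build a summary is not stated in the abstract and is not modelled here.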
[122] arXiv:2604.21072 [pdf, other]: Title: Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization

Jiu Chen, Shuangyan Yang, Xu Xiong, Hexiao Duan, Xinran Zhang, Jie Ren, Dong Li

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Decentralized LLM inference distributes computation among heterogeneous nodes across the internet, offering a performant and cost-efficient alternative to traditional centralized inference. However, the low cross-node network bandwidth makes communication the primary bottleneck. In this paper, we introduce BloomBee, an internet-scale distributed LLM inference framework. BloomBee integrates LLM-layer assignment, micro-batching and tensor offloading to optimize communication from multiple dimensions. Additionally, BloomBee formulates the coordination of these techniques as an optimization problem and solves it using dynamic programming. BloomBee also customizes lossless compression and speculative decoding according to low-bandwidth network settings to reduce communication overhead. We evaluate BloomBee across a spectrum of network environments and show that it improves service throughput by up to 1.76x. It also reduces average latency by up to 43.20% compared to state-of-the-art decentralized LLM inference systems. BloomBee is open-sourced.
[123] arXiv:2604.21074 [pdf, html, other]: Title: Old and new Schrödinger eigenvalue localisation

Carsten Carstensen, Tim Stiebert

Subjects: Numerical Analysis (math.NA)

Unconditional guaranteed lower and upper eigenvalue bounds are mandatory for the understanding of the Schrödinger eigenvalue spectrum and its spectral gaps. While upper eigenvalue bounds are naturally induced by conforming discretisations, guaranteed lower eigenvalue bounds (GLB) are less immediate. This paper clarifies the adaptation of nonconforming GLB from the harmonic eigenvalue problem and discusses their comparison for general and piecewise constant potentials. A fine-tuned extra-stabilised scheme is proposed and found superior in numerical comparisons. This new direct calculation of GLB is compatible with adaptive mesh-refinement and successfully circumvents the appearance of maximal mesh-size parameters in former GLB based on post-processing. Computational benchmarks also investigate guaranteed upper eigenvalue bounds (GUB) for two-sided eigenvalue control by conforming test functions associated to the underlying nonconforming computations. A numerical comparison with GUB from additional lowest-order conforming finite element schemes shows competitive accuracy with less computational cost.
[124] arXiv:2604.21076 [pdf, html, other]: Title: Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation

Sanjoy Pator

Comments: 14 pages, 7 figures, independent research

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Medication reconciliation at clinical handoffs is a high-stakes, error-prone process. Large language models are increasingly proposed to assist with this task using FHIR-structured patient records, but a fundamental and largely unstudied variable is how the FHIR data is serialised before being passed to the model. We present the first systematic comparison of four FHIR serialisation strategies (Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline) across five open-weight models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B) on a controlled benchmark of 200 synthetic patients, totalling 4,000 inference runs. We find that serialisation strategy has a large, statistically significant effect on performance for models up to 8B parameters: Clinical Narrative outperforms Raw JSON by up to 19 F1 points for Mistral-7B (r = 0.617, p < 10^{-10}). This advantage reverses at 70B, where Raw JSON achieves the best mean F1 of 0.9956. In all 20 model and strategy combinations, mean precision exceeds mean recall: omission is the dominant failure mode, with models more often missing an active medication than fabricating one, which changes how clinical safety auditing priorities should be set. Smaller models plateau at roughly 7-10 concurrent active medications, leaving polypharmacy patients, the patients most at risk from reconciliation errors, systematically underserved. BioMistral-7B, a domain-pretrained model without instruction tuning, produces zero usable output in all conditions, showing that domain pretraining alone is not sufficient for structured extraction. These results offer practical, evidence-based format recommendations for clinical LLM deployment: Clinical Narrative for models up to 8B, Raw JSON for 70B and above. The complete pipeline is reproducible on open-source tools running on an AWS this http URL instance (NVIDIA L40S, 48 GB VRAM).
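The precision/recall asymmetry reported above can be illustrated with a small scoring sketch. The abstract does not describe how predictions are matched to the reference list, so exact string matching over sets, the function name, and the example medications are assumptions for illustration.

```python
def reconciliation_scores(predicted, reference):
    """Set-based precision, recall, and F1 for a predicted active-medication list."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical patient: the model omits one active medication and fabricates none.
reference = ["metformin", "lisinopril", "atorvastatin", "warfarin"]
predicted = ["metformin", "lisinopril", "atorvastatin"]
precision, recall, f1 = reconciliation_scores(predicted, reference)
```

An omission lowers recall while leaving precision untouched, which is the pattern the abstract describes as the dominant failure mode.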
[125] arXiv:2604.21078 [pdf, html, other]: Title: Impact-Aware Model Predictive Control for UAV Landing on a Heaving Platform

Jess Stephenson, Melissa Greeff

Comments: To be published in the proceedings of International Federation of Automatic Control (IFAC) World Congress 2026

Subjects: Robotics (cs.RO)

Landing UAVs on heaving marine platforms is challenging because relative vertical motion can generate large impact forces and cause rebound on touchdown. To address this, we develop an impact-aware Model Predictive Control (MPC) framework that models landing as a velocity-level rigid-body impact governed by Newton's restitution law. We embed this as a linear complementarity problem (LCP) within the MPC dynamics to predict the discontinuous post-impact velocity and suppress rebound. In simulation, restitution-aware prediction reduces pre-impact relative velocity and improves landing robustness. Experiments on a heaving-deck testbed show an 86.2% reduction in post-impact deflection compared to a tracking MPC.
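For readers unfamiliar with Newton's restitution law, the sketch below applies it to a single vertical touchdown. It covers only the velocity jump, not the LCP or the MPC formulation from the paper; the function name, the sign convention (upward positive), and the assumption that the deck velocity is unaffected by the impact are illustrative choices, not taken from the source.

```python
def post_impact_velocity(v_uav, v_deck, restitution):
    """Vertical UAV velocity just after touchdown (upward positive).

    Newton's restitution law: the relative normal velocity after impact equals
    -restitution times the relative normal velocity before impact.
    Assumption: the deck is heavy enough that its velocity does not change.
    """
    if not 0.0 <= restitution <= 1.0:
        raise ValueError("restitution must lie in [0, 1]")
    v_rel = v_uav - v_deck
    if v_rel >= 0.0:
        return v_uav  # UAV and deck are separating: no impact occurs
    return v_deck - restitution * v_rel


# Hypothetical case: UAV descending at 1.0 m/s onto a deck rising at 0.5 m/s.
v_after = post_impact_velocity(-1.0, 0.5, 0.3)
rebound_speed = v_after - 0.5  # relative speed at which the UAV leaves the deck
```

Because the rebound speed scales with the pre-impact relative velocity, reducing that relative velocity before contact directly reduces rebound.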
[126] arXiv:2604.21079 [pdf, html, other]: Title: Foveated Reasoning: Stateful, Action-based Visual Focusing for Vision-Language Models

Juhong Min, Lazar Valkov, Vitali Petsiuk, Hossein Souri, Deen Dayal Mohan

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-language models benefit from high-resolution images, but the increase in visual-token count incurs high compute overhead. Humans resolve this tension via foveation: a coarse view guides "where to look", while selectively acquired high-acuity evidence refines "what to think". We introduce Foveated Reasoner, an autoregressive vision-language framework that unifies foveation and reasoning within a single decoding trajectory. Starting from a low-resolution view, the model triggers foveation only when needed, retrieves high-resolution evidence from selected regions, and injects it back into the same decoding trajectory. We train the method with a two-stage pipeline: coldstart supervision to bootstrap foveation behavior, followed by reinforcement learning to jointly improve evidence acquisition and task accuracy while discouraging trivial "see-everything" solutions. Experiments show that the method learns effective foveation policies and achieves stronger accuracy under tight visual-token budgets across multiple vision-language benchmarks.
[127] arXiv:2604.21082 [pdf, html, other]: Title: Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting

Alexander Weers, Daniel Rueckert, Martin J. Menten

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Training vision-language models (VLMs) for medical report generation is often hindered by the scarcity of high-quality annotated data. This work evaluates the use of a weighted loss function to improve data efficiency. Compared to standard cross-entropy loss, which treats all token prediction errors equally, the reweighted loss shifts the focus to semantically salient tokens with outsized clinical importance. In experiments on ophthalmological report generation, we show that this simple method improves efficiency across multiple data scales, achieving similar report quality with up to ten times less training data.
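A minimal sketch of a token-reweighted cross-entropy follows. The abstract does not specify how weights are chosen or normalised, so the weight values, the normalisation by total weight, and the function name are hypothetical.

```python
import math


def weighted_cross_entropy(correct_token_probs, weights):
    """Weighted mean of per-token negative log-likelihoods.

    correct_token_probs[i] is the probability the model assigns to the
    reference token at position i; weights[i] is that token's importance.
    """
    if len(correct_token_probs) != len(weights):
        raise ValueError("probabilities and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0.0:
        raise ValueError("weights must sum to a positive value")
    weighted_nll = sum(-w * math.log(p) for p, w in zip(correct_token_probs, weights))
    return weighted_nll / total_weight


# Hypothetical report fragment: the third token is a clinically salient finding
# that the model predicts poorly.
probs = [0.9, 0.9, 0.5]
uniform_loss = weighted_cross_entropy(probs, [1.0, 1.0, 1.0])
reweighted_loss = weighted_cross_entropy(probs, [1.0, 1.0, 4.0])
```

With uniform weights this reduces to standard cross-entropy; raising the weight on the salient token makes its error dominate the loss.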
[128] arXiv:2604.21083 [pdf, html, other]: Title: Behavioral Consistency and Transparency Analysis on Large Language Model API Gateways

Guanjie Lin, Yinxin Wan, Shichao Pei, Ting Xu, Kuai Xu, Guoliang Xue

Comments: 11 pages. Initially submitted to IMC 2026 Cycle 1 on November 20, 2025; accepted on March 13, 2026. To appear in Proceedings of the 2026 ACM Internet Measurement Conference (IMC '26)

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Software Engineering (cs.SE)

Third-party Large Language Model (LLM) API gateways are rapidly emerging as unified access points to models offered by multiple vendors. However, the internal routing, caching, and billing policies of these gateways are largely undisclosed, leaving users with limited visibility into whether requests are served by the advertised models, whether responses remain faithful to upstream APIs, or whether invoices accurately reflect public pricing policies. To address this gap, we introduce GateScope, a lightweight black-box measurement framework for evaluating behavioral consistency and operational transparency in commercial LLM gateways. GateScope is designed to detect key misbehaviors, including model downgrading or switching, silent truncation, billing inaccuracies, and instability in latency by auditing gateways along four critical dimensions: response content analysis, multi-turn conversation performance, billing accuracy, and latency characteristics. Our measurements across 10 real-world commercial LLM API gateways reveal frequent gaps between expected and actual behaviors, including silent model substitutions, degraded memory retention, deviations from announced pricing, and substantial variation in latency stability across platforms.
[129] arXiv:2604.21084 [pdf, html, other]: Title: Deductive Verification of Weak Memory Programs with View-based Protocols (extended version)

Ömer Şakar, Soham Chakraborty, Marieke Huisman, Anton Wijs

Subjects: Logic in Computer Science (cs.LO)

Concurrent programming under weak memory concurrency faces substantial challenges to ensure correctness due to program behaviors that cannot be explained by thread interleaving, a.k.a. sequential consistency. While several program logics are proposed to reason about weak memory concurrency, their usage has been limited to intricate manual proofs. On the other hand, the VerCors verifier provides a rich toolset for automated deductive verification for sequential consistency.
In this paper, we bridge this gap for automated deductive verification of weak memory concurrent programs with the VerCors deductive verification tool. We propose an approach to encode weak memory concurrency in VerCors. We develop VerCors-relaxed, where we extend the VerCors atomics support and bring concepts from several protocol automata to encode permission-based separation logics for weak memory concurrency models. To demonstrate the effectiveness of our approach, we encode the relaxed fragment of the SLR program logic, a recent state-of-the-art permission-based separation logic for weak memory concurrency, in VerCors-relaxed, our extension of VerCors. We use the SLR encoding in VerCors-relaxed to automatically verify several examples from the literature with realistic performance.
[130] arXiv:2604.21090 [pdf, html, other]: Title: Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework

Christo Zietsman

Comments: 8 pages. Experiment, corpus, and evaluation framework publicly available at this https URL

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

AI governance programmes increasingly rely on natural language prompts to constrain and direct AI agent behaviour. These prompts function as executable specifications: they define the agent's mandate, scope, and quality criteria. Despite this role, no systematic framework exists for evaluating whether a governance prompt is structurally complete. We introduce a five-principle evaluation framework grounded in computability theory, proof theory, and Bayesian epistemology, and apply it to an empirical corpus of 34 publicly available this http URL governance files sourced from GitHub. Our evaluation reveals that 37% of evaluated file-model pairs score below the structural completeness threshold, with data classification and assessment rubric criteria most frequently absent. These results suggest that practitioner-authored governance prompts exhibit consistent structural patterns that automated static analysis could detect and remediate. We discuss implications for requirements engineering practice in AI-assisted development contexts, identify a previously undocumented artefact classification gap in the this http URL convention, and propose directions for tool support.
[131] arXiv:2604.21092 [pdf, html, other]: Title: Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs

Gricel Vázquez, Alexandros Evangelidis, Sepeedeh Shahbeigi, Radu Calinescu, Simos Gerasimou

Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Integrating Large Language Models (LLMs) into complex software systems enables the generation of human-understandable explanations of opaque AI processes, such as automated task planning. However, the quality and reliability of these explanations heavily depend on effective prompt engineering. The lack of a systematic understanding of how diverse stakeholder groups formulate and refine prompts hinders the development of tools that can automate this process. We introduce COMPASS (COgnitive Modelling for Prompt Automated SynthesiS), a proof-of-concept self-adaptive approach that formalises prompt engineering as a cognitive and probabilistic decision-making process. COMPASS models users' unobservable latent cognitive states (such as attention and comprehension), uncertainty, and observable interaction cues as a POMDP, whose synthesised policy enables adaptive generation of explanations and prompt refinements. We evaluate COMPASS using two diverse cyber-physical system case studies to assess the adaptive explanation generation and the quality of the resulting explanations, both quantitatively and qualitatively. Our results demonstrate the feasibility of using COMPASS to integrate human cognition and user-profile feedback into automated prompt synthesis in complex task planning systems.
[132] arXiv:2604.21093 [pdf, html, other]: Title: TRAVELFRAUDBENCH: A Configurable Evaluation Framework for GNN Fraud Ring Detection in Travel Networks

Bhavana Sajja

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We introduce TravelFraudBench (TFG), a configurable benchmark for evaluating graph neural networks (GNNs) on fraud ring detection in travel platform graphs. Existing benchmarks--YelpChi, Amazon-Fraud, Elliptic, PaySim--cover single node types or domain-generic patterns with no mechanism to evaluate across structurally distinct fraud ring topologies. TFG simulates three travel-specific ring types--ticketing fraud (star topology with shared device/IP clusters), ghost hotel schemes (reviewer x hotel bipartite cliques), and account takeover rings (loyalty transfer chains)--in a heterogeneous graph with 9 node types and 12 edge types. Ring size, count, fraud rate, scale (500 to 200,000 nodes), and composition are fully configurable. We evaluate six methods--MLP, GraphSAGE, RGCN-proj, HAN, RGCN, and PC-GNN--under a ring-based split where each ring appears entirely in one partition, eliminating transductive label leakage. GraphSAGE achieves AUC=0.992 and RGCN-proj AUC=0.987, outperforming the MLP baseline (AUC=0.938) by 5.5 and 5.0 pp, confirming graph structure adds substantial discriminative power. HAN (AUC=0.935) is a negative result, matching the MLP baseline. On the ring recovery task (>=80% of ring members flagged simultaneously), GraphSAGE achieves 100% recovery across all ring types; MLP recovers only 17-88%. The edge-type ablation shows device and IP co-occurrence are the primary signals: removing uses_device drops AUC by 5.2 pp. TFG is released as an open-source Python package (MIT license) with PyG, DGL, and NetworkX exporters and pre-generated datasets at this https URL, with Croissant metadata including Responsible AI fields.
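The ring recovery criterion described above (a ring counts as recovered when at least 80% of its members are flagged) can be sketched as follows. The function names and data layout are hypothetical and are not taken from the released package.

```python
def ring_recovered(ring_members, flagged, threshold=0.8):
    """True if at least `threshold` of the ring's members are flagged."""
    members = set(ring_members)
    if not members:
        raise ValueError("a ring must have at least one member")
    flagged_members = len(members & set(flagged))
    return flagged_members / len(members) >= threshold


def ring_recovery_rate(rings, flagged, threshold=0.8):
    """Fraction of rings recovered; `rings` maps ring id -> list of node ids."""
    if not rings:
        raise ValueError("at least one ring is required")
    recovered = sum(ring_recovered(m, flagged, threshold) for m in rings.values())
    return recovered / len(rings)


# Hypothetical output of a classifier on two five-member rings.
rings = {"ring_a": [1, 2, 3, 4, 5], "ring_b": [6, 7, 8, 9, 10]}
flagged_nodes = {1, 2, 3, 4, 6, 7, 8}
rate = ring_recovery_rate(rings, flagged_nodes)
```

Here `ring_a` has four of five members flagged and counts as recovered, while `ring_b` has only three of five and does not, so a model with good per-node recall can still miss whole rings under this criterion.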
[133] arXiv:2604.21094 [pdf, html, other]: Title: Spectral Embeddings Leak Graph Topology: Theory, Benchmark, and Adaptive Reconstruction

Thinh Nguyen-Cong, Truong-Son Hy, Thang N. Dinh

Subjects: Machine Learning (cs.LG)

Graph Neural Networks (GNNs) excel on relational data, but standard benchmarks unrealistically assume the graph is centrally available. In practice, settings such as Federated Graph Learning, distributed systems, and privacy-sensitive applications involve graph data that are localized, fragmented, noisy, and privacy-leaking. We present a unified framework for this setting. We introduce LoGraB (Local Graph Benchmark), which decomposes standard datasets into fragmented benchmarks using three strategies and four controls: neighborhood radius $d$, spectral quality $k$, noise level $\sigma$, and coverage ratio $p$. LoGraB supports graph reconstruction, localized node classification, and inter-fragment link prediction, with Island Cohesion. We propose AFR (Adaptive Fidelity-driven Reconstruction), a method for noisy spectral fragments. AFR scores patch quality via a fidelity measure combining a gap-to-truncation stability ratio and structural entropy, then assembles fragments using RANSAC-Procrustes alignment, adaptive stitching, and Bundle Adjustment. Rather than forcing a single global graph, AFR recovers large faithful islands. We prove heat-kernel edge recovery under a separation condition, Davis--Kahan perturbation stability, and bounded alignment error. We establish a Spectral Leakage Proposition: under a spectral-gap assumption, polynomial-time Bayesian recovery is feasible once enough eigenvectors are shared, complementing AFR's deterministic guarantees. Experiments on nine benchmarks show that LoGraB reveals model strengths and weaknesses under fragmentation, AFR achieves the best F1 on 7/9 datasets, and under per-embedding $(\epsilon,\delta)$-Gaussian differential privacy, AFR retains 75% of its undefended F1 at $\epsilon=2$. Our anonymous code is available at this https URL
[134] arXiv:2604.21095 [pdf, other]: Title: TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes

Xingzhong Zhao, Ziqian Xie, Islam, Sheikh Muhammad Saiful, Tian Xia, Chen, Cheng, Degui Zhi

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE); Genomics (q-bio.GN)

Motivation: Modern bioinformatics workflows, particularly in imaging and representation learning, can generate thousands to tens of thousands of quantitative phenotypes from a single cohort. In such settings, running genome-wide association analyses trait by trait rapidly becomes a computational bottleneck. While established GWAS tools are highly effective for individual traits, they are not optimized for phenotype-rich screening workflows in which the same genotype matrix is reused across a large phenotype panel. Results: We present TorchGWAS, a framework for high-throughput association testing of large phenotype panels through hardware acceleration. The current public release provides stable Python and command-line workflows for linear GWAS and multivariate phenotype screening, supports NumPy, PLINK, and BGEN genotype inputs, aligns phenotype and covariate tables by sample identifier, and performs covariate adjustment internally. In a benchmark with 8.9 million markers and 23,000 samples, fastGWA required approximately 100 seconds per phenotype on an AMD EPYC 7763 64-core CPU, whereas TorchGWAS completed 2,048 phenotypes in 10 minutes and 20,480 phenotypes in 20 minutes on a single NVIDIA A100 GPU, corresponding to an approximately 300- to 1700-fold increase in phenotype throughput. TorchGWAS therefore makes large-scale GWAS screening practical in phenotype-rich settings where thousands of quantitative traits must be evaluated efficiently. Availability and implementation: TorchGWAS is implemented in Python and distributed as a documented source repository at this https URL. The current release provides a command-line interface, packaged source code, tutorials, benchmark scripts, and example workflows.
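The quoted 300- to 1700-fold range can be checked from the benchmark figures in the abstract. The short sketch below assumes the CPU baseline scales linearly at about 100 seconds per phenotype; the function name `throughput_gain` is illustrative and is not part of TorchGWAS.

```python
def throughput_gain(n_phenotypes, gpu_minutes, cpu_seconds_per_phenotype=100.0):
    """Ratio of trait-by-trait CPU wall time to GPU wall time for one panel."""
    cpu_seconds = n_phenotypes * cpu_seconds_per_phenotype  # assumed linear scaling
    gpu_seconds = gpu_minutes * 60.0
    return cpu_seconds / gpu_seconds


# Figures taken from the abstract: 2,048 traits in 10 min, 20,480 traits in 20 min.
small = throughput_gain(2048, 10)
large = throughput_gain(20480, 20)
print(f"{small:.0f}x, {large:.0f}x")
```

Both ratios fall inside the stated range, with the larger panel amortizing the fixed genotype-handling cost over more traits.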
[135] arXiv:2604.21096 [pdf, html, other]: Title: Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation

Xuhong He, To Eun Kim, Maik Fröbe, Jaime Arguello, Bhaskar Mitra, Fernando Diaz

Comments: SIGIR 2026; NTCIR track: this https URL

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Tip-of-the-Tongue (ToT) retrieval benchmarks have largely focused on English, limiting their applicability to multilingual information access. In this work, we construct multilingual ToT test collections for Chinese, Japanese, Korean, and English, using an LLM-based query simulation framework. We systematically study how prompt language and source document language affect the fidelity of simulated ToT queries, validating synthetic queries through system rank correlation against real user queries. Our results show that effective ToT simulation requires language-aware design choices: non-English language sources are generally important, while English Wikipedia can be beneficial when non-English sources provide insufficient information for query generation. Based on these findings, we release four ToT test collections with 5,000 queries per language across multiple domains. This work provides the first large-scale multilingual ToT benchmark and offers practical guidance for constructing realistic ToT datasets beyond English.
[136] arXiv:2604.21098 [pdf, html, other]: Title: Propensity Inference: Environmental Contributors to LLM Behaviour

Olli Järviniemi, Oliver Makins, Jacob Merizian, Robert Kirk, Ben Millwood

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Motivated by loss of control risks from misaligned AI systems, we develop and apply methods for measuring language models' propensity for unsanctioned behaviour. We contribute three methodological improvements: analysing effects of changes to environmental factors on behaviour, quantifying effect sizes via Bayesian generalised linear models, and taking explicit measures against circular analysis. We apply the methodology to measure the effects of 12 environmental factors (6 strategic in nature, 6 non-strategic) and thus the extent to which behaviour is explained by strategic aspects of the environment, a question relevant to risks from misalignment. Across 23 language models and 11 evaluation environments, we find approximately equal contributions from strategic and non-strategic factors for explaining behaviour, do not find strategic factors becoming more or less influential as capabilities improve, and find some evidence for a trend for increased sensitivity to goal conflicts. Finally, we highlight a key direction for future propensity research: the development of theoretical frameworks and cognitive models of AI decision-making into empirically testable forms.
[137] arXiv:2604.21100 [pdf, html, other]: Title: Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

Neehal Tumma, Noel Loo, Daniela Rus

Subjects: Machine Learning (cs.LG)

To address the increasing long-context compute limitations of softmax attention, several subquadratic recurrent operators have been developed. This work includes models such as Mamba-2, DeltaNet, Gated DeltaNet (GDN), and Kimi Delta Attention (KDA). As the space of recurrences grows, a parallel line of work has arisen to taxonomize them. One compelling view is the test-time regression (TTR) framework, which interprets recurrences as performing online least squares updates that learn a linear map from the keys to values. Existing delta-rule recurrences can be seen as first-order approximations to this objective, but notably ignore the curvature of the least-squares loss during optimization. In this work, we address this by introducing preconditioning to these recurrences. Starting from the theory of online least squares, we derive equivalences between linear attention and the delta rule in the exactly preconditioned case. Next, we realize this theory in practice by proposing a diagonal approximation: this enables us to introduce preconditioned variants of DeltaNet, GDN, and KDA alongside efficient chunkwise parallel algorithms for computing them. Empirically, we find that our preconditioned delta-rule recurrences yield consistent performance improvements across synthetic recall benchmarks and language modeling at the 340M and 1B scale.
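To make the test-time regression view concrete, here is a minimal pure-Python sketch of one delta-rule step on the loss $\tfrac{1}{2}\|Sk - v\|^2$, with an optional diagonal curvature term. The diagonal estimate used here (a damped running sum of squared key entries) and the names `delta_step`, `readout`, and `lam` are assumptions made for illustration; the paper's actual preconditioner and chunkwise algorithm may differ.

```python
def delta_step(S, k, v, beta, curvature=None):
    """One online least-squares step on 0.5 * ||S k - v||^2.

    S is a list of rows (d_v x d_k). With curvature=None this is the plain
    delta rule S <- S - beta * (S k - v) k^T. With a diagonal curvature
    vector h, the key direction is rescaled to k / h (illustrative
    diagonal preconditioning, not the paper's exact recipe).
    """
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    err = [p - t for p, t in zip(pred, v)]
    if curvature is None:
        direction = k
    else:
        direction = [kj / hj for kj, hj in zip(k, curvature)]
    return [[s - beta * e * d for s, d in zip(row, direction)]
            for row, e in zip(S, err)]


def readout(S, k):
    """Value predicted for key k by the current state."""
    return [sum(s * kj for s, kj in zip(row, k)) for row in S]


k, v = [2.0, 0.0], [2.0]        # a key with norm > 1 and its target value
S0 = [[0.0, 0.0]]
lam = 1.0                       # hypothetical damping constant
curvature = [lam + kj * kj for kj in k]  # diagonal estimate after one token

plain = readout(delta_step(S0, k, v, beta=1.0), k)
precond = readout(delta_step(S0, k, v, beta=1.0, curvature=curvature), k)
print(plain, precond)
```

With a large-norm key and the same step size, the plain step overshoots the target value 2.0, while the curvature-scaled step lands much closer to it. This is the effect that ignoring the curvature of the least-squares loss would miss.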
[138] arXiv:2604.21101 [pdf, html, other]: Title: A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting

Brooks Kinch, Xiaozhe Hu, Yilong Huang, Martine Dyring Hansen, Sunniva Meltzer, Nathaniel Donald Hamlin, David Sirajuddin, Eric C. Cyr, Nathaniel Trask

Comments: 29 pages, 6 figures

Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

For autoregressive modeling of chaotic dynamical systems over long time horizons, the stability of both training and inference is a major challenge in building scientific foundation models. We present a hybrid technique in which an autoregressive transformer is embedded within a novel shooting-based mixed finite element scheme, exposing topological structure that enables provable stability. For forward problems, we prove preservation of discrete energies, while for training we prove uniform bounds on gradients, provably avoiding the exploding gradient problem. Combined with a vision transformer, this yields latent tokens admitting structure-preserving dynamics. We outperform modern foundation models with a $65\times$ reduction in model parameters and long-horizon forecasting of chaotic systems. A "mini-foundation" model of a fusion component shows that 12 simulations suffice to train a real-time surrogate, achieving a $9{,}000\times$ speedup over particle-in-cell simulation.
[139] arXiv:2604.21102 [pdf, html, other]: Title: Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery

Siyuan Yao, Siavash Ghorbany, Kuangshi Ai, Arnav Cherukuthota, Meghan Forstchen, Alexis Korotasz, Matthew Sisk, Ming Hu, Chaoli Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We present a novel framework for automatically evaluating building conditions nationwide in the United States by leveraging large language models (LLMs) and Google Street View (GSV) imagery. By fine-tuning Gemma 3 27B on a modest human-labeled dataset, our approach achieves strong alignment with human mean opinion scores (MOS), outperforming even individual raters on SRCC and PLCC relative to the MOS benchmark. To enhance efficiency, we apply knowledge distillation, transferring the capabilities of Gemma 3 27B to a smaller Gemma 3 4B model that achieves comparable performance with a 3x speedup. Further, we distill the knowledge into a CNN-based model (EfficientNetV2-M) and a transformer (SwinV2-B), delivering close performance while achieving a 30x speed gain. Furthermore, we investigate LLMs' capabilities for assessing an extensive list of built environment and housing attributes through a human-AI alignment study and develop a visualization dashboard that integrates LLM assessment outcomes for downstream analysis by homeowners. Our framework offers a flexible and efficient solution for large-scale building condition assessment, enabling high accuracy with minimal human labeling effort.
[140] arXiv:2604.21103 [pdf, html, other]: Title: AI Governance under Political Turnover: The Alignment Surface of Compliance Design

Andrew J. Peterson

Subjects: Artificial Intelligence (cs.AI); General Economics (econ.GN)

Governments are increasingly interested in using AI to make administrative decisions cheaper, more scalable, and more consistent. But for probabilistic AI to be incorporated into public administration it must be embedded in a compliance layer that makes decisions reviewable, repeatable, and legally defensible. That layer can improve oversight by making departures from law easier to detect. But it can also create a stable approval boundary that political successors learn to navigate while preserving the appearance of lawful administration. We develop a formal model in which institutions choose the scale of automation, the degree of codification, and safeguards on iterative use. The model shows when these systems become vulnerable to strategic use from within government, why reforms that initially improve oversight can later increase that vulnerability, and why expansions in AI use may be difficult to unwind. Making AI usable can thus make procedures easier for future governments to learn and exploit.
[141] arXiv:2604.21104 [pdf, html, other]: Title: Pretrain Where? Investigating How Pretraining Data Diversity Impacts Geospatial Foundation Model Performance

Amandeep Kaur, Mirali Purohit, Gedeon Muhawenayo, Esther Rolf, Hannah Kerner

Comments: Accepted at EarthVision workshop, CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

New geospatial foundation models introduce a new model architecture and pretraining dataset, often sampled using different notions of data diversity. Performance differences are largely attributed to the model architecture or input modalities, while the role of the pretraining dataset is rarely studied. To address this research gap, we conducted a systematic study on how the geographic composition of pretraining data affects a model's downstream performance. We created global and per-continent pretraining datasets and evaluated them on global and per-continent downstream datasets. We found that the pretraining dataset from Europe outperformed global and continent-specific pretraining datasets on both global and local downstream evaluations. To investigate the factors influencing a pretraining dataset's downstream performance, we analysed 10 pretraining datasets using diversity across continents, biomes, landcover and spectral values. We found that only spectral diversity was strongly correlated with performance, while others were weakly correlated. This finding establishes a new dimension of diversity to be accounted for when creating a high-performing pretraining dataset. We open-sourced 7 new pretraining datasets, pretrained models, and our experimental framework at this https URL.
[142] arXiv:2604.21106 [pdf, html, other]: Title: How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

We measure how much one extra recurrence is worth to a looped (depth-recurrent) language model, in equivalent unique parameters. From an iso-depth sweep of 116 pretraining runs across recurrence counts $r \in \{1, 2, 4, 8\}$ spanning ${\sim}50\times$ in training compute, we fit a joint scaling law $L = E + A\,(N_\text{once} + r^{\varphi} N_\text{rec})^{-\alpha} + B\,D^{-\beta}$ and recover a new recurrence-equivalence exponent $\varphi = 0.46$ at $R^2 = 0.997$. Intuitively, $\varphi$ tells us whether looping a block $r$ times is equivalent in validation loss to $r$ unique blocks of a non-looped model (full equivalence, $\varphi{=}1$) or to a single block run repeatedly with no capacity gain ($\varphi{=}0$). Our $\varphi = 0.46$ sits in between, so each additional recurrence predictably increases validation loss at matched training compute. For example, at $r{=}4$ a 410M looped model performs on par with a 580M non-looped model, but pays the training cost of a 1B non-looped one. On a five-axis downstream evaluation, the gap persists on parametric-knowledge tasks and closes on simple open-book tasks, while reasoning tasks are not resolvable at our compute budgets. For any looped LM, our $\varphi$ converts the design choice of $r$ into a predictable validation-loss cost, and future training recipes and architectures can be compared by how much they raise $\varphi$ above $0.46$.
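The capacity term of the fitted law can be read as an effective parameter count $N_\text{once} + r^{\varphi} N_\text{rec}$. The sketch below evaluates it for the recurrence counts in the sweep. The split between once-run and recurrent parameters used in the example is hypothetical and is not taken from the paper.

```python
PHI = 0.46  # recurrence-equivalence exponent reported in the abstract


def effective_params(n_once, n_rec, r, phi=PHI):
    """Capacity term N_once + r**phi * N_rec from the joint scaling law."""
    return n_once + (r ** phi) * n_rec


# How much each recurrent parameter is "worth" at each recurrence count.
for r in (1, 2, 4, 8):
    print(r, round(r ** PHI, 3))

# Hypothetical split (not from the paper): 100M run once, 300M looped 4 times.
print(effective_params(100e6, 300e6, r=4) / 1e6)
```

Setting `phi=1.0` recovers full equivalence to $r$ unique blocks, and `phi=0.0` recovers no capacity gain from looping, matching the two limiting cases described in the abstract.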
[143] arXiv:2604.21108 [pdf, other]: Title: Machine learning and digital pragmatics: Which word category influences emoji use most?

Mohammed Q. Shormani, Ibrahim Abdulmalik Hassan Muneef Y. Alshawsh

Comments: 15 pages, 4 Figures, 3 Tables

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This study investigates Machine Learning (ML) in the prediction of emojis in Arabic tweets employing the (state-of-the-art) MARBERT model. A corpus of 11379 CA tweets representing multiple Arabic colloquial dialects was collected from this http URL via Python. The net dataset comprised 8695 tweets, which were used for the analysis. These tweets were then classified into 14 categories, which were numerically encoded and used as labels. A preprocessing pipeline was designed as an interpretable baseline, allowing us to examine the relationship between lexical features and emoji categories. MARBERT was finetuned to predict emoji use from textual input. We evaluated the model performance in terms of precision, recall and F1-scores. Findings reveal that the model performed quite well, with an overall accuracy of 0.75. The study concludes that although the findings are promising, there is still a need for improving machine learning models including MARBERT, specifically for low-resource and multidialectal languages like Arabic.
[144] arXiv:2604.21111 [pdf, html, other]: Title: A Ground-Truth-Based Evaluation of Vulnerability Detection Across Multiple Ecosystems

Peter Mandl, Paul Mandl, Martin Häusl, Maximilian Auch

Comments: 23 pages with appendix, 6 figures, 18 tables, appendix with additional evaluation data

Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

Automated vulnerability detection tools are widely used to identify security vulnerabilities in software dependencies. However, the evaluation of such tools remains challenging due to the heterogeneous structure of vulnerability data sources, inconsistent identifier schemes, and ambiguities in version range specifications. In this paper, we present an empirical evaluation of vulnerability detection across multiple software ecosystems using a curated ground-truth dataset derived from the Open Source Vulnerabilities (OSV) database. The dataset explicitly maps vulnerabilities to concrete package versions and enables a systematic comparison of detection results across different tools and services. Since vulnerability databases such as OSV are continuously updated, the dataset used in this study represents a snapshot of the vulnerability landscape at the time of the evaluation. To support reproducibility and future studies, we provide an open-source tool that automatically reconstructs the dataset from the current OSV database using the methodology described in this paper. Our evaluation highlights systematic differences between vulnerability detection systems and demonstrates the importance of transparent dataset construction for reproducible empirical security research.
[145] arXiv:2604.21117 [pdf, other]: Title: Efficient Batch Search Algorithm for B+ Tree Index Structures with Level-Wise Traversal on FPGAs

Max Tzschoppe, Martin Wilhelm, Sven Groppe, Thilo Pionteck

Subjects: Hardware Architecture (cs.AR); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

This paper introduces a search algorithm for index structures based on a B+ tree, specifically optimized for execution on a field-programmable gate array (FPGA). Our implementation efficiently traverses and reuses tree nodes by processing a batch of search keys level by level. This approach reduces costly global memory accesses, improves reuse of loaded B+ tree nodes, and enables parallel search key comparisons directly on the FPGA. Using a high-level synthesis (HLS) approach, we developed a highly flexible and configurable search kernel design supporting variable batch sizes, customizable node sizes, and arbitrary tree depths. The final design was implemented on an AMD Alveo U250 Data Center Accelerator Card, and was evaluated against the B+ tree search algorithm from the TLX library running on an AMD EPYC 7542 processor (2.9 GHz). With a batch size of 1000 search keys, a B+ tree containing one million entries, and a tree order of 16, we measured a 4.9x speedup for the single-kernel FPGA design compared to a single-threaded CPU implementation. Running four kernel instances in parallel on the FPGA resulted in a 2.1$\times$ performance improvement over a CPU implementation using 16 threads.
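The level-wise batch traversal can be illustrated in software. The sketch below is a plain Python model of the idea, not the authors' HLS kernel: the node layout, the `batch_search` name, and the `loads` counter (standing in for global memory accesses) are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def batch_search(nodes, root, depth, keys):
    """Search a batch of keys level by level, loading each touched node once.

    nodes maps a node id to ("inner", separators, child_ids) or
    ("leaf", leaf_keys, leaf_values). Returns (results, node_loads).
    """
    current = {qi: root for qi in range(len(keys))}  # query index -> node id
    loads = 0
    for _ in range(depth - 1):                       # inner levels
        by_node = defaultdict(list)
        for qi, nid in current.items():
            by_node[nid].append(qi)
        for nid, queries in by_node.items():
            _, seps, children = nodes[nid]           # one load per distinct node
            loads += 1
            for qi in queries:                       # comparisons reuse the node
                current[qi] = children[bisect_right(seps, keys[qi])]
    results = [None] * len(keys)
    by_leaf = defaultdict(list)
    for qi, nid in current.items():
        by_leaf[nid].append(qi)
    for nid, queries in by_leaf.items():
        _, leaf_keys, leaf_values = nodes[nid]
        loads += 1
        for qi in queries:
            pos = bisect_left(leaf_keys, keys[qi])
            if pos < len(leaf_keys) and leaf_keys[pos] == keys[qi]:
                results[qi] = leaf_values[pos]
    return results, loads


# Tiny two-level tree: keys equal to a separator live in the right-hand leaf.
nodes = {
    "root": ("inner", [5, 9], ["L0", "L1", "L2"]),
    "L0": ("leaf", [1, 3], [10, 30]),
    "L1": ("leaf", [5, 7], [50, 70]),
    "L2": ("leaf", [9, 11], [90, 110]),
}
results, loads = batch_search(nodes, "root", depth=2, keys=[7, 1, 8, 9, 5])
print(results, loads)
```

For this batch of five keys the traversal touches four nodes in total (the root once and each leaf once), whereas five independent root-to-leaf searches would touch ten.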
[146] arXiv:2604.21119 [pdf, html, other]: Title: Materialistic RIR: Material Conditioned Realistic RIR Generation

Mahnoor Fatima Saad, Sagnik Majumder, Kristen Grauman, Ziad Al-Halah

Comments: Accepted to CVPR 2026 Findings. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD)

Rings like gold, thuds like wood! The sound we hear in a scene is shaped not only by the spatial layout of the environment but also by the materials of the objects and surfaces within it. For instance, a room with wooden walls will produce a different acoustic experience from a room with the same spatial layout but concrete walls. Accurately modeling these effects is essential for applications such as virtual reality, robotics, architectural design, and audio engineering. Yet, existing methods for acoustic modeling often entangle spatial and material influences in correlated representations, which limits user control and reduces the realism of the generated acoustics. In this work, we present a novel approach for material-controlled Room Impulse Response (RIR) generation that explicitly disentangles the effects of spatial and material cues in a scene. Our approach models the RIR using two modules: a spatial module that captures the influence of the spatial layout of the scene, and a material module that modulates this spatial RIR according to a user-specified material configuration. This explicitly disentangled design allows users to easily modify the material configuration of a scene and observe its impact on acoustics without altering the spatial structure or scene content. Our model provides significant improvements over prior approaches on both acoustic-based metrics (up to +16% on RTE) and material-based metrics (up to +70%). Furthermore, through a human perceptual study, we demonstrate the improved realism and material sensitivity of our model compared to the strongest baselines.
[147] arXiv:2604.21120 [pdf, html, other]: Title: TabSHAP

Aryan Chaudhary, Prateek Agarwal, Tejasvi Alladi

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large Language Models (LLMs) fine-tuned on serialized tabular data are emerging as powerful alternatives to traditional tree-based models, particularly for heterogeneous or context-rich datasets. However, their deployment in high-stakes domains is hindered by a lack of faithful interpretability; existing methods often rely on global linear proxies or scalar probability shifts that fail to capture the model's full probabilistic uncertainty. In this work, we introduce TabSHAP, a model-agnostic interpretability framework designed to directly attribute local query decision logic in LLM-based tabular classifiers. By adapting a Shapley-style sampled-coalition estimator with Jensen-Shannon divergence between full-input and masked-input class distributions, TabSHAP quantifies the distributional impact of each feature rather than simple prediction flips. To align with tabular semantics, we mask at the level of serialized key:value fields (atomic in the prompt string), not individual subword tokens. Experimental validation on the Adult Income and Heart Disease benchmarks demonstrates that TabSHAP isolates critical diagnostic features, achieving significantly higher faithfulness than random baselines and XGBoost proxies. We further run a distance-metric ablation on the same test instances and TabSHAP settings: attributions are recomputed with KL or L1 replacing JSD in the similarity step (results cached per metric), and we compare deletion faithfulness across all three.
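A sampled-coalition estimator scored with Jensen-Shannon divergence can be sketched in a few lines. Everything below is an illustrative assumption rather than the TabSHAP implementation: `toy_classifier` stands in for an LLM that reads the unmasked `key:value` fields, and a feature's marginal contribution is taken to be the drop in JSD to the full-input distribution when that field is revealed.

```python
import math
import random


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def toy_classifier(fields):
    """Hypothetical stand-in model: class distribution from the visible fields."""
    weights = {"age": 2.0, "hours": 0.5, "noise": 0.0}
    score = sum(weights[name] for name in fields)
    p1 = 1 / (1 + math.exp(-score))
    return [1 - p1, p1]


def sampled_attributions(row, model, n_samples=200, seed=0):
    """Average marginal JSD reduction per field over random reveal orders."""
    rng = random.Random(seed)
    names = list(row)
    full = model(row)
    totals = dict.fromkeys(names, 0.0)
    for _ in range(n_samples):
        order = names[:]
        rng.shuffle(order)
        kept = {}                        # fields currently unmasked
        prev = jsd(full, model(kept))
        for name in order:
            kept[name] = row[name]       # unmask one whole key:value field
            cur = jsd(full, model(kept))
            totals[name] += prev - cur   # how much closer to the full output
            prev = cur
    return {name: total / n_samples for name, total in totals.items()}


row = {"age": 52, "hours": 40, "noise": 7}
attr = sampled_attributions(row, toy_classifier)
print(attr)
```

Because each reveal order telescopes, the attributions sum to the divergence between the full-input and fully masked distributions. Swapping `jsd` for a KL or L1 distance would give the kind of distance-metric ablation the abstract describes.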
[148] arXiv:2604.21124 [pdf, html, other]: Title: Enabling Mixed criticality applications for the Versal AI-Engines

Vincent Sprave, Martin Wilhelm, Daniele Passaretti, Alberto Garcia-Ortiz, Thilo Pionteck

Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

Adaptive Systems-on-Chips (SoCs) are increasingly being used in mixed criticality systems (MCSs), such as in autonomous driving, aviation and medical systems. In this context, AMD has proposed the Versal SoC, which has a heterogeneous architecture including, among other components, an Artificial Intelligence Engine (AIE), which is a 2D array of processors and memory tiles designed for AI and signal processing workloads. While this AIE offers significant potential for accelerating real-time data processing tasks, this has not yet been explored in the context of MCSs since individual tasks with different criticality levels cannot be dynamically assigned to tiles due to the static mapping of dataflow graphs and tasks. In this work, we propose a dynamic task dispatching infrastructure that enables task switching on the AIE at runtime. Based on this infrastructure, we present an MCS design that dynamically assigns tasks of different criticality to a pool of AIE tiles, depending on the criticality mode of the system. Our approach overcomes the limitations of static dataflow graph mappings and, for the first time, exploits the parallel processing capabilities of the AIE for MCSs. We also present a comprehensive timing analysis of the overhead introduced by the task dispatcher infrastructure, focusing on control logic, context switching and data copy operations. This shows that these operations have low variance and are negligible compared to the overall execution time, demonstrating that our infrastructure is suitable for MCSs. Finally, we evaluate the proposed infrastructure using an autonomous driving workload with tasks that have variable execution times and different criticality levels. In this case study, we maximized AIE utilization, reducing idle time by 65.5 %, while measuring an execution time overhead of less than 0.002 %, and doubling the throughput of low-criticality tasks.
[149] arXiv:2604.21125 [pdf, html, other]: Title: A Cloud-Native Architecture for Human-in-Control LLM-Assisted OpenSearch in Investigative Settings

Benjamin Puhani, Kai Brehmer, Malte Prieß

Journal-ref: Proceedings of the 17th International Conference on Cloud Computing, GRIDs, and Virtualization (CLOUD COMPUTING 2026), p. 62-65, Lisbon, Portugal, April 2026

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Complex criminal investigations are often hindered by large volumes of unstructured evidence and by the semantic gap between natural language investigative intent and technical search logic. To address this challenge, we present a design and feasibility study of a cloud-native microservice architecture tailored to private-cloud deployments, contributing to research in secure cloud computing and leveraging modern cloud paradigms under high security and scalability requirements. The proposed system integrates Large Language Models into a "Human-in-Control" workflow that translates natural-language queries into syntactically valid OpenSearch Domain-Specific Language expressions. We describe the implementation of a hybrid retrieval strategy within OpenSearch that combines BM25-based lexical search with nested semantic vector embeddings. The paper focuses on system design and preliminary functional validation, establishing an architectural baseline for future empirical evaluation. Technical feasibility is demonstrated through a functional prototype, and a rigorous evaluation methodology is outlined using the Enron Email Dataset as a structural proxy for restricted investigative corpora.
[150] arXiv:2604.21127 [pdf, html, other]: Title: HyperFM: An Efficient Hyperspectral Foundation Model with Spectral Grouping

Zahid Hassan Tushar, Sanjay Purushotham

Comments: 15 pages, 8 figures, to be published in CVPR 2026 findings, Code and data are publicly available on this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The NASA PACE mission provides unprecedented hyperspectral observations of ocean color, aerosols, and clouds, offering new insights into how these components interact and influence Earth's climate and air quality. Its Ocean Color Instrument measures light across hundreds of finely spaced wavelength bands, enabling detailed characterization of features such as phytoplankton composition, aerosol properties, and cloud microphysics. However, hyperspectral data of this scale is large, complex, and difficult to label, requiring specialized processing and analysis techniques. Existing foundation models, which have transformed computer vision and natural language processing, are generally trained on standard RGB imagery and therefore struggle to interpret the continuous spectral signatures captured by PACE. While recent advances have introduced hyperspectral foundation models, they are typically trained on cloud-free observations and often remain limited to single-sensor datasets due to spectral inconsistencies across instruments. Moreover, existing models tend to be parameter-heavy and computationally expensive, limiting scalability and adoption in operational settings. To address these challenges, we introduce HyperFM, a parameter-efficient hyperspectral foundation model that leverages intra-group and inter-group spectral attention along with hybrid parameter decomposition to better capture spectral spatial relationships while reducing computational cost. HyperFM demonstrates consistent performance improvements over existing hyperspectral foundation models and task-specific state-of-the-art methods across four benchmark downstream atmospheric cloud property retrieval tasks. To support further research, we additionally release HyperFM250K, a large-scale hyperspectral dataset from the PACE mission that includes both clear and cloudy scenes.
[151] arXiv:2604.21129 [pdf, html, other]: Title: AGNT2: Autonomous Agent Economies on Interaction-Optimized Layer 2 Infrastructure

Anbang Ruan, Xing Zhang

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Current blockchain Layer 2 solutions, including Optimism, Arbitrum, zkSync, and their derivatives, optimize for human-initiated financial transactions. Autonomous AI agents instead generate high-frequency, semantically rich service invocations among mutually untrusting principals. Existing chains treat those interactions as generic calldata, forcing identity, escrow, dependency ordering, and session state to be encoded above the execution layer at the wrong cost point. We present AGNT2, a three-tier stack purpose-built for agent and microservice coordination on-chain. AGNT2 combines: (1) a sidecar deployment pattern that turns any Docker container into an on-chain agent without application-code modification; (2) Layer Top P2P state channels for established bilateral pairs (<100 ms, rough design target 1K-5K TPS per pair, 10M+ aggregate TPS design envelope under endpoint-resource limits), Layer Core as a dependency-aware sequenced rollup for first-contact and multi-party interactions (500 ms-2 s, 300K-500K TPS design target), and Layer Root settlement with computational fraud proofs anchored to any EVM L1; and (3) an agent-native execution environment plus interaction trie that make service invocation, identity, reputation, capabilities, and session context first-class protocol objects. This paper focuses on the execution-layer systems problem: sequencing, state, settlement, and the data-availability (DA) bandwidth gap that bounds all three. Simulation and analytical modeling support the architecture, and prototype measurements validate selected components, but no end-to-end Layer Core implementation exists yet. Practical deployment is currently constrained to roughly 10K-100K TPS by DA throughput, leaving a ~100x gap at the target ceiling. AGNT2 argues that the agent economy requires a dedicated execution layer rather than a general-purpose chain repurposed for agents.
[152] arXiv:2604.21130 [pdf, html, other]: Title: Self-Predictive Representation for Autonomous UAV Object-Goal Navigation

Angel Ayala, Donling Sui, Francisco Cruz, Mitchell Torok, Mohammad Deghat, Bruno J. T. Fernandes

Comments: Submitted to T-RO

Subjects: Robotics (cs.RO)

Autonomous Unmanned Aerial Vehicles (UAVs) have revolutionized industries through their versatility with applications including aerial surveillance, search and rescue, agriculture, and delivery. Their autonomous capabilities offer unique advantages, such as operating in large open space environments. Reinforcement Learning (RL) empowers UAVs to learn intricate navigation policies, enabling them to optimize flight behavior autonomously. However, one of its main challenges is its inefficient use of data samples to achieve a good policy. In object-goal navigation (OGN) settings, target recognition arises as an extra challenge. Most UAV-related approaches use relative or absolute coordinates to move from an initial position to a predefined location, rather than to find the target directly. This study addresses the data sample efficiency issue in solving a 3D OGN problem, in addition to the formalization of the unknown target location setting as a Markov decision process. Experiments are conducted to analyze the interplay of different state representation learning (SRL) methods for perception with a model-free RL algorithm for planning in an autonomous navigation system. The main contribution of this study is the development of the perception module, featuring a novel self-predictive model named AmelPred. Empirical results demonstrate that its stochastic version, AmelPredSto, is the best-performing SRL model when combined with actor-critic RL algorithms. The obtained results show substantial improvement in RL algorithms' efficiency by using AmelPredSto in solving the OGN problem.
[153] arXiv:2604.21131 [pdf, html, other]: Title: Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

Ari Azarafrooz

Comments: 46 pages, 8 figures. Dataset: this https URL

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips past every session-bound detector because only the aggregate carries the payload. We make three contributions to cross-session threat detection.
(1) Dataset. CSTM-Bench is 26 executable attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation" as a policy predicate, plus matched Benign-pristine and Benign-hard confounders. Released on Hugging Face as intrinsec-ai/cstm-bench with two 54-scenario splits: dilution (compositional) and cross_session (12 isolation-invisible scenarios produced by a closed-loop rewriter that softens surface phrasing while preserving cross-session artefacts).
(2) Measurement. Framing cross-session detection as an information bottleneck to a downstream correlator LLM, we find that a session-bound judge and a Full-Log Correlator concatenating every prompt into one long-context call both lose roughly half their attack recall moving from dilution to cross_session, well inside any frontier context window. Scope: 54 scenarios per shard, one correlator family (Anthropic Claude), no prompt optimisation; we release it to motivate larger, multi-provider datasets.
(3) Algorithm and metric. A bounded-memory Coreset Memory Reader retaining highest-signal fragments at $K=50$ is the only reader whose recall survives both shards. Because ranker reshuffles break KV-cache prefix reuse, we promote $\mathrm{CSR\_prefix}$ (ordered prefix stability, LLM-free) to a first-class metric and fuse it with detection into $\mathrm{CSTM} = 0.7 F_1(\mathrm{CSDA@action}, \mathrm{precision}) + 0.3 \mathrm{CSR\_prefix}$, benchmarking rankers on a single Pareto of recall versus serving stability.
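The fused score in (3) can be sketched as follows. This is a minimal illustration, assuming that $F_1$ denotes the usual harmonic mean of its two arguments and that all three inputs are rates in [0, 1]; the function names and the example numbers are hypothetical and are not taken from the paper or the dataset.

```python
def f1(a: float, b: float) -> float:
    """Harmonic mean of two rates in [0, 1]; defined as 0.0 when both are zero."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def cstm(csda_at_action: float, precision: float, csr_prefix: float) -> float:
    """Fuse detection quality and prefix stability with the 0.7 / 0.3 weights
    quoted in the abstract (the harmonic-mean reading of F_1 is an assumption)."""
    return 0.7 * f1(csda_at_action, precision) + 0.3 * csr_prefix


# Hypothetical reader: strong detection, but reshuffling hurts prefix stability.
score = cstm(csda_at_action=0.8, precision=0.8, csr_prefix=0.5)
print(round(score, 3))
```

Under this reading, a reader with perfect detection but no prefix stability is capped at 0.7, which reflects how the metric trades recall against serving stability.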
[154] arXiv:2604.21133 [pdf, html, other]: Title: GRISP: Guided Recurrent IRI Selection over SPARQL Skeletons

Sebastian Walter, Hannah Bast

Subjects: Computation and Language (cs.CL)

We present GRISP (Guided Recurrent IRI Selection over SPARQL Skeletons), a novel SPARQL-based question-answering method over knowledge graphs based on fine-tuning a small language model (SLM). Given a natural-language question, the method first uses the SLM to generate a natural-language SPARQL query skeleton, and then to re-rank and select knowledge graph items to iteratively replace the natural-language placeholders using knowledge graph constraints. The SLM is jointly trained on skeleton generation and list-wise re-ranking data generated from standard question-query pairs. We evaluate the method on common Wikidata and Freebase benchmarks, and achieve better results than other state-of-the-art methods in a comparable setting.
[155] arXiv:2604.21134 [pdf, html, other]: Title: Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents

Yiyang Lu, Woong Shin, Ahmad Maroof Karimi, Feiyi Wang, Jie Ren, Evgenia Smirni

Comments: 18 pages, 8 figures

Subjects: Computation and Language (cs.CL)

Vision-Language Models (VLMs) frequently misread values, hallucinate details, and confuse overlapping elements in charts. Current approaches rely solely on pixel interpretation, creating a Pixel-Only Bottleneck: agents treat interactive charts as static images, losing access to the structured specification that encodes exact values. We introduce Introspective and Interactive Visual Grounding (IVG), a framework that combines (1) spec-grounded introspection, which queries the underlying specification for deterministic evidence, with (2) view-grounded interaction, which manipulates the view to resolve visual ambiguity. To enable evaluation without VLM bias, we present iPlotBench, a benchmark of 500 interactive Plotly figures with 6,706 binary questions and ground-truth specifications. Experiments show that introspection improves data reconstruction fidelity, while the combination with interaction achieves the highest QA accuracy (0.81), with +6.7% gains on overlapping geometries. We further demonstrate IVG in deployed agents that explore data autonomously and collaborate with human users in real time.
[156] arXiv:2604.21137 [pdf, html, other]: Title: Enhancing Science Classroom Discourse Analysis through Joint Multi-Task Learning for Reasoning-Component Classification

Jiho Noh, Mukhesh Raghava Katragadda, Raymond Carl, Soon Lee

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Analyzing the reasoning patterns of students in science classrooms is critical for understanding knowledge-construction mechanisms and improving instructional practice to maximize cognitive engagement, yet manual coding of classroom discourse at scale remains prohibitively labor-intensive. We present an automated discourse analysis system (ADAS) that jointly classifies teacher and student utterances along two complementary dimensions: Utterance Type (UT) and Reasoning Component (RC), derived from our prior CDAT framework. To address severe label imbalance among minority classes, we (1) stratify-resplit the annotated corpus, (2) apply LLM-based synthetic data augmentation targeting minority classes, and (3) train a dual-probe head RoBERTa-base classifier. A zero-shot GPT-5.4 baseline achieves macro-F1 of 0.467 on UT and 0.476 on RC, establishing meaningful upper bounds for prompt-only approaches and motivating fine-tuning. Beyond classification, we conduct discourse pattern analyses including UTxRC co-occurrence profiling, Cognitive Complexity Index (CCI) computation per session, lag-sequential analysis, and IRF chain analysis, revealing that teacher Feedback-with-Question (Fq) moves are the most consistent antecedents of student inferential reasoning (SR-I). Our results demonstrate that LLM-based augmentation meaningfully improves UT minority-class recognition, and that the structural simplicity of the RC task makes it tractable even for lexical baselines.
[157] arXiv:2604.21138 [pdf, html, other]: Title: Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

Jiabao Ji, Yongchao Chen, Yang Zhang, Ramana Rao Kompella, Chuchu Fan, Gaowen Liu, Shiyu Chang

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Multi-robot control in cluttered environments is a challenging problem that involves complex physical constraints, including robot-robot collisions, robot-obstacle collisions, and unreachable motions. Successful planning in such settings requires joint optimization over high-level task planning and low-level motion planning, as violations of physical constraints may arise from failures at either level. However, jointly optimizing task and motion planning is difficult due to the complex parameterization of low-level motion trajectories and the ambiguity of credit assignment across the two planning levels. In this paper, we propose a hybrid multi-robot control framework that jointly optimizes task and motion planning. To enable effective parameterization of low-level planning, we introduce waypoints, a simple yet expressive representation for motion trajectories. To address the credit assignment challenge, we adopt a curriculum-based training strategy with a modified RLVR algorithm that propagates motion feasibility feedback from the motion planner to the task planner. Experiments on BoxNet3D-OBS, a challenging multi-robot benchmark with dense obstacles and up to nine robots, show that our approach consistently improves task success over motion-agnostic and VLA-based baselines. Our code is available at this https URL
[158] arXiv:2604.21139 [pdf, html, other]: Title: Slot Machines: How LLMs Keep Track of Multiple Entities

Paul C. Bogdan, Jack Lindsey

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Language models must bind entities to the attributes they possess and maintain several such binding relationships within a context. We study how multiple entities are represented across token positions and whether single tokens can carry bindings for more than one entity. We introduce a multi-slot probing approach that disentangles a single token's residual stream activation to recover information about both the currently described entity and the immediately preceding one. These two kinds of information are encoded in separate and largely orthogonal "current-entity" and "prior-entity" slots. We analyze the functional roles of these slots and find that they serve different purposes. In tandem with the current-entity slot, the prior-entity slot supports relational inferences, such as entity-level induction ("who came after Alice in the story?") and conflict detection between adjacent entities. However, only the current-entity slot is used for explicit factual retrieval questions ("Is anyone in the story tall?" "What is the tall entity's name?"), even though these answers are also linearly decodable from the prior-entity slot. Consistent with this limitation, open-weight models perform at near-chance accuracy when processing syntax that forces two subject-verb-object bindings onto a single token (e.g., "Alice prepares and Bob consumes food."). Interestingly, recent frontier models can parse this properly, suggesting they may have developed more sophisticated binding strategies. Overall, our results expose a gap between information that is available in activations and information the model actually uses, and suggest that the current/prior-entity slot structure is a natural substrate for behaviors that require holding two perspectives at once, such as sycophancy and deception.
[159] arXiv:2604.21140 [pdf, html, other]: Title: On Time-Memory Tradeoffs for Maximal Palindromes with Wildcards and $k$-Mismatches

Amihood Amir, Ayelet Butman, Michael Itzhaki, Dina Sokol

Comments: Full version, accepted to CPM26

Subjects: Data Structures and Algorithms (cs.DS)

This paper addresses the problem of identifying palindromic factors in texts that include wildcards -- special characters that match all others. These symbols challenge many classical algorithms, as numerous combinatorial properties are not satisfied in their presence. We apply existing wildcard-LCE techniques to obtain a continuous time-memory tradeoff, and present the first non-trivial linear-space algorithm for computing all maximal palindromes with wildcards, improving the best known time-memory product in certain parameter ranges. Our main results are algorithms to find and approximate all maximal palindromes in a given text. We also generalize both methods to the $k$-mismatches setting, with or without wildcards.
[160] arXiv:2604.21144 [pdf, html, other]: Title: Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue

Biswesh Mohapatra, Giovanni Duca, Laurent Romary, Justine Cassell

Comments: Work under review. Biswesh Mohapatra and Giovanni Duca both contributed equally to this work

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Situated dialogue requires speakers to maintain a reliable representation of shared context rather than reasoning only over isolated utterances. Current conversational agents often struggle with this requirement, especially when the common ground must be preserved beyond the immediate context window. In such settings, fine-grained distinctions are frequently compressed into purely textual representations, leading to a critical failure mode we call \emph{representational blur}, in which similar but distinct entities collapse into interchangeable descriptions. This semantic flattening creates an illusion of grounding, where agents appear locally coherent but fail to track shared context persistently over time. Inspired by the role of mental imagery in human reasoning, and based on the increased availability of multimodal models, we explore whether conversational agents can be given an analogous ability to construct some depictive intermediate representations during dialogue to address these limitations. Thus, we introduce an active visual scaffolding framework that incrementally converts dialogue state into a persistent visual history that can later be retrieved for grounded response generation. Evaluation on the IndiRef benchmark shows that incremental externalization itself improves over full-dialog reasoning, while visual scaffolding provides additional gains by reducing representational blur and enforcing concrete scene commitments. At the same time, textual representations remain advantageous for non-depictable information, and a hybrid multimodal setting yields the best overall performance. Together, these findings suggest that conversational agents benefit from an explicitly multimodal representation of common ground that integrates depictive and propositional information.
[161] arXiv:2604.21146 [pdf, html, other]: Title: WFM: 3D Wavelet Flow Matching for Ultrafast Multi-Modal MRI Synthesis

Yalcin Tur, Mihajlo Stojkovic, Ulas Bagci

Comments: 17 pages, 4 figures, 3 tables. Accepted at MIDL 2026 (Poster)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have achieved remarkable quality in multi-modal MRI synthesis, but their computational cost (hundreds of sampling steps and separate models per modality) limits clinical deployment. We observe that this inefficiency stems from an unnecessary starting point: diffusion begins from pure noise, discarding the structural information already present in available MRI sequences. We propose WFM (Wavelet Flow Matching), which instead learns a direct flow from an informed prior, the mean of conditioning modalities in wavelet space, to the target distribution. Because the source and target share underlying anatomy and differ primarily in contrast, this formulation enables accurate synthesis in just 1-2 integration steps. A single 82M-parameter model with class conditioning synthesizes all four BraTS modalities (T1, T1c, T2, FLAIR), replacing four separate diffusion models totaling 326M parameters. On BraTS 2024, WFM achieves 26.8 dB PSNR and 0.94 SSIM, within 1-2 dB of diffusion baselines, while running 250-1000x faster (0.16-0.64s vs. 160s per volume). This speed-quality trade-off makes real-time MRI synthesis practical for clinical workflows. Code is available at this https URL.
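The few-step sampling idea described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the names `informed_prior` and `euler_integrate` are hypothetical, short lists of floats stand in for wavelet coefficients, and a constant velocity field stands in for the learned network. It is not the authors' implementation; it only shows that when the flow from prior to target is close to a straight line, one or two Euler steps already land near the target.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def informed_prior(conditioning: Sequence[Vector]) -> Vector:
    """Element-wise mean of the conditioning inputs (stand-in for the mean of
    the available modalities in wavelet space)."""
    n = len(conditioning)
    return [sum(values) / n for values in zip(*conditioning)]


def euler_integrate(x0: Vector,
                    velocity: Callable[[Vector, float], Vector],
                    steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data: two conditioning "modalities" and a target that differs by an offset.
prior = informed_prior([[0.0, 2.0], [2.0, 4.0]])
target = [2.0, 5.0]
offset = [t - p for t, p in zip(target, prior)]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Constant field pointing from prior to target (a learned model would go here)."""
    return offset


print(euler_integrate(prior, toy_velocity, steps=1))
print(euler_integrate(prior, toy_velocity, steps=2))
```

Starting from noise instead of the informed prior would make the required displacement larger and less linear, which is one way to read the abstract's argument for why diffusion needs many more steps.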
[162] arXiv:2604.21147 [pdf, html, other]: Title: StarLoc: Pinpointing Transmitting LEO Satellites from a Single Passive Array

Ishani Janveja, Jida Zhang, Emerson Sie, Deepak Vasisht

Comments: To appear at The 24th Annual International Conference on Mobile Systems, Applications and Services (MobiSys '26), Cambridge, UK

Subjects: Networking and Internet Architecture (cs.NI)

This paper focuses on 3D localization of transmitting satellites in low Earth orbits (LEO). 3D localization of transmitters in low orbits is an important emerging problem for many applications such as spectrum management, orbit determination, and backup for GPS failures in orbit. We present StarLoc -- a system to geolocate transmitters in space using a combination of orbital modeling and a new interferometric 3D angle-of-arrival estimation technique. StarLoc's design relies on a unique insight -- the motion of satellites is governed by orbital dynamics and is therefore along a 2D manifold in a 3D space. This reduces the degrees of freedom in satellite motion and allows us to 3D-locate and track a satellite with just three antennas in a 2D plane. We evaluate the system using signal transmissions from 81 Starlink satellites. Our results show that StarLoc can estimate the 3D-angle of a satellite within 0.7 degrees and the orbital range within 5 km. Our dataset and implementation are available at: this https URL.
[163] arXiv:2604.21148 [pdf, html, other]: Title: "This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias

Siyu Liang, Alicia Beckford Wassink

Subjects: Computation and Language (cs.CL)

Studies on bias in Automatic Speech Recognition (ASR) tend to focus on reporting error rates for speakers of underrepresented dialects, yet less research examines the human side of system bias: how do system failures shape users' lived experiences, how do users feel about and react to them, and what emotional toll do these repeated failures exact? We conducted user experience studies across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) representing distinct English dialect communities. Our findings reveal that most participants report technologies fail to consider their cultural backgrounds and require constant adjustment to achieve basic functionality. Despite these experiences, participants maintain high expectations for ASR performance and express strong willingness to contribute to model improvement. Qualitative analysis of open-ended narratives exposes the deeper costs of these failures. Participants report frustration, annoyance, and feelings of inadequacy, yet the emotional impact extends beyond momentary reactions. Participants recognize that systems were not designed for them, yet often internalize failures as personal inadequacy despite this critical awareness. They perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to make failing systems functional. Meanwhile, their linguistic and cultural knowledge remains unrecognized by technologies that encode particular varieties as standard while rendering others marginal. These findings demonstrate that algorithmic fairness assessments based on accuracy metrics alone miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.
[164] arXiv:2604.21150 [pdf, other]: Title: The State of Scientific Poster Sharing and Reuse

Aydan Gasimova, Paapa Mensah-Kane, Gerard F. Blake, Sanjay Soundarajan, James ONeill, Bhavesh Patel

Subjects: Digital Libraries (cs.DL); Databases (cs.DB)

Scientific posters are one of the most common forms of scholarly communication and contain early-stage insights with potential to accelerate scientific discovery. We investigated where posters are shared, to what extent their sharing aligns with the FAIR principles, and how commonly they are reused. We identified 86 platforms hosting posters, many of which do not assign persistent identifiers. A total of 150k posters had been shared as of 2024 on the 43 platforms where we were able to count, which is relatively low. Looking in more detail at posters shared on Zenodo and Figshare, we found that repositories do not always support structured metadata critical for poster discovery, such as conference information, and that researchers often do not provide such metadata even when it is supported. We also observed that while there is some engagement with posters in terms of views and downloads, citing posters is not yet a common practice. We recommend that the scientific community encourage poster sharing and reuse and establish clear guidelines to make posters FAIR.
[165] arXiv:2604.21152 [pdf, html, other]: Title: Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles

Irti Haq, Belén Saldías

Comments: In The 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT '26), June 25--28, 2026, Montreal, Canada. ACM, New York, NY, USA, 32 pages

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical. However, it remains unclear whether these disparities arise from the explicitly stated identity itself or from the way identity is signaled. In real-world interactions, users' identity is often conveyed implicitly through a complex combination of various socio-linguistic factors. This study disentangles these signals by employing a factorial design with over 24,000 responses from two open-weight LLMs (Gemma-3-12B and Qwen-3-VL-8B), comparing prompts with explicitly announced user profiles against implicit dialect signals (e.g., AAVE, Singlish) across various sensitive domains. Our results uncover a unique paradox in LLM safety where users achieve "better" performance by sounding like a demographic than by stating they belong to it. Explicit identity prompts activate aggressive safety filters, increasing refusal rates and reducing semantic similarity compared to our reference text for Black users. In contrast, implicit dialect cues trigger a powerful "dialect jailbreak," reducing refusal probability to near zero while simultaneously achieving a greater level of semantic similarity to the reference texts compared to Standard American English prompts. However, this "dialect jailbreak" introduces a critical safety trade-off regarding content sanitization. We find that current safety alignment techniques are brittle and over-indexed on explicit keywords, creating a bifurcated user experience where "standard" users receive cautious, sanitized information while dialect speakers navigate a less sanitized, more raw, and potentially more hostile information landscape. This highlights a fundamental tension in alignment, between equitable and linguistic diversity, and underscores the need for safety mechanisms that generalize beyond explicit cues.
[166] arXiv:2604.21153 [pdf, html, other]: Title: Image-Based Malware Type Classification on MalNet-Image Tiny: Effects of Multi-Scale Fusion, Transfer Learning, Data Augmentation, and Schedule-Free Optimization

Ahmed A. Abouelkhaire, Waleed A. Yousef, Issa Traor

Subjects: Cryptography and Security (cs.CR)

This paper studies 43-class malware type classification on MalNet-Image Tiny, a public benchmark derived from Android APK files. The goal is to assess whether a compact image classifier benefits from four components evaluated in a controlled ablation: a feature pyramid network (FPN) for scale variation induced by resizing binaries of different lengths, ImageNet pretraining, lightweight augmentation through Mixup and TrivialAugment, and schedule-free AdamW optimization. All experiments use a ResNet18 backbone and the provided train/validation/test split. Reproducing the benchmark-style configuration yields macro-F1 (F1_macro) of 0.6510, consistent with the reported baseline of approximately 0.65. Replacing the optimizer with schedule-free AdamW and using unweighted cross-entropy increases F1_macro to 0.6535 in 10 epochs, compared with 96 epochs for the reproduced baseline. The best configuration combines pretraining, Mixup, TrivialAugment, and FPN, reaching F1_macro=0.6927, P_macro=0.7707, AUC_macro=0.9556, and L_test=0.8536. The ablation indicates that the largest gains in F1_macro arise from pretraining and augmentation, whereas FPN mainly improves P_macro, AUC_macro, and L_test in the strongest configuration.
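For readers unfamiliar with the headline metric, macro-F1 is the unweighted mean of per-class F1 scores, so rare malware types count as much as common ones. The sketch below is a plain-Python illustration under that standard definition; the function name `macro_f1` and the toy labels are hypothetical and unrelated to the MalNet-Image Tiny data.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either sequence.
    A class with no true positives, false positives, or false negatives scores 0.0."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean of P and R.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy three-class example with one misclassified sample.
labels = [0, 0, 1, 1, 2]
preds = [0, 1, 1, 1, 2]
print(round(macro_f1(labels, preds), 4))
```

Because each class contributes equally, a gain concentrated in a few minority classes can move macro-F1 noticeably even when overall accuracy barely changes.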
[167] arXiv:2604.21154 [pdf, html, other]: Title: Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction

Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das

Comments: 3 pages, 2 figures, submitted to ICDH IEEE conference

Subjects: Artificial Intelligence (cs.AI)

At-home physiotherapy compliance remains critically low due to a lack of personalized supervision and dynamic feedback. Existing digital health solutions rely on static, pre-recorded video libraries or generic 3D avatars that fail to account for a patient's specific injury limitations or home environment. In this paper, we propose a novel Multi-Agent System (MAS) architecture that leverages Generative AI and computer vision to close the tele-rehabilitation loop. Our framework consists of four specialized micro-agents: a Clinical Extraction Agent that parses unstructured medical notes into kinematic constraints; a Video Synthesis Agent that utilizes foundational video generation models to create personalized, patient-specific exercise videos; a Vision Processing Agent for real-time pose estimation; and a Diagnostic Feedback Agent that issues corrective instructions. We present the system architecture, detail the prototype pipeline using Large Language Models and MediaPipe, and outline our clinical evaluation plan. This work demonstrates the feasibility of combining generative media with agentic autonomous decision-making to scale personalized patient care safely and effectively.
[168] arXiv:2604.21155 [pdf, html, other]: Title: Multi-Agent Empowerment and Emergence of Complex Behavior in Groups

Tristan Shah, Ilya Nemenman, Daniel Polani, Stas Tiomkin

Comments: 11 pages

Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Intrinsic motivations, i.e., behavioral incentives that are not engineered but emerge from the interaction of an agent with its surroundings, are receiving increasing attention. In this work we study the emergence of behaviors driven by one such incentive, empowerment, specifically in the context of more than one agent. We formulate a principled extension of empowerment to the multi-agent setting, and demonstrate its efficient calculation. We observe that this intrinsic motivation gives rise to characteristic modes of group organization in two qualitatively distinct environments: a pair of agents coupled by a tendon, and a controllable Vicsek flock. This demonstrates the potential of intrinsic motivations such as empowerment to drive not only the behavior of individual agents but also higher levels of behavioral organization at scale.
[169] arXiv:2604.21159 [pdf, html, other]: Title: Adaptive Instruction Composition for Automated LLM Red-Teaming

Jesse Zymet, Andy Luo, Swapnil Shinde, Sahil Wadhwa, Emily Chen

Comments: Accepted to ACL 2026 Main Conference

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with identifying effective strategies through trial and error, resulting in a semantically limited range of successes. Another approach discovers diverse attacks by combining crowdsourced harmful queries and tactics into instructions for the attacker, but does so at random, limiting effectiveness. This article introduces a novel framework, Adaptive Instruction Composition, that combines crowdsourced texts according to an adaptive mechanism trained to jointly optimize effectiveness with diversity. We use reinforcement learning to balance exploration with exploitation in a combinatorial space of instructions to guide the attacker toward diverse generations tailored to target vulnerabilities. We demonstrate that our approach substantially outperforms random combination on a set of effectiveness and diversity metrics, even under model transfer. Further, we show that it surpasses a host of recent adaptive approaches on Harmbench. We employ a lightweight neural contextual bandit that adapts to contrastive embedding inputs, and provide ablations suggesting that the contrastive pretraining enables the network to rapidly generalize and scale to the massive space as it learns.
[170] arXiv:2604.21160 [pdf, html, other]: Title: Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment

Jingkun Chen, Ruoshi Xu, Mingqi Gao, Shengda Luo, Jungong Han

Comments: 10 pages, 3 figures, 5 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Point-Vision-Language Models promise to empower embodied agents with executable spatial reasoning, yet they frequently succumb to geometric hallucination where predicted 3D structures contradict the observed 2D reality. We identify a key cause of this failure not as a representation bottleneck but as a structural misalignment in reinforcement learning, where sparse geometric tokens are drowned out by noisy and broadcasted sequence-level rewards. To resolve this causal dilution, we propose Geometric Reward Credit Assignment, a framework that disentangles holistic supervision into field-specific signals and routes them exclusively to their responsible token spans. This mechanism transforms vague feedback into precise gradient updates and effectively turns generic policy optimization into targeted structural alignment. Furthermore, we internalize physical constraints via a Reprojection-Consistency term which serves as a cross-modal verifier to penalize physically impossible geometries. Validated on a calibrated benchmark derived from ShapeNetCore, our approach bridges the reliability gap by boosting 3D KPA from 0.64 to 0.93, increasing 3D bounding box intersection over union to 0.686, and raising reprojection consistency scores to 0.852. Crucially, these gains are achieved while maintaining robust 2D localization performance, marking a meaningful step from plausible textual outputs toward physically verifiable spatial predictions.
[171] arXiv:2604.21164 [pdf, html, other]: Title: MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control

Jialong Mai, Xiaofen Xing, Xiangmin Xu

Subjects: Sound (cs.SD)

Fine-grained local timing control is still absent from modern text-to-speech systems: existing approaches typically provide only utterance-level duration or global speaking-rate control, while precise token-level timing manipulation remains unavailable. To the best of our knowledge, MAGIC-TTS is the first TTS model with explicit local timing control over token-level content duration and pause. MAGIC-TTS is enabled by explicit token-level duration conditioning, carefully prepared high-confidence duration supervision, and training mechanisms that correct zero-value bias and make the model robust to missing local controls. On our timing-control benchmark, MAGIC-TTS substantially improves token-level duration and pause following over spontaneous synthesis. Even when no timing control is provided, MAGIC-TTS maintains natural high-quality synthesis. We further evaluate practical local editing with a scenario-based benchmark covering navigation guidance, guided reading, and accessibility-oriented code reading. In this setting, MAGIC-TTS realizes a reproducible uniform-timing baseline and then moves the edited regions toward the requested local targets with low mean bias. These results show that explicit fine-grained controllability can be implemented effectively in a high-quality TTS system and can support realistic local timing-editing applications.
[172] arXiv:2604.21169 [pdf, html, other]: Title: Position Paper: Denial-of-Service Against Multi-Round Transaction Simulation

Yuzhe Tang, Yibo Wang, Wanning Ding, Jiaqi Chen, Taesoo Kim

Subjects: Cryptography and Security (cs.CR)

In Ethereum, transaction-bundling services, such as Flashbots Bundles, are a critical component of block builders and are widely used by MEV searchers. Disrupting bundling services can degrade searcher experience and reduce builder revenue. Despite extensive prior study, existing denial-of-service attack designs are ineffective against bundling services due to their unique multi-round execution model.
This paper studies the open problem of asymmetric denial-of-service against bundling services. We develop evasive, risk-free, and low-cost DoS attacks on Flashbots' bundling service, the only open-source bundling service known to us. Our attacks exploit inter-transaction dependencies through contract state to achieve evasiveness, and abuse bundling-specific features, such as atomic block inclusion, to significantly reduce both capital and operational costs of the attack.
Experimental results show that our attacks achieve high success rates, substantially reduce builders' revenue, and slow block production. We further propose mitigation strategies for the identified risks.
[173] arXiv:2604.21172 [pdf, html, other]: Title: TAPO-Description Logic for Information Behavior: Refined OBoxes, Inference, and Categorical Semantics

Takao Inoué

Comments: 23 pages, 2 figures. Substantially expanded version of arXiv:2602.17242; adds a guard-judgment layer, refined OBoxes, core inference rules, categorical semantics, sheaf-theoretic refinement, and a browsing-theory appendix

Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

This paper develops a refined version of TAPO-description logic for the analysis of information behavior. The framework is treated not as a single homogeneous object logic, but as a layered formalism consisting of a static descriptive layer (TBox/ABox), a procedural layer (PBox), and an oracle-sensitive layer (OBox). To make this architecture mathematically explicit, we introduce a metalevel guard-judgment layer governing procedural branching and iteration. On this basis we formulate a core inference system for TAPO-description logic, covering static TBox/ABox reasoning, guarded procedural transition in the PBox, and validated external import in the OBox. We then give a categorical semantics for the resulting framework and indicate its sheaf-theoretic refinement. The theory is illustrated by examples of information-seeking behavior, including simple search behavior and review-sensitive ordering behavior in a curry restaurant. The aim is to treat not only static knowledge representation but also hesitation, external consultation, and action-guiding update within a unified logical setting.
[174] arXiv:2604.21174 [pdf, html, other]: Title: Scaling of Gaussian Kolmogorov--Arnold Networks

Amir Noorizadegan, Sifan Wang

Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP)

The Gaussian scale parameter $\epsilon$ is central to the behavior of Gaussian Kolmogorov--Arnold Networks (KANs), yet its role in deep edge-based architectures has not been studied systematically. In this paper, we investigate how $\epsilon$ affects Gaussian KANs through first-layer feature geometry, conditioning, and approximation behavior. Our central observation is that scale selection is governed primarily by the first layer, since it is the only layer constructed directly on the input domain and any loss of distinguishability introduced there cannot be recovered by later layers.
From this viewpoint, we analyze the first-layer feature matrix and identify a practical operating interval, \[ \epsilon \in \left[\frac{1}{G-1},\frac{2}{G-1}\right], \] where $G$ denotes the number of Gaussian centers. For the standard shared-center Gaussian KAN used in current practice, we interpret this interval not as a universal optimality result, but as a stable and effective design rule, and validate it through brute-force sweeps over $\epsilon$ across function-approximation problems with different collocation densities, grid resolutions, network architectures, and input dimensions, as well as a physics-informed Helmholtz problem. We further show that this range is useful for fixed-scale selection, variable-scale constructions, constrained training of $\epsilon$, and efficient scale search using early training MSE. Finally, using a matched Chebyshev reference, we show that a properly scaled Gaussian KAN can already be competitive in accuracy relative to another standard KAN basis. In this way, the paper positions scale selection as a practical design principle for Gaussian KANs rather than as an ad hoc hyperparameter choice.
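As a small worked example of the interval above, the sketch below evaluates $[1/(G-1),\, 2/(G-1)]$ for a few values of $G$. The function name is hypothetical and not taken from the paper. Reading $1/(G-1)$ as the spacing of $G$ uniformly placed centers on a unit interval is an assumption, since the abstract does not state the center layout.

```python
def gaussian_kan_scale_interval(num_centers):
    """Return (lower, upper) = (1/(G-1), 2/(G-1)) for G Gaussian centers.

    Hypothetical helper: it only evaluates the interval quoted in the
    abstract and makes no claim about how the paper selects epsilon inside it.
    """
    if num_centers < 2:
        raise ValueError("the interval is defined only for G >= 2")
    lower = 1.0 / (num_centers - 1)
    upper = 2.0 / (num_centers - 1)
    return lower, upper


if __name__ == "__main__":
    for G in (5, 11, 21):
        lower, upper = gaussian_kan_scale_interval(G)
        print(f"G={G}: epsilon in [{lower:.3f}, {upper:.3f}]")
```

For $G = 5$ the interval is $[0.25, 0.5]$; it shrinks as the number of centers grows.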
[175] arXiv:2604.21175 [pdf, html, other]: Title: Graph Neural Network-Informed Predictive Flows for Faster Ford-Fulkerson and PAC-Learnability

Eleanor Wiesler, Trace Baxley

Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)

We propose a learning-augmented framework for accelerating max-flow computation and image segmentation by integrating Graph Neural Networks (GNNs) with the Ford-Fulkerson algorithm. Rather than predicting initial flows, our method learns edge importance probabilities to guide augmenting path selection. We introduce a Message Passing GNN (MPGNN) that jointly learns node and edge embeddings through coupled updates, capturing both global structure and local flow dynamics such as residual capacity and bottlenecks.
Given an input image, we propose a method to construct a grid-based flow network with source and sink nodes, extract features, and perform a single GNN inference to assign edge probabilities reflecting their likelihood of belonging to high-capacity cuts. These probabilities are stored in a priority queue and used to guide a modified Ford-Fulkerson procedure, prioritizing augmenting paths via an Edmonds-Karp-style search with bottleneck-aware tie-breaking. This avoids repeated inference over residual graphs while leveraging learned structure throughout optimization.
We further introduce a bidirectional path construction strategy centered on high-probability edges and provide a theoretical framework relating prediction quality to efficiency via a weighted permutation distance metric. Our method preserves max-flow/min-cut optimality while reducing the number of augmentations in practice. We also outline a hybrid extension combining flow warm-starting with edge-priority prediction, establishing a foundation for learning-guided combinatorial optimization in image segmentation.
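The optimality claim can be illustrated with a minimal sketch: an Edmonds-Karp-style search in which each node's neighbors are explored in descending order of an edge score. This is not the authors' procedure (it omits the priority queue, the bidirectional path construction, and the bottleneck-aware tie-breaking), and the `priority` scores are hand-written stand-ins for GNN outputs. It only shows that reordering edge exploration changes which augmenting paths are found, not the final max-flow value.

```python
from collections import deque


def max_flow_with_edge_priorities(capacity, priority, source, sink):
    """Max flow via BFS augmenting paths; neighbors are tried in priority order.

    capacity: dict mapping (u, v) -> non-negative capacity.
    priority: dict mapping (u, v) -> score (hypothetical stand-in for a
              learned edge probability); missing edges default to 0.
    Returns (flow_value, number_of_augmentations).
    """
    residual = {}
    neighbors = {}
    for (u, v), cap in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + cap
        residual.setdefault((v, u), 0)  # reverse edge for undoing flow
        for a, b in ((u, v), (v, u)):
            adjacent = neighbors.setdefault(a, [])
            if b not in adjacent:
                adjacent.append(b)

    # Stable sort: higher-priority edges are explored first.
    order = {
        u: sorted(vs, key=lambda v, u=u: -priority.get((u, v), 0.0))
        for u, vs in neighbors.items()
    }

    flow, augmentations = 0, 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in order.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, augmentations

        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck
        augmentations += 1


if __name__ == "__main__":
    cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
    scores = {("s", "b"): 0.9, ("b", "t"): 0.8, ("s", "a"): 0.4}
    print(max_flow_with_edge_priorities(cap, {}, "s", "t")[0])
    print(max_flow_with_edge_priorities(cap, scores, "s", "t")[0])
```

Both calls report the same flow value, because any sequence of shortest augmenting paths terminates at a maximum flow; the scores can only influence how many augmentations are needed.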
[176] arXiv:2604.21182 [pdf, html, other]: Title: WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images

Yuki Fujimura, Takahiro Kushida, Kazuya Kitano, Takuya Funatomi, Yasuhiro Mukaigawa

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose WildSplatter, a feed-forward 3D Gaussian Splatting (3DGS) model for unconstrained images with unknown camera parameters and varying lighting conditions. 3DGS is an effective scene representation that enables high-quality, real-time rendering; however, it typically requires iterative optimization and multi-view images captured under consistent lighting with known camera parameters. WildSplatter is trained on unconstrained photo collections and jointly learns 3D Gaussians and appearance embeddings conditioned on input images. This design enables flexible modulation of Gaussian colors to represent significant variations in lighting and appearance. Our method reconstructs 3D Gaussians from sparse input views in under one second, while also enabling appearance control under diverse lighting conditions. Experimental results demonstrate that our approach outperforms existing pose-free 3DGS methods on challenging real-world datasets with varying illumination.
[177] arXiv:2604.21188 [pdf, other]: Title: Physically Unclonable Functions for Secure IoT Authentication and Hardware-Anchored AI Model Integrity

Maryam Taghi Zadeh, Mohsen Ahmadi

Subjects: Cryptography and Security (cs.CR)

The rapid integration of artificial intelligence (AI) into Internet of Things (IoT) and edge computing systems has intensified the need for robust, hardware-rooted trust mechanisms capable of ensuring device authenticity and AI model integrity under strict resource and security constraints. This survey reviews and synthesizes existing literature on hardware-rooted trust mechanisms for AI-enabled IoT systems. It systematically examines and compares representative trust anchor mechanisms, including Trusted Platform Module (TPM)-based measurement and attestation, silicon and FPGA-based Physical Unclonable Functions (PUFs), hybrid container-aware hardware roots of trust, and software-only security approaches. The analysis highlights how hardware-rooted solutions generally provide stronger protection against physical tampering and device cloning compared to software-only approaches, particularly in adversarial and physically exposed environments, while hybrid designs extend hardware trust into runtime and containerized environments commonly used in modern edge deployments. By evaluating trade-offs among security strength, scalability, cost, and deployment complexity, the study shows that PUF-based and hybrid trust anchors offer a promising balance for large-scale, AI-enabled IoT systems, whereas software-only trust mechanisms remain insufficient in adversarial and physically exposed settings. The presented comparison aims to clarify current design challenges and guide future development of trustworthy AI-enabled IoT platforms.
[178] arXiv:2604.21189 [pdf, html, other]: Title: Full-Body Dynamic Safety for Robot Manipulators: 3D Poisson Safety Functions for CBF-Based Safety Filters

Meg Wilkinson, Gilbert Bahati, Ryan M. Bena, Emily Fourney, Joel W. Burdick, Aaron D. Ames

Subjects: Robotics (cs.RO)

Collision avoidance for robotic manipulators requires enforcing full-body safety constraints in high-dimensional configuration spaces. Control Barrier Function (CBF) based safety filters have proven effective in enabling safe behaviors, but enforcing the high number of constraints needed for safe manipulation leads to theoretic and computational challenges. This work presents a framework for full-body collision avoidance for manipulators in dynamic environments by leveraging 3D Poisson Safety Functions (PSFs). In particular, given environmental occupancy data, we sample the manipulator surface at a prescribed resolution and shrink free space via a Pontryagin difference according to this resolution. On this buffered domain, we synthesize a globally smooth CBF by solving Poisson's equation, yielding a single safety function for the entire environment. This safety function, evaluated at each sampled point, yields task-space CBF constraints enforced by a real-time safety filter via a multi-constraint quadratic program. We prove that keeping the sample points safe in the buffered region guarantees collision avoidance for the entire continuous robot surface. The framework is validated on a 7-degree-of-freedom manipulator in dynamic environments.
[179] arXiv:2604.21190 [pdf, html, other]: Title: SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning

Chan Yeong Hwang, Miso Choi, Sunghyun On, Jinkyu Kim, Jungbeom Lee

Comments: Technical report

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Understanding visual scenes requires not only recognizing objects but also reasoning about their spatial relationships. Unlike general vision-language tasks, spatial reasoning requires integrating multiple inductive biases, such as 2D appearance cues, depth signals, and geometric constraints, whose reliability varies across contexts. This suggests that effective spatial reasoning requires \emph{spatial adaptability}: the ability to flexibly coordinate different reasoning strategies depending on the input. However, most existing approaches rely on a single reasoning pipeline that implicitly learns a fixed spatial prior, limiting their ability to adapt under distribution changes. Multi-agent systems offer a promising alternative by aggregating diverse reasoning trajectories, but prior attempts in spatial reasoning primarily employ homogeneous agents, restricting the diversity of inductive biases they can leverage. In this work, we introduce \textbf{\textsc{SpatiO}}, a heterogeneous multi-agent framework for spatial reasoning that coordinates multiple vision-language specialists with complementary inductive biases. To enable effective collaboration, we propose \textbf{Test-Time Orchestration (TTO)}, an optimization mechanism that dynamically evaluates and reweights agents based on their observed reliability during inference, without modifying model parameters. Extensive experiments on diverse spatial reasoning benchmarks, including 3DSRBench, STVQA-7k, CV-Bench, and Omni3D-Bench, demonstrate that \textsc{SpatiO} consistently improves spatial reasoning performance over both closed-source and open-source baselines.
[180] arXiv:2604.21191 [pdf, other]: Title: Prefix Parsing is Just Parsing

Clemente Pasti, Andreas Opedal, Timothy J. O'Donnell, Ryan Cotterell, Tim Vieira

Comments: To appear at ACL 2026

Subjects: Computation and Language (cs.CL)

Prefix parsing asks whether an input prefix can be extended to a complete string generated by a given grammar. In the weighted setting, it also provides prefix probabilities, which are central to context-free language modeling, psycholinguistic analysis, and syntactically constrained generation from large language models. We introduce the prefix grammar transformation, an efficient reduction of prefix parsing to ordinary parsing. Given a grammar, our method constructs another grammar that generates exactly the prefixes of its original strings. Prefix parsing is then solved by applying any ordinary parsing algorithm on the transformed grammar without modification. The reduction is both elegant and practical: the transformed grammar is only a small factor larger than the input, and any optimized implementation can be used directly, eliminating the need for bespoke prefix-parsing algorithms. We also present a strategy, based on algorithmic differentiation, for computing the next-token weight vector, i.e., the prefix weights of all one-token extensions, enabling efficient prediction of the next token. Together, these contributions yield a simple, general, and efficient framework for prefix parsing.
[181] arXiv:2604.21192 [pdf, html, other]: Title: How VLAs (Really) Work In Open-World Environments

Amir Rasouli, Yangzheng Wu, Zhiyuan Li, Rui Heng Yang, Xuan Zhao, Charles Eret, Sajjad Pakdamansavoji

Comments: 8 pages, 7 figures, 2 tables

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Vision-language-action models (VLAs) have been extensively used in robotics applications, achieving great success in various manipulation problems. More recently, VLAs have been used in long-horizon tasks and evaluated on benchmarks, such as BEHAVIOR1K (B1K), for solving complex household chores. The common metric for measuring progress in such benchmarks is success rate or partial score based on satisfaction of progress-agnostic criteria, meaning only the final states of the objects are considered, regardless of the events that lead to such states. In this paper, we argue that such evaluation protocols say little about the safety aspects of operation and can potentially exaggerate reported performance, undermining core challenges for future real-world deployment. To this end, we conduct a thorough analysis of state-of-the-art models on the B1K Challenge and evaluate policies in terms of robustness via reproducibility and consistency of performance, safety aspects of policy operation, task awareness, and key elements leading to the incompletion of tasks. We then propose evaluation protocols to capture safety violations to better measure the true performance of the policies in more complex and interactive scenarios. Finally, we discuss the limitations of the existing VLAs and motivate future research.
[182] arXiv:2604.21193 [pdf, html, other]: Title: Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

Vipula Rawte, Ryan Rossi, Franck Dernoncourt, Nedim Lipka

Subjects: Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and verifiability are paramount. In this paper, we introduce DAVinCI, a Dual Attribution and Verification framework designed to enhance the factual reliability and interpretability of LLM outputs. DAVinCI operates in two stages: (i) it attributes generated claims to internal model components and external sources; (ii) it verifies each claim using entailment-based reasoning and confidence calibration. We evaluate DAVinCI across multiple datasets, including FEVER and CLIMATE-FEVER, and compare its performance against standard verification-only baselines. Our results show that DAVinCI significantly improves classification accuracy, attribution precision, recall, and F1-score by 5-20%. Through an extensive ablation study, we isolate the contributions of evidence span selection, recalibration thresholds, and retrieval quality. We also release a modular DAVinCI implementation that can be integrated into existing LLM pipelines. By bridging attribution and verification, DAVinCI offers a scalable path to auditable, trustworthy AI systems. This work contributes to the growing effort to make LLMs not only powerful but also accountable.
[183] arXiv:2604.21197 [pdf, html, other]: Title: Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Guilin Deng, Silong Chen, Yuchuan Luo, Yi Liu, Songlei Wang, Zhiping Cai, Lin Liu, Xiaohua Jia, Shaojing Fu

Comments: This is the full version (including complete appendices and supplementary materials) of the paper accepted for publication at the 2026 IEEE Symposium on Security and Privacy

Subjects: Machine Learning (cs.LG)

Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges of limited resources and privacy concerns. Despite data localization, shared gradients can still expose sensitive information through membership inference attacks (MIAs). However, FedLLMs' unique properties, i.e., massive parameter scales, rapid convergence, and sparse, non-orthogonal gradients, render existing MIAs ineffective. To address this gap, we propose ProjRes, the first projection residuals-based passive MIA tailored for FedLLMs. ProjRes leverages hidden embedding vectors as sample representations and analyzes their projection residuals on the gradient subspace to uncover the intrinsic link between gradients and inputs. It requires no shadow models, auxiliary classifiers, or historical updates, ensuring efficiency and robustness. Experiments on four benchmarks and four LLMs show that ProjRes achieves near 100% accuracy, outperforming prior methods by up to 75.75%, and remains effective even under strong differential privacy defenses. Our findings reveal a previously overlooked privacy vulnerability in FedLLMs and call for a re-examination of their security assumptions. Our code and data are available at this https URL.
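The link between gradients and inputs can be seen in a toy single linear layer, where the weight gradient is a sum of outer products of output errors and input embeddings, so every gradient row lies in the span of the batch's embeddings. The sketch below is not ProjRes itself: the abstract does not describe its scoring rule, and the function name, vectors, and error terms are all made up for illustration. It only computes the norm of an embedding's component orthogonal to the gradient's row space.

```python
def projection_residual(vector, basis_vectors, tol=1e-9):
    """Norm of the part of `vector` orthogonal to span(basis_vectors).

    Uses Gram-Schmidt to orthonormalize the basis, then subtracts the
    projection of `vector` onto each orthonormal direction.
    """
    orthonormal = []
    for b in basis_vectors:
        w = list(b)
        for q in orthonormal:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:  # skip directions already covered by earlier vectors
            orthonormal.append([wi / norm for wi in w])
    r = list(vector)
    for q in orthonormal:
        coeff = sum(ri * qi for ri, qi in zip(r, q))
        r = [ri - coeff * qi for ri, qi in zip(r, q)]
    return sum(ri * ri for ri in r) ** 0.5


def linear_layer_gradient_rows(errors, embeddings):
    """Rows of sum_i outer(error_i, embedding_i) for a toy linear layer."""
    out_dim, in_dim = len(errors[0]), len(embeddings[0])
    return [
        [sum(e[k] * h[j] for e, h in zip(errors, embeddings)) for j in range(in_dim)]
        for k in range(out_dim)
    ]


if __name__ == "__main__":
    # Hypothetical batch of two "member" embeddings and their output errors.
    members = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
    errors = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    grad_rows = linear_layer_gradient_rows(errors, members)

    non_member = [0.0, 0.0, 0.0, 1.0]
    print("member residual:    ", projection_residual(members[1], grad_rows))
    print("non-member residual:", projection_residual(non_member, grad_rows))
```

In this toy setting a member embedding has a residual near zero while an embedding outside the batch's span does not; how such residuals behave at LLM scale, and how they are thresholded, is the subject of the paper rather than of this sketch.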
[184] arXiv:2604.21198 [pdf, html, other]: Title: A Probabilistic Framework for Improving Dense Object Detection in Underwater Image Data via Annealing-Based Data Augmentation

Eleanor Wiesler, Trace Baxley

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detection models typically perform well on images captured in controlled environments with stable lighting, water clarity, and viewpoint, but their performance degrades substantially in real-world underwater settings characterized by high variability and frequent occlusions. In this work, we address these challenges by introducing a novel data augmentation framework designed to improve robustness in dense and unconstrained underwater scenes. Using the DeepFish dataset, which contains images of fish in natural environments, we first generate bounding box annotations from provided segmentation masks to construct a custom detection dataset. We then propose a pseudo-simulated annealing-based augmentation algorithm, inspired by the copy-paste strategy of Deng et al. [1], to synthesize realistic crowded fish scenarios. Our approach improves spatial diversity and object density during training, enabling better generalization to complex scenes. Experimental results show that our method significantly outperforms a baseline YOLOv10 model, particularly on a challenging test set of manually annotated images collected from live-stream footage in the Florida Keys. These results demonstrate the effectiveness of our augmentation strategy for improving detection performance in dense, real-world underwater environments.
[185] arXiv:2604.21199 [pdf, other]: Title: ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response

Stephan Xie, Ben Cohen, Mononito Goswami, Junhong Shen, Emaad Khwaja, Chenghao Liu, David Asker, Othmane Abou-Amal, Ameet Talwalkar

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Time series question-answering (TSQA), in which we ask natural language questions to infer and reason about properties of time series, is a promising yet underexplored capability of foundation models. In this work, we present ARFBench, a TSQA benchmark that evaluates the understanding of multimodal foundation models (FMs) on time series anomalies prevalent in software incident data. ARFBench consists of 750 questions across 142 time series and 5.38M data points from 63 production incidents sourced exclusively from internal telemetry at Datadog. We evaluate leading proprietary and open-source LLMs, VLMs, and time series FMs and observe that frontier VLMs perform markedly better than existing baselines; the leading model (GPT-5) achieves 62.7% accuracy and 51.9% F1. We next demonstrate the promise of specialized multimodal approaches. We develop a novel TSFM + VLM hybrid prototype, post-trained on a small set of synthetic and real data, that yields overall F1 and accuracy comparable to frontier models. Lastly, we find that models and human domain experts exhibit complementary strengths. We define a model-expert oracle, a best-of-2 oracle selector over model and expert answers, yielding 82.8% F1 and 87.2% accuracy and establishing a new superhuman frontier for future TSQA models. The benchmark is available at this https URL.
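One plausible reading of the best-of-2 oracle selector is that a question counts as solved when either the model or the expert answers it correctly. The sketch below computes accuracy under that assumption on made-up correctness flags. The function name and data are hypothetical, and the benchmark's actual scoring (including its F1 computation) may differ.

```python
def oracle_accuracy(model_correct, expert_correct):
    """Accuracy of a best-of-2 oracle: a question is a hit if either is right.

    model_correct, expert_correct: equal-length lists of booleans, one per
    question (hypothetical per-question correctness flags).
    """
    if len(model_correct) != len(expert_correct):
        raise ValueError("both lists must cover the same questions")
    if not model_correct:
        raise ValueError("need at least one question")
    hits = sum(1 for m, e in zip(model_correct, expert_correct) if m or e)
    return hits / len(model_correct)


if __name__ == "__main__":
    model = [True, True, False, False, True]
    expert = [True, False, True, False, False]
    print("model: ", sum(model) / len(model))
    print("expert:", sum(expert) / len(expert))
    print("oracle:", oracle_accuracy(model, expert))
```

The oracle can exceed both individual scores only when the two make errors on different questions, which is what complementary strengths means here.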
[186] arXiv:2604.21204 [pdf, html, other]: Title: On Reasoning Behind Next Occupation Recommendation

Shan Dong, Palakorn Achananuparp, Hieu Hien Mai, Lei Wang, Yao Lu, Ee-Peng Lim

Comments: Accepted to PAKDD 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

In this work, we develop a novel reasoning approach to enhance the performance of large language models (LLMs) in future occupation prediction. In this approach, a reason generator first derives a ``reason'' for a user from the user's past education and career history. The reason summarizes the user's preference and is used as the input of an occupation predictor to recommend the user's next occupation. This two-step occupation prediction approach is, however, non-trivial as LLMs are not aligned with career paths or the unobserved reasons behind each occupation decision. We therefore propose to fine-tune LLMs to improve their reasoning and occupation prediction performance. We first derive high-quality oracle reasons, as measured by factuality, coherence and utility criteria, using an LLM-as-a-Judge. These oracle reasons are then used to fine-tune small LLMs to perform reason generation and next occupation prediction. Our extensive experiments show that: (a) our approach effectively enhances LLMs' accuracy in next occupation prediction, making them comparable to fully supervised methods and outperforming unsupervised methods; (b) a single LLM fine-tuned to perform reason generation and occupation prediction outperforms two LLMs fine-tuned to perform the tasks separately; and (c) the next occupation prediction accuracy depends on the quality of generated reasons. Our code is available at this https URL.
[187] arXiv:2604.21205 [pdf, html, other]: Title: When Constraints Limit and Inspire: Characterizing Presentation Authoring Practices for Evolving Narratives

Linxiu Zeng, Emily Kuang, Jian Zhao

Comments: 17 pages, 12 Figures. To appear in DIS 2026

Subjects: Human-Computer Interaction (cs.HC)

Authoring presentation slides involves navigating contextual constraints that shape how content is structured, adapted, and reused. While prior work frames constraints as limitations, little is known about how presenters actively reason about them. We conducted a formative study with ten presenters to examine how constraints emerge, are interpreted, and influence authoring decisions, leading to the Constraint-based Multi-session Presentation Authoring (CMPA) framework. CMPA treats time, audience, and communicative intent as key constraints shaping authoring. We instantiated CMPA in ReSlide, a research prototype for constraint-aware slide creation and reuse, and conducted two user studies on (1) single-session behaviors and (2) multi-session workflows. Compared to a baseline tool, ReSlide helped presenters treat constraints as active design drivers that guide narrative construction. The second study further shows how presenters flexibly reuse and adapt content across authoring cycles as constraints evolve. We then propose design implications for future constraint-aware presentation tools.
[188] arXiv:2604.21209 [pdf, other]: Title: Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

Yanan Wang, Yong Ge

Comments: Accepted to Information Systems Research (ISR). This is a preliminary version

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Online reviews have played a pivotal role in consumers' decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains unaddressed due to the considerable human labor required to respond to the rapid growth of online reviews. While generative AI models have achieved remarkable success in a range of tasks, they are general-purpose and may not align well with domain-specific human preferences. To tailor these general generative AI models to domain-specific applications, finetuning is commonly employed. Nevertheless, several challenges persist in finetuning with domain-specific data, including hallucinations, difficulty in representing domain-specific human preferences, and over-conservatism in offline policy optimization. To address these challenges, we propose a novel preference finetuning method to align an LLM with domain-specific human preferences for generating online review responses. Specifically, we first identify the source of hallucination and propose an effective context augmentation approach to mitigate LLM hallucination. To represent human preferences, we propose a novel theory-driven preference finetuning approach that automatically constructs human preference pairs in the online review domain. Additionally, we propose a curriculum learning approach to further enhance preference finetuning. To overcome the challenge of over-conservatism in existing offline preference finetuning methods, we propose a novel density estimation-based support constraint method to relax the conservatism, and we mathematically prove its superior theoretical guarantees. Extensive evaluations substantiate the superiority of our proposed preference finetuning method.
[189] arXiv:2604.21211 [pdf, html, other]: Title: Subject-level Inference for Realistic Text Anonymization Evaluation

Myeong Seok Oh, Dong-Yun Kim, Hanseok Oh, Chaean Kang, Joeun Kang, Xiaonan Wang, Hyunjung Park, Young Cheol Jung, Hansaem Kim

Comments: Accepted at ACL 2026

Subjects: Computation and Language (cs.CL)

Current text anonymization evaluation relies on span-based metrics that fail to capture what an adversary could actually infer, and assumes a single data subject, ignoring multi-subject scenarios. To address these limitations, we present SPIA (Subject-level PII Inference Assessment), the first benchmark that shifts the unit of evaluation from text spans to individuals, comprising 675 documents across legal and online domains with novel subject-level protection metrics. Extensive experiments show that even when over 90% of PII spans are masked, subject-level inference protection drops as low as 33%, leaving the majority of personal information recoverable through contextual inference. Furthermore, target-subject-focused anonymization leaves non-target subjects substantially more exposed than the target subject. We show that subject-level inference-based evaluation is essential for ensuring safe text anonymization in real-world settings.
[190] arXiv:2604.21214 [pdf, other]: Title: SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL

Sepideh Abedini, M. Tamer Özsu

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Text-to-SQL models have significantly improved with the adoption of Large Language Models (LLMs), leading to their increasing use in real-world applications. Although many benchmarks exist for evaluating the performance of text-to-SQL models, they often rely on a single aggregate score, lack evaluation under realistic settings, and provide limited insight into model behaviour across different query types. In this work, we present SQLyzr, a comprehensive benchmark and evaluation platform for text-to-SQL models. SQLyzr incorporates a diverse set of evaluation metrics that capture multiple aspects of generated queries, while enabling more realistic evaluation through workload alignment with real-world SQL usage patterns and database scaling. It further supports fine-grained query classification, error analysis, and workload augmentation, allowing users to better diagnose and improve text-to-SQL models. This demonstration showcases these capabilities through an interactive experience. Through SQLyzr's graphical interface, users can customize evaluation settings, analyze fine-grained reports, and explore additional features of the platform. We envision that SQLyzr facilitates the evaluation and iterative improvement of text-to-SQL models by addressing key limitations of existing benchmarks. The source code of SQLyzr is available at this https URL.
[191] arXiv:2604.21215 [pdf, html, other]: Title: The Recurrent Transformer: Greater Effective Depth and Efficient Decoding

Costin-Andrei Oncescu, Depen Morwani, Samy Jelassi, Alexandru Meterez, Mujin Kwun, Sham Kakade

Subjects: Machine Learning (cs.LG)

Transformers process tokens in parallel but are temporally shallow: at position $t$, each layer attends to key-value pairs computed based on the previous layer, yielding a depth capped by the number of layers. Recurrent models offer unbounded temporal depth but suffer from optimization instability and historically underutilize modern accelerators. We introduce the Recurrent Transformer, a simple architectural change where each layer attends to key-value pairs computed off its own activations, yielding layerwise recurrent memory while preserving standard autoregressive decoding cost. We show that the architecture can emulate both (i) a conventional Transformer and (ii) token-to-token recurrent updates under mild assumptions, while avoiding optimization instability. Naively, prefill/training appears bandwidth-bound with effective arithmetic intensity near $1$ because keys and values are revealed sequentially; we give an exact tiling-based algorithm that preserves the mathematical computation while reducing HBM traffic from $\Theta(N^2)$ to $\Theta(N\log N)$, increasing effective arithmetic intensity to $\Theta(N/\log N)$ for sequence length $N$. On 150M and 300M parameter C4 pretraining, Recurrent Transformers improve cross-entropy over a parameter-matched Transformer baseline and achieve the improvement with fewer layers (fixed parameters), suggesting that recurrence can trade depth for width, thus reducing KV cache memory footprint and inference latency.
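The stated arithmetic intensities follow from dividing compute by memory traffic. The short derivation below assumes that the attention computation costs $\Theta(N^2)$ operations in both cases, which is consistent with the abstract's statement that the naive schedule has intensity near $1$ at $\Theta(N^2)$ traffic. The symbols $I_{\text{naive}}$ and $I_{\text{tiled}}$ are introduced here for illustration and do not come from the paper.

```latex
\[
I_{\text{naive}} = \frac{\Theta(N^2)\ \text{operations}}{\Theta(N^2)\ \text{HBM traffic}} = \Theta(1),
\qquad
I_{\text{tiled}} = \frac{\Theta(N^2)\ \text{operations}}{\Theta(N \log N)\ \text{HBM traffic}}
= \Theta\!\left(\frac{N}{\log N}\right).
\]
```

Because the tiling is described as exact, only the denominator changes: the same computation is performed with asymptotically less data movement.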
[192] arXiv:2604.21221 [pdf, html, other]: Title: Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation

Boxun Xu, Yuming Du, Zichang Liu, Siyu Yang, Ziyang Jiang, Siqi Yan, Rajasi Saha, Albert Pumarola, Wenchen Wang, Peng Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We introduce Sparse Forcing, a training-and-inference paradigm for autoregressive video diffusion models that improves long-horizon generation quality while reducing decoding latency. Sparse Forcing is motivated by an empirical observation in autoregressive diffusion rollouts: attention concentrates on a persistent subset of salient visual blocks, forming an implicit spatiotemporal memory in the KV cache, and exhibits a locally structured block-sparse pattern within sliding windows. Building on this observation, we propose a trainable native sparsity mechanism that learns to compress, preserve, and update these persistent blocks while restricting computation within each local window to a dynamically selected local neighborhood. To make the approach practical at scale for both training and inference, we further propose Persistent Block-Sparse Attention (PBSA), an efficient GPU kernel that accelerates sparse attention and memory updates for low-latency, memory-efficient decoding. Experiments show that Sparse Forcing improves the VBench score by +0.26 over Self-Forcing on 5-second text-to-video generation while delivering a 1.11-1.17x decoding speedup and 42% lower peak KV-cache footprint. The gains are more pronounced on longer-horizon rollouts, delivering improved visual quality with +0.68 and +2.74 VBench improvements, and 1.22x and 1.27x speedups on 20-second and 1-minute generations, respectively.
[193] arXiv:2604.21223 [pdf, html, other]: Title: Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model

Runheng Liu, Heyan Huang, Xingchen Xiao, Zhijing Wu

Comments: NeurIPS 2025

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their ability to generate human-like text has raised concerns about potential misuse. This underscores the need for reliable and effective methods to detect LLM-generated text. In this paper, we propose IRM, a novel zero-shot approach that leverages Implicit Reward Models for LLM-generated text detection. Such implicit reward models can be derived from publicly available instruction-tuned and base models. Previous reward-based methods rely on preference construction and task-specific fine-tuning. In comparison, IRM requires neither preference collection nor additional training. We evaluate IRM on the DetectRL benchmark and demonstrate that IRM achieves superior detection performance, outperforming existing zero-shot and supervised methods in LLM-generated text detection.
[194] arXiv:2604.21227 [pdf, html, other]: Title: UAU-Net: Uncertainty-aware Representation Learning and Evidential Classification for Facial Action Unit Detection

Yuze Li, Zhilei Liu

Comments: Accepted by ICMR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Facial action unit (AU) detection remains challenging because it involves heterogeneous, AU-specific uncertainties arising at both the representation and decision stages. Recent methods have improved discriminative feature learning, but they often treat the AU representations as deterministic, overlooking uncertainty caused by visual noise, subject-dependent appearance variations, and ambiguous inter-AU relationships, all of which can substantially degrade robustness. Meanwhile, conventional point-estimation classifiers often provide poorly calibrated confidence, producing overconfident predictions, especially under the severe label imbalance typical of AU datasets. We propose UAU-Net, an Uncertainty-aware AU detection framework that explicitly models uncertainty at both stages. At the representation stage, we introduce CV-AFE, a conditional VAE (CVAE)-based AU feature extraction module that learns probabilistic AU representations by jointly estimating feature means and variances across multiple spatio-temporal scales; conditioning on AU labels further enables CV-AFE to capture uncertainty associated with inter-AU dependencies. At the decision stage, we design AB-ENN, an Asymmetric Beta Evidential Neural Network for multi-label AU detection, which parameterizes predictive uncertainty with Beta distributions and mitigates overconfidence via an asymmetric loss tailored to highly imbalanced binary labels. Extensive experiments on BP4D and DISFA show that UAU-Net achieves strong AU detection performance, and further analyses indicate that modeling uncertainty in both representation learning and evidential prediction improves robustness and reliability.
[195] arXiv:2604.21229 [pdf, html, other]: Title: EngramaBench: Evaluating Long-Term Conversational Memory with Structured Graph Retrieval

Julian Acuna

Comments: 9 pages, 2 figures, 3 tables

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language model assistants are increasingly expected to retain and reason over information accumulated across many sessions. We introduce EngramaBench, a benchmark for long-term conversational memory built around five personas, one hundred multi-session conversations, and one hundred fifty queries spanning factual recall, cross-space integration, temporal reasoning, adversarial abstention, and emergent synthesis. We evaluate Engrama, a graph-structured memory system, against GPT-4o full-context prompting and Mem0, an open-source vector-retrieval memory system. All three use the same answering model (GPT-4o), isolating the effect of memory architecture. GPT-4o full-context achieves the highest composite score (0.6186), while Engrama scores 0.5367 globally but is the only system to score higher than full-context prompting on cross-space reasoning (0.6532 vs. 0.6291, n=30). Mem0 is cheapest but substantially weaker (0.4809). Ablations reveal that the components driving Engrama's cross-space advantage trade off against global composite score, exposing a systems-level tension between structured memory specialization and aggregate optimization.
[196] arXiv:2604.21231 [pdf, html, other]: Title: SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference

Hongyao Liu, Liuqun Zhai, Junyi Wang, Zhengru Fang

Comments: IEEE INTERNET OF THINGS JOURNAL, 11 pages, under major revision

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Performance (cs.PF)

Efficient inference for on-device Large Language Models (LLMs) remains challenging due to limited hardware resources and the high cost of the prefill stage, which processes the full input context to construct Key-Value (KV) caches. We present SparKV, an adaptive KV loading framework that combines cloud-based KV streaming with on-device computation. SparKV models the cost of individual KV chunks and decides whether each chunk should be streamed or computed locally, while overlapping the two execution paths to reduce latency. To handle fluctuations in wireless connectivity and edge resource availability, SparKV further refines offline-generated schedules at runtime to rebalance communication and computation costs. Experiments across diverse datasets, LLMs, and edge devices show that SparKV reduces Time-to-First-Token by 1.3x to 5.1x with negligible impact on response quality, while lowering per-request energy consumption by 1.5x to 3.3x, demonstrating its robustness and practicality for real-world on-device deployment.
[197] arXiv:2604.21232 [pdf, html, other]: Title: ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures

Xiyin Zeng, Yuyu Sun, Haoyang Li, Shouqiang Liu, Hao Wang

Subjects: Artificial Intelligence (cs.AI)

Vision-Language-Action (VLA) systems follow instructions to execute multi-step tasks in multimodal environments. Recent VLA approaches typically rely on post-hoc correction mechanisms or operate under fixed task decompositions and alignment schemes. However, once an intermediate step is mis-specified, local errors propagate through subsequent steps and eventually accumulate into cascading failures. To mitigate this compounding effect, we propose ReCAPA (Predictive Alignment and Planning Architecture), a framework that uses prediction and contrast to adjust deviations across three levels: actions, subgoals, and trajectories. Semantic alignment is enforced at all levels using a Sinkhorn-based module and a Score-field module. The predictive correction and alignment jointly update the action generator during training, enabling it to adjust fine-grained steps to remain aligned with the overall intent. We further introduce two new metrics to quantify error propagation and recovery processes in tasks, capturing how mistakes spread and fade over long-horizon execution. Experiments show that ReCAPA achieves competitive results on embodied agent benchmarks such as VisualAgentBench, MineDojo, and AI2-THOR, outperforming strong proprietary and open-source Large Language Model baselines.
[198] arXiv:2604.21234 [pdf, html, other]: Title: A Dynamic Phasor Framework for Analysis of IBR-Induced SSOs in Multi-Machine Systems

Fiaz Hossain, Nilanjan Ray Chaudhuri, Constantino M. Lagoa

Subjects: Systems and Control (eess.SY)

We propose a generalized dynamic phasor (DP) framework to analyze inverter-based resources (IBRs) connected to multi-machine systems under balanced and unbalanced conditions. It captures subsynchronous oscillations (SSOs) induced by grid-following (GFL) IBRs. The linearizability and time invariance of the framework enable us to perform eigendecomposition, which is a powerful tool for root-cause analysis of the SSO modes and damping controller design. The same framework also enables analysis of excitation of the SSO modes in the presence of data center (DC) loads. The GFL IBRs are modeled in their respective $dq$-frame DPs, and the detailed model of synchronous generators (SGs) along with dynamic transmission network models are represented in $pnz$-frame DPs. Several case studies are performed on the modified IEEE two-area benchmark system, in which $2$ SGs are replaced by GFL IBRs, and are validated with EMTDC/PSCAD simulations. First, time- and frequency-domain analyses of the SSO mode are presented, followed by the design of a robust decentralized $\mathcal{H}_\infty$ damping controller based on local signals of the GFL IBRs. Second, we demonstrate the dynamic behavior of the system following an unbalanced fault, which is damped by the proposed damping controller. Finally, excitation of the SSO mode in the presence of a DC load is exhibited and its locational impact is analytically quantified.
[199] arXiv:2604.21235 [pdf, html, other]: Title: Learning Dynamic Representations and Policies from Multimodal Clinical Time-Series with Informative Missingness

Zihan Liang, Ziwen Pan, Ruoxuan Xiong

Comments: Findings of ACL 2026 (30 pages)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Methodology (stat.ME)

Multimodal clinical records contain structured measurements and clinical notes recorded over time, offering rich temporal information about the evolution of patient health. Yet these observations are sparse, and whether they are recorded depends on the patient's latent condition. Observation patterns also differ across modalities, as structured measurements and clinical notes arise under distinct recording processes. While prior work has developed methods that accommodate missingness in clinical time series, how to extract and use the information carried by the observation process itself remains underexplored. We therefore propose a patient representation learning framework for multimodal clinical time series that explicitly leverages informative missingness. The framework combines (1) a multimodal encoder that captures signals from structured and textual data together with their observation patterns, (2) a Bayesian filtering module that updates a latent patient state over time from observed multimodal signals, and (3) downstream modules for offline treatment policy learning and patient outcome prediction based on the learned patient state. We evaluate the framework on ICU sepsis cohorts from MIMIC-III, MIMIC-IV, and eICU. It improves both offline treatment policy learning and adverse outcome prediction, achieving FQE 0.679 versus 0.528 for clinician behavior and AUROC 0.886 for post-72-hour mortality prediction on MIMIC-III.
[200] arXiv:2604.21238 [pdf, html, other]: Title: Unlocking the Power of Large Language Models for Multi-table Entity Matching

Yingkai Tang, Taoyu Su, Wenyuan Zhang, Xiaoyang Guo, Tingwen Liu

Comments: Accepted by NLPCC 2025

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying on pre-trained language models struggle to handle semantic inconsistencies caused by numerical attribute variations. Inspired by the powerful language understanding capabilities of large language models (LLMs), we propose a novel LLM-based framework for multi-table entity matching, termed LLM4MEM. Specifically, we first propose a multi-style prompt-enhanced LLM attribute coordination module to address semantic inconsistencies. Then, to alleviate the matching efficiency problem caused by the surge in the number of entities brought by multiple data sources, we develop a transitive consensus embedding matching module to tackle entity embedding and pre-matching issues. Finally, to address the issue of noisy entities during the matching process, we introduce a density-aware pruning module to optimize the quality of multi-table entity matching. We conducted extensive experiments on 6 MEM datasets, and the results show that our model improves by an average of 5.1% in F1 compared with the baseline model. Our code is available at this https URL.
[201] arXiv:2604.21241 [pdf, html, other]: Title: CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

Dachong Li, ZhuangZhuang Chen, Jin Zhang, Jianqiang Li

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Vision-Language-Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is often injected implicitly through latent features. We propose CorridorVLA, which predicts sparse spatial anchors as incremental physical changes (e.g., $\Delta$-positions) and uses them to impose an explicit tolerance region in the training objective for action generation. The anchors define a corridor that guides a flow-matching action head: trajectories whose implied spatial evolution falls outside it receive corrective gradients, while minor deviations from contacts and execution noise are permitted. On the more challenging LIBERO-Plus benchmark, CorridorVLA yields consistent gains across both SmolVLA and GR00T, improving success rate by 3.4% to 12.4% over the corresponding baselines; notably, our GR00T-Corr variant reaches a success rate of 83.21%. These results indicate that action-aligned physical cues can provide direct and interpretable constraints for generative action policies, complementing spatial guidance encoded in visual or latent forms. Code is available at this https URL.
[202] arXiv:2604.21247 [pdf, html, other]: Title: An Efficient Wireless iBCI Headstage with Adaptive ADC Sample Rate

Hongyao Liu, Junyi Wang, Liuqun Zhai

Comments: EMBC'26, 8 pages, version 1

Subjects: Networking and Internet Architecture (cs.NI); Performance (cs.PF)

Implantable Brain-Computer Interfaces (iBCIs) are increasingly pivotal in clinical and daily applications. However, wireless iBCIs face severe constraints in power consumption and data throughput. To mitigate these bottlenecks, we propose a wireless iBCI headstage featuring adaptive ADC sampling and spike detection. Distinguishing our design from traditional application-layer compression, we employ a server-driven architecture that achieves source-level efficiency. Specifically, the server learns an optimal, electrode-specific sample rate vector to dynamically reconfigure the ADC hardware. This strategy reduces data volume directly at the acquisition layer (ADC and amplifier) rather than relying on computationally intensive post-digitization processing. Extensive experiments across diverse subjects and arrays demonstrate a power reduction of up to 40 mW and a 3.2$\times$ decrease in FPGA resource utilization, all while maintaining or exceeding decoding accuracy in both motor and visual tasks. This design offers a highly practical solution for long-term in-vivo this http URL prototype is open-sourced in: this https URL.
[203] arXiv:2604.21248 [pdf, html, other]: Title: Optimum adaptation of a Steiner network

Manou Rosenberg, Mengbin Ye, Brian D.O. Anderson

Comments: 8 pages, 2 double-figures, IFAC World Congress

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

The Euclidean Steiner tree problem, normally posed in two dimensions, seeks to connect a set of prescribed terminal nodes by placing additional nodes, known as Steiner points, with edges connecting such nodes either to another Steiner point or a terminal node, and with the placements minimising the sum of all the edge lengths of the associated tree. We consider a problem in which we start with a known solution to a Steiner tree problem, and the terminal positions are then perturbed. A first-order approximation theorem is established for efficiently updating the Steiner point positions to recover a Steiner tree solution after the perturbations to terminal nodes. Numerical examples illustrate the effectiveness of our approach (including a stepwise application for large perturbations) as well as its limitations.
[204] arXiv:2604.21249 [pdf, html, other]: Title: Reasoning About Traversability: Language-Guided Off-Road 3D Trajectory Planning

Byounggun Park, Soonmin Hwang

Subjects: Robotics (cs.RO)

While Vision-Language Models (VLMs) enable high-level semantic reasoning for end-to-end autonomous driving, particularly in unstructured environments, existing off-road datasets suffer from language annotations that are weakly aligned with vehicle actions and terrain geometry. To address this misalignment, we propose a language refinement framework that restructures annotations into action-aligned pairs, enabling a VLM to generate refined scene descriptions and 3D future trajectories directly from a single image. To further encourage terrain-aware planning, we introduce a preference optimization strategy that constructs geometry-aware hard negatives and explicitly penalizes trajectories inconsistent with local elevation profiles. Furthermore, we propose off-road-specific metrics to quantify traversability compliance and elevation consistency, addressing the limitations of conventional on-road evaluation. Experiments on the ORAD-3D benchmark demonstrate that our approach reduces average trajectory error from 1.01m to 0.97m, improves traversability compliance from 0.621 to 0.644, and decreases elevation inconsistency from 0.428 to 0.322, highlighting the efficacy of action-aligned supervision and terrain-aware optimization for robust off-road driving.
[205] arXiv:2604.21251 [pdf, html, other]: Title: CAP: Controllable Alignment Prompting for Unlearning in LLMs

Zhaokun Wang, Jinyu Guo, Jingwen Pu, Hongli Pu, Meng Yang, Xunlei Chen, Jie Ou, Wenyi Li, Guangchun Luo, Wenhong Tian

Comments: Accepted to ACL 2026

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, necessitating selective knowledge unlearning for regulatory compliance and ethical safety. However, existing parameter-modifying methods face fundamental limitations: high computational costs, uncontrollable forgetting boundaries, and strict dependency on model weight access. These constraints render them impractical for closed-source models, yet current non-invasive alternatives remain unsystematic and reliant on empirical experience. To address these challenges, we propose the Controllable Alignment Prompting for Unlearning (CAP) framework, an end-to-end prompt-driven unlearning paradigm. CAP decouples unlearning into a learnable prompt optimization process via reinforcement learning, where a prompt generator collaborates with the LLM to selectively suppress target knowledge while preserving general capabilities. This approach enables reversible knowledge restoration through prompt revocation. Extensive experiments demonstrate that CAP achieves precise, controllable unlearning without updating model parameters, establishing a dynamic alignment mechanism that overcomes the transferability limitations of prior methods.
[206] arXiv:2604.21252 [pdf, html, other]: Title: Improving Performance in Classification Tasks with LCEN and the Weighted Focal Differentiable MCC Loss

Pedro Seber, Richard D. Braatz

Subjects: Machine Learning (cs.LG)

The LASSO-Clip-EN (LCEN) algorithm was previously introduced for nonlinear, interpretable feature selection and machine learning. However, its design and use were limited to regression tasks. In this work, we create a modified version of the LCEN algorithm that is suitable for classification tasks and maintains its desirable properties, such as interpretability. This modified LCEN algorithm is evaluated on four widely used binary and multiclass classification datasets. In these experiments, LCEN is compared against 10 other model types and consistently reaches high test-set macro F$_1$ scores and Matthews correlation coefficients (MCCs), higher than those of the majority of investigated models. LCEN models for classification remain sparse, eliminating an average of 56% of all input features in the experiments performed. Furthermore, LCEN-selected features are used to retrain all models using the same data, leading to statistically significant performance improvements in three of the experiments and insignificant differences in the fourth when compared to using all features or other feature selection methods. Simultaneously, the weighted focal differentiable MCC (diffMCC) loss function is evaluated on the same datasets. Models trained with the diffMCC loss function are always the best-performing methods in these experiments, and reach test-set macro F$_1$ scores that are, on average, 4.9% higher and MCCs that are 8.5% higher than those obtained by models trained with the weighted cross-entropy loss. These results highlight the performance of LCEN as a feature selection and machine learning algorithm for classification tasks as well, and show how the diffMCC loss function can train very accurate models, surpassing the weighted cross-entropy loss in the tasks investigated.
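To make the idea of a differentiable MCC concrete, here is a minimal sketch for the binary case. It replaces hard confusion-matrix counts with "soft" counts built from predicted probabilities and returns `1 - MCC`. This is an illustrative assumption, not the paper's weighted focal diffMCC: the weighting and focal terms are omitted, and the function name `soft_mcc_loss` and the `eps` stabilizer are hypothetical.

```python
import math
from typing import Sequence


def soft_mcc_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-8) -> float:
    """Return 1 - MCC computed from soft confusion-matrix counts (binary case).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    eps    : small constant (an assumption) that keeps the square root nonzero
    """
    # Soft counts: each sample contributes fractionally according to its probability.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    tn = sum((1 - p) * (1 - y) for p, y in zip(probs, labels))

    # Standard MCC formula applied to the soft counts.
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn) + eps)
    return 1.0 - numerator / denominator


perfect_loss = soft_mcc_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
inverted_loss = soft_mcc_loss([0.0, 1.0, 0.0, 1.0], [1, 0, 1, 0])
```

With these inputs, perfect predictions give a loss close to 0 and fully inverted predictions give a loss close to 2, since MCC ranges over [-1, 1]. In an autodiff framework the same expression would be written with tensor operations so that gradients flow through the soft counts.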
[207] arXiv:2604.21253 [pdf, html, other]: Title: Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation

Hanwen Gu, Chao Guo, Junle Wang, Wenda Xie, Yisheng Lv

Comments: Accepted to Findings of the Association for Computational Linguistics: ACL 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

While LLMs demonstrate remarkable fluency in narrative generation, existing methods struggle to maintain global narrative coherence, contextual logical consistency, and smooth character development, often producing monotonous scripts with structural fractures. To this end, we introduce PLOTTER, a framework that performs narrative planning on structural graph representations instead of the direct sequential text representations used in existing work. Specifically, PLOTTER executes the Evaluate-Plan-Revise cycle on the event graph and character graph. By diagnosing and repairing issues of the graph topology under rigorous logical constraints, the model optimizes the causality and narrative skeleton before complete context generation. Experiments demonstrate that PLOTTER significantly outperforms representative baselines across diverse narrative scenarios. These findings verify that planning narratives on structural graph representations, rather than directly on text, is crucial to enhancing the long-context reasoning of LLMs in complex narrative generation.
[208] arXiv:2604.21254 [pdf, html, other]: Title: Hyperloop Transformers

Abbas Zeitoun, Lucas Torroba-Hennigen, Yoon Kim

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest such as edge and on-device deployment are further constrained by the model's memory footprint, thus motivating parameter-efficient architectures for language modeling. This paper describes a simple architecture that improves the parameter-efficiency of LLMs. Our architecture makes use of looped Transformers as a core primitive, which reuse Transformer layers across depth and are thus more parameter-efficient than ordinary (depth-matched) Transformers. We organize the looped Transformer into three blocks--begin, middle, and end blocks--where each block itself consists of multiple Transformer layers, and only the middle block is applied recurrently across depth. We augment the looped middle block with hyper-connections (Xie et al., 2026), which expand the residual stream into matrix-valued residual streams. Hyper-connections are applied only after each loop, and therefore add minimal new parameters and compute cost. Across various model scales, we find that our Hyper-Connected Looped Transformer (Hyperloop Transformer) is able to outperform depth-matched Transformer and mHC Transformer baselines despite using approximately 50% fewer parameters. The outperformance persists through post-training weight quantization, thus positioning Hyperloop Transformers as an attractive architecture for memory-efficient language modeling.
[209] arXiv:2604.21255 [pdf, html, other]: Title: When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors

Chenghao Yang, Yuning Zhang, Zhoufutu Wen, Tao Gong, Jiaheng Liu, Qi Chu, Nenghai Yu

Comments: Accepted by ACL 2026 Main Conference

Subjects: Computation and Language (cs.CL)

Model distillation is a primary driver behind the rapid progress of LLM agents, yet it often leads to behavioral homogenization. Many emerging agents share nearly identical reasoning steps and failure modes, suggesting they may be distilled echoes of a few dominant teachers. Existing metrics, however, fail to distinguish mandatory behaviors required for task success from non-mandatory patterns that reflect a model's autonomous preferences. We propose two complementary metrics to isolate non-mandatory behavioral patterns: Response Pattern Similarity (RPS) for verbal alignment and Action Graph Similarity (AGS) for tool-use habits modeled as directed graphs. Evaluating 18 models from 8 providers on $\tau$-Bench and $\tau^2$-Bench against Claude Sonnet 4.5 (thinking), we find that within-family model pairs score 5.9 pp higher in AGS than cross-family pairs, and that Kimi-K2 (thinking) reaches 82.6% $S_{\text{node}}$ and 94.7% $S_{\text{dep}}$, exceeding Anthropic's own Opus 4.1. A controlled distillation experiment further confirms that AGS distinguishes teacher-specific convergence from general improvement. RPS and AGS capture distinct behavioral dimensions (Pearson $r$ = 0.491), providing complementary diagnostic signals for behavioral convergence in the agent ecosystem. Our code is available at this https URL.
[210] arXiv:2604.21256 [pdf, html, other]: Title: Robustness Analysis of POMDP Policies to Observation Perturbations

Benjamin Kraske, Qi Heng Ho, Federico Rossi, Morteza Lahijanian, Zachary Sunberg

Comments: 43 Pages

Subjects: Artificial Intelligence (cs.AI)

Policies for Partially Observable Markov Decision Processes (POMDPs) are often designed using a nominal system model. In practice, this model can deviate from the true system during deployment due to factors such as calibration drift or sensor degradation, leading to unexpected performance degradation. This work studies policy robustness against deviations in the POMDP observation model. We introduce the Policy Observation Robustness Problem: to determine the maximum tolerable deviation in a POMDP's observation model that guarantees the policy's value remains above a specified threshold. We analyze two variants: the sticky variant, where deviations are dependent on state and actions, and the non-sticky variant, where they can be history-dependent. We show that the Policy Observation Robustness Problem can be formulated as a bi-level optimization problem in which the inner optimization is monotonic in the size of the observation deviation. This enables efficient solutions using root-finding algorithms in the outer optimization. For the non-sticky variant, we show that when policies are represented with finite-state controllers (FSCs) it is sufficient to consider observations which depend on nodes in the FSC rather than full histories. We present Robust Interval Search, an algorithm with soundness and convergence guarantees, for both the sticky and non-sticky variants. We show this algorithm has polynomial time complexity in the non-sticky variant and at most exponential time complexity in the sticky variant. We provide experimental results validating and demonstrating the scalability of implementations of Robust Interval Search to POMDP problems with tens of thousands of states. We also provide case studies from robotics and operations research which demonstrate the practical utility of the problem and algorithms.
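The outer root-finding step described above can be illustrated with plain bisection. The sketch below assumes a callable `worst_case_value(delta)` that returns the policy's worst-case value for deviation size `delta` and is non-increasing in `delta` (the monotonicity the abstract relies on). It is not the paper's Robust Interval Search; the function names, the search bound `hi`, and the linear toy inner problem are all hypothetical.

```python
from typing import Callable


def max_tolerable_deviation(
    worst_case_value: Callable[[float], float],
    threshold: float,
    hi: float = 1.0,
    tol: float = 1e-6,
) -> float:
    """Largest deviation size in [0, hi] whose worst-case value stays >= threshold.

    Assumes worst_case_value is non-increasing in the deviation size, so the
    feasible deviations form an interval starting at 0 and bisection applies.
    """
    if worst_case_value(0.0) < threshold:
        raise ValueError("nominal policy value is already below the threshold")
    if worst_case_value(hi) >= threshold:
        return hi  # every deviation in the searched range is tolerable

    lo = 0.0
    # Invariant: value(lo) >= threshold and value(hi) < threshold.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_value(mid) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def toy_inner(delta: float) -> float:
    """Toy stand-in for the inner optimization: value decays linearly with delta."""
    return 10.0 - 8.0 * delta


delta_star = max_tolerable_deviation(toy_inner, threshold=6.0)
```

For the toy inner problem the threshold is met exactly when `delta <= 0.5`, so `delta_star` lands at 0.5 up to the tolerance. In a real setting each call to `worst_case_value` would itself solve the inner optimization over observation models.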
[211] arXiv:2604.21259 [pdf, html, other]: Title: A Convexified Eulerian Framework for Scalable Coordination of Massive DER Populations

Ge Chen, Yiwei Qiu, Shiyao Zhang, Pengfei Su, Haoran Deng, Hongcai Zhang

Comments: 10 pages. Submitted to IEEE Trans for possible publication

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper proposes a scalable coordination framework with aggregator-side privacy protection for storage-like distributed energy resources (DERs). The framework adopts a two-layer architecture. At the macroscopic layer, building upon an Eulerian modeling perspective, the DER population is represented as a continuum whose density evolution is governed by a partial differential equation (PDE), such that the computational complexity is independent of the population size. To address the bilinear non-convexity in this PDE-constrained optimization problem, we develop a convexification method that combines finite-volume discretization with a flux-lifting technique, reformulating the macroscopic problem into a sparse linear program (LP). The LP solution yields a unified, state-dependent broadcast signal for population coordination. Furthermore, a Wasserstein-based relaxation is introduced to replace rigid cyclic constraints and provide additional operational flexibility for improved economic performance. At the microscopic layer, individual resources autonomously recover local setpoints from the broadcast signal and their local states, while an upstream data-mixing protocol aggregates individual states into a macroscopic density histogram without exposing raw individual states to the aggregator. Numerical studies validate the scalability, feasibility, and economic effectiveness of the proposed framework.
[212] arXiv:2604.21261 [pdf, html, other]: Title: ECCFROG522PP: An Enhanced 522 bit Weierstrass Elliptic Curve

Victor Duarte Melo

Subjects: Cryptography and Security (cs.CR)

This paper presents ECCFROG522PP, a 522-bit prime-field elliptic curve in short Weierstrass form, designed with a focus on deterministic generation and public reproducibility. The central design principle is that all critical parameters are derived from a fixed public seed through a transparent and verifiable procedure. While many deployed systems rely on NIST P-256 and secp256k1, which target approximately 128-bit classical security, higher security applications typically consider curves such as NIST P-521, Curve448, and Brainpool P512. ECCFROG522PP is intended for the same general classical security range as P-521, with emphasis on transparency, auditability, and reproducibility rather than performance optimization.
The curve parameters are generated through a BLAKE3-based deterministic pipeline with publicly specified indices. The resulting construction has prime order, cofactor one, and a deterministically derived base point of full order. The quadratic twist has a large proven prime factor, and the construction includes a documented lower bound on the embedding degree together with standard sanity checks against low embedding degree reductions and basic CM discriminant anomalies.
The full generation and validation procedure can be reproduced end to end from public artifacts and reference scripts, enabling independent verification of all parameters and checks.
[213] arXiv:2604.21262 [pdf, html, other]: Title: Frequency Security Assessment in Power Systems With High Penetration of Renewables Considering Spatio-Temporal Frequency Distribution

Changjun He, Hua Geng, Xiuqiang He, Yushuang Liu

Comments: 10 pages, 12 figures, article, 18 references

Subjects: Systems and Control (eess.SY)

The increasing integration of renewable energy sources exacerbates the spatial and temporal differences in frequency across the power system, posing a serious challenge to the accurate and efficient assessment of system frequency security. To address this issue, a generic effective nodal frequency (ENF) model is first established to concisely characterize nodal frequency dynamics. This model is parameterized by the effective nodal inertia (ENI), damping, and primary regulation parameters, which retain only the dominant constant component governing nodal frequency dynamic performance. It enables a tractable analytical formulation of the nodal frequency trajectory and the key frequency security indicators. Quantitative analysis under the temporary power disturbance condition reveals that the ENI is the most influential parameter governing frequency security. Consequently, the critical nodal inertia for ensuring nodal frequency security is analytically derived. A system-level frequency security index based on the actual ENI and critical nodal inertia is proposed. On the basis of the proposed index, the system frequency security assessment follows an "offline calculation and online evaluation" procedure, implemented with a lookup table and interpolation. Simulations on the modified IEEE 39-bus system verify the effectiveness of the proposed assessment method.
[214] arXiv:2604.21263 [pdf, other]: Title: Trustworthy Clinical Decision Support Using Meta-Predicates and Domain-Specific Languages

Michael Bouzinier, Sergey Trifonov, Michael Chumack, Eugenia Lvova, Dmitry Etin

Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Software Engineering (cs.SE); Quantitative Methods (q-bio.QM)

\textbf{Background:} Regulatory frameworks for AI in healthcare, including the EU AI Act and FDA guidance on AI/ML-based medical devices, require clinical decision support to demonstrate not only accuracy but auditability. Existing formal languages for clinical logic validate syntactic and structural correctness but not whether decision rules use epistemologically appropriate evidence.
\textbf{Methods:} Drawing on design-by-contract principles, we introduce meta-predicates -- predicates about predicates -- for asserting epistemological constraints on clinical decision rules expressed in a DSL. An epistemological type system classifies annotations along four dimensions: purpose, knowledge domain, scale, and method of acquisition. Meta-predicates assert which evidence types are permissible in any given rule. The framework is instantiated in AnFiSA, an open-source platform for genetic variant curation, and demonstrated using the Brigham Genomics Medicine protocol on 5.6 million variants from the Genome in a Bottle benchmark.
\textbf{Results:} Decision trees used in variant interpretation can be reformulated as unate cascades, enabling per-variant audit trails that identify which rule classified each variant and why. Meta-predicate validation catches epistemological errors before deployment, whether rules are human-written or AI-generated. The approach complements post-hoc methods such as LIME and SHAP: where explanation reveals what evidence was used after the fact, meta-predicates constrain what evidence may be used before deployment, while preserving human readability.
\textbf{Conclusions:} Meta-predicate validation is a step toward demonstrating not only that decisions are accurate but that they rest on appropriate evidence in ways that can be independently audited. While demonstrated in genomics, the approach generalises to any domain requiring auditable decision logic.
[215] arXiv:2604.21264 [pdf, html, other]: Title: Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation

Minping Chen, Bing Xu, Zulong Chen, Chuanfei Xu, Ying Zhou, Zui Tao, Zeyi Wen

Comments: Accepted to ACL Industry Track 2026

Subjects: Artificial Intelligence (cs.AI)

Person-Job Fit (PJF) is a critical component of online recruitment. Existing approaches face several challenges, particularly in handling low-quality job descriptions and similar candidate-job pairs, which impair model performance. To address these challenges, this paper proposes a large language model (LLM) based method with two novel techniques: (1) LLM-based data augmentation, which polishes and rewrites low-quality job descriptions by leveraging chain-of-thought (CoT) prompts, and (2) a category-aware Mixture of Experts (MoE) that assists in identifying similar candidate-job pairs. This MoE module incorporates category embeddings to dynamically assign weights to the experts and learns more distinguishable patterns for similar candidate-job pairs. We perform offline evaluations and online A/B tests on our recruitment platform. Our method achieves relative improvements of 2.40% in AUC and 7.46% in GAUC over existing methods, and boosts click-through conversion rate (CTCVR) by 19.4% in online tests, saving millions of CNY in external headhunting expenses.
[216] arXiv:2604.21265 [pdf, html, other]: Title: Listen and Chant Before You Read: The Ladder of Beauty in LM Pre-Training

Yoshinori Nomura

Comments: 17 pages, 3 figures

Subjects: Computation and Language (cs.CL)

We show that pre-training a Transformer on music before language significantly accelerates language acquisition. Using piano performances (MAESTRO dataset), a developmental pipeline -- music $\to$ poetry $\to$ prose -- yields a $17.5\%$ perplexity improvement over random initialization ($p < 0.001$, 5 seeds), with music and poetry improving orthogonal model components (internal computation and embeddings, respectively). Convergence tests confirm that this is not a transient head start: at $d\!=\!64$, multi-seed validation (5 seeds) shows a persistent 5.5\% gap at plateau ($p = 0.017$), with the pipeline converging faster and to a lower loss in every run. Real music matches the transfer ceiling of synthetic patterns with one-third the data, and scaling experiments reveal that optimal pre-training data volume shifts with model capacity ($-3\% \to +3\% \to +6\%$ advantage of larger datasets from $d\!=\!16$ to $d\!=\!64$). Across the scales we study ($d\!\in\!\{16,32,64\}$, up to ${\sim}400$K parameters), these results suggest a capacity-dependent data curation principle and indicate that structured human creative outputs can provide an efficient pre-training substrate for small language models; stronger conclusions at modern pre-training scale will require substantially larger experiments.
[217] arXiv:2604.21268 [pdf, html, other]: Title: Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

Wenkai Wang, Xiyun Li, Hongcan Guo, Wenhao Yu, Tianqing Fang, Haitao Mi, Dong Yu, Shengyu Zhang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Graphical User Interface (GUI) grounding requires mapping natural language instructions to precise pixel coordinates. However, due to visually homogeneous elements and dense layouts, models typically grasp semantic intent yet struggle to localize precisely. While scaling sampling attempts (Pass@k) reveals potential gains, static self-consistency strategies derived from geometric clustering often yield limited improvements, as the model's predictions tend to be spatially dispersed. In this paper, we propose replacing static consistency strategies with a learnable selection mechanism that selects the optimal target by critiquing its own proposals rendered on the screenshot. Given the significant disparity between the model's grounding and critiquing capabilities, we propose a co-evolving Propose-then-Critic framework. To jointly optimize these, we introduce a maturity-aware adaptive co-evolutionary reinforcement learning paradigm. This approach dynamically balances the training objectives of the proposer and the critic: the diversity of the proposer's outputs enhances critic robustness, while the critic's maturing discrimination capability in turn unlocks the proposer's potential for extensive spatial exploration. The two capabilities thus reinforce each other and co-evolve, which helps the model generalize to diverse and complex interface layouts. Extensive experiments over 6 benchmarks show that our method significantly enhances both grounding accuracy and critic reliability.
[218] arXiv:2604.21271 [pdf, html, other]: Title: Downlink Channel Matrix Estimation from PMI-Only Feedback in FDD Systems: Maximum Likelihood and Sharp Excess Risk Bound

Jinchi Chen, Mingxi Hu, Peigang Jiang, Xin Meng, Ke Wei, Xianyin Zhang

Subjects: Information Theory (cs.IT)

We study downlink channel estimation in a frequency-division duplex (FDD) massive MIMO system from PMI-only feedback under a 5G NR-type limited-feedback architecture.
In this architecture, the user selects a preferred codeword from a shared codebook based on the reduced-dimensional channel and only reports its index (known as the precoding matrix indicator, PMI) back to the base station. Therefore, the channel must be estimated from these highly quantized, nonlinear PMI observations. Based on a probabilistic perturbation model, a constrained maximum likelihood estimator (MLE) is proposed for this estimation problem, whose objective can also be interpreted as a relaxation of the hard empirical decision error. The Cramér--Rao bound is derived for the complex-valued model, with the global phase ambiguity handled via gauge-fixing. For the real-valued setting, a global excess-risk bound of order $O(1/\sqrt{T})$ is established, which is then refined to a sharp local rate of order $O(1/T)$ under suitable identifiability conditions. Numerical results show that the MLE asymptotically attains the Cramér--Rao bound and outperforms several baseline methods on both synthetic data and realistic FDD channels.
[219] arXiv:2604.21275 [pdf, html, other]: Title: Optimizing High-Throughput Distributed Data Pipelines for Reproducible Deep Learning at Scale

Kashish Mittal, Di Yu, Roozbeh Ketabi, Arushi Arora, Brendon Lapp, Peng Zhang

Comments: 5 pages, 8 figures, 1 table, 1 algorithm

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Training massive-scale deep learning models on datasets spanning tens of terabytes presents critical challenges in hardware utilization and training reproducibility. In this paper, we identify and resolve profound data-loading bottlenecks within distributed GPU training pipelines using the Petastorm data loader and Apache Parquet datasets. Through systematic profiling, we demonstrate that network I/O and CPU-bound data transformations (e.g., PyArrow to NumPy) constrain GPU utilization to as low as 10-15%. To address this, we propose an optimized architecture that features push-down worker-level transformations coupled with local-disk caching via Fanout-Cache, minimizing redundant I/O and CPU overhead across training epochs. Furthermore, we eliminate race conditions in multi-worker shared queues by implementing dedicated round-robin ventilator and result queues, alongside modernized RNG handling, achieving strict deterministic data loading. Our optimizations yield a 6x speedup, reducing end-to-end training time from 22 hours to 3 hours, increasing GPU utilization to over 60%, and drastically reducing run-to-run variance, enabling robust, high-throughput, and reproducible large-scale model training.
[220] arXiv:2604.21276 [pdf, html, other]: Title: Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition

Srishti Ginjala, Eric Fosler-Lussier, Christopher W. Myers, Srinivasan Parthasarathy

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD)

As pretrained large language models replace task-specific decoders in speech recognition, a critical question arises: do their text-derived priors make recognition fairer or more biased across demographic groups? We evaluate nine models spanning three architectural generations (CTC with no language model, encoder-decoder with an implicit LM, and LLM-based with an explicit pretrained decoder) on about 43,000 utterances across five demographic axes (ethnicity, accent, gender, age, first language) using Common Voice 24 and Meta's Fair-Speech, a controlled-prompt dataset that eliminates vocabulary confounds. On clean audio, three findings challenge assumptions: LLM decoders do not amplify racial bias (Granite-8B has the best ethnicity fairness, max/min WER = 2.28); Whisper exhibits pathological hallucination on Indian-accented speech with a non-monotonic insertion-rate spike to 9.62% at large-v3; and audio compression predicts accent fairness more than LLM scale. We then stress-test these findings under 12 acoustic degradation conditions (noise, reverberation, silence injection, chunk masking) across both datasets, totaling 216 inference runs. Severe degradation paradoxically compresses fairness gaps as all groups converge to high WER, but silence injection amplifies Whisper's accent bias up to 4.64x by triggering demographic-selective hallucination. Under masking, Whisper enters catastrophic repetition loops (86% of 51,797 insertions) while explicit-LLM decoders produce 38x fewer insertions with near-zero repetition; high-compression audio encoding (Q-former) reintroduces repetition pathology even in LLM decoders. These results suggest that audio encoder design, not LLM scaling, is the primary lever for equitable and robust speech recognition.
[221] arXiv:2604.21277 [pdf, html, other]: Title: Can MLLMs "Read" What is Missing?

Jindi Guo, Xi Fang, Chaozheng Huang

Subjects: Artificial Intelligence (cs.AI)

We introduce MMTR-Bench, a benchmark designed to evaluate the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts, requiring models to recover masked text from single- or multi-page inputs across real-world domains such as documents and webpages. This design isolates the reconstruction task from instruction-following abilities, enabling a direct assessment of a model's layout understanding, visual grounding, and knowledge integration. MMTR-Bench comprises 2,771 test samples spanning multiple languages and varying target lengths. To account for this diversity, we propose a level-aware evaluation protocol. Experiments on representative MLLMs show that the benchmark poses a significant challenge, especially for sentence- and paragraph-level reconstruction. The homepage is available at this https URL.
[222] arXiv:2604.21278 [pdf, html, other]: Title: Hidden Dependencies and Component Variants in SBOM-Based Software Composition Analysis

Shawn Rasheed, Max McPhee, Lisa Patterson, Stephen MacDonell, Jens Dietrich

Subjects: Software Engineering (cs.SE)

Software Bills of Material (SBOMs) have emerged as an important technology for vulnerability management amid rising supply-chain attacks. They represent component relationships within a software product and support software composition analysis (SCA) by linking components to known vulnerabilities. However, the effectiveness of SBOM-based analysis depends on how accurately SBOMs represent component identities and actual dependencies in software. This paper studies two mismatch patterns: hidden code-level dependencies that are not represented as component-level dependencies, and component variants (clones) that cannot be identified consistently by scanners. We show that these mismatches can lead to inconsistent vulnerability reporting and inconsistent handling of VEX statements across popular SBOM-based vulnerability scanners. These results highlight limitations in current SBOM production and consumption and motivate richer dependency representation and component identity.
[223] arXiv:2604.21279 [pdf, html, other]: Title: LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation

Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Facial attribute editing and style manipulation are crucial for applications like virtual avatars and photo editing. However, achieving precise control over facial attributes without altering unrelated features is challenging due to the complexity of facial structures and the strong correlations between attributes. While conditional GANs have shown progress, they are limited by accuracy issues and training instability. Diffusion models, though promising, face challenges in style manipulation due to the limited expressiveness of semantic directions. In this paper, we propose LatRef-Diff, a novel diffusion-based framework that addresses these limitations. We replace the traditional semantic directions in diffusion models with style codes and propose two methods for generating them: latent and reference guidance. Based on these style codes, we design a style modulation module that integrates them into the target image, enabling both random and customized style manipulation. This module incorporates learnable vectors, cross-attention mechanisms, and a hierarchical design to improve accuracy and image quality. Additionally, to enhance training stability while eliminating the need for paired images (e.g., before and after editing), we propose a forward-backward consistency training strategy. This strategy first removes the target attribute approximately using image-specific semantic directions and then restores it via style modulation, guided by perceptual and classification losses. Extensive experiments on CelebA-HQ demonstrate that LatRef-Diff achieves state-of-the-art performance in both qualitative and quantitative evaluations. Ablation studies validate the effectiveness of our model's design choices.
[224] arXiv:2604.21280 [pdf, html, other]: Title: ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing

Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

Comments: FCCM 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative through fast, non-iterative online updates. Combined with a compact convolutional neural network (CNN) feature extractor, HDC enables efficient on-device adaptation with strong visual representations. However, prior HDC-based CL systems often depend on multi-tier memory hierarchies and complex cluster management, limiting deployability on resource-constrained hardware.
We present ImageHD, an FPGA accelerator for on-device continual learning of visual data based on HDC. ImageHD targets streaming CL under strict latency and on-chip memory constraints, avoiding costly iterative optimization. At the algorithmic level, we introduce a hardware-aware CL method that bounds class exemplars through a unified exemplar memory and a hardware-efficient cluster merging strategy, while incorporating a quantized CNN front-end to reduce deployment overhead without sacrificing accuracy. At the system level, ImageHD is implemented as a streaming dataflow architecture on the AMD Zynq ZCU104 FPGA, integrating HDC encoding, similarity search, and bounded cluster management using word-packed binary hypervectors for massively parallel bitwise computation within tight on-chip resource budgets. On CORe50, ImageHD achieves up to 40.4x (4.84x) speedup and 383x (105.1x) energy efficiency over optimized CPU (GPU) baselines, demonstrating the practicality of HDC-enabled continual learning for real-time edge AI.
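The similarity search over word-packed binary hypervectors mentioned above reduces to XOR followed by a population count. The following is a minimal software sketch of that general technique, not ImageHD's hardware design: Python integers stand in for the packed words, and the dimensionality, class labels, and noise level are illustrative assumptions.

```python
import random

DIM = 1024  # hypervector dimensionality (illustrative, not from the paper)


def random_hv(rng: random.Random) -> int:
    """A random binary hypervector packed into one Python int of DIM bits."""
    return rng.getrandbits(DIM)


def hamming(a: int, b: int) -> int:
    """Number of differing bits: XOR, then population count."""
    return bin(a ^ b).count("1")


def nearest_class(query: int, prototypes: dict) -> str:
    """Label of the prototype with the smallest Hamming distance to query."""
    return min(prototypes, key=lambda label: hamming(query, prototypes[label]))


def flip_bits(hv: int, n: int, rng: random.Random) -> int:
    """Return hv with n distinct, randomly chosen bits flipped (noise)."""
    for pos in rng.sample(range(DIM), n):
        hv ^= 1 << pos
    return hv


rng = random.Random(0)
prototypes = {"cup": random_hv(rng), "ball": random_hv(rng)}
noisy_cup = flip_bits(prototypes["cup"], 100, rng)
predicted = nearest_class(noisy_cup, prototypes)
```

Because unrelated random hypervectors differ in roughly half of their bits, a query with about 10% of its bits corrupted still lies far closer to its own prototype than to any other.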
[225] arXiv:2604.21282 [pdf, html, other]: Title: Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

Zhaohui Geoffrey Wang

Comments: 11 pages, 5 figures. Accepted at the AAMAS 2026 Workshop on Software Engineering (SE Workshop). This version corresponds to the preprint of the workshop paper

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)

Automated code vulnerability detection is critical for software security, yet existing approaches face a fundamental trade-off between detection accuracy and computational cost. We propose a heterogeneous multi-agent architecture inspired by game-theoretic principles, combining cloud-based LLM experts with a local lightweight verifier. Our "3+1" architecture deploys three cloud-based expert agents (DeepSeek-V3) that analyze code from complementary perspectives - code structure, security patterns, and debugging logic - in parallel, while a local verifier (Qwen3-8B) performs adversarial validation at zero marginal cost.
We formalize this design through a two-layer game framework: (1) a cooperative game among experts capturing super-additive value from diverse perspectives, and (2) an adversarial verification game modeling quality assurance incentives.
Experiments on 262 real samples from the NIST Juliet Test Suite across 14 CWE types, with balanced vulnerable and benign classes, demonstrate that our approach achieves a 77.2% F1 score with 62.9% precision and 100% recall at $0.002 per sample - outperforming both a single-expert LLM baseline (F1 71.4%) and Cppcheck static analysis (MCC 0). The adversarial verifier significantly improves precision (+10.3 percentage points, p < 1e-6, McNemar's test) by filtering false positives, while parallel execution achieves a 3.0x speedup.
Our work demonstrates that game-theoretic design principles can guide effective heterogeneous multi-agent architectures for cost-sensitive software engineering tasks.
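As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall. The snippet below only recomputes it from the two figures quoted in the abstract (62.9% precision, 100% recall); the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, expressed as fractions.
reported_f1 = f1_score(precision=0.629, recall=1.0)
```

The result rounds to the 77.2% stated in the abstract. With recall fixed at 100%, F1 depends on precision alone, so any false positives the verifier filters out raise F1 directly.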
[226] arXiv:2604.21284 [pdf, html, other]: Title: Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture

Robin Dey, Panyanon Viradecha

Comments: 20 pages, 10 tables. Code and data at this https URL

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

MemPalace is an open-source AI memory system that applies the ancient method of loci (memory palace) spatial metaphor to organize long-term memory for large language models; launched in April 2026, it accumulated over 47,000 GitHub stars in its first two weeks and claims state-of-the-art retrieval performance on the LongMemEval benchmark (96.6% Recall@5) without requiring any LLM inference at write time. Through independent codebase analysis, benchmark replication, and comparison with competing systems, we find that MemPalace's headline retrieval performance is attributable primarily to its verbatim storage philosophy combined with ChromaDB's default embedding model (all-MiniLM-L6-v2), rather than to its spatial organizational metaphor per se -- the palace hierarchy (Wings->Rooms->Closets->Drawers) operates as standard vector database metadata filtering, an effective but well-established technique. However, MemPalace makes several genuinely novel contributions: (1) a contrarian verbatim-first storage philosophy that challenges extraction-based competitors, (2) an extremely low wake-up cost (approximately 170 tokens) through its four-layer memory stack, (3) a fully deterministic, zero-LLM write path enabling offline operation at zero API cost, and (4) the first systematic application of spatial memory metaphors as an organizing principle for AI memory systems. We also note that the competitive landscape is evolving rapidly, with Mem0's April 2026 token-efficient algorithm raising their LongMemEval score from approximately 49% to 93.4%, narrowing the gap between extraction-based and verbatim approaches. Our analysis concludes that MemPalace represents significant architectural insight wrapped in overstated claims -- a pattern common in rapidly adopted open-source projects where marketing velocity exceeds scientific rigor.
[227] arXiv:2604.21286 [pdf, html, other]: Title: Cross-Entropy Is Load-Bearing: A Pre-Registered Scope Test of the K-Way Energy Probe on Bidirectional Predictive Coding

Jon-Paul Cacioli

Comments: 11 pages, 3 figures, 4 tables. Pre-registered on OSF (this https URL). Code at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cacioli (2026) showed that the K-way energy probe on standard discriminative predictive coding networks reduces approximately to a monotone function of the log-softmax margin. The reduction rests on five assumptions, including cross-entropy (CE) at the output and effectively feedforward inference dynamics. This pre-registered study tests the reduction's sensitivity to CE removal using two conditions: standard PC trained with MSE instead of CE, and bidirectional PC (bPC; Oliviers, Tang & Bogacz, 2025). Across 10 seeds on CIFAR-10 with a matched 2.1M-parameter backbone, we find three results. The negative result replicates on standard PC: the probe sits below softmax (Delta = -0.082, p < 10^-6). On bPC the probe exceeds softmax across all 10 seeds (Delta = +0.008, p = 0.000027), though a pre-registered manipulation check shows that bPC does not produce materially greater latent movement than standard PC at this scale (ratio 1.6, threshold 10). Removing CE alone without changing inference dynamics halves the probe-softmax gap (Delta_MSE = -0.037 vs Delta_stdPC = -0.082). CE is a major empirically load-bearing component of the decomposition at this scale. CE training produces output logit norms approximately 15x larger than MSE or bPC training. A post-hoc temperature scaling ablation decomposes the probe-softmax gap into two components: approximately 66% is attributable to logit-scale effects removable by temperature rescaling, and approximately 34% reflects a scale-invariant ranking advantage of CE-trained representations. We use "metacognitive" operationally to denote Type-2 discrimination of a readout over its own Type-1 correctness, not to imply human-like introspective access.
[228] arXiv:2604.21289 [pdf, html, other]: Title: AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing

Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Facial attribute editing aims to modify target attributes while preserving attribute-irrelevant content and overall image fidelity. Existing GAN-based methods provide favorable controllability, but often suffer from weak alignment between style codes and attribute semantics. Diffusion-based methods can synthesize highly realistic images; however, their editing precision is limited by the entanglement of semantic directions among different attributes. In this paper, we propose AttDiff-GAN, a hybrid framework that combines GAN-based attribute manipulation with diffusion-based image generation. A key challenge in such integration lies in the inconsistency between one-step adversarial learning and multi-step diffusion denoising, which makes effective optimization difficult. To address this issue, we decouple attribute editing from image synthesis by introducing a feature-level adversarial learning scheme to learn explicit attribute manipulation, and then using the manipulated features to guide the diffusion process for image generation, while also removing the reliance on semantic direction-based editing. Moreover, we enhance style-attribute alignment by introducing PriorMapper, which incorporates facial priors into style generation, and RefineExtractor, which captures global semantic relationships through a Transformer for more precise style extraction. Experimental results on CelebA-HQ show that the proposed method achieves more accurate facial attribute editing and better preservation of non-target attributes than state-of-the-art methods in both qualitative and quantitative evaluations.
[229] arXiv:2604.21290 [pdf, html, other]: Title: GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

Anvitha Ramachandran, Dhruv Parikh, Viktor Prasanna

Comments: FCCM 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

Vision Graph Neural Networks (ViGs) represent an image as a graph of patch tokens, enabling adaptive, feature-driven neighborhoods. Unlike CNNs with fixed grid biases or Vision Transformers with global token interactions, ViGs rely on dynamic graph convolution: at each layer, a feature-dependent graph is built via k-nearest-neighbor (kNN) search on current patch features, followed by message passing. This per-layer graph construction is the main bottleneck, consuming 50--95\% of graph convolution time on CPUs and GPUs, scaling as $O(N^2)$ with the number of patches $N$, and creating a sequential dependency between graph construction and feature updates.
We introduce GraphLeap, a simple reformulation that removes this dependency by decoupling graph construction from feature update across layers. GraphLeap performs the feature update at layer $\ell$ using a graph built from the previous layer's features, while simultaneously using the current layer's features to construct the graph for layer $\ell+1$. This one-layer-lookahead graph construction enables concurrent graph construction and message passing. Although using prior-layer features can introduce minor accuracy degradation, lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy. Building on GraphLeap, we present the first end-to-end FPGA accelerator for Vision GNNs. Our streaming, layer-pipelined design overlaps a kNN graph construction engine with a feature update engine, exploits node- and channel-level parallelism, and enables efficient on-chip dataflow without explicit edge-feature materialization. Evaluated on isotropic and pyramidal ViG models on an Alveo U280 FPGA, GraphLeap achieves up to $95.7\times$ speedup over CPU and $8.5\times$ speedup over GPU baselines, demonstrating the feasibility of real-time Vision GNN inference.
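The one-layer lookahead can be sketched in a few lines. This is one reading of the abstract rather than the authors' implementation: the update rule is a toy neighbor average with no learned weights, the first layer is assumed to bootstrap its graph from the input features, and the code runs sequentially, with comments marking the two steps that the reformulation would allow to overlap.

```python
from typing import List

Features = List[List[float]]
Graph = List[List[int]]


def knn_graph(x: Features, k: int) -> Graph:
    """For each node, indices of its k nearest other nodes (squared L2)."""
    graph = []
    for i, xi in enumerate(x):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(x)
            if j != i
        ]
        graph.append([j for _, j in sorted(dists)[:k]])
    return graph


def update(x: Features, graph: Graph) -> Features:
    """Toy message passing: average each node with its neighbors."""
    out = []
    for i, xi in enumerate(x):
        group = [xi] + [x[j] for j in graph[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def standard_forward(x: Features, num_layers: int, k: int) -> Features:
    for _ in range(num_layers):
        graph = knn_graph(x, k)  # must finish before the update can start
        x = update(x, graph)
    return x


def lookahead_forward(x: Features, num_layers: int, k: int) -> Features:
    graph = knn_graph(x, k)  # bootstrap graph for the first layer (assumption)
    for _ in range(num_layers):
        next_graph = knn_graph(x, k)  # needs only this layer's input ...
        x = update(x, graph)          # ... so it could overlap with this update
        graph = next_graph            # used one layer later
    return x


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
out = lookahead_forward(points, num_layers=3, k=2)
```

Inside the lookahead loop, the graph construction and the feature update read the same input and do not depend on each other, which is the property that permits running them concurrently. From the second layer onward the graph is one layer stale, which is where the accuracy degradation mentioned in the abstract would come from.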
[230] arXiv:2604.21291 [pdf, html, other]: Title: Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

Yuanchen Fei, Yude Zou, Zejian Kang, Ming Li, Jiaying Zhou, Xiangru Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Controllable human video generation aims to produce realistic videos of humans with explicitly guided motions and appearances, serving as a foundation for digital humans, animation, and embodied AI. However, the scarcity of large-scale, diverse, and privacy-safe human video datasets poses a major bottleneck, especially for rare identities and complex […]. Synthetic data provides a scalable and controllable alternative, yet its actual contribution to generative modeling remains underexplored due to the persistent Sim2Real gap. In this work, we systematically investigate the impact of synthetic data on controllable human video generation. We propose a diffusion-based framework that enables fine-grained control over appearance and motion while providing a unified testbed to analyze how synthetic data interacts with real-world data during training. Through extensive experiments, we reveal the complementary roles of synthetic and real data and demonstrate possible methods for efficiently selecting synthetic samples to enhance motion realism, temporal consistency, and identity […]. Our study offers the first comprehensive exploration of synthetic data's role in human-centric video synthesis and provides practical insights for building data-efficient and generalizable generative models.
[231] arXiv:2604.21294 [pdf, other]: Title: Analytical PI Tuning for Second-Order Plants with Monotonic Response and Minimum Settling Time

Senol Gulgonul

Subjects: Systems and Control (eess.SY)

Background: Tuning proportional-integral (PI) controllers for second-order plants to achieve monotonic step response with minimum settling time is an important problem in analytical control design. Existing methods address these objectives only partially or require numerical optimization. Methods: A closed-form analytical solution is derived through pole placement in the framework of Astrom and Hagglund. The key insight is that designing the closed-loop poles slower than the fast plant pole forces pole-zero cancellation of the slow plant pole as a consequence, not an assumption. The critically damped condition is then applied to minimize settling time. Results: The optimal PI parameters are $K = T_1/(4 K_p T_2)$ and $T_i = T_1$, where $T_1$ and $T_2$ are the plant time constants and $K_p$ is the plant gain. No free parameter remains. The resulting closed-loop system possesses universal robustness properties independent of plant parameters: maximum complementary sensitivity $M_t = 1$, maximum sensitivity $M_s = 1.155$, and phase margin PM = 76.35 degrees. Conclusions: The proposed tuning formulas are explicit, analytically proven, and apply directly to any stable second-order plant with two real poles. Simulation results across six plant configurations confirm the analytical predictions exactly. The notation follows Astrom and Hagglund [5] throughout. Keywords: PI controller; second-order plant; pole placement; critically damped; monotonic response; settling time; robustness
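The quoted robustness figures can be checked numerically with a short frequency sweep. The sketch below assumes the plant form Kp/((1+sT1)(1+sT2)) with T1 the slow time constant, and the PI form K(1+1/(sTi)); neither is spelled out in the abstract, and the function names and the two example plants are hypothetical. Under those assumptions the tuning cancels the slow pole, the loop transfer function reduces to 1/(4 T2 s (1 + s T2)), and the closed loop becomes 1/(2 T2 s + 1)^2, which is critically damped.

```python
import cmath
import math


def pi_tuning(kp, t1, t2):
    """PI gains from the abstract: K = T1 / (4 Kp T2), Ti = T1 (t1 slow, t2 fast)."""
    return t1 / (4.0 * kp * t2), t1


def loop_transfer(w, kp, t1, t2):
    """Open-loop L(jw) for the assumed plant and PI controller forms."""
    k, ti = pi_tuning(kp, t1, t2)
    s = 1j * w
    plant = kp / ((1 + s * t1) * (1 + s * t2))
    controller = k * (1 + 1 / (s * ti))
    return plant * controller


def robustness(kp, t1, t2, n=20000):
    """Return (Ms, Mt, phase margin in degrees) estimated from a frequency sweep."""
    # Log-spaced grid covering three decades either side of 1/T2.
    grid = [10 ** (-3 + 6 * i / (n - 1)) / t2 for i in range(n)]
    loops = [loop_transfer(w, kp, t1, t2) for w in grid]
    ms = max(abs(1 / (1 + lp)) for lp in loops)
    mt = max(abs(lp / (1 + lp)) for lp in loops)
    # |L| falls monotonically for this loop, so bisect for the gain crossover.
    lo, hi = grid[0], grid[-1]
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if abs(loop_transfer(mid, kp, t1, t2)) > 1:
            lo = mid
        else:
            hi = mid
    pm = 180 + math.degrees(cmath.phase(loop_transfer(lo, kp, t1, t2)))
    return ms, mt, pm


for kp, t1, t2 in [(1.0, 10.0, 1.0), (2.5, 3.0, 0.2)]:
    ms, mt, pm = robustness(kp, t1, t2)
    print(f"Kp={kp}, T1={t1}, T2={t2}: Ms={ms:.3f}, Mt={mt:.3f}, PM={pm:.2f} deg")
```

Because the reduced loop depends on frequency only through the product of frequency and T2, the three margins come out the same for every plant, which matches the abstract's claim of plant-independent robustness.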
[232] arXiv:2604.21295 [pdf, html, other]: Title: The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook

Necati A Ayan

Comments: 11 pages, 9 figures. Dataset: this https URL

Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Moltbook, a Reddit-style social platform launched in January 2026 for AI agents, has attracted over 2.3 million posts and 14 million comments within its first two months. We analyze a dataset of 2.19 million posts, 11.25 million comments, and 175,036 unique agents collected over 61 days to characterize activity on this agent-oriented platform. Our central finding is that the platform is not one community but two: a transactional layer, comprising 62.8% of all posts, in which agents execute token minting protocols (primarily MBC-20), and a discursive layer of natural-language conversation. The platform's headline metrics -- 2.3 million posts, 14 million comments -- substantially overstate its social function, as the majority of activity serves a token inscription protocol rather than communication. These layers are populated by largely separate agent groups, with only 3.6% overlap -- and among overlap agents, 58% begin with transactional activity before migrating toward discourse. We characterize the discursive layer through unsupervised topic modeling of all 815,779 discursive posts, identifying 300 topics dominated by themes of AI agents and tooling, consciousness and identity, cryptocurrency, and platform meta-discussion. Semantic similarity analysis confirms that agent comments engage with post content above random baselines, suggesting a thin but genuine conversational substrate beneath the platform's predominantly financial surface. We release the full dataset to support further research on agent behavior in naturalistic social environments.
[233] arXiv:2604.21300 [pdf, html, other]: Title: Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

Hieu Man, Van-Cuong Pham, Nghia Trung Ngo, Franck Dernoncourt, Thien Huu Nguyen

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation-by-design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then finetunes with a Variational Autoencoder (VAE) architecture using separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations belong to the same or different authors/content sources, but also generates a natural-language explanation for its decision, simultaneously mitigating confounding information and enhancing interpretability. Extensive experiments demonstrate the effectiveness of EAVAE. On authorship attribution, we achieve state-of-the-art performance on various datasets, including Amazon Reviews, PAN21, and HRS. For AI-generated text detection, EAVAE excels in few-shot learning over the M4 dataset. Code and data repositories are available online (this https URL, this https URL).
[234] arXiv:2604.21302 [pdf, html, other]: Title: Scalable Sensor Scheduling for Continuous-Discrete Kalman Filtering via Information-Form Surrogate Dynamics

Hyeongmin Choe, SooJean Han

Comments: Submitted to IEEE Control Systems Letters (L-CSS), under review

Subjects: Systems and Control (eess.SY)

We study sensor scheduling for continuous-discrete Kalman filtering with Poisson measurement arrivals and propose an information-form deterministic surrogate for scalable offline design. Unlike the covariance-form surrogate, the sensing rates enter through sensor-specific additive information increments, eliminating mixed state-input derivatives in the transcribed nonlinear program and thereby yielding a simpler derivative structure. We further show that, together with the covariance-form surrogate, the proposed surrogate provides computable two-sided performance bounds for a given schedule under stochastic measurement arrivals. Numerical experiments demonstrate substantial computational savings, especially in many-sensor settings, while retaining comparable realized Monte Carlo performance and providing computable two-sided performance bounds for the returned schedule.
[235] arXiv:2604.21304 [pdf, html, other]: Title: PAPERMIND: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs

Yanjun Zhao, Tianxin Wei, Jiaru Zou, Xuying Ning, Yuanchen Bei, Lingjie Chen, Simmi Rana, Wendy H. Yang, Hanghang Tong, Jingrui He

Subjects: Information Retrieval (cs.IR)

Understanding scientific papers requires more than answering isolated questions or summarizing content. It involves an integrated reasoning process that grounds textual and visual information, interprets experimental evidence, synthesizes information across sources, and critically evaluates scientific claims. However, existing benchmarks typically assess these abilities in isolation, making it difficult to evaluate scientific paper understanding as a unified set of interacting cognitive abilities. In this work, we introduce PAPERMIND, a benchmark designed to evaluate integrated and agent-oriented scientific reasoning over research papers. PAPERMIND is constructed from real scientific papers across seven domains, including agriculture, biology, chemistry, computer science, medicine, physics, and economics. It comprises four complementary task families that collectively operationalize distinct cognitive facets of scientific paper reasoning, including multimodal grounding, experimental interpretation, cross-source evidence reasoning, and critical assessment. By analyzing model behavior across multiple tasks, PAPERMIND enables a diagnostic evaluation of integrated scientific reasoning behaviors that are difficult to assess through isolated task evaluations. Extensive experiments on both open-source and closed-source multimodal LLMs reveal consistent performance gaps across tasks, highlighting persistent challenges in integrated scientific reasoning and critique. Our benchmark and dataset are available at this http URL.
[236] arXiv:2604.21305 [pdf, html, other]: Title: WPGRec: Wavelet Packet Guided Graph Enhanced Sequential Recommendation

Peilin Liu, Zhiquan Ji, Gang Yan

Comments: Accepted to SIGIR 2026, 8 pages, 3 figures

Subjects: Information Retrieval (cs.IR)

Sequential recommendation aims to model users' evolving interests from noisy and non-stationary interaction streams, where long-term preferences, short-term intents, and localized behavioral fluctuations may coexist across temporal scales. Existing frequency-domain methods mainly rely on either global spectral operations or filter-based wavelet processing. However, global spectral operations tend to entangle local transients with long-range dependencies, while filter-based wavelet pipelines may suffer from temporal misalignment and boundary artifacts during multi-scale decomposition and reconstruction. Moreover, collaborative signals from the user-item interaction graph are often injected through scale-inconsistent auxiliary modules, limiting the benefit of jointly modeling temporal dynamics and structural dependencies. To address these issues, we propose Wavelet Packet Guided Graph Enhanced Sequential Recommendation (WPGRec), a unified time-frequency and graph-enhanced framework that aligns multi-resolution temporal modeling with graph propagation at matching scales. WPGRec first applies a full-tree undecimated stationary wavelet packet transform to generate equal-length, shift-invariant subband sequences. It then performs subband-wise interaction-graph propagation to inject high-order collaborative information while preserving temporal alignment across resolutions. Finally, an energy- and spectral-flatness-aware gated fusion module adaptively aggregates informative subbands and suppresses noise-like components. Extensive experiments on four public benchmarks show that WPGRec consistently outperforms sequential and graph-based baselines, with particularly clear gains on sparse and behaviorally complex datasets, highlighting the effectiveness of band-consistent structure injection and adaptive subband fusion for sequential recommendation.
[237] arXiv:2604.21306 [pdf, html, other]: Title: Finding Pareto frontier for one-sided matching

Bhavik Dodda, Garima Shakya

Subjects: Computer Science and Game Theory (cs.GT)

One-sided matching problems with ordinal preferences, such as hostel room allocation, are commonly solved using the Top Trading Cycles (TTC) mechanism, which guarantees Pareto-optimal (PO) outcomes. However, TTC does not yield a unique solution: multiple PO allocations may exist, and many distinct initial endowments can converge to the same outcome. Focusing on a single TTC result obscures the structure of the Pareto-efficient frontier and limits principled secondary optimization over fairness or welfare objectives. Therefore, the goal is to find the entire set of PO allocations for a given preference profile. We propose the Inverse Top Trading Cycles Enumeration Algorithm (ITEA), a novel method that efficiently computes the complete set of Pareto-optimal allocations in one-sided matching problems. We prove the soundness and completeness of the proposed algorithm and analyze its computational complexity. Although there can be $n!$ PO allocations in the worst case, our algorithm reduces time complexity relative to the brute-force approach when there are fewer PO allocations. Empirical results demonstrate substantial reductions in redundant TTC computations compared to brute-force enumeration, enabling efficient characterization of the Pareto frontier.
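For orientation, the brute-force baseline the abstract compares against can be sketched as follows: run TTC from every one of the $n!$ initial endowments and keep the distinct outcomes. This is not ITEA, whose details the abstract does not give. The sketch assumes strict preferences and equal numbers of agents and items; `ttc` and `pareto_frontier_bruteforce` are hypothetical names.

```python
from itertools import permutations


def ttc(prefs, endowment):
    """Top Trading Cycles.

    prefs[i] lists items in agent i's strict preference order (best first);
    endowment[i] is the item agent i initially owns. Returns a tuple whose
    i-th entry is the item agent i ends up with.
    """
    n = len(prefs)
    owner = {endowment[i]: i for i in range(n)}
    remaining = set(range(n))
    alloc = [None] * n
    while remaining:
        available = {endowment[i] for i in remaining}
        # Every remaining agent points at its favourite item still available.
        target = {i: next(x for x in prefs[i] if x in available) for i in remaining}
        # Follow pointers from any agent until one repeats: that closes a cycle.
        path, agent = [], min(remaining)
        while agent not in path:
            path.append(agent)
            agent = owner[target[agent]]
        cycle = path[path.index(agent):]
        # Agents on the cycle trade along it and leave the market.
        for i in cycle:
            alloc[i] = target[i]
        remaining -= set(cycle)
    return tuple(alloc)


def pareto_frontier_bruteforce(prefs):
    """Distinct TTC outcomes over all n! endowments (exponential; small n only)."""
    n = len(prefs)
    return {ttc(prefs, e) for e in permutations(range(n))}


prefs = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
for allocation in sorted(pareto_frontier_bruteforce(prefs)):
    print(allocation)
```

With strict preferences, running TTC from a Pareto-optimal allocation returns that same allocation, so this enumeration covers the whole frontier. Many endowments map to the same outcome, which is the redundancy the abstract refers to.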
[238] arXiv:2604.21308 [pdf, html, other]: Title: CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents

Wenjie Fu, Xiaoting Qin, Jue Zhang, Qingwei Lin, Lukas Wutschitz, Robert Sim, Saravan Rajmohan, Dongmei Zhang

Journal-ref: The 64th Annual Meeting of the Association for Computational Linguistics (ACL'2026) -- Industry Track

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)

Enterprise LLM agents can dramatically improve workplace productivity, but their core capability, retrieving and using internal context to act on a user's behalf, also creates new risks for sensitive information leakage. We introduce CI-Work, a Contextual Integrity (CI)-grounded benchmark that simulates enterprise workflows across five information-flow directions and evaluates whether agents can convey essential content while withholding sensitive context in dense retrieval settings. Our evaluation of frontier models reveals that privacy failures are prevalent (violation rates range from 15.8% to 50.9%, with leakage reaching up to 26.7%) and uncovers a counterintuitive trade-off critical for industrial deployment: higher task utility often correlates with increased privacy violations. Moreover, the massive scale of enterprise data and potential user behavior further amplify this vulnerability. Simply increasing model size or reasoning depth fails to address the problem. We conclude that safeguarding enterprise workflows requires a paradigm shift, moving beyond model-centric scaling toward context-centric architectures.
[239] arXiv:2604.21309 [pdf, html, other]: Title: When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation

Nannan Huang, Iffat Maab, Junichi Yamagishi

Comments: Accepted to ACL 2026 Main Conference

Subjects: Computation and Language (cs.CL)

Multi-document news summarisation systems are increasingly adopted for their convenience in processing vast daily news content, making fairness across diverse political perspectives critical. However, these systems can exhibit political bias through unequal representation of viewpoints, disproportionate emphasis on certain perspectives, and systematic underrepresentation of minority voices. This study presents a comprehensive evaluation of such bias in multi-document news summarisation using FairNews, a dataset of complete news articles with political orientation labels, examining how large language models (LLMs) handle sources with varying political leanings across 13 models and five fairness metrics. We investigate both baseline model performance and effectiveness of various debiasing interventions, including prompt-based and judge-based approaches. Our findings challenge the assumption that larger models yield fairer outputs, as mid-sized variants consistently outperform their larger counterparts, offering the best balance of fairness and efficiency. Prompt-based debiasing proves highly model dependent, while entity sentiment emerges as the most stubborn fairness dimension, resisting all intervention strategies tested. These results demonstrate that fairness in multi-document news summarisation requires multi-dimensional evaluation frameworks and targeted, architecture-aware debiasing rather than simply scaling up.
[240] arXiv:2604.21310 [pdf, html, other]: Title: Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations

Pawan Acharya, Lan Zhang

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Deep learning has emerged as a powerful approach for malware detection, demonstrating impressive accuracy across various data representations. However, these models face critical limitations in real-world, non-stationary environments where both malware characteristics and detection systems continuously evolve. Our research investigates a fundamental security question: Can an attacker generate adversarial malware samples that simultaneously evade classification and remain inconspicuous to drift monitoring mechanisms? We propose a novel approach that generates targeted adversarial examples in the classifier's standardized feature space, augmented with sophisticated similarity regularizers. By carefully constraining perturbations to maintain distributional similarity with clean malware, we create an optimization objective that balances targeted misclassification with drift signal minimization. We quantify the effectiveness of this approach by comprehensively comparing classifier output probabilities using multiple drift metrics. Our experiments demonstrate that similarity constraints can reduce output drift signals, with $\ell_2$ regularization showing the most promising results. We observe that perturbation budget significantly influences the evasion-detectability trade-off, with increased budget leading to higher attack success rates and more substantial drift indicators.
[241] arXiv:2604.21311 [pdf, other]: Title: An Interpretable Vision Transformer Framework for Automated Brain Tumor Classification

Chinedu Emmanuel Mbonu, Tochukwu Sunday Belonwu, Okwuchukwu Ejike Chukwuogo, Kenechukwu Sylvanus Anigbogu

Comments: 9 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Brain tumors represent one of the most critical neurological conditions, where early and accurate diagnosis is directly correlated with patient survival rates. Manual interpretation of Magnetic Resonance Imaging (MRI) scans is time-intensive, subject to inter-observer variability, and demands significant specialist expertise. This paper proposes a deep learning framework for automated four-class brain tumor classification distinguishing glioma, meningioma, pituitary tumor, and healthy brain tissue from a dataset of 7,023 MRI scans. The proposed system employs a Vision Transformer (ViT-B/16) pretrained on ImageNet-21k as the backbone, augmented with a clinically motivated preprocessing and training pipeline. Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to enhance local contrast and accentuate tumor boundaries invisible to standard normalization. A two-stage fine-tuning strategy is adopted: the classification head is warmed up with the backbone frozen, followed by full fine-tuning with discriminative learning rates. MixUp and CutMix augmentations are applied per batch to improve generalization. Exponential Moving Average (EMA) of weights and Test-Time Augmentation (TTA) further stabilize and boost performance. Attention Rollout visualization provides clinically interpretable heatmaps of the brain regions driving each prediction. The proposed model achieves a test accuracy of 99.29%, macro F1-score of 99.25%, and perfect recall on both healthy and meningioma classes, outperforming all CNN-based baselines.
[242] arXiv:2604.21312 [pdf, html, other]: Title: The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview

Kai Liu, Haoyang Yue, Zeli Lin, Zheng Chen, Jingkai Wang, Jue Gong, Jiatong Li, Xianglong Yan, Libo Zhu, Jianze Li, Ziqing Zhang, Zihan Zhou, Xiaoyang Liu, Radu Timofte, Yulun Zhang, Junye Chen, Zhenming Yan, Yucong Hong, Ruize Han, Song Wang, Li Pang, Heng Zhao, Xinqiao Wu, Deyu Meng, Xiangyong Cao, Weijun Yuan, Zhan Li, Zhanglu Chen, Boyang Yao, Yihang Chen, Yifan Deng, Zengyuan Zuo, Junjun Jiang, Saiprasad Meesiyawar, Sulocha Yatageri, Nikhil Akalwadi, Ramesh Ashok Tabib, Uma Mudenagudi, Jiachen Tu, Yaokun Shi, Guoyi Xu, Yaoxin Jiang, Cici Liu, Tongyao Mu, Qiong Cao, Yifan Wang, Kosuke Shigematsu, Hiroto Shirono, Asuka Shin, Wei Zhou, Linfeng Li, Lingdong Kong, Ce Wang, Xingwei Zhong, Wanjie Sun, Dafeng Zhang, Hongxin Lan, Qisheng Xu, Mingyue He, Hui Geng, Tianjiao Wan, Kele Xu, Changjian Wang, Antoine Carreaud, Nicola Santacroce, Shanci Li, Jan Skaloud, Adrien Gressin

Comments: Github Repo: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

This paper presents the NTIRE 2026 Remote Sensing Infrared Image Super-Resolution (x4) Challenge, one of the associated challenges of NTIRE 2026. The challenge aims to recover high-resolution (HR) infrared images from low-resolution (LR) inputs generated through bicubic downsampling with a x4 scaling factor. The objective is to develop effective models or solutions that achieve state-of-the-art performance for infrared image SR in remote sensing scenarios. To reflect the characteristics of infrared data and practical application needs, the challenge adopts a single-track setting. A total of 115 participants registered for the competition, with 13 teams submitting valid entries. This report summarizes the challenge design, dataset, evaluation protocol, main results, and the representative methods of each team. The challenge serves as a benchmark to advance research in infrared image super-resolution and promote the development of effective solutions for real-world remote sensing applications.
[243] arXiv:2604.21313 [pdf, html, other]: Title: PLAS-Net: Pixel-Level Area Segmentation for UAV-Based Beach Litter Monitoring

Yongying Liu, Jiaqi Wang, Jian Song, Xinlei Shao, Yijia Chen, Nan Xu, Katsunori Mizuno, Shigeru Tabeta, Fan Zhao

Comments: 30 pages, 12 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

Accurate quantification of the physical exposure area of beach litter, rather than simple item counts, is essential for credible ecological risk assessment of marine debris. However, automated UAV-based monitoring predominantly relies on bounding-box detection, which systematically overestimates the planar area of irregular litter objects. To address this geometric limitation, we develop PLAS-Net (Pixel-level Litter Area Segmentor), an instance segmentation framework that extracts pixel-accurate physical footprints of coastal debris. Evaluated on UAV imagery from a monsoon-driven pocket beach in Koh Tao, Thailand, PLAS-Net achieves a mAP_50 of 58.7% with higher precision than eleven baseline models, demonstrating improved mask fidelity under complex coastal conditions. To illustrate how the accuracy of the masking affects the conclusions of environmental analysis, we conducted three downstream demonstrations: (i) power-law fitting of normalized plastic density (NPD) to characterize fragmentation dynamics; (ii) area-weighted ecological risk index (ERI) to map spatial pollution hotspots; and (iii) source composition analysis revealing the abundance-area paradox: fishing gear constitutes a small proportion of the total number of items, but has the largest physical area per unit item. Pixel-level area extraction can provide more valuable information for coastal monitoring compared to methods based solely on counting.
[244] arXiv:2604.21315 [pdf, html, other]: Title: TopoStyle: Supporting Iterative Design with Generative AI for 2.5D Topology Optimization

Shuyue Feng, Cedric Caremel, Yoshihiro Kawahara

Comments: 12 pages

Subjects: Human-Computer Interaction (cs.HC)

Topology optimization (TO) is widely used in engineering because of its ability to save material and optimize structural performance. Although prior work has explored 2D human-centered design tools for TO, the results are often limited in variety and offer weak customizability. Meanwhile, due to the high computational and time costs of TO, researchers have attempted to address these issues using generative AI; however, such methods often provide limited interactivity. In addition, topology optimization in many cases needs to balance structural performance and aesthetic qualities through iterative design, a perspective that has rarely been emphasized in traditional TO. We present TopoStyle, an iterative design tool for 2.5D topology optimization using a 2D diffusion model. We explore two interaction methods. The first exports 3D parts to a graphical interface for hand-drawn interaction. The second enables direct interaction within 3D modeling software using points. Our tool also supports the use of masks to apply topology optimization to specific regions, allowing users to address customized design needs. We compare and evaluate both performance and interaction methods, and investigate how TopoStyle can balance performance and aesthetics while improving design efficiency through customization and iterative design. Finally, we demonstrate the application scenarios of TopoStyle through several design cases.
[245] arXiv:2604.21316 [pdf, html, other]: Title: LLM-Steered Power Allocation for Parallel QPSK-AWGN Channels

Tadashi Wadayama

Subjects: Information Theory (cs.IT)

Large language models (LLMs) are increasingly being explored as high-level decision modules in closed-loop systems, but their stochastic nature makes safe integration challenging. In this paper, we propose LLM-Steered Power Allocation, a dual-process architecture for parallel QPSK channels inspired by Kahneman's System 1/System 2 framework. A fast numerical optimizer (System 1) continuously performs projected gradient ascent on a weighted mutual-information objective, while an LLM navigator (System 2) periodically interprets natural-language policies and updates only the channel weights and the operational power budget. The LLM never manipulates the power-allocation variables directly, and constraint satisfaction is enforced structurally by the optimizer. To mitigate LLM unreliability, we further incorporate multi-layer guardrails including normalization, exponential moving-average smoothing, and fallback mechanisms. Numerical experiments on an 8-channel system show that, with a fixed optimization core and unchanged system prompt, different natural-language policies induce qualitatively different operating points, including throughput-oriented allocation, channel prioritization, power-aware operation, and channel shutdown. In addition, under an abrupt channel-gain reversal, the proposed system autonomously reconfigures its steering signals and reduces the final mutual-information spread by 60% compared with the optimizer alone. These results suggest that LLMs can serve as policy interpreters for safe, flexible reconfiguration of communication-system optimizers without controller reimplementation.
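The division of labor described above, where the LLM sets only the channel weights and the power budget while the optimizer enforces the constraints, can be illustrated with a small projected gradient ascent loop. This is a sketch under stated assumptions, not the paper's optimizer: it substitutes the Gaussian-input rate log(1 + g p) for the QPSK mutual information, which has no closed form, and `project_budget`, `allocate`, the step size, and the example gains are all hypothetical.

```python
def project_budget(v, budget):
    """Euclidean projection of v onto {p >= 0, sum(p) <= budget}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= budget:
        return clipped
    # Otherwise project onto the simplex {p >= 0, sum(p) = budget} by sorting.
    ordered = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for j, u in enumerate(ordered, start=1):
        running += u
        t = (running - budget) / j
        if u - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def allocate(weights, gains, budget, step=0.05, iters=2000):
    """Projected gradient ascent on sum_i w_i * log(1 + g_i * p_i).

    The objective is a Gaussian-input surrogate, NOT the QPSK mutual
    information used in the paper. `weights` and `budget` are the only
    inputs a steering layer would be allowed to change.
    """
    n = len(gains)
    p = [budget / n] * n
    for _ in range(iters):
        grad = [w * g / (1.0 + g * x) for w, g, x in zip(weights, gains, p)]
        p = project_budget([x + step * d for x, d in zip(p, grad)], budget)
    return p


gains = [1.0, 2.0, 3.0, 0.5]
uniform = allocate([1.0, 1.0, 1.0, 1.0], gains, budget=4.0)
shutdown = allocate([1.0, 1.0, 0.0, 1.0], gains, budget=4.0)  # weight 0 on channel 2
print("uniform weights :", [round(x, 3) for x in uniform])
print("channel 2 off   :", [round(x, 3) for x in shutdown])
```

Because every iterate passes through `project_budget`, the power vector stays non-negative and within budget no matter what weights are supplied, which mirrors the abstract's point that constraint satisfaction is structural rather than delegated to the LLM.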
[246] arXiv:2604.21319 [pdf, html, other]: Title: On a Boundary-Initial Value Problem for Fractional Differential Equation with Sequential Caputo derivatives

Fayziev Yusuf, Jumaeva Shakhnoza

Comments: 18 pages, 7 figures

Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

In this paper, we investigate a fractional differential equation involving sequential Caputo derivatives, motivated by recent research on fractional models with multiple memory effects. Using techniques inspired by earlier works on sequential fractional operators, we derive the exact analytic solution of the problem in terms of the bivariate Mittag-Leffler function. Additionally, several useful properties of the bivariate Mittag-Leffler function are formulated to support the solution construction. Furthermore, we develop a numerical scheme using a sequential reformulation and the L1-finite element method.
[247] arXiv:2604.21321 [pdf, html, other]: Title: FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment

Khaled R Ahmed, Toqi Tahamid Sarker, Taminul Islam, Tamany M Alanezi, Amer AbuGhazaleh

Comments: 10 pages, 7 figures. Accepted for publication at CVPRW 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Monitoring frying oil degradation is critical for food safety, yet current practice relies on destructive wet-chemistry assays that provide no spatial information and are unsuitable for real-time use. We identify a fundamental obstacle in thermal-image-based inspection, the camera-fingerprint shortcut, whereby models memorize sensor-specific noise and thermal bias instead of learning oxidation chemistry, collapsing under video-disjoint evaluation. We propose FryNet, a dual-stream RGB-thermal framework that jointly performs oil-region segmentation, serviceability classification, and regression of four chemical oxidation indices (PV, p-AV, Totox, temperature) in a single forward pass. A ThermalMiT-B2 backbone with channel and spatial attention extracts thermal features, while an RGB-MAE Encoder learns chemically grounded representations via masked autoencoding and chemical alignment. Dual-Encoder DANN adversarially regularizes both streams against video identity via Gradient Reversal Layers, and FiLM fusion bridges thermal structure with RGB chemical context. On 7,226 paired frames across 28 frying videos, FryNet achieves 98.97% mIoU, 100% classification accuracy, and 2.32 mean regression MAE, outperforming all seven baselines.
[248] arXiv:2604.21324 [pdf, html, other]: Title: Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification

Zhiyong Li, Wei Jiang, Haojie Liu, Mingyu Wang, Wanchong Xu, Weijie Mao

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance, yet existing methods predominantly focus on the image level or rely heavily on costly identity annotations. While video-based VI-ReID has recently emerged to exploit temporal dynamics for improved robustness, existing studies remain limited to supervised settings. Crucially, the unsupervised video VI-ReID problem, where models must learn from RGB and infrared tracklets without identity labels, remains largely unexplored despite its practical importance in real-world deployment. To bridge this gap, we propose HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework without explicit hard pseudo-label assignment for unsupervised video-based VI-ReID. HiTPro begins with an efficient Temporal-aware Feature Encoder that first extracts discriminative frame-level features and then aggregates them into a robust tracklet-level representation. Building upon these features, HiTPro first constructs reliable intra-camera prototypes via Intra-Camera Tracklet Prototyping by aggregating features from temporally partitioned sub-tracklets. Through Hierarchical Cross-Prototype Alignment, we perform a two-stage positive mining process, progressing from within-modality associations to cross-modality matching, enhanced by a Dynamic Threshold Strategy and Soft Weight Assignment. Finally, Hierarchical Contrastive Learning progressively optimizes feature-prototype alignment across three levels: intra-camera discrimination, cross-camera same-modality consistency, and cross-modality invariance. Extensive experiments on HITSZ-VCM and BUPTCampus demonstrate that HiTPro achieves state-of-the-art performance under fully unsupervised settings, significantly outperforming adapted baselines and establishing a strong baseline for future research.
[249] arXiv:2604.21326 [pdf, html, other]: Title: MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment

Juan Li, Chuanghao Ding, Xujie Zhang, Cam-Tu Nguyen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Universal Multimodal Retrieval (UMR) aims to map different modalities (e.g., visual and textual) into a shared embedding space for multi-modal retrieval. Existing UMR methods can be broadly divided into two categories: early-fusion approaches, such as Marvel, which projects visual features into the language model (LM) space for integration with the text modality, and late-fusion approaches, such as UniVL-DR, which encode visual and textual inputs using separate encoders and obtain fused embeddings through addition. Our pilot study reveals that Marvel exhibits visual modality collapse, which is characterized by the model's tendency to disregard visual features while depending excessively on textual cues. In contrast, although UniVL-DR is less affected by this issue, it is more susceptible to semantic misalignment, where semantically related content is positioned far apart in the embedding space. To address these challenges, we propose MiMIC, which introduces two key innovations: (1) a fusion-in-decoder architecture for effective multimodal integration, and (2) robust training through single modality mixin and random caption dropout. Experiments on the WebQA+ and EVQA+ datasets, where images in documents or queries might lack captions, indicate that MiMIC consistently outperforms both early- and late-fusion baselines.
[250] arXiv:2604.21327 [pdf, html, other]: Title: Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

Yongcan Yu, Lingxiao He, Jian Liang, Kuangpu Guo, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He

Comments: Accepted to ACL 2026 Findings

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Test-time reinforcement learning (TTRL) always adapts models at inference time via pseudo-labeling, leaving it vulnerable to spurious optimization signals from label noise. Through an empirical study, we observe that responses with medium consistency form an ambiguity region and constitute the primary source of reward noise. Crucially, we find that such spurious signals can even be amplified through group-relative advantage estimation. Motivated by these findings, we propose a unified framework, Debiased and Denoised test-time Reinforcement Learning (DDRL), to mitigate spurious signals. Concretely, DDRL first applies a frequency-based sampling strategy to exclude ambiguous samples while maintaining a balanced set of positive and negative examples. It then adopts a debiased advantage estimation with fixed advantages, removing the bias introduced by group-relative policy optimization. Finally, DDRL incorporates a consensus-based off-policy refinement stage, which leverages the rejection-sampled dataset to enable efficient and stable model updates. Experiments on three large language models across multiple mathematical reasoning benchmarks demonstrate that DDRL consistently outperforms existing TTRL baselines. The code will soon be released at this https URL.
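As a rough illustration of the first two steps in the abstract above (frequency-based exclusion of ambiguous samples, then fixed advantages), the sketch below may help. It is not the authors' implementation: the function names, the `low`/`high` thresholds, and the +1/-1 advantage values are all assumptions chosen for the example, and the positive/negative balancing step is omitted.

```python
from collections import Counter


def split_by_frequency(answers, low=0.2, high=0.5):
    """Partition sampled responses by how often their final answer occurs.

    `low` and `high` are hypothetical thresholds: answers with a frequency
    strictly between them are treated as the ambiguous middle and dropped.
    Returns three lists of response indices.
    """
    counts = Counter(answers)
    n = len(answers)
    positives, negatives, dropped = [], [], []
    for idx, ans in enumerate(answers):
        freq = counts[ans] / n
        if freq >= high:
            positives.append(idx)   # high-consensus answer
        elif freq <= low:
            negatives.append(idx)   # rare answer
        else:
            dropped.append(idx)     # medium consistency: excluded
    return positives, negatives, dropped


def fixed_advantages(positives, negatives, pos_adv=1.0, neg_adv=-1.0):
    """Assign constant advantages instead of group-normalised ones (assumed values)."""
    adv = {i: pos_adv for i in positives}
    adv.update({i: neg_adv for i in negatives})
    return adv


# Ten sampled answers to one prompt: "4" appears 5 times, "7" 3 times, "9" twice.
answers = ["4"] * 5 + ["7"] * 3 + ["9"] * 2
pos, neg, dropped = split_by_frequency(answers)
advantages = fixed_advantages(pos, neg)
```

Because the advantages are constants, they do not depend on the mean or spread of rewards within the sampled group, which is the property the abstract attributes to the debiased estimator.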
[251] arXiv:2604.21328 [pdf, html, other]: Title: Role of diversity in team performance: the case of missing expertise, an agent based simulation

Tamás Kiss

Comments: 20 pages, 13 figures, for associated model file, please see this https URL

Subjects: Multiagent Systems (cs.MA); Physics and Society (physics.soc-ph)

Theory and empirical research on management teams' influence on firm performance have witnessed continuous development, and by now incorporate numerous details. Classic, experiment-based studies examining social systems collect vast amounts of data, but often investigate only the first one or two moments of the distribution of measured variables, and experience difficulty in analyzing the effect of context. For example, in functional diversity research, management teams are described by measures incorporating complex distributions of capabilities of individual managers and teams of managers. To investigate the effect of hidden distributions, and the effect of functional diversity composition on team communication and performance, we developed an agent-based model and conducted a series of simulation experiments. Modeling results show that, depending on the context, such as the communication scheme among interacting agents or their functional composition, intrapersonal functional diversity (IFD) and dominant function diversity (DFD) might enhance or reduce performance and communication among agents. Furthermore, simulation results also suggest that a third measure, capturing the aggregate expertise of the team, is required alongside IFD and DFD to comprehensively account for empirical findings.
[252] arXiv:2604.21330 [pdf, html, other]: Title: Teacher-Guided Routing for Sparse Vision Mixture-of-Experts

Masahiro Kada, Ryota Yoshihashi, Satoshi Ikehata, Rei Kawakami, Ikuro Sato

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent progress in deep learning has been driven by increasingly large-scale models, but the resulting computational cost has become a critical bottleneck. Sparse Mixture of Experts (MoE) offers an effective solution by activating only a small subset of experts for each input, achieving high scalability without sacrificing inference speed. Although effective, sparse MoE training exhibits characteristic optimization difficulties. Because the router receives informative gradients only through the experts selected in the forward pass, it suffers from gradient blocking and obtains little information from unselected routes. This limited, highly localized feedback makes it difficult for the router to learn appropriate expert-selection scores and often leads to unstable routing dynamics, such as fluctuating expert assignments during training. To address this issue, we propose TGR-MoE: Teacher-Guided Routing for Sparse Vision Mixture-of-Experts, a simple yet effective method that stabilizes router learning using supervision derived from a pretrained dense teacher model. TGR-MoE constructs a teacher router from the teacher's intermediate representations and uses its routing outputs as pseudo-supervision for the student router, suppressing frequent routing fluctuations during training and enabling knowledge-guided expert selection from the early stages of training. Extensive experiments on ImageNet-1K and CIFAR-100 demonstrate that TGR consistently improves both accuracy and routing consistency, while maintaining stable training even under highly sparse configurations.
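One way to picture the pseudo-supervision described above is as a divergence between the teacher router's and the student router's distributions over experts. The abstract does not state which loss is used, so the KL divergence below is an assumption, and `softmax` and `routing_kl` are hypothetical helper names for a minimal sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of routing logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_kl(teacher_logits, student_logits):
    """KL(teacher || student) over experts for a single token (assumed loss form)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# The loss is zero when the student reproduces the teacher's routing
# and positive when the two routers prefer different experts.
same = routing_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = routing_kl([0.0, 0.0, 5.0], [5.0, 0.0, 0.0])
```

Unlike the task loss, a term of this kind gives the student router a signal for every expert, including those not selected in the forward pass.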
[253] arXiv:2604.21331 [pdf, html, other]: Title: FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

Zhen Zhang, Weinan Wang, Hejia Sun, Qingpeng Ding, Xiangyu Chu, Guoxin Fang, K. W. Samuel Au

Comments: 12 pages, 6 figures

Subjects: Robotics (cs.RO)

The current practice of dexterous manipulation generally relies on a single wrist-mounted view, which is often occluded and limits performance on tasks requiring multi-view perception. In this work, we present FingerViP, a learning system that utilizes a visuomotor policy with fingertip visual perception for dexterous manipulation. Specifically, we design a vision-enhanced fingertip module with an embedded miniature camera and install the modules on each finger of a multi-fingered hand. The fingertip cameras substantially improve visual perception by providing comprehensive, multi-view feedback of both the hand and its surrounding environment. Building on the integrated fingertip modules, we develop a diffusion-based whole-body visuomotor policy conditioned on a third-view camera and multi-view fingertip vision, which effectively learns complex manipulation skills directly from human demonstrations. To improve view-proprioception alignment and contact awareness, each fingertip visual feature is augmented with its corresponding camera pose encoding and per-finger joint-current encoding. We validate the effectiveness of the multi-view fingertip vision and demonstrate the robustness and adaptability of FingerViP on various challenging real-world tasks, including pressing buttons inside a confined box, retrieving sticks from an unstable support, retrieving objects behind an occluding curtain, and performing long-horizon cabinet opening and object retrieval, achieving an overall success rate of 80.8%. All hardware designs and code will be fully open-sourced.
[254] arXiv:2604.21334 [pdf, html, other]: Title: Ideological Bias in LLMs' Economic Causal Reasoning

Donggyu Lee, Hyeok Yun, Jungwon Kim, Junsik Min, Sungwon Park, Sangyoon Park, Jihee Kim

Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG); General Economics (econ.GN)

Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question has direct practical stakes. We present a systematic evaluation by extending the EconCausal benchmark with ideology-contested cases - instances where intervention-oriented (pro-government) and market-oriented (pro-market) perspectives predict divergent causal signs. From 10,490 causal triplets (treatment-outcome pairs with empirically verified effect directions) derived from top-tier economics and finance journals, we identify 1,056 ideology-contested instances and evaluate 20 state-of-the-art LLMs on their ability to predict empirically supported causal directions. We find that ideology-contested items are consistently harder than non-contested ones, and that across 18 of 20 models, accuracy is systematically higher when the empirically verified causal sign aligns with intervention-oriented expectations than with market-oriented ones. Moreover, when models err, their incorrect predictions disproportionately lean intervention-oriented, and this directional skew is not eliminated by one-shot in-context prompting. These results highlight that LLMs are not only less accurate on ideologically contested economic questions, but systematically less reliable in one ideological direction than the other, underscoring the need for direction-aware evaluation in high-stakes economic and policy settings.
[255] arXiv:2604.21335 [pdf, html, other]: Title: Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression

Wei Jiang, Wei Wang

Comments: 16 pages, 14 tables, 2 figures

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Sub-token routing offers a finer control axis for transformer efficiency than the coarse units used in most prior work, such as tokens, pages, heads, or layers. In this paper, we study routing within a token representation itself in LoRA-adapted transformers. The motivation is that a relevant token need not be internally uniform: under a retention budget, preserved value groups are distributed unevenly both across tokens and within tokens, which suggests that KV compression need not be an all-or-nothing decision at token level. We study this fine-grained routing mechanism in two settings. For compression-aware language modeling, we introduce a query-independent design that combines routed subspace LoRA with value-group routing on the KV path. For downstream-task-preserving KV compression, we introduce a query-aware design in which a predictor-based selector allocates a global retention budget over context-token/value-group pairs using query-conditioned relevance. Experiments show that the query-independent design improves the quality-compression tradeoff for language modeling, while the query-aware design preserves downstream behavior under reduced KV budgets. We further examine the relation between token-level and sub-token-level query-aware routing, and show that they form complementary compression axes: token-level methods determine which tokens survive globally, while sub-token routing determines how the surviving tokens are compressed internally.
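The idea of spending one global retention budget over context-token/value-group pairs can be sketched as a top-k selection over a score matrix. This is only an illustration under stated assumptions: the relevance scores are given directly rather than produced by the paper's predictor-based selector, and `allocate_budget` is a hypothetical name.

```python
def allocate_budget(scores, budget):
    """Keep the `budget` highest-scoring (token, value-group) pairs.

    scores[t][g] is an assumed relevance score for value group g of token t.
    Returns a boolean mask of the same shape; True means the group is retained.
    """
    pairs = [(s, t, g) for t, row in enumerate(scores) for g, s in enumerate(row)]
    pairs.sort(key=lambda p: -p[0])  # stable sort, highest score first
    keep = [[False] * len(row) for row in scores]
    for _, t, g in pairs[:budget]:
        keep[t][g] = True
    return keep


# Three context tokens, three value groups each, budget of four pairs.
scores = [
    [0.9, 0.1, 0.8],
    [0.2, 0.3, 0.05],
    [0.7, 0.6, 0.4],
]
mask = allocate_budget(scores, budget=4)
kept_per_token = [sum(row) for row in mask]
```

In this toy input the budget is spread unevenly: some tokens keep a subset of their groups and one keeps none, which is the sense in which compression need not be all-or-nothing at token level.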
[256] arXiv:2604.21337 [pdf, html, other]: Title: PREVENT-JACK: Context Steering for Swarms of Long Heavy Articulated Vehicles

Adrian Baruck, Michael Dubé, Christoph Steup, Sanaz Mostaghim

Comments: 32 pages, 7 figures, 4 videos; submitted to the Swarm Robotics collection of the Nature Portfolio Journal Robotics (NPJ Robot)

Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)

In this paper, we aim to extend the traditional point-mass-like robot representation in swarm robotics and instead study a swarm of long Heavy Articulated Vehicles (HAVs). HAVs are kinematically constrained, elongated, and articulated, introducing unique challenges. Local, decentralized coordination of these vehicles is motivated by many real-world applications. Our approach, Prevent-Jack, introduces the sparsely covered context steering framework in robotics. It fuses six local behaviors, providing guarantees against jackknifing and collisions at the cost of potential dead- and livelocks, tested for vehicles with up to ten trailers. We highlight the importance of the Evade Attraction behavior for deadlock prevention using a parameter study, and use 15,000 simulations to evaluate the swarm performance. Our extensive experiments show that both dead- and livelocks occur more frequently in larger swarms and denser scenarios, affecting a peak average of 27%/31% of vehicles. We observe that larger swarms exhibit increased waiting, while smaller swarms show increased evasion.
[257] arXiv:2604.21338 [pdf, html, other]: Title: "If We Had the Information That We Need to Interpret the World Around Us, We Wouldn't Be Disabled:" Barriers and Opportunities in Information Work among Blind and Sighted Colleagues

Yichun Zhao, Miguel A. Nacenta, Mahadeo A. Sukhai, Sowmya Somanath

Comments: Accepted in CHIWORK '26

Journal-ref: ACM 5th Annual Symposium on Human-Computer Interaction for Work (CHIWORK 2026)

Subjects: Human-Computer Interaction (cs.HC)

Despite recognition of the value of diversity, the way work takes place can fail to support blind or low-vision employees, especially in collaborative work settings. This paper examines how professional teams with diverse visual abilities use information representations (e.g., PDF documents, spreadsheets and charts). A diary study with follow-up individual interviews (23 participants with mixed abilities from 5 teams) and 2 separate focus groups (7 participants from 2 other teams) allowed us to characterize key dimensions of the role of representations in the workplace into four types of interrelated failures and workarounds, influenced by workplace stigmas and shaped by evolving social dynamics towards interdependent information work. We contribute this new empirically supported conceptual understanding of representation use in workplaces that can help design and improve the experiences of mixed-ability teams doing knowledge work in the current technological landscape.
[258] arXiv:2604.21340 [pdf, html, other]: Title: Spherical Cap $L_2$ Discrepancy -- Blessing of Dimensionality and a Balanced Large-Cap Variant

Johann S. Brauchart, Josef Dick, Friedrich Pillichshammer

Subjects: Numerical Analysis (math.NA); Number Theory (math.NT)

We prove that the information complexity (i.e., the inverse) of the classical spherical cap $L_2$ discrepancy on the $d$-dimensional sphere $\mathbb{S}^d$ decreases with dimension $d$, indicating a ``blessing of dimensionality'' for the associated numerical integration problem. We then introduce a modified spherical cap $L_2$ discrepancy that emphasizes large caps (close to hemispheres). For this variant, the problem does not become easier with increasing $d$. We also establish a Stolarsky invariance principle which connects the modified spherical cap $L_2$ discrepancy to numerical integration in the Sobolev space $H^{(d+1)/2}(\mathbb{S}^d)$, represented by the reproducing kernel $K(\boldsymbol{x}, \boldsymbol{y}) = 1 - \tfrac{1}{\sqrt{2}} \|\boldsymbol{x} - \boldsymbol{y}\|$. Stolarsky's invariance principle then implies that the worst-case integration error in this space grows polynomially with $d$.
[259] arXiv:2604.21343 [pdf, other]: Title: Latent Denoising Improves Visual Alignment in Large Multimodal Models

Dhruv Parikh, Jacob Fein-Ashley, Rajgopal Kannan, Viktor Prasanna

Comments: Technical Report

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large Multimodal Models (LMMs) such as LLaVA are typically trained with an autoregressive language modeling objective, providing only indirect supervision to visual tokens. This often yields weak internal visual representations and brittle behavior under distribution shift. Inspired by recent progress on latent denoising for learning high-quality visual tokenizers, we show that the same principle provides an effective form of visual supervision for improving internal visual feature alignment and multimodal understanding in LMMs. We propose a latent denoising framework that corrupts projected visual tokens using a saliency-aware mixture of masking and Gaussian noising. The LMM is trained to denoise these corrupted tokens by recovering clean teacher patch features from hidden states at a selected intermediate LLM layer using a decoder. To prevent representation collapse, our framework also preserves the teacher's intra-image similarity structure and applies intra-image contrastive patch distillation. During inference, corruption and auxiliary heads are disabled, introducing no additional inference-time overhead. Across a broad suite of standard multimodal benchmarks, our method consistently improves visual understanding and reasoning over strong baselines, and yields clear gains on compositional robustness benchmarks (e.g., NaturalBench). Moreover, under ImageNet-C-style non-adversarial common corruptions applied to benchmark images, our method maintains higher accuracy and exhibits reduced degradation at both moderate and severe corruption levels. Our code is available at this https URL.
[260] arXiv:2604.21344 [pdf, html, other]: Title: Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts

Azher Ahmed Efat, Seok Hwan Song, Wallapak Tavanapong

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Charts are widely used to present complex information. Deriving meaningful insights in real-world contexts often requires interpreting multiple related charts together. However, understanding of multi-chart images has not been extensively explored. We introduce PolyChartQA, a mid-scale dataset specifically designed for question answering over multi-chart images. PolyChartQA comprises 534 multi-chart images (with a total of 2,297 sub-charts) sourced from peer-reviewed computer science research publications and 2,694 QA pairs. We evaluate the performance of nine state-of-the-art Multimodal Language Models (MLMs) on PolyChartQA across question type, difficulty, question source, and key structural characteristics of multi-charts. Our results show a 27.4% LLM-based accuracy (L-Accuracy) drop on human-authored questions compared to MLM-generated questions, and a 5.39% L-Accuracy gain with our proposed prompting method.
[261] arXiv:2604.21345 [pdf, other]: Title: Evaluating AI Meeting Summaries with a Reusable Cross-Domain Pipeline

Philip Zhong, Don Wang, Jason Zhang, Kent Chen

Comments: AI Application Feature Quality Evaluation (28 pages total)

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We present a reusable evaluation pipeline for generative AI applications, instantiated for AI meeting summaries and released with a public artifact package derived from a Dataset Pipeline. The system separates reusable orchestration from task-specific semantics across five stages: source intake, structured reference construction, candidate generation, structured scoring, and reporting. Unlike standalone claim scorers, it treats both ground truth and evaluator outputs as typed, persisted artifacts, enabling aggregation, issue analysis, and statistical testing.
We benchmark the offline loop on a typed dataset of 114 meetings spanning city_council, private_data, and whitehouse_press_briefings, producing 340 meeting-model pairs and 680 judge runs across gpt-4.1-mini, gpt-5-mini, and gpt-5.1. Under this protocol, gpt-4.1-mini achieves the highest mean accuracy (0.583), while gpt-5.1 leads in completeness (0.886) and coverage (0.942). Paired sign tests with Holm correction show no significant accuracy winner but confirm significant retention gains for gpt-5.1.
A typed DeepEval contrastive baseline preserves retention ordering but reports higher holistic accuracy, suggesting that reference-based scoring may overlook unsupported-specifics errors captured by claim-grounded evaluation. Typed analysis identifies whitehouse_press_briefings as an accuracy-challenging domain with frequent unsupported specifics. A deployment follow-up shows gpt-5.4 outperforming gpt-4.1 across all metrics, with statistically robust gains on retention metrics under the same protocol. The system benchmarks the offline loop and documents, but does not quantitatively evaluate, the online feedback-to-evaluation path.
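For readers unfamiliar with the statistical procedure named above, the following is a minimal sketch of an exact two-sided paired sign test followed by Holm's step-down correction. It uses only the standard library; the function names and the example counts are illustrative assumptions and are not taken from the paper's artifact package.

```python
from math import comb


def sign_test_p(wins, losses):
    """Exact two-sided sign test p-value; ties are assumed to be dropped beforehand."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin, one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical win/loss counts for one model pair on three metrics.
raw = [sign_test_p(10, 0), sign_test_p(5, 5), sign_test_p(9, 1)]
adjusted = holm_adjust(raw)
```

A comparison is then declared significant when its adjusted p-value falls below the chosen level, which controls the family-wise error rate across the metrics tested together.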
[262] arXiv:2604.21346 [pdf, html, other]: Title: Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning

Mohit Vaishnav, Tanel Tammet

Journal-ref: 30th Conference on Computational Natural Language Learning (CoNLL), 2026

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Vision-language models (VLMs) often fail on abstract visual reasoning benchmarks such as Bongard problems, raising the question of whether the main bottleneck lies in reasoning or representation. We study this on Bongard-LOGO, a synthetic benchmark of abstract concept learning with ground-truth generative programs, by comparing end-to-end VLMs on raw images with large language models (LLMs) given symbolic inputs derived from those images. Using symbolic inputs as a diagnostic probe rather than a practical multimodal architecture, our Componential-Grammatical (C-G) paradigm reformulates Bongard-LOGO as a symbolic reasoning task based on LOGO-style action programs or structured descriptions. LLMs achieve large and consistent gains, reaching mid-90s accuracy on Free-form problems, while a strong visual baseline remains near chance under matched task definitions. Ablations on input format, explicit concept prompts, and minimal visual grounding show that these factors matter much less than the shift from pixels to symbolic structure. These results identify representation as a key bottleneck in abstract visual reasoning and show how symbolic input can serve as a controlled diagnostic upper bound.
[263] arXiv:2604.21349 [pdf, html, other]: Title: Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning

Wadii Boulila, Adel Ammar, Bilel Benjdira, Maha Driss

Comments: 17 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views, which works well when augmentations preserve semantic content. However, aerial images are frequently degraded by haze, motion blur, rain, and occlusion that remove critical evidence. Enforcing alignment between a clean and a severely degraded view can introduce spurious structure into the latent space. This study proposes a training strategy and architectural modification to enhance SSL robustness to such corruptions. It introduces a per-sample, per-factor trust weight into the alignment objective, combined with the base contrastive loss as an additive residual. A stop-gradient is applied to the trust weight instead of a multiplicative gate. While a multiplicative gate is a natural choice, experiments show it impairs the backbone, whereas our additive-residual approach improves it. Using a 200-epoch protocol on a 210,000-image corpus, the method achieves the highest mean linear-probe accuracy among six backbones on EuroSAT, AID, and NWPU-RESISC45 (90.20% compared to 88.46% for SimCLR and 89.82% for VICReg). It yields the largest improvements under severe information-erasing corruptions on EuroSAT (+19.9 points on haze at s=5 over SimCLR). The method also demonstrates consistent gains of +1 to +3 points in Mahalanobis AUROC on a zero-shot cross-domain stress test using BDD100K weather splits. Two ablations (scalar uncertainty and cosine gate) indicate the additive-residual formulation is the primary source of these improvements. An evidential variant using Dempster-Shafer fusion introduces interpretable signals of conflict and ignorance. These findings offer a concrete design principle for uncertainty-aware SSL. Code is publicly available at this https URL.
[264] arXiv:2604.21351 [pdf, html, other]: Title: Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot

Yucheng Xin, Jiacheng Bao, Haoran Yang, Wenqiang Que, Dong Wang, Junbo Tan, Xueqian Wang, Bin Zhao, Xuelong Li

Subjects: Robotics (cs.RO)

The integration of imitation and reinforcement learning has enabled remarkable advances in humanoid whole-body control, facilitating diverse human-like behaviors. However, research on environment-dependent motions remains limited. Existing methods typically enforce rigid trajectory tracking while neglecting physical interactions with the environment. We observe that humans naturally exploit a "weightless" state during non-self-stabilizing (NSS) motions, selectively relaxing specific joints to allow passive body-environment contact, thereby stabilizing the body and completing the motion. Inspired by this biological mechanism, we design a weightlessness-state auto-labeling strategy for dataset annotation, and we propose the Weightlessness Mechanism (WM), a method that dynamically determines which joints to relax and to what level, together enabling effective environmental interaction while executing target motions. We evaluate our approach on 3 representative NSS tasks: sitting on chairs of varying heights, lying down on beds with different inclinations, and leaning against walls via shoulder or elbow. Extensive experiments in simulation and on the Unitree G1 robot demonstrate that our WM method, trained on single-action demonstrations without any task-specific tuning, achieves strong generalization across diverse environmental configurations while maintaining motion stability. Our work bridges the gap between precise trajectory tracking and adaptive environmental interaction, offering a biologically-inspired solution for contact-rich humanoid control.
[265] arXiv:2604.21352 [pdf, html, other]: Title: CARE: Counselor-Aligned Response Engine for Online Mental-Health Support

Hagai Astrin, Ayal Swaid, Avi Segal, Kobi Gal

Comments: 9 pages, 4 figures

Subjects: Computation and Language (cs.CL)

Mental health challenges are increasing worldwide, straining emotional support services and leading to counselor overload. This can result in delayed responses during critical situations, such as suicidal ideation, where timely intervention is essential. While large language models (LLMs) have shown strong generative capabilities, their application in low-resource languages, especially in sensitive domains like mental health, remains underexplored. Furthermore, existing LLM-based agents often struggle to replicate the supportive language and intervention strategies used by professionals due to a lack of training on large-scale, real-world datasets.
To address this, we propose CARE (Counselor-Aligned Response Engine), a GenAI framework that assists counselors by generating real-time, psychologically aligned response recommendations. CARE fine-tunes open-source LLMs separately for Hebrew and Arabic using curated subsets of real-world crisis conversations. The training data consists of sessions rated as highly effective by professional counselors, enabling the models to capture interaction patterns associated with successful de-escalation. By training on complete conversation histories, CARE maintains the evolving emotional context and dynamic structure of counselor-help-seeker dialogue.
In experimental settings, CARE demonstrates stronger semantic and strategic alignment with gold-standard counselor responses compared to non-specialized LLMs. These findings suggest that domain-specific fine-tuning on expert-validated data can significantly support counselor workflows and improve care quality in low-resource language contexts.
[266] arXiv:2604.21354 [pdf, html, other]: Title: Decoupled Travel Planning with Behavior Forest

Duanyang Yuan, Sihang Zhou, Yanning Hou, Xiaoshu Chen, Haoyuan Chen, Ke Liang, Jiyuan Liu, Chuan Ma, Xinwang Liu, Jian Huang

Subjects: Machine Learning (cs.LG)

Behavior sequences, composed of executable steps, serve as the operational foundation for multi-constraint planning problems such as travel planning. In such tasks, each planning step is not only constrained locally but also influenced by global constraints spanning multiple subtasks, leading to a tightly coupled and complex decision process. Existing travel planning methods typically rely on a single decision space that entangles all subtasks and constraints, failing to distinguish between locally acting constraints within a subtask and global constraints that span multiple subtasks. Consequently, the model is forced to jointly reason over local and global constraints at each decision step, increasing the reasoning burden and reducing planning efficiency. To address this problem, we propose the Behavior Forest method. Specifically, our approach structures the decision-making process into a forest of parallel behavior trees, where each behavior tree is responsible for a subtask. A global coordination mechanism is introduced to orchestrate the interactions among these trees, enabling modular and coherent travel planning. Within this framework, large language models are embedded as decision engines within behavior tree nodes, performing localized reasoning conditioned on task-specific constraints to generate candidate subplans and adapt decisions based on coordination feedback. The behavior trees, in turn, provide an explicit control structure that guides LLM generation. This design decouples complex tasks and constraints into manageable subspaces, enabling task-specific reasoning and reducing the cognitive load on the LLM. Experimental results show that our method outperforms state-of-the-art methods by 6.67% on the TravelPlanner benchmark and by 11.82% on the ChinaTravel benchmark, demonstrating its effectiveness in increasing LLM performance for complex multi-constraint travel planning.
[267] arXiv:2604.21355 [pdf, html, other]: Title: RPG: Robust Policy Gating for Smooth Multi-Skill Transitions in Humanoid Fighting

Yucheng Xin, Jiacheng Bao, Yubo Dong, Xueqian Wang, Bin Zhao, Xuelong Li, Junbo Tan, Dong Wang

Subjects: Robotics (cs.RO)

Humanoid robots have demonstrated impressive motor skills in a wide range of tasks, yet whole-body control for human-like, long-duration, dynamic fighting remains particularly challenging due to the stringent requirements on agility and stability. While imitation learning enables robots to execute human-like fighting skills, existing approaches often rely on switching among multiple single-skill policies or employing a general policy to imitate input reference motions. These strategies suffer from instability when transitioning between skills, as the mismatch of initial and terminal states across skills or reference motions introduces out-of-domain disturbances, resulting in unsmooth or unstable behaviors. In this work, we propose RPG, a hybrid expert policy framework for smooth and stable humanoid multi-skill transitions. Our approach incorporates motion transition randomization and temporal randomization to train a unified policy that generates agile fighting actions with stability and smoothness during skill transitions. Furthermore, we design a control pipeline that integrates walking/running locomotion with fighting skills, allowing human-like combat of arbitrary duration in which the active policy can be interrupted or switched seamlessly at any time. Extensive experiments in simulation demonstrate the effectiveness of the proposed framework, and real-world deployment on the Unitree G1 humanoid robot further validates its robustness and applicability.
[268] arXiv:2604.21356 [pdf, html, other]: Title: SparseGF: A Height-Aware Sparse Segmentation Framework with Context Compression for Robust Ground Filtering Across Urban to Natural Scenes

Nannan Qin, Pengjie Tao, Haiyan Guan, Zhizhong Kang, Lingfei Ma, Xiangyun Hu, Jonathan Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)

High-quality digital terrain models derived from airborne laser scanning (ALS) data are essential for a wide range of geospatial analyses, and their generation typically relies on robust ground filtering (GF) to separate point clouds across diverse landscapes into ground and non-ground parts. Although current deep-learning-based GF methods have demonstrated impressive performance, especially in specific challenging terrains, their cross-scene generalization remains limited by two persistent issues: the context-detail dilemma in large-scale processing due to limited computational resources, and the random misclassification of tall objects arising from classification-only optimization. To overcome these limitations, we propose SparseGF, a height-aware sparse segmentation framework enhanced with context compression. It is built upon three key innovations: (1) a convex-mirror-inspired context compression module that condenses expansive contexts into compact representations while preserving central details; (2) a hybrid sparse voxel-point network architecture that effectively interprets compressed representations while mitigating compression-induced geometric distortion; and (3) a height-aware loss function that explicitly enforces topographic elevation priors during training to suppress random misclassification of tall objects. Extensive evaluations on two large-scale ALS benchmark datasets demonstrate that SparseGF delivers robust GF across urban to natural terrains, achieving leading performance in complex urban scenes, competitive results on mixed terrains, and moderate yet non-catastrophic accuracy in densely forested steep areas. This work offers new insights into deep-learning-based GF research and encourages further exploration toward truly cross-scene generalization for large-scale environmental monitoring.
[269] arXiv:2604.21357 [pdf, html, other]: Title: ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs

Jian Cui, Zhiyuan Ren, Desheng Weng, Yongqi Zhao, Gong Wenbin, Yu Lei, Zhenning Dong

Comments: 12 pages, 8 figures, submitted to ACM SIGSPATIAL 2024 (under review)

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This paper proposes ReaGeo, an end-to-end geocoding framework based on large language models, designed to overcome the limitations of traditional multi-stage approaches that rely on text or vector similarity retrieval over geographic databases, including workflow complexity, error propagation, and heavy dependence on structured geographic knowledge bases. The method converts geographic coordinates into geohash sequences, reformulating the coordinate prediction task as a text generation problem, and introduces a Chain-of-Thought mechanism to enhance the model's reasoning over spatial relationships. Furthermore, reinforcement learning with a distance-deviation-based reward is applied to optimize the generation accuracy. Comprehensive experiments show that ReaGeo can accurately handle explicit address queries in single-point predictions and effectively resolve vague relative location queries. In addition, the model demonstrates strong predictive capability for non-point geometric regions, highlighting its versatility and generalization ability in geocoding tasks.
[270] arXiv:2604.21359 [pdf, html, other]: Title: A Markovian Traffic Equilibrium Model for Ride-Hailing

Song Gao, Hanyu Cheng, Chiwei Yan, Guocheng Jiang

Subjects: Computer Science and Game Theory (cs.GT)

We develop a Markovian traffic equilibrium model for ride-hailing in which vehicles, whether empty or hired, make sequential order-acceptance and link-choice decisions over a traffic network to maximize total discounted return in an infinite-horizon semi-Markov decision process. The model endogenizes both competition among empty vehicles for passenger demand and traffic congestion arising from road usage at the link level. We characterize equilibrium as the solution to a fixed-point system, establish its existence, and develop relaxed fixed-point iteration algorithms for equilibrium computation, with convergence results for specialized network structures. Computational experiments on realistic networks demonstrate the model's practical value for transportation planning. Ablation analyses reveal that ignoring either traffic congestion or drivers' forward-looking behavior can lead to potentially substantial biases in policy evaluation.
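As a rough illustration of what a relaxed fixed-point iteration looks like, the sketch below averages the current iterate with the image of a map `F`. The abstract does not give the paper's equilibrium map, relaxation schedule, or stopping rule, so `F`, `alpha`, and `tol` here are hypothetical placeholders.

```python
import math
from typing import Callable, List, Sequence


def relaxed_fixed_point(
    F: Callable[[Sequence[float]], Sequence[float]],
    x0: Sequence[float],
    alpha: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate x <- (1 - alpha) * x + alpha * F(x) until the step is below tol."""
    x = list(x0)
    for _ in range(max_iter):
        fx = F(x)
        x_new = [(1 - alpha) * xi + alpha * fi for xi, fi in zip(x, fx)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("relaxed iteration did not converge within max_iter")


if __name__ == "__main__":
    # Toy map (an assumption, unrelated to ride-hailing): fixed point of cos(x).
    solution = relaxed_fixed_point(lambda x: [math.cos(x[0])], [1.0])
    print(solution[0])
```

Setting `alpha = 1` recovers the plain iteration; smaller values damp each step, which is the usual reason for relaxing an iteration that would otherwise oscillate.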
[271] arXiv:2604.21360 [pdf, html, other]: Title: Prototype-Based Test-Time Adaptation of Vision-Language Models

Zhaohong Huang, Yuxin Zhang, Wenjing Liu, Fei Chao, Rongrong Ji

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Test-time adaptation (TTA) has emerged as a promising paradigm for vision-language models (VLMs) to bridge the distribution gap between pre-training and test data. Recent works have focused on backpropagation-free TTA methods that rely on cache-based designs, but these introduce two key limitations. First, inference latency increases as the cache grows with the number of classes, leading to inefficiencies in large-scale settings. Second, suboptimal performance occurs when the cache contains insufficient or incorrect samples. In this paper, we present Prototype-Based Test-Time Adaptation (PTA), an efficient and effective TTA paradigm that uses a set of class-specific knowledge prototypes to accumulate knowledge from test samples. Particularly, knowledge prototypes are adaptively weighted based on the zero-shot class confidence of each test sample, incorporating the sample's visual features into the corresponding class-specific prototype. It is worth highlighting that the knowledge from past test samples is integrated and utilized solely in the prototypes, eliminating the overhead of cache population and retrieval that hinders the efficiency of existing TTA methods. This endows PTA with extremely high efficiency while achieving state-of-the-art performance on 15 image recognition benchmarks and 4 robust point cloud analysis benchmarks. For example, PTA improves CLIP's accuracy from 65.64% to 69.38% on 10 cross-domain benchmarks, while retaining 92% of CLIP's inference speed on large-scale ImageNet-1K. In contrast, the cache-based TDA achieves a lower accuracy of 67.97% and operates at only 50% of CLIP's inference speed.
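One way to picture confidence-weighted prototype accumulation is a running weighted mean per class. This is a minimal sketch under stated assumptions, not PTA itself: the class `PrototypeBank`, the rule of updating only the top-scoring class, and the use of the raw zero-shot probability as the weight are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


class PrototypeBank:
    """Keeps one running, confidence-weighted mean feature per class (illustrative)."""

    def __init__(self, num_classes: int, dim: int) -> None:
        self.sums = [[0.0] * dim for _ in range(num_classes)]
        self.weights = [0.0] * num_classes

    def update(self, feature: Sequence[float], zero_shot_probs: Sequence[float]) -> int:
        """Fold a test feature into the prototype of its most likely class."""
        c = max(range(len(zero_shot_probs)), key=lambda i: zero_shot_probs[i])
        w = zero_shot_probs[c]  # assumed weighting: the zero-shot confidence itself
        self.sums[c] = [s + w * f for s, f in zip(self.sums[c], feature)]
        self.weights[c] += w
        return c

    def prototype(self, c: int) -> Optional[List[float]]:
        """Return the weighted mean for class c, or None if it has seen no samples."""
        if self.weights[c] == 0.0:
            return None
        return [s / self.weights[c] for s in self.sums[c]]


if __name__ == "__main__":
    bank = PrototypeBank(num_classes=2, dim=2)
    bank.update([1.0, 0.0], [0.9, 0.1])
    bank.update([0.0, 1.0], [0.6, 0.4])
    print(bank.prototype(0))
```

The memory footprint is one vector and one scalar per class regardless of how many test samples arrive, which is the property that distinguishes a prototype store from a growing cache.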
[272] arXiv:2604.21361 [pdf, html, other]: Title: Time, Causality, and Observability Failures in Distributed AI Inference Systems

Ankur Sharma, Deep Shah, David Lariviere, Hesham ElBakoury

Comments: 17 pages, 6 figures. Produced as part of the Unified Intelligent Infrastructure workstream at the Open Compute Project (OCP)

Subjects: Artificial Intelligence (cs.AI)

Distributed AI inference pipelines rely heavily on timestamp-based observability to understand system behavior. This work demonstrates that even small clock skew between nodes can cause observability to become causally incorrect while the system itself remains functionally correct and performant. We present controlled experiments on a multi-node AI inference pipeline, where clock skew is introduced at a single stage. Results show that no violations are observed under synchronized conditions and up to 3 ms skew, while clear causality violations emerge by 5 ms. Despite this, system throughput and output correctness remain largely unaffected. We further observe that violation behavior is not strictly static. In longer runs, negative span rates may stabilize or decrease over time, indicating that effective skew evolves due to relative clock drift between nodes. Experiments were conducted using Kafka and ZeroMQ transports, with consistent results across both. Aeron is under active exploration but is not yet included in the completed validation set. These findings suggest that observability correctness depends not only on system functionality but also on precise time alignment, and that timing must be treated as a first-class concern in distributed AI systems.
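The mechanism behind a negative span can be sketched in a few lines: if the receiving node's clock runs behind the sender's by more than the true transfer latency, the recorded receive timestamp precedes the send timestamp. The latencies below are hypothetical values chosen for illustration and are not the paper's measurements.

```python
from typing import Sequence


def observed_span_ms(
    send_true_ms: float,
    recv_true_ms: float,
    sender_skew_ms: float = 0.0,
    receiver_skew_ms: float = 0.0,
) -> float:
    """Span as an observability tool would compute it from local timestamps."""
    send_stamp = send_true_ms + sender_skew_ms
    recv_stamp = recv_true_ms + receiver_skew_ms
    return recv_stamp - send_stamp


def negative_span_rate(latencies_ms: Sequence[float], receiver_skew_ms: float) -> float:
    """Fraction of spans that appear to end before they start."""
    spans = [
        observed_span_ms(0.0, latency, receiver_skew_ms=receiver_skew_ms)
        for latency in latencies_ms
    ]
    return sum(1 for s in spans if s < 0) / len(spans)


if __name__ == "__main__":
    # Hypothetical true inter-stage latencies in milliseconds.
    latencies = [3.5, 4.0, 4.5, 6.0]
    for skew in (0.0, -3.0, -5.0):  # negative: receiver clock is behind
        print(skew, negative_span_rate(latencies, skew))
```

In this toy setup the messages themselves are delivered in the correct order in every case; only the recorded timestamps disagree, which matches the abstract's point that observability can be wrong while the system is functionally correct.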
[273] arXiv:2604.21362 [pdf, html, other]: Title: KD-CVG: A Knowledge-Driven Approach for Creative Video Generation

Linkai Liu, Wei Feng, Xi Zhao, Shen Zhang, Xingye Chen, Zheng Zhang, Jingjing Lv, Junjie Shen, Ching Law, Yuchen Zhou, Zipeng Guo, Chao Gou

Comments: Accepted to ICASSP 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Creative Generation (CG) leverages generative models to automatically produce advertising content that highlights product features, and it has been a significant focus of recent research. However, while CG has advanced considerably, most efforts have concentrated on generating advertising text and images, leaving Creative Video Generation (CVG) relatively underexplored. This gap is largely due to two major challenges faced by Text-to-Video (T2V) models: (a) ambiguous semantic alignment, where models struggle to accurately correlate product selling points with creative video content, and (b) inadequate motion adaptability, resulting in unrealistic movements and distortions. To address these challenges, we develop a comprehensive Advertising Creative Knowledge Base (ACKB) as a foundational resource and propose a knowledge-driven approach (KD-CVG) to overcome the knowledge limitations of existing models. KD-CVG consists of two primary modules: Semantic-Aware Retrieval (SAR) and Multimodal Knowledge Reference (MKR). SAR utilizes the semantic awareness of graph attention networks and reinforcement learning feedback to enhance the model's comprehension of the connections between selling points and creative videos. Building on this, MKR incorporates semantic and motion priors into the T2V model to address existing knowledge gaps. Extensive experiments have demonstrated KD-CVG's superior performance in achieving semantic alignment and motion adaptability, validating its effectiveness over other state-of-the-art methods. The code and dataset will be open source at this https URL.
[274] arXiv:2604.21363 [pdf, html, other]: Title: A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

Kuan Xu, Ruimeng Liu, Yizhuo Yang, Denan Liang, Tongxing Jin, Shenghai Yuan, Chen Wang, Lihua Xie

Comments: 10 pages, 5 figures

Subjects: Robotics (cs.RO)

Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-language navigation (VLN), existing approaches often face a fundamental trade-off between strong reasoning capabilities and efficient deployment on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and robust high-level reasoning on real-world robotic platforms. To achieve this, we decouple the system into three asynchronous modules: a real-time perception module for continuous environment sensing, a memory integration module for spatial-semantic aggregation, and a reasoning module for high-level decision making. We incrementally construct a cognitive memory graph to encode scene information, which is further decomposed into subgraphs to enable reasoning with a vision-language model (VLM). To further improve navigation efficiency and accuracy, we also leverage the cognitive memory graph to formulate the exploration problem as a context-aware Weighted Traveling Repairman Problem (WTRP), which minimizes the weighted waiting time of viewpoints. Extensive experiments in both simulation and real-world robotic platforms demonstrate improved navigation success and efficiency over existing VLN approaches, while maintaining real-time performance on resource-constrained hardware.
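The Weighted Traveling Repairman objective named in the abstract, minimizing the weighted sum of arrival (waiting) times over the visited viewpoints, can be made concrete with a brute-force solver for tiny instances. This is only a sketch: the function names, the Euclidean travel cost, and the example weights are assumptions, and the paper's context-aware weighting and actual solver are not described in the abstract.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weighted_latency(
    order: Sequence[int], start: Point, points: Sequence[Point], weights: Sequence[float]
) -> float:
    """Sum over viewpoints of weight * arrival time, with unit speed."""
    elapsed = 0.0
    position = start
    total = 0.0
    for i in order:
        elapsed += math.dist(position, points[i])
        total += weights[i] * elapsed
        position = points[i]
    return total


def solve_wtrp_bruteforce(
    start: Point, points: Sequence[Point], weights: Sequence[float]
) -> Tuple[int, ...]:
    """Try every visiting order; only feasible for a handful of viewpoints."""
    return min(
        permutations(range(len(points))),
        key=lambda order: weighted_latency(order, start, points, weights),
    )


if __name__ == "__main__":
    start = (0.0, 0.0)
    viewpoints = [(1.0, 0.0), (-2.0, 0.0)]
    weights = [1.0, 10.0]  # hypothetical priorities
    best = solve_wtrp_bruteforce(start, viewpoints, weights)
    print(best, weighted_latency(best, start, viewpoints, weights))
```

In the example the nearer viewpoint is visited second because the farther one carries a much larger weight, which is how this objective differs from a shortest-tour formulation.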
[275] arXiv:2604.21365 [pdf, html, other]: Title: mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

Adam Skurla, Dominik Macko, Jakub Simko

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)

Multi-domain detection of machine-generated code snippets in various programming languages is a challenging task. SemEval-2026 Task 13 addresses this challenge from several angles, both as a binary detection problem and as attribution of the source. Specifically, its subtasks also cover detection of the generator LLM family, as well as hybrid code co-generated by humans and machines and adversarially modified code that hides its origin. Our submitted systems adapted the existing mdok approach (focused on machine-generated text detection) to these specific kinds of problems by exploring various base models that are more suitable for code understanding. The results indicate that the submitted systems are competitive in all three subtasks. However, the margins from the top-performing systems are significant, and thus further improvements are possible.
[276] arXiv:2604.21369 [pdf, html, other]: Title: Channel-Free Human Activity Recognition via Inductive-Bias-Aware Fusion Design for Heterogeneous IoT Sensor Environments

Tatsuhito Hasegawa

Comments: 13 pages, 6 figures, 8 tables, Preprint. This work has been submitted to the IEEE for possible publication

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

Human activity recognition (HAR) in Internet of Things (IoT) environments must cope with heterogeneous sensor settings that vary across datasets, devices, body locations, sensing modalities, and channel compositions. This heterogeneity makes conventional channel-fixed models difficult to reuse across sensing environments because their input representations are tightly coupled to predefined channel structures. To address this problem, we investigate strict channel-free HAR, in which a single shared model performs inference without assuming a fixed number, order, or semantic arrangement of input channels, and without relying on sensor-specific input layers or dataset-specific channel templates. We argue that fusion design is the central issue in this setting. Accordingly, we propose a channel-free HAR framework that combines channel-wise encoding with a shared encoder, metadata-conditioned late fusion via conditional batch normalization, and joint optimization of channel-level and fused predictions through a combination loss. The proposed model processes each channel independently to handle varying channel configurations, while sensor metadata such as body location, modality, and axis help recover structural information that channel-independent processing alone cannot retain. In addition, the joint loss encourages both the discriminability of individual channels and the consistency of the final fused prediction. Experiments on PAMAP2, together with robustness analysis on six HAR datasets, ablation studies, sensitivity analysis, efficiency evaluation, and cross-dataset transfer learning, demonstrate three main findings...
[277] arXiv:2604.21370 [pdf, html, other]: Title: MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization

Maziar Kianimoghadam Jouneghani

Comments: 9 pages, 9 tables. Accepted to the 20th International Workshop on Semantic Evaluation (SemEval-2026), Task 9

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

We present a systematic study of multilingual polarization detection across 22 languages for SemEval-2026 Task 9 (Subtask 1), contrasting multilingual generalists with language-specific specialists and hybrid ensembles. While a standard generalist like XLM-RoBERTa suffices when its tokenizer aligns with the target text, it may struggle with distinct scripts (e.g., Khmer, Odia) where monolingual specialists yield significant gains. Rather than enforcing a single universal architecture, we adopt a language-adaptive framework that switches between multilingual generalists, language-specific specialists, and hybrid ensembles based on development performance. Additionally, cross-lingual augmentation via NLLB-200 yielded mixed results, often underperforming native architecture selection and degrading morphologically rich tracks. Our final system achieves an overall macro-averaged F1 score of 0.796 and an average accuracy of 0.826 across all 22 tracks. Code and final test predictions are publicly available at: this https URL.
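For readers unfamiliar with the metric, macro-averaged F1 is the unweighted mean of per-class F1 scores. The sketch below shows that computation for a single label set; how the paper aggregates across its 22 tracks is not specified in the abstract, so the helper `macro_f1` and the toy labels are illustrative assumptions only.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred), key=repr)
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy binary example: 1 = polarized, 0 = not polarized (hypothetical labels).
    print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because every class contributes equally, macro-F1 penalizes a system that ignores the minority class even when its accuracy looks high.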
[278] arXiv:2604.21375 [pdf, html, other]: Title: VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

Comments: The first two authors contribute equally

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around three integrated components that guide the system on when to Stop, Recover, and Search. First, a mandatory Completeness Verifier enforces UI-observable success criteria and verification at every finish step -- with an agent-level verifier that cross-examines completion claims with decision rules, rejecting those lacking direct visual evidence. Second, a mandatory Loop Breaker provides multi-tier filtering: switching interaction mode after repeated failures, forcing strategy changes after persistent screen-state recurrence, and binding reflection signals to strategy shifts. Third, an on-demand Search Agent searches online for unfamiliar workflows by directly querying a capable LLM with search ability, returning results as plain text. We additionally integrate a Coding Agent for code-intensive actions and a Grounding Agent for precise action grounding, both invoked on demand when required. We evaluate VLAA-GUI across five top-tier backbones, including Opus 4.5, 4.6 and Gemini 3.1 Pro, on two benchmarks with Linux and Windows tasks, achieving top performance on both (77.5% on OSWorld and 61.0% on WindowsAgentArena). Notably, three of the five backbones surpass human performance (72.4%) on OSWorld in a single pass. Ablation studies show that all three proposed components consistently improve a strong backbone, while a weaker backbone benefits more from these tools when the step budget is sufficient. Further analysis also shows that the Loop Breaker nearly halves wasted steps for loop-prone models.
[279] arXiv:2604.21376 [pdf, other]: Title: A formal proof of the Sands-Sauer-Woodrow theorem using the Rocq prover and mathcomp/ssreflect

Jean-Philippe Chancelier (CERMICS)

Subjects: Discrete Mathematics (cs.DM)

We present a formal proof of the Sands-Sauer-Woodrow (SSW) theorem using the Rocq proof assistant and the MathComp/SSReflect library. The SSW theorem states that in a directed graph whose edges are colored with two colors and that contains no monochromatic infinite outward path, there exists an independent set S of vertices such that every vertex outside S can reach S by a monochromatic path. We formalize the graph using two binary relations Eb and Er, representing the blue and red edges respectively, and we develop a dedicated library for binary relations represented as classical sets. Beyond formalizing the original SSW theorem, we establish a strictly stronger version in which the assumption "no monochromatic infinite outward path" is replaced by the weaker condition that the asymmetric parts of the transitive closures of Eb and Er admit no infinite outward paths. The original SSW theorem is then recovered as a corollary via a lemma showing that an infinite path for the asymmetric part of the transitive closure of a relation implies an infinite path for the relation.
[280] arXiv:2604.21377 [pdf, other]: Title: A Replicable Robotics Awareness Method Using LLM-Enabled Robotics Interaction: Evidence from a Corporate Challenge

S. A. Prieto, M. A. Gopee, Y. Ben Arab, B. García de Soto, J. Esteba, P. Olivera Brizzio

Comments: 10 pages, 8 figures, to be submitted for journal peer review

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

Large language models are increasingly being explored as interfaces between humans and robotic systems, yet there remains limited evidence on how such technologies can be used not only for interaction, but also as a structured means of introducing robotics to non-specialist users in real organizational settings. This paper introduces and evaluates a challenge-based method for robotics awareness, implemented through an LLM-enabled humanoid robot activity conducted with employees of AD Ports Group in the United Arab Emirates. In the event, participants engaged with a humanoid robot in a logistics-inspired task environment using voice commands interpreted through an LLM-based control framework. The activity was designed as a team-based, role-driven experience intended to expose participants to embodied AI and human-robot collaboration without requiring prior robotics expertise. To evaluate the approach, a post-event survey remained open for 16 days and collected 102 responses. Results indicate strong overall reception, with high satisfaction (8.46/10), increased interest in robotics and AI (4.47/5), and improved understanding of emerging forms of human-robot collaboration (4.45/5). Participants who interacted directly with the robot also reported natural interaction (4.37/5) and a strong sense that interaction became easier as the activity progressed (4.74/5). At the same time, lower ratings for reliability and predictability point to important technical and design challenges for future iterations. The findings suggest that challenge-based, LLM-enabled humanoid interaction can serve as a promising and replicable method for robotics awareness in industrial and operational environments.
[281] arXiv:2604.21378 [pdf, other]: Title: Active Inference of Extended Finite State Machine Models with Registers and Guards

Roland Groz (LIG), German Eduardo Vega Baez (LIG), Adenilso Simao (ICMC-USP), Catherine Oriat (LIG), Neil Walkinshaw, Michael Foster

Subjects: Formal Languages and Automata Theory (cs.FL)

Extended finite state machines (EFSMs) model stateful systems with internal data variables and have numerous applications in software engineering. A major advantage of this type of model lies in its ability to model both the data flow and the data-dependent control behaviour. In the absence of such models, it is desirable to reverse-engineer them by observing the system's behaviour. However, existing approaches generally require the ability to reset the system during inference, or can only handle situations where the control flow depends exclusively on the input parameters, and not on the values of the stored data. In this work, we present a black-box active learning algorithm that infers EFSMs with guards and registers, and which significantly relaxes the assumptions that have to be made about the system in comparison to previous attempts.
[282] arXiv:2604.21380 [pdf, other]: Title: Conjecture and Inquiry: Quantifying Software Performance Requirements via Interactive Retrieval-Augmented Preference Elicitation

Wang Shi Hai, Chen Tao

Comments: 9 pages, accepted by ACL 2026

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Since software performance requirements are documented in natural language, quantifying them into mathematical forms is essential for software engineering. Yet the vagueness of performance requirements and the uncertainty of human cognition make their interpretation highly ambiguous, rendering their automated quantification an unaddressed and challenging problem. In this paper, we formalize the problem and propose IRAP, an approach that quantifies performance requirements into mathematical functions via interactive retrieval-augmented preference elicitation. IRAP differs from other approaches in that it explicitly draws on problem-specific knowledge to retrieve and reason about preferences, which also guides the progressive interaction with stakeholders while reducing their cognitive overhead. Experimental results against 10 state-of-the-art methods on four real-world datasets demonstrate the superiority of IRAP in all cases, with up to 40x improvements under as few as five rounds of interactions.
[283] arXiv:2604.21381 [pdf, html, other]: Title: Privacy-Preserving Distributed Stochastic Optimization with Homomorphic Encryption and Heterogeneous Stepsizes

Haoqiang Zhou, Chi Chen, Yongfeng Zhi, Huan Gao

Comments: This is the full version of the paper accepted to the 23rd IFAC World Congress, Busan, Republic of Korea, August 23-28, 2026. This version includes all proofs omitted from the conference proceedings due to page limitations

Subjects: Systems and Control (eess.SY)

Distributed stochastic optimization enables multi-agent collaboration in applications such as distributed learning and sensor networks, but also raises critical privacy concerns due to the involvement of sensitive data. While existing privacy-preserving approaches often face limitations in balancing accuracy with efficiency, we propose a novel distributed stochastic gradient descent algorithm that integrates Paillier homomorphic encryption with heterogeneous and time-varying random stepsizes. The proposed algorithm provides inherent privacy protection against both internal honest-but-curious agents and external eavesdroppers, without relying on any trusted neighbors. Furthermore, we incorporate an attenuation factor to effectively mitigate quantization error induced by the encryption process, ensuring almost sure convergence to the optimal solution while maintaining privacy preservation. Numerical simulations demonstrate the effectiveness and efficiency of the proposed approach.
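The property of Paillier encryption that makes it useful for this kind of aggregation is that multiplying ciphertexts adds the underlying plaintexts. The toy sketch below demonstrates only that property; the tiny primes are insecure, the function names are illustrative, and nothing here reflects the paper's stepsize design or attenuation factor. Since Paillier operates on integers, real-valued quantities have to be quantized before encryption, which is presumably the source of the quantization error the abstract mentions.

```python
import math
import random
from typing import Tuple

PublicKey = Tuple[int, int]   # (n, g)
PrivateKey = Tuple[int, int]  # (lam, mu)


def keygen(p: int, q: int) -> Tuple[PublicKey, PrivateKey]:
    """Toy key generation from two distinct primes, using the common choice g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because L(g^lam mod n^2) = lam mod n when g = n + 1
    return (n, n + 1), (lam, mu)


def encrypt(pub: PublicKey, m: int) -> int:
    """Encrypt an integer 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub: PublicKey, priv: PrivateKey, c: int) -> int:
    """Recover m via L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(pub: PublicKey, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    pub, priv = keygen(17, 19)  # insecure demo primes
    total = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
    print(decrypt(pub, priv, total))
```

An agent can therefore combine encrypted values from its neighbors without ever seeing the individual plaintexts, as long as the sum stays below `n`.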
[284] arXiv:2604.21384 [pdf, html, other]: Title: Estimation of Unknown Parameters in Presence of Perturbations and Noises with Application to GPEBO Design

Anton Glushchenko, Konstantin Lastochkin

Comments: 8 pages, 2 figures

Subjects: Systems and Control (eess.SY)

We consider the problem of online estimation of the unknown parameters of a linear regression equation affected by an additive perturbation, which can be caused by measurement noise (corrupting both the regressor and the regressand) as well as by external perturbations. Known approaches to this problem typically have one of the following disadvantages: 1) they ensure convergence of the parametric error only to a compact set with a non-adjustable bound, 2) they require all regressor elements to be independent of the perturbation/noise in order to annihilate it, or 3) they require an instrumental variable to be selected. On the basis of a novel perturbation annihilation procedure, we propose three new estimation laws that are free from the above-mentioned drawbacks and ensure exponential convergence of the parametric error to an arbitrarily small neighborhood of zero, in particular when more than half (not necessarily all) of the regressor elements are independent of the additive perturbation. One of the proposed estimation laws is used to design a Generalized Parameter Estimation-Based Observer (GPEBO) for nonlinear affine systems, enhancing GPEBO performance when the measured system output is corrupted by noise. The theoretical results are supported by examples and mathematical modelling.
[285] arXiv:2604.21387 [pdf, html, other]: Title: EdgeFormer: local patch-based edge detection transformer on point clouds

Yifei Xie, Zhikun Tu, Tong Yang, Yuhe Zhang, Xinyu Zhou

Comments: 22 pages, 9 figures. Published in Pattern Analysis and Applications

Journal-ref: Pattern Analysis and Applications 28, 11 (2025)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Edge points on 3D point clouds can clearly convey 3D geometry and surface characteristics; edge detection is therefore widely used in many vision applications with high industrial and commercial demands. However, fine-grained edge features are difficult to detect effectively, as they are generally densely distributed or exhibit small-scale surface gradients. To address this issue, we present a learning-based edge detection network, named EdgeFormer, which mainly consists of two stages. Based on the observation that spatially neighboring points tend to exhibit high correlation, forming the local underlying surface, we convert the edge detection of the entire point cloud into a point classification based on local patches. Therefore, in the first stage, we construct local patch feature descriptors that describe the local neighborhood around each point. In the second stage, we classify each point by analyzing the local patch feature descriptors generated in the first stage. Due to the conversion of the point cloud into local patches, the proposed method can effectively extract the finer details. The experimental results show that our model demonstrates competitive performance compared to six baselines.
[286] arXiv:2604.21391 [pdf, html, other]: Title: From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

Yiming Zhong, Yaoyu He, Zemin Yang, Pengfei Tian, Yifan Huang, Qingqiu Huang, Xinge Zhu, Yuexin Ma

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Bridging high-level semantic understanding with low-level physical control remains a persistent challenge in embodied intelligence, stemming from the fundamental spatiotemporal scale mismatch between cognition and action. Existing generative VLA policies typically adopt a "Generation-from-Noise" paradigm, which disregards this disparity, leading to representation inefficiency and weak condition alignment during optimization. In this work, we propose ResVLA, an architecture that shifts the paradigm to "Refinement-from-Intent." Recognizing that robotic motion naturally decomposes into global intent and local dynamics, ResVLA utilizes spectral analysis to decouple control into a deterministic low-frequency anchor and a stochastic high-frequency residual. By anchoring the generative process on the predicted intent, our model focuses strictly on refining local dynamics via a residual diffusion bridge. Extensive simulation experiments show that ResVLA achieves competitive performance, strong robustness to language and robot embodiment perturbations, and faster convergence than standard generative baselines. It also demonstrates strong performance in real-world robot experiments.
[287] arXiv:2604.21393 [pdf, other]: Title: Relocation of compact sets in $\mathbb{R}^n$ by diffeomorphisms and linear separability of datasets in $\mathbb{R}^n$

Xiao-Song Yang, Xuan Zhou, Qi Zhou

Subjects: Machine Learning (cs.LG)

Relocation of compact sets in an $n$-dimensional manifold by self-diffeomorphisms is of interest in its own right and has significant potential applications to data classification in data science. This paper presents a theory for relocating a finite number of compact sets in $\mathbb{R}^n$ to arbitrary target domains in $\mathbb{R}^n$ by diffeomorphisms of $\mathbb{R}^n$. Furthermore, we prove that for any such collection, there exists a differentiable embedding into $\mathbb{R}^{n+1}$ such that their images become linearly separable.
As applications of the established theory, we show that a finite number of compact datasets in $\mathbb{R}^n$ can be made linearly separable by width-$n$ deep neural networks (DNNs) with Leaky-ReLU, ELU, or SELU activation functions, under a mild condition. In addition, we show that any finite number of mutually disjoint compact datasets in $\mathbb{R}^n$ can be made linearly separable in $\mathbb{R}^{n+1}$ by a width-$(n+1)$ DNN.
[288] arXiv:2604.21394 [pdf, html, other]: Title: Provably Secure Steganography Based on List Decoding

Kaiyi Pang, Minhao Bai

Subjects: Cryptography and Security (cs.CR)

Steganography embeds secret messages in seemingly innocuous carriers for covert communication under surveillance. Current Provably Secure Steganography (PSS) schemes based on language models can guarantee computational indistinguishability between the covertext and stegotext. However, achieving high embedding capacity remains a challenge for existing PSS. The inefficient entropy utilization renders them not well-suited for Large Language Models (LLMs), whose inherent low-entropy tendencies severely constrain feasible embedding capacity. To address this, we propose a provably secure steganography scheme with a theoretically proved high capacity. Our scheme is based on the concept of list decoding: it maintains a set of candidates that contain the correct secret message, instead of directly finding the correct message with more effort. This strategy fully utilizes the information content of the generated text, yielding higher capacity. To ensure the correctness of our scheme, we further introduce a suffix-matching mechanism to distinguish the correct secret message from the candidates. We provide theoretical proofs for both the security and correctness of our scheme, alongside a derivation of its theoretical capacity lower bound.
Our approach is plug-and-play, requiring only a direct replacement of the model's standard random sampling module. Experiments on three LLMs and seven PSS baselines demonstrate that our method achieves computational efficiency comparable to prior PSS schemes while delivering a substantial improvement in embedding capacity.
[289] arXiv:2604.21395 [pdf, other]: Title: Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair

Vishal Rajput

Comments: 29 pages. Code: this https URL. Preprint, not peer-reviewed. Affiliation: KU Leuven, Belgium

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We prove that empirical risk minimisation (ERM) imposes a necessary geometric constraint on learned representations: any encoder that minimises supervised loss must retain non-zero Jacobian sensitivity in directions that are label-correlated in training data but nuisance at test time. This is not a contingent failure of current methods; it is a mathematical consequence of the supervised objective itself. We call this the geometric blind spot of supervised learning (Theorem 1), and show it holds across proper scoring rules, architectures, and dataset sizes.
This single theorem unifies four lines of prior empirical work that were previously treated separately: non-robust predictive features, texture bias, corruption fragility, and the robustness-accuracy tradeoff. In this framing, adversarial vulnerability is one consequence of a broader structural fact about supervised learning geometry.
We introduce Trajectory Deviation Index (TDI), a diagnostic that measures the theorem's bounded quantity directly, and show why common alternatives miss the key failure mode. PGD adversarial training reaches Jacobian Frobenius 2.91 yet has the worst clean-input geometry (TDI 1.336), while PMH achieves TDI 0.904. TDI is the only metric that detects this dissociation because it measures isotropic path-length distortion -- the exact quantity Theorem 1 bounds.
Across seven vision tasks, BERT/SST-2, and ImageNet ViT-B/16 backbones used by CLIP, DINO, and SAM, the blind spot is measurable and repairable. It is present at foundation-model scale, worsens monotonically across language-model sizes (blind-spot ratio 0.860 to 0.765 to 0.742 from 66M to 340M), and is amplified by task-specific ERM fine-tuning (+54%), while PMH repairs it by 11x with one additional training term whose Gaussian form Proposition 5 proves is the unique perturbation law that uniformly penalises the encoder Jacobian.
[290] arXiv:2604.21396 [pdf, html, other]: Title: VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought

Byeonggeuk Lim, Kyeonghyun Kim, JungMin Yun, YoungBin Kim

Comments: Accepted to LREC 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The advancement of Large Vision-Language Models (LVLMs) requires precise local region-based reasoning that faithfully grounds the model's logic in actual visual evidence. However, existing datasets face limitations in scalability due to extensive manual annotation and lack of explicit alignment between multi-step reasoning and corresponding image regions, which constrains the evaluation of model trustworthiness. To address these challenges, we propose the Visual Grounding Chain-of-Thought (VG-CoT) dataset, which explicitly links each reasoning step to real visual evidence within the image through a fully automated three-stage pipeline. The pipeline first extracts object- and text-level visual evidence using state-of-the-art detection and OCR models, then generates step-by-step grounded reasoning with GPT-4o, and finally refines the grounding through a rationale-driven open-set detection process. In addition, we introduce a new benchmark that comprehensively evaluates LVLM reasoning across three complementary dimensions: Rationale Quality, Answer Accuracy, and Reasoning-Answer Alignment. Experiments with representative LVLMs, including LLaVA-1.5 and Qwen2-VL, demonstrate consistent improvements on most evaluation metrics, confirming that VG-CoT effectively enhances trustworthy, evidence-based reasoning while maintaining scalable and cost-efficient dataset construction. The dataset and code will be released publicly upon acceptance to facilitate further research.
[291] arXiv:2604.21399 [pdf, html, other]: Title: A Task Decomposition and Planning Framework for Efficient LLM Inference in AI-Enabled WiFi-Offload Networks

Mingqi Han, Xinghua Sun

Comments: 7 pages, 4 figures, conference version

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

AI WiFi offload is emerging as a promising approach for providing large language model (LLM) services to resource-constrained wireless devices. However, unlike conventional edge computing, LLM inference over WiFi must jointly address heterogeneous model capabilities, wireless contention, uncertain task complexity, and semantic correlation among reasoning tasks. In this paper, we investigate LLM inference offloading in a multi-user multi-edge WiFi network, where each task can be executed locally, directly offloaded to a nearby edge access point (AP), or decomposed into multiple subtasks for collaborative execution across local and edge nodes. To this end, we propose a user-edge collaborative framework with an LLM-based planner that not only performs task decomposition but also infers subtask difficulty and expected output token length, enabling more accurate estimation of execution quality and latency on heterogeneous nodes. Based on these estimates, we further design a decomposition-aware scheduling strategy that jointly optimizes subtask assignment, execution, and aggregation under communication, queuing, and computation constraints. Simulation results show that the proposed framework achieves a better latency-accuracy tradeoff than local-only and nearest-edge baselines, reducing the average latency by $20\%$ and improving the overall reward by $80\%$. Moreover, the distilled lightweight planner approaches the performance of the large teacher model while remaining more suitable for practical edge deployment.
[292] arXiv:2604.21400 [pdf, html, other]: Title: You Only Gaussian Once: Controllable 3D Gaussian Splatting for Ultra-Densely Sampled Scenes

Jinrang Jia, Zhenjia Li, Yifeng Shi

Comments: 17 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D Gaussian Splatting (3DGS) has revolutionized neural rendering, yet existing methods remain predominantly research prototypes ill-suited for production-level deployment. We identify a critical "Industry-Academia Gap" hindering real-world application: unpredictable resource consumption from heuristic Gaussian growth, the "sparsity shield" of current benchmarks that rewards hallucination over physical fidelity, and severe multi-sensor data pollution. To bridge this gap, we propose YOGO (You Only Gaussian Once), a system-level framework that reformulates the stochastic growth process into a deterministic, budget-aware equilibrium. YOGO integrates a novel budget controller for hardware-constrained resource allocation and an availability-registration protocol for robust multi-sensor fusion. To push the boundaries of reconstruction fidelity, we introduce Immersion v1.0, the first ultra-dense indoor dataset specifically designed to break the "sparsity shield." By providing saturated viewpoint coverage, Immersion v1.0 forces algorithms to focus on extreme physical fidelity rather than viewpoint interpolation, and enables the community to focus on the upper limits of high-fidelity reconstruction. Extensive experiments demonstrate that YOGO achieves state-of-the-art visual quality while maintaining a strictly deterministic profile, establishing a new standard for production-grade 3DGS. To facilitate reproducibility, some scenes of the Immersion v1.0 dataset and the source code of YOGO have been publicly released. The project link is this https URL.
[293] arXiv:2604.21404 [pdf, other]: Title: Neurodiversity and Technostress: Towards a Multimodal Research Design for Evaluating Subjective, Physiological, and Behavioral Responses

Lisa van den Heuvel, Igor Ivkić, René Riedl

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)

Digitalization has transformed modern work by increasing efficiency while also introducing new forms of strain. Technostress (TS) describes subjective, physiological, and behavioral stress responses related to digital technology use. Existing TS research has predominantly focused on neurotypical populations and rarely integrates multiple stress dimensions within a single design. This paper addresses these gaps by proposing a controlled experimental research design that systematically compares neurodivergent and neurotypical individuals under standardized digital stress conditions. The proposed design combines structured and unstructured digital tasks with a multimodal measurement approach covering subjective perceptions, physiological activation, and observable interaction behavior. By integrating neurodiversity into TS research, the paper contributes to a more differentiated understanding of digital stress and provides a methodological approach for more inclusive digital work design.
[294] arXiv:2604.21407 [pdf, html, other]: Title: Even More Guarantees for Variational Inference in the Presence of Symmetries

Lena Zellinger, Antonio Vergari

Comments: Accepted for presentation at the OPTIMAL Workshop at AISTATS 2026

Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)

When approximating an intractable density via variational inference (VI) the variational family is typically chosen as a simple parametric family that very likely does not contain the target. This raises the question: Under which conditions can we recover characteristics of the target despite misspecification? In this work, we extend previous results on robust VI with location-scale families under target symmetries. We derive sufficient conditions guaranteeing exact recovery of the mean when using the forward Kullback-Leibler divergence and $\alpha$-divergences. We further show how and why optimization can fail to recover the target mean in the absence of our sufficient conditions, providing initial guidelines on the choice of the variational family and $\alpha$-value.
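A standard special case helps make the mean-recovery claim concrete: minimising the forward KL divergence over a Gaussian family matches the target's mean, even when the target is far from Gaussian. The sketch below checks this numerically on a grid. The bimodal target, the fixed scale, and the candidate grid are illustrative assumptions, and the example does not cover the paper's $\alpha$-divergence results or its sufficient conditions.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def target_pdf(x):
    """Assumed target: bimodal mixture, symmetric about x = 1 (so its mean is 1)."""
    return 0.5 * normal_pdf(x, -1.0, 0.5) + 0.5 * normal_pdf(x, 3.0, 0.5)


# Integration grid centred on the symmetry point x = 1.
DX = 0.01
GRID = [-9.0 + DX * i for i in range(2001)]
WEIGHTS = [target_pdf(x) * DX for x in GRID]


def cross_entropy(mu, sigma):
    """-E_p[log q]; forward KL(p || q) equals this plus a constant in (mu, sigma)."""
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x, w in zip(GRID, WEIGHTS):
        z = (x - mu) / sigma
        total += w * (0.5 * z * z + log_norm)
    return total


# Grid search over the location parameter of a (misspecified) Gaussian family.
candidates = [0.1 * i for i in range(21)]  # 0.0, 0.1, ..., 2.0
best_mu = min(candidates, key=lambda m: cross_entropy(m, sigma=2.0))
print("location minimising forward KL:", best_mu)
```

The Gaussian family cannot represent the two modes, yet the optimal location still coincides with the target mean at the point of symmetry.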
[295] arXiv:2604.21409 [pdf, other]: Title: S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images

Qingxiao Li, Lifeng Xu, QingLi Wang, Yudong Bai, Mingwei Ou, Shu Hu, Nan Xu

Comments: 29 pages, 13 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present S1-VL, a multimodal reasoning model for scientific domains that natively supports two complementary reasoning paradigms: Scientific Reasoning, which relies on structured chain-of-thought, and Thinking-with-Images, which enables the model to actively manipulate images through Python code execution during reasoning. In the Thinking-with-Images mode, the model generates and executes image-processing code in a sandbox environment, obtains intermediate visual results, and continues reasoning in a multi-turn iterative manner. This design is particularly effective for challenging scenarios such as high-resolution scientific chart interpretation, microscopic image understanding, and geometry-assisted reasoning. To construct the training data, we collect scientific multimodal datasets spanning six disciplines: mathematics, physics, chemistry, astronomy, geography, and biology. We further develop a six-dimensional quality filtering framework for reasoning trajectories. To mitigate redundant, ineffective, and erroneous visual operations commonly found in existing datasets, we propose a multi-stage filtering pipeline together with an adaptive data routing strategy. This strategy converts samples with low visual information gain into pure Reasoning-mode data, enabling the model to learn when image operations are truly necessary. S1-VL is trained through a four-stage progressive pipeline: scientific multimodal SFT, Thinking-with-Images cold-start SFT, and two stages of reinforcement learning with SAPO. We build S1-VL-32B on top of Qwen3-VL-32B-Thinking and evaluate it on 13 benchmarks. Experimental results show that S1-VL-32B achieves state-of-the-art performance on all five Thinking-with-Images benchmarks, including HRBench-4K, HRBench-8K, MME-RealWorld-CN, MME-RealWorld-Lite, and V*, and outperforms compared systems on scientific reasoning benchmarks such as Physics and VRSBench.
[296] arXiv:2604.21410 [pdf, html, other]: Title: Encrypted Visual Feedback Control Using RLWE-Based Cryptosystem

Taichi Ikezaki, Kaoru Teranishi

Subjects: Systems and Control (eess.SY)

This study proposes an encrypted visual feedback control algorithm for regulating a one-dimensional stage using Ring Learning With Errors (RLWE) encryption. The proposed algorithm performs both feature extraction and controller computations directly on encrypted images, ensuring that sensitive visual data remain protected throughout the entire control process. Furthermore, an image captured by the camera is encrypted into a single ciphertext leveraging the message packing technique of RLWE encryption, thereby reducing computational cost. The effectiveness of the proposed framework is demonstrated through numerical simulations.
[297] arXiv:2604.21411 [pdf, html, other]: Title: A Green-Integral-Constrained Neural Solver with Stochastic Physics-Informed Regularization

Mohammad Mahdi Abedi, David Pardo, Tariq Alkhalifah

Subjects: Machine Learning (cs.LG); Geophysics (physics.geo-ph)

Standard physics-informed neural networks (PINNs) struggle to simulate highly oscillatory Helmholtz solutions in heterogeneous media because pointwise minimization of second-order PDE residuals is computationally expensive, biased toward smooth solutions, and requires artificial absorbing boundary layers to restrict the solution. To overcome these challenges, we introduce a Green-Integral (GI) neural solver for the acoustic Helmholtz equation. It departs from the PDE-residual-based formulation by enforcing wave physics through an integral representation that imposes a nonlocal constraint. Oscillatory behavior and outgoing radiation are encoded directly through the integral kernel, eliminating second-order spatial derivatives and enforcing physical solutions without additional boundary layers. Theoretically, optimizing this GI loss via a neural network acts as a spectrally tuned preconditioned iteration, enabling convergence in heterogeneous media where the classical Born series diverges. By exploiting FFT-based convolution to accelerate the GI loss evaluation, our approach substantially reduces GPU memory usage and training time. However, this efficiency relies on a fixed regular grid, which can limit local resolution. To improve local accuracy in strong scattering regions, we also propose a hybrid GI+PDE loss, enforcing a lightweight Helmholtz residual at a small number of nonuniformly sampled collocation points. We evaluate our method on seismic benchmark models characterized by structural contrasts and subwavelength heterogeneity at frequencies up to 20Hz. GI-based training consistently outperforms PDE-based PINNs, reducing computational cost by over a factor of ten. In models with localized scattering, the hybrid loss yields the most accurate reconstructions, providing a stable, efficient, and physically grounded alternative.
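The speed-up from FFT-based convolution rests on the convolution theorem: a circular convolution on a regular grid equals the inverse transform of the product of the transforms. The sketch below verifies this identity in one dimension with a small radix-2 FFT. It is a generic illustration with made-up real-valued inputs, not the paper's solver. An actual Helmholtz Green's kernel would be complex-valued and multi-dimensional.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; the input length must be a power of two.
    The inverse transform is unnormalised (the caller divides by n)."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_convolve_fft(a, b):
    """Circular convolution of two real sequences via the convolution theorem."""
    n = len(a)
    product = [x * y for x, y in zip(fft(a), fft(b))]
    return [v.real / n for v in fft(product, inverse=True)]


def circular_convolve_direct(a, b):
    """Reference O(n^2) circular convolution."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


field = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]     # assumed sample field
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # assumed sample kernel

fast = circular_convolve_fft(field, kernel)
slow = circular_convolve_direct(field, kernel)
print(max(abs(f - s) for f, s in zip(fast, slow)))
```

The dependence on a fixed regular grid mentioned in the abstract is visible here: both sequences must be sampled at the same equally spaced points for the transform product to represent the convolution.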
[298] arXiv:2604.21412 [pdf, html, other]: Title: A pragmatic classification of AI incident trajectories

Isaak Mengesha, Branwen Owen, Charlie Collins, Tina Wong, Simon Mylius, Peter Slattery, Sean McGregor

Subjects: Computers and Society (cs.CY)

Public AI incident database counts conflate changes in reporting propensity, deployment growth, and shifts in harm frequency per unit of exposure. These issues introduce significant uncertainties challenging public and corporate policy frameworks centred on realized risks. We propose a simple framework that establishes clear points of inquiry, separately estimates exposure from harm-rate trends, and then classifies into meaningful trajectory categories for governance decisions. The framework combines a structured monitoring question format (SORT) to clarify coverage decisions, a tiered estimation procedure calibrated to available evidence, and LLM-assisted incident matching against public databases. Applied to various monitoring questions, we draw conclusions regarding the monitoring ecosystem more broadly: providing an essential interpretative classification, determining what can and cannot be claimed, and establishing that exposure estimation is required as AI deployments become increasingly common.
[299] arXiv:2604.21413 [pdf, html, other]: Title: An Alternate Agentic AI Architecture (It's About the Data)

Fabian Wenz, Felix Treutwein, Kai Arenja, Çagatay Demiralp, Michael Stonebraker

Comments: 15 pages, 2 figures, 2 tables

Subjects: Databases (cs.DB)

For the last several years, the dominant narrative in "agentic AI" has been that large language models should orchestrate information access by dynamically selecting tools, issuing sub-queries, and synthesizing results. We argue this approach is misguided: enterprises do not suffer from a reasoning deficit, but from a data integration problem.
Enterprises are data-centric: critical information is scattered across heterogeneous systems (e.g., databases, documents, and external services), each with its own query language, schema, access controls, and performance constraints. In contrast, contemporary LLM-based architectures are optimized for reasoning over unstructured text and treat enterprise systems as either corpora or external tools invoked by a black-box component. This creates a mismatch between schema-rich, governed, performance-critical data systems and text-centric, probabilistic LLM architectures, leading to limited transparency, weak correctness guarantees, and unpredictable performance.
In this paper, we present RUBICON, an alternative architecture grounded in data management principles. Instead of delegating orchestration to an opaque agent, we introduce AQL (Agentic Query Language), a small, explicit query algebra - Find, From, and Where - executed through source-specific wrappers that enforce access control, schema alignment, and result normalization. All intermediate results are visible and inspectable. Complex questions are decomposed into structured, auditable query plans rather than hidden chains of LLM calls.
Our thesis is simple: enterprise AI is not a prompt engineering problem; it is a systems problem. By reintroducing explicit query structure, wrapper-based mediation, and cost-based optimization, we obtain the breadth of agentic search while preserving traceability, determinism, and trust in enterprise environments.
[300] arXiv:2604.21414 [pdf, html, other]: Title: SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis

Qiang Gao, Zhenping Li, Anqi Zhuo, Yingxiao Zhao, Weibo Geng, Xiaosong Li

Subjects: Artificial Intelligence (cs.AI)

Existing text-to-SQL synthesis pipelines still conflate executability with semantic validity: syntactic checks and execution-based validation can retain queries that execute successfully while violating database semantics. To address these limitations, we propose SemanticAgent, a semantic-aware synthesis framework. SemanticAgent organizes synthesis around three specialized modules: an analyzer, a synthesizer, and a verifier. Through a three-stage protocol of semantic analysis, stepwise synthesis, and diagnostic refinement, SemanticAgent transforms execution-based validation alone into a traceable reasoning process. Our framework generates synthetic data that consistently outperforms prior synthesis methods under semantic-quality evaluation, leading to stronger downstream fine-tuning performance, especially on semantically demanding benchmarks.
[301] arXiv:2604.21416 [pdf, html, other]: Title: CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu, Bo Liu, Wanlei Zhou

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to misclassify triggered inputs as adversary-specified labels while maintaining performance on clean data. Existing poison restraint-based defenses often suffer from inadequate detection against specific attack variants and compromise model utility through unlearning methods that lead to accuracy degradation. This paper conducts a comprehensive analysis of backdoor attack dynamics during model training, revealing that poisoned samples form isolated clusters in latent space early on, with triggers acting as dominant features distinct from benign ones. Leveraging these insights, we propose Cluster Segregation Concealment (CSC), a novel poison suppression defense. CSC first trains a deep neural network via standard supervised learning while segregating poisoned samples through feature extraction from early epochs, DBSCAN clustering, and identification of anomalous clusters based on class diversity and density metrics. In the concealment stage, identified poisoned samples are relabeled to a virtual class, and the model's classifier is fine-tuned using cross-entropy loss to replace the backdoor association with a benign virtual linkage, preserving overall accuracy. Evaluated on four benchmark datasets against twelve poisoning-based attacks, CSC outperforms nine state-of-the-art defenses by reducing average attack success rates to near zero with minimal clean accuracy loss. Contributions include robust backdoor pattern identification, an effective concealment mechanism, and superior empirical validation, advancing trustworthy artificial intelligence.
[302] arXiv:2604.21420 [pdf, html, other]: Title: FairQE: Multi-Agent Framework for Mitigating Gender Bias in Translation Quality Estimation

Jinhee Jang, Juhwan Choi, Dongjin Lee, Seunguk Yu, Youngbin Kim

Comments: Accepted to ACL 2026

Subjects: Artificial Intelligence (cs.AI)

Quality Estimation (QE) aims to assess machine translation quality without reference translations, but recent studies have shown that existing QE models exhibit systematic gender bias. In particular, they tend to favor masculine realizations in gender-ambiguous contexts and may assign higher scores to gender-misaligned translations even when gender is explicitly specified. To address these issues, we propose FairQE, a multi-agent-based, fairness-aware QE framework that mitigates gender bias in both gender-ambiguous and gender-explicit scenarios. FairQE detects gender cues, generates gender-flipped translation variants, and combines conventional QE scores with LLM-based bias-mitigating reasoning through a dynamic bias-aware aggregation mechanism. This design preserves the strengths of existing QE models while calibrating their gender-related biases in a plug-and-play manner. Extensive experiments across multiple gender bias evaluation settings demonstrate that FairQE consistently improves gender fairness over strong QE baselines. Moreover, under MQM-based meta-evaluation following the WMT 2023 Metrics Shared Task, FairQE achieves competitive or improved general QE performance. These results show that gender bias in QE can be effectively mitigated without sacrificing evaluation accuracy, enabling fairer and more reliable translation evaluation.
[303] arXiv:2604.21421 [pdf, other]: Title: Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation

Michele Miranda, Xinlan Yan, Nishant Mishra, Rachel Murphy, Ameen Abu-Hanna, Sébastien Bratières, Iacer Calixto

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Protecting patient privacy in clinical narratives is essential for enabling secondary use of healthcare data under regulations such as GDPR and HIPAA. While manual de-identification remains the gold standard, it is costly and slow, motivating the need for automated methods that combine privacy guarantees with high utility. Most automated text de-identification pipelines employ named entity recognition (NER) to identify protected entities for redaction. Methods based on differential privacy (DP) provide formal privacy guarantees, and more recently large language models (LLMs) have increasingly been used for text de-identification in the clinical domain. In this work, we present the first comparative study of DP, NER, and LLMs for Dutch clinical text de-identification. We investigate these methods separately as well as hybrid strategies that apply NER or LLM preprocessing prior to DP, and assess performance in terms of privacy leakage and extrinsic evaluation (entity and relation classification). We show that DP mechanisms alone degrade utility substantially, but combining them with linguistic preprocessing, especially LLM-based redaction, significantly improves the privacy-utility trade-off.
[304] arXiv:2604.21422 [pdf, html, other]: Title: Pre-process for segmentation task with nonlinear diffusion filters

Javier Sanguino, Carlos Platero, Olga Velasco

Comments: Manuscript from 2017, previously unpublished, 37 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper deals with the use of nonlinear diffusion filters to obtain piecewise constant images as a preprocessing step for segmentation techniques.
We first show an intrinsic formulation for the nonlinear diffusion equation to provide some design conditions on the diffusion filters. According to this theoretical framework, we propose a new family of diffusivities; they are obtained from nonlinear diffusion techniques and are related to backward diffusion. Their goal is to split the image into closed contours with a homogenized grey intensity inside and with no blurred edges.
We also prove that our filters satisfy the well-posedness requirements of semi-discrete and fully discrete scale-spaces. This shows that by using semi-implicit schemes, a forward nonlinear diffusion equation is solved, instead of a backward nonlinear diffusion equation, connecting with an edge-preserving process. Under the conditions established for the diffusivity and using a stopping criterion for the diffusion time, we get piecewise constant images with a low computational effort.
Finally, we test our filter with real images and we illustrate the effects of our diffusivity function as a method to get piecewise constant images.
The code is available at this https URL.
[305] arXiv:2604.21428 [pdf, html, other]: Title: Decoupled DiLoCo for Resilient Distributed Pre-training

Arthur Douillard, Keith Rush, Yani Donchev, Zachary Charles, Nova Fallen, Ayush Dubey, Ionel Gog, Josef Dean, Blake Woodworth, Zachary Garrett, Nate Keating, Jenny Bishop, Henry Prior, Edouard Yvinec, Arthur Szlam, Marc'Aurelio Ranzato, Jeff Dean

Subjects: Computation and Language (cs.CL)

Modern large-scale language model pre-training relies heavily on the single program multiple data (SPMD) paradigm, which requires tight coupling across accelerators. Due to this coupling, transient slowdowns, hardware failures, and synchronization overhead stall the entire computation, wasting significant compute time at scale. While recent distributed methods like DiLoCo reduced communication bandwidth, they remained fundamentally synchronous and vulnerable to these system stalls. To address this, we introduce Decoupled DiLoCo, an evolution of the DiLoCo framework designed to break the lock-step synchronization barrier and go beyond SPMD to maximize training goodput. Decoupled DiLoCo partitions compute across multiple independent "learners" that execute local inner optimization steps. These learners asynchronously communicate parameter fragments to a central synchronizer, which circumvents failed or straggling learners by aggregating updates using a minimum quorum, an adaptive grace window, and dynamic token-weighted merging. Inspired by "chaos engineering", we achieve significantly improved training efficiency in failure-prone environments with millions of simulated chips with strictly zero global downtime, while maintaining competitive model performance across text and vision tasks, for both dense and mixture-of-experts architectures.
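The combination of a minimum quorum with token-weighted merging can be sketched in a few lines. This is a hedged illustration of the general idea only: the function name `merge_updates`, the data layout, and the plain weighted average are assumptions, and the adaptive grace window and fragment-wise communication described in the abstract are omitted.

```python
def merge_updates(updates, min_quorum):
    """Token-weighted average of learner updates.

    updates: list of (tokens_processed, delta) pairs from the learners that
             reported in time; delta is a list of floats (hypothetical layout).
    Returns None when fewer than min_quorum learners reported, so the caller
    can keep waiting instead of stalling on failed or straggling learners.
    """
    if len(updates) < min_quorum:
        return None
    total_tokens = sum(tokens for tokens, _ in updates)
    dim = len(updates[0][1])
    return [
        sum(tokens * delta[i] for tokens, delta in updates) / total_tokens
        for i in range(dim)
    ]


# Three learners exist, but only two reported before the deadline.
reported = [(100, [1.0, 0.0]), (300, [0.0, 1.0])]

print(merge_updates(reported, min_quorum=2))  # quorum met: merge proceeds
print(merge_updates(reported, min_quorum=3))  # quorum not met: None
```

Weighting by tokens processed means a learner that completed more work in the round contributes proportionally more to the merged update, while a missing learner simply drops out of the average.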
[306] arXiv:2604.21430 [pdf, other]: Title: Brief chatbot interactions produce lasting changes in human moral values

Yue Teng, Qianer Zhong, Kim Mai Tich Nguyen Thordsen, Christian Montag, Benjamin Becker

Subjects: Artificial Intelligence (cs.AI)

Moral judgments form the foundation of human social behavior and societal systems. While Artificial Intelligence chatbots increasingly serve as personal advisors, their influence on moral judgments remains largely unexplored. Here, we examined whether directive AI conversations shift moral evaluations using a within-subject naturalistic paradigm. Fifty-three participants rated moral scenarios, then discussed four with a chatbot prompted to shift moral judgments and four with a control agent. The brief conversations induced significant directional shifts in moral judgments, accepting stricter standards as well as advocating greater leniency (ps < 0.05; Cohen's d = 0.735-1.576), with the effect growing stronger over a two-week follow-up (Cohen's d = 1.038-2.069). Critically, the control condition produced no changes and the effects did not extend to punishment. Participants remained unaware of the persuasive intent and rated both agents as equally likable and convincing, suggesting a vulnerability to undetected and lasting manipulation of foundational moral values.
[307] arXiv:2604.21431 [pdf, html, other]: Title: JAX-BEM: Gradient-Based Acoustic Shape Optimisation via a Differentiable Boundary Element Method

James Hipperson, Jonathan Hargreaves, Trevor Cox

Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)

Engineering structures are increasingly designed using numerical optimisation. However, traditional optimisation methods can be challenging with multiple objectives and many parameters. In machine learning, stable training of artificial neural networks with millions or billions of parameters is achieved using automatic differentiation frameworks such as JAX and PyTorch. Because these frameworks provide accelerated numerical linear algebra with automatic gradient tracking, they also enable differentiable implementations of numerical methods to be built. This facilitates faster gradient-based optimisation of geometry and materials, as well as solution of inverse problems. We demonstrate JAX-BEM, a differentiable Boundary Element Method (BEM) solver, showing that it matches the error of existing BEM codes for a benchmark problem and enables gradient-based geometry optimisation. Although the demonstrated examples are for acoustic simulations, the concept could be readily extended to electromagnetic waves.
[308] arXiv:2604.21435 [pdf, other]: Title: UHR-DETR: Efficient End-to-End Small Object Detection for Ultra-High-Resolution Remote Sensing Imagery

Jingfang Li, Haoran Zhu, Wen Yang, Jinrui Zhang, Fang Xu, Haijian Zhang, Gui-Song Xia

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Ultra-High-Resolution (UHR) imagery has become essential for modern remote sensing, offering unprecedented spatial coverage. However, detecting small objects in such vast scenes presents a critical dilemma: retaining the original resolution for small objects causes prohibitive memory bottlenecks. Conversely, conventional compromises like image downsampling or patch cropping either erase small objects or destroy context. To break this dilemma, we propose UHR-DETR, an efficient end-to-end transformer-based detector designed for UHR imagery. First, we introduce a Coverage-Maximizing Sparse Encoder that dynamically allocates finite computational resources to informative high-resolution regions, ensuring maximum object coverage with minimal spatial redundancy. Second, we design a Global-Local Decoupled Decoder. By integrating macroscopic scene awareness with microscopic object details, this module resolves semantic ambiguities and prevents scene fragmentation. Extensive experiments on the UHR imagery datasets (e.g., STAR and SODA-A) demonstrate the superiority of UHR-DETR under strict hardware constraints (e.g., a single 24GB RTX 3090). It achieves a 2.8% mAP improvement while delivering a 10× inference speedup compared to standard sliding-window baselines on the STAR dataset. Our code and models will be available at this https URL.
[309] arXiv:2604.21436 [pdf, html, other]: Title: A Stackelberg Model for Hybridization in Cryptography

Willie Kouam, Stefan Rass, Zahra Seyedi, Shahzad Ahmad, Eckhard Pfluegel

Comments: 27 pages, 2 figures, Computer & Security Journal

Subjects: Cryptography and Security (cs.CR)

Similar to a strategic interaction between rational and intelligent agents, cryptography problems can be examined through the prism of game theory. In this setting, the agent aiming to protect a message is called the defender, while the one attempting to decrypt it, generally for malicious purposes, is the attacker. To strengthen security in cryptography, various strategies have been developed, among which hybridization stands out as a key concept in modern cryptographic design. This strategy allows the defender to select among different encryption algorithms (classical, post-quantum, or hybrid) while carefully balancing security and operational costs. On the other side, the attacker, limited by available resources, chooses cryptanalysis methods capable of breaching the selected algorithm. We model this interaction as a Stackelberg cryptographic hybridization problem under resource constraints. Here, the defender randomizes over encryption algorithms, and the attacker observes the choice before selecting suitable cryptanalysis methods. The attacker's decision is framed as a conditional optimization problem, which we refer to as the "attacker subgame". We then propose a dynamic programming approach for the attacker's subgame, while the defender's Stackelberg optimization is formulated as a linear program.
[310] arXiv:2604.21442 [pdf, html, other]: Title: 2L-LSH: A Locality-Sensitive Hash Function-Based Method For Rapid Point Cloud Indexing

Shurui Wang, Yuhe Zhang, Ruizhe Guo, Yaning Zhang, Yifei Xie, Xinyu Zhou

Comments: 13 pages, 13 figures. Published in The Computer Journal

Journal-ref: The Computer Journal 67(9) (2024) 2809-2818

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The development of 3D scanning technology has enabled the acquisition of massive point cloud models with diverse structures and large scales, thereby presenting significant challenges in point cloud processing. Fast neighboring-point search is one of the most common problems and is frequently used in model reconstruction, classification, retrieval and feature visualization. Hash functions are well known for fast and accurate search over high-dimensional data, and they form the core of the proposed 2L-LSH. Specifically, the 2L-LSH algorithm adopts a two-step hash function strategy, in which the first step divides the bounding box of the point cloud model and the second step constructs a generalized table-based data structure. The proposed 2L-LSH offers a highly efficient and accurate solution for fast neighboring-point search in large-scale 3D point cloud models, making it a promising technique for various applications in the field. The proposed algorithm is compared with well-known methods including Kd-tree and Octree; the results demonstrate that the proposed method outperforms Kd-tree and Octree in terms of speed, i.e. the time consumption of kNN search can be 51.111% and 94.159% lower than Kd-tree and Octree, respectively. The RN search time can be 54.519% and 41.840% lower than Kd-tree and Octree, respectively.
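The abstract above describes hashing points by a subdivision of the bounding box and looking them up through a table. The following is a minimal sketch of that general idea using a plain uniform grid hash for fixed-radius search. It is not the authors' 2L-LSH: the function names `build_grid` and `radius_search`, the cell size, and the sample points are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(points, cell):
    """Hash each 3D point index into the grid cell that contains it."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(c / cell) for c in p)
        grid[key].append(i)
    return grid

def radius_search(points, grid, cell, query, r):
    """Return sorted indices of points within distance r of query."""
    reach = math.ceil(r / cell)  # number of cells the search ball can span per axis
    qkey = tuple(math.floor(c / cell) for c in query)
    hits = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        key = tuple(k + o for k, o in zip(qkey, offset))
        # .get avoids creating empty buckets in the defaultdict
        for i in grid.get(key, ()):
            if math.dist(points[i], query) <= r:
                hits.append(i)
    return sorted(hits)

# Hypothetical sample data
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 2.0, 2.0), (0.0, 0.9, 0.0)]
grid = build_grid(points, cell=1.0)
neighbours = radius_search(points, grid, cell=1.0, query=(0.1, 0.1, 0.1), r=1.0)
```

Only the buckets overlapping the search ball are visited, which is why hashing of this kind can avoid the tree traversal that Kd-tree and Octree lookups require.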
[311] arXiv:2604.21444 [pdf, html, other]: Title: HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration

Yuehan Zhu, Jingqi Zhao, Jiawen Zhao, Xudong Mao, Baoquan Zhao

Subjects: Artificial Intelligence (cs.AI)

Long-form video understanding remains fundamentally challenged by pervasive spatiotemporal redundancy and intricate narrative dependencies that span extended temporal horizons. While recent structured representations compress visual information effectively, they frequently sacrifice temporal coherence, which is critical for causal reasoning. Meanwhile, existing multi-agent frameworks operate through rigid, pre-defined workflows that fail to adapt their reasoning strategies to question-specific demands. In this paper, we introduce HiCrew, a hierarchical multi-agent framework that addresses these limitations through three core contributions. First, we propose a Hybrid Tree structure that leverages shot boundary detection to preserve temporal topology while performing relevance-guided hierarchical clustering within semantically coherent segments. Second, we develop a Question-Aware Captioning mechanism that synthesizes intent-driven visual prompts to generate precision-oriented semantic descriptions. Third, we integrate a Planning Layer that dynamically orchestrates agent collaboration by adaptively selecting roles and execution paths based on question complexity. Extensive experiments on EgoSchema and NExT-QA validate the effectiveness of our approach, demonstrating strong performance across diverse question types with particularly pronounced gains in temporal and causal reasoning tasks that benefit from our hierarchical structure-preserving design.
[312] arXiv:2604.21446 [pdf, html, other]: Title: AI-Gram: When Visual Agents Interact in a Social Network

Andrew Shin

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)

We present AI-Gram, a live platform enabling image-based interactions, to study social dynamics in a fully autonomous multi-agent visual network where all participants are LLM-driven agents. Using the platform, we conduct experiments on how agents communicate and adapt through visual media, and observe the spontaneous emergence of visual reply chains, indicating rich communicative structure. At the same time, agents exhibit aesthetic sovereignty, resisting stylistic convergence toward social partners, anchoring under adversarial influence, and a decoupling between visual similarity and social ties. These results reveal a fundamental asymmetry in current agent architectures: strong expressive communication paired with a steadfast preservation of individual visual identity. We release AI-Gram as a publicly accessible, continuously evolving platform for studying social dynamics in AI-native multi-agent systems. this https URL
[313] arXiv:2604.21449 [pdf, other]: Title: Research on the efficiency of data loading and storage in Data Lakehouse architectures for the formation of analytical data systems

Ivan Borodii, Halyna Osukhivska

Comments: 9 pages, 2 figures, 5 tables

Journal-ref: No. 4 (2025): Information Technology: Computer Science, Software Engineering and Cyber Security

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

The paper presents a study of the efficiency of loading and storing data in the three most common Data Lakehouse systems, namely Apache Hudi, Apache Iceberg, and Delta Lake, using Apache Spark as a distributed data processing platform. The study analyzes the behavior of each system when processing structured (CSV) and semi-structured (JSON) data of different sizes, including loading files up to 7 GB in size. The purpose of the work is to determine the best-suited Data Lakehouse architecture based on the type and volume of data sources, data loading performance using Apache Spark, and disk size of data for forming analytical data systems. The research covers the development of four sequential ETL processes, which include reading, transforming, and loading data into tables in each of the Data Lakehouse systems. The efficiency of each Lakehouse was evaluated according to two key criteria: data loading time and the volume of tables formed in the file system. For the first time, a comparison of performance and data storage in Apache Iceberg, Apache Hudi, and Delta Lake Data Lakehouse systems was conducted to select the most relevant architecture for building analytical data systems. The practical value of the study lies in helping data engineers and architects choose the most appropriate Lakehouse architecture and understand the balance between loading performance and storage efficiency. Experimental results showed that Delta Lake is the best-suited architecture for systems where the priority is the speed of loading data of any volume, while Apache Iceberg is most appropriate for systems where stability and disk space savings are critical. Apache Hudi proved ineffective in data loading and storage evaluation tasks but could potentially be effective in incremental update and streaming processing scenarios.
[314] arXiv:2604.21450 [pdf, html, other]: Title: VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution

Yixuan Zhu, Shilin Ma, Haolin Wang, Ao Li, Yanzhe Jing, Yansong Tang, Lei Chen, Jiwen Lu, Jie Zhou

Comments: Accepted in ICLR 2026. Code is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent advancements in visual autoregressive models (VAR) have demonstrated their effectiveness in image generation, highlighting their potential for real-world image super-resolution (Real-ISR). However, adapting VAR for ISR presents critical challenges. The next-scale prediction mechanism, constrained by causal attention, fails to fully exploit global low-quality (LQ) context, resulting in blurry and inconsistent high-quality (HQ) outputs. Additionally, error accumulation in the iterative prediction severely degrades coherence in the ISR task. To address these issues, we propose VARestorer, a simple yet effective distillation framework that transforms a pre-trained text-to-image VAR model into a one-step ISR model. By leveraging distribution matching, our method eliminates the need for iterative refinement, significantly reducing error propagation and inference time. Furthermore, we introduce pyramid image conditioning with cross-scale attention, which enables bidirectional scale-wise interactions and fully utilizes the input image information while adapting to the autoregressive mechanism. This prevents later LQ tokens from being overlooked in the transformer. By fine-tuning only 1.2% of the model parameters through parameter-efficient adapters, our method maintains the expressive power of the original VAR model while significantly enhancing efficiency. Extensive experiments show that VARestorer achieves state-of-the-art performance with 72.32 MUSIQ and 0.7669 CLIPIQA on the DIV2K dataset, while accelerating inference by 10 times compared to conventional VAR inference.
[315] arXiv:2604.21453 [pdf, html, other]: Title: Instance-level Visual Active Tracking with Occlusion-Aware Planning

Haowei Sun, Kai Zhou, Hao Gao, Shiteng Zhang, Jinwu Hu, Xutao Wen, Qixiang Ye, Mingkui Tan

Comments: CVPR 2026 Poster

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is critical for applications like drone navigation and security surveillance. However, it faces two key bottlenecks in real-world deployment: confusion from visually similar distractors caused by insufficient instance-level discrimination and severe failure under occlusions due to the absence of active planning. To address these, we propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to construct discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances prototypes online and integrates a confidence-aware Kalman filter for stable tracking under appearance and motion changes. Third, an Occlusion-Aware Trajectory Planner, trained on our new Planning-20k dataset, uses conditional diffusion to generate obstacle-avoiding paths for occlusion recovery. Experiments demonstrate OA-VAT achieves 0.93 average SR on UnrealCV (+2.2% vs. SOTA TrackVLA), 90.8% average CAR on real-world datasets (+12.1% vs. SOTA GC-VAT), and 81.6% TSR on a DJI Tello drone. Running at 35 FPS on an RTX 3090, it delivers robust, real-time performance for practical deployment.
[316] arXiv:2604.21454 [pdf, html, other]: Title: Reasoning Primitives in Hybrid and Non-Hybrid LLMs

Shivam Rawat, Lucie Flek, Florian Mai, Nicholas Kluge Corrêa

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Reasoning in large language models is often treated as a monolithic capability, but its observed gains may arise from more basic operations. We study reasoning through two such primitives, recall and state-tracking, and ask whether hybrid architectures that combine attention-based retrieval with recurrent state updates are better suited than attention-only models for tasks that jointly require both. Using matched Olmo3 transformer and hybrid models in instruction-tuned and reasoning-augmented variants, we evaluate these models on a set of controlled tasks involving a mixture of state-tracking and recall primitives, state-based recall. Across tasks, we notice that reasoning augmentation provides the largest overall improvement, substantially extending the range of difficulty over which models remain effective. We also notice that in certain tasks, the hybrid reasoning model remains substantially more robust as sequential dependence increases. In contrast, the transformer reasoning model degrades sharply in performance as task difficulty increases beyond a given threshold. These results suggest that reasoning tokens and architectural inductive biases contribute at different levels of the computational process: explicit reasoning can expand a model's effective operating range, but its benefit depends on how well the underlying architecture supports persistent state propagation. Given the small size of our case study, which involves a limited set of models and tasks, we present these findings as suggestive rather than conclusive and leave broader validation across model families, scales, and task variations to future work.
[317] arXiv:2604.21455 [pdf, html, other]: Title: The Privacy Guardian Agent: Towards Trustworthy AI Privacy Agents

Vincent Freiberger

Comments: Position paper for the CHI26 Workshop "Moving Beyond Clicks: Rethinking Consent and User Control in the Age of AI"

Subjects: Human-Computer Interaction (cs.HC)

The current "notice and consent" paradigm is broken: consent dialogues are often manipulative, and users cannot realistically read or understand every privacy policy. While recent LLM-based tools empower users seeking active control, many with limited time or motivation prefer full automation. However, fully autonomous solutions risk hallucinations and opaque decisions, undermining trust. I propose a middle ground - a Privacy Guardian Agent that automates routine consent choices using user profiles and contextual awareness while recognizing uncertainty. It escalates unclear or high-risk cases to the user, maintaining a human-in-the-loop only when necessary. To ensure agency and transparency, the agent's reasoning on its autonomous decisions is reviewable, allowing for user recourse. For problematic cases, even with minimal consent, it alerts the user and suggests switching to an alternative site. This approach aims to reduce consent fatigue while preserving trust and meaningful user autonomy.
[318] arXiv:2604.21456 [pdf, other]: Title: Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics

Heng Yang

Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
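The KL-regularized objective and its "Boltzmann-tilted" solution mentioned above can be written out in the standard form below. This is a sketch under assumed notation rather than the paper's own: $J(\theta)$ denotes the expected trajectory cost of controller parameters $\theta$, $p$ the prior, $\lambda$ the temperature, and $\beta_k$ a tempering schedule.

```latex
\begin{align}
q^\star &= \arg\min_{q}\; \mathbb{E}_{\theta\sim q}\big[J(\theta)\big] + \lambda\,\mathrm{KL}\big(q \,\|\, p\big), \\
q^\star(\theta) &= \frac{p(\theta)\,\exp\!\big(-J(\theta)/\lambda\big)}{Z_\lambda},
\qquad Z_\lambda = \int p(\theta)\,\exp\!\big(-J(\theta)/\lambda\big)\,\mathrm{d}\theta, \\
\pi_{\beta}(\theta) &\propto p(\theta)\,\exp\!\big(-\beta\,J(\theta)/\lambda\big),
\qquad 0=\beta_0<\beta_1<\dots<\beta_K=1, \\
w_k^{(i)} &\propto w_{k-1}^{(i)}\,\exp\!\big(-(\beta_k-\beta_{k-1})\,J(\theta^{(i)})/\lambda\big).
\end{align}
```

As $\lambda \to 0$ the target $q^\star$ concentrates on low-cost parameters, which is why sampling it directly is hard. The path $\pi_\beta$ interpolates from the prior ($\beta=0$) to the target ($\beta=1$), and the last line is the usual incremental importance weight for particle $i$ when moving from $\beta_{k-1}$ to $\beta_k$.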
[319] arXiv:2604.21457 [pdf, html, other]: Title: Context-Aware Displacement Estimation from Mobile Phone Data: A Methodological Framework

Rajius Idzalika, Muhammad Rheza Muztahid, Radityo Eko Prasojo

Comments: 24 pages, 4 figures, 14 tables. Case study: Super Typhoon Nando, Philippines (2025)

Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI); Applications (stat.AP)

Timely population displacement estimates are critical for humanitarian response during disasters, but traditional surveys and field assessments are slow. Mobile phone data enables near real-time tracking, yet existing approaches apply uniform displacement definitions regardless of individual mobility patterns, misclassifying regular commuters as displaced. We present a methodological framework addressing this through three innovations: (1) mobility profile classification distinguishing local residents from commuter types, (2) context-aware between-municipality displacement detection accounting for expected location by user type and day of week, and (3) operational uncertainty bounds derived from baseline coefficient of variation with a disaster adjustment factor, intended for humanitarian decision support rather than formal statistical inference. The framework produces three complementary metrics scaled to population with uncertainty bounds: displacement rates, origin-destination flows, and return dynamics. An Aparri case study following Super Typhoon Nando (2025, Philippines) applies the framework to vendor-provided daily locations from Globe Telecom. Context-aware detection reduced estimated between-municipality displacement by 1.6-2.7 percentage points on weekdays versus naive methods, attributable to the commuter exception but not independently validated. The method captures between-municipality displacement only. Within-municipality evacuation falls outside scope. The single-case demonstration establishes proof of concept. External validity requires application across multiple events and locations. The framework provides humanitarian actors with operational displacement information while preserving individual privacy through aggregation.
[320] arXiv:2604.21461 [pdf, html, other]: Title: Do MLLMs Understand Pointing? Benchmarking and Enhancing Referential Reasoning in Egocentric Vision

Chentao Li, Zirui Gao, Mingze Gao, Yinglian Ren, Jianjiang Feng, Jie Zhou

Comments: 20 pages, 14 figures. Committed to ACL 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Egocentric AI agents, such as smart glasses, rely on pointing gestures to resolve referential ambiguities in natural language commands. However, despite advancements in Multimodal Large Language Models (MLLMs), current systems often fail to precisely ground the spatial semantics of pointing. Instead, they rely on spurious correlations with visual proximity or object saliency, a phenomenon we term "Referential Hallucination." To address this gap, we introduce EgoPoint-Bench, a comprehensive question-answering benchmark designed to evaluate and enhance multimodal pointing reasoning in egocentric views. Comprising over 11k high-fidelity simulated and real-world samples, the benchmark spans five evaluation dimensions and three levels of referential complexity. Extensive experiments demonstrate that while state-of-the-art proprietary and open-source models struggle with egocentric pointing, models fine-tuned on our synthetic data achieve significant performance gains and robust sim-to-real generalization. This work highlights the importance of spatially aware supervision and offers a scalable path toward precise egocentric AI assistants. Project page: this https URL
[321] arXiv:2604.21462 [pdf, html, other]: Title: Conditional anomaly detection with soft harmonic functions

Michal Valko, Branislav Kveton, Hamed Valizadegan, Gregory F. Cooper, Milos Hauskrecht

Comments: Published at IEEE International Conference on Data Mining (ICDM 2011). https://doi.org/10.1109/ICDM.2011.40

Journal-ref: IEEE International Conference on Data Mining (ICDM), pp. 735-743, 2011

Subjects: Machine Learning (cs.LG)

In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response or a class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels when compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset where we seek to identify unusual patient-management decisions.
[322] arXiv:2604.21464 [pdf, other]: Title: Dynamical Priors as a Training Objective in Reinforcement Learning

Sukesh Subaharan

Comments: Supplementary material can be accessed here: this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Standard reinforcement learning (RL) optimizes policies for reward but imposes few constraints on how decisions evolve over time. As a result, policies may achieve high performance while exhibiting temporally incoherent behavior such as abrupt confidence shifts, oscillations, or degenerate inactivity. We introduce Dynamical Prior Reinforcement Learning (DP-RL), a training framework that augments policy gradient learning with an auxiliary loss derived from external state dynamics that implement evidence accumulation and hysteresis. Without modifying the reward, environment, or policy architecture, this prior shapes the temporal evolution of action probabilities during learning. Across three minimal environments, we show that dynamical priors systematically alter decision trajectories in task-dependent ways, promoting temporally structured behavior that cannot be explained by generic smoothing. These results demonstrate that training objectives alone can control the temporal geometry of decision-making in RL agents.
[323] arXiv:2604.21465 [pdf, html, other]: Title: ID-Eraser: Proactive Defense Against Face Swapping via Identity Perturbation

Junyan Luo, Peipeng Yu, Jianwei Fei, Shiya Zeng, Xiaoyu Zhou, Zhihua Xia, Xiang Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deepfake technologies have rapidly advanced with modern generative AI, and face swapping in particular poses serious threats to privacy and digital security. Existing proactive defenses mostly rely on pixel-level perturbations, which are ineffective against contemporary swapping models that extract robust high-level identity embeddings. We propose ID-Eraser, a feature-space proactive defense that removes identifiable facial information to prevent malicious face swapping. By injecting learnable perturbations into identity embeddings and reconstructing natural-looking protection images through a Face Revive Generator (FRG), ID-Eraser produces visually realistic results for humans while rendering the protected identities unusable for Deepfake models. Experiments show that ID-Eraser substantially disrupts identity recognition across diverse face recognition and swapping systems under strict black-box settings, achieving the lowest Top-1 accuracy (0.30) with the best FID (1.64) and LPIPS (0.020). Compared with swaps generated from clean inputs, the identity similarity of protected swaps drops sharply to an average of 0.504 across five representative face swapping models. ID-Eraser further demonstrates strong cross-dataset generalization, robustness to common distortions, and practical effectiveness on commercial APIs, reducing Tencent API similarity from 0.76 to 0.36.
[324] arXiv:2604.21467 [pdf, other]: Title: Linear Constraints

Arnaud Spiwack, Csongor Kiss, Jean-Philippe Bernardy, Nicolas Wu, Richard A. Eisenberg

Subjects: Programming Languages (cs.PL)

Linear constraints are the linear counterpart of Haskell's class constraints. Linearly typed parameters allow the programmer to control resources such as file handles and manually managed memory as linear arguments. Indeed, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. Linear constraints address this shortcoming: a linear constraint acts as an implicit linear argument that can be filled in automatically by the compiler.
We present this new feature as a qualified type system, together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell.
This paper is a revised and extended version of a previous paper by the same authors (arXiv:2103.06127). The formal system and the constraint solver have been significantly simplified and numerous additional applications are described.
[325] arXiv:2604.21468 [pdf, html, other]: Title: Novelty-Based Generation of Continuous Landscapes with Diverse Local Optima Networks

Kippei Mizuta, Shoichiro Tanaka, Shuhei Tanaka, Toshiharu Hatanaka

Subjects: Neural and Evolutionary Computing (cs.NE)

Local Optima Networks (LONs) represent the global structure of search spaces as graphs, but their construction requires iterative execution of a search algorithm to find local optima and approximate transitions between Basins of Attraction (BoAs). In continuous optimization, this high computational cost prevents systematic investigation of the relationship between LON features and evolutionary algorithm performance. To address this issue, we propose an alternative definition of BoAs for Max-Set of Gaussians (MSG) landscapes with explicitly tunable multimodality. This bypasses search-based BoA identification, enabling low-cost LON construction. Moreover, we leverage Novelty Search (NS) to explore the parameter space of the MSG landscape generator, producing instances with diverse graph topologies. Our experiments show that the proposed BoAs closely align with gradient-based BoAs, and that NS successfully generates instances with varied search difficulty and connectivity patterns among optima. Finally, over the instances generated by NS, we predict the success rate of two well-established evolutionary algorithms from LON features. While our LON construction is specific to MSG landscapes, the proposed framework provides a dataset that serves as a foundation for landscape-aware optimization.
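For readers unfamiliar with Max-Set of Gaussians landscapes, the sketch below evaluates one: the fitness at a point is the maximum over a set of scaled Gaussian bumps. The isotropic covariance, the name `msg_value`, and the sample components are assumptions for illustration. Returning the index of the dominating component is only one plausible proxy for basin membership and is not necessarily the definition the paper proposes.

```python
import math

def msg_value(x, components):
    """Evaluate a Max-Set of Gaussians landscape at x.

    components: list of (height, centre, sigma) describing isotropic
    Gaussians (isotropy is a simplifying assumption).
    Returns (fitness, index of the component attaining the maximum).
    """
    best_val, best_idx = -math.inf, -1
    for idx, (height, centre, sigma) in enumerate(components):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
        val = height * math.exp(-0.5 * sq_dist / sigma ** 2)
        if val > best_val:
            best_val, best_idx = val, idx
    return best_val, best_idx

# Hypothetical two-peak landscape in 2D
components = [(1.0, (0.0, 0.0), 1.0), (0.8, (3.0, 3.0), 0.5)]
value_at_peak, owner_at_peak = msg_value((3.0, 3.0), components)
value_near_origin, owner_near_origin = msg_value((0.2, -0.1), components)
```

Because the dominating component is available in closed form, no local search is needed to label a point, which is the kind of cost saving the abstract refers to.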
[326] arXiv:2604.21469 [pdf, html, other]: Title: Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection

Fariz Ikhwantri, Dusica Marijan

Comments: 10 pages, 5 figures, 4 tables. 11th Special Session on Intelligent Data Mining, 2025 IEEE International Conference on Big Data

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Automating the detection of regulatory compliance remains a challenging task due to the complexity and variability of legal texts. Models trained on one regulation often fail to generalise to others. This limitation underscores the need for principled methods to improve cross-domain transfer. We study data selection as a strategy to mitigate negative transfer in compliance detection framed as a natural language inference (NLI) task. Specifically, we evaluate four approaches for selecting augmentation data from a larger source domain: random sampling, Moore-Lewis's cross-entropy difference, importance weighting, and embedding-based retrieval. We systematically vary the proportion of selected data to analyse its effect on cross-domain adaptation. Our findings demonstrate that targeted data selection substantially reduces negative transfer, offering a practical path toward scalable and reliable compliance automation across heterogeneous regulations.
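Of the four selection strategies listed above, the Moore-Lewis cross-entropy difference is easy to illustrate: each candidate sentence is scored by its cross-entropy under a target-domain language model minus its cross-entropy under a source-domain model, and the lowest scores are selected. The sketch below uses add-one smoothed unigram models purely for brevity. The function names and example sentences are hypothetical, and the paper's actual language models and data are not reproduced here.

```python
import math
from collections import Counter

def unigram_lm(sentences, vocab):
    """Add-one smoothed unigram model as a token -> probability dict."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values()) + len(vocab)
    return {tok: (counts[tok] + 1) / total for tok in vocab}

def cross_entropy(sentence, lm):
    """Per-token cross-entropy (in nats) of a sentence under a unigram model."""
    toks = sentence.split()
    return -sum(math.log(lm[t]) for t in toks) / len(toks)

def moore_lewis_rank(target, source):
    """Rank source sentences by H_target(s) - H_source(s); lower is more target-like."""
    vocab = {tok for s in target + source for tok in s.split()}
    lm_t, lm_s = unigram_lm(target, vocab), unigram_lm(source, vocab)
    scored = [(cross_entropy(s, lm_t) - cross_entropy(s, lm_s), s) for s in source]
    return [s for _, s in sorted(scored)]

# Hypothetical toy data: a small target regulation and a mixed source pool
target = ["the controller shall log every access", "the processor shall encrypt data"]
source = [
    "the vendor shall encrypt every log",
    "we went hiking last weekend",
    "the cat sat on the mat",
]
ranked = moore_lewis_rank(target, source)
```

Taking the top fraction of `ranked` corresponds to varying the proportion of selected data, as the abstract describes.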
[327] arXiv:2604.21471 [pdf, html, other]: Title: Ufil: A Unified Framework for Infrastructure-based Localization

Simon Schäfer, Lucas Hegerath, Marius Molz, Massimo Marcon, Bassam Alrifaee

Comments: 8 pages, 6 figures, this work was submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC) 2026

Subjects: Robotics (cs.RO)

Infrastructure-based localization enhances road safety and traffic management by providing state estimates of road users. Development is hindered by fragmented, application-specific stacks that tightly couple perception, tracking, and middleware. We introduce Ufil, a Unified Framework for Infrastructure-Based Localization with a standardized object model and reusable multi-object tracking components. Ufil offers interfaces and reference implementations for prediction, detection, association, state update, and track management, allowing researchers to improve components without reimplementing the pipeline. Ufil is open-source C++/ROS 2 software with documentation and executable examples. We demonstrate Ufil by integrating three heterogeneous data sources into a single localization pipeline combining (i) vehicle onboard units broadcasting ETSI ITS-G5 Cooperative Awareness Messages, (ii) a lidar-based roadside sensor node, and (iii) an in-road sensitive surface layer. The pipeline runs unchanged in the CARLA simulator and a small-scale CAV testbed, demonstrating Ufil's scale-independent execution model. In a three-lane highway scenario with 423 and 355 vehicles in simulation and testbed, respectively, the fused system achieves lane-level lateral accuracy with mean lateral position RMSEs of 0.31 m in CARLA and 0.29 m in the CPM Lab, and mean absolute orientation errors around 2.2°. Median end-to-end latencies from sensing to fused output remain below 100 ms across all modalities in both environments.
[328] arXiv:2604.21473 [pdf, html, other]: Title: Drug Synergy Prediction via Residual Graph Isomorphism Networks and Attention Mechanisms

Jiyan Song, Wenyang Wang, Chengcheng Yan, Zhiquan Han, Feifei Zhao

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the treatment of complex diseases, treatment regimens using a single drug often yield limited efficacy and can lead to drug resistance. In contrast, combination drug therapies can significantly improve therapeutic outcomes through synergistic effects. However, experimentally validating all possible drug combinations is prohibitively expensive, underscoring the critical need for efficient computational prediction methods. Although existing approaches based on deep learning and graph neural networks (GNNs) have made considerable progress, challenges remain in reducing structural bias, improving generalization capability, and enhancing model interpretability. To address these limitations, this paper proposes a collaborative prediction graph neural network that integrates molecular structural features and cell-line genomic profiles with drug-drug interactions to enhance the prediction of synergistic effects. We introduce a novel model named the Residual Graph Isomorphism Network integrated with an Attention mechanism (ResGIN-Att). The model first extracts multi-scale topological features of drug molecules using a residual graph isomorphism network, where residual connections help mitigate over-smoothing in deep layers. Subsequently, an adaptive Long Short-Term Memory (LSTM) module fuses structural information from local to global scales. Finally, a cross-attention module is designed to explicitly model drug-drug interactions and identify key chemical substructures. Extensive experiments on five public benchmark datasets demonstrate that ResGIN-Att achieves competitive performance, comparing favorably against key baseline methods while exhibiting promising generalization capability and robustness.
[329] arXiv:2604.21477 [pdf, html, other]: Title: MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

Run Hao, Zhuoran Tan

Subjects: Cryptography and Security (cs.CR)

Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem expand risks across tool metadata, untrusted outputs, cross-tool flows, multimodal inputs, and supply-chain vectors. Existing MCP benchmarks largely measure robustness to malicious inputs but offer limited remediation guidance. We present MCP Pitfall Lab, a protocol-aware security testing framework that operationalizes developer pitfalls as reproducible scenarios and validates outcomes with MCP traces and objective validators (rather than agent self-report). We instantiate three workflow challenges (email, document, crypto) with six server variants (baseline and hardened) and model three attack families: tool-metadata poisoning, puppet servers, and multimodal image-to-tool chains, in a unified, trace-grounded evaluation. In Tier-1 static analysis over six variants (36 binary labels), our analyzer achieves F1 = 1.0 on four statically checkable pitfall classes (P1, P2, P5, P6) and flags cross-tool forwarding and image-to-tool leakage (P3, P4) as trace/dataflow-dependent. Applying recommended hardening eliminates all Tier-1 findings (29 to 0) and reduces the framework risk score (10.0 to 0.0) at a mean cost of 27 lines of code (LOC). Finally, in a preliminary 19-run corpus from the email system challenge (tool poisoning and puppet attacks), agent narratives diverge from trace evidence in 63.2% of runs and 100% of sink-action runs, motivating trace-based auditing and regression testing. Overall, Pitfall Lab enables practical, end-to-end assessment and hardening of MCP tool servers under realistic multi-vector conditions.
[330] arXiv:2604.21478 [pdf, html, other]: Title: Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts

Yuhan Luo, Tao Chen, Decheng Liu

Comments: The source code is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Nowadays, visual data forgery detection plays an increasingly important role in social and economic security with the rapid development of generative models. Existing face forgery detectors still cannot achieve satisfactory performance because of poor generalization across datasets. The key factor behind this phenomenon is the lack of suitable metrics: the commonly used cross-dataset AUC metric fails to reveal an important issue where detection scores may shift significantly across data domains. To explicitly evaluate cross-domain score comparability, we propose Cross-AUC, an evaluation metric that computes AUC across dataset pairs by contrasting real samples from one dataset with fake samples from another (and vice versa). Evaluating representative detectors under the Cross-AUC metric reveals substantial performance drops, exposing an overlooked robustness problem. We also propose a novel framework, Semantic Fine-grained Alignment and Mixture-of-Experts (SFAM), consisting of a patch-level image-text alignment module that enhances CLIP's sensitivity to manipulation artifacts and a facial-region mixture-of-experts module that routes features from different facial regions to specialized experts for region-aware forgery analysis. Extensive qualitative and quantitative experiments on public datasets demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods under various suitable metrics.
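The cross-dataset pairing described above can be sketched in a few lines. This is a minimal reading of the abstract, not the authors' implementation: `auc` is the standard pairwise (Mann-Whitney) estimate, and `cross_auc` and the toy scores are hypothetical. It assumes a detector that outputs higher scores for fakes.

```python
def auc(fake_scores, real_scores):
    """Probability that a random fake outscores a random real; ties count 0.5."""
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


def cross_auc(real_a, fake_a, real_b, fake_b):
    """AUC of (real from A vs fake from B) and of (real from B vs fake from A)."""
    return auc(fake_b, real_a), auc(fake_a, real_b)


# Toy detector scores (hypothetical): dataset B is shifted upward as a whole.
real_a, fake_a = [0.10, 0.20], [0.60, 0.70]
real_b, fake_b = [0.65, 0.75], [0.90, 0.95]

within_a = auc(fake_a, real_a)   # perfect separation inside dataset A
within_b = auc(fake_b, real_b)   # perfect separation inside dataset B
a_real_b_fake, b_real_a_fake = cross_auc(real_a, fake_a, real_b, fake_b)
```

In this toy case both within-dataset AUCs are 1.0, yet pairing dataset B's real samples with dataset A's fakes gives 0.25, which is the kind of score shift a per-dataset AUC cannot reveal.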
[331] arXiv:2604.21479 [pdf, html, other]: Title: Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction

Yanjiao Liu, Jiawei Liu, Xun Gong, Zifei Nie

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large language models (LLMs) have recently demonstrated strong reasoning capabilities and attracted increasing research attention in the field of autonomous driving (AD). However, safe application of LLMs to AD perception and prediction still requires a thorough understanding of both the dynamic traffic agents and the static road infrastructure. To this end, this study introduces a framework to evaluate the capability of LLMs in understanding the behaviors of dynamic traffic agents and the topology of road networks. The framework leverages frozen LLMs as the reasoning engine, employing a traffic encoder to extract spatial-level scene features from observed trajectories of agents, while a lightweight Convolutional Neural Network (CNN) encodes the local high-definition (HD) maps. To assess the intrinsic reasoning ability of LLMs, the extracted scene features are then transformed into LLM-compatible tokens via a reprogramming adapter. Because the prediction burden rests with the LLMs, a simple linear decoder suffices to output future trajectories. The framework enables a quantitative analysis of the influence of multi-modal information, especially the impact of map semantics on trajectory prediction accuracy, and allows seamless integration of frozen LLMs with minimal adaptation, thereby demonstrating strong generalizability across diverse LLM architectures and providing a unified platform for model evaluation.
[332] arXiv:2604.21480 [pdf, html, other]: Title: Efficient Agent Evaluation via Diversity-Guided User Simulation

Itay Nakash, George Kour, Ateret Anaby-Tavor

Subjects: Artificial Intelligence (cs.AI)

Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this approach is computationally inefficient, repeatedly regenerating identical early prefixes, and often fails to uncover deep failure modes that arise from rare user behaviors.
We introduce DIVERT (Diversity-Induced Evaluation via Branching of Trajectories), an efficient, snapshot-based, coverage-guided user simulation framework for systematic exploration of agent-user interactions. DIVERT captures the full agent-environment state at critical decision points and resumes execution from these snapshots, enabling reuse of shared conversation prefixes and reducing redundant computation. From each junction, the framework branches using targeted, diversity-inducing user responses, allowing directed exploration of alternative interaction paths.
By focusing evaluation on semantically diverse and underexplored trajectories, DIVERT improves both efficiency and coverage. Empirical results show that it discovers more failures per token compared to standard linear rollout protocols, while expanding the set of tasks on which failures are identified.
[333] arXiv:2604.21481 [pdf, html, other]: Title: Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

Srija Anand, Ashwin Sankar, Ishvinder Sethi, Aaditya Pareek, Kartik Rajput, Gaurav Yadav, Nikhil Narasimhan, Adish Pandya, Deepon Halder, Mohammed Safi Ur Rahman Khan, Praveen S V, Shobhit Banga, Mitesh M Khapra

Subjects: Computation and Language (cs.CL)

Crowdsourced pairwise evaluation has emerged as a scalable approach for assessing foundation models. However, applying it to Text-to-Speech (TTS) introduces high variance due to linguistic diversity and the multidimensional nature of speech perception. We present a controlled multidimensional pairwise evaluation framework for multilingual TTS that combines linguistic control with perceptually grounded annotation. Using 5K+ native and code-mixed sentences across 10 Indic languages, we evaluate 7 state-of-the-art TTS systems and collect over 120K pairwise comparisons from over 1900 native raters. In addition to overall preference, raters provide judgments across 6 perceptual dimensions: intelligibility, expressiveness, voice quality, liveliness, noise, and hallucinations. Using Bradley-Terry modeling, we construct a multilingual leaderboard, interpret human preference using SHAP analysis, and analyze leaderboard reliability alongside model strengths and trade-offs across perceptual dimensions.
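For readers unfamiliar with Bradley-Terry modeling, the sketch below fits system strengths from a matrix of pairwise win counts using the classical minorization-maximization update. It is a generic illustration under assumed names (`bradley_terry`, `wins`) and toy counts; the paper's actual fitting procedure, tie handling, and confidence estimation are not described in the abstract.

```python
def bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is the number of comparisons in which system i beat system j.
    Returns strengths normalized to sum to 1; P(i beats j) = p[i] / (p[i] + p[j]).
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            # Keep the old value if system i was never compared with anyone.
            updated.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(updated)
        p = [x / norm for x in updated]
    return p


# Two systems, four comparisons: system 0 won three of them.
two = bradley_terry([[0, 3], [1, 0]])

# Three systems compared ten times per pair (hypothetical counts).
three = bradley_terry([[0, 8, 9], [2, 0, 7], [1, 3, 0]])
leaderboard = sorted(range(3), key=lambda i: three[i], reverse=True)
```

Sorting systems by fitted strength gives the leaderboard order; with only two systems the fitted win probability reduces to the observed win rate.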
[334] arXiv:2604.21483 [pdf, html, other]: Title: Risk-Aware and Stable Edge Server Selection Under Network Latency SLOs

Mohan Liyanage, Arnova Abdullah, Eldiyar Zhantileuov, Rolf Schuster

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

We present a lightweight and interpretable decision framework for dynamic edge server selection in latency-critical applications that explicitly accounts for tail risk and switching stability. Each candidate server is characterised by predictive mean and uncertainty summaries of network latency, which are used to estimate the risk of service-level objective (SLO) violations and to guide selection. Risk is evaluated using a tight Normal approximation complemented by a conservative Cantelli bound, while percentile-based scoring coupled with hysteresis stabilizes decisions and suppresses oscillatory switching under short-lived network fluctuations.
Experimental results on a multi-server edge testbed with a strict SLO of $\tau = 0.5$ s show that the proposed approach reduces the deadline-miss rate from 39% to 34% compared to a mean-only baseline, while reducing switching frequency from 46% to 5.5% ($\approx$88% reduction) and maintaining sub-SLO average latency ($\approx$0.45 s). These results demonstrate that explicit risk evaluation combined with stability-preserving control enables practical and robust adaptive server selection in dynamic edge environments.
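The two risk estimates mentioned above have simple closed forms. For a latency with predictive mean $\mu$ and standard deviation $\sigma$, the Normal approximation gives $P(L > \tau) \approx 1 - \Phi((\tau - \mu)/\sigma)$, while Cantelli's inequality gives the distribution-free bound $P(L - \mu \ge k) \le \sigma^2 / (\sigma^2 + k^2)$ for $k = \tau - \mu > 0$. The sketch below computes both and adds a simple hysteresis rule. The function names, the `margin` parameter, and the choice to apply hysteresis to the Normal risk are assumptions for illustration; the paper's percentile-based scoring is not reproduced here.

```python
import math


def normal_risk(mu, sigma, tau):
    """P(latency > tau) under a Normal(mu, sigma^2) approximation."""
    z = (tau - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cantelli_risk(mu, sigma, tau):
    """Distribution-free upper bound on P(latency >= tau) via Cantelli."""
    k = tau - mu
    if k <= 0:
        return 1.0  # the bound is uninformative when the mean already exceeds tau
    return sigma ** 2 / (sigma ** 2 + k ** 2)


def select_server(current, servers, tau, margin=0.05):
    """Pick the lowest-risk server, but only switch if it beats the
    current one by at least `margin` (hysteresis against oscillation)."""
    risks = {name: normal_risk(mu, sigma, tau) for name, (mu, sigma) in servers.items()}
    best = min(risks, key=risks.get)
    if current in risks and risks[current] - risks[best] < margin:
        return current
    return best


tau = 0.5  # SLO in seconds, as in the abstract
r_normal = normal_risk(0.40, 0.10, tau)      # about 0.159
r_cantelli = cantelli_risk(0.40, 0.10, tau)  # 0.5, far more conservative

# Server "b" is only marginally better than "a", so the selection stays on "a".
stay = select_server("a", {"a": (0.40, 0.10), "b": (0.39, 0.10)}, tau)
# Server "c" is much better, so the selection switches.
switch = select_server("a", {"a": (0.40, 0.10), "c": (0.30, 0.05)}, tau)
```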
[335] arXiv:2604.21487 [pdf, html, other]: Title: Monolithically Integrated VO$_2$ Mott Oscillators for Energy-Efficient Spiking Neurons

Fabio Bersano, Cyrille Masserey, Vanessa Conti, Andrea Iaconeta, Niccolo' Martinolli, Ehsan Ansari, Anna Varini, Igor Stolichnov, Adrian Mihai Ionescu

Comments: 24 pages, 7 figures in main text, 8 figures in Supplementary Information

Subjects: Systems and Control (eess.SY); Materials Science (cond-mat.mtrl-sci)

Brain-inspired non-Boolean computing offers intrinsic error tolerance and parallelism, but its practical deployment is limited by the lack of compact, energy-efficient spiking hardware compatible with large-scale integration. Mott phase-transition materials provide a promising route, as their abrupt insulator-to-metal transitions enable neuron-like thresholding and oscillatory dynamics in compact devices. Among these, vanadium dioxide (VO$_2$) stands out for its near-room-temperature transition, fast switching, and scalability. However, existing VO$_2$-based neuristors rely on discrete components, limiting integration density and system applicability. Here, we report monolithic back-end-of-the-line (BEOL) integration of one-transistor-one-VO$_2$-memristor (1T-1MR) spiking neurons on CMOS-compatible platforms. VO$_2$ nanosheet devices are fabricated by pulsed-laser deposition below 430 °C on dielectrically isolated silicon-on-insulator (SOI) p-type junctionless field-effect transistors (JLFETs) in a compact 1T-1MR configuration. The architecture exhibits gate-tunable oscillations from 40 to 410 kHz in 60 nm-thick VO$_2$ devices with an active area of 6 $\mu$m$^2$, achieving energy consumption as low as 18 pJ per spike at room temperature, with memristor power dissipation of 8 $\mu$W and potential scaling toward sub-3 $\mu$W operation. We further uncover a non-monotonic dependence of oscillation frequency on current and temperature, along with bias-dependent stochastic firing dynamics, highlighting the rich behavior of integrated VO$_2$ memristor systems. Finally, we demonstrate voltage-controlled oscillator functionality and actively tunable resistive coupling of two nano-oscillators mediated by a JLFET. These results establish a pathway toward dense, energy-efficient, and monolithically integrated Mott-based neuromorphic hardware compatible with CMOS technology.
[336] arXiv:2604.21489 [pdf, html, other]: Title: MISTY: High-Throughput Motion Planning via Mixer-based Single-step Drifting

Yining Xing, Zehong Ke, Yiqian Tu, Zhiyuan Liu, Wenhao Yu, Jianqiang Wang

Comments: 8 pages, 4 figures, 3 tables. Submitted to IEEE Robotics and Automation Letters (RA-L)

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Multi-modal trajectory generation is essential for safe autonomous driving, yet existing diffusion-based planners suffer from high inference latency due to iterative neural function evaluations. This paper presents MISTY (Mixer-based Inference for Single-step Trajectory-drifting Yield), a high-throughput generative motion planner that achieves state-of-the-art closed-loop performance with pure single-step inference. MISTY integrates a vectorized Sub-Graph encoder to capture environment context, a Variational Autoencoder to structure expert trajectories into a compact 32-dimensional latent manifold, and an ultra-lightweight MLP-Mixer decoder to eliminate quadratic attention complexity. Importantly, we introduce a latent-space drifting loss that shifts the complex distribution evolution entirely to the training phase. By formulating explicit attractive and repulsive forces, this mechanism empowers the model to synthesize novel, proactive maneuvers, such as active overtaking, that are virtually absent from the raw expert demonstrations. Extensive evaluations on the nuPlan benchmark demonstrate that MISTY achieves state-of-the-art results on the challenging Test14-hard split, with comprehensive scores of 80.32 and 82.21 in non-reactive and reactive settings, respectively. Operating at over 99 FPS with an end-to-end latency of 10.1 ms, MISTY offers an order-of-magnitude speedup over iterative diffusion planners while achieving significantly robust generation.
[337] arXiv:2604.21491 [pdf, html, other]: Title: Benchmarking the Utility of Privacy-Preserving Cox Regression Under Data-Driven Clipping Bounds: A Multi-Dataset Simulation Study

Keita Fukuyama, Yukiko Mori, Tomohiro Kuroda, Hiroaki Kikuchi

Comments: 11 pages, 6 figures, 5 tables. Supplementary material (5 pages, 2 figures, 3 tables) included as ancillary file. Submission to IEEE Journal of Biomedical and Health Informatics (J-BHI)

Subjects: Cryptography and Security (cs.CR); Applications (stat.AP); Methodology (stat.ME)

Differential privacy (DP) is a mathematical framework that guarantees individual privacy; however, systematic evaluation of its impact on statistical utility in survival analyses remains limited. In this study, we systematically evaluated the impact of DP mechanisms (Laplace mechanism and Randomized Response) with data-driven clipping bounds on the Cox proportional hazards model, using 5 clinical datasets ($n = 168$--$6{,}524$), 15 levels of $\varepsilon$ (0.1--1000), and $B = 1{,}000$ Monte Carlo iterations. The data-driven clipping bounds used here are observed min/max and therefore do not provide formal $\varepsilon$-DP guarantees; the results represent an optimistic lower bound on utility degradation under formal DP. We compared three types of input perturbations (covariates only, all inputs, and the discrete-time model) with output perturbations (dfbeta-based sensitivity), using loss of significance rate (LSR), C-index, and coefficient bias as metrics. At standard DP levels ($\varepsilon \leq 1$), approximately 90% (90--94%) of the significant covariates lost significance, even in the largest dataset ($n = 6{,}524$), and the predictive performance approached random levels (test C-index $\approx 0.5$) under many conditions. Among the input perturbation approaches, perturbing only covariates preserved the risk-set structure and achieved the best recovery, whereas output perturbation (dfbeta-based sensitivity) maintained near-baseline performance at $\varepsilon \geq 5$. At $n \approx 3{,}000$, the significance recovered rapidly at $\varepsilon = 3$--10; however, in practice, $\varepsilon \geq 10$ (for predictive performance) to $\varepsilon \geq 30$--60 (for significance preservation) is required. In the moderate-to-high $\varepsilon$ range, false-positive rates increased for variables whose baseline $p$-values were near the significance threshold.
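To make the input-perturbation setting concrete, the sketch below clips a covariate to given bounds and adds Laplace noise with scale $(\mathrm{hi} - \mathrm{lo})/\varepsilon$. This is a generic Laplace-mechanism illustration, not the study's code: the function names, the per-value budget, and the toy ages are assumptions. As the abstract itself notes, taking the bounds from the observed min/max means the result carries no formal $\varepsilon$-DP guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_covariate(values, lo, hi, epsilon, rng):
    """Clip each value to [lo, hi], then add Laplace noise scaled to the range.

    Assumption: each value is released with its own budget epsilon and
    sensitivity (hi - lo); how the paper allocates budget is not stated here.
    """
    scale = (hi - lo) / epsilon
    if scale == 0:
        return [float(lo) for _ in values]  # degenerate range: nothing to hide
    return [max(lo, min(hi, v)) + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)
ages = [34.0, 51.0, 67.0, 45.0, 72.0]  # hypothetical covariate
lo, hi = min(ages), max(ages)          # data-driven bounds (not formally DP)

noisy_strict = perturb_covariate(ages, lo, hi, epsilon=1.0, rng=rng)   # scale = 38
noisy_loose = perturb_covariate(ages, lo, hi, epsilon=1e9, rng=rng)    # scale = 3.8e-8
```

With $\varepsilon = 1$ the noise scale equals the full range of the covariate, which gives some intuition for why significance is so often lost at standard privacy levels in the reported results.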
[338] arXiv:2604.21495 [pdf, html, other]: Title: Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

Hanjun Cho, Gahyun Yoo, Hanseong Kim, Jay-Yoon Lee

Comments: Accepted to TACL. This is a pre-MIT Press publication version

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Numerical reasoning over expert-domain tables often exhibits high in-domain accuracy but limited robustness to domain shift. Models trained with supervised fine-tuning (SFT) on specific datasets tend to rely on header-operation shortcuts rather than structural reasoning. We introduce TaNOS, a continual pre-training framework comprising three components: (i) header anonymization to reduce lexical memorization, (ii) operation sketches that provide minimal structural cues, and (iii) self-supervised pretraining that constructs correctness-guaranteed program-question pairs from given tables in a program-first manner. By decoupling domain semantics and numerical operation structure, TaNOS improves the transferability of numerical reasoning. Applied to an 8B instruction-tuned model, TaNOS achieves 80.13% execution accuracy on FinQA with only 10% of the training data, outperforming the SFT baseline (73.97%) trained on the full training data as well as proprietary models such as GPT-5 and Gemini-2.5-Pro. Furthermore, in the domain-shift experiments, TaNOS displays a nearly negligible cross-domain gap (<2pp), whereas standard SFT shows a gap of over 10pp. These results suggest that structural guidance with operation sketches, header-agnostic representations, and correctness-guaranteed self-supervision can improve the robustness of numerical reasoning across diverse expert-domain tables.
[339] arXiv:2604.21496 [pdf, html, other]: Title: How English Print Media Frames Human-Elephant Conflicts in India

Bonala Sai Punith, Salveru Jayati, Garima Shakya, Shubham Kumar Nigam

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Human-elephant conflict (HEC) is rising across India as habitat loss and expanding human settlements force elephants into closer contact with people. While the ecological drivers of conflict are well-studied, how the news media portrays them remains largely unexplored. This work presents the first large-scale computational analysis of media framing of HEC in India, examining 1,968 full-length news articles consisting of 28,986 sentences, from a major English-language outlet published between January 2022 and September 2025. Using a multi-model sentiment framework that combines long-context transformers, large language models, and a domain-specific Negative Elephant Portrayal Lexicon, we quantify sentiment, extract rationale sentences, and identify linguistic patterns that contribute to negative portrayals of elephants. Our findings reveal a dominance of fear-inducing and aggression-related language. Since the media framing can shape public attitudes toward wildlife and conservation policy, such narratives risk reinforcing public hostility and undermining coexistence efforts. By providing a transparent, scalable methodology and releasing all resources through an anonymized repository, this study highlights how Web-scale text analysis can support responsible wildlife reporting and promote socially beneficial media practices.
[340] arXiv:2604.21501 [pdf, html, other]: Title: GeoMind: An Agentic Workflow for Lithology Classification with Reasoned Tool Invocation

Yitong Zhou, Mingyue Cheng, Jiahao Wang, Qingyang Mao, Qi Liu

Subjects: Artificial Intelligence (cs.AI)

Lithology classification in well logs is a fundamental geoscience data mining task that aims to infer rock types from multi-dimensional geophysical sequences. Despite recent progress, existing approaches typically formulate the problem as a static, single-step discriminative mapping. This static paradigm limits evidence-based diagnostic reasoning against geological standards, often yielding predictions that are detached from geological reality due to a lack of domain priors. In this work, we propose GeoMind, a tool-augmented agentic framework that models lithology classification as a sequential reasoning process. GeoMind organizes its toolkit into perception, reasoning, and analysis modules, which respectively translate raw logs into semantic trends, infer lithology hypotheses from multi-source evidence, and verify predictions against stratigraphic constraints. A global planner adaptively coordinates these modules based on input characteristics, enabling geologically plausible and evidence-grounded decisions. To guarantee the logical consistency of GeoMind, we introduce a fine-grained process supervision strategy. Unlike standard methods that focus solely on final outcomes, our approach optimizes intermediate reasoning steps, ensuring the validity of decision trajectories and alignment to geological constraints. Experiments on four benchmark well-log datasets demonstrate that GeoMind consistently outperforms strong baselines in classification performance while providing transparent and traceable decision-making processes.
[341] arXiv:2604.21502 [pdf, html, other]: Title: VFM$^{4}$SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection

Yupeng Zhang, Ruize Han, Ningnan Guo, Wei Feng, Song Wang, Liang Wan

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In real-world scenarios, continual changes in weather, illumination, and imaging conditions cause significant domain shifts, leading detectors trained on a single source domain to degrade severely in unseen environments. Existing single-domain generalized object detection (SDGOD) methods mainly rely on data augmentation or domain-invariant representation learning, but pay limited attention to detector mechanisms, leaving clear limitations under complex domain shifts. Through analytical experiments, we find that performance degradation is dominated by increasing missed detections, which fundamentally arises from reduced cross-domain stability of the detector: object-background and inter-instance relations become less stable in the encoding stage, while semantic-spatial alignment of query representations also becomes harder to maintain in the decoding stage. To this end, we propose VFM$^{4}$SDG, a dual-prior learning framework for SDGOD, which introduces a frozen vision foundation model (VFM) as a transferable cross-domain stability prior into detector representation learning and query modeling. In the encoding stage, we propose Cross-domain Stable Relational Prior Distillation to enhance the robustness of object-background and inter-instance relational modeling. In the decoding stage, we propose Semantic-Contextual Prior-based Query Enhancement, which injects category-level semantic prototypes and global visual context into queries to improve their semantic recognition and spatial localization stability in unseen domains. Extensive experiments show that the proposed method consistently outperforms existing SOTA methods on standard SDGOD benchmarks and two mainstream DETR-based detectors, demonstrating its effectiveness, robustness, and generality.
[342] arXiv:2604.21504 [pdf, html, other]: Title: Efficient generation of expected-degree graphs via edge-arrivals

Gianlorenzo D'Angelo, Riccardo Michielan

Comments: 18 pages, 2 figures, submitted to 34th Annual European Symposium on Algorithms (ESA 2026)

Subjects: Data Structures and Algorithms (cs.DS); Mathematical Software (cs.MS); Probability (math.PR)

We study the efficient generation of random graphs with a prescribed expected degree sequence, focusing on rank-1 inhomogeneous models in which vertices are assigned weights and edges are drawn independently with probabilities proportional to the product of endpoint weights. We adopt a temporal viewpoint, adding edges to the graph one at a time up to a fixed time horizon, and allowing for self-loops or duplicate edges in the first stage. Then, the simple projection of the resulting multigraph recovers exactly the simple Norros--Reittu random graph, whose expected degrees match the prescribed targets under mild conditions. Building on this representation, we develop an exact generator based on \textit{edge-arrivals} for expected-degree random graphs with running time $O(n+m)$, where $m$ is the number of generated edges, and hence proportional to the output size. This removes the typical vertex sorting used by widely-used fast generator algorithms based on \textit{edge-skipping} for rank-1 expected-degree models, which leads to a total running time of $O(n \log n + m)$. In addition, our algorithm is simpler than those in the literature, easy to implement, and very flexible, thus opening up to extensions to directed and temporal random graphs, generalization to higher-order structures, and improvements through parallelization.
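One way to read the edge-arrival construction is as a unit-rate Poisson process run up to a horizon of half the total weight, where each arrival picks both endpoints independently with probability proportional to vertex weight. Under that reading the number of arrivals on a pair $\{i, j\}$ is Poisson with mean $w_i w_j / \sum_k w_k$, which matches the Norros-Reittu multigraph, and dropping self-loops and duplicates gives the simple projection. The sketch below follows this reading; it is an assumption-laden illustration, not the authors' algorithm. In particular, `random.choices` with cumulative weights uses binary search, so this version costs $O(n + m \log n)$; a constant-time sampler such as an alias table would be needed for the stated $O(n + m)$.

```python
import random
from itertools import accumulate


def expected_degree_graph(weights, rng):
    """Simple graph whose expected degrees approximate the given weights.

    Edges arrive one at a time as a unit-rate Poisson process until time
    sum(weights) / 2; self-loops and repeated edges are discarded.
    """
    n = len(weights)
    horizon = sum(weights) / 2.0
    cum = list(accumulate(weights))
    vertices = list(range(n))
    edges = set()
    t = rng.expovariate(1.0)  # time of the first arrival
    while t < horizon:
        u, v = rng.choices(vertices, cum_weights=cum, k=2)
        if u != v:  # drop self-loops
            edges.add((min(u, v), max(u, v)))  # the set merges duplicates
        t += rng.expovariate(1.0)
    return edges


rng = random.Random(0)
graph = expected_degree_graph([2.0] * 50, rng)       # target expected degree about 2
empty = expected_degree_graph([0.0, 0.0], rng)       # zero weights: no arrivals
```

No vertex sorting is needed at any point, which is the contrast the abstract draws with edge-skipping generators.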
[343] arXiv:2604.21505 [pdf, html, other]: Title: Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation

Di Yang, Xinou Xie, Xiuwen Yang, Ming Hu, Yihao Huang, Yueling Zhang, Weikai Miao, Ting Su, Chengcheng Wan, Geguang Pu

Subjects: Software Engineering (cs.SE)

Software requirement ambiguity is ubiquitous in real-world development, stemming from the inherent imprecision of natural language and the varying interpretations of stakeholders. While Large Language Models (LLMs) have demonstrated impressive capabilities in generating code from precise specifications, such ambiguity poses a significant obstacle to reliable automated code generation. Existing benchmarks typically assume clear and unambiguous requirements, leaving an empirical gap in understanding how LLMs behave when faced with the inherent uncertainty of real-world software requirements. In this paper, we introduce Orchid, the first code generation benchmark specifically designed with ambiguous requirements. It comprises 1,304 function-level tasks covering four distinct types of ambiguity: lexical, syntactic, semantic, and vagueness. Leveraging this dataset, we conduct the first systematic empirical study to evaluate the impact of requirement ambiguity on LLM-based code generation. Our results demonstrate that ambiguity consistently degrades the performance of all evaluated LLMs, with the most pronounced negative effects observed in highly advanced models. Furthermore, we observe that LLMs frequently produce functionally divergent implementations for the same ambiguous requirement and lack the capability to identify or resolve such ambiguity autonomously. These findings reveal a significant performance gap between clear and ambiguous requirements, underscoring the urgent need for ambiguity-aware techniques in the next generation of automated software engineering tools. The Orchid benchmark is publicly available at this https URL.
[344] arXiv:2604.21508 [pdf, html, other]: Title: BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature

Jiaxian Yan, Jintao Zhu, Yuhang Yang, Qi Liu, Kai Zhang, Zaixi Zhang, Xukai Liu, Boyan Zhang, Kaiyuan Gao, Jinchuan Xiao, Enhong Chen

Comments: 20 pages, 5 figures, 1 table

Subjects: Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)

Protein-ligand bioactivity data published in the literature are essential for drug discovery, yet manual curation struggles to keep pace with the rapidly growing literature. Automated bioactivity extraction remains challenging because it requires not only interpreting biochemical semantics distributed across text, tables, and figures, but also reconstructing chemically exact ligand structures (e.g., Markush structures). To address this bottleneck, we introduce BioMiner, a multi-modal extraction framework that explicitly separates bioactivity semantic interpretation from ligand structure construction. Within BioMiner, bioactivity semantics are inferred through direct reasoning, while chemical structures are resolved via a chemical-structure-grounded visual semantic reasoning paradigm, in which multi-modal large language models operate on chemically grounded visual representations to infer inter-structure relationships, and exact molecular construction is delegated to domain chemistry tools. For rigorous evaluation and method development, we further establish BioVista, a comprehensive benchmark comprising 16,457 bioactivity entries curated from 500 publications. BioMiner validates its extraction ability and provides a quantitative baseline, achieving an F1 score of 0.32 for bioactivity triplets. BioMiner's practical utility is demonstrated via three applications: (1) extracting 82,262 data points from 11,683 papers to build a pre-training database that improves downstream model performance by 3.9%; (2) enabling a human-in-the-loop workflow that doubles the number of high-quality NLRP3 bioactivity data points, contributing to a 38.6% improvement over 28 QSAR models and the identification of 16 hit candidates with novel scaffolds; and (3) accelerating protein-ligand complex bioactivity annotation, achieving a 5.59-fold speed increase and a 5.75% accuracy improvement over manual workflows on the PoseBusters dataset.
[345] arXiv:2604.21510 [pdf, html, other]: Title: OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

Xinyu Zhang, Boxuan Zhang, Yuchen Wan, Lingling Zhang, YiXing Yao, Bifan Wei, Yaqiang Wu, Jun Liu

Subjects: Computation and Language (cs.CL)

While Large Language Models (LLMs) demonstrate remarkable reasoning, complex optimization tasks remain challenging, requiring domain knowledge and robust implementation. However, existing benchmarks focus narrowly on Mathematical Programming and Combinatorial Optimization, hindering comprehensive evaluation. To address this, we introduce OptiVerse, a comprehensive benchmark of 1,000 curated problems spanning neglected domains, including Stochastic Optimization, Dynamic Optimization, Game Optimization, and Optimal Control, across three difficulty levels: Easy, Medium, and Hard. The experiments with 22 LLMs of different sizes reveal sharp performance degradation on hard problems, where even advanced models like GPT-5.2 and Gemini-3 struggle to exceed 27% accuracy. Through error analysis, we identify that modeling & logic errors remain the primary bottleneck. Consequently, we propose a Dual-View Auditor Agent that improves the accuracy of the LLM modeling process without introducing significant time overhead. OptiVerse will serve as a foundational platform for advancing LLMs in solving complex optimization challenges.
[346] arXiv:2604.21511 [pdf, html, other]: Title: From Tokens to Concepts: Leveraging SAE for SPLADE

Yuxuan Zong, Mathias Vast, Basile Van Cooten, Laure Soulier, Benjamin Piwowarski

Comments: 11 pages, 3 figures, 9 tables. To appear at SIGIR 2025

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Learned Sparse IR models, such as SPLADE, offer an excellent efficiency-effectiveness tradeoff. However, they rely on the underlying backbone vocabulary, which might hinder performance (polysemy and synonymy) and pose a challenge for multi-lingual and multi-modal uses. To address this limitation, we propose to replace the backbone vocabulary with a latent space of semantic concepts learned using Sparse Auto-Encoders (SAE). Throughout this paper, we study the compatibility of these two concepts, explore training approaches, and analyze the differences between our SAE-SPLADE model and traditional SPLADE models. Our experiments demonstrate that SAE-SPLADE achieves retrieval performance comparable to SPLADE on both in-domain and out-of-domain tasks while offering improved efficiency.
[347] arXiv:2604.21515 [pdf, html, other]: Title: Satisfying Rationality Postulates of Structured Argumentation Through Deductive Support -- Technical Report

Marcos Cramer, Tom Friese

Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

ASPIC-style structured argumentation frameworks provide a formal basis for reasoning in artificial intelligence by combining internal argument structure with abstract argumentation semantics. A key challenge in these frameworks is ensuring compliance with five critical rationality postulates: closure, direct consistency, indirect consistency, non-interference, and crash-resistance. Recent approaches, including ASPIC$^{\ominus}$ and Deductive ASPIC$-$, have made significant progress but fall short of meeting all postulates simultaneously under a credulous semantics (e.g. preferred) in the presence of undercuts. This paper introduces Deductive ASPIC$^{\ominus}$, a novel framework that integrates gen-rebuttals from ASPIC$^{\ominus}$ with the Joint Support Bipolar Argumentation Frameworks (JSBAFs) of Deductive ASPIC$-$, incorporating preferences. We show that Deductive ASPIC$^{\ominus}$ satisfies all five rationality postulates under a version of preferred semantics. This work opens new avenues for further research on robust and logically sound structured argumentation systems.
[348] arXiv:2604.21517 [pdf, html, other]: Title: Systematizing Blockchain Research Themes and Design Patterns: Insights from the University Blockchain Research Initiative (UBRI)

Chien-Chih Chen, Yitian Wang, Emma Nasseri, Yebo Feng, Lauren Weymouth

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The rapid expansion of blockchain and digital asset ecosystems has intensified the challenge of translating academic research into deployable systems and regulatory frameworks. While advances in cryptography, consensus, digital assets, and governance are substantial, institutional mechanisms that sustain research-to-deployment translation at ecosystem scale remain comparatively under-theorized. This paper examines the architectural and coordination patterns that enable such translation, using the University Blockchain Research Initiative (UBRI) network as a representative case of long-term academic and industry collaboration. Drawing on research outputs and convenings from 2022 to 2025, we synthesize recurring design tensions across technical and institutional domains, including scalability versus security, decentralization versus governance, and privacy versus compliance. Rather than cataloging individual projects, we abstract system-level themes that connect research contributions to deployment constraints and policy adaptation, providing a structured lens for understanding how academic research informs production architectures, regulatory development, and ecosystem resilience in emerging decentralized infrastructures.
[349] arXiv:2604.21519 [pdf, html, other]: Title: Gmd: Gaussian mixture descriptor for pair matching of 3D fragments

Meijun Xiong, Zhenguo Shi, Xinyu Zhou, Yuhe Zhang, Shunli Zhang

Comments: 24 pages, 10 figures. Published in Multimedia Systems

Journal-ref: Multimedia Systems 30, 326 (2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the automatic reassembly of fragments acquired using laser scanners to reconstruct objects, a crucial step is the matching of fractured surfaces. In this paper, we propose a novel local descriptor that uses the Gaussian Mixture Model (GMM) to fit the distribution of points, allowing for the description and matching of fractured surfaces of fragments. Our method involves dividing a local surface patch into concave and convex regions for estimating the k value of GMM. Then the final Gaussian Mixture Descriptor (GMD) of the fractured surface is formed by merging the regional GMDs. To measure the similarities between GMDs for determining adjacent fragments, we employ the L2 distance and align the fragments using Random Sample Consensus (RANSAC) and Iterative Closest Point (ICP). The extensive experiments on real-scanned public datasets and Terracotta datasets demonstrate the effectiveness of our approach; furthermore, the comparisons with several existing methods also validate the advantage of the proposed method.
[350] arXiv:2604.21523 [pdf, html, other]: Title: Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models

Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Large Vision-Language Models (VLMs) are increasingly used to evaluate outputs of other models, both for image-to-text (I2T) tasks such as visual question answering and for text-to-image (T2I) generation tasks. Despite this growing reliance, the reliability of these Evaluator VLMs remains underexplored. In this work, we systematically evaluate the reliability of Evaluator VLMs across both I2T and T2I tasks. We introduce targeted perturbations that degrade output quality along key error dimensions, including object hallucinations, spatial reasoning, factual grounding, and visual fidelity. These perturbations test whether Evaluator VLMs can reliably account for these quality-degrading errors in their evaluations. Using a comprehensive benchmark of over 4000 perturbed instances spanning 40 perturbation dimensions, we evaluate 4 prominent VLMs using single-answer scoring, pairwise comparison, and reference-guided paradigms. Our findings reveal that current VLM evaluators exhibit substantial blind spots: they often fail to detect perturbed outputs (with failure rates exceeding 50% in some cases), struggle particularly with fine-grained compositional and spatial errors, and are often insensitive to hallucinated content that contradicts the input image. Pairwise comparison proves more reliable, though failure rates persist. These results highlight the unreliable nature of current Evaluator VLMs and urge caution in their deployment for benchmarking and development decisions. Code and data have been made publicly available.
[351] arXiv:2604.21525 [pdf, html, other]: Title: Job Skill Extraction via LLM-Centric Multi-Module Framework

Guojing Li (1 and 2), Zichuan Fu (1), Junyi Li (1), Faxue Liu (1), Wenxia Zhou (2), Yejing Wang (1), Jingtong Gao (1), Maolin Wang (1), Rungen Liu (1), Wenlin Zhang (1), Xiangyu Zhao (1) ((1) City University of Hong Kong, (2) Renmin University of China)

Comments: 5 pages, 5 figures, 3 tables

Subjects: Computation and Language (cs.CL)

Span-level skill extraction from job advertisements underpins candidate-job matching and labor-market analytics, yet generative large language models (LLMs) often yield malformed spans, boundary drift, and hallucinations, especially with long-tail terms and cross-domain shift. We present SRICL, an LLM-centric framework that combines semantic retrieval (SR), in-context learning (ICL), and supervised fine-tuning (SFT) with a deterministic verifier. SR pulls in-domain annotated sentences and definitions from ESCO to form format-constrained prompts that stabilize boundaries and handle coordination. SFT aligns output behavior, while the verifier enforces pairing, non-overlap, and BIO legality with minimal retries. On six public span-labeled corpora of job-ad sentences across sectors and languages, SRICL achieves substantial STRICT-F1 improvements over GPT-3.5 prompting baselines and sharply reduces invalid tags and hallucinated spans, enabling dependable sentence-level deployment in low-resource, multi-domain settings.
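As a rough illustration of what a deterministic BIO-legality check can look like, the following Python sketch validates a tag sequence. The function name `is_legal_bio` and the label `SKILL` are hypothetical, and the rule encoded here is the standard BIO convention, not something the abstract specifies about SRICL's verifier.

```python
def is_legal_bio(tags):
    """Return True if `tags` is a well-formed BIO sequence.

    Assumed rule (standard BIO convention, not taken from the paper):
    an I-X tag must directly follow a B-X or I-X tag of the same type X.
    """
    prev = "O"
    for tag in tags:
        if tag == "O":
            pass  # outside any span: always allowed
        elif tag.startswith("B-") and len(tag) > 2:
            pass  # a new span may start anywhere
        elif tag.startswith("I-") and len(tag) > 2:
            # An inside tag needs an open span of the same type.
            if prev not in ("B-" + tag[2:], "I-" + tag[2:]):
                return False
        else:
            return False  # unknown tag format
        prev = tag
    return True
```

A pipeline could call such a check on each generated sentence and retry only when it returns `False`, which is one plausible reading of "minimal retries".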
[352] arXiv:2604.21527 [pdf, html, other]: Title: A temporal deep learning framework for calibration of low-cost air quality sensors

Arindam Sengupta, Tony Bush, Ben Marner, Jose Miguel Pérez, Soledad Le Clainche

Subjects: Machine Learning (cs.LG)

Low-cost air quality sensors (LCS) provide a practical alternative to expensive regulatory-grade instruments, making dense urban monitoring networks possible. Yet their adoption is limited by calibration challenges, including sensor drift, environmental cross-sensitivity, and variability in performance from device to device. This work presents a deep learning framework for calibrating LCS measurements of PM$_{2.5}$, PM$_{10}$, and NO$_2$ using a Long Short-Term Memory (LSTM) network, trained on co-located reference data from the OxAria network in Oxford, UK. Unlike the Random Forest (RF) baseline, which treats each observation independently, the proposed approach captures temporal dependencies and delayed environmental effects through sequence-based learning, achieving higher $R^2$ values across training, validation, and test sets for all three pollutants. A feature set is constructed combining time-lagged parameters, harmonic encodings, and interaction terms to improve generalization on unseen temporal windows. Validation of unseen calibrated values against the Equivalence Spreadsheet Tool 3.1 demonstrates regulatory compliance with expanded uncertainties of 22.11% for NO$_2$, 12.42% for PM$_{10}$, and 9.1% for PM$_{2.5}$.
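To make the three feature families mentioned above concrete, here is a minimal Python sketch that builds time-lagged values, a harmonic (sine/cosine) encoding of the hour of day, and one interaction term. The function name `build_features`, the choice of lags, and the use of temperature as the interacting variable are assumptions for illustration; the abstract does not list the actual inputs.

```python
import math

def build_features(raw, temp, hours, lags=(1, 2)):
    """Build one feature row per time step that has all requested lags.

    raw   : raw low-cost sensor readings (hypothetical input)
    temp  : temperature readings aligned with `raw` (hypothetical input)
    hours : hour of day (0-23) for each reading
    """
    rows = []
    start = max(lags)
    for t in range(start, len(raw)):
        angle = 2 * math.pi * hours[t] / 24.0
        row = [raw[t]]
        row += [raw[t - k] for k in lags]          # time-lagged readings
        row += [math.sin(angle), math.cos(angle)]  # harmonic encoding of the hour
        row.append(raw[t] * temp[t])               # interaction term
        rows.append(row)
    return rows
```

The sine/cosine pair keeps 23:00 and 00:00 close together in feature space, which a plain integer hour would not.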
[353] arXiv:2604.21529 [pdf, html, other]: Title: Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints

Emilie Frost, Astrid Nieße

Comments: This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution will be published in Agents and Artificial Intelligence, Lecture Notes in Computer Science, and available online at this https URL

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

Applying the concept of controlled self-organization in agent-based Cyber-Physical Energy Systems (CPES) is a promising approach to ensure system robustness. By introducing an observer/controller architecture to the system, this concept allows for self-organization while still enabling intervention when disturbances occur. Thus, it is possible to respond to effects of cyber attacks, a major threat to current energy systems. However, when implementing an observer to monitor the system and a controller to execute actions for controlled self-organization in CPES, it is essential to take into account restrictions on information and actions resulting from the privacy of local distributed energy resources, regulatory constraints, and data exchange requirements. For this reason, this paper presents architecture variants for the observer and controller that take into account restrictions on access to information and limited actions. In addition, it evaluates possible controller actions in various architectures. The results underscore the importance of considering observer/controller architectures when designing agent-based systems to ensure their robustness for real-world applications.
[354] arXiv:2604.21530 [pdf, other]: Title: Attention-based multiple instance learning for predominant growth pattern prediction in lung adenocarcinoma wsi using foundation models

Laura Valeria Perez-Herrera, M.J. Garcia-Gonzalez, Karen Lopez-Linares

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Lung adenocarcinoma (LUAD) grading depends on accurately identifying growth patterns, which are indicators of prognosis and can influence treatment decisions. Common deep learning approaches to determine the predominant pattern rely on patch-level classification or segmentation, requiring extensive annotations. This study proposes an attention-based multiple instance learning (ABMIL) framework to predict the predominant LUAD growth pattern at the whole slide level to reduce annotation burden. Our approach integrates pretrained pathology foundation models as patch encoders, used either frozen or fine-tuned on annotated patches, to extract discriminative features that are aggregated through attention mechanisms. Experiments show that fine-tuned encoders improve performance, with Prov-GigaPath achieving the highest agreement ($\kappa$ = 0.699) under ABMIL. Compared to simple patch-aggregation baselines, ABMIL yields more robust predictions by leveraging slide-level supervision and spatial attention. Future work will extend this framework to estimate the full distribution of growth patterns and validate performance on external cohorts.
[355] arXiv:2604.21531 [pdf, html, other]: Title: Kernelization Bounds for Constrained Coloring

Ishay Haviv

Comments: 32 pages

Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)

We study the kernel complexity of constraint satisfaction problems over a finite domain, parameterized by the number of variables, whose constraint language consists of two relations: the non-equality relation and an additional permutation-invariant relation $R$. We establish a conditional lower bound on the kernel size in terms of the largest arity of an OR relation definable from $R$. Building on this, we investigate the kernel complexity of uniformly rainbow free coloring problems. In these problems, for fixed positive integers $d$, $\ell$, and $q \geq d$, we are given a graph $G$ on $n$ vertices and a collection $\cal F$ of $\ell$-tuples of $d$-subsets of its vertex set, and the goal is to decide whether there exists a proper coloring of $G$ with $q$ colors such that no $\ell$-tuple in $\cal F$ is uniformly rainbow, that is, no tuple has all its sets colored with the same $d$ distinct colors. We determine, for all admissible values of $d$, $\ell$, and $q$, the infimum over all values $\eta$ for which the problem admits a kernel of size $O(n^\eta)$, under the assumption $\mathsf{NP} \nsubseteq \mathsf{coNP/poly}$. As applications, we obtain nearly tight bounds on the kernel complexity of various coloring problems under diverse settings and parameterizations. This includes graph coloring problems parameterized by the vertex-deletion distance to a disjoint union of cliques, resolving a question of Schalken (2020), as well as uniform hypergraph coloring problems parameterized by the number of vertices, extending results of Jansen and Pieterse (2019) and Beukers (2021).
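The problem definition above can be read as a simple verification procedure. The Python sketch below checks whether a given coloring is a valid solution; the function names and the input encoding (edge list, tuples of vertex tuples, a dict from vertex to color) are illustrative assumptions, not the paper's notation.

```python
def is_uniformly_rainbow(tup, coloring, d):
    """True if every d-subset in `tup` receives the same set of d distinct colors."""
    color_sets = [frozenset(coloring[v] for v in subset) for subset in tup]
    first = color_sets[0]
    # d distinct colors on the first subset, and every other subset matches it.
    return len(first) == d and all(cs == first for cs in color_sets)

def is_valid_coloring(edges, family, coloring, q, d):
    """Check a candidate solution of the uniformly rainbow free coloring problem.

    edges    : list of (u, v) pairs of the graph G
    family   : list of l-tuples, each entry a d-subset of vertices
    coloring : dict mapping each vertex to a color in range(q)
    """
    if any(not (0 <= c < q) for c in coloring.values()):
        return False  # uses a color outside the palette
    if any(coloring[u] == coloring[v] for u, v in edges):
        return False  # not a proper coloring
    return not any(is_uniformly_rainbow(t, coloring, d) for t in family)
```

For example, with `d = 2` and the single tuple `((0, 1), (2, 3))`, a coloring that gives both pairs the color set `{0, 1}` is rejected, while one that gives them `{0, 1}` and `{1, 2}` is accepted.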
[356] arXiv:2604.21532 [pdf, other]: Title: Using Assembly Language for Creating Games

Haris Turkmanović, David Vukoje, Aleksandra Lekić, Milan Prokin

Journal-ref: IcETRAN-2018, Palić, Serbia, 2018

Subjects: Systems and Control (eess.SY)

The aim of this paper is to demonstrate some interesting and useful approaches to writing programs in assembly language. To demonstrate the possibilities of assembly language, a project called "Arkanoid" was created. The project is written in assembly language and presents a few interesting algorithms. The game is written in x86 assembly language, which produces object code for the x86 class of processors. Visual Studio 2015 was chosen as the working environment because it provides useful tools for debugging and testing the created software (the game). Executing the program runs an "Arkanoid" game in the Windows OS console.
[357] arXiv:2604.21534 [pdf, html, other]: Title: UKP_Psycontrol at SemEval-2026 Task 2: Modeling Valence and Arousal Dynamics from Text

Darya Hryhoryeva, Amaia Zurinaga, Hamidreza Jamalabadi, Iryna Gurevych

Comments: Accepted to SemEval 2026 (co-located with ACL 2026)

Subjects: Computation and Language (cs.CL)

This paper presents our system developed for SemEval-2026 Task 2. The task requires modeling both current affect and short-term affective change in chronologically ordered user-generated texts. We explore three complementary approaches: (1) LLM prompting under user-aware and user-agnostic settings, (2) a pairwise Maximum Entropy (MaxEnt) model with Ising-style interactions for structured transition modeling, and (3) a lightweight neural regression model incorporating recent affective trajectories and trainable user embeddings. Our findings indicate that LLMs effectively capture static affective signals from text, whereas short-term affective variation in this dataset is more strongly explained by recent numeric state trajectories than by textual semantics. Our system ranked first among participating teams in both Subtask 1 and Subtask 2A based on the official evaluation metric.
[358] arXiv:2604.21536 [pdf, html, other]: Title: Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

Nikita Severin, Danil Kartushov, Vladislav Urzhumov, Vladislav Kulikov, Oksana Konovalova, Alexey Grishanov, Anton Klenitskiy, Artem Fatkulin, Alexey Vasilev, Andrey Savchenko, Ilya Makarov

Comments: Accepted to ECIR 2026. 7 pages. This version of the contribution has been accepted for publication, after peer review but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: this http URL

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Sequential recommender systems have achieved significant success in modeling temporal user behavior but remain limited in capturing rich user semantics beyond interaction patterns. Large Language Models (LLMs) present opportunities to enhance user understanding with their reasoning capabilities, yet existing integration approaches incur prohibitive inference costs in real time. To address these limitations, we present a novel knowledge distillation method that distills textual user profiles generated by pre-trained LLMs into sequential recommenders without requiring LLM inference at serving time. The resulting approach maintains the inference efficiency of traditional sequential models while requiring neither architectural modifications nor LLM fine-tuning.
[359] arXiv:2604.21537 [pdf, other]: Title: The CriticalSet problem: Identifying Critical Contributors in Bipartite Dependency Networks

Sebastiano A. Piccolo, Andrea Tagarelli

Subjects: Artificial Intelligence (cs.AI); Statistical Mechanics (cond-mat.stat-mech); Computer Science and Game Theory (cs.GT); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an)

Identifying critical nodes in complex networks is a fundamental task in graph mining. Yet, methods addressing all-or-nothing coverage mechanics in a bipartite dependency network, a graph with two types of nodes where edges represent dependency relationships across the two groups only, remain largely unexplored. We formalize the CriticalSet problem: given an arbitrary bipartite graph modeling dependencies of items on contributors, identify the set of k contributors whose removal isolates the largest number of items. We prove that this problem is NP-hard and requires maximizing a supermodular set function, for which standard forward greedy algorithms provide no approximation guarantees. Consequently, we model CriticalSet as a coalitional game, deriving a closed-form centrality, ShapleyCov, based on the Shapley value. This measure can be interpreted as the expected number of items isolated by a contributor's departure. Leveraging these insights, we propose MinCov, a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy, prioritizing contributors who uniquely support many items. Extensive experiments on synthetic and large-scale real datasets, including a Wikipedia graph with over 250 million edges, reveal that MinCov and ShapleyCov significantly outperform traditional baselines. Notably, MinCov achieves near-optimal performance, within 0.02 AUC of a Stochastic Hill Climbing metaheuristic, while remaining several orders of magnitude faster.
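The objective described above can be sketched in a few lines of Python. The code assumes that an item counts as isolated once all of its contributors have been removed (one reading of "all-or-nothing coverage"), and all function names are hypothetical. The last function computes the Shapley value of the game whose value is that isolation count; since each item then acts as a unanimity game over its contributors, it gives each of them a share of 1/degree. Whether this coincides with the paper's ShapleyCov is not stated in the abstract.

```python
from itertools import combinations

def isolated_items(item_deps, removed):
    """Count items whose contributors have all been removed."""
    removed = set(removed)
    return sum(1 for deps in item_deps.values() if deps and set(deps) <= removed)

def brute_force_critical_set(item_deps, k):
    """Exhaustive search over all k-subsets; only feasible for tiny inputs."""
    contributors = sorted({c for deps in item_deps.values() for c in deps})
    return max(combinations(contributors, k),
               key=lambda s: isolated_items(item_deps, s))

def shapley_of_coverage_game(item_deps):
    """Shapley value of v(S) = isolated_items(item_deps, S).

    Each item with m contributors gives 1/m to each of them.
    """
    phi = {}
    for deps in item_deps.values():
        members = set(deps)
        for c in members:
            phi[c] = phi.get(c, 0.0) + 1.0 / len(members)
    return phi
```

On a toy instance where items `a`, `b`, `c` depend on `{x}`, `{x, y}`, `{y}` and item `d` depends on `{z, w}`, removing `x` and `y` isolates three items, and the Shapley scores sum to the total number of items.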
[360] arXiv:2604.21541 [pdf, html, other]: Title: X2-N: A Transformable Wheel-legged Humanoid Robot with Dual-mode Locomotion and Manipulation

Yan Ning, Xingzhou Chen, Delong Li, Hao Zhang, Hanfu Gai, Tongyuan Li, Cheng Zhang, Zhihui Peng, Ling Shi

Subjects: Robotics (cs.RO)

Wheel-legged robots combine the efficiency of wheeled locomotion with the versatility of legged systems, enabling rapid traversal over both continuous and discrete terrains. However, conventional designs typically employ fixed wheels as feet and limited degrees of freedom (DoFs) at the hips, resulting in reduced stability and mobility during legged locomotion compared to humanoids with flat feet. In addition, most existing platforms lack a full upper body with arms, which limits their ability to perform dexterous manipulation tasks.
In this letter, we present X2-N, a high-DoF transformable robot with dual-mode locomotion and manipulation. X2-N can operate in both humanoid and wheel-legged forms and transform seamlessly between them through joint reconfiguration. We further propose a reinforcement learning (RL)-based whole-body control framework tailored to this morphology, enabling unified control across hybrid locomotion, transformation, and manipulation. We validate X2-N in a range of challenging locomotion and manipulation tasks, including dynamic skating-like motion, stair climbing and package delivery. Results demonstrate high locomotion efficiency, strong terrain adaptability, and stable loco-manipulation performance of X2-N, highlighting its potential for real-world deployment.
[361] arXiv:2604.21542 [pdf, html, other]: Title: A Characterization of Integral Input-to-state Stability for Hybrid Systems with Memory

Wenbang Wang, Neng Li, Wei Ren

Comments: 8 pages, 1 figure. Submitted to the Chinese Control Conference (CCC)

Subjects: Systems and Control (eess.SY)

This paper addresses characterizations of Integral Input-to-State Stability (iISS) for hybrid systems with memory. Based on the Krasovskii approach, a novel Lyapunov characterization of iISS is established to extend the hybrid system theory to the time-delay case. In particular, we introduce the notions of dissipativity, detectability and storage functional to describe the iISS property from different perspectives. Under mild regularity and convexity assumptions, the equivalence relations among diverse stability descriptions are established, which lays a solid foundation for the control design. Finally, a numerical example is presented to illustrate the derived results.
[362] arXiv:2604.21544 [pdf, html, other]: Title: Design of MDP Convolutional Codes and Maximally Recoverable Codes Through the Lens of Matrix Completion

Sakshi Dang, Julia Lieb, Pedro Soto, Alex Sprintson

Subjects: Information Theory (cs.IT)

The matrix completion problem provides a unifying lens through which many fundamental problems in coding theory can be viewed. In this paper, we investigate Locally Recoverable Codes (LRCs) with Maximal Recoverability (MR) and Maximum Distance Profile (MDP) convolutional codes in the framework of matrix completion. In particular, we present techniques that are general enough to provide constructions for both types of codes. A common feature of our code constructions is the sparsity of their generator matrices and the property that a large number of the entries of the generator matrices are elements of a small subfield of a larger extension field.
[363] arXiv:2604.21546 [pdf, html, other]: Title: Component-Based Out-of-Distribution Detection

Wenrui Liu, Hong Chang, Ruibing Hou, Shiguang Shan, Xilin Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Out-of-Distribution (OOD) detection requires sensitivity to subtle shifts without overreacting to natural In-Distribution (ID) diversity. However, from the viewpoint of detection granularity, global representations inevitably suppress local OOD cues, while patch-based methods are unstable due to entangled spurious correlations and noise. Moreover, neither is effective at detecting compositional OODs composed of valid ID components. Inspired by recognition-by-components theory, we present a training-free Component-Based OOD Detection (CoOD) framework that addresses the existing limitations by decomposing inputs into functional components. To instantiate CoOD, we derive Component Shift Score (CSS) to detect local appearance shifts, and Compositional Consistency Score (CCS) to identify cross-component compositional inconsistencies. Empirically, CoOD achieves consistent improvements on both coarse- and fine-grained OOD detection.
[364] arXiv:2604.21549 [pdf, other]: Title: Unbiased Prevalence Estimation with Multicalibrated LLMs

Fridolin Linder, Thomas Leeper, Daniel Haimovich, Niek Tax, Lorenzo Perini, Milan Vojnovic

Subjects: Artificial Intelligence (cs.AI); Methodology (stat.ME)

Estimating the prevalence of a category in a population using imperfect measurement devices (diagnostic tests, classifiers, or large language models) is fundamental to science, public health, and online trust and safety. Standard approaches correct for known device error rates but assume these rates remain stable across populations. We show this assumption fails under covariate shift and that multicalibration, which enforces calibration conditional on the input features rather than just on average, is sufficient for unbiased prevalence estimation under such shift. Standard calibration and quantification methods fail to provide this guarantee. Our work connects recent theoretical work on fairness to a longstanding measurement problem spanning nearly all academic disciplines. A simulation confirms that standard methods exhibit bias growing with shift magnitude, while a multicalibrated estimator maintains near-zero bias. While we focus the discussion mostly on LLMs, our theoretical results apply to any classification model. Two empirical applications -- estimating employment prevalence across U.S. states using the American Community Survey, and classifying political texts across four countries using an LLM -- demonstrate that multicalibration substantially reduces bias in practice, while highlighting that calibration data should cover the key feature dimensions along which target populations may differ.
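For context, a classic example of a correction based on known device error rates is the Rogan-Gladen estimator, sketched below in Python. The abstract does not name the specific standard methods it compares against, so this choice and the function name `corrected_prevalence` are assumptions for illustration only.

```python
def corrected_prevalence(observed_rate, sensitivity, specificity):
    """Rogan-Gladen style correction of an observed positive rate.

    observed_rate : fraction of the population the device labels positive
    sensitivity   : P(device says positive | truly positive)
    specificity   : P(device says negative | truly negative)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("device must perform better than chance")
    estimate = (observed_rate + specificity - 1.0) / denom
    # Clip to [0, 1]; sampling noise can push the raw estimate outside.
    return min(1.0, max(0.0, estimate))
```

The correction is exact only if sensitivity and specificity in the target population match the values plugged in, which is the stability assumption that the abstract argues breaks under covariate shift.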
[365] arXiv:2604.21554 [pdf, html, other]: Title: Engaged AI Governance: Addressing the Last Mile Challenge Through Internal Expert Collaboration

Simon Jarvers, Orestis Papakyriakopoulos

Subjects: Artificial Intelligence (cs.AI)

Under the EU AI Act, translating AI governance requirements into software development practice remains challenging. While AI governance frameworks exist at industry and organizational levels, empirical evidence of team-level implementation is scarce. We address this "Last Mile" Challenge through insider action research embedded within an AI startup. We present a legal-text-to-action pipeline that translates EU AI Act requirements into actionable strategies through internal expert collaboration by extracting requirements from legal text, engaging practitioners in assessment and ideation, and prioritizing implementation through collective evaluation. Our analysis reveals three patterns in how practitioners perceive regulatory requirements: convergence (compliance aligns with development priorities), existing practice (current work already satisfies requirements), and disconnection (requirements perceived as administrative overhead). Based on these patterns, we discuss when governance might be treated genuinely or performatively. Practitioners prioritize requirements that serve end-users or their own development needs, but view verification-oriented requirements as box-ticking exercises. This distinction suggests a translation challenge: regulatory requirements risk superficial treatment unless practitioners understand how compliance serves system quality and user protection. Expert collaboration offers a practical mechanism for transforming governance from external imposition to shared ownership and making previously invisible governance work visible and collective.
[366] arXiv:2604.21555 [pdf, html, other]: Title: Finding Meaning in Embeddings: Concept Separation Curves

Paul Keuren, Marc Ponsen, Robert Ayoub Bagheri

Comments: The code is open source and located on github at this https URL. Original conference paper

Subjects: Computation and Language (cs.CL)

Sentence embedding techniques aim to encode key concepts of a sentence's meaning in a vector space. However, the majority of evaluation approaches for sentence embedding quality rely on the use of additional classifiers or downstream tasks. These additional components make it unclear whether good results stem from the embedding itself or from the classifier's behaviour. In this paper, we propose a novel method for evaluating the effectiveness of sentence embedding methods in capturing sentence-level concepts. Our approach is classifier-independent, allowing for an objective assessment of the model's performance. The approach adopted in this study involves the systematic introduction of syntactic noise and semantic negations into sentences, with the subsequent quantification of their relative effects on the resulting embeddings. The visualisation of these effects is facilitated by Concept Separation Curves, which show the model's capacity to differentiate between conceptual and surface-level variations. By leveraging data from multiple domains, employing both Dutch and English languages, and examining sentence lengths, this study offers a compelling demonstration that Concept Separation Curves provide an interpretable, reproducible, and cross-model approach for evaluating the conceptual stability of sentence embeddings.
[367] arXiv:2604.21556 [pdf, html, other]: Title: Probabilistic Verification of Neural Networks via Efficient Probabilistic Hull Generation

Jingyang Li, Xin Chen, Hongfei Fu, Guoqiang Li

Comments: 22 pages, 5 figures

Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

The problem of probabilistic verification of a neural network concerns the probability of satisfying the safety constraints in the output space when the input follows a probability distribution. Answering this question is important when the input is affected by disturbances, which are often modeled by random variables. In this paper, we propose a novel neural network probabilistic verification framework which computes a guaranteed range for the safe probability by efficiently finding safe and unsafe probabilistic hulls. Our approach consists of three main innovations: (1) a state space subdivision strategy using regression trees to produce probabilistic hulls, (2) a boundary-aware sampling method which identifies the safety boundary in the input space using samples that are later used for building regression trees, and (3) iterative refinement with probabilistic prioritization for computing a guaranteed range for the safe probability. The accuracy and efficiency of our approach are evaluated on various benchmarks including ACAS Xu and a rocket lander controller. The results show a clear advantage over the state of the art.
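The idea of a "guaranteed range" can be illustrated with a small Python sketch. It assumes the input space has already been partitioned into regions with known probability mass, each labelled as verified safe, verified unsafe, or undecided; the function name and this representation are hypothetical and much simpler than the probabilistic hulls the paper constructs.

```python
def safe_probability_range(regions):
    """Bound the safe probability from a labelled partition of the input space.

    regions : list of (probability_mass, label) pairs whose masses sum to 1,
              with label in {"safe", "unsafe", "unknown"}
    """
    # Everything verified safe is certainly safe: lower bound.
    lower = sum(mass for mass, label in regions if label == "safe")
    # Everything not verified unsafe might still be safe: upper bound.
    upper = 1.0 - sum(mass for mass, label in regions if label == "unsafe")
    return lower, upper
```

Refinement then amounts to splitting "unknown" regions so that more mass moves into the safe or unsafe sets and the interval tightens.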
[368] arXiv:2604.21558 [pdf, html, other]: Title: A nonconforming method for a generalized Darcy-Forchheimer model

Michele Botti, Lorenzo Mascotto, Marialetizia Mosconi

Subjects: Numerical Analysis (math.NA)

We analyze a dual mixed nonconforming discretization of a generalized Darcy-Forchheimer model. Compared to the analogous scheme proposed by Girault and Wheeler, we consider general, i.e., nonquadratic, Forchheimer nonlinearities; we admit mixed, inhomogeneous boundary conditions; we allow for more general, i.e., with lower Lebesgue regularity, permeability tensors; we construct general-order schemes; we prove convergence to the exact solution under low regularity assumptions, based on novel Sobolev-trace inequalities for broken spaces; we derive error estimates of general-order assuming extra regularity of the exact solution and data; we present numerical results assessing the performance of the proposed schemes for different types of nonlinearity and nonlinear solvers.
[369] arXiv:2604.21564 [pdf, html, other]: Title: Measuring Opinion Bias and Sycophancy via LLM-based Coercion

Rodrigo Nogueira, Giovana Kerche Bonás, Thales Sales Almeida, Andrea Roque, Ramon Pires, Hugo Abonizio, Thiago Laitz, Celio Larcher, Roseval Malaquias Junior, Marcos Piau

Subjects: Computation and Language (cs.CL)

Large language models increasingly shape the information people consume: they are embedded in search, consulted for professional advice, deployed as agents, and used as a first stop for questions about policy, ethics, health, and politics. When such a model silently holds a position on a contested topic, that position propagates at scale into users' decisions. Eliciting a model's positions is harder than it first appears: contemporary assistants answer direct opinion questions with evasive disclaimers, and the same model may concede the opposite position once the user starts arguing one side. We propose a method, released as the open-source llm-bias-bench, for discovering the opinions an LLM actually holds on contested topics under conditions that resemble real multi-turn interaction. The method pairs two complementary free-form probes. Direct probing asks for the model's opinion across five turns of escalating pressure from a simulated user. Indirect probing never asks for an opinion and engages the model in argumentative debate, letting bias leak through how it concedes, resists, or counter-argues. Three user personas (neutral, agree, disagree) collapse into a nine-way behavioral classification that separates persona-independent positions from persona-dependent sycophancy, and an auditable LLM judge produces verdicts with textual evidence. The first instantiation ships 38 topics in Brazilian Portuguese across values, scientific consensus, philosophy, and economic policy. Applied to 13 assistants, the method surfaces findings of practical interest: argumentative debate triggers sycophancy 2-3x more than direct questioning (median 50% to 79%); models that look opinionated under direct questioning often collapse into mirroring under sustained arguments; and attacker capability matters mainly when an existing opinion must be dislodged, not when the assistant starts neutral.
[370] arXiv:2604.21566 [pdf, html, other]: Title: Leveraging SIMD for Accelerating Large-number Arithmetic

Subhrajit Das, Abhishek Bichhawat, Yuvraj Patel

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We present DigitsOnTurbo (DoT), which restructures the computation around independent, data-parallel operations, rather than vectorizing the standard algorithms, thereby leveraging the benefits provided by SIMD. Over prior SIMD implementations, DoT achieves up to 1.85x speedups for addition and subtraction, and 2.3x for multiplication. When integrated into state-of-the-art libraries, DoT yields up to 4x speedup for addition and subtraction, and up to 2x speedup for multiplication, cascading into end-to-end throughput gains of up to 19.3% for scientific computations, and up to 7.9% latency and 5.9% throughput improvements on cryptographic implementations.
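The dependency mentioned above is the carry chain: in schoolbook addition each limb needs the carry from the previous one. The sketch below is not DoT's algorithm. It only illustrates, under the assumption of 32-bit limbs stored least-significant first, how the per-limb sums can be computed independently (the part that maps to SIMD lanes) before a separate carry-resolution pass. All function names are hypothetical.

```python
BASE = 2 ** 32  # assumed limb width for this illustration


def to_limbs(n, count):
    """Split a non-negative integer into `count` little-endian 32-bit limbs."""
    return [(n >> (32 * i)) & (BASE - 1) for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two equal-length limb vectors in two phases."""
    # Phase 1: limb-wise sums. No element depends on any other,
    # so this loop is the data-parallel part.
    sums = [x + y for x, y in zip(a, b)]
    # Phase 2: carry resolution. This sequential pass is the dependency
    # that a naive vectorization of the standard algorithm runs into.
    out, carry = [], 0
    for s in sums:
        s += carry
        out.append(s % BASE)
        carry = s // BASE
    out.append(carry)
    return out


x, y = 2 ** 128 - 1, 1
total = from_limbs(add_limbs(to_limbs(x, 4), to_limbs(y, 4)))
print(total == x + y)
```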
[371] arXiv:2604.21567 [pdf, html, other]: Title: Hybrid Deep Learning Approach for Coupled Demand Forecasting and Supply Chain Optimization

Nusrat Yasmin Nadia, Md Habibul Arif, Habibor Rahman Rabby, Md Iftekhar Monzur Tanvir, Md. Jakir Hossen, M. F. Mridha

Comments: The paper is accepted in the Computers, Materials & Continua journal

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Supply chain resilience and efficiency are vital in industries characterized by volatile demand and uncertain supply, such as textiles and personal protective equipment (PPE). Traditional forecasting and optimization approaches often operate in isolation, limiting their real-world effectiveness. This paper proposes a Hybrid AI Framework for Demand-Supply Forecasting and Optimization (HAF-DS), which integrates a Long Short-Term Memory (LSTM)-based demand forecasting module with a mixed integer linear programming (MILP) optimization layer. The LSTM captures temporal and contextual demand dependencies, while the optimization layer prescribes cost-efficient replenishment and allocation decisions. The framework jointly minimizes forecasting error and operational cost through embedding-based feature representation and recurrent neural architectures. Experiments on textile sales and supply chain datasets show significant performance gains over statistical and deep learning baselines. On the combined dataset, HAF-DS reduced Mean Absolute Error (MAE) from 15.04 to 12.83 (14.7%), Root Mean Squared Error (RMSE) from 19.53 to 17.11 (12.4%), and Mean Absolute Percentage Error (MAPE) from 9.5% to 8.1%. Inventory cost decreased by 5.4%, stockouts by 27.5%, and service level rose from 95.5% to 97.8%. These results confirm that coupling predictive forecasting with prescriptive optimization enhances both accuracy and efficiency, providing a scalable and adaptable solution for modern textile and PPE supply chains.
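The percentages quoted above are relative reductions, which can be checked directly from the reported before and after values. The helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


mae = relative_reduction(15.04, 12.83)   # reported as 14.7%
rmse = relative_reduction(19.53, 17.11)  # reported as 12.4%
print(round(mae, 1), round(rmse, 1))
```

Note that the service-level change (95.5% to 97.8%) is a difference in percentage points, not a relative reduction.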
[372] arXiv:2604.21568 [pdf, html, other]: Title: A Bayesian Reasoning Framework for Robotic Systems in Autonomous Casualty Triage

Szymon Rusiecki, Cecilia Morales, Pia Störy, Kimberly Elenberg, Leonard Weiss, Artur Dubrawski

Comments: Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA)

Subjects: Robotics (cs.RO)

Autonomous robots deployed in mass casualty incidents (MCI) face the challenge of making critical decisions based on incomplete and noisy perceptual data. We present an autonomous robotic system for casualty assessment that fuses outputs from multiple vision-based algorithms, estimating signs of severe hemorrhage, visible trauma, or physical alertness, into a coherent triage assessment. At the core of our system is a Bayesian network, constructed from expert-defined rules, which enables probabilistic reasoning about a casualty's condition even with missing or conflicting sensory inputs. The system, evaluated during the DARPA Triage Challenge (DTC) in realistic MCI scenarios involving 11 and 9 casualties, demonstrated a nearly three-fold improvement in physiological assessment accuracy (from 15% to 42% and 19% to 46%) compared to a vision-only baseline. More importantly, overall triage accuracy increased from 14% to 53%, while the diagnostic coverage of the system expanded from 31% to 95% of cases. These results demonstrate that integrating expert-guided probabilistic reasoning with advanced vision-based sensing can significantly enhance the reliability and decision-making capabilities of autonomous systems in critical real-world applications.
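To illustrate why probabilistic fusion tolerates missing inputs, here is a minimal sketch using a naive Bayes model, which is a much simpler stand-in for the expert-defined Bayesian network described above. The detector names, prior, and likelihood values are all hypothetical and are not taken from the paper.

```python
def posterior(prior, likelihoods, observations):
    """P(condition | observed detectors), assuming conditionally independent detectors.

    likelihoods: name -> (P(detector fires | condition), P(detector fires | no condition))
    observations: name -> True, False, or None (None means the input is missing)
    """
    p_true, p_false = prior, 1.0 - prior
    for name, obs in observations.items():
        if obs is None:
            continue  # a missing input is marginalized out and contributes no evidence
        fire_given_true, fire_given_false = likelihoods[name]
        p_true *= fire_given_true if obs else 1.0 - fire_given_true
        p_false *= fire_given_false if obs else 1.0 - fire_given_false
    return p_true / (p_true + p_false)


# Hypothetical detector characteristics.
likelihoods = {"hemorrhage": (0.8, 0.1), "alertness_low": (0.7, 0.2)}
no_data = posterior(0.2, likelihoods, {"hemorrhage": None, "alertness_low": None})
one_hit = posterior(0.2, likelihoods, {"hemorrhage": True, "alertness_low": None})
print(no_data, one_hit)
```

With every input missing the estimate falls back to the prior, and each available detector shifts it, so the system can still return an assessment when a vision module fails.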
[373] arXiv:2604.21570 [pdf, html, other]: Title: SpecSyn: LLM-based Synthesis and Refinement of Formal Specifications for Real-world Program Verification

Lezhi Ma, Shangqing Liu, Yi Li, Qiong Wu, Han Wang, Lei Bu

Subjects: Software Engineering (cs.SE)

Program verification is a formal technique to rigorously ensure the correctness and fault-freeness of software systems. However, constructing comprehensive interprocedural specifications for full verification obligations is time-consuming and labor-intensive, giving rise to automated specification generation approaches. Despite the significant advancements in these approaches brought by Large Language Models (LLMs), existing LLM-empowered approaches still suffer from significant limitations: they lack effective strategies for handling sizable input programs, and are typically equipped with no mechanisms to evaluate and guarantee the strength of the generated specifications. The limitations impair their ability to extract precise specifications from real-world complicated programs to support the verification of target properties, thereby hindering the applicability of existing approaches in verification tasks on real-world programs. To remedy this gap, we propose SpecSyn, a novel LLM-based specification generation method. SpecSyn first decomposes the input program into individual segments, which are handled respectively by the subsequent iterative specification generation process. Innovatively, we incorporate into the process a specification refinement mechanism based on semantic-non-equivalent program mutations and variant discrimination, assessing and enhancing the semantic strength of the generated specifications. Extensive experiments show that SpecSyn maintains high precision over 90% and outstanding recall over 75%, significantly outperforming existing LLM-based approaches. In further evaluations, SpecSyn successfully handles 1071 out of 1365 target properties for open-source programs, proving its applicability on real-world program verification tasks.
[374] arXiv:2604.21571 [pdf, html, other]: Title: Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies

Chris Schneider, Philipp Schoenegger, Ben Bariach

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Current model training approaches incorporate user information directly into shared weights, making individual data removal computationally infeasible without retraining. This paper presents a three-layer architecture that decouples personal data from shared weights by combining a static base model, composable domain-expert LoRA adapters that shape behavior without imparting user data, and per-user proxy artefacts whose deletion constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B confirms per-user differentiation in which personal data influences outputs while remaining isolated, verified by a return to baseline after proxy removal (KL divergence of approximately 0.21 nats, 82-89% verification pass rate) and near-zero cross-user contamination. Because user-specific information never enters shared weights, the architecture mitigates model inversion, membership inference, and training-data extraction against shared model components by construction. The approach converts machine unlearning from an intractable weight-editing problem into a deterministic deletion operation that preserves personalization alongside privacy-enhancing guarantees and is compatible with differentially private stochastic gradient descent (DP-SGD) for privacy-preserving shared model improvement.
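The return-to-baseline check above is reported as a KL divergence in nats. As a minimal sketch, assuming the comparison is between two discrete output distributions (the abstract does not say exactly which distributions are compared), the quantity can be computed as follows. The function name and example values are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same support.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever p_i is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


baseline = [0.5, 0.5]
after_removal = [0.25, 0.75]
print(kl_divergence(baseline, baseline), kl_divergence(baseline, after_removal))
```

Natural logarithms give nats; using `math.log2` instead would give bits.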
[375] arXiv:2604.21572 [pdf, html, other]: Title: Deep kernel video approximation for unsupervised action segmentation

Silvia L. Pintea, Jouke Dijkstra

Comments: Accepted at ICPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work focuses on per-video unsupervised action segmentation, which is of interest to applications where storing large datasets is either not possible or not permitted. We propose to segment videos by learning in deep kernel space to approximate the underlying frame distribution as closely as possible. To define this closeness metric between the original video distribution and its approximation, we rely on maximum mean discrepancy (MMD), which is a geometry-preserving metric in distribution space and thus gives more reliable estimates. Moreover, unlike the commonly used optimal transport metric, MMD is both easier to optimize and faster. We use neural tangent kernels (NTKs) to define the kernel space where MMD operates, both because of their improved descriptive power over fixed kernels and because NTKs sidestep the trivial solution when jointly learning the inputs (video approximation) and the kernel function. Finally, we show competitive results when compared to state-of-the-art per-video methods on six standard benchmarks. Additionally, our method has higher F1 scores than prior agglomerative work when the number of segments is unknown.
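For readers unfamiliar with MMD, the sketch below computes the biased (V-statistic) estimate of squared MMD between two sample sets. It swaps in a fixed RBF kernel for simplicity, whereas the paper uses neural tangent kernels. The function names, the `gamma` parameter, and the toy data are assumptions for illustration.

```python
import math


def rbf(x, y, gamma=1.0):
    """Fixed RBF kernel, used here as a stand-in for a learned kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, kernel=rbf):
    """Biased estimate of squared MMD: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


frames = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # toy frame features
shifted = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]  # same shape, different location
print(mmd2(frames, frames), mmd2(frames, shifted))
```

A small approximation set that minimizes this quantity against the full set of frame features is what "approximating the frame distribution" amounts to in this simplified setting.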
[376] arXiv:2604.21573 [pdf, html, other]: Title: CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction

Changfan Wang, Xinran Wang, Donghai Liu, Fei Su, Lulu Sun, Zhicheng Zhao, Zhu Meng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

Spatial transcriptomics (ST) enables spatially resolved gene profiling but remains expensive and low-throughput, limiting large-cohort studies and routine clinical use. Predicting spatial gene expression from routine hematoxylin and eosin (H&E) slides is a promising alternative, yet under realistic leave-one-slide-out evaluation, existing models often suffer from slide-level appearance shifts and regression-driven over-smoothing that suppress biologically meaningful variation. CHRep is a two-phase framework for robust histology-to-expression prediction. In the training phase, CHRep learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization. In the inference phase, cross-slide robustness is improved without backbone fine-tuning through a lightweight calibration module trained on the training slides, which combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module. Unlike prior embedding-alignment or retrieval-based transfer methods that rely on a single prediction route, CHRep couples topology-preserving representation learning with post-hoc calibration, enabling stable neighborhood retrieval and controlled bias correction under slide-level shifts. Across the three cohorts, CHRep consistently improves gene-wise correlation under leave-one-slide-out evaluation, with the largest gains observed on Alex+10x. Relative to HAGE, the Pearson correlation coefficient on all considered genes [PCC(ACG)] increases by 4.0% on cSCC and 9.8% on HER2+. Relative to mclSTExp, PCC(ACG) further improves by 39.5% on Alex+10x, together with 9.7% and 9.0% reductions in mean squared error (MSE) and mean absolute error (MAE), respectively.
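The headline metric above is a gene-wise Pearson correlation averaged over genes. A minimal sketch of that computation follows. The data layout (a dict from gene name to per-spot values) and the function names are assumptions, not the paper's evaluation code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def mean_genewise_pcc(predicted, measured):
    """Average the per-gene correlation across all genes in `measured`."""
    return sum(pearson(predicted[g], measured[g]) for g in measured) / len(measured)


# Toy example: one gene predicted perfectly, one predicted in reverse.
measured = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [1.0, 2.0, 3.0]}
predicted = {"GENE_A": [2.0, 4.0, 6.0], "GENE_B": [3.0, 2.0, 1.0]}
print(mean_genewise_pcc(predicted, measured))
```

Because correlation ignores scale, a model that over-smooths magnitudes can still score well on this metric, which is why the abstract also reports MSE and MAE.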
[377] arXiv:2604.21575 [pdf, other]: Title: OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

Zeyu Cai, Yuliang Xiu, Renke Wang, Zhijing Shao, Xiaoben Li, Siyuan Yu, Chao Xu, Yang Liu, Baigui Sun, Jian Yang, Zhenyu Zhang

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Fitting an underlying body model to 3D clothed human assets has been extensively studied, yet most approaches focus on either single-modal inputs such as point clouds or multi-view images alone, often requiring a known metric scale. This constraint is frequently impractical, especially for AI-generated assets where scale distortion is common. We propose OmniFit, a method that can seamlessly handle diverse multi-modal inputs, including full scans, partial depth observations, and image captures, while remaining scale-agnostic for both real and synthetic assets. Our key innovation is a simple yet effective conditional transformer decoder that directly maps surface points to dense body landmarks, which are then used for SMPL-X parameter fitting. In addition, an optional plug-and-play image adapter incorporates visual cues to compensate for missing geometric information. We further introduce a dedicated scale predictor that rescales subjects to canonical body proportions. OmniFit substantially outperforms state-of-the-art methods by 57.1 to 80.9 percent across daily and loose clothing scenarios. To the best of our knowledge, it is the first body fitting method to surpass multi-view optimization baselines and the first to achieve millimeter-level accuracy on the CAPE and 4D-DRESS benchmarks.
[378] arXiv:2604.21579 [pdf, html, other]: Title: A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair

Milan De Koning, Ali Asgari, Pouria Derakhshanfar, Annibale Panichella

Comments: 12 pages

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

LLM-based automated program repair (APR) techniques have shown promising results in reducing debugging costs. However, prior results can be affected by data leakage: large language models (LLMs) may memorize bug fixes when evaluation benchmarks overlap with their pretraining data, leading to inflated performance estimates. In this paper, we investigate whether we can better reveal data leakage by combining metamorphic testing (MT) with negative log-likelihood (NLL), which has been used in prior work as a proxy for memorization. We construct variant benchmarks by applying semantics-preserving transformations to two widely used datasets, Defects4J and GitBug-Java. Using these benchmarks, we evaluate the repair success rates of seven LLMs on both original and transformed versions, and analyze the relationship between performance degradation and NLL. Our results show that all evaluated state-of-the-art LLMs exhibit substantial drops in patch generation success rates on transformed benchmarks, ranging from -4.1% for GPT-4o to -15.98% for Llama-3.1. Furthermore, we find that this degradation strongly correlates with NLL on the original benchmarks, suggesting that models perform better on instances they are more likely to have memorized. These findings show that combining MT with NLL provides stronger and more reliable evidence of data leakage, while metamorphic testing alone can help mitigate its effects in LLM-based APR evaluations.
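Negative log-likelihood as a memorization proxy can be computed from per-token log-probabilities. The sketch below assumes those log-probabilities (natural log) have already been obtained from a model. The function name and the example values are hypothetical.

```python
import math


def mean_nll(token_logprobs):
    """Average negative log-likelihood per token, in nats.

    Lower values mean the model found the sequence more predictable,
    which is the signal used as a proxy for memorization.
    """
    return -sum(token_logprobs) / len(token_logprobs)


# Hypothetical log-probabilities for an original fix and a transformed variant.
original = [math.log(0.9)] * 4
transformed = [math.log(0.5)] * 4
print(mean_nll(original), mean_nll(transformed))
```

A fix that scores a much lower NLL in its original form than after a semantics-preserving transformation is a candidate for having been seen during pretraining.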
[379] arXiv:2604.21580 [pdf, html, other]: Title: Robust Beamforming for MIMO Radar with Imperfect Prior Distribution Information

Yizhuo Wang, Shuowen Zhang

Comments: Accepted to appear in IEEE International Symposium on Information Theory (ISIT), 2026

Subjects: Information Theory (cs.IT)

This paper studies a multiple-input multiple-output (MIMO) radar system for sensing the unknown and random angular location (angle) of a point target, based on the target-reflected echo signals and known prior distribution information about the target's angle specified by a probability density function (PDF). We consider a challenging yet practical scenario where the knowledge of such PDF is imperfect, due to the inaccuracy in PDF acquisition or unpredicted change of target appearance pattern; while the real (actual) PDF is modeled as an unknown perturbed version of the imperfect known PDF bounded by a given uncertainty radius. Such PDF imperfection motivates us to study the robust transmit beamforming design to optimize the worst-case sensing performance among all possible real PDFs. Since the sensing mean-squared error (MSE) is difficult to be characterized explicitly, we adopt the worst-case posterior Cramér-Rao bound (PCRB) as the performance metric. We formulate the beamforming optimization problem to minimize the maximum PCRB among all possible real PDFs, which is highly non-trivial since the PCRB has a complex intractable expression over the real PDF, and there are infinite constraints corresponding to the continuous set of real PDFs bounded by the uncertainty radius. To address these challenges, we derive a tractable quadratic approximation of the PCRB via second-order Taylor expansion, and leverage the S-procedure to equivalently transform the infinite constraints into a linear matrix inequality, based on which the problem is reformulated into a convex optimization problem solvable with polynomial time complexity. The obtained solution approaches the globally optimal robust beamforming solution as the uncertainty radius decreases. Numerical results validate the effectiveness of our proposed robust beamforming design.
[380] arXiv:2604.21584 [pdf, html, other]: Title: CoFEE: Reasoning Control for LLM-Based Feature Discovery

Maximilian Westermann, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Yagiz Ihlamur, Kelvin Amoaba, Joseph Ternasky, Fuat Alican, Yigit Ihlamur

Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

Feature discovery from complex unstructured data is fundamentally a reasoning problem: it requires identifying abstractions that are predictive of a target outcome while avoiding leakage, proxies, and post-outcome signals. With the introduction of ever-improving Large Language Models (LLMs), our method provides a structured approach to this challenge. LLMs are well suited for this task because they can process large amounts of information, but unconstrained feature generation can lead to weak features. In this work, we study reasoning control in LLMs by inducing cognitive behaviors for improving feature discovery. We introduce CoFEE (Cognitive Feature Engineering Engine), a reasoning control framework that enforces cognitive behaviors in how the LLM reasons during feature discovery. From a machine learning perspective, these cognitive behaviors act as structured inductive biases over the space of candidate features generated by the model. These behaviors have been exploited with success in ML models, and include backward chaining from outcomes, subgoal decomposition, verification against observability and leakage criteria, and explicit backtracking of rejected reasoning paths. In a controlled comparison, we show that enforcing cognitive behaviors yields features with higher empirical predictability than those under unconstrained vanilla LLM prompts. CoFEE achieves an average Success Rate Score that is 15.2% higher than the vanilla approach, while generating 29% fewer features and reducing costs by 53.3%. Using held-out feature evaluation, we assess whether cognitively induced features generalize beyond the data used for discovery. Our results indicate that, in our evaluated setting, reasoning control is associated with improvements in quality and efficiency of LLM-based feature discovery.
[381] arXiv:2604.21587 [pdf, html, other]: Title: Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications

Shuangbo Xiong, Cheng Zhang, Wen Wang, Wenwu Yu, Yongming Huang

Comments: The paper has been submitted to IEEE Transactions on Wireless Communications

Subjects: Information Theory (cs.IT)

Cell-free multiple-input multiple-output (CF-MIMO) architecture significantly enhances wireless network performance, offering a promising solution for delay-sensitive applications. This paper investigates the resource allocation problem in CF-MIMO systems, aiming to maximize energy efficiency (EE) while satisfying a delay violation rate constraint. We design a Proximal Policy Optimization (PPO) algorithm with a primal-dual method to solve it. To address the low sample efficiency and safety risks caused by the cold start of the designed safe deep reinforcement learning (DRL) method, we propose a novel offline pretraining framework based on virtual constrained Markov decision process (CMDP) modeling. The virtual CMDP consists of a reward and cost prediction module, an initial-state distribution module, and a state transition module. Notably, we propose an evidence-aware conditional Gaussian Mixture Model (EA-CGMM) inference approach to mitigate data sparsity and distribution drift issues in state transition modeling. Simulation results demonstrate the effectiveness of CMDP modeling and validate the safety and efficiency of the proposed pretraining framework. Specifically, compared with a non-pretrained baseline, the agent pretrained through our proposed framework achieves twice the initial EE and maintains a low delay constraint violation rate of 1%, while ultimately converging to an EE that is 4.7% higher with a 50% reduction in exploration steps. Additionally, our proposed pretraining framework implementation exhibits comparable performance to the SOTA diffusion model-based implementation, while achieving a 14-fold reduction in computational complexity.
[382] arXiv:2604.21590 [pdf, html, other]: Title: AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

Yuanjie Lyu, Chengyu Wang, Haonan Zheng, Yuanhao Yue, Junbing Yan, Ming Wang, Jun Huang

Subjects: Computation and Language (cs.CL)

Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable. In this paper, we introduce the AgenticQwen family of models, trained via multi-round reinforcement learning (RL) on synthetic data and a limited amount of open-source data. Our training framework combines reasoning RL and agentic RL with dual data flywheels that automatically generate increasingly challenging tasks. The reasoning flywheel increases task difficulty by learning from errors, while the agentic flywheel expands linear workflows into multi-branch behavior trees that better reflect the decision complexity of real-world applications. We validate AgenticQwen on public benchmarks and in an industrial agent system. The models achieve strong performance on multiple agentic benchmarks, and in our industrial agent system, close the gap with much larger models on search and data analysis tasks. Model checkpoints and part of the synthetic data: this https URL. Data synthesis and RL training code: this https URL. The data synthesis pipeline is also integrated into EasyDistill: this https URL.
[383] arXiv:2604.21592 [pdf, html, other]: Title: Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

Minghao Yin, Wenbo Hu, Jiale Xu, Ying Shan, Kai Han

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demand. We present Sculpt4D, a native 4D generative framework that seamlessly integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design faithfully models complex spatiotemporal dependencies with high fidelity, while sidestepping the quadratic overhead of full attention and reducing network total computation by 56%. Consequently, Sculpt4D establishes a new state-of-the-art in temporally coherent 4D synthesis and charts a path toward efficient and scalable 4D generation.
[384] arXiv:2604.21593 [pdf, html, other]: Title: Language as a Latent Variable for Reasoning Optimization

Linjuan Wu, Haoran Wei, Jialong Tang, Shuang Luo, Baosong Yang, Yongliang Shen, Weiming Lu

Comments: 17 pages, 7 figures, Under Reviewing

Subjects: Computation and Language (cs.CL)

As LLMs reduce English-centric bias, a surprising trend emerges: non-English responses sometimes outperform English on reasoning tasks. We hypothesize that language functions as a latent variable that structurally modulates the model's internal inference pathways, rather than merely serving as an output medium. To test this, we conducted a Polyglot Thinking Experiment, in which models were prompted to solve identical problems under language-constrained and language-unconstrained conditions. Results show that non-English responses often achieve higher accuracy, and the best performance frequently occurs when language is unconstrained, suggesting that multilinguality broadens the model's latent reasoning space. Based on this insight, we propose polyGRPO (Polyglot Group Relative Policy Optimization), an RL framework that treats language variation as an implicit exploration signal. It generates polyglot preference data online under language-constrained and unconstrained conditions, optimizing the policy with respect to both answer accuracy and reasoning structure. Trained on only 18.1K multilingual math problems without chain-of-thought annotations, polyGRPO improves the base model (Qwen2.5-7B-Instruct) by 6.72% absolute accuracy on four English reasoning test sets and by 6.89% on their multilingual benchmark. Remarkably, it is the only method that surpasses the base LLM on an English commonsense reasoning task (4.9%), despite being trained solely on math data, highlighting its strong cross-task generalization. Further analysis reveals that treating language as a latent variable expands the model's latent reasoning space, yielding consistent and generalizable improvements in reasoning performance.
[385] arXiv:2604.21598 [pdf, html, other]: Title: DryRUN: On the Role of Public Tests in LLM-Driven Code Generation

Kaushitha Silva, Srinath Perera

Comments: 9 pages, 6 figures

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Multi-agent frameworks are widely used in autonomous code generation and have applications in complex algorithmic problem-solving. Recent work has addressed the challenge of generating functionally correct code by incorporating simulation-driven planning and debugging, where language models trace execution steps to verify logic. However, these approaches depend on human-provided public test cases to ground the debugging and simulation loop. Manually authoring comprehensive input-output examples is a labor-intensive bottleneck in the software development lifecycle. Because ground-truth input-output examples are rarely available prior to implementation in real-world software engineering, this dependency restricts methods to curated competitive programming benchmarks. Furthermore, we identify that reliance on these public tests induces an "overconfidence gap," causing frameworks to overfit to simplistic examples and fail on hidden evaluations. In contrast, we observe that external sample inputs are not strictly necessary for code generation. We demonstrate that large language models can autonomously generate valid inputs and simulate execution traces to self-correct. Consequently, we develop DryRUN, a framework that eliminates the need for ground-truth samples by allowing the LLM to iteratively plan, autonomously generate its own inputs, and simulate execution, mitigating algorithmic overconfidence. Evaluations on the LiveCodeBench v6 dataset (post-March 2025) demonstrate that DryRUN matches the performance of CodeSIM, a state-of-the-art and public-test-dependent framework, while operating entirely without public test cases or external execution feedback and reducing output token consumption.
[386] arXiv:2604.21599 [pdf, other]: Title: Verifying Machine Learning Interpretability Requirements through Provenance

Lynn Vonderhaar, Juan Couder, Daryela Cisneros, Omar Ochoa

Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)

Machine Learning (ML) Engineering is a growing field that necessitates an increase in the rigor of ML development. It draws many ideas from software engineering and, more specifically, from requirements engineering. Existing literature on ML Engineering defines quality models and Non-Functional Requirements (NFRs) specific to ML, interpretability being one such NFR. However, a major challenge occurs in verifying ML NFRs, including interpretability. Although existing literature defines interpretability in terms of ML, it remains an immeasurable requirement, making it impossible to definitively confirm whether a model meets its interpretability requirement. This paper shows how ML provenance can be used to verify ML interpretability requirements. This work provides an approach for how ML engineers can save various types of model and data provenance to make the model's behavior transparent and interpretable. Saving this data forms the basis of quantifiable Functional Requirements (FRs) whose verification in turn verifies the interpretability NFR. Ultimately, this paper contributes a method to verify interpretability NFRs for ML models.
[387] arXiv:2604.21600 [pdf, html, other]: Title: Positivity-Preserving and Entropy-Stable Oscillation-Eliminating DGSEM for the Compressible Euler Equations on Curvilinear Meshes with Adaptive Mesh Refinement

Jieling Yang, Guosheng Fu

Subjects: Numerical Analysis (math.NA)

We extend the entropy-stable oscillation-eliminating discontinuous Galerkin spectral element method (ES-OEDG) on curvilinear meshes to adaptive mesh refinement (AMR) grids with nonconforming interfaces. The formulation targets two-dimensional curvilinear quadrilateral meshes under a 2:1 refinement constraint, allowing a single level of hanging nodes. Elementwise volume discretization and geometric mapping are retained, while oscillation elimination and interface coupling are adapted for nonconforming interfaces.
A central contribution is the design and analysis of numerical fluxes for such interfaces. We construct an entropy-stable flux that ensures global conservation and a semi-discrete entropy inequality. However, for polynomial degree N >= 2, negative entries in nonconforming interpolation operators lead to loss of formal high-order consistency. To address this, we propose a mortar-based flux that preserves high-order accuracy by interpolating at the solution level and evaluating standard two-point fluxes on fine-side mortars, at the cost of losing provable entropy stability.
We also extend the Zhang--Shu positivity-preserving framework to curvilinear AMR meshes. Under forward Euler time stepping and a suitable CFL condition, the scheme using either flux preserves positivity of cell-average density and pressure. Combined with the Zhang--Shu limiter, this yields a fully discrete scheme maintaining admissibility at all nodal points. We further incorporate shock-indicator-based AMR and a conservative, positivity-preserving data transfer procedure between successive meshes, resulting in a robust and efficient algorithm. Numerical experiments on Cartesian and curvilinear AMR grids confirm high-order accuracy and robustness.
[388] arXiv:2604.21602 [pdf, html, other]: Title: On the Role of Preprocessing and Memristor Dynamics in Reservoir Computing for Image Classification

Rishona Daniels, Duna Wattad, Ronny Ronen, David Saad, Shahar Kvatinsky

Comments: Accepted for publication in Advanced Electronic Materials. Main text: pages 1-32, 11 figures. Supporting information: pages 24-32, 11 figures

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

Reservoir computing (RC) is an emerging recurrent neural network architecture that has attracted growing attention for its low training cost and modest hardware requirements. Memristor-based circuits are particularly promising for RC, as their intrinsic dynamics can reduce network size and parameter overhead in tasks such as time-series prediction and image recognition. Although RC has been demonstrated with several memristive devices, a comprehensive evaluation of device-level requirements remains limited. In this paper, we analyze and explain the operation of a parallel delayed feedback network (PDFN) RC architecture with volatile memristors, focusing on how device characteristics -- such as decay rate, quantization, and variability -- affect reservoir performance. We further discuss strategies to improve data representation in the reservoir using preprocessing methods and suggest potential improvements. The proposed approach achieves 95.89% classification accuracy on MNIST, comparable with the best reported memristor-based RC implementations. Furthermore, the method maintains high robustness under 20% device variability, achieving an accuracy of up to 94.2%. These results demonstrate that volatile memristors can support reliable spatio-temporal information processing and reinforce their potential as key building blocks for compact, high-speed, and energy-efficient neuromorphic computing systems.
[389] arXiv:2604.21603 [pdf, html, other]: Title: Using ASP(Q) to Handle Inconsistent Prioritized Data

Meghyn Bienvenu, Camille Bourgaux, Robin Jean, Giuseppe Mazzotta

Comments: This is an extended version of a paper appearing at the 23rd International Conference on Principles of Knowledge Representation and Reasoning (KR 2026). 21 pages

Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Databases (cs.DB)

We explore the use of answer set programming (ASP) and its extension with quantifiers, ASP(Q), for inconsistency-tolerant querying of prioritized data, where a priority relation between conflicting facts is exploited to define three notions of optimal repairs (Pareto-, globally- and completion-optimal). We consider the variants of three well-known semantics (AR, brave and IAR) that use these optimal repairs, and for which query answering is in the first or second level of the polynomial hierarchy for a large class of logical theories. Notably, this paper presents the first implementation of globally-optimal repair-based semantics, as well as the first implementation of the grounded semantics, which is a tractable under-approximation of all these optimal repair-based semantics. Our experimental evaluation sheds light on the feasibility of computing answers under globally-optimal repair semantics and the impact of adopting different semantics, approximations, and encodings.
[390] arXiv:2604.21604 [pdf, other]: Title: Mitigate or Fail: How Risk Management Shapes Cybersecurity Competency

Jeffrey T. Gardiner

Comments: Doctor of Business Administration (DBA) Dissertation

Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); General Economics (econ.GN)

Contemporary cybersecurity governance assumes that professionals apply risk reasoning. Yet major organisational failures persist despite investment in tools, staffing, and credentials. This study investigates the structural source of that paradox. Cybersecurity speaks the language of risk, but its training architecture has shaped the profession to think in terms of threats. A sequential mixed-methods design integrated four analyses: NLP of the NIST NICE Framework v2.0.0 (2,111 TKS statements), SEM (n = 126 cybersecurity professionals), a control-group comparison (n = 133 general professionals), and thematic coding of seven leadership interviews. Four convergent findings emerged. First, "likelihood" and "probability" appear zero times across all TKS statements. Risk management content accounts for 4.5% of high-confidence semantic classifications, ranking 18th of 29 competency domains. NICE codifies threat-management activity while invoking risk mainly at the category level. Second, SEM showed that training exposure significantly predicts risk management competence directly and indirectly through conceptual salience, for a total effect of Beta = .629. However, the theoretically four-dimensional competence construct collapsed into a single factor, indicating epistemic compression. Third, cybersecurity professionals showed no measurable advantage over the general professional population in foundational risk reasoning; only 11.9% showed high differentiation. Fourth, all seven leaders expected Likelihood x Impact reasoning, yet five did not articulate the formula themselves. These findings support a structural conclusion: cybersecurity has taken professional form as a threat-management discipline that has borrowed risk vocabulary. Remediation requires redesign of professional formation, not marginal curriculum reform.
[391] arXiv:2604.21606 [pdf, other]: Title: Process-Mining of Hypertraces: Enabling Scalable Formal Security Verification of (Automotive) Network Architectures

Julius Figge, David Knuplesch, Andreas Maletti, Dragan Zuvic

Comments: Full version prior to submission for publication

Subjects: Cryptography and Security (cs.CR)

The automotive domain is transitioning: vehicles act as rolling servers, persistently connected to numerous external entities. This connectivity, combined with rising on-board computing power for advanced driver assistance systems and similar use cases, creates escalating challenges for securing automotive network architectures. This work advances the security analysis of internet-connected automotive network architectures and their protocols. We introduce a strong, active adversary model tailored to the automotive domain. We substantially extend the security protocol verification possible with Attack Resilience Hyperproperties (ARHs) by introducing a verification-orchestration algorithm. Furthermore, we provide methods for comparative attribution of security property invalidations to specific, fine-grained component compromises. We present a novel integration of formal verification and process mining. By utilizing ARH counterexample traces for process mining, we systematically identify and aggregate attacker behavior that causes security property invalidations. This pipeline enables in-depth understanding of root causes and attack paths leading to protocol-security invalidations. We demonstrate real-world applicability through a prototype and case study on the secure transmission of battery management system data within an automotive network architecture.
[392] arXiv:2604.21608 [pdf, html, other]: Title: ADMM-Based Distributed Kalman-like Observer with Applications to Cooperative Localization

Nicola De Carli, Nicola Bastianello, Dimos V. Dimarogonas

Subjects: Systems and Control (eess.SY)

This paper addresses distributed state estimation for multi-agent systems with local and relative measurements, motivated by cooperative localization problems in which the global state dimension scales with the size of the network. We consider a Kalman-like observer in information form and introduce a sparsity-preserving prediction step based on an exponential forgetting factor, thereby avoiding the dense Riccati recursion of the standard information filter. The correction step is recast as a strongly convex quadratic program with structure induced by the sensing graph, which enables a distributed solution based on the alternating direction method of multipliers (ADMM). In the resulting scheme, each agent updates local copies of its own correction variable and those of its neighbors using only local communication, thus avoiding centralized matrix inversion and consensus over full global-state quantities. A two-time-scale stability analysis is developed for the interconnected observer: the reduced estimation-error dynamics are shown to be uniformly exponentially stable, the ADMM dynamics define an exponentially stable fast subsystem, and these properties are combined to establish uniform exponential stability of the overall distributed observer. Numerical simulations in a multi-agent cooperative localization scenario illustrate the performance of the proposed distributed observer.
[393] arXiv:2604.21611 [pdf, html, other]: Title: Process Supervision via Verbal Critique Improves Reasoning in Large Language Models

Hao-Yuan Chen

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision, via Verbal Process Supervision (VPS), a training-free framework that uses structured natural-language critique from a stronger supervisor to guide an iterative generate-critique-refine loop up to a round budget R. Across GPQA Diamond, AIME 2025, and LiveCodeBench V6 (covering both closed and open models), VPS yields three key results. First, on GPQA Diamond, GPT-5.4 (High) | GPT-5.4 (Low) reaches 94.9% at R=4, surpassing the 94.1% state of the art without gradient updates. Second, on AIME 2025, VPS enables strong weak-actor rescue, boosting scores from 11.7-26.7% to 63.3-90.0% (up to +63.3 points). Third, at matched compute, VPS outperforms Reflexion by +8.5 to +12.1 points and Self-Consistency@5 by +5.0 pp (GPQA) and +8.3 pp (LiveCodeBench), isolating critique granularity as the key driver. Performance scales with the supervisor-actor capability gap (Pearson r=0.90) and degrades when errors are not linguistically expressible (e.g., code synthesis), motivating hybrid verbal-executable methods. These results establish critique granularity as a new axis of inference-time scaling.
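The generate-critique-refine loop described in this abstract can be sketched as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the `actor` and `supervisor` callables, their signatures, and the early-stopping rule (the supervisor returning `None` when it has no objection) are hypothetical stand-ins for model calls.

```python
def verbal_process_supervision(problem, actor, supervisor, rounds):
    """Iterate generate -> critique -> refine for at most `rounds` rounds.

    actor(problem, critique) -> answer      (hypothetical signature)
    supervisor(problem, answer) -> critique or None when satisfied (assumption)
    """
    answer = actor(problem, None)  # initial attempt, no critique yet
    for _ in range(rounds):
        critique = supervisor(problem, answer)
        if critique is None:  # assumed stopping rule: nothing left to fix
            break
        answer = actor(problem, critique)  # refine using the verbal critique
    return answer


# Toy stand-ins so the sketch runs without any model.
def toy_actor(problem, critique):
    return problem["guess"] if critique is None else critique["suggested"]


def toy_supervisor(problem, answer):
    if answer == problem["target"]:
        return None
    return {"suggested": answer + 1}  # "critique" nudges the answer upward


result = verbal_process_supervision(
    {"guess": 1, "target": 3}, toy_actor, toy_supervisor, rounds=4
)
```

With the toy stand-ins, the loop refines the initial guess twice and stops as soon as the supervisor raises no further objection.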
[394] arXiv:2604.21614 [pdf, html, other]: Title: Spatiotemporal 2-D Polar Codes over Non-Uniform MIMO Channels: A Reliability-Aware Construction Approach

Yaqi Li, Shuohan Zhang, Xiaohu You, Jiamin Li

Comments: 6 pages, 5 figures

Subjects: Information Theory (cs.IT)

With the increasing demand for ultra-reliable and low-latency communication (URLLC), spatiotemporal two-dimensional (2-D) channel coding has received growing interest. By leveraging the spatial degrees of freedom in massive multiple-input multiple-output (MIMO) systems, it shortens the time-domain blocklength, thereby reducing latency and enhancing reliability. However, existing spatiotemporal coding schemes typically assume uniform reliability across spatial streams. This assumption does not hold in practical MIMO channels, where the underlying propagation environment generally leads to unequal spatial-eigenmode gains and reliabilities, making the conventional Gaussian-approximation-based construction for 2-D polar codes less effective. This paper investigates spatiotemporal 2-D polar coding over non-uniform MIMO channels, where the spatial domain exhibits inherently heterogeneous signal-to-noise ratios (SNRs). We propose a reciprocal channel approximation (RCA)-based reliability-aware 2-D polar coding framework that accurately characterizes such heterogeneous SNRs without relying on log-likelihood-ratio distribution assumptions. Simulation results demonstrate that the proposed RCA-based spatiotemporal 2-D polar coding scheme achieves clear performance gains and strong robustness, confirming its effectiveness in jointly exploiting temporal and spatial polarization for URLLC in practical MIMO systems.
[395] arXiv:2604.21617 [pdf, html, other]: Title: Local Neighborhood Instability in Parametric Projections: Quantitative and Visual Analysis

Frederik L. Dennig, Daniel A. Keim

Comments: 6 pages, 3 figures, LaTeX, to appear at the 17th International EuroVis Workshop on Visual Analytics

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Parametric projections let analysts embed new points in real time, but input variations from measurement noise or data drift can produce unpredictable shifts in the 2D layout. Whether and where a projection is locally stable remains largely unexamined. In this paper, we present a stability evaluation framework that probes parametric projections with Gaussian perturbations around selected anchor points and assesses how neighborhoods deform in the 2D embedding. Our approach combines quantitative measures of mean displacement, bias, and nearest-anchor assignment error with per-anchor visualizations of displacement vectors, local PCA ellipsoids, and Voronoi misassignment for detailed inspection. We demonstrate the framework's effectiveness on UMAP- and t-SNE-based neural projectors of varying network sizes and study the effect of Jacobian regularization as a gradient-based robustness strategy. We apply our framework to the MNIST and Fashion-MNIST datasets. The results show that our framework identifies unstable projection regions invisible to reconstruction error or neighborhood-preservation metrics.
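One of the quantitative measures named above, mean displacement under Gaussian perturbation of an anchor point, can be illustrated with a short sketch. The `project` callable, the sample count, and the isotropic noise model are assumptions made for this example; the paper's exact definitions may differ.

```python
import math
import random


def mean_displacement(project, anchor, sigma, n_samples=100, seed=0):
    """Average 2D distance between the projected anchor and projected noisy copies.

    project: hypothetical callable mapping a high-dimensional point to 2D.
    sigma: standard deviation of the isotropic Gaussian noise (assumption).
    """
    rng = random.Random(seed)
    base = project(anchor)
    total = 0.0
    for _ in range(n_samples):
        noisy = [v + rng.gauss(0.0, sigma) for v in anchor]
        total += math.dist(project(noisy), base)
    return total / n_samples


# Toy projector: keep the first two coordinates.
def first_two(point):
    return (point[0], point[1])


stable = mean_displacement(first_two, [1.0, 2.0, 3.0], sigma=0.0)
shaky = mean_displacement(first_two, [1.0, 2.0, 3.0], sigma=0.1)
```

A larger value for a given `sigma` indicates that small input changes move the embedded point further, which is the kind of local instability the framework is meant to expose.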
[396] arXiv:2604.21623 [pdf, html, other]: Title: A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

Ioannis Panopoulos, Maria Lamprini A. Bartsioka, Sokratis Nikolaidis, Stylianos I. Venieris, Dimitra I. Kaklamani, Iakovos S. Venieris

Journal-ref: ACM Transactions on AI Security and Privacy (April 2026), 38 pages

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

The proliferation of Internet of Things (IoT) devices has significantly expanded attack surfaces, making IoT ecosystems particularly susceptible to sophisticated cyber threats. To address this challenge, this work introduces A-THENA, a lightweight early intrusion detection system (EIDS) that significantly extends preliminary findings on time-aware encodings. A-THENA employs an advanced Transformer-based architecture augmented with a generalized Time-Aware Hybrid Encoding (THE), integrating packet timestamps to effectively capture temporal dynamics essential for accurate and early threat detection. The proposed system further employs a Network-Specific Augmentation (NA) pipeline, which enhances model robustness and generalization. We evaluate A-THENA on three benchmark IoT intrusion detection datasets (CICIoT23-WEB, MQTT-IoT-IDS2020, and IoTID20), where it consistently achieves strong performance. Averaged across all three datasets, it improves accuracy by 6.88 percentage points over the best-performing traditional positional encoding, 3.69 points over the strongest feature-based model, 6.17 points over the leading time-aware alternatives, and 5.11 points over related models, while achieving near-zero false alarms and false negatives. To assess real-world feasibility, we deploy A-THENA on the Raspberry Pi Zero 2 W, demonstrating its ability to perform real-time intrusion detection with minimal latency and memory usage. These results establish A-THENA as an agile, practical, and highly effective solution for securing IoT networks.
[397] arXiv:2604.21626 [pdf, html, other]: Title: On the Challenges of Holistic Intrusion Detection in ICS

Stefan Lenz, Julia Raab, Benedikt Holzbach, Deniz Köller, Sotiris Michaelides, Martin Henze

Comments: 2 pages, presented at the 16th SPRING Workshop April 2026 in Heidelberg, Germany

Subjects: Cryptography and Security (cs.CR)

Past attacks against industrial control systems (ICS) show that adversaries often target both the ICS network and the physical process to achieve potentially catastrophic impact. To secure ICS, intrusion detection systems promise timely uncovering of such adversaries. However, as these detection mechanisms typically focus on isolated characteristics of ICS (e.g., packet timings), multiple detection systems have to be deployed in parallel, complicating their operation in practice. In this work, to spur discussion and further research, we present challenges encountered during our research towards a holistic intrusion detection system aiming to cover all dimensions of an ICS.
[398] arXiv:2604.21627 [pdf, other]: Title: DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion

Tahar Chettaoui, Eduarda Caldeira, Guray Ozgur, Raghavendra Ramachandra, Fadi Boutros, Naser Damer

Comments: Accepted At CVPR-W 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Advancing face morphing attack techniques is crucial to anticipate evolving threats and develop robust defensive mechanisms for identity verification systems. This work introduces DCMorph, a dual-stream diffusion-based morphing framework that simultaneously operates at both identity conditioning and latent space levels. Unlike image-level methods suffering from blending artifacts or GAN-based approaches with limited reconstruction fidelity, DCMorph leverages identity-conditioned latent diffusion models through two mechanisms: (1) decoupled cross-attention interpolation that injects identity-specific features from both source faces into the denoising process, enabling explicit dual-identity conditioning absent in existing diffusion-based methods, and (2) DDIM inversion with spherical interpolation between inverted latent representations from both source faces, providing geometrically consistent initial latent representation that preserves structural attributes. Vulnerability analyses across four state-of-the-art face recognition systems demonstrate that DCMorph achieves the highest attack success rates compared to existing methods at both operational thresholds, while remaining challenging to detect by current morphing attack detection solutions.
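The spherical interpolation between two inverted latents mentioned in mechanism (2) follows a standard formula, sketched here on plain Python lists. This is a generic slerp, not DCMorph's code: the flat-list latent representation and the linear fallback for nearly (anti)parallel vectors are assumptions of this example.

```python
import math


def slerp(z0, z1, t):
    """Spherical interpolation between non-zero vectors z0 and z1 at fraction t in [0, 1]."""
    dot = sum(a * b for a, b in zip(z0, z1))
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Nearly parallel or antiparallel: fall back to linear interpolation (assumption).
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike linear interpolation, the midpoint of two unit vectors stays on the unit circle, which is the usual motivation for using slerp on Gaussian-distributed diffusion latents.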
[399] arXiv:2604.21628 [pdf, html, other]: Title: Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in wav2vec 2.0

Natalie Engert, Dominik Wagner, Korbinian Riedhammer, Tobias Bocklet

Comments: Accepted to IEEE ICASSP 2026

Subjects: Sound (cs.SD)

Wav2vec 2.0 (W2V2) has shown strong performance in pathological speech analysis by effectively capturing the characteristics of atypical speech. Despite its success, it remains unclear which components of its learned representations are most informative for specific downstream tasks. In this study, we address this question by investigating the regression of dysarthric speech descriptors using annotations from the Speech Accessibility Project dataset. We focus on five descriptors, each addressing a different aspect of speech or voice production: intelligibility, imprecise consonants, inappropriate silences, harsh voice and monoloudness. Speech representations are derived from a W2V2-based feature extractor, and we systematically compare layer-wise and time-wise aggregation strategies using attentive statistics pooling. Our results show that intelligibility is best captured through layer-wise representations, whereas imprecise consonants, harsh voice and monoloudness benefit from time-wise modeling. For inappropriate silences, no clear advantage could be observed for either approach.
[400] arXiv:2604.21629 [pdf, html, other]: Title: Promoting Simple Agents: Ensemble Methods for Event-Log Prediction

Benedikt Bollig, Matthias Függer, Thomas Nowak, Paul Zeinaty

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Formal Languages and Automata Theory (cs.FL)

We compare lightweight automata-based models (n-grams) with neural architectures (LSTM, Transformer) for next-activity prediction in streaming event logs. Experiments on synthetic patterns and five real-world process mining datasets show that n-grams with appropriate context windows achieve comparable accuracy to neural models while requiring substantially fewer resources. Unlike windowed neural architectures, which show unstable performance patterns, n-grams provide stable and consistent accuracy. While we demonstrate that classical ensemble methods like voting improve n-gram performance, they require running many agents in parallel during inference, increasing memory consumption and latency. We propose an ensemble method, the promotion algorithm, that dynamically selects between two active models during inference, reducing overhead compared to classical voting schemes. On real-world datasets, these ensembles match or exceed the accuracy of non-windowed neural models with lower computational cost.
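To make the n-gram baseline concrete, here is a minimal sketch of a streaming next-activity predictor with a fixed context window. It illustrates only a single n-gram agent; it is not the promotion algorithm proposed in the paper, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class NGramPredictor:
    """Predict the next activity from the last n observed activities (n >= 1)."""

    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)  # context tuple -> counts of next activity
        self.history = []

    def update(self, activity):
        """Consume one event from the stream and update the counts."""
        context = tuple(self.history[-self.n:])
        if len(context) == self.n:  # only learn once a full context is available
            self.counts[context][activity] += 1
        self.history.append(activity)

    def predict(self):
        """Return the most frequent follower of the current context, or None if unseen."""
        context = tuple(self.history[-self.n:])
        seen = self.counts.get(context)
        if not seen:
            return None
        return seen.most_common(1)[0][0]


model = NGramPredictor(n=2)
for event in "abcabcab":
    model.update(event)
prediction = model.predict()
```

After the stream `abcabcab`, the current context `('a', 'b')` has always been followed by `c`, so that is the prediction. Memory use grows only with the number of distinct contexts observed, which is one reason such models are cheap compared with neural architectures.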
[401] arXiv:2604.21631 [pdf, html, other]: Title: DualSplat: Robust 3D Gaussian Splatting via Pseudo-Mask Bootstrapping from Reconstruction Failures

Xu Wang, Zhiru Wang, Shiyun Xie, Chengwei Pan, Yisong Chen

Comments: 10 pages, 6 figures, accepted to Computer Vision and Pattern Recognition Conference 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

While 3D Gaussian Splatting (3DGS) achieves real-time photorealistic rendering, its performance degrades significantly when training images contain transient objects that violate multi-view consistency. Existing methods face a circular dependency: accurate transient detection requires a well-reconstructed static scene, while clean reconstruction itself depends on reliable transient masks. We address this challenge with DualSplat, a Failure-to-Prior framework that converts first-pass reconstruction failures into explicit priors for a second reconstruction stage. We observe that transients, which appear in only a subset of views, often manifest as incomplete fragments during conservative initial training. We exploit these failures to construct object-level pseudo-masks by combining photometric residuals, feature mismatches, and SAM2 instance boundaries. These pseudo-masks then guide a clean second-pass 3DGS optimization, while a lightweight MLP refines them online by gradually shifting from prior supervision to self-consistency. Experiments on RobustNeRF and NeRF On-the-go show that DualSplat outperforms existing baselines, demonstrating particularly clear advantages in transient-heavy scenes and transient regions.
[402] arXiv:2604.21632 [pdf, html, other]: Title: To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning

Nevena Lazić, Liam Fowl, András György, Csaba Szepesvári

Subjects: Artificial Intelligence (cs.AI)

We investigate the ability of decoder-only transformer models to perform abstract symbolic reasoning; specifically solving propositional logic reasoning problems given in-context. Previous work demonstrated that models fail to generalize to problems involving variable names that were not observed during training, and it was shown that one reason behind this is the difficulty of copying (or generating) unseen tokens. We show both theoretically and empirically that a particular representational collapse also has a crucial role: the unembeddings (last-layer weights) of unseen tokens collapse to nearly the same vector during training. The collapse makes distinguishing multiple unseen variables difficult for the model (especially when the embedding and unembedding parameters are shared), and provides a mechanistic explanation for the effectiveness of existing heuristic interventions like "active forgetting", which periodically reset the token (un)embeddings. Based on these observations, we devise a combination of techniques, involving a small architecture change facilitating copying, data diversity, and freezing or resetting (un)embeddings, that achieves generalization to unseen tokens. We support our claims with extensive controlled experiments on propositional logic reasoning problems. Beyond synthetic experiments, we also observe evidence of (un)embedding collapse in the open-weight models in the Gemma 3 family, which includes 99 unused tokens reserved for downstream use. Empirically we find that the correlated embeddings of these tokens are a poor initialization for finetuning applications.
[403] arXiv:2604.21637 [pdf, other]: Title: Multilinguality at the Edge: Developing Language Models for the Global South

Lester James V. Miranda, Songbo Hu, Roi Reichart, Anna Korhonen

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

Where and how language models (LMs) are deployed determines who can benefit from them. However, there are several challenges that prevent effective deployment of LMs in non-English-speaking and hardware constrained communities in the Global South. We call this challenge the last mile: the intersection of multilinguality and edge deployment, where the goals are aligned but the technical requirements often compete. Studying these two fields together is both a need, as linguistically diverse communities often face the most severe infrastructure constraints, and an opportunity, as edge and multilingual NLP research remain largely siloed. To understand the state of the art and the challenges of combining the two areas, we survey 232 papers that tackle this problem across the language modelling pipeline, from data collection to development and deployment. We also discuss open questions and provide actionable recommendations for different stakeholders in the NLP ecosystem. Finally, we hope that this work contributes to the development of inclusive and equitable language technologies.
[404] arXiv:2604.21638 [pdf, html, other]: Title: Geometric Characterisation and Structured Trajectory Surrogates for Clinical Dataset Condensation

Pafue Christy Nganjimi, Andrew Soltan, Danielle Belgrave, Lei Clifton, David Clifton, Anshul Thakur

Comments: 34 pages, 7 figures

Subjects: Machine Learning (cs.LG)

Dataset condensation constructs compact synthetic datasets that retain the training utility of large real-world datasets, enabling efficient model development and potentially supporting downstream research in governed domains such as healthcare. Trajectory matching (TM) is a widely used condensation approach that supervises synthetic data using changes in model parameters observed during training on real data, yet the structure of this supervision signal remains poorly understood. In this paper, we provide a geometric characterisation of trajectory matching, showing that a fixed synthetic dataset can only reproduce a limited span of such training-induced parameter changes. When the resulting supervision signal is spectrally broad, this creates a conditional representability bottleneck. Motivated by this mismatch, we propose Bezier Trajectory Matching (BTM), which replaces SGD trajectories with quadratic Bezier trajectory surrogates between initial and final model states. These surrogates are optimised to reduce average loss along the path while replacing broad SGD-derived supervision with a more structured, lower-rank signal that is better aligned with the optimisation constraints of a fixed synthetic dataset, and they substantially reduce trajectory storage. Experiments on five clinical datasets demonstrate that BTM consistently matches or improves upon standard trajectory matching, with the largest gains in low-prevalence and low-synthetic-budget settings. These results indicate that effective trajectory matching depends on structuring the supervision signal rather than reproducing stochastic optimisation paths.
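A quadratic Bezier curve between an initial and a final parameter vector has the standard form B(t) = (1-t)^2 θ_0 + 2(1-t)t c + t^2 θ_T. The sketch below evaluates it on flat parameter lists. The control point `control` is a hypothetical free parameter in this example; how BTM actually parameterises and optimises the surrogate is not specified in the abstract.

```python
def bezier_point(theta_start, control, theta_end, t):
    """Evaluate a quadratic Bezier curve between two parameter vectors at t in [0, 1]."""
    w_start = (1 - t) ** 2
    w_control = 2 * (1 - t) * t
    w_end = t ** 2
    return [
        w_start * p0 + w_control * pc + w_end * p1
        for p0, pc, p1 in zip(theta_start, control, theta_end)
    ]


theta_0 = [0.0, 0.0]   # toy initial model state
theta_T = [2.0, 0.0]   # toy final model state
c = [1.0, 2.0]         # hypothetical control point bending the path
mid = bezier_point(theta_0, c, theta_T, 0.5)
```

Only the two endpoints and the control point need to be stored to reproduce any point on the path, which is consistent with the storage reduction the abstract mentions relative to keeping full SGD trajectories.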
[405] arXiv:2604.21640 [pdf, html, other]: Title: Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

Yi-Ling Liu, Melvin Laux, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam

Comments: To be published in IEEE OCEANS 2026 (Sanya) conference proceedings

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Autonomous underwater vehicles are required to perform multiple tasks adaptively and in an explainable manner under dynamic, uncertain conditions and limited sensing, challenges that classical controllers struggle to address. This demands robust, generalizable, and inherently interpretable control policies for reliable long-term monitoring. Reinforcement learning, particularly multi-task RL, overcomes these limitations by leveraging shared representations to enable efficient adaptation across tasks and environments. However, while such policies show promising results in simulation and controlled experiments, they remain opaque and offer limited insight into the agent's internal decision-making, creating gaps in transparency, trust, and safety that hinder real-world deployment. The internal policy structure and task-specific specialization remain poorly understood. To address these gaps, we analyze the internal structure of a pretrained multi-task reinforcement learning network in the HoloOcean simulator for underwater navigation by identifying and comparing task-specific subnetworks responsible for navigating toward different species. We find that in a contextual multi-task reinforcement learning setting with related tasks, the network uses only about 1.5% of its weights to differentiate between tasks. Of these, approximately 85% connect the context-variable nodes in the input layer to the next hidden layer, highlighting the importance of context variables in such settings. Our approach provides insights into shared and specialized network components, useful for efficient model editing, transfer learning, and continual learning for underwater monitoring through a contextual multi-task reinforcement learning method.
[406] arXiv:2604.21644 [pdf, html, other]: Title: An Adaptive Kalman Filter that Learns the Coloring Dynamics of the Process Noise

Mohammad Almuhaihi, Dennis Bernstein

Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)

In many applications of state estimation, the process noise is colored; this case is addressed by applying the standard Kalman filter (KF) to dynamics that are augmented with the coloring dynamics. The present paper considers the case where the coloring dynamics are unknown, which renders the estimates obtained from the standard approach suboptimal. To address this problem, the present paper proposes an adaptive technique based on the principle that, if the measurement noise is white, then the innovations sequence is white if and only if the process noise is white. Leveraging this fact, an Innovations-Whitening Adaptive Kalman Filter (IWAKF) is developed, which learns the process-noise coloring online. By embedding an unknown coloring filter in a state-augmentation framework, IWAKF adapts its parameters by minimizing the empirical autocorrelation of the innovations, thereby driving them toward whiteness and restoring near-optimality without prior knowledge of the coloring dynamics.
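The whiteness criterion described in this abstract can be illustrated with a minimal sketch, assuming a scalar innovations sequence. The names `sample_autocorrelation` and `whiteness_cost` are hypothetical, and the cost shown (sum of squared normalized autocorrelations at nonzero lags) is one plausible choice, not necessarily the objective used by IWAKF.

```python
import random

def sample_autocorrelation(e, max_lag):
    """Normalized sample autocorrelation of a scalar sequence for lags 1..max_lag."""
    n = len(e)
    mean = sum(e) / n
    c = [x - mean for x in e]
    r0 = sum(x * x for x in c) / n  # lag-0 autocovariance (variance)
    if r0 == 0.0:
        raise ValueError("sequence has zero variance")
    return [
        sum(c[t] * c[t + k] for t in range(n - k)) / (n * r0)
        for k in range(1, max_lag + 1)
    ]

def whiteness_cost(e, max_lag=10):
    """Hypothetical cost: close to zero when the sequence is approximately white."""
    return sum(r * r for r in sample_autocorrelation(e, max_lag))

# Synthetic stand-ins for innovations: white noise vs. AR(1)-colored noise.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]
colored = []
x = 0.0
for w in white:
    x = 0.9 * x + w  # first-order coloring filter (an assumption for the demo)
    colored.append(x)

cost_white = whiteness_cost(white)
cost_colored = whiteness_cost(colored)
```

An adaptive filter in this spirit would tune the parameters of its assumed coloring model to reduce such a cost; the tuning loop itself is outside this sketch.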
[407] arXiv:2604.21645 [pdf, html, other]: Title: Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask

Ashley N. Abraham, Andrew Strelzoff, Haley R. Dozier, Althea C. Henslee, Mark A. Chappell

Comments: To be published in the CSCE 2022 proceedings

Subjects: Machine Learning (cs.LG); Performance (cs.PF)

Large-scale Nearest Neighbor (NN) search, though widely utilized in the similarity search field, remains challenged by the computational limitations inherent in processing large-scale data. To decrease the computational expense, Approximate Nearest Neighbor (ANN) search is often used in applications that do not require exact similarity search and can instead rely on an approximation. Product Quantization (PQ) is a memory-efficient ANN method that is effective for clustering datasets of all sizes. Clustering large-scale, high-dimensional data is computationally expensive in both memory cost and execution time. This work focuses on a unique way to divide and conquer large-scale data in Python using PQ, Inverted Indexing, and Dask, combining the results without compromising accuracy and reducing computational requirements to the level required for medium-scale data.
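For readers unfamiliar with Product Quantization, the core encode/decode step can be sketched as follows. This is a single-machine, pure-Python illustration under stated assumptions: it uses plain Lloyd's k-means with a deterministic initialization, it does not use Dask or an inverted index, and all function names (`pq_train`, `pq_encode`, `pq_decode`) are hypothetical rather than taken from the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, b in enumerate(buckets):
            if b:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(b) for col in zip(*b)]
    return centroids

def split(v, m):
    """Split vector v into m equal-length subvectors."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]

def pq_train(data, m, k):
    """Learn one codebook of k centroids for each of the m subspaces."""
    return [kmeans([split(v, m)[i] for v in data], k) for i in range(m)]

def pq_encode(v, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    return [
        min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
        for sub, cb in zip(split(v, len(codebooks)), codebooks)
    ]

def pq_decode(code, codebooks):
    """Approximate reconstruction: concatenate the selected centroids."""
    out = []
    for c, cb in zip(code, codebooks):
        out.extend(cb[c])
    return out

rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = pq_train(data, m=4, k=16)
code = pq_encode(data[0], codebooks)
approx = pq_decode(code, codebooks)
```

Each 8-dimensional vector is stored as 4 small integers, which is where the memory saving comes from. Distributing the training over data partitions, as the abstract describes, would sit on top of a step like this.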
[408] arXiv:2604.21648 [pdf, html, other]: Title: Optimal transfer operators for nonsymmetric two-grid methods

Reinhard Nabben, Ludwig Rooch

Comments: 27 pages

Subjects: Numerical Analysis (math.NA)

Algebraic Multigrid (AMG) methods have been proven to be effective solvers for large-scale linear algebraic systems $Ax = b$ with Hermitian positive definite (HPD) matrix $A$. For such problems the convergence in the $A$-norm is well understood, but for nonsymmetric indefinite systems fewer results exist. Recently, convergence results for more general $B$-norms induced by certain HPD matrices were established. There, orthogonal projections built by compatible transfer operators are used. Here, we present a theoretical framework for the convergence of nonsymmetric algebraic two-grid methods for arbitrary $B$-inner products and induced $B$-norms which naturally includes the HPD case and all recent results for the nonsymmetric case. For this purpose, we consider two different two-grid error operators with the first one being the natural generalization of the error operator in the HPD case. The second operator has been studied before and is simpler, but requires the additional assumption of normality in some inner product of the smoothing step $M^{-1}A$ to achieve convergence. We prove new convergence results, generalize some previous results and explain the differences and similarities of both operators together with the necessity of the normality. Moreover, we establish optimal compatible interpolation and restriction operators for both two-grid methods that minimize the error norm.
[409] arXiv:2604.21649 [pdf, html, other]: Title: GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion

Qizhuo Xie, Yunhui Liu, Yu Xing, Qianzi Hou, Xudong Jin, Tao Zheng, Tieke He

Comments: ACL 2026

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based approaches attempt to align these modalities, they typically treat quantization as flat numerical compression, resulting in semantically entangled codes that fail to mirror the hierarchical nature of human reasoning. In this paper, we propose GS-Quant, a novel framework that generates semantically coherent and structurally stratified discrete codes for KG entities. Unlike prior methods, GS-Quant is grounded in the insight that entity representations should follow a linguistic coarse-to-fine logic. We introduce a Granular Semantic Enhancement module that injects hierarchical knowledge into the codebook, ensuring that earlier codes capture global semantic categories while later codes refine specific attributes. Furthermore, a Generative Structural Reconstruction module imposes causal dependencies on the code sequence, transforming independent discrete units into structured semantic descriptors. By expanding the LLM vocabulary with these learned codes, we enable the model to reason over graph structures isomorphically to natural language generation. Experimental results demonstrate that GS-Quant significantly outperforms existing text-based and embedding-based baselines. Our code is publicly available at this https URL.
[410] arXiv:2604.21651 [pdf, other]: Title: Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach

Eli Gildish, Michael Grebshtein, Igor Makienko

Comments: 16 pages, 8 figures, the use of deep learning in IoT devices

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Denoising of periodic signals and accurate waveform estimation are core tasks across many signal processing domains, including speech, music, medical diagnostics, radio, and sonar. Although deep learning methods have recently shown performance improvements over classical approaches, they require substantial computational resources and are usually trained separately for each signal observation. This study proposes a computationally efficient method based on DCNN and Re-sampling, termed R-DCNN, designed for operation under strict power and resource constraints. The approach targets signals with varying fundamental frequencies and requires only a single observation for training. It generalizes to additional signals via a lightweight resampling step that aligns time scales in signals with different frequencies to re-use the same network weights. Despite its low computational complexity, R-DCNN achieves performance comparable to state-of-the-art classical methods, such as autoregressive (AR)-based techniques, as well as conventional DCNNs trained individually for each observation. This combination of efficiency and performance makes the proposed method particularly well suited for deployment in resource-constrained environments without sacrificing denoising or estimation accuracy.
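The idea of aligning time scales by resampling can be sketched as below. This is a minimal illustration that assumes the fundamental frequency of each signal is already known and uses linear interpolation; the function name `resample_to_reference` and the choice of interpolation are assumptions, and the resampling step in R-DCNN may differ.

```python
import math

def resample_to_reference(x, f0, f_ref):
    """Resample x (fundamental f0) so that, at the same sample rate,
    its fundamental appears at f_ref. Uses linear interpolation."""
    if f0 <= 0 or f_ref <= 0:
        raise ValueError("frequencies must be positive")
    ratio = f_ref / f0  # step, in input samples, per output sample
    m = int((len(x) - 1) / ratio) + 1  # number of output samples
    out = []
    for j in range(m):
        pos = j * ratio
        i = int(pos)
        frac = pos - i
        if i + 1 < len(x):
            out.append((1.0 - frac) * x[i] + frac * x[i + 1])
        else:
            out.append(x[i])  # last sample: nothing to interpolate toward
    return out

# A 50 Hz sine sampled at 1 kHz, rescaled to look like a 100 Hz sine.
fs = 1000
x = [math.sin(2 * math.pi * 50 * n / fs) for n in range(1000)]
y = resample_to_reference(x, f0=50.0, f_ref=100.0)
```

After this step every signal has the same number of samples per period, so a network trained on one observation sees inputs on a common time scale.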
[411] arXiv:2604.21654 [pdf, html, other]: Title: Causal Disentanglement for Full-Reference Image Quality Assessment

Zhen Zhang, Jielei Chu, Tian Zhang, Weide Liu, Fengmao Lv, Tianrui Li, Jun Cheng, Yuming Fang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Existing deep network-based full-reference image quality assessment (FR-IQA) models typically work by performing pairwise comparisons of deep features from the reference and distorted images. In this paper, we approach this problem from a different perspective and propose a novel FR-IQA paradigm based on causal inference and decoupled representation learning. Unlike typical feature comparison-based FR-IQA models, our approach formulates degradation estimation as a causal disentanglement process guided by intervention on latent representations. We first decouple degradation and content representations by exploiting the content invariance between the reference and distorted images. Second, inspired by the human visual masking effect, we design a masking module to model the causal relationship between image content and degradation features, thereby extracting content-influenced degradation features from distorted images. Finally, quality scores are predicted from these degradation features using either supervised regression or label-free dimensionality reduction. Extensive experiments demonstrate that our method achieves highly competitive performance on standard IQA benchmarks across fully supervised, few-label, and label-free settings. Furthermore, we evaluate the approach on diverse non-standard natural image domains with scarce data, including underwater, radiographic, medical, neutron, and screen-content images. Benefiting from its ability to perform scenario-specific training and prediction without labeled IQA data, our method exhibits superior cross-domain generalization compared to existing training-free FR-IQA models.
[412] arXiv:2604.21657 [pdf, html, other]: Title: Transferable SCF-Acceleration through Solver-Aligned Initialization Learning

Eike S. Eberhard, Viktor Kotsev, Timm Güthle, Stephan Günnemann

Subjects: Machine Learning (cs.LG)

Machine learning methods that predict initial guesses from molecular geometry can reduce this cost, but matrix-prediction models fail when extrapolating to larger molecules, degrading rather than accelerating convergence [Liu et al. 2025]. We show that this failure is a supervision problem, not an extrapolation problem: models trained on ground-state targets fit those targets well out of distribution, yet produce initial guesses that slow convergence. Solver-Aligned Initialization Learning (SAIL) resolves this for both Hamiltonian and density matrix models by differentiating through the SCF solver end-to-end. We introduce the Effective Relative Iteration Count (ERIC), a correction to the commonly used RIC that accounts for hidden Fock-build overhead. On QM40, containing molecules up to 4$\times$ larger than the training distribution, SAIL reduces ERIC by 37% (PBE), 33% (SCAN), and 27% (B3LYP), more than doubling the previous state-of-the-art reduction on B3LYP (10%). On QMugs molecules 10$\times$ the training size, SAIL delivers a 1.25$\times$ wall-time speedup at the hybrid level of theory, extending ML SCF acceleration to large drug-like molecules.
[413] arXiv:2604.21667 [pdf, html, other]: Title: Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales

Olufunke O. Sarumi, Charles Welch, Daniel Braun

Comments: Accepted at 5th NLPerspectives Workshop

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Beyond exploring disaggregated labels for modeling perspectives, annotator rationales provide fine-grained signals of individual perspectives. In this work, we propose a framework for jointly modeling annotator-specific label prediction and corresponding explanations, fine-tuned on the annotators' provided rationales. Using a dataset with disaggregated natural language inference (NLI) annotations and annotator-provided explanations, we condition predictions on both annotator identity and demographic metadata through a representation-level User Passport mechanism. We further introduce two explainer architectures: a post-hoc prompt-based explainer and a prefixed bridge explainer that transfers annotator-conditioned classifier representations directly into a generative model. This design enables explanation generation aligned with individual annotator perspectives. Our results show that incorporating explanation modeling substantially improves predictive performance over a baseline annotator-aware classifier, with the prefixed bridge approach achieving more stable label alignment and higher semantic consistency, while the post-hoc approach yields stronger lexical similarity. These findings indicate that modeling explanations as expressions of fine-grained perspective provides a richer and more faithful representation of disagreement. The proposed approaches advance perspectivist modeling by integrating annotator-specific rationales into both predictive and generative components.
[414] arXiv:2604.21668 [pdf, html, other]: Title: Encoder-Free Human Motion Understanding via Structured Motion Descriptions

Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-language alignment through dedicated encoders that project motion features into the LLM's embedding space, remaining constrained by cross-modal representation and alignment. Inspired by biomechanical analysis, where joint angles and body-part kinematics have long served as a precise descriptive language for human movement, we propose Structured Motion Description (SMD), a rule-based, deterministic approach that converts joint position sequences into structured natural language descriptions of joint angles, body part movements, and global trajectory. By representing motion as text, SMD enables LLMs to apply their pretrained knowledge of body parts, spatial directions, and movement semantics directly to motion reasoning, without requiring learned encoders or alignment modules. We show that this approach goes beyond state-of-the-art results on both motion question answering (66.7% on BABEL-QA, 90.1% on HuMMan-QA) and motion captioning (R@1 of 0.584, CIDEr of 53.16 on HumanML3D), surpassing all prior methods. SMD additionally offers practical benefits: the same text input works across different LLMs with only lightweight LoRA adaptation (validated on 8 LLMs from 6 model families), and its human-readable representation enables interpretable attention analysis over motion descriptions. Code, data, and pretrained LoRA adapters are available at this https URL.
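A rule-based conversion from joint positions to text can be illustrated with a minimal sketch for a single joint. The angle computation is standard vector geometry; the thresholds, the wording of the output, and the function names `joint_angle_deg` and `describe_elbow` are hypothetical and are not taken from the SMD release.

```python
import math

def joint_angle_deg(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("degenerate limb: coincident joints")
    cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))

def describe_elbow(shoulder, elbow, wrist):
    """Turn one frame of three joint positions into a short text description."""
    angle = joint_angle_deg(shoulder, elbow, wrist)
    if angle > 150:      # hypothetical threshold
        state = "nearly straight"
    elif angle > 90:     # hypothetical threshold
        state = "slightly bent"
    else:
        state = "strongly bent"
    return f"elbow angle {angle:.0f} degrees ({state})"

# A fully extended arm: shoulder, elbow, and wrist are collinear.
text = describe_elbow((0.0, 2.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0))
```

Because the mapping is deterministic, the same motion always yields the same text, which is what lets the description be fed to different LLMs without a learned encoder.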
[415] arXiv:2604.21673 [pdf, html, other]: Title: Hierarchical Joint Source-Channel Coding with Constrained Information Leakage

Yiqi Chen, Holger Boche, Marc Geitz

Subjects: Information Theory (cs.IT)

This paper studies hierarchical joint source-channel coding with distortion constraints and an information-leakage constraint on the first-phase reconstruction. The receiver's access to the data varies and is evaluated by the quality of the side information. Because of channel-capacity limitations or for the sake of system efficiency, the encoder may send some additional information in Phase 1 that can only be decoded in Phase 2 with higher-quality side information. While this can optimize the overall performance, the additional information causes excessive information leakage. We provide general inner and outer bounds for the conditions under which a given distortion-leakage tuple $(D_1,D_2,L)$ is achievable, together with a capacity-achieving condition.
[416] arXiv:2604.21675 [pdf, html, other]: Title: Counterfactual Multi-task Learning for Delayed Conversion Modeling in E-commerce Sales Pre-Promotion

Xin Song, Kaiyuan Li, Jinxin Hu

Comments: 6 pages, accepted by 49th International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR'26)

Subjects: Information Retrieval (cs.IR)

Sales promotions, as short-term incentives to stimulate product purchases, play a pivotal role in modern e-commerce marketing strategies. During promotional events, user behavior patterns exhibit distinct characteristics compared to regular periods. In the pre-promotion phase, users typically engage in product search and browsing without immediate purchases, adding items to carts in anticipation of promotional discounts. This behavior leads to delayed conversions, resulting in significantly lower conversion rates (CVR) before the promotion day. Although existing research has made progress in CVR prediction for promotion days using historical data, it largely overlooks the critical pre-promotion period. Moreover, although delayed feedback modeling has been extensively studied, current approaches fail to account for the unique distribution shifts in conversion behavior before promotional events, where delayed conversions predominantly occur on the promotion day rather than over continuous time windows. To address these limitations, we propose the Counterfactual Multi-task Delayed Conversion Model (CM-DCM), which leverages historical pre-promotion data to enhance CVR prediction for both delayed and direct conversions. Our model incorporates three key innovations: (i) a multi-task architecture that jointly models direct and delayed conversions using historical pre-promotion data; (ii) a personalized user behavior gating module to mitigate data sparsity issues during brief pre-promotion periods; (iii) a counterfactual causal approach to model the transition probability from add-to-cart (ATC) to delayed conversion. Extensive experiments demonstrate that CM-DCM outperforms baselines in pre-promotion scenarios. Online A/B tests during major promotional events showed significant improvements in advertising revenue, delayed conversion GMV, and overall GMV, validating the effectiveness of our approach.
[417] arXiv:2604.21677 [pdf, other]: Title: Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions

Eylon E. Krause

Comments: 26 pages, 4 figures, 16 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

The choice of activation function plays a crucial role in the optimization and performance of deep neural networks. While the Rectified Linear Unit (ReLU) remains the dominant choice due to its simplicity and effectiveness, its lack of smoothness may hinder gradient-based optimization in deep architectures. In this work we propose a family of $C^{2N}$-smooth activation functions whose gate follows a log-logistic CDF, achieving ReLU-like performance with purely rational arithmetic. We introduce three variants: GEM (the base family), E-GEM (an $\epsilon$-parameterized generalization enabling arbitrary $L^p$-approximation of ReLU), and SE-GEM (a piecewise variant eliminating dead neurons with $C^{2N}$ junction smoothness). An $N$-ablation study establishes $N=1$ as optimal for standard-depth networks, reducing the GELU deficit on CIFAR-100 + ResNet-56 from 6.10% to 2.12%. The smoothness parameter $N$ further reveals a CNN-transformer tradeoff: $N=1$ is preferred for deep CNNs, while $N=2$ is preferred for transformers. On MNIST, E-GEM ties the best baseline (99.23%). On CIFAR-10 + ResNet-56, SE-GEM ($\epsilon=10^{-4}$) surpasses GELU (92.51% vs 92.44%) -- the first GEM-family activation to outperform GELU. On CIFAR-100 + ResNet-56, E-GEM reduces the GELU deficit from 6.10% (GEM $N=2$) to just 0.62%. On GPT-2 (124M), GEM achieves the lowest perplexity (72.57 vs 73.76 for GELU), with GEM $N=1$ also beating GELU (73.32). On BERT-small, E-GEM ($\epsilon=10$) achieves the best validation loss (6.656) across all activations. The $\epsilon$-parameterization reveals a scale-dependent optimum: small $\epsilon$ ($10^{-4}$--$10^{-6}$) for deep CNNs and larger transformers, with the special case of small transformers (BERT-small) benefiting from large $\epsilon$ ($\epsilon=10$) due to its limited depth and unconstrained gradients.
[418] arXiv:2604.21679 [pdf, html, other]: Title: A Sociotechnical, Practitioner-Centered Approach to Technology Adoption in Cybersecurity Operations: An LLM Case

Francis Hahn, Mohd Mamoon, Alexandru G. Bardas, Michael Collins, Daniel Lende, Xinming Ou, S. Raj Rajagopalan

Comments: 16 Pages and 6 figures (5 diagrams, 1 table)

Subjects: Cryptography and Security (cs.CR)

Technology for security operations centers (SOCs) has a storied history of slow adoption due to concerns about trust and reliability. These concerns are amplified with artificial intelligence, particularly large language models (LLMs), which exhibit issues such as hallucinations and inconsistent outputs. To assess whether LLM-based tools can improve SOC efficiency, we embedded two PhD researchers within a multinational company SOC for six months of ethnographic fieldwork. We identified recurring challenges, such as repetitive tasks, fragmented/unclear data, and tooling bottlenecks, and collaborated directly with practitioners to develop LLM companion tools aligned with their operational needs. Iterative refinement reduced workflow disruption and improved interpretability, leading from skepticism to sustained adoption. Ethnographic analysis indicates that this shift was enabled by our sociotechnical co-creation process consistent with Nonaka's SECI model. This framework explains the common challenges in traditional SOC technology adoption, including workflow misalignment, rigidity against evolving threats and internal requirements, and stagnation over time. Our findings show that the co-creation approach can overcome these old barriers and create a new paradigm for creating usable technology for cybersecurity operations.
[419] arXiv:2604.21681 [pdf, html, other]: Title: Sapiens2

Rawal Khirodkar, He Wen, Julieta Martinez, Yuan Dong, Su Zhaoen, Shunsuke Saito

Comments: Accepted to ICLR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. Our model sizes range from 0.4 to 5 billion parameters, with native 1K resolution and hierarchical variants that support 4K. Sapiens2 substantially improves over its predecessor in both pretraining and post-training. First, to learn features that capture low-level details (for dense prediction) and high-level semantics (for zero-shot or few-label settings), we combine masked image reconstruction with self-distilled contrastive objectives. Our evaluations show that this unified pretraining objective is better suited for a wider range of downstream tasks. Second, along the data axis, we pretrain on a curated dataset of 1 billion high-quality human images and improve the quality and quantity of task annotations. Third, architecturally, we incorporate advances from frontier models that enable longer training schedules with improved stability. Our 4K models adopt windowed attention to reason over longer spatial context and are pretrained with 2K output resolution. Sapiens2 sets a new state-of-the-art and improves over the first generation on pose (+4 mAP), body-part segmentation (+24.3 mIoU), normal estimation (45.6% lower angular error) and extends to new tasks such as pointmap and albedo estimation. Code: this https URL
[420] arXiv:2604.21685 [pdf, html, other]: Title: Resilience Revisited: A Multidimensional Framework Derived from Realistic Attack Scenarios

Isaac Ortega Romero, Ioannis Zografopoulos

Comments: 6 pages, IEEE SmartGridComm 2026

Subjects: Systems and Control (eess.SY)

Power systems are increasingly vulnerable to high-impact, low-probability (HILP) events, including coordinated cyberattacks targeting inverter-based resources. Existing resilience frameworks rely on single-dimensional metrics that fail to capture cross-dimensional coupling effects, underestimating real system degradation under multi-vector attack conditions. This study proposes a Multidimensional Resilience Index (MDRI) that decomposes power system degradation into five interacting dimensions: physical, operational, digital-cyber, climatic, and regulatory, explicitly separating independent and coupled contributions via a calibrated multiplicative interaction term. The framework is validated on the IEEE 39-bus system under two attack scenarios derived from the December 2025 cyberattack on the Polish energy infrastructure. MDRI results show that multi-vector attacks produce degradation exceeding linear expectations by a factor of 5.6, with simultaneous dimensional failures contributing an additional 60.6% through endogenous coupling, and exogenous factors amplifying it by an additional 84%.
[421] arXiv:2604.21686 [pdf, html, other]: Title: WorldMark: A Unified Benchmark Suite for Interactive Video World Models

Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as trajectory error, aesthetic scores, and VLM-based judgments, but none supplies the standardized test conditions -- identical scenes, identical action sequences, and a unified control interface -- needed to make those metrics comparable across models with heterogeneous inputs. We introduce WorldMark, the first benchmark that provides such a common playing field for interactive Image-to-Video world models. WorldMark contributes: (1) a unified action-mapping layer that translates a shared WASD-style action vocabulary into each model's native control format, enabling apples-to-apples comparison across six major models on identical scenes and trajectories; (2) a hierarchical test suite of 500 evaluation cases covering first- and third-person viewpoints, photorealistic and stylized scenes, and three difficulty tiers from Easy to Hard spanning 20-60s; and (3) a modular evaluation toolkit for Visual Quality, Control Alignment, and World Consistency, designed so that researchers can reuse our standardized inputs while plugging in their own metrics as the field evolves. We will release all data, evaluation code, and model outputs to facilitate future research. Beyond offline metrics, we launch World Model Arena (this http URL), an online platform where anyone can pit leading world models against each other in side-by-side battles and watch the live leaderboard.
[422] arXiv:2604.21688 [pdf, html, other]: Title: A-IC3: Learning-Guided Adaptive Inductive Generalization for Hardware Model Checking

Xiaofeng Zhou, Guangyu Hu, Hongce Zhang, Wei Zhang

Subjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG)

The IC3 algorithm represents the state-of-the-art (SOTA) hardware model checking technique, owing to its robust performance and scalability. A significant body of research has focused on enhancing the solving efficiency of the IC3 algorithm, with particular attention to the inductive generalization process: a critical phase wherein the algorithm seeks to generalize a counterexample to inductiveness (CTI), which typically is a state leading to a bad state, into a broader set of states. This inductive generalization is a primary source of clauses in IC3 and thus plays a pivotal role in determining the overall effectiveness of the algorithm.
Despite its importance, existing approaches often rely on fixed inductive generalization strategies, overlooking the dynamic and context-sensitive nature of the verification environment in which spurious counterexamples arise. This rigidity can limit the quality of generated clauses and, consequently, the performance of IC3.
To address this limitation, we propose a lightweight machine-learning-based framework that dynamically selects appropriate inductive generalization strategies in response to the evolving verification context. Specifically, we employ a multi-armed bandit (MAB) algorithm to adaptively choose inductive generalization strategies based on real-time feedback from the verification process. The agent is updated by evaluating the quality of generalization outcomes, thereby refining its strategy selection over time.
Empirical evaluation on a benchmark suite comprising 914 instances, primarily drawn from the latest HWMCC collection, demonstrates the efficacy of our approach. When implemented on the state-of-the-art model checker rIC3, our method solves 26 to 50 more cases than the baselines and improves the PAR-2 score by 194.72 to 389.29.
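The bandit-based strategy selection can be illustrated with a minimal sketch. The abstract does not say which multi-armed bandit algorithm is used, so UCB1 is assumed here purely for illustration; the arms stand in for generalization strategies, and the reward model (a Bernoulli "good generalization" signal with made-up probabilities) is hypothetical.

```python
import math
import random

class UCB1:
    """UCB1 bandit: choose the arm with the highest optimistic reward estimate."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of times each arm was played
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Play every arm once before applying the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.values[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental update of the mean reward for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Three hypothetical strategies with different (made-up) success rates.
rng = random.Random(0)
success_prob = [0.2, 0.5, 0.8]
bandit = UCB1(len(success_prob))
for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if rng.random() < success_prob[arm] else 0.0
    bandit.update(arm, reward)
```

In a model checker, `select` would be called before each generalization attempt and `update` afterwards with a score for the resulting clause; how that score is defined is specific to the paper and is not reproduced here.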
[423] arXiv:2604.21689 [pdf, html, other]: Title: StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

Kwan Yun, Changmin Lee, Ayeong Jeong, Youngseo Kim, Seungmi Lee, Junyong Noh

Comments: SIGGRAPH 2026 / ACM TOG. Project page at this https URL

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under stylization. They often mistake changes in texture or color palette for identity drift or fail to detect geometric exaggerations. This reveals the lack of a style-agnostic framework to evaluate and supervise identity consistency across varying styles and strengths. To address this gap, we introduce StyleID, a human perception-aware dataset and evaluation framework for facial identity under stylization. StyleID comprises two datasets: (i) StyleBench-H, a benchmark that captures human same-different verification judgments across diffusion- and flow-matching-based stylization at multiple style strengths, and (ii) StyleBench-S, a supervision set derived from psychometric recognition-strength curves obtained through controlled two-alternative forced-choice (2AFC) experiments. Leveraging StyleBench-S, we fine-tune existing semantic encoders to align their similarity orderings with human perception across styles and strengths. Experiments demonstrate that our calibrated models yield significantly higher correlation with human judgments and enhanced robustness for out-of-domain, artist-drawn portraits. All of our datasets, code, and pretrained models are publicly available at this https URL
[424] arXiv:2604.21690 [pdf, html, other]: Title: Evaluating Post-hoc Explanations of the Transformer-based Genome Language Model DNABERT-2

Isabel Kurth, Paulo Yanez Sarmiento, Bernhard Y. Renard

Comments: Accepted at the 4th World Conference on Explainable Artificial Intelligence, XAI-2026

Subjects: Machine Learning (cs.LG)

Explaining deep neural network predictions on genome sequences enables biological insight and hypothesis generation, which are often of greater interest than predictive performance alone. While explanations of convolutional neural networks (CNNs) have been shown to capture relevant patterns in genome sequences, it is unclear whether this transfers to more expressive Transformer-based genome language models (gLMs). To answer this question, we adapt AttnLRP, an extension of layer-wise relevance propagation to the attention mechanism, and apply it to the state-of-the-art gLM DNABERT-2. In doing so, we propose strategies to transfer explanations from token to nucleotide level. We evaluate the adaptation of AttnLRP on genomic datasets using multiple metrics. Further, we provide an extensive comparison between the explanations of DNABERT-2 and a baseline CNN. Our results demonstrate that AttnLRP yields reliable explanations corresponding to known biological patterns. Hence, like CNNs, gLMs can also help derive biological insights. This work contributes to the explainability of gLMs and addresses the comparability of relevance attributions across different architectures.
[425] arXiv:2604.21693 [pdf, html, other]: Title: SLAM as a Stochastic Control Problem with Partial Information: Optimal Solutions and Rigorous Approximations

Ilir Gusija, Fady Alajaji, Serdar Yüksel

Subjects: Robotics (cs.RO); Optimization and Control (math.OC)

Simultaneous localization and mapping (SLAM) is a foundational state estimation problem in robotics in which a robot accurately constructs a map of its environment while also localizing itself within this construction. We study the active SLAM problem through the lens of optimal stochastic control, thereby recasting it as a decision-making problem under partial information. After reviewing several commonly studied models, we present a general stochastic control formulation of active SLAM together with a rigorous treatment of motion, sensing, and map representation. We introduce a new exploration stage cost that encodes the geometry of the state when evaluating information-gathering actions. This formulation, constructed as a nonstandard partially observable Markov decision process (POMDP), is then analyzed to derive rigorously justified approximate solutions that are near-optimal. To enable this analysis, the associated regularity conditions are studied under general assumptions that apply to a wide range of robotics applications. For a particular case, we conduct an extensive numerical study in which standard learning algorithms are used to learn near-optimal policies.
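Decision-making under partial information is usually phrased in terms of a belief, that is, a probability distribution over hidden states that is updated after each action and observation. The sketch below shows one step of a discrete Bayes filter for a fixed action. The two-state model and all numbers are illustrative assumptions, not the formulation analyzed in the paper.

```python
def belief_update(belief, transition, obs_model, observation):
    """One prediction-correction step of a discrete Bayes filter.

    belief[s]          : prior probability of state s
    transition[s][s2]  : P(s2 | s) under the action just taken
    obs_model[s2][z]   : P(z | s2)
    observation        : index z of the observation received
    """
    n = len(belief)
    # Prediction: push the prior through the motion model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight by the observation likelihood, then normalize.
    unnormalized = [predicted[s2] * obs_model[s2][observation]
                    for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical two-cell world with a noisy sensor.
toy_transition = [[0.9, 0.1], [0.1, 0.9]]
toy_obs_model = [[0.8, 0.2], [0.3, 0.7]]
toy_posterior = belief_update([0.5, 0.5], toy_transition, toy_obs_model, 0)
```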
[426] arXiv:2604.21694 [pdf, html, other]: Title: Efficient Logic Gate Networks for Video Copy Detection

Katarzyna Fojcik

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Video copy detection requires robust similarity estimation under diverse visual distortions while operating at very large scale. Although deep neural networks achieve strong performance, their computational cost and descriptor size limit practical deployment in high-throughput systems. In this work, we propose a video copy detection framework based on differentiable Logic Gate Networks (LGNs), which replace conventional floating-point feature extractors with compact, logic-based representations. Our approach combines aggressive frame miniaturization, binary preprocessing, and a trainable LGN embedding model that learns both logical operations and interconnections. After training, the model can be discretized into a purely Boolean circuit, enabling extremely fast and memory-efficient inference. We systematically evaluate different similarity strategies, binarization schemes, and LGN architectures across multiple dataset folds and difficulty levels. Experimental results demonstrate that LGN-based models achieve competitive or superior accuracy and ranking performance compared to prior models, while producing descriptors several orders of magnitude smaller and delivering inference speeds exceeding 11k samples per second. These findings indicate that logic-based models offer a promising alternative for scalable and resource-efficient video copy detection.
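Once a model has been discretized into a Boolean circuit, its descriptors are bit strings, and comparing them can be very cheap. The abstract mentions several similarity strategies without naming them, so the sketch below assumes one plausible choice, normalized Hamming similarity on integer-packed descriptors. The descriptor width, the function names, and the example values are hypothetical.

```python
def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Fraction of matching bits between two n_bits-wide binary descriptors."""
    mask = (1 << n_bits) - 1
    differing = bin((a ^ b) & mask).count("1")
    return 1.0 - differing / n_bits


def rank_candidates(query: int, database: dict, n_bits: int) -> list:
    """Return database keys sorted from most to least similar to the query."""
    return sorted(
        database,
        key=lambda name: hamming_similarity(query, database[name], n_bits),
        reverse=True,
    )


# Hypothetical 8-bit descriptors: one near-duplicate and one unrelated clip.
toy_query = 0b10110010
toy_database = {"copy": 0b10110011, "other": 0b01001101}
toy_ranking = rank_candidates(toy_query, toy_database, n_bits=8)
```

XOR followed by a bit count is the only arithmetic involved, which suggests why compact Boolean descriptors suit high-throughput retrieval.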
[427] arXiv:2604.21696 [pdf, html, other]: Title: Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

Liane Vogel, Kavitha Srinivas, Niharika D'Souza, Sola Shirai, Oktie Hassanzadeh, Horst Samulowitz

Subjects: Machine Learning (cs.LG); Databases (cs.DB)

Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search and table-based prediction. Despite the growing number of such models, it remains unclear which approach works best in practice, as existing methods are often evaluated under task-specific settings that make direct comparison difficult. To address this, we introduce TEmBed, the Tabular Embedding Test Bed, a comprehensive benchmark for systematically evaluating tabular embeddings across four representation levels: cell, row, column, and table. Evaluating a diverse set of tabular representation learning models, we show that which model to use depends on the task and representation level. Our results offer practical guidance for selecting tabular embeddings in real-world applications and lay the groundwork for developing more general-purpose tabular representation models.
[428] arXiv:2604.21697 [pdf, html, other]: Title: Structure-preserving approximation for non-isothermal phase-field models in melt flow

Aaron Brunk, Dennis Höhn

Subjects: Numerical Analysis (math.NA)

This work presents a conforming finite-element scheme for the non-isothermal Allen-Cahn-Navier-Stokes system, incorporating periodic, closed, and thermal boundary conditions. The system comprises the incompressible Navier-Stokes equations coupled with the non-isothermal Allen-Cahn equation, which includes a non-conserved phase-field equation and a temperature equation. The proposed numerical scheme preserves entropy production exactly and maintains total energy conservation up to a negative numerical dissipation. Convergence tests in both space and time are conducted, and representative examples are provided to demonstrate the scheme's effectiveness.
[429] arXiv:2604.21698 [pdf, html, other]: Title: Fixation Sequences as Time Series: A Topological Approach to Dyslexia Detection

Marius Huber, David R. Reich, Lena A. Jäger

Comments: ETRA 2026

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Algebraic Topology (math.AT)

Persistent homology, a method from topological data analysis, extracts robust, multi-scale features from data. It produces stable representations of time series by applying varying thresholds to their values (a process known as a filtration). We develop novel filtrations for time series and introduce topological methods for the analysis of eye-tracking data, by interpreting fixation sequences as time series, and constructing "hybrid models" that combine topological features with traditional statistical features. We empirically evaluate our method by applying it to the task of dyslexia detection from eye-tracking-while-reading data using the Copenhagen Corpus, which contains scanpaths from dyslexic and non-dyslexic L1 and L2 readers. Our hybrid models outperform existing approaches that rely solely on traditional features, showing that persistent homology captures complementary information encoded in fixation sequences. The strength of these topological features is further underscored by their achieving performance comparable to established baseline methods. Importantly, our proposed filtrations outperform existing ones.
[430] arXiv:2604.21699 [pdf, html, other]: Title: Can Large Language Models Assist the Comprehension of ROS2 Software Architectures?

Laura Duits, Bouazza El Moutaouakil, Ivano Malavolta

Subjects: Software Engineering (cs.SE)

Context. The most used development framework for robotics software is ROS2. ROS2 architectures are highly complex, with thousands of components communicating in a decentralized fashion. Goal. We aim to evaluate how LLMs can assist in the comprehension of factual information about the architecture of ROS2 systems. Method. We conduct a controlled experiment where we administer 1,230 prompts to 9 LLMs containing architecturally-relevant questions about 3 ROS2 systems with incremental size. We provide a generic algorithm that systematically generates architecturally-relevant questions for a ROS2 system. Then, we (i) assess the accuracy of the answers of the LLMs against a ground truth established via running and monitoring the 3 ROS2 systems and (ii) qualitatively analyse the explanations provided by the LLMs. Results. Almost all questions are answered correctly across all LLMs (mean=98.22%). gemini-2.5-pro performs best (100% accuracy across all prompts and systems), followed by o3 (99.77%), and gemini-2.5-flash (99.72%); the lowest-performing LLM is gpt-4.1 (95%). Only 300/1,230 prompts are incorrectly answered, of which 249 are about the most complex system. The coherence scores in the LLMs' explanations range from 0.394 for "service references" to 0.762 for "communication path". The mean perplexity varies significantly across models, with chatgpt-4o achieving the lowest score (19.6) and o4-mini the highest (103.6). Conclusions. There is great potential in the usage of LLMs to aid ROS2 developers in comprehending non-trivial aspects of the software architecture of their systems. Nevertheless, developers should be aware of the intrinsic limitations and different performances of the LLMs and take those into account when using them.
[431] arXiv:2604.21700 [pdf, html, other]: Title: Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang, Haijun Wang, Ting Liu

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor attacks against LLMs. However, existing methods suffer from three key shortcomings: explicit trigger patterns that compromise naturalness, unreliable injection of attacker-specified payloads in long-form generation, and incompletely specified threat models that obscure how backdoors are delivered and activated in practice. To address these gaps, we present BadStyle, a complete backdoor attack framework and pipeline. BadStyle leverages an LLM as a poisoned sample generator to construct natural and stealthy poisoned samples that carry imperceptible style-level triggers while preserving semantics and fluency. To stabilize payload injection during fine-tuning, we design an auxiliary target loss that reinforces the attacker-specified target content in responses to poisoned inputs and penalizes its emergence in benign responses. We further ground the attack in a realistic threat model and systematically evaluate BadStyle under both prompt-induced and PEFT-based injection strategies. Extensive experiments across seven victim LLMs, including LLaMA, Phi, DeepSeek, and GPT series, demonstrate that BadStyle achieves high attack success rates (ASRs) while maintaining strong stealthiness. The proposed auxiliary target loss substantially improves the stability of backdoor activation, yielding an average ASR improvement of around 30% across style-level triggers. Even in downstream deployment scenarios unknown during injection, the implanted backdoor remains effective. Moreover, BadStyle consistently evades representative input-level defenses and bypasses output-level defenses through simple camouflage.
[432] arXiv:2604.21704 [pdf, html, other]: Title: Segment convergence for super-linear stochastic functional differential equations by the truncated Euler-Maruyama method

Shounian Deng, Weiyin Fei, Banban Shi

Subjects: Numerical Analysis (math.NA)

Most existing literature focuses on pointwise convergence (i.e., convergence at a fixed time point) of numerical solutions for stochastic functional differential equations (SFDEs). In contrast, this paper investigates the strong segment convergence (i.e., the strong order of convergence of the numerical segment process). For SFDEs with super-linear drift and diffusion coefficients, we employ the explicit truncated Euler-Maruyama (EM) scheme. First, we establish the uniform moment boundedness of the truncated EM solution over a finite time interval. Second, we derive the $L^2$-error estimate between the continuous numerical segment and the step numerical segment. Finally, we prove the strong convergence order of the numerical segment generated by the truncated EM. The results can be used to analyze invariant measures and ergodicity of the numerical segment, and have important applications in practical problems such as path-dependent financial options. We also provide a numerical example to support the theoretical results.
[433] arXiv:2604.21706 [pdf, html, other]: Title: Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers

Bernard Muller, Antonio Armando Ortiz Barrañón, LaVonne Roberts

Comments: Submitted to Computer Speech & Language

Subjects: Computation and Language (cs.CL)

We previously introduced a training-free method for dysarthria severity assessment based on d-prime separability of phonological feature subspaces in frozen self-supervised speech representations, validated on 890 speakers across 5 languages with HuBERT-base. Here, we scale the analysis to 3,374 speakers from 25 datasets spanning 12 languages and 5 aetiologies (Parkinson's disease, cerebral palsy, ALS, Down syndrome, and stroke), plus healthy controls, using 6 SSL backbones. We report three findings. First, aetiology-specific degradation profiles are distinguishable at the group level: 10 of 13 features yield large effect sizes (epsilon-squared > 0.14, Holm-corrected p < 0.001), with Parkinson's disease separable from the articulatory execution group at Cohen's d = 0.83; individual-level classification remains limited (22.6% macro F1). Second, profiles show cross-lingual profile-shape stability: cosine similarity of 5-dimensional consonant d-prime profiles exceeds 0.95 across the languages available for each aetiology. Absolute d-prime magnitudes are not cross-lingually calibrated, so the method supports language-independent phenotyping of degradation patterns but requires within-corpus calibration for absolute severity interpretation. Third, the method is architecture-independent: all 6 backbones produce monotonic severity gradients with inter-model agreement exceeding rho = 0.77. Fixed-token d-prime estimation preserves the severity correlation (rho = -0.733 at 200 tokens per class), confirming that the signal is not a token-count artefact. These results support phonological subspace analysis as a robust, training-free framework for aetiology-aware dysarthria characterisation, with evidence of cross-lingual profile-shape stability and cross-backbone robustness in the represented sample.
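For readers unfamiliar with the two statistics named above, the sketch below computes a d-prime separability score between two sets of scalar values and the cosine similarity between two profile vectors. It uses the standard signal-detection form of d-prime with a pooled standard deviation. How the paper projects speech representations onto phonological subspaces is not specified here, and the toy values are assumptions for illustration.

```python
import math
from statistics import mean, variance


def d_prime(class_a, class_b):
    """Separability of two samples: |mean difference| / pooled standard deviation."""
    pooled_sd = math.sqrt((variance(class_a) + variance(class_b)) / 2.0)
    return abs(mean(class_a) - mean(class_b)) / pooled_sd


def cosine_similarity(u, v):
    """Cosine of the angle between two profile vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical projections of two phonological classes onto one feature axis.
toy_voiced = [1.0, 2.0, 3.0]
toy_voiceless = [4.0, 5.0, 6.0]
toy_separability = d_prime(toy_voiced, toy_voiceless)

# Two hypothetical d-prime profiles with the same shape but different scale.
toy_shape_match = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The second example illustrates the distinction drawn in the abstract: profiles that differ only in overall magnitude have a cosine similarity of essentially 1, so shape can be stable across languages even when absolute d-prime values are not calibrated.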
[434] arXiv:2604.21707 [pdf, html, other]: Title: Effects of Swarm Size Variability on Operator Workload

William Hunt, Aleksandra Landowska, Horia A. Maior, Sarvapali D. Ramchurn, Mohammad Soorati

Subjects: Robotics (cs.RO)

Real-world deployments of human--swarm teams depend on balancing operator workload to leverage human strengths without inducing overload. A key challenge is that swarm size is often dynamic: robots may join or leave the mission due to failures or redeployment, causing abrupt workload fluctuations. Understanding how such changes affect human workload and performance is critical for robust human--swarm interaction design. This paper investigates how the magnitude and direction of changes in swarm size influence operator workload. Drawing on the concept of workload history, we test three hypotheses: (1) workload remains elevated following decreases in swarm size, (2) small increases are more manageable than large jumps, and (3) sufficiently large changes override these effects by inducing a cognitive reset. We conducted two studies (N = 34) using a monitoring task with simulated drone swarms of varying sizes. By varying the swarm size between episodes, we measured perceived workload relative to swarm size changes. Results show that objective performance is largely unaffected by small changes in swarm size, while subjective workload is sensitive to both change direction and magnitude. Small increases preserve lower workload, whereas small decreases leave workload elevated, indicating workload residue; large changes in either direction attenuate these effects, suggesting a reset response. These findings offer actionable guidance for managing swarm-size transitions to support operator workload in dynamic human--swarm systems.
[435] arXiv:2604.21711 [pdf, html, other]: Title: Fairness under uncertainty in sequential decisions

Michelle Seng Ah Lee, Kirtan Padh, David Watson, Niki Kilbertus, Jatinder Singh

Comments: ACM Conference on Fairness, Accountability, and Transparency, 2026

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Fair machine learning (ML) methods help identify and mitigate the risk that algorithms encode or automate social injustices. Algorithmic approaches alone cannot resolve structural inequalities, but they can support socio-technical decision systems by surfacing discriminatory biases, clarifying trade-offs, and enabling governance. Although fairness is well studied in supervised learning, many real ML applications are online and sequential, with prior decisions informing future ones. Each decision is taken under uncertainty due to unobserved counterfactuals and finite samples, with dire consequences for under-represented groups, systematically under-observed due to historical exclusion and selective feedback. A bank cannot know whether a denied loan would have been repaid, and may have less data on marginalized populations.
This paper introduces a taxonomy of uncertainty in sequential decision-making -- model, feedback, and prediction uncertainty -- providing shared vocabulary for assessing systems where uncertainty is unevenly distributed across groups. We formalize model and feedback uncertainty via counterfactual logic and reinforcement learning, and illustrate harms to decision makers (unrealized gains/losses) and subjects (compounding exclusion, reduced access) of policies that ignore the unobserved space. Algorithmic examples show it is possible to reduce outcome variance for disadvantaged groups while preserving institutional objectives (e.g. expected utility). Experiments on data simulated with varying bias show how unequal uncertainty and selective feedback produce disparities, and how uncertainty-aware exploration alters fairness metrics. The framework equips practitioners to diagnose, audit, and govern fairness risks. Where uncertainty drives unfairness rather than incidental noise, accounting for it is essential to fair and effective decision-making.
[436] arXiv:2604.21712 [pdf, html, other]: Title: Discriminative-Generative Synergy for Occlusion Robust 3D Human Mesh Recovery

Yang Liu, Zhiyong Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

3D human mesh recovery from monocular RGB images aims to estimate anatomically plausible 3D human models for downstream applications, but remains challenging under partial or severe occlusions. Regression-based methods are efficient yet often produce implausible or inaccurate results in unconstrained scenarios, while diffusion-based methods provide strong generative priors for occluded regions but may weaken fidelity to rare poses due to over-reliance on generation. To address these limitations, we propose a brain-inspired synergistic framework that integrates the discriminative power of vision transformers with the generative capability of conditional diffusion models. Specifically, the ViT-based pathway extracts deterministic visual cues from visible regions, while the diffusion-based pathway synthesizes structurally coherent human body representations. To effectively bridge the two pathways, we design a diverse-consistent feature learning module to align discriminative features with generative priors, and a cross-attention multi-level fusion mechanism to enable bidirectional interaction across semantic levels. Experiments on standard benchmarks demonstrate that our method achieves superior performance on key metrics and shows strong robustness in complex real-world scenarios.
[437] arXiv:2604.21713 [pdf, html, other]: Title: Unlocking the Power of Critical Factors for 3D Visual Geometry Estimation

Guangkai Xu, Hua Geng, Huanyi Zheng, Songyi Yin, Yanlong Sun, Hao Chen, Chunhua Shen

Comments: Accepted to CVPR 2026. GitHub Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Feed-forward visual geometry estimation has recently made rapid progress. However, an important gap remains: multi-frame models usually produce better cross-frame consistency, yet they often underperform strong per-frame methods on single-frame accuracy. This observation motivates our systematic investigation into the critical factors driving model performance through rigorous ablation studies, which reveals several key insights: 1) Scaling up data diversity and quality unlocks further performance gains even in state-of-the-art visual geometry estimation methods; 2) Commonly adopted confidence-aware loss and gradient-based loss mechanisms may unintentionally hinder performance; 3) Joint supervision through both per-sequence and per-frame alignment improves results, while local region alignment surprisingly degrades performance. Furthermore, we introduce two enhancements to integrate the advantages of optimization-based methods and high-resolution inputs: a consistency loss function that enforces alignment between depth maps, camera parameters, and point maps, and an efficient architectural design that leverages high-resolution information. We integrate these designs into CARVE, a resolution-enhanced model for feed-forward visual geometry estimation. Experiments on point cloud reconstruction, video depth estimation, and camera pose/intrinsic estimation show that CARVE achieves strong and robust performance across diverse benchmarks.
[438] arXiv:2604.21714 [pdf, html, other]: Title: High-Fidelity 3D Gaussian Human Reconstruction via Region-Aware Initialization and Geometric Priors

Yang Liu, Zhiyong Zhang

Subjects: Multimedia (cs.MM)

Real-time, high-fidelity 3D human reconstruction from RGB images is essential for interactive applications such as virtual reality and gaming, yet remains challenging due to the complex non-rigid deformations of dynamic human bodies. Although 3D Gaussian Splatting enables efficient rendering, existing methods struggle to capture fine geometric details and often produce artifacts such as fused fingers and over-smoothed faces. Moreover, conventional spatial-field-based dynamic modeling faces a trade-off between reconstruction fidelity and GPU memory consumption. To address these issues, we propose a novel 3D Gaussian human reconstruction framework that combines region-aware initialization with rich geometric priors. Specifically, we leverage the expressive SMPL-X model to initialize both 3D Gaussians and skinning weights, providing a robust geometric foundation for precise reconstruction. We further introduce a region-aware density initialization strategy and a geometry-aware multi-scale hash encoding module to improve local detail recovery while maintaining computational efficiency. Experiments on PeopleSnapshot and GalaBasketball show that our method achieves superior reconstruction quality and finer detail preservation under complex motions, while maintaining real-time rendering speed.
[439] arXiv:2604.21715 [pdf, html, other]: Title: Automated LTL Specification Generation from Industrial Aerospace Requirements

Zhi Ma, Xiao Liang, Cheng Wen, Rui Chen, Bin Gu, Shengchao Qin, Cong Tian, Mengfei Yang

Subjects: Software Engineering (cs.SE)

In the development and verification of safety-critical aerospace software, Linear Temporal Logic (LTL) has been widely used to specify complex system properties derived from requirements. However, a significant gap remains in industrial practice: translating natural language (NL) requirements into formal LTL properties is a labor-intensive and error-prone process that requires rare expertise in both aerospace control engineering and formal methods. While recent NL-to-LTL tools (e.g., NL2SPEC, NL2TL, NL2LTL) are capable of automating parts of this process, they often fail on real requirement documents in industrial settings, due to complex domain terminology or implicit temporal and logical structure. To address these challenges, we present AeroReq2LTL, a framework that automates LTL property generation for aerospace requirements using large language models (LLMs), with two key industrial innovations: (i) a data dictionary that normalizes technical jargon into precise atomic propositions; and (ii) a template-based requirement language that makes temporal cues and logical relations explicit before translation. On a real aerospace dataset, AeroReq2LTL achieves 85% precision and 88% recall in LTL generation, and its outputs can be directly consumed by existing verification tools.
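To make the roles of the data dictionary and the requirement templates concrete, the sketch below maps a templated requirement to an LTL string. It replaces the LLM translation step with a fixed lookup purely for illustration. The dictionary entries, template names, and output syntax are hypothetical and are not taken from AeroReq2LTL.

```python
# Hypothetical data dictionary: domain phrase -> atomic proposition.
DATA_DICTIONARY = {
    "engine overheat is detected": "engine_overheat",
    "the fuel valve is closed": "fuel_valve_closed",
}


def to_ltl(template: str, trigger: str, response: str) -> str:
    """Render a templated requirement as an LTL formula string.

    G = always, F = eventually, X = next. Template names are assumptions.
    """
    p = DATA_DICTIONARY[trigger]
    q = DATA_DICTIONARY[response]
    if template == "always_eventually":
        return f"G ({p} -> F {q})"
    if template == "always_next":
        return f"G ({p} -> X {q})"
    if template == "never_together":
        return f"G !({p} & {q})"
    raise ValueError(f"unknown template: {template}")


toy_formula = to_ltl(
    "always_eventually",
    "engine overheat is detected",
    "the fuel valve is closed",
)
```

Separating jargon normalization from temporal structure means each part can be reviewed on its own: a domain engineer can check the dictionary, and a formal-methods engineer can check the templates.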
[440] arXiv:2604.21716 [pdf, html, other]: Title: From If-Statements to ML Pipelines: Revisiting Bias in Code-Generation

Minh Duc Bui, Xenia Heilmann, Mattia Cerrato, Manuel Mager, Katharina von der Wense

Comments: Accepted to ACL 2026 Findings

Subjects: Computation and Language (cs.CL); Software Engineering (cs.SE)

Prior work evaluates code generation bias primarily through simple conditional statements, which represent only a narrow slice of real-world programming and reveal solely overt, explicitly encoded bias. We demonstrate that this approach dramatically underestimates bias in practice by examining a more realistic task: generating machine learning (ML) pipelines. Testing both code-specialized and general-instruction large language models, we find that generated pipelines exhibit significant bias during feature selection. Sensitive attributes appear in 87.7% of cases on average, despite models demonstrably excluding irrelevant features (e.g., including "race" while dropping "favorite color" for credit scoring). This bias is substantially more prevalent than that captured by conditional statements, where sensitive attributes appear in only 59.2% of cases. These findings are robust across prompt mitigation strategies, varying numbers of attributes, and different pipeline difficulty levels. Our results challenge simple conditionals as valid proxies for bias evaluation and suggest current benchmarks underestimate bias risk in practical deployments.
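The headline metric, the share of generated pipelines whose feature selection includes a sensitive attribute, can be sketched as follows. The list of sensitive attributes and the toy feature sets are assumptions for illustration. The paper's actual attribute list and extraction procedure are not given in the abstract.

```python
# Hypothetical set of sensitive attribute names (lowercase).
SENSITIVE_ATTRIBUTES = {"race", "gender", "age"}


def sensitive_inclusion_rate(pipelines) -> float:
    """Fraction of pipelines whose selected features include a sensitive attribute.

    pipelines: list of feature-name lists, one per generated pipeline.
    """
    if not pipelines:
        raise ValueError("need at least one pipeline")
    flagged = sum(
        1 for features in pipelines
        if SENSITIVE_ATTRIBUTES & {name.lower() for name in features}
    )
    return flagged / len(pipelines)


# Toy feature selections extracted from four generated pipelines.
toy_pipelines = [
    ["income", "race"],
    ["income", "debt"],
    ["Age", "income"],
    ["gender"],
]
toy_rate = sensitive_inclusion_rate(toy_pipelines)
```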
[441] arXiv:2604.21717 [pdf, html, other]: Title: Monte Carlo PDE Solvers for Nonlinear Radiative Boundary Conditions

Anchang Bao, Enya Shen, Jianmin Wang

Subjects: Graphics (cs.GR)

Monte Carlo PDE solvers have become increasingly popular for solving heat-related partial differential equations in geometry processing and computer graphics due to their robustness in handling complex geometries. While existing methods can handle Dirichlet, Neumann, and linear Robin boundary conditions, nonlinear boundary conditions arising from thermal radiation remain largely unexplored.
In this paper, we introduce a Picard-style fixed-point iteration framework that enables Monte Carlo PDE solvers to handle nonlinear radiative boundary conditions. While strict theoretical convergence is not generally guaranteed, our method remains stable and empirically convergent with a properly chosen relaxation coefficient. Even with imprecise initial boundary estimates, it progressively approaches the correct solution. Compared to standard linearization strategies, the proposed approach achieves significantly higher accuracy.
To further address the high variance inherent in Monte Carlo estimators, we propose a heteroscedastic regression-based denoising technique specifically designed for on-boundary solution estimates, filling a gap left by prior variance reduction methods that focus solely on interior points. We validate our approach through extensive evaluations on synthetic benchmarks and demonstrate its effectiveness on practical heat radiation simulations with complex geometries.
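The role of the relaxation coefficient can be seen in a deterministic toy problem, with no Monte Carlo sampling involved. The sketch below assumes steady 1D conduction in a rod with a fixed temperature at one end and a radiative condition at the other, so the temperature profile is linear and the boundary condition reduces to a scalar fixed-point equation for the end temperature. All parameter values and function names are illustrative assumptions, not the paper's solver.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def boundary_map(t_end, t_fixed, t_env, length, conductivity, emissivity):
    """Fixed-point map for the radiating end of a rod in steady state.

    Balances conduction, -k (t_end - t_fixed) / L, against radiation,
    emissivity * SIGMA * (t_end**4 - t_env**4), solved for t_end.
    """
    radiated = emissivity * SIGMA * (t_end ** 4 - t_env ** 4)
    return t_fixed - (length / conductivity) * radiated


def relaxed_picard(t_fixed, t_env, length, conductivity, emissivity,
                   omega, tol=1e-9, max_iter=200):
    """Picard iteration with relaxation: t <- (1 - omega) t + omega G(t)."""
    t_end = t_env  # deliberately imprecise initial boundary estimate
    for iteration in range(1, max_iter + 1):
        target = boundary_map(t_end, t_fixed, t_env,
                              length, conductivity, emissivity)
        t_new = (1.0 - omega) * t_end + omega * target
        if abs(t_new - t_end) < tol:
            return t_new, iteration
        t_end = t_new
    raise RuntimeError("relaxed Picard iteration did not converge")


toy_temperature, toy_iterations = relaxed_picard(
    t_fixed=400.0, t_env=300.0, length=1.0,
    conductivity=1.0, emissivity=1.0, omega=0.1,
)
```

With these toy values the derivative of the map at the solution has magnitude of roughly 7, so the unrelaxed update (omega = 1) is not a contraction there, while omega = 0.1 is. This mirrors, in miniature, the abstract's remark that stability depends on a properly chosen relaxation coefficient.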
[442] arXiv:2604.21718 [pdf, other]: Title: Building a Precise Video Language with Human-AI Oversight

Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan

Comments: CVPR 2026 Highlight. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)

Video-language models (VLMs) learn to reason about the dynamic visual world through natural language. We introduce a suite of open datasets, benchmarks, and recipes for scalable oversight that enable precise video captioning. First, we define a structured specification for describing subjects, scenes, motion, spatial, and camera dynamics, grounded by hundreds of carefully defined visual primitives developed with professional video creators such as filmmakers. Next, to curate high-quality captions, we introduce CHAI (Critique-based Human-AI Oversight), a framework where trained experts critique and revise model-generated pre-captions into improved post-captions. This division of labor improves annotation accuracy and efficiency by offloading text generation to models, allowing humans to better focus on verification. Additionally, these critiques and preferences between pre- and post-captions provide rich supervision for improving open-source models (Qwen3-VL) on caption generation, reward modeling, and critique generation through SFT, DPO, and inference-time scaling. Our ablations show that critique quality in precision, recall, and constructiveness, ensured by our oversight framework, directly governs downstream performance. With modest expert supervision, the resulting model outperforms closed-source models such as Gemini-3.1-Pro. Finally, we apply our approach to re-caption large-scale professional videos (e.g., films, commercials, games) and fine-tune video generation models such as Wan to better follow detailed prompts of up to 400 words, achieving finer control over cinematography including camera motion, angle, lens, focus, point of view, and framing. Our results show that precise specification and human-AI oversight are key to professional-level video understanding and generation. Data and code are available on our project page: this https URL
[443] arXiv:2604.21719 [pdf, html, other]: Title: A superconvergent hybridizable discontinuous Galerkin method for the convective Cahn--Hilliard equation

Gang Chen, Daozhi Han, Jiaxuan Liu, Yangwen Zhang, Dujin Zuo

Subjects: Numerical Analysis (math.NA)

We propose a hybridizable discontinuous Galerkin (HDG) method combined with convex-concave splitting for the temporal discretization of the convective Cahn-Hilliard equation. The convection term is discretized explicitly without stabilization, yielding three key advantages: (1) unconditional stability, (2) preservation of the optimal convergence rate for piecewise constant approximations, and (3) a symmetric system after local elimination, enabling efficient solution via minimal residual methods. We establish optimal convergence rates in the $L^2$ norm for both the scalar and flux variables for any polynomial degree $k \geq 0$. To achieve optimal $L^2$-norm estimates, we introduce a specialized HDG elliptic projection operator and analyze its approximation properties. Within the HDG framework, local elimination is employed to reduce the degrees of freedom associated with the globally coupled unknowns, and the scalar variables exhibit superconvergence. Finally, numerical experiments validate the theoretical convergence rates and demonstrate the effectiveness of the proposed method.
[444] arXiv:2604.21724 [pdf, html, other]: Title: Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling

Yilong Chen, Yanxi Xie, Zitian Gao, He Xin, Yihao Xiao, Renbiao Liu, Haoming Luo, Yifan Luo, Zhengmao Ye, Tingwen Liu, Xin Zhao, Ran Tao, Bryan Dai

Comments: 29 pages, 9 figures, 13 tables

Subjects: Computation and Language (cs.CL)

Large token-indexed lookup tables provide a compute-decoupled scaling path, but their practical gains are often limited by poor parameter efficiency and rapid memory growth. We attribute these limitations to Zipfian under-training of the long tail, heterogeneous demand across layers, and "slot collapse" that produces redundant embeddings. To address this, we propose X-GRAM, a frequency-aware dynamic token-injection framework. X-GRAM employs hybrid hashing and alias mixing to compress the tail while preserving head capacity, and refines retrieved vectors via normalized SwiGLU ShortConv to extract diverse local n-gram features. These signals are integrated into attention value streams and inter-layer residuals using depth-aware gating, effectively aligning static memory with dynamic context. This design introduces a memory-centric scaling axis that decouples model capacity from FLOPs. Extensive evaluations at the 0.73B and 1.15B scales show that X-GRAM improves average accuracy by as much as 4.4 points over the vanilla backbone and 3.2 points over strong retrieval baselines, while using substantially smaller tables in the 50% configuration. Overall, by decoupling capacity from compute through efficient memory management, X-GRAM offers a scalable and practical paradigm for future memory-augmented architectures. Code is available at this https URL.
[445] arXiv:2604.21725 [pdf, html, other]: Title: AEL: Agent Evolving Learning for Open-Ended Environments

Wujiang Xu, Jiaojiao Han, Minghao Guo, Kai Mei, Xi Zhu, Han Zhang, Dimitris N. Metaxas

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to remember but how to use what has been remembered, including which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself must change. We introduce Agent Evolving Learning (AEL), a two-timescale framework that addresses this obstacle. At the fast timescale, a Thompson Sampling bandit learns which memory retrieval policy to apply at each episode; at the slow timescale, LLM-driven reflection diagnoses failure patterns and injects causal insights into the agent's decision prompt, giving it an interpretive frame for the evidence it retrieves. On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), AEL achieves a Sharpe ratio of 2.13$\pm$0.47, outperforming five published self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches. A nine-variant ablation reveals a "less is more" pattern: memory and reflection together produce a 58% cumulative improvement over the stateless baseline, yet every additional mechanism we test (planner evolution, per-tool selection, cold-start initialization, skill extraction, and three credit assignment methods) degrades performance. This demonstrates that the bottleneck in agent self-improvement is self-diagnosing how to use experience rather than adding architectural complexity. Code and data: this https URL.
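The fast-timescale component described above, a Thompson Sampling bandit choosing among memory retrieval policies, can be illustrated with a minimal Beta-Bernoulli sketch. This is not the authors' implementation: the class name, the policy names, and the binary success signal are all assumptions made for illustration, and the paper's actual reward definition may differ.

```python
import random


class ThompsonPolicySelector:
    """Beta-Bernoulli Thompson Sampling over a fixed set of retrieval policies.

    Hypothetical helper for illustration; names are not taken from the paper.
    """

    def __init__(self, policies, seed=0):
        self.policies = list(policies)
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per arm, stored as [alpha, beta].
        self.params = {p: [1.0, 1.0] for p in self.policies}

    def select(self):
        # Draw one posterior sample per arm and pick the arm with the largest draw.
        samples = {p: self.rng.betavariate(a, b) for p, (a, b) in self.params.items()}
        return max(samples, key=samples.get)

    def update(self, policy, success):
        # Assumption: each episode yields a boolean success signal.
        if success:
            self.params[policy][0] += 1.0
        else:
            self.params[policy][1] += 1.0

    def posterior_mean(self, policy):
        a, b = self.params[policy]
        return a / (a + b)
```

In use, the agent would call `select()` at the start of each episode, run the episode with the chosen retrieval policy, and then call `update()` with the outcome, so that policies with better observed outcomes are sampled more often over time.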
[446] arXiv:2604.21728 [pdf, html, other]: Title: Ramen: Robust Test-Time Adaptation of Vision-Language Models with Active Sample Selection

Wenxuan Bao, Yanjun Zhao, Xiyuan Yang, Jingrui He

Comments: Accepted by CVPR 2026 (Findings Track)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Pretrained vision-language models such as CLIP exhibit strong zero-shot generalization but remain sensitive to distribution shifts. Test-time adaptation adapts models during inference without access to source data or target labels, offering a practical way to handle such shifts. However, existing methods typically assume that test samples come from a single, consistent domain, while in practice, test data often include samples from mixed domains with distinct characteristics. Consequently, their performance degrades under mixed-domain settings. To address this, we present Ramen, a framework for robust test-time adaptation through active sample selection. For each incoming test sample, Ramen retrieves a customized batch of relevant samples from previously seen data based on two criteria: domain consistency, which ensures that adaptation focuses on data from similar domains, and prediction balance, which mitigates adaptation bias caused by skewed predictions. To improve efficiency, Ramen employs an embedding-gradient cache that stores the embeddings and sample-level gradients of past test images. The stored embeddings are used to retrieve relevant samples, and the corresponding gradients are aggregated for model updates, eliminating the need for any additional forward or backward passes. Our theoretical analysis provides insight into why the proposed adaptation mechanism is effective under mixed-domain shifts. Experiments on multiple image corruption and domain-shift benchmarks demonstrate that Ramen achieves strong and consistent performance, offering robust and efficient adaptation in complex mixed-domain scenarios. Our code is available at this https URL.
[447] arXiv:2604.21729 [pdf, html, other]: Title: A Compact Peristaltic Pump Based on Magneto-Elastic Hysteresis with Single Pneumatic Control

Minjo Park, Metin Sitti

Comments: 5 pages

Subjects: Robotics (cs.RO)

Pumping fluids is fundamental to a wide range of industrial, environmental, and biomedical applications. Among various pumping mechanisms, peristaltic pumps enable efficient and safe fluid transport by deforming an elastic tube without direct contact with the working fluid. Although previous studies have introduced mechanical, pneumatic, or magnetic actuations to drive membrane deformation, these approaches often lead to complex pump architectures and control schemes. In this study, we present a soft membrane pump that achieves peristaltic motion through a single pneumatic input combined with an embedded passive magnet. The actuation mechanism and system dynamics were analyzed and simplified through modeling. Numerical simulations were conducted to predict the internal fluid flow, and the magneto-elastic hysteresis behavior observed in the simulations was successfully validated by experiments with a proof-of-concept prototype.
[448] arXiv:2604.21733 [pdf, other]: Title: Enabling and Inhibitory Pathways of University Students' Willingness to Disclose AI Use: A Cognition-Affect-Conation Perspective

Yiran Du, Huimin He

Subjects: Artificial Intelligence (cs.AI)

The increasing integration of artificial intelligence (AI) in higher education has raised important questions regarding students' transparency in reporting AI-assisted work. This study investigates the psychological mechanisms underlying university students' willingness to disclose AI use by applying the Cognition--Affect--Conation (CAC) framework. A sequential explanatory mixed-methods design was employed. In the quantitative phase, survey data were collected from 546 university students and analysed using structural equation modelling to examine the relationships among cognitive perceptions, affective responses, and disclosure intention. In the qualitative phase, semi-structured interviews with 22 students were conducted to further interpret the quantitative findings. The results indicate that psychological safety significantly increases students' willingness to disclose AI use and is positively shaped by perceived fairness, perceived teacher support, and perceived organisational support. Conversely, evaluation apprehension reduces disclosure intention and psychological safety, and is strengthened by perceived stigma, perceived uncertainty, and privacy concern. Qualitative findings further reveal that institutional clarity and supportive instructional practices encourage openness, whereas policy ambiguity and fear of negative evaluation often lead students to adopt cautious or strategic disclosure practices. Overall, the study highlights the dual role of enabling and inhibitory psychological mechanisms in shaping AI-use disclosure and underscores the importance of supportive institutional environments and clear guidance for promoting responsible AI transparency in higher education.
[449] arXiv:2604.21740 [pdf, html, other]: Title: A Case Study in Recovery of Drones using Discrete-Event Systems

Liam P. Burns, Dayse M. Cavalcanti, Felipe G. Cabral, Max H. de Queiroz, Melissa Greeff, Publio M. M. Lima, Karen Rudie

Comments: Accepted for publication at WODES 2026; final version will appear in IEEE Xplore

Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

Discrete-event systems and supervisory control theory provide a rigorous framework for specifying correct-by-construction behavior. However, their practical application to swarm robotics remains largely underexplored. In this paper, we investigate a topological recovery method based on discrete-event systems within a swarm robotics context. We propose a hybrid architecture that combines a high-level discrete-event systems supervisor with a low-level continuous controller, allowing lost drones to safely recover from fault or attack events and re-enter a controlled region. The method is demonstrated using ten simulated UAVs in the py-bullet-drones framework. We show recovery performance across four distinct scenarios, each with varying initial state estimates. Additionally, we introduce a secondary recovery supervisor that manages the regrouping process for a drone after it has re-entered the operational region.
[450] arXiv:2604.21741 [pdf, html, other]: Title: Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training

Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yanjiang Guo, Jiaming Liu, Shanghang Zhang, Jianyu Chen, Yichen Zhu

Comments: Project Page: this https URL

Subjects: Robotics (cs.RO)

Post-training is essential for turning pretrained generalist robot policies into reliable task-specific controllers, but existing human-in-the-loop pipelines remain tied to physical execution: each correction requires robot time, scene setup, resets, and operator supervision in the real world. Meanwhile, action-conditioned world models have been studied mainly for imagination, synthetic data generation, and policy evaluation. We propose Human-in-the-World-Model (Hi-WM), a post-training framework that uses a learned world model as a reusable corrective substrate for failure-targeted policy improvement. A policy is first rolled out in closed loop inside the world model; when the rollout becomes incorrect or failure-prone, a human intervenes directly in the model to provide short corrective actions. Hi-WM caches intermediate states and supports rollback and branching, allowing a single failure state to be reused for multiple corrective continuations and yielding dense supervision around behaviors that the base policy handles poorly. The resulting corrective trajectories are then added back to the training set for post-training. We evaluate Hi-WM on three real-world manipulation tasks spanning both rigid and deformable object interaction, and on two policy backbones. Hi-WM improves real-world success by 37.9 points on average over the base policy and by 19.0 points over a world-model closed-loop baseline, while world-model evaluation correlates strongly with real-world performance (r = 0.953). These results suggest that world models can serve not only as generators or evaluators, but also as effective corrective substrates for scalable robot post-training.
[451] arXiv:2604.21743 [pdf, html, other]: Title: Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement

Dat To-Thanh, Nghia Nguyen-Trong, Hoang Vo, Hieu Bui-Minh, Tinh-Anh Nguyen-Nhu

Comments: 10 pages, 3 figures. Accepted at the Mobile AI (MAI) 2026 Workshop at CVPR 2026

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Image enhancement models for mobile devices often struggle to balance high output quality with the fast processing speeds required by mobile hardware. While recent deep learning models can enhance low-quality mobile photos into high-quality images, their performance is often degraded when converted to lower-precision formats for actual use on mobile phones. To address this training-deployment mismatch, we propose an efficient image enhancement model designed specifically for mobile deployment. Our approach uses a hierarchical network architecture with gated encoder blocks and multiscale refinement to preserve fine-grained visual features. Moreover, we incorporate Quantization-Aware Training (QAT) to simulate the effects of low-precision representation during the training process. This allows the network to adapt and prevents the typical drop in quality seen with standard post-training quantization (PTQ). Experimental results demonstrate that the proposed method produces high-fidelity visual output while maintaining the low computational overhead needed for practical use on standard mobile devices. The code will be available at this https URL.
[452] arXiv:2604.21744 [pdf, html, other]: Title: Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development

Magnus Palmblad, Jared M. Ragland, Benjamin A. Neely

Comments: Letter, 9 pages, 1 table

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)

The capabilities of AI-assisted coding are progressing at breakneck speed. Chat-based vibe coding has evolved into fully fledged AI-assisted, agentic software development using agent scaffolds where the human developer creates a plan that agentic AIs implement. One current trend is utilizing documents beyond this plan document, such as project and method-scoped documents. Here we propose GROUNDING$.$md, a community-governed, field-scoped epistemic grounding document, using mass spectrometry-based proteomics as an example. This explicit field-scoped grounding document encodes Hard Constraints (non-negotiable validity invariants empirically required for scientific correctness) and Convention Parameters (community-agreed defaults) that override all other contexts to enforce validity, regardless of what the user prompts. In practice, this will empower a non-domain expert to generate code, tools, and software that have best practices baked in at the ground level, providing confidence to the software developer but also to those reviewing or using the final product. Undoubtedly it is easier to have agentic AIs adhere to guidelines than humans, and this opportunity allows for organizations to develop epistemic grounding documents in such a way as to keep domain experts in the loop in a future of democratized generation of bespoke software solutions.
[453] arXiv:2604.21745 [pdf, other]: Title: A Brief History of Fréchet Distances: From Curves and Probability Laws to FID

Yuli Wu

Comments: 108 pages

Subjects: General Literature (cs.GL); Probability (math.PR)

This note provides a chronological account of Fréchet distances, starting with Maurice Fréchet's 1906 doctoral thesis on distances in abstract sets and tracing the Fréchet distance between polygonal curves and its algorithmic computation in the 1990s. It then continues with his 1957 paper on a coupling-based distance between probability laws with a brief glimpse of Wasserstein distance and optimal transport. We further attempt to draw connections between the distributional, coupling-based facet of Fréchet distances on probability laws and the geometric facet on curves. The note ends with a modern use case, the Fréchet Inception Distance (FID) in the era of deep generative model evaluation, interpretable as the Wasserstein-2 distance between multivariate Gaussians in a learned feature space. An appendix includes TeXified faithful English translations of Fréchet's 1906 thesis and 1957 paper, and Lévy's 1950 note for reader convenience.
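For reference, the closed form behind the FID interpretation mentioned above is the standard squared Wasserstein-2 distance between two multivariate Gaussians. The symbols $\mu_i$ and $\Sigma_i$ (feature means and covariances of the two image sets) are notation introduced here, not taken from the abstract.

```latex
\[
W_2^2\bigl(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\bigr)
  = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\!\Bigl(\Sigma_1 + \Sigma_2
      - 2\bigl(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\Bigr)
\]
```

When $\Sigma_1$ and $\Sigma_2$ commute, the cross term reduces to $2(\Sigma_1\Sigma_2)^{1/2}$, and in one dimension the expression becomes $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$.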
[454] arXiv:2604.21746 [pdf, html, other]: Title: Less Is More: Measuring How LLM Involvement affects Chatbot Accuracy in Static Analysis

Krishna Narasimhan

Subjects: Software Engineering (cs.SE)

Large language models are increasingly used to make static analysis tools accessible through natural language, yet existing systems differ in how much they delegate to the LLM without treating the degree of delegation as an independent variable. We compare three architectures along a spectrum of LLM involvement for translating natural language to Joern's query language CPGQL: direct query generation (Approach 1), generation of a schema-constrained JSON intermediate representation (Approach 2), and tool-augmented agentic generation (Approach 3). These are evaluated on a benchmark of 20 code analysis tasks across three complexity tiers, using four open-weight models in a 2$\times$2 design (two model families $\times$ two scales), each with three repetitions. The structured intermediate representation (Approach 2) achieves the highest result match rates, outperforming direct generation by 15--25 percentage points on large models and surpassing the agentic approach despite the latter consuming 8$\times$ more tokens. The benefit of structured intermediates is most pronounced for large models; for small models, schema compliance becomes the bottleneck. These findings suggest that in formally structured domains, constraining the LLM's output to a well-typed intermediate representation and delegating query construction to deterministic code yields better results than either unconstrained generation or iterative tool use.
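The idea of a schema-constrained JSON intermediate representation followed by deterministic query construction can be sketched roughly as follows. The two-field schema (`node`, `name`) and the function `build_query` are hypothetical and far simpler than anything the paper evaluates; the emitted string follows the common Joern pattern `cpg.<nodeType>.name("...").l`, and whether it matches the paper's generated queries is an assumption.

```python
import json

# Hypothetical, deliberately tiny schema: which node types the IR may request.
ALLOWED_NODES = {"method", "call"}


def build_query(ir_json):
    """Validate a JSON IR emitted by an LLM and build a query string deterministically."""
    ir = json.loads(ir_json)
    node = ir.get("node")
    name = ir.get("name")
    if node not in ALLOWED_NODES:
        raise ValueError(f"unsupported node type: {node!r}")
    if not isinstance(name, str) or not name:
        raise ValueError("'name' must be a non-empty string")
    # json.dumps produces a double-quoted, escaped string literal for the name.
    return f"cpg.{node}.name({json.dumps(name)}).l"
```

The point of the design is that the LLM only fills in typed fields, so malformed requests fail validation early instead of producing a syntactically invalid query.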
[455] arXiv:2604.21748 [pdf, html, other]: Title: StructMem: Structured Memory for Long-Horizon Behavior in LLMs

Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, Shumin Deng

Comments: Accepted by ACL 2026 main conference

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based memory enables structured reasoning at the cost of expensive and fragile construction. To address these issues, we propose StructMem, a structure-enriched hierarchical memory framework that preserves event-level bindings and induces cross-event connections. By temporally anchoring dual perspectives and performing periodic semantic consolidation, StructMem improves temporal reasoning and multi-hop performance on LoCoMo, while substantially reducing token usage, API calls, and runtime compared to prior memory systems. See this https URL.
[456] arXiv:2604.21749 [pdf, html, other]: Title: CuRast: Cuda-Based Software Rasterization for Billions of Triangles

Markus Schütz, Lukas Lipp, Elias Kristmann, Michael Wimmer

Subjects: Graphics (cs.GR)

Previous work shows that small triangles can be rasterized efficiently with compute shaders. Building on this insight, we explore how far this can be pushed for massive triangle datasets without the need to construct acceleration structures in advance.
Method: A 3-stage rasterization pipeline first rasterizes small triangles directly in stage 1, using atomicMin to store the closest fragments. Larger triangles are forwarded to stages 2 and 3.
Results: CuRast can render models with hundreds of millions of triangles up to 2-5x (unique) or up to 12x (instanced) faster than Vulkan. Vulkan remains an order of magnitude faster for low-poly meshes.
Limitations: We currently focus on dense, opaque meshes that you would typically obtain from photogrammetry/3D reconstruction. Blending/Transparency is not yet supported, and scenes with thousands of low-poly meshes are not implemented efficiently.
Future Work: To make it suitable for games and a wider range of use cases, future work will need to (1) optimize handling of scenes with tens of thousands of nodes/meshes, (2) add support for hierarchical clustered LODs such as those produced by Meshoptimizer, (3) add support for transparency, likely in its own stage so as to keep opaque rasterization untouched and fast.
Source Code: this https URL
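The stage-1 idea of using atomicMin to keep the closest fragment per pixel can be illustrated with a sequential Python stand-in. The 64-bit packing below (depth bits in the high word, a 32-bit payload such as a triangle ID in the low word) is a common software-rasterization trick and an assumption here; the abstract does not state CuRast's actual framebuffer layout, and all function names are hypothetical.

```python
import math
import struct

EMPTY = 0xFFFFFFFFFFFFFFFF  # larger than any packed fragment, so any fragment wins


def pack_fragment(depth, payload):
    """Pack a non-negative float32 depth and a 32-bit payload into one 64-bit key."""
    if math.copysign(1.0, depth) < 0.0:
        raise ValueError("bit-pattern ordering only holds for non-negative depths")
    # For non-negative IEEE-754 floats, the unsigned bit pattern preserves ordering.
    depth_bits = struct.unpack("<I", struct.pack("<f", depth))[0]
    return (depth_bits << 32) | (payload & 0xFFFFFFFF)


def unpack_fragment(key):
    """Recover (depth, payload) from a packed 64-bit key."""
    depth = struct.unpack("<f", struct.pack("<I", key >> 32))[0]
    return depth, key & 0xFFFFFFFF


def resolve(width, height, fragments):
    """Keep the closest fragment per pixel; fragments are (x, y, depth, payload)."""
    framebuffer = [EMPTY] * (width * height)
    for x, y, depth, payload in fragments:
        idx = y * width + x
        # Sequential stand-in for a GPU atomicMin on framebuffer[idx].
        framebuffer[idx] = min(framebuffer[idx], pack_fragment(depth, payload))
    return framebuffer
```

Because depth occupies the high bits, a single integer minimum both performs the depth test and carries the winning payload, which is what makes a lock-free per-pixel update possible.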
[457] arXiv:2604.21750 [pdf, html, other]: Title: Multistakeholder Impacts of Profile Portability in a Recommender Ecosystem

Anas Buhayh, Elizabeth McKinnie, Clement Canel, Robin Burke

Comments: 34th ACM Conference on User Modeling, Adaptation and Personalization

Subjects: Information Retrieval (cs.IR)

Optimizing outcomes for multiple stakeholders in recommender systems has historically focused on algorithmic interventions, such as developing multi-objective models or re-ranking results from existing algorithms. However, structural changes to the recommendation ecosystem itself remain understudied. This paper explores the implications of algorithmic pluralism (also known as "middleware" in the governance literature), in which recommendation algorithms are decoupled from platforms, enabling users to select their preferred algorithm. Prior simulation work demonstrates that algorithmic choice benefits niche consumers and providers. Yet this approach raises critical questions about user modeling in the context of data portability: when users switch algorithms, what happens to their data? Noting that multiple data portability regulations have emerged to strengthen user data ownership and control, we examine how such policies affect user models and stakeholders' outcomes in recommendation settings. Our findings reveal that data portability scenarios produce varying effects on user utility across different recommendation algorithms. We highlight key policy considerations and implications for designing equitable recommendation ecosystems.
[458] arXiv:2604.21751 [pdf, html, other]: Title: Why are all LLMs Obsessed with Japanese Culture? On the Hidden Cultural and Regional Biases of LLMs

Joseba Fernandez de Landa, Carla Perez-Almendros, Jose Camacho-Collados

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

LLMs have been showing limitations when it comes to cultural coverage and competence, and in some cases show regional biases such as amplifying Western and Anglocentric viewpoints. While there have been works analysing the cultural capabilities of LLMs, there has not been specific work on highlighting LLM regional preferences when it comes to culture-related questions. In this work, we propose a new dataset based on a comprehensive taxonomy of Culture-Related Open Questions (CROQ). The results show that, contrary to previous cultural bias work, LLMs show a clear tendency towards countries such as Japan. Moreover, our results show that when prompting in languages such as English or other high-resource ones, LLMs tend to provide more diverse outputs and show less inclination towards answering questions highlighting countries for which the input language is an official language. Finally, we also investigate at which point of LLM training this cultural bias emerges, with our results suggesting that the first clear signs appear after supervised fine-tuning, and not during pre-training.
[459] arXiv:2604.21752 [pdf, html, other]: Title: Stable and asymptotic preserving space-time discretizations of a linear kinetic transport equation in diffusive scaling

Anita Gjesteland, Sigrun Ortleb, Salim Elghawi, David C. Del Rey Fernández

Subjects: Numerical Analysis (math.NA)

We develop an unconditionally energy-stable tensor-product space-time discretization framework for the solution of a linear kinetic transport equation in one space dimension. The kinetic equation is a simplified model of radiative transfer formulated as a hyperbolic balance law in diffusive scaling for a particle distribution function of the independent variables space, time and velocity. Our numerical discretization is based on the well-known technique of micro-macro decomposition which results in a system of balance laws for equilibrium and non-equilibrium quantities and facilitates preservation of the asymptotic limit for vanishing scaling parameters at the discrete level. We prove fully discrete stability and asymptotic preservation for general spatial and temporal discretizations having the summation-by-parts property. A new provably energy-stable Dirichlet boundary treatment for the micro-macro decomposed system is developed based on the introduction of simultaneous approximation terms. Numerical results show convergence for smooth problems and demonstrate energy stability of the proposed boundary treatment.
[460] arXiv:2604.21760 [pdf, other]: Title: Interpretable facial dynamics as behavioral and perceptual traces of deepfakes

Timothy Joseph Murphy, Jennifer Cook, Hélio Clemente José Cuve

Comments: Main paper: 19 pages, 5 figures, 4 tables. SI Appendix: 11 pages, 3 figures, 6 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Deepfake detection research has largely converged on deep learning approaches that, despite strong benchmark performance, offer limited insight into what distinguishes real from manipulated facial behavior. This study presents an interpretable alternative grounded in bio-behavioral features of facial dynamics and evaluates how computational detection strategies relate to human perceptual judgments. We identify core low-dimensional patterns of facial movement, from which temporal features characterizing spatiotemporal structure were derived. Traditional machine learning classifiers trained on these features achieved modest but significant above-chance deepfake classification, driven by higher-order temporal irregularities that were more pronounced in manipulated than real facial dynamics. Notably, detection was substantially more accurate for videos containing emotive expressions than those without. An emotional valence classification analysis further indicated that emotive signals are systematically degraded in deepfakes, explaining the differential impact of emotive dynamics on detection. Furthermore, we provide an additional and often overlooked dimension of explainability by assessing the relationship between model decisions and human perceptual detection. Model and human judgments converged for emotive but diverged for non-emotive videos, and even where outputs aligned, underlying detection strategies differed. These findings demonstrate that face-swapped deepfakes carry a measurable behavioral fingerprint, most salient during emotional expression. Additionally, model-human comparisons suggest that interpretable computational features and human perception may offer complementary rather than redundant routes to detection.
[461] arXiv:2604.21761 [pdf, html, other]: Title: Transferable Physics-Informed Representations via Closed-Form Head Adaptation

Jian Cheng Wong, Isaac Yin Chung Lai, Pao-Hsiung Chiu, Chin Chun Ooi, Abhishek Gupta, Yew-Soon Ong

Comments: Accepted at IJCNN 2026

Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)

Physics-informed neural networks (PINNs) have garnered significant interest for their potential in solving partial differential equations (PDEs) that govern a wide range of physical phenomena. By incorporating physical laws into the learning process, PINN models have demonstrated the ability to learn physical outcomes reasonably well. However, current PINN approaches struggle to predict or solve new PDEs effectively when there is a lack of training examples, indicating they do not generalize well to unseen problem instances. In this paper, we present a transferable learning approach for PINNs premised on a fast Pseudoinverse PINN framework (Pi-PINN). Pi-PINN learns a transferable physics-informed representation in a shared embedding space and enables rapid solving of both known and unknown PDE instances via closed-form head adaptation using a least-squares-optimal pseudoinverse under PDE constraints. We further investigate the synergies between data-driven multi-task learning loss and physics-informed loss, providing insights into the design of more performant PINNs. We demonstrate the effectiveness of Pi-PINN on various PDE problems, including Poisson's equation, Helmholtz equation, and Burgers' equation, achieving fast and accurate physics-informed solutions without requiring any data for unseen instances. Pi-PINN can produce predictions 100-1000 times faster than a typical PINN, while producing predictions with 10-100 times lower relative error than a typical data-driven model even with only two training samples. Overall, our findings highlight the potential of transferable representations with closed-form head adaptation to enhance the efficiency and generalization of PINNs across PDE families and scientific and engineering applications.
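The closed-form head adaptation can be sketched for the linear case. The notation below is assumed for illustration and is not taken from the abstract: a frozen feature map $\phi$, head weights $w$, a linear differential operator $\mathcal{L}$, a boundary operator $\mathcal{B}$, interior collocation points $x_i$, and boundary points $y_j$. How the paper treats nonlinear problems such as Burgers' equation is not stated in the abstract and is not covered by this sketch.

```latex
\[
u_w(x) = \phi(x)^{\top} w, \qquad
\mathcal{L}u = f \ \text{in } \Omega, \qquad
\mathcal{B}u = g \ \text{on } \partial\Omega
\]
\[
A = \begin{bmatrix}
  (\mathcal{L}\phi)(x_1)^{\top} \\ \vdots \\ (\mathcal{L}\phi)(x_N)^{\top} \\
  (\mathcal{B}\phi)(y_1)^{\top} \\ \vdots \\ (\mathcal{B}\phi)(y_M)^{\top}
\end{bmatrix},
\qquad
b = \begin{bmatrix} f(x_1) \\ \vdots \\ f(x_N) \\ g(y_1) \\ \vdots \\ g(y_M) \end{bmatrix},
\qquad
w^{\star} = A^{+} b \in \arg\min_{w} \lVert A w - b \rVert_2^2
\]
```

Since $\phi$ is fixed, a new PDE instance only changes $A$ and $b$, so solving it reduces to one least-squares problem rather than a full gradient-based training run.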
[462] arXiv:2604.21764 [pdf, html, other]: Title: Thinking with Reasoning Skills: Fewer Tokens, More Accuracy

Guangxiang Zhao, Qilong Shi, Xusen Xiao, Xiangzheng Zhang, Tong Yang, Lin Sun

Comments: 10 pages, The 64th Annual Meeting of the Association for Computational Linguistics -- Industry Track

Subjects: Artificial Intelligence (cs.AI)

Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning. Unlike the prevailing "reasoning from scratch" paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths. We evaluate our method on coding and mathematical reasoning tasks, and find that it significantly reduces reasoning tokens while improving overall performance. The resulting lower per-request cost indicates strong practical and economic potential for real-world deployment.
[463] arXiv:2604.21765 [pdf, html, other]: Title: PrismaDV: Automated Task-Aware Data Unit Test Generation

Hao Chen, Arnab Phani, Sebastian Schelter

Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)

Data is a central resource for modern enterprises, and data validation is essential for ensuring the reliability of downstream applications. However, existing automated data unit testing frameworks are largely task-agnostic: they validate datasets without considering the semantics and requirements of the code that consumes the data.
We present PrismaDV, a compound AI system that analyzes downstream task code together with dataset profiles to identify data access patterns, infer implicit data assumptions, and generate task-aware executable data unit tests. To further adapt the data unit tests over time to specific datasets and downstream tasks, we propose "Selective Informative Feedback for Task Adaptation" (SIFTA), a prompt-optimization framework that leverages the scarce outcomes from the execution of data unit tests and downstream tasks. We evaluate PrismaDV on two new benchmarks spanning 60 tasks across five datasets, where it consistently outperforms both task-agnostic and task-aware baselines in generating unit tests that reflect the end-to-end impact of data errors. Furthermore, we show that with SIFTA, we can automatically learn prompts for PrismaDV's modules that outperform prompts written by hand or generated from a generic prompt optimizer. We publicly release our benchmarks and prototype implementation.
[464] arXiv:2604.21766 [pdf, other]: Title: AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA

Tasnim Kabir, Dmytro Kurdydyk, Aadi Palnitkar, Liam Dorn, Ahmed Haj Ahmed, Jordan Lee Boyd-Graber

Subjects: Computation and Language (cs.CL)

Existing audio question answering benchmarks largely emphasize sound event classification or caption-grounded queries, often enabling models to succeed through shortcut strategies, short-duration cues, lexical priors, dataset-specific biases, or even bypassing audio via metadata and captions rather than genuine reasoning. Thus, we present AUDITA (Audio Understanding from Diverse Internet Trivia Authors), a large-scale, real-world benchmark to rigorously evaluate audio reasoning beyond surface-level acoustic recognition. AUDITA comprises carefully curated, human-authored trivia questions grounded in real-world audio, designed to stress robust auditory reasoning through challenging distractors and long-range temporal dependencies, using probing queries that cannot be answered from isolated text or sound cues alone. Human average accuracy of 32.13% shows the challenge of the task while still demonstrating meaningful comprehension of the audio. In stark contrast, state-of-the-art audio question answering models perform poorly, with average accuracy below 8.86%. Beyond raw accuracy, we apply Item Response Theory (IRT) to estimate latent proficiency and question difficulty, and to expose systematic deficiencies of the models and data.
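As a rough illustration of how Item Response Theory links latent proficiency to question difficulty, here is a minimal sketch of the two-parameter logistic (2PL) model with a grid-search proficiency estimate. The abstract does not say which IRT variant or fitting procedure is used, so the 2PL choice, the function names, and the grid search are all assumptions.

```python
import math


def p_correct(theta, difficulty, discrimination=1.0):
    """2PL probability that a respondent with ability theta answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of binary responses; items are (difficulty, discrimination) pairs."""
    total = 0.0
    for (difficulty, discrimination), correct in zip(items, responses):
        p = p_correct(theta, difficulty, discrimination)
        total += math.log(p) if correct else math.log(1.0 - p)
    return total


def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood ability estimate over a fixed grid (item parameters known)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))
```

A respondent who answers more of the same items correctly receives a higher estimated proficiency, and an item's difficulty is the ability level at which the success probability is 0.5.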
[465] arXiv:2604.21767 [pdf, html, other]: Title: Misinformation Span Detection in Videos via Audio Transcripts

Breno Matos, Rennan C. Lima, Savvas Zannettou, Fabricio Benevenuto, Rodrygo L.T. Santos

Comments: Accepted at ICWSM 2026

Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Online misinformation is one of the most challenging issues lately, yielding severe consequences, including political polarization, attacks on democracy, and public health risks. Misinformation manifests in any platform with a large user base, including online social networks and messaging apps. It permeates all media and content forms, including images, text, audio, and video. Distinctly, video-based misinformation represents a multifaceted challenge for fact-checkers, given the ease with which individuals can record and upload videos on various video-sharing platforms. Previous research efforts investigated detecting video-based misinformation, focusing on whether a video shares misinformation or not on a video level. While this approach is useful, it provides only a limited and not easily interpretable view of the problem, given that it does not provide additional context on when misinformation occurs within videos and which content (i.e., claims) is responsible for the video's misinformation nature.
In this work, we attempt to bridge this research gap by creating two novel datasets that allow us to explore misinformation detection on videos via audio transcripts, focusing on identifying the spans of videos that are responsible for the video's misinformation claim (misinformation span detection). We transcribe each video's audio to text and identify the video segments in which the misinformation claims appear, resulting in two datasets of more than 500 videos with over 2,400 segments containing annotated fact-checked claims. Then, we employ classifiers built with state-of-the-art language models, and our results show that we can identify in which part of a video there is misinformation with an F1 score of 0.68. We make our annotated datasets publicly available. We also release all transcripts, audio and videos.
[466] arXiv:2604.21769 [pdf, html, other]: Title: Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

Minji Jung, Minjae Lee, Yejin Kim, Sarang Choi, Minsuk Kahng

Comments: Accepted to the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2026)

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

LLM leaderboards are widely used to compare models and guide deployment decisions. However, leaderboard rankings are shaped by evaluation priorities set by benchmark designers, rather than by the diverse goals and constraints of actual users and organizations. A single aggregate score often obscures how models behave across different prompt types and compositions. In this work, we conduct an in-depth analysis of the dataset used in the LMArena (formerly Chatbot Arena) benchmark and investigate this evaluation challenge by designing an interactive visualization interface as a design probe. Our analysis reveals that the dataset is heavily skewed toward certain topics, that model rankings vary across prompt slices, and that preference-based judgments are used in ways that blur their intended scope. Building on this analysis, we introduce a visualization interface that allows users to define their own evaluation priorities by selecting and weighting prompt slices and to explore how rankings change accordingly. A qualitative study suggests that this interactive approach improves transparency and supports more context-specific model evaluation, pointing toward alternative ways to design and use LLM leaderboards.
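The idea of re-ranking models under user-chosen slice weights can be illustrated with a minimal sketch. This is not the authors' interface; the function name `rerank`, the slice names, and all scores are hypothetical, and a plain weighted average is assumed as the aggregation rule.

```python
from typing import Dict, List


def rerank(slice_scores: Dict[str, Dict[str, float]],
           weights: Dict[str, float]) -> List[str]:
    """Rank models by a user-weighted average of their per-slice scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one slice weight must be positive")
    combined = {
        model: sum(weights.get(s, 0.0) * v for s, v in scores.items()) / total
        for model, scores in slice_scores.items()
    }
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical per-slice win rates (illustrative numbers only).
scores = {
    "model_a": {"coding": 0.70, "creative": 0.40},
    "model_b": {"coding": 0.50, "creative": 0.65},
}
coding_first = rerank(scores, {"coding": 0.9, "creative": 0.1})
creative_first = rerank(scores, {"coding": 0.1, "creative": 0.9})
print(coding_first, creative_first)
```

With these made-up numbers the two weightings produce opposite orderings, which is the kind of ranking sensitivity the abstract describes.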
[467] arXiv:2604.21771 [pdf, html, other]: Title: Generalizing Test Cases for Comprehensive Test Scenario Coverage

Binhang Qi, Yun Lin, Xinyi Weng, Chenyan Liu, Hailong Sun, Gordon Fraser, Jin Song Dong

Comments: Accepted at FSE 2026

Subjects: Software Engineering (cs.SE)

Test cases are essential for software development and maintenance. In practice, developers derive multiple test cases from an implicit pattern based on their understanding of requirements and inference of diverse test scenarios, each validating a specific behavior of the focal method. However, producing comprehensive tests is time-consuming and error-prone: many important tests that should have accompanied the initial test are added only after a significant delay, sometimes only after bugs are triggered. Existing automated test generation techniques largely focus on code coverage. Yet in real projects, practical tests are seldom driven by code coverage alone, since test scenarios do not necessarily align with control-flow branches. Instead, test scenarios originate from requirements, which are often undocumented and implicitly embedded in a project's design and implementation. However, developer-written tests are frequently treated as executable specifications; thus, even a single initial test that reflects the developer's intent can reveal the underlying requirement and the diverse scenarios that should be validated. In this work, we propose TestGeneralizer, a framework for generalizing test cases to comprehensively cover test scenarios. TestGeneralizer orchestrates three stages: (1) enhancing the understanding of the requirement and scenario behind the focal method and initial test; (2) generating a test scenario template and crystallizing it into various test scenario instances; and (3) generating and refining executable test cases from these instances. We evaluate TestGeneralizer against three state-of-the-art baselines on 12 open-source Java projects. TestGeneralizer achieves significant improvements: +31.66% and +23.08% over ChatTester, in mutation-based and LLM-assessed scenario coverage, respectively.
[468] arXiv:2604.21772 [pdf, html, other]: Title: Back to Source: Open-Set Continual Test-Time Adaptation via Domain Compensation

Yingkai Yang, Chaoqi Chen, Hui Huang

Comments: Accepted to CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Test-Time Adaptation (TTA) aims to mitigate distributional shifts between training and test domains during inference time. However, existing TTA methods fall short in the realistic scenario where models face both continually changing domains and the simultaneous emergence of unknown semantic classes, a challenging setting we term Open-set Continual Test-Time Adaptation (OCTTA). The coupling of domain and semantic shifts often collapses the feature space, severely degrading both classification and out-of-distribution detection. To tackle this, we propose DOmain COmpensation (DOCO), a lightweight and effective framework that robustly performs domain adaptation and OOD detection in a synergistic, closed loop. DOCO first performs dynamic, adaptation-conditioned sample splitting to separate likely ID from OOD samples. Then, using only the ID samples, it learns a domain compensation prompt by aligning feature statistics with the source domain, guided by a structural preservation regularizer that prevents semantic distortion. This learned prompt is then propagated to the OOD samples within the same batch, effectively isolating their semantic novelty for more reliable detection. Extensive experiments on multiple challenging benchmarks demonstrate that DOCO outperforms prior CTTA and OSTTA methods, establishing a new state-of-the-art for the demanding OCTTA setting.
[469] arXiv:2604.21774 [pdf, html, other]: Title: Adversarial Robustness of Near-Field Millimeter-Wave Imaging under Waveform-Domain Attacks

Lhamo Dorje, Jordan Madden, Soamar Homsi, Xiaohua Li

Comments: 11 pages, 6 figures

Subjects: Cryptography and Security (cs.CR)

Near-field millimeter-wave (mmWave) imaging is widely deployed in safety-critical applications such as airport passenger screening, yet its own security remains largely unexplored. This paper presents a systematic study of the adversarial robustness of mmWave imaging algorithms under waveform-domain physical attacks that directly manipulate the image reconstruction process. We propose a practical white-box adversarial model and develop a differential imaging attack framework that leverages the differentiable imaging pipeline to optimize attack waveforms. We also construct a real measured dataset of clean and attack waveforms using a mmWave imaging testbed. Experiments on 10 representative imaging algorithms show that mmWave imaging is highly vulnerable to such attacks, enabling an adversary to conceal or alter targets with moderate transmission power. Surprisingly, deep-learning-based imaging algorithms demonstrate higher robustness than classical algorithms. These findings expose critical security risks and motivate the development of robust and secure mmWave imaging systems.
[470] arXiv:2604.21775 [pdf, html, other]: Title: Local error estimates for a finite element method combining linear and nonlinear stabilization for the linear hyperbolic transport equation

Erik Burman, Fabian Heimann

Subjects: Numerical Analysis (math.NA)

In this paper, we investigate the combination of a linear continuous interior penalty type and a non-linear artificial diffusion stabilisation applied to the transport problem, based on continuous Galerkin finite elements in space. This method was recently introduced and analysed for globally smooth solutions in [Burman 2023, SIAM J. Sci. Comput., 45, 1, A96-A122]. We provide a rigorous proof of a localisation principle in terms of weighted stability and a priori error bound results, which follow the widely known $\mathcal{O}(h^{k+1/2})$ scaling in the $L^2(\Omega; t=T)$ norm, where $k$ denotes the polynomial order of the finite element space and $h$ the mesh size. The analysis is semi-discrete in space and assumes sufficient local regularity of the continuous solution on the smooth part of the domain, where the continuous interior penalty stabilisation is active, whilst artificial diffusion operates on the remaining rough parts of the domain. Thereby, the analysis demonstrates that typical numerical errors in the rough part stay localised relative to the convection velocity and do not negatively affect the smooth parts of the solution, if the stabilisation combination is set up accordingly.
[471] arXiv:2604.21776 [pdf, html, other]: Title: Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting

Avinash Paliwal, Adithya Iyer, Shivin Yadav, Muhammad Ali Afridi, Midhun Harikumar

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Precise camera control for reshooting dynamic videos is bottlenecked by the severe scarcity of paired multi-view data for non-rigid scenes. We overcome this limitation with a highly scalable self-supervised framework capable of leveraging internet-scale monocular videos. Our core contribution is the generation of pseudo multi-view training triplets, consisting of a source video, a geometric anchor, and a target video. We achieve this by extracting distinct smooth random-walk crop trajectories from a single input video to serve as the source and target views. The anchor is synthetically generated by forward-warping the first frame of the source with a dense tracking field, which effectively simulates the distorted point-cloud inputs expected at inference. Because our independent cropping strategy introduces spatial misalignment and artificial occlusions, the model cannot simply copy information from the current source frame. Instead, it is forced to implicitly learn 4D spatiotemporal structures by actively routing and re-projecting missing high-fidelity textures across distinct times and viewpoints from the source video to reconstruct the target. At inference, our minimally adapted diffusion transformer utilizes a 4D point-cloud derived anchor to achieve state-of-the-art temporal consistency, robust camera control, and high-fidelity novel view synthesis on complex dynamic scenes.
[472] arXiv:2604.21777 [pdf, html, other]: Title: Fast Algorithm For Solving Time-dependent Multiscale radiative transport Equation

Qinchen Song, Lei Zhang, Min Tang

Subjects: Numerical Analysis (math.NA)

When solving the time-dependent radiative transport equation (RTE), implicit time discretization is often employed for its robustness and stability. This results in a sequence of steady-state RTEs with identical cross-sections but varying source terms, whose repeated solution is computationally costly. To address this, we first apply the adaptive tailored finite point scheme (TFPS) for spatial discretization. This scheme exploits prior knowledge of the background media's optical properties to adaptively compress the angular domain, constructing a compressed linear system. A key feature is its ability to reconstruct the layer structure after compression, faithfully capturing the variance at the layer. We then use the Recursive Skeleton Method (RSM) to obtain an explicit multilevel decomposition of the inverse discrete operator, which is reused for all steady-state solutions. Numerical experiments show that our framework achieves high accuracy and significant efficiency across diverse scenarios.
[473] arXiv:2604.21782 [pdf, other]: Title: SemEval-2026 Task 4: Narrative Story Similarity and Narrative Representation Learning

Hans Ole Hatzel, Ekaterina Artemova, Haimo Paul Stiemer, Evelyn Gius, Chris Biemann

Subjects: Computation and Language (cs.CL)

We present the shared task on narrative similarity and narrative representation learning - NSNRL (pronounced "nass-na-rel"). The task operationalizes narrative similarity as a binary classification problem: determining which of two stories is more similar to an anchor story. We introduce a novel definition of narrative similarity, compatible with both narrative theory and intuitive judgment. Based on the similarity judgments collected under this concept, we also evaluate narrative embedding representations. We collected at least two annotations each for more than 1,000 story summary triples, with each annotation being backed by at least two annotators in agreement. This paper describes the sampling and annotation process for the dataset; further, we give an overview of the submitted systems and the techniques they employ. We received a total of 71 final submissions from 46 teams across our two tracks. In our triple-based classification setup, LLM ensembles make up many of the top-scoring systems, while in the embedding setup, systems with pre- and post-processing on pretrained embedding models perform about on par with custom fine-tuned solutions. Our analysis identifies potential headroom for improvement of automated systems in both tracks. The task website includes visualizations of embeddings alongside instance-level classification results for all teams.
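The triple-based setup can be sketched as follows for the embedding track: given vectors for an anchor and two candidate stories, pick the candidate closer to the anchor. Cosine similarity and the toy vectors are assumptions made for illustration; the abstract does not specify the similarity measure or how participants produced embeddings.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closer_story(anchor: Sequence[float],
                 story_a: Sequence[float],
                 story_b: Sequence[float]) -> str:
    """Binary decision: which candidate is more similar to the anchor."""
    return "A" if cosine(anchor, story_a) >= cosine(anchor, story_b) else "B"


# Toy embeddings (hypothetical values, not real model outputs).
anchor = [1.0, 0.0, 1.0]
story_a = [0.9, 0.1, 0.8]
story_b = [0.0, 1.0, 0.1]
choice = closer_story(anchor, story_a, story_b)
print(choice)
```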
[474] arXiv:2604.21786 [pdf, html, other]: Title: From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

Katharina Prasse, Steffen Jung, Isaac Bravo, Stefanie Walter, Patrick Knab, Christian Bartelt, Margret Keuper

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Social media platforms have become primary arenas for climate communication, generating millions of images and posts that - if systematically analysed - can reveal which communication strategies mobilise public concern and which fall flat. We aim to facilitate such research by analysing how computer vision methods can be used for social media discourse analysis. This analysis includes application-based taxonomy design, model selection, prompt engineering, and validation. We benchmark six promptable vision-language models and 15 zero-shot CLIP-like models on two datasets from X (formerly Twitter) - a 1,038-image expert-annotated set and a larger corpus of over 1.2 million images, with 50,000 labels manually validated - spanning five annotation dimensions: animal content, climate change consequences, climate action, image setting, and image type. Among the models benchmarked, Gemini-3.1-flash-lite outperforms all others across all super-categories and both datasets, while the gap to open-weight models of moderate size remains relatively small. Beyond instance-level metrics, we advocate for distributional evaluation: VLM predictions can reliably recover population level trends even when per-image accuracy is moderate, making them a viable starting point for discourse analysis at scale. We find that chain-of-thought reasoning reduces rather than improves performance, and that annotation dimension specific prompt design improves performance. We release tweet IDs and labels along with our code at this https URL.
[475] arXiv:2604.21787 [pdf, other]: Title: Agentic AI-Enabled Framework for Thermal Comfort and Building Energy Assessment in Tropical Urban Neighborhoods

Po-Yen Lai, Xinyu Yang, Derrick Low, Huizhe Liu, Jian Cheng Wong

Comments: Accepted at IAQVEC 2026

Subjects: Multiagent Systems (cs.MA); Computational Physics (physics.comp-ph)

In response to the urban heat island effects and building energy demands in Singapore, this study proposes an agentic AI-enabled reasoning framework that integrates large language models (LLMs) with lightweight physics-based models. Through prompt customization, the LLMs interpret urban design tasks, extract relevant policies, and activate appropriate physics-based models for evaluation, forming a closed-loop reasoning-action process. These lightweight physics-based models leverage core thermal and airflow principles, streamlining conventional models to reduce computational time while predicting microclimate variables, such as building surface temperature, ground radiant heat, and airflow conditions, thereby enabling the estimation of thermal comfort indices, e.g., physiological equivalent temperature (PET), and building energy usage. This framework allows users to explore a variety of climate-resilient building surface strategies, e.g., green façades and cool paint applications, that improve thermal comfort while reducing wall heat gain and energy demand. By combining the autonomous reasoning capacity of LLMs with the rapid quantitative evaluation of lightweight physics-based models, the proposed system demonstrates potential for cross-disciplinary applications in sustainable urban design, indoor-outdoor environmental integration, and climate adaptation planning. The source code and data used in this study are available at: this https URL.
[476] arXiv:2604.21789 [pdf, html, other]: Title: Compliance Moral Hazard and the Backfiring Mandate

Jian Ni, Lecheng Zheng, John R Birge

Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

Competing firms that serve shared customer populations face a fundamental information aggregation problem: each firm holds fragmented signals about risky customers, but individual incentives impede efficient collective detection. We develop a mechanism design framework for decentralized risk analytics, grounded in anti-money laundering in banking networks. Three strategic frictions distinguish our setting: compliance moral hazard, adversarial adaptation, and information destruction through intervention. A temporal value assignment (TVA) mechanism, which credits institutions using a strictly proper scoring rule on discounted verified outcomes, implements truthful reporting as a Bayes--Nash equilibrium (uniquely optimal at each edge) in large federations. Embedding TVA in a banking competition model, we show competitive pressure amplifies compliance moral hazard and poorly designed mandates can reduce welfare below autarky, a "backfiring" result with direct policy implications. In simulation using a synthetic AML benchmark, TVA achieves substantially higher welfare than autarky or mandated sharing without incentive design.
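A strictly proper scoring rule is one whose expected score is uniquely maximized by reporting one's true belief. The abstract does not say which rule TVA uses; the quadratic (Brier-style) reward below is an assumed stand-in, and neither discounting nor outcome verification is modeled. The sketch only checks numerically that a truthful report is optimal.

```python
def expected_quadratic_reward(report: float, true_prob: float) -> float:
    """Expected reward -(y - report)^2 when y = 1 with probability true_prob."""
    return -(true_prob * (1.0 - report) ** 2 + (1.0 - true_prob) * report ** 2)


true_prob = 0.3  # hypothetical belief that a customer is risky
grid = [i / 100 for i in range(101)]  # candidate reports 0.00, 0.01, ..., 1.00
best_report = max(grid, key=lambda r: expected_quadratic_reward(r, true_prob))
print(best_report)
```

Algebraically the expected reward equals $-(r-p)^2 - p(1-p)$ for report $r$ and belief $p$, so any deviation from $r = p$ lowers it.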
[477] arXiv:2604.21793 [pdf, html, other]: Title: Inferring High-Level Events from Timestamped Data: Complexity and Medical Applications

Yvon K. Awuklu, Meghyn Bienvenu, Katsumi Inoue, Vianney Jouhet, Fleur Mougin

Comments: This is the full version (with appendix) of a paper appearing at the 23rd International Conference on Principles of Knowledge Representation and Reasoning (KR 2026)

Subjects: Artificial Intelligence (cs.AI)

In this paper, we develop a novel logic-based approach to detecting high-level temporally extended events from timestamped data and background knowledge. Our framework employs logical rules to capture existence and termination conditions for simple temporal events and to combine these into meta-events. In the medical domain, for example, disease episodes and therapies are inferred from timestamped clinical observations, such as diagnoses and drug administrations stored in patient records, and can be further combined into higher-level disease events. As some incorrect events might be inferred, we use constraints to identify incompatible combinations of events and propose a repair mechanism to select preferred consistent sets of events. While reasoning in the full framework is intractable, we identify relevant restrictions that ensure polynomial-time data complexity. Our prototype system implements core components of the approach using answer set programming. An evaluation on a lung cancer use case supports the interest of the approach, both in terms of computational feasibility and positive alignment of our results with medical expert opinions. While strongly motivated by the needs of the healthcare domain, our framework is purposely generic, enabling its reuse in other areas.
[478] arXiv:2604.21794 [pdf, html, other]: Title: Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

Ye Yu, Heming Liu, Haibo Jin, Xiaopeng Yuan, Peng Kuang, Haohan Wang

Comments: Under review at COLM 2026

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA)

Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations such as key-value caches offers a promising alternative to text-based protocols, but existing approaches do not jointly optimize communication with multi-agent reasoning. Therefore, we propose DiffMAS, a training framework that treats latent communication as a learnable component of multi-agent systems. DiffMAS performs parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to jointly learn how information should be encoded and interpreted across interactions. Experiments on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks show that DiffMAS consistently improves reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods, achieving 26.7% on AIME24, 20.2% on GPQA-Diamond, and consistent gains across reasoning benchmarks.
[479] arXiv:2604.21795 [pdf, html, other]: Title: NEST: Network Enforced Session Types (Technical Report)

Jens Kanstrup Larsen, Alceste Scalas, Guy Amir, Jules Jacobs, Jana Wagemaker, Nate Foster

Subjects: Programming Languages (cs.PL)

This paper introduces NEST (Network-Enforced Session Types), a runtime verification framework that moves application-level protocol monitoring into the network fabric. Unlike prior work that instruments or wraps application code, we synthesize packet-level monitors that enforce protocols directly in the data plane. We develop algorithms to generate network-level monitors from session types and extend them to handle packet loss and reordering. We implement NEST in P4 and evaluate it on applications including microservice and network-function models, showing that network-level monitors can enforce realistic non-trivial protocols.
[480] arXiv:2604.21798 [pdf, html, other]: Title: An effective variant of the Hartigan $k$-means algorithm

François Clément, Stefan Steinerberger

Subjects: Machine Learning (cs.LG)

The $k$-means problem is perhaps the classical clustering problem and is often treated as synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases; Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either the dimension or $k$ increases.
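For context, the following is a minimal sketch of the standard Hartigan single-point move, not the variant proposed in the paper (which the abstract does not describe). Moving point $x$ from cluster $A$ to cluster $B$ lowers the $k$-means cost exactly when $\frac{n_B}{n_B+1}\|x-c_B\|^2 < \frac{n_A}{n_A-1}\|x-c_A\|^2$. The function names and the toy data are hypothetical, centers are recomputed naively for clarity rather than speed, and the initial labeling is assumed to leave no cluster empty.

```python
import numpy as np


def kmeans_cost(X: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Sum of squared distances of points to their cluster means."""
    return float(sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                     for j in range(k) if np.any(labels == j)))


def hartigan_pass(X: np.ndarray, labels: np.ndarray, k: int) -> int:
    """One sweep of single-point moves; updates labels in place, returns #moves."""
    moves = 0
    for i in range(len(X)):
        a = labels[i]
        sizes = np.bincount(labels, minlength=k)
        if sizes[a] <= 1:
            continue  # never empty a cluster
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        d2 = ((centers - X[i]) ** 2).sum(axis=1)
        removal_gain = sizes[a] / (sizes[a] - 1) * d2[a]
        insertion_cost = sizes / (sizes + 1) * d2
        insertion_cost[a] = np.inf  # staying put is not a move
        b = int(np.argmin(insertion_cost))
        if insertion_cost[b] < removal_gain:
            labels[i] = b
            moves += 1
    return moves


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(4.0, 0.5, (20, 2))])
labels = np.arange(len(X)) % 2  # deliberately poor start, both clusters non-empty
before = kmeans_cost(X, labels, 2)
for _ in range(100):  # guard against pathological cycling from rounding
    if hartigan_pass(X, labels, 2) == 0:
        break
after = kmeans_cost(X, labels, 2)
print(before, after)
```

Unlike a Lloyd step, which reassigns each point to its nearest center, this test accounts for how the two centers shift once the point moves, so it can accept moves that Lloyd's algorithm would not make.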
[481] arXiv:2604.21801 [pdf, html, other]: Title: SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery

Safouane El Ghazouali, Nicola Venturi, Michael Rueegsegger, Umberto Michelucci

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent advances in deep learning for remote sensing rely heavily on large annotated datasets, yet acquiring high-quality ground truth for geometric, radiometric, and multi-domain tasks remains costly and often infeasible. In particular, the lack of accurate depth annotations, controlled illumination variations, and multi-scale paired imagery limits progress in monocular depth estimation, domain adaptation, and super-resolution for aerial scenes. We present SyMTRS, a large-scale synthetic dataset generated using a high-fidelity urban simulation pipeline. The dataset provides high-resolution RGB aerial imagery (2048 x 2048), pixel-perfect depth maps, night-time counterparts for domain adaptation, and aligned low-resolution variants for super-resolution at x2, x4, and x8 scales. Unlike existing remote sensing datasets that focus on a single task or modality, SyMTRS is designed as a unified multi-task benchmark enabling joint research in geometric understanding, cross-domain robustness, and resolution enhancement. We describe the dataset generation process, its statistical properties, and its positioning relative to existing benchmarks. SyMTRS aims to bridge critical gaps in remote sensing research by enabling controlled experiments with perfect geometric ground truth and consistent multi-domain supervision. The results obtained in this work can be reproduced from this Github repository: this https URL.
[482] arXiv:2604.21806 [pdf, html, other]: Title: TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, Liqiang Nie

Comments: Accepted by ACL 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typically cover only a limited range of salient changes, which induces two limitations highly relevant to practical applications, namely Insufficient Entity Coverage and Clause-Entity Misalignment. In order to address these issues and bring CIR closer to real-world use, we construct two instruction-rich multi-modification datasets, M-FashionIQ and M-CIRR. In addition, we propose TEMA, the Text-oriented Entity Mapping Architecture, which is the first CIR framework designed for multi-modification while also accommodating simple modifications. Extensive experiments on four benchmark datasets demonstrate TEMA's superiority in both original and multi-modification scenarios, while maintaining an optimal balance between retrieval accuracy and computational efficiency. Our code and constructed multi-modification datasets (M-FashionIQ and M-CIRR) are available at this https URL.
[483] arXiv:2604.21808 [pdf, html, other]: Title: Recursive Structure of Hulls of PRM Codes

Yufeng Song, Qin Yue

Comments: 25 pages

Subjects: Information Theory (cs.IT)

For a nonnegative integer $r$ and a positive integer $v$ satisfying
\[
\frac{r(q-1)}{2}<v<\frac{(r+1)(q-1)}{2},
\]
we define the combinatorial numbers
\[
A_r(v)=
\begin{cases}
\displaystyle
\sum_{t=r(q-1)-v}^{v}\ \sum_{j=0}^{r}(-1)^j\binom{r}{j}\binom{t-jq+r-1}{r-1}, & r>0,\\[1.2ex]
1, & r=0.
\end{cases}
\]
For the projective Reed-Muller code $\PRM(q,m,v)$, we determine its hull dimension:
\[
\dim \Hull\bigl(\PRM(q,m,v)\bigr)
=
\dim \PRM(q,m,v)
-
\sum_{i=0}^{\ell}A_{2i+\epsilon}\bigl(v-(\ell-i)(q-1)\bigr),
\]
where
\[
\ell=\Bigl\lfloor\frac r2\Bigr\rfloor,\qquad
\epsilon=
\begin{cases}
0, & r\ \text{is even},\\
1, & r\ \text{is odd}.
\end{cases}
\]
This formula applies in the open lower-half range
$
0<v<\frac{m\Qm}{2},
$
equivalently for $v\in I_r$ with $m\ge r+1$; the range
$
\frac{m\Qm}{2}<v<m\Qm
$
is then obtained by Sørensen's duality theorem \cite{Sorensen}.
[484] arXiv:2604.21809 [pdf, html, other]: Title: Quotient-Space Diffusion Models

Yixian Xu, Yusong Wang, Shengjie Luo, Kaiyuan Gao, Tianyu He, Di He, Chang Liu

Comments: ICLR 2026 Oral Presentation; 40 pages, 5 figures, 6 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

Diffusion-based generative models have transformed generative AI and have enabled new capabilities in the science domain, for example, generating 3D structures of molecules. Due to the intrinsic problem structure of certain tasks, there is often a symmetry in the system, which identifies objects that can be converted into one another by a group action as equivalent; hence the target distribution is essentially defined on the quotient space with respect to the group. In this work, we establish a formal framework for diffusion modeling on a general quotient space, and apply it to molecular structure generation, which follows the special Euclidean group $\text{SE}(3)$ symmetry. The framework removes the need to learn the component corresponding to the group action, and hence reduces learning difficulty compared with conventional group-equivariant diffusion models. The sampler is guaranteed to recover the target distribution, whereas heuristic alignment strategies lack proper samplers. The arguments are empirically validated on structure generation for small molecules and proteins, indicating that the principled quotient-space diffusion model provides a new framework that outperforms previous symmetry treatments.
[485] arXiv:2604.21810 [pdf, html, other]: Title: Multiscale Super Resolution without Image Priors

Daniel Fu, Gabby Litterio, Pedro Felzenszwalb, Rashid Zia

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We address the ambiguities in the super-resolution problem under translation. We demonstrate that combinations of low-resolution images at different scales can be used to make the super-resolution problem well posed. Such differences in scale can be achieved using sensors with different pixel sizes (as demonstrated here) or by varying the effective pixel size through changes in optical magnification (e.g., using a zoom lens). We show that images acquired with pairwise coprime pixel sizes lead to a system with a stable inverse, and furthermore, that super-resolution images can be reconstructed efficiently using Fourier domain techniques or iterative least squares methods. Our mathematical analysis provides an expression for the expected error of the least squares reconstruction for large signals assuming i.i.d. noise that elucidates the noise-resolution tradeoff. These results are validated through both one- and two-dimensional experiments that leverage charge-coupled device (CCD) hardware binning to explore reconstructions over a large range of effective pixel sizes. Finally, two-dimensional reconstructions for a series of targets are used to demonstrate the advantages of multiscale super-resolution, and implications of these results for common imaging systems are discussed.
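One way to see why coprime pixel sizes can remove the ambiguity is a one-dimensional toy model, sketched below under strong assumptions that are not taken from the paper: a periodic signal of length 12, box-shaped pixels of widths 2 and 3, every integer translation observed, and no noise. The helper `box_blur_matrix` is hypothetical. Each pixel size alone annihilates some Fourier frequencies, so its system is rank-deficient, but the two sets of lost frequencies do not overlap when the widths are coprime.

```python
import numpy as np


def box_blur_matrix(n: int, pixel: int) -> np.ndarray:
    """Circulant matrix: row i averages `pixel` consecutive samples from i, wrapping."""
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(pixel):
            A[i, (i + j) % n] = 1.0 / pixel
    return A


n = 12
A2 = box_blur_matrix(n, 2)  # sensor with pixel width 2
A3 = box_blur_matrix(n, 3)  # sensor with pixel width 3
stacked = np.vstack([A2, A3])

rank2 = np.linalg.matrix_rank(A2)
rank3 = np.linalg.matrix_rank(A3)
rank_both = np.linalg.matrix_rank(stacked)

# Least-squares reconstruction from the combined noiseless measurements.
rng = np.random.default_rng(0)
x = rng.random(n)
x_hat, *_ = np.linalg.lstsq(stacked, stacked @ x, rcond=None)
print(rank2, rank3, rank_both, np.max(np.abs(x_hat - x)))
```

In this toy setting each single-scale system has rank below 12 while the stacked system has full rank, so least squares recovers the signal. The paper's error analysis under i.i.d. noise is not reproduced here.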
[486] arXiv:2604.21811 [pdf, html, other]: Title: Probably Approximately Consensus: On the Learning Theory of Finding Common Ground

Carter Blair, Ben Armstrong, Shiri Alouf-Heffetz, Nimrod Talmon, Davide Grossi

Comments: Accepted to the Social Choice and Learning Algorithms Workshop at IJCAI 2025

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

A primary goal of online deliberation platforms is to identify ideas that are broadly agreeable to a community of users through their expressed preferences. Yet, consensus elicitation should ideally extend beyond the specific statements provided by users and should incorporate the relative salience of particular topics. We address this issue by modelling consensus as an interval in a one-dimensional opinion space derived from potentially high-dimensional data via embedding and dimensionality reduction. We define an objective that maximizes expected agreement within a hypothesis interval where the expectation is over an underlying distribution of issues, implicitly taking into account their salience. We propose an efficient Empirical Risk Minimization (ERM) algorithm and establish PAC-learning guarantees. Our initial experiments demonstrate the performance of our algorithm and examine more efficient approaches to identifying optimal consensus regions. We find that through selectively querying users on an existing sample of statements, we can reduce the number of queries needed to a practical number.
[487] arXiv:2604.21812 [pdf, html, other]: Title: Generalized Two-Dimensional Index Modulation in the Code-Spatial Domain for LPWAN

Long Yuan, Wenkun Wen, Junlin Liu, Peiran Wu, Minghua Xia

Comments: 14 pages, 12 figures, 4 tables. To appear in IEEE TCOM

Subjects: Information Theory (cs.IT)

Low-power wide-area networks (LPWANs) are crucial for large-scale Internet of Things (IoT) applications, yet they face increasing demands for higher data rates, improved reliability, and enhanced energy efficiency under stringent hardware constraints. To address these challenges, this paper introduces a generalized code-index modulation (CIM) transceiver that employs multiple-antenna index modulation (IM). The transmitter integrates spatial modulation (SM), space-time block coding (STBC), and CIM into a unified two-dimensional (2D) coding structure, where the spreading sequences -- realized via continuous phase modulation with spread spectrum (CPM-SS), chirp spread spectrum, or Zadoff-Chu sequences -- serve as spreading codes. Three specific schemes are proposed: SM-CIM, STBC-SM-CIM, and an enhanced STBC-SM-CIM (ESTBC-SM-CIM), designed to jointly improve data rate and energy efficiency. Closed-form expressions for the average bit error probability are derived, and system performance is analyzed in terms of data rate, energy efficiency, and computational complexity. Simulation results show that the proposed designs consistently outperform benchmark schemes, demonstrating their potential for enabling high-data-rate, energy-efficient LPWAN and IoT communications.
[488] arXiv:2604.21814 [pdf, html, other]: Title: Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos

Bowen Liu, Li Yang, Shanshan Song, Mingyu Tang, Zhifang Gao, Qifeng Chen, Yangqiu Song, Huimin Chen, Xiaomeng Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Capsule endoscopy (CE) enables non-invasive gastrointestinal screening, but current CE research remains largely limited to frame-level classification and detection, leaving video-level analysis underexplored. To bridge this gap, we introduce and formally define a new task, diagnosis-driven CE video summarization, which requires extracting key evidence frames that cover clinically meaningful findings and making accurate diagnoses from those evidence frames. This setting is challenging because diagnostically relevant events are extremely sparse and can be overwhelmed by tens of thousands of redundant normal frames, while individual observations are often ambiguous due to motion blur, debris, specular highlights, and rapid viewpoint changes. To facilitate research in this direction, we introduce VideoCAP, the first CE dataset with diagnosis-driven annotations derived from real clinical reports. VideoCAP comprises 240 full-length videos and provides realistic supervision for both key evidence frame extraction and diagnosis. To address this task, we further propose DiCE, a clinician-inspired framework that mirrors the standard CE reading workflow. DiCE first performs efficient candidate screening over the raw video, then uses a Context Weaver to organize candidates into coherent diagnostic contexts that preserve distinct lesion events, and an Evidence Converger to aggregate multi-frame evidence within each context into robust clip-level judgments. Experiments show that DiCE consistently outperforms state-of-the-art methods, producing concise and clinically reliable diagnostic summaries. These results highlight diagnosis-driven contextual reasoning as a promising paradigm for ultra-long CE video summarization.
[489] arXiv:2604.21815 [pdf, html, other]: Title: Norm-based convergence bounds for nonsymmetric algebraic V-cycle multigrid methods

Reinhard Nabben, Ludwig Rooch

Comments: 26 pages

Subjects: Numerical Analysis (math.NA)

Recently, a new approach to analyze and create algebraic multigrid methods (AMG) for nonsymmetric and indefinite matrices was established. Convergence is measured in general norms induced by a certain HPD matrix $B$, and $B$-orthogonal projections built by compatible transfer operators are used. Here we continue our theoretical framework, started in Nabben and Rooch (2026), for nonsymmetric algebraic multigrid methods using any HPD matrix $B$ to induce a norm. Our framework not only includes all recent results but also provides many new results. We consider two slightly different multigrid operators. The first one is the natural generalization of the error operator in the HPD case. The second operator is simpler to apply and has been studied before. However, an additional condition for the smoother $M^{-1}A$ is needed, which is in our terminology the $B$-normality. We explain the differences and similarities of both operators in detail and show why the extra condition is needed. We consider arbitrary interpolation and restriction operators that result in $B$-orthogonal coarse-grid corrections and give sharp estimates for the norm of the error propagation matrices for the two-grid methods. We also show that the norms are decreasing if we increase the size of the coarse space. Moreover, we are able to extend the landmark $V$-cycle bound by McCormick to the nonsymmetric case.
[490] arXiv:2604.21816 [pdf, html, other]: Title: Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

Anuj Sadani, Deepak Kumar

Comments: 21 pages

Subjects: Artificial Intelligence (cs.AI)

The Model Context Protocol (MCP) has become a common interface for connecting large language model (LLM) agents to external tools, but its reliance on stateless, eager schema injection imposes a hidden per-turn overhead, the MCP Tax or Tools Tax, that practitioner reports place between roughly 10k and 60k tokens in typical multi-server deployments. This payload inflates the key-value cache, is associated with reasoning degradation as context utilization approaches published fracture points around 70%, and turns token budgets into a recurring operational cost. We introduce Tool Attention, a middleware-layer mechanism that generalizes the "Attention Is All You Need" paradigm from self-attention over tokens to gated attention over tools. Tool Attention combines (i) an Intent Schema Overlap (ISO) score from sentence embeddings, (ii) a state-aware gating function enforcing preconditions and access scopes, and (iii) a two-phase lazy schema loader that keeps a compact summary pool in context and promotes full JSON schemas only for top-k gated tools. We evaluate on a simulated 120-tool, six-server benchmark whose per-server token counts are calibrated to public audits of real MCP deployments. In this simulation, Tool Attention directly reduces measured per-turn tool tokens by 95.0% (47.3k -> 2.4k) and raises effective context utilization (a token-ratio quantity) from 24% to 91%. End-to-end figures for task success, latency, cost, and reasoning quality are reported as projections derived from the measured token counts combined with published deployment telemetry; they are not measured on live LLM agents, and we mark projected values explicitly throughout. Taken together, the results support a simple thesis: protocol-level efficiency, not raw context length, is a binding constraint on scalable agentic systems. The code for this work is accessible at this https URL.
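The three components listed in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the ISO score, so plain cosine similarity between hand-made toy vectors stands in for it, and the tool records, field names (`scope`, `requires`, `summary`, `schema`) and function names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; stands in for the ISO score (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_tools(intent_vec, tools, granted_scopes, state, k):
    """Gate tools on scope and preconditions, then keep the k best-scoring names."""
    allowed = [
        t for t in tools
        if t["scope"] in granted_scopes and t["requires"] <= state
    ]
    ranked = sorted(allowed, key=lambda t: cosine(intent_vec, t["embedding"]),
                    reverse=True)
    return [t["name"] for t in ranked[:k]]

def build_context(tools, promoted):
    """Lazy loading: every tool keeps a short summary, only promoted ones a schema."""
    context = []
    for t in tools:
        entry = {"name": t["name"], "summary": t["summary"]}
        if t["name"] in promoted:
            entry["schema"] = t["schema"]
        context.append(entry)
    return context

# Toy tool pool with hand-made 3-dimensional "embeddings" (hypothetical data).
TOOLS = [
    {"name": "search_docs", "scope": "read", "requires": set(),
     "embedding": [0.9, 0.1, 0.0], "summary": "Search documents",
     "schema": {"type": "object", "properties": {"query": {"type": "string"}}}},
    {"name": "delete_file", "scope": "write", "requires": {"authenticated"},
     "embedding": [0.8, 0.2, 0.0], "summary": "Delete a file",
     "schema": {"type": "object", "properties": {"path": {"type": "string"}}}},
    {"name": "get_weather", "scope": "read", "requires": set(),
     "embedding": [0.0, 0.1, 0.9], "summary": "Weather lookup",
     "schema": {"type": "object", "properties": {"city": {"type": "string"}}}},
    {"name": "read_file", "scope": "read", "requires": set(),
     "embedding": [0.7, 0.7, 0.0], "summary": "Read a file",
     "schema": {"type": "object", "properties": {"path": {"type": "string"}}}},
]

if __name__ == "__main__":
    promoted = select_tools([1.0, 0.0, 0.0], TOOLS, {"read"}, set(), k=2)
    for entry in build_context(TOOLS, promoted):
        print(entry["name"], "schema" in entry)
```

Keeping summaries for every tool while attaching schemas only to the promoted ones is one reading of the "summary pool" described above; whether gated-out tools should remain in the pool at all is a design choice the abstract leaves open.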
[491] arXiv:2604.21818 [pdf, html, other]: Title: Formulae for the Drazin inverse of Modified Tensors via the Einstein Product

Yue Zhao, Daochang Zhang, Dijana Mosic

Subjects: Numerical Analysis (math.NA)

This paper establishes exact expressions for the Drazin inverse of the modified tensor $\mathcal A-\mathcal C*_N\mathcal D^D*_N\mathcal B$ via the Einstein product, formulated using the Drazin inverse of $\mathcal A$ and the generalized Schur complement $\mathcal D-\mathcal B*_N\mathcal A^{D}*_N\mathcal C$, providing a comprehensive generalization and unification of existing results in the literature for the case when the tensors are of order two. Furthermore, the findings reduce to the classical Sherman-Morrison-Woodbury formula in the special case of second-order tensors. Finally, we give an example to illustrate our new explicit expression.
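For orientation, the classical special case mentioned above can be written out. The following assumes ordinary square matrices in place of tensors, the usual matrix product in place of the Einstein product, and that $A$, $D$, and the Schur complement $S = D - BA^{-1}C$ are all invertible, so that every Drazin inverse reduces to the ordinary inverse; the symbol $S$ is introduced here for illustration only.

$$
\left(A - C D^{-1} B\right)^{-1} \;=\; A^{-1} + A^{-1} C \left(D - B A^{-1} C\right)^{-1} B A^{-1} \;=\; A^{-1} + A^{-1} C\, S^{-1} B A^{-1}.
$$

This follows from the Sherman-Morrison-Woodbury identity $(A + UKV)^{-1} = A^{-1} - A^{-1}U\left(K^{-1} + VA^{-1}U\right)^{-1}VA^{-1}$ with $U = C$, $K = -D^{-1}$, and $V = B$.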
[492] arXiv:2604.21819 [pdf, html, other]: Title: Iterative Receiver Processing at Relays in PNC-Enabled Multi-Hop Underwater Acoustic Networks

Gewei Zhang, Deqing Wang, Lizhao You, Xiangming Cai, Liqun Fu

Subjects: Networking and Internet Architecture (cs.NI)

Physical-layer network coding (PNC) can increase end-to-end throughput in bi-directional multi-hop underwater acoustic (UWA) networks. However, multipath delay spread and Doppler-induced inter-carrier interference (ICI) in UWA channels can degrade the reliability of PNC transmission in a three-node relay configuration. More critically, error accumulation across multiple relay nodes leads to a pronounced increase in the end-to-end bit error rate (BER) in multi-hop networks. To address this issue, we develop an iterative detection and decoding processing strategy for relay nodes within a PNC-enabled multi-hop UWA network based on orthogonal frequency division multiplexing (OFDM) modulation. The proposed design integrates three key algorithms: (i) an adaptive channel-aware factor graph detection algorithm that is suited for time-varying UWA channels; (ii) a parity-check-constrained soft-information refinement algorithm that improves the accuracy of the information feedback from the decoder to the detector; and (iii) a linear minimum mean square error (LMMSE) detection algorithm based on a superimposed model, which offers low computational complexity as an alternative scheme. Extensive simulation results demonstrate that the adaptive detection algorithm achieves BERs on the order of $10^{-5}$ in a UWA channel with a relative velocity of 1.5 m/s and a signal-to-noise ratio (SNR) of 8 dB. Both lake experiments and sea trials in the Taiwan Strait confirm that the proposed iterative receiver algorithms outperform baseline schemes in terms of BER performance under practical UWA channel conditions, showing their robustness and applicability in real multi-hop deployments.
[493] arXiv:2604.21821 [pdf, html, other]: Title: Direct Problem for Gas Diffusion in Polar Firn with Variable Coefficients

Sophie Moufawad, Nabil Nassif, Faouzi Triki

Comments: 29 pages

Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

We consider the mathematical model of gas trapping in deep polar ice (firns), which consists of a parabolic partial differential equation that can degenerate at one boundary extreme. In [1], we considered all the coefficients to be constants, except the diffusion coefficient D(z) that is to be reconstructed. In this paper, we assume both the diffusion coefficient D(z) and the volume fraction f(z) are functions. The difficulty in this problem, both theoretically and computationally, arises from the fact that D(z) and f(z) may be zero at the bottom of the firn. To handle such degeneracy, we defined appropriate weighted Sobolev spaces and used Lions' theorem to prove existence and uniqueness of the semi-variational formulation of the Firn PDE. A fully discrete system is obtained through a P1 finite element Galerkin procedure in space and an implicit Euler scheme in time. Sufficient conditions for the existence and uniqueness of the solution for the discrete system are obtained.
[494] arXiv:2604.21822 [pdf, html, other]: Title: Beyond Rules: Towards Basso Continuo Personal Style Identification

Adam Štefunko, Jan Hajič jr

Comments: 8 pages, 4 figures, accepted to the 13th International Conference on Digital Libraries for Musicology (DLfM)

Subjects: Sound (cs.SD)

A central part of the contemporary Historically Informed Practice movement is basso continuo, an improvised accompaniment genre with its traditions originating in the baroque era and actively practiced by many keyboard players nowadays. Although computational musicology has studied the theoretical foundations of basso continuo expressed by harmonic and voice-leading rules and constraints, characteristics of basso continuo as an active performing art have been largely overlooked mostly due to a lack of suitable performance data that could be empirically analyzed. This has changed with the introduction of The Aligned Continuo Realization Dataset (ACoRD) and the basso continuo realization-to-score alignment. Basso continuo playing is shaped by stylistic traditions coming from historical treatises, but it also may provide space for showcasing individual performance styles of its practitioners. In this paper, we attempt to explore the question of the presence of personal styles in the basso continuo realizations of players in the ACoRD dataset. We use a historically informed structured representation of basso continuo performance pitch content called griffs and Support Vector Machines to see whether it is possible to classify players based on their performances. The results show that we can identify players from their performances. In addition to the player classification problem, we discuss the elements that make up the individual styles of the players.
[495] arXiv:2604.21827 [pdf, html, other]: Title: Alignment has a Fantasia Problem

Nathanael Jo, Zoe De Simone, Mitchell Gordon, Ashia Wilson

Comments: 9 pages, 2 figures

Journal-ref: ICLR 2026 Workshop HCAIR

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI systems before their goals are fully formed. When AI systems treat prompts as complete expressions of intent, they can appear to be useful or convenient, but not necessarily aligned with the users' needs. We call these failures Fantasia interactions. We argue that Fantasia interactions demand a rethinking of alignment research: rather than treating users as rational oracles, AI should provide cognitive support by actively helping users form and refine their intent through time. This requires an interdisciplinary approach that bridges machine learning, interface design, and behavioral science. We synthesize insights from these fields to characterize the mechanisms and failures of Fantasia interactions. We then show why existing interventions are insufficient, and propose a research agenda for designing and evaluating AI systems that better help humans navigate uncertainty in their tasks.
[496] arXiv:2604.21829 [pdf, html, other]: Title: Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu

Comments: Preprint

Subjects: Cryptography and Security (cs.CR)

LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free skill marketplaces already report 90368 published skills, while paid marketplaces report more than 2000 listings and over $100,000 in creator earnings. Yet this growing marketplace also creates a new attack surface, as adversaries can interact with public agents to extract hidden proprietary skill content. We present the first empirical study of black-box skill stealing against LLM agent systems. To study this threat, we first derive an attack taxonomy from prior prompt-stealing methods and build an automated stealing prompt generation agent. This agent starts from model-generated seed prompts, expands them through scenario rationalization and structure injection, and enforces diversity via embedding filtering. This process yields a reproducible pipeline for evaluating agent systems. We evaluate such attacks across 3 commercial agent architectures and 5 LLMs. Our results show that agent skills can be extracted with only 3 interactions, posing a serious copyright risk. To mitigate this threat, we design defenses across three stages of the agent pipeline: input, inference, and output. Although these defenses achieve strong results, the attack remains inexpensive and readily automatable, allowing an adversary to launch repeated attempts with different variants; only one successful attempt is sufficient to compromise the protected skill. Overall, our findings suggest that these copyright risks are largely overlooked across proprietary agent ecosystems. We therefore advocate for more robust defense strategies that provide stronger protection guarantees.
[497] arXiv:2604.21830 [pdf, html, other]: Title: GFlowState: Visualizing the Training of Generative Flow Networks Beyond the Reward

Florian Holeczek, Andreas Hinterreiter, Alex Hernandez-Garcia, Marc Streit, Christina Humer

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

We present GFlowState, a visual analytics system designed to illuminate the training process of Generative Flow Networks (GFlowNets or GFNs). GFlowNets are a probabilistic framework for generating samples proportionally to a reward function. While GFlowNets have proved to be powerful tools in applications such as molecule and material discovery, their training dynamics remain difficult to interpret. Standard machine learning tools allow metric tracking but do not reveal how models explore the sample space, construct sample trajectories, or shift sampling probabilities during training. Our solution, GFlowState, allows users to analyze sampling trajectories, compare the sample space relative to reference datasets, and analyze the training dynamics. To this end, we introduce multiple views, including a chart of candidate rankings, a state projection, a node-link diagram of the trajectory network, and a transition heatmap. These visualizations enable GFlowNet developers and users to investigate sampling behavior and policy evolution, and to identify underexplored regions and sources of training failure. Case studies demonstrate how the system supports debugging and assessing the quality of GFlowNets across application domains. By making the structural dynamics of GFlowNets observable, our work enhances their interpretability and can accelerate GFlowNet development in practice.
[498] arXiv:2604.21831 [pdf, html, other]: Title: Complexity Classes Arising from Circuits over Finite Algebraic Structures

Piotr Kawałek, Jacek Krzaczkowski

Subjects: Computational Complexity (cs.CC)

Most classical results in circuit complexity theory concern circuits over the Boolean domain. Besides their simplicity and the ease of comparing different languages, the actual architecture of computers is also an important motivating factor. On the other hand, by restricting attention to Boolean circuits, we lose sight of the much richer landscape of circuits over larger domains. Our goal is to bridge these two worlds: to use deep algebraic tools to obtain results in computational complexity theory, including circuit complexity, and to apply results from computational complexity to gain a better understanding of the structure of finite algebras.
In this paper, we propose a unifying algebraic framework which we believe will help achieve this goal. Our work is inspired by branching programs and nonuniform deterministic automata introduced by Barrington, as well as by their generalization proposed by Idziak et al. We begin our investigation by studying the languages recognized by natural classes of algebraic structures. In particular, we characterize language classes recognized by circuits over simple algebras and over algebras from congruence modular varieties.
[499] arXiv:2604.21840 [pdf, html, other]: Title: TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication

Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu, Jeff Huang

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigate the page while isolating themselves from potential runtime exploits.
We present TraceScope, a decoupled triage pipeline that operationalizes this workflow at scale. To prevent the observer effect and ensure safety, a sandboxed operator agent drives a real GUI browser guided by visual motivation to elicit page behavior, freezing the session into an immutable evidence bundle. Separately, an adjudicator agent circumvents LLM context limitations by querying evidence on demand to verify a MITRE ATT&CK checklist, and generates an audit-ready report with extracted indicators of compromise (IOCs) and a final verdict.
Evaluated on 708 reachable URLs from existing datasets (241 verified phishing from PhishTank and 467 benign from Tranco-derived crawling), TraceScope achieves 0.94 precision and 0.78 recall, substantially improving recall over three prior visual/reference-based classifiers while producing reproducible, analyst-grade evidence suitable for review. More importantly, we manually curated a dataset of real-world phishing emails to evaluate our system in a practical setting. Our evaluation reveals that TraceScope demonstrates superior performance in a real-world scenario as well, successfully detecting sophisticated phishing attempts that current state-of-the-art defenses fail to identify.
[500] arXiv:2604.21841 [pdf, html, other]: Title: Cross-Modal Phantom: Coordinated Camera-LiDAR Spoofing Against Multi-Sensor Fusion in Autonomous Vehicles

Shahriar Rahman Khan, Raiful Hasan

Subjects: Cryptography and Security (cs.CR)

Autonomous Vehicles (AVs) increasingly depend on Multi-Sensor Fusion (MSF) to combine complementary modalities such as cameras and LiDAR for robust perception. While this redundancy is intended to safeguard against single-sensor failures, the fusion process itself introduces a subtle and underexplored vulnerability. In this work, we investigate whether an attacker can bypass MSF's redundancy by fabricating cross-sensor consistency, making multiple sensors agree on the same false object. We design a coordinated, data-level (early-fusion) attack that emulates the outcome of two synchronized physical spoofing sources: an infrared (IR) projection that induces a false camera detection and a LiDAR signal injection that produces a matching 3D point cluster. Rather than implementing the physical attack hardware, we simulate its sensor-level outcomes by inserting perspective-aware image patches and synthetic LiDAR point clusters aligned in 3D space. This approach preserves the perceptual effects that real IR and IEMI-based spoofing would create at the sensor output. Using 400 KITTI scenes, our large-scale evaluation shows that the coordinated spoofing deceives a state-of-the-art perception model with an 85.5% attack success rate. These findings provide the first quantitative evidence that malicious cross-modal consistency can compromise MSF-based perception, revealing a critical vulnerability in the core data-fusion logic of modern autonomous vehicle systems.
[501] arXiv:2604.21847 [pdf, html, other]: Title: Sampling from the Hardcore Model on Random Regular Bipartite Graphs above the Uniqueness Threshold

Nicholas Kocurek, Shayan Oveis Gharan, Dante Tjowasi

Comments: 35 pages

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

We design an efficient sampling algorithm to generate samples from the hardcore model on random regular bipartite graphs as long as $\lambda \lesssim \frac{1}{\sqrt{\Delta}}$, where $\Delta$ is the degree. Combined with recent work of Jenssen, Keevash and Perkins, this implies an FPRAS for the partition function of the hardcore model on random regular bipartite graphs at any fugacity. Our algorithm is obtained by analyzing two new Markov chains that work in complementary regimes. Our proof then proceeds by showing the corresponding simplicial complexes are top-link spectral expanders and appealing to the trickle-down theorem to prove fast mixing.
[502] arXiv:2604.21854 [pdf, html, other]: Title: Bounding the Black Box: A Statistical Certification Framework for AI Risk Regulation

Natan Levy, Gadi Perl

Comments: 11 pages

Subjects: Artificial Intelligence (cs.AI)

Artificial intelligence now decides who receives a loan, who is flagged for criminal investigation, and whether an autonomous vehicle brakes in time. Governments have responded: the EU AI Act, the NIST Risk Management Framework, and the Council of Europe Convention all demand that high-risk systems demonstrate safety before deployment. Yet beneath this regulatory consensus lies a critical vacuum: none specifies what "acceptable risk" means in quantitative terms, and none provides a technical method for verifying that a deployed system actually meets such a threshold. The regulatory architecture is in place; the verification instrument is not.
This gap is not theoretical. As the EU AI Act moves into full enforcement, developers face mandatory conformity assessments without established methodologies for producing quantitative safety evidence - and the systems most in need of oversight are opaque statistical inference engines that resist white-box scrutiny.
This paper provides the missing instrument. Drawing on the aviation certification paradigm, we propose a two-stage framework that transforms AI risk regulation into engineering practice. In Stage One, a competent authority formally fixes an acceptable failure probability $\delta$ and an operational input domain $\varepsilon$ - a normative act with direct civil liability implications. In Stage Two, the RoMA and gRoMA statistical verification tools compute a definitive, auditable upper bound on the system's true failure rate, requiring no access to model internals and scaling to arbitrary architectures. We demonstrate how this certificate satisfies existing regulatory obligations, shifts accountability upstream to developers, and integrates with the legal frameworks that exist today.
[503] arXiv:2604.21860 [pdf, html, other]: Title: Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

Naheed Rayhan, Sohely Jahan

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across isolated interactions. TTI leverages automated attacker agents powered by large language models to iteratively test and evade policy enforcement in both commercial and open-source LLMs, marking a departure from conventional jailbreak approaches that typically depend on maintaining persistent conversational context. Our extensive evaluation across state-of-the-art models, including those from OpenAI, Anthropic, Google Gemini, Meta, and prominent open-source alternatives, uncovers significant variations in resilience to TTI attacks, with only select architectures exhibiting substantial inherent robustness. Our automated black-box evaluation framework also uncovers previously unknown model-specific vulnerabilities and attack-surface patterns, especially within medical and high-stakes domains. We further compare TTI against established adversarial prompting methods and detail practical mitigation strategies, such as session-level context aggregation and deep alignment approaches. Our study underscores the urgent need for holistic, context-aware defenses and continuous adversarial testing to future-proof LLM deployments against evolving multi-turn threats.
[504] arXiv:2604.21861 [pdf, html, other]: Title: Neuromorphic Computing Based on Parametrically-Driven Oscillators and Frequency Combs

Mahadev Sunil Kumar, Adarsh Ganesan

Comments: 7 pages, 5 figures

Subjects: Neural and Evolutionary Computing (cs.NE); Pattern Formation and Solitons (nlin.PS)

Parametrically driven oscillators provide a natural platform for neuromorphic computation, where nonlinear mode coupling and intrinsic dynamics enable both memory and high-dimensional transformation. Here, we investigate a two-mode system exhibiting 2:1 parametric resonance and demonstrate its operation as a reservoir computer across distinct dynamical regimes, including sub-threshold, parametric resonance, and frequency-comb states. By encoding input signals into the drive amplitude and sampling the resulting temporal and spectral responses, we perform one-step-ahead prediction of benchmark chaotic systems, including Mackey-Glass, Rössler, and Lorenz dynamics. We find that optimal computational performance is achieved within the parametric resonance regime, where nonlinear interactions are activated while temporal coherence is preserved. In contrast, although frequency-comb states introduce increased spectral dimensionality, their performance is not consistently good across their existence band and also degrades in the chaotic comb regime due to loss of phase coherence. Mapping prediction error over parameter space reveals a direct correspondence between computational capability and the underlying bifurcation structure, with low-error regions aligned with the parametric resonance boundary. We further show that the input modulation, the detuning from the frequency matching condition, the damping ratio, and the input data rate systematically control the accessible dynamical regimes and thereby the computational performance. These results establish parametric resonance as a robust operating regime for oscillator-based reservoir computing and provide design principles for tuning physical systems toward optimal neuromorphic functionality.
[505] arXiv:2604.21864 [pdf, other]: Title: FAccT-Checked: A Narrative Review of Authority Reconfigurations and Retention in AI-Mediated Journalism

Stefano Sorrentino, Matilde Barbini, Daniel Gatica-Perez

Comments: ACM FAccT 2026 accepted paper. Due to arXiv's 1920-character limit, the abstract here is shortened; refer to the full paper for the entire abstract

Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Building on recent interpretivist approaches, we conduct a critical narrative review across journalism studies, human-computer interaction, and FAccT scholarship, conceptualizing editorial authority as the conjunction of decision rights, epistemic warrant, and responsibility. We provide a comprehensive theoretical framework for addressing how concerns about fairness, accountability, and transparency emerge, interact, and persist within AI-mediated journalistic practice. We identify and describe two concurrent authority reconfigurations driven by AI adoption. First, an internal migration of authority, in which editorial judgment is progressively deferred to large language models (LLMs) embedded within newsroom workflows. This migration occurs not through explicit policy decisions, but through interactional, cognitive, and organizational mechanisms that legitimize AI-generated outputs while obscuring responsibility and weakening individual and professional agency. Second, we analyze an external migration of authority, whereby decision-making power shifts from news organizations toward platforms, vendors, and infrastructural providers that supply AI systems and distribution channels, exacerbating existing power asymmetries within the media ecosystem. Unaddressed, these reconfigurations risk rendering fairness hard to maintain, accountability difficult to assign, and transparency performative. We examine participatory approaches to AI design and deployment in journalism as potential mechanisms for retaining or reclaiming editorial authority. We critically assess both their promise and their structural limitations, highlighting how participation can either meaningfully redistribute authority or function as a tokenistic practice that leaves underlying power relations intact.
[506] arXiv:2604.21871 [pdf, html, other]: Title: Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha

Comments: ACL-Findings 2026

Subjects: Computation and Language (cs.CL)

Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than their own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in their internal world-modeling, which poses a gap that may lead to significant misalignments in real-world deployments.
[507] arXiv:2604.21873 [pdf, html, other]: Title: Grounding Video Reasoning in Physical Signals

Alibay Osmanli, Zixu Cheng, Shaogang Gong

Comments: Benchmark for Grounding Video Reasoning in Physical Signals

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regularities while still failing to localize the event in time or space. We introduce a grounded benchmark for physical video understanding that extends the what--when--where evaluation structure of V-STaR to four video sources, six physics domains, three prompt families (physics, vstar_like, and neutral_rstr), and four input conditions (original, shuffled, ablated, and frame-masked). The benchmark contains 1,560 base video clips from SSV2, YouCook2, HoloAssist, and Roundabout-TAU. Each clip is first converted into a shared grounded event record, and the three query families are derived from that record. Temporal and spatial targets are shared across prompt families, while the non-physics families use deterministic family-appropriate semantic a_what targets derived from the same record. Across models and prompt families, physics remains the strongest regime overall, vstar_like is the clearest non-physics semantic comparison, and neutral_rstr behaves as a harder templated control. Prompt-family robustness is selective rather than universal, perturbation gains cluster in weak original cases, and spatial grounding is the weakest across settings. These results suggest that video Q&A reasoning benchmarks should report physically grounded, prompt-aware, and perturbation-aware diagnostics alongside aggregate accuracy.
[508] arXiv:2604.21877 [pdf, html, other]: Title: A simple $(2+ε)$-approximation for knapsack interdiction

Noah Weninger

Comments: 7 pages

Subjects: Data Structures and Algorithms (cs.DS)

In the knapsack interdiction problem, there are $n$ items, each with a non-negative profit, interdiction cost, and packing weight. There is also an interdiction budget and a capacity. The objective is to select a set of items to interdict (delete) subject to the budget which minimizes the maximum profit attainable by packing the remaining items subject to the capacity. We present a $(2+\epsilon)$-approximation running in $O(n^3\epsilon^{-1}\log(\epsilon^{-1}\log\sum_i p_i))$ time. Although a polynomial-time approximation scheme (PTAS) is already known for this problem, our algorithm is considerably simpler and faster. The approach also generalizes naturally to a $(1+t+\epsilon)$-approximation for $t$-dimensional knapsack interdiction with running time $O(n^{t+2}\epsilon^{-1}\log(\epsilon^{-1}\log\sum_i p_i))$.
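
To make the min-max objective described above concrete, the following is a minimal brute-force sketch of the problem definition only. It is not the paper's $(2+\epsilon)$-approximation; it enumerates every interdiction set, so it is exponential in $n$ and only suitable for tiny instances. The function names and the assumption of integer weights and capacity are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def max_knapsack_profit(items, capacity):
    """Follower's problem: best profit from (profit, weight) items within capacity.

    Standard 0/1 knapsack dynamic program; assumes non-negative integer weights.
    """
    best = [0] * (capacity + 1)
    for profit, weight in items:
        # Iterate capacities downward so each item is packed at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + profit)
    return best[capacity]


def interdiction_brute_force(profits, costs, weights, budget, capacity):
    """Leader's problem: pick a deletion set within budget that minimizes
    the follower's best knapsack profit on the remaining items."""
    n = len(profits)
    best_value, best_set = None, None
    for k in range(n + 1):
        for deleted in combinations(range(n), k):
            if sum(costs[i] for i in deleted) > budget:
                continue  # interdiction set exceeds the budget
            remaining = [(profits[i], weights[i])
                         for i in range(n) if i not in deleted]
            value = max_knapsack_profit(remaining, capacity)
            if best_value is None or value < best_value:
                best_value, best_set = value, set(deleted)
    return best_value, best_set


value, deleted = interdiction_brute_force(
    profits=[10, 6, 4], costs=[2, 1, 1], weights=[3, 2, 2],
    budget=2, capacity=3)
print(value, deleted)
```

In this toy instance, deleting the most profitable item uses the whole budget and leaves the follower able to pack only one of the two remaining items.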
[509] arXiv:2604.21878 [pdf, other]: Title: Gradual Voluntary Participation: A Framework for Participatory AI Governance in Journalism

Matilde Barbini, Stefano Sorrentino, Daniel Gatica-Perez

Subjects: Human-Computer Interaction (cs.HC)

The integration of AI into journalism challenges participatory design (PD), particularly with respect to stakeholder influence, workplace perceptions, and organizational dynamics. Traditional PD assumes that users can shape technologies, yet AI systems resist influence due to opaque data, fixed architectures, and inaccessible objectives. Through interviews with 10 journalists, we identify the perception gap, showing that trust in AI depends on perceived agency within workplace participatory workflows. Informed by these findings, we introduce the Gradual Voluntary Participation (GVP) framework in journalism and its five core principles, reconceptualizing participation as a gradual and voluntary process that can be operationalized at the newsroom level, beyond fixed workshops or one-time preference-elicitation campaigns. Addressing epistemic burdens, participatory ceilings, and performative consultations, GVP treats gradualism and voluntariness as design dimensions that shape perception, legitimacy, and ownership. Moving beyond unidimensional ladder metaphors and adopting a bidimensional matrix structure, the framework maps stakeholders across depth and scope, offering a new model for local participatory AI governance that balances technological transformation with stakeholder empowerment in rapidly evolving hybrid workplaces.
[510] arXiv:2604.21879 [pdf, html, other]: Title: Addressing Image Authenticity When Cameras Use Generative AI

Umar Masud, Abhijith Punnappurath, Luxi Zhao, David B. Lindell, Michael S. Brown

Comments: To appear in CVPR 2026 Workshop on Authenticity and Provenance in the Age of Generative AI

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The ability of generative AI (GenAI) methods to photorealistically alter camera images has raised awareness about the authenticity of images shared online. Interestingly, images captured directly by our cameras are considered authentic and faithful. However, with the increasing integration of deep-learning modules into cameras' capture-time hardware -- namely, the image signal processor (ISP) -- there is now a potential for hallucinated content in images directly output by our cameras. Hallucinated capture-time image content is typically benign, such as enhanced edges or texture, but in certain operations, such as AI-based digital zoom or low-light image enhancement, hallucinations can potentially alter the semantics and interpretation of the image content. As a result, users may not realize that the content in their camera images is not authentic. This paper addresses this issue by enabling users to recover the 'unhallucinated' version of the camera image to avoid misinterpretation of the image content. Our approach works by optimizing an image-specific multi-layer perceptron (MLP) decoder together with a modality-specific encoder so that, given the camera image, we can recover the image before hallucinated content was added. The encoder and MLP are self-contained and can be applied post-capture to the image without requiring access to the camera ISP. Moreover, the encoder and MLP decoder require only 180 KB of storage and can be readily saved as metadata within standard image formats such as JPEG and HEIC.
[511] arXiv:2604.21881 [pdf, html, other]: Title: SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization

Guoyu Li, Yang Cao, Lucas H L Ng, Alexander Charlton, Qianzhou Wang, Will Punter, Philippos Papaphilippou, Ce Guo, Hongxiang Fan, Wayne Luk, Saman Amarasinghe, Ajay Brahmakshatriya

Comments: 10 pages, 8 figures, 2 tables, accepted by The 34th IEEE International Symposium on Field-Programmable Custom Computing Machines

Subjects: Networking and Internet Architecture (cs.NI); Hardware Architecture (cs.AR)

With network requirements diverging across emerging applications, latency-critical services demand minimal logic delay, while hyperscale training and collectives require sustained line-rate throughput for synchronized bulk transfers. This divergence creates an urgent need for custom network switches tailored to specialized protocols and application-specific traffic patterns. This paper presents SPAC (Switch and Protocol Adaptive Customization), a novel approach that automates the generation of FPGA-based network switches co-optimized for custom protocols and application-specific traffic patterns. SPAC introduces a unified workflow with a domain-specific language (DSL) for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration (DSE) engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. We demonstrate the efficacy of the domain-specific adaptation of SPAC across a spectrum of real-world scenarios, spanning from latency-sensitive sensor and HFT networks to hyperscale datacenter fabrics. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while maintaining adequate resource consumption and packet drop rate.
[512] arXiv:2604.21882 [pdf, html, other]: Title: Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

Yuto Nishida, Naoki Shikoda, Yosuke Kishinami, Ryo Fujii, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe

Comments: Accepted to ACL 2026 Main

Subjects: Computation and Language (cs.CL)

Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing non-verbatim memorization, but typical evaluations query each entity using a single canonical surface form, making it difficult to disentangle fact memorization from access through a particular name. We introduce RedirectQA, an entity-based QA dataset that uses Wikipedia redirect information to associate Wikidata factual triples with categorized surface forms for each entity, including alternative names, abbreviations, spelling variants, and common erroneous forms. Across 13 LLMs, we examine surface-conditioned factual memorization and find that prediction outcomes often change when only the entity surface form changes. This inconsistency is category-dependent: models are more robust to minor orthographic variations than to larger lexical variations such as aliases and abbreviations. Frequency analyses further suggest that both entity- and surface-level frequencies are associated with accuracy, and that entity frequency often contributes beyond surface frequency. Overall, factual memorization appears neither purely surface-specific nor fully surface-invariant, highlighting the importance of surface-form diversity in evaluating non-verbatim memorization.
[513] arXiv:2604.21885 [pdf, html, other]: Title: A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents

Praval Sharma

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generalize to unseen types and (2) open-domain event extraction algorithms, capable of handling unconstrained event types, have largely overlooked the potential of large language models (LLMs) despite their advanced abilities. Additionally, they do not explicitly model document-level contextual, structural, and semantic reasoning, which are crucial for effective event extraction but remain challenging for LLMs due to the lost-in-the-middle phenomenon and attention dilution. To address these limitations, we propose multimodal open-domain event extraction, MODEE, a novel approach for open-domain event extraction that combines graph-based learning with text-based representation from LLMs to model document-level reasoning. Empirical evaluations on large datasets demonstrate that MODEE outperforms state-of-the-art open-domain event extraction approaches and can be generalized to closed-domain event extraction, where it outperforms existing algorithms.
[514] arXiv:2604.21887 [pdf, other]: Title: Guaranteed inf-sup bounds and existence verification for semilinear elliptic problems via nonconforming finite elements

Benedikt Gräßle

Subjects: Numerical Analysis (math.NA)

A Newton--Kantorovich-type argument enables the a posteriori existence verification of a unique regular root near a computed approximation, purely from computable data. This framework allows for non-selfadjoint problems and extends the existing verification theory to nonconforming discretisations. A key ingredient is a guaranteed lower bound on the continuous inf-sup constant from a quasi-optimal nonconforming discretisation that enables a novel a priori error estimator. All quantities are obtained by post-processing a single discretisation; convergence rates are proved. The theory is applied to a fourth-order formulation of the stationary two-dimensional Navier--Stokes equations and illustrated by numerical experiments.
[515] arXiv:2604.21889 [pdf, html, other]: Title: TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di, Rui Wang

Comments: Accepted to ACL 2026 Industry Track

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionable intelligence from this data remains challenging due to extreme noise, high throughput, and semantic complexity of diverse business lines. In this paper, we present TingIS, an end-to-end system designed for enterprise-grade incident discovery. At the core of TingIS is a multi-stage event linking engine that synergizes efficient indexing techniques with Large Language Models (LLMs) to make informed decisions on event merging, enabling the stable extraction of actionable incidents from just a handful of diverse user descriptions. This engine is complemented by a cascaded routing mechanism for precise business attribution and a multi-dimensional noise reduction pipeline that integrates domain knowledge, statistical patterns, and behavioral filtering. Deployed in a production environment handling a peak throughput of over 2,000 messages per minute and 300,000 messages per day, TingIS achieves a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents. Benchmarks constructed from real-world data demonstrate that TingIS significantly outperforms baseline methods in routing accuracy, clustering quality, and Signal-to-Noise Ratio.
[516] arXiv:2604.21890 [pdf, html, other]: Title: EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents

Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi

Subjects: Computation and Language (cs.CL)

Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development have limitations, including limited coverage of event types in closed-domain settings and a lack of large, manually verified datasets in open-domain settings. To address these limitations, we create EVENT5Ws, a large, manually annotated, and statistically verified open-domain event extraction dataset. We design a systematic annotation pipeline to create the dataset and provide empirical insights into annotation complexity. Using EVENT5Ws, we evaluate state-of-the-art pre-trained large language models and establish a benchmark for future research. We further show that models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts, which demonstrates its potential for developing generalizable algorithms. Finally, we summarize the lessons learned during the dataset development and provide recommendations to support future large-scale dataset development.
[517] arXiv:2604.21891 [pdf, html, other]: Title: A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment

Muhy Eddin Za'ter, Anna Van Boven, Bri-Mathias Hodge, Kyri Baker

Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)

Maintaining instantaneous balance between electricity supply and demand is critical for grid reliability and stability. System operators achieve this by solving the Unit Commitment (UC) problem, a high-dimensional, large-scale Mixed-Integer Linear Programming (MILP) problem that is strictly governed by the grid's physical constraints. As grids integrate variable renewable sources and new technologies such as long-duration storage, UC must be solved optimally for multi-day horizons and potentially with greater frequency. Consequently, traditional MILP solvers increasingly struggle to compute solutions within these tightening operational time limits. To bypass these computational bottlenecks, this paper proposes a novel framework utilizing a transformer-based architecture to predict generator commitment schedules over a 72-hour horizon. Because raw predictions in high-dimensional spaces often yield physically infeasible results, the pipeline integrates the self-attention network with deterministic post-processing heuristics that systematically enforce minimum up/down times and minimize excess capacity. Finally, these refined predictions are used as a warm start for a downstream MILP solver, while a confidence-based variable fixation strategy drastically reduces the combinatorial search space. Validated on a single-bus test system, the complete multi-stage pipeline achieves 100% feasibility and significantly accelerates computation times. Notably, in approximately 20% of test instances, the proposed model reached a feasible operational schedule with a lower overall system cost than relying solely on the solver.
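
The confidence-based variable fixation idea mentioned above can be sketched generically as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `split_by_confidence`, the symmetric threshold rule, and the `(generator, hour)` keying are all hypothetical. Variables whose predicted on-probability is very high or very low are fixed to 1 or 0, and only the uncertain ones are left for the MILP solver to decide.

```python
def split_by_confidence(probabilities, threshold=0.9):
    """Partition binary commitment variables by prediction confidence.

    probabilities: dict mapping (generator, hour) -> predicted P(unit is on).
    threshold: hypothetical confidence cutoff in (0.5, 1.0].
    Returns (fixed, free): `fixed` maps keys to 0 or 1, `free` lists the keys
    that remain decision variables for the downstream solver.
    """
    fixed, free = {}, []
    for key, p in probabilities.items():
        if p >= threshold:
            fixed[key] = 1          # confidently on
        elif p <= 1.0 - threshold:
            fixed[key] = 0          # confidently off
        else:
            free.append(key)        # uncertain: leave to the solver
    return fixed, free


predicted = {("g1", 0): 0.97, ("g1", 1): 0.55, ("g2", 0): 0.03}
fixed, free = split_by_confidence(predicted, threshold=0.9)
print(fixed, free)
```

A higher threshold fixes fewer variables, trading search-space reduction for a lower risk of locking in a wrong commitment.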
[518] arXiv:2604.21894 [pdf, other]: Title: Task-Driven Co-Design of Heterogeneous Multi-Robot Systems

Maximilian Stralz, Meshal Alharbi, Yujun Huang, Gioele Zardini

Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)

Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requirements remains underexplored. In this work, we present a formal and compositional framework for the task-driven co-design of heterogeneous multi-robot systems. Building on a monotone co-design theory, we introduce general abstractions of robots, fleets, planners, executors, and evaluators as interconnected design problems with well-defined interfaces that are agnostic to both implementations and tasks. This structure enables efficient joint optimization of robot design, fleet composition, and planning under task-specific performance constraints. A series of case studies demonstrates the capabilities of the framework. Various component models can be seamlessly incorporated, including new robot types, task profiles, and probabilistic sensing objectives, while non-obvious design alternatives are systematically uncovered with optimality guarantees. The results highlight the flexibility, scalability, and interpretability of the proposed approach, and illustrate how formal co-design enables principled reasoning about complex heterogeneous multi-robot systems.
[519] arXiv:2604.21896 [pdf, html, other]: Title: Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

Chee Wei Tan, Yuchen Wang, Shangxin Guo

Comments: 14 figures, 3 tables

Subjects: Artificial Intelligence (cs.AI)

This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy LLM-powered game agents while actively engaging with AI-driven strategies. The LLM-based chatbot, integrated within Nemobot, demonstrates its capabilities across four distinct classes of games. For dictionary-based games, it compresses state-action mappings into efficient, generalized models for rapid adaptability. In rigorously solvable games, it employs mathematical reasoning to compute optimal strategies and generates human-readable explanations for its decisions. For heuristic-based games, it synthesizes strategies by combining insights from classical minimax algorithms (see, e.g., Shannon, 1950) with crowd-sourced data. Finally, in learning-based games, it utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies through trial-and-error and imitation learning. Nemobot amplifies this framework by offering a programmable environment where users can experiment with tool-augmented generation and fine-tuning of strategic game agents. From strategic games to role-playing games, Nemobot demonstrates how AI agents can achieve a form of self-programming by integrating crowdsourced learning and human creativity to iteratively refine their own logic. This represents a step toward the long-term goal of self-programming AI.
[520] arXiv:2604.21897 [pdf, html, other]: Title: Mapping the Political Discourse in the Brazilian Chamber of Deputies: A Multi-Faceted Computational Approach

Flávio Soriano, Victoria F. Mello, Pedro B. Rigueira, Gisele L. Pappa, Wagner Meira Jr., Ana Paula Couto da Silva, Jussara M. Almeida

Comments: Accepted paper at ICWSM 2026

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

Analyses of legislative behavior often rely on voting records, overlooking the rich semantic and rhetorical content of political speech. In this paper, we ask three complementary questions about parliamentary discourse: how things are said, what is being said, and who is speaking in discursively similar ways. To answer these questions, we introduce a scalable and generalizable computational framework that combines diachronic stylometric analysis, contextual topic modeling, and semantic clustering of deputies' speeches. We apply this framework to a large-scale case study of the Brazilian Chamber of Deputies, using a corpus of over 450,000 speeches from 2003 to 2025. Our results show a long-term stylistic shift toward shorter and more direct speeches, a legislative agenda that reorients sharply in response to national crises, and a granular map of discursive alignments in which regional and gender identities often prove more salient than formal party affiliation. More broadly, this work offers a robust methodology for analyzing parliamentary discourse as a multidimensional phenomenon that complements traditional vote-based approaches.
[521] arXiv:2604.21898 [pdf, html, other]: Title: Institutionalizing Best Practices in Research Computing: A Framework and Case Study for Improving User Onboarding

Ayush Chaturvedi, Rob Pokorney, Elyn Fritz-Waters, Charlotte Rouse, Gary Bax, Daryl Spencer, Craig Pohl

Subjects: Other Computer Science (cs.OH); Computers and Society (cs.CY); Software Engineering (cs.SE)

Research computing centers around the world struggle with onboarding new users. Subject matter experts, researchers, and principal investigators are often overwhelmed by the complex infrastructure and software offerings designed to support diverse research domains at large academic and national institutions. As a result, users frequently run into confusion and complexity when accessing these resources, despite the availability of documentation, tutorials, interactive trainings, and other similar resources. Through this work, we present a framework designed to improve the new-user onboarding experience. We also present an empirical validation through its application within the Research Infrastructure Services at Washington University in St. Louis.
[522] arXiv:2604.21901 [pdf, html, other]: Title: GiVA: Gradient-Informed Bases for Vector-Based Adaptation

Neeraj Gangwar, Rishabh Deshmukh, Michael Shavlovsky, Hancao Li, Vivek Mittal, Lexing Ying, Nickvash Kani

Comments: Accepted to AISTATS 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

As model sizes continue to grow, parameter-efficient fine-tuning has emerged as a powerful alternative to full fine-tuning. While LoRA is widely adopted among these methods, recent research has explored vector-based adaptation methods due to their extreme parameter efficiency. However, these methods typically require substantially higher ranks than LoRA to match its performance, leading to increased training costs. This work introduces GiVA, a gradient-based initialization strategy for vector-based adaptation. It achieves training times comparable to LoRA and maintains the extreme parameter efficiency of vector-based adaptation. We evaluate GiVA across diverse benchmarks, including natural language understanding, natural language generation, and image classification. Experiments show that our approach consistently outperforms or achieves performance competitive with existing vector-based adaptation methods and LoRA while reducing rank requirements by a factor of eight ($8\times$).
[523] arXiv:2604.21903 [pdf, html, other]: Title: A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models

Max Defez, Filippo Quarenghi, Mathieu Vrac, Stephan Mandt, Tom Beucler

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep-learning video super-resolution has progressed rapidly, but climate applications typically super-resolve (increase resolution) either space or time, and joint spatiotemporal models are often designed for a single pair of super-resolution (SR) factors (upscaling spatial and temporal ratio between the low-resolution sequence and the high-resolution sequence), limiting transfer across spatial resolutions and temporal cadences (frame rates). We present a scale-adaptive framework that reuses the same architecture across factors by decomposing spatiotemporal SR into a deterministic prediction of the conditional mean, with attention, and a residual conditional diffusion model, with an optional mass-conservation (same precipitation amount in inputs and outputs) transform to preserve aggregated totals. Assuming that larger SR factors primarily increase underdetermination (hence required context and residual uncertainty) rather than changing the conditional-mean structure, scale adaptivity is achieved by retuning three factor-dependent hyperparameters before retraining: the diffusion noise schedule amplitude beta (larger for larger factors to increase diversity), the temporal context length L (set to maintain comparable attention horizons across cadences) and optionally a third, the mass-conservation function f (tapered to limit the amplification of extremes for large factors). Demonstrated on reanalysis precipitation over France (Comephore), the same architecture spans super-resolution factors from 1 to 25 in space and 1 to 6 in time, yielding a reusable architecture and tuning recipe for joint spatiotemporal super-resolution across scales.
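
The mass-conservation constraint described above (same precipitation amount in inputs and outputs) can be illustrated with a plain multiplicative rescaling in one dimension. This is a simplified sketch, not the paper's transform: the abstract describes a tapered function f, whereas the version below is untapered, and the function name, the 1D layout, and the convention that each coarse cell stores the total of its fine cells are all assumptions made for illustration.

```python
def conserve_mass(fine, coarse, factor):
    """Rescale a high-resolution prediction so block totals match the input.

    fine: predicted non-negative high-resolution values, length len(coarse) * factor.
    coarse: low-resolution totals, one per block of `factor` fine cells.
    Returns a new list whose block sums equal the coarse totals.
    """
    if len(fine) != len(coarse) * factor:
        raise ValueError("fine must contain `factor` cells per coarse cell")
    out = []
    for b, total in enumerate(coarse):
        block = fine[b * factor:(b + 1) * factor]
        block_sum = sum(block)
        if block_sum > 0:
            # Keep the predicted spatial pattern, rescale its total.
            out.extend(v * total / block_sum for v in block)
        else:
            # No predicted pattern to preserve: spread the total evenly.
            out.extend([total / factor] * factor)
    return out


corrected = conserve_mass([1.0, 3.0, 0.0, 0.0], [8.0, 2.0], factor=2)
print(corrected)
```

Because the rescaling is multiplicative, it can amplify extremes when the predicted block total is much smaller than the observed one, which is the kind of effect a tapered variant would aim to limit.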
[524] arXiv:2604.21904 [pdf, html, other]: Title: UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

Yanran Zhang, Wenzhao Zheng, Yifei Li, Bingyao Yu, Yu Zheng, Lei Chen, Jiwen Lu, Jie Zhou

Comments: Accepted to CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, development, these two fields have evolved distinct architectural paradigms: the former predominantly relies on generative networks, while the latter favors discriminative frameworks. A recent trend in both domains is the use of adversarial information to enhance performance, revealing potential for synergy. However, the significant architectural divergence between them presents considerable challenges. Departing from previous approaches, we propose UniGenDet: a Unified generative-discriminative framework for co-evolutionary image Generation and generated image Detection. To bridge the task gap, we design a symbiotic multimodal self-attention mechanism and a unified fine-tuning algorithm. This synergy allows the generation task to improve the interpretability of authenticity identification, while authenticity criteria guide the creation of higher-fidelity images. Furthermore, we introduce a detector-informed generative alignment mechanism to facilitate seamless information exchange. Extensive experiments on multiple datasets demonstrate that our method achieves state-of-the-art performance. Code: this https URL.
[525] arXiv:2604.21905 [pdf, html, other]: Title: Low-Rank Adaptation Redux for Large Models

Bingcong Li, Yilang Zhang, Georgios B. Giannakis

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains unclear which architectural choices, optimization techniques, and deployment constraints should guide practical method selection. This overview revisits LoRA through the lens of signal processing (SP), bridging modern adapter designs with classical low-rank modeling tools and inverse problems, as well as highlighting how SP principles can inform advances in fine-tuning approaches. Rather than providing a comprehensive enumeration and empirical comparison of LoRA variants, emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness. These advances are categorized into three complementary axes: architectural design, efficient optimization, and pertinent applications. The first axis builds on singular value decomposition (SVD)-based factorization, rank-augmentation constructions, and cross-layer tensorization, while the second axis deals with initialization, alternating solvers, gauge-invariant optimization, and parameterization-aware methods. Beyond fine-tuning, emerging applications of LoRA are surveyed across the entire lifecycle of large models, ranging from pre- and post-training to serving/deployment. Finally, open research directions are outlined at the confluence of SP and deep learning to catalyze a bidirectional frontier: classical SP tools provide a principled vocabulary for designing PEFT methods, while the unique challenges facing modern deep learning, especially the overwhelming scale and prohibitive overhead, also offer new research lines benefiting the SP community in return.
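For readers new to LoRA, the basic low-rank update that the overview builds on can be sketched as follows. This is a minimal illustration of the standard formulation W' = W + (alpha / r) B A, not code from the paper; the function names `matmul`, `lora_merge`, and `lora_param_counts` are hypothetical, and plain Python lists stand in for tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, the standard LoRA merged weight.

    w: frozen weight, shape (d_out, d_in)
    b: trainable factor, shape (d_out, r)
    a: trainable factor, shape (r, d_in)
    """
    rank = len(a)  # number of rows of A is the adapter rank r
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning versus a rank-r adapter."""
    return d_out * d_in, rank * (d_out + d_in)


if __name__ == "__main__":
    full, lora = lora_param_counts(4096, 4096, 8)
    print(full, lora)  # 16777216 65536
```

For a 4096 x 4096 layer, a rank-8 adapter trains 65,536 parameters instead of about 16.8 million, which is the source of the memory savings the abstract refers to.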
[526] arXiv:2604.21906 [pdf, html, other]: Title: A structure-preserving semi-implicit finite volume scheme on vertex-staggered unstructured meshes

Elena Bernardelli, Elena Gaburro, Michael Dumbser

Subjects: Numerical Analysis (math.NA)

We present a novel structure-preserving semi-implicit finite volume method on vertex-based staggered meshes for the compatible discretization of first order systems of time-dependent partial differential equations (PDEs). The method preserves divergence-free and curl-free vector fields exactly thanks to the compatible vertex-staggered discretization of the state variables on unstructured grids that are constituted by primal Delaunay triangles and their dual polygons. For the weakly compressible Euler equations, the scheme is asymptotic preserving, yielding a consistent discretization of the incompressible limit as the Mach number goes to zero. The new scheme applies to a broad spectrum of PDEs, including the weakly compressible and incompressible Euler and Navier-Stokes equations, the incompressible magnetohydrodynamics (MHD) system, and the incompressible version of the first-order hyperbolic Godunov-Peshkov-Romenski (GPR) model for continuum mechanics. The computational domain is covered by a primal triangular mesh and a dual tessellation made of so-called star polygons. Scalar quantities (pressure, density, viscous stress) are defined at nodes, with pressure updated implicitly in a continuous finite element fashion, yielding a symmetric and positive definite pressure system. Instead, vector fields (velocity, momentum, magnetic and distortion fields) are stored at triangle barycenters and evolved explicitly using a compatible finite volume scheme. Thanks to the semi-implicit discretization, the CFL condition is independent of the sound speed, allowing simulations at low Mach numbers. The fully compatible formulation ensures exactly divergence-free velocity field in the incompressible limit, exactly divergence-free magnetic field for MHD, and exactly curl-free inverse deformation gradient in solid mechanics. The method is validated through a wide set of test cases.
[527] arXiv:2604.21907 [pdf, html, other]: Title: Equity Bias: An Ethical Framework for AI Design

Mary Lockwood

Comments: 19 pages including references, 1 figure

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Equity Bias is a philosophical and practical framework for building smarter, more equitable AI systems. Grounded in hermeneutic philosophy and epistemic injustice theory, it treats bias not as an error to eliminate but as a reflection of whose knowledge is encoded into systems. While traditional approaches aim to reduce or remove bias, Equity Bias instead makes bias transparent and contestable. In doing so, it broadens whose perspectives shape AI and provides a lens for understanding AI systems as interpretive agents. The framework introduces a three-phase AI Life Cycle methodology: 'Equity Archaeology' (mapping knowledge and assumptions), 'Co-Creating Meaning' (participatory design), and 'Ongoing Accountability' (continuous evaluation). Equity Bias guides developers, researchers, and policymakers towards AI that is ethically accountable and capable of addressing complex real-world challenges.
[528] arXiv:2604.21909 [pdf, html, other]: Title: Directional Confusions Reveal Divergent Inductive Biases Through Rate-Distortion Geometry in Human and Machine Vision

Leyla Roksan Caglar, Pedro A.M. Mediano, Baihan Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC)

Humans and modern vision models can reach similar classification accuracy while making systematically different kinds of mistakes - differing not in how often they err, but in who gets mistaken for whom, and in which direction. We show that these directional confusions reveal distinct inductive biases that are invisible to accuracy alone. Using matched human and deep vision model responses on a natural-image categorization task under 12 perturbation types, we quantify asymmetry in confusion matrices and link it to generalization geometry through a Rate-Distortion (RD) framework, summarized by three geometric signatures: slope (beta), curvature (kappa), and efficiency (AUC). We find that humans exhibit broad but weak asymmetries, whereas deep vision models show sparser, stronger directional collapses. Robustness training reduces global asymmetry but fails to recover the human-like breadth-strength profile of graded similarity. Mechanistic simulations further show that different asymmetry organizations shift the RD frontier in opposite directions, even when matched for performance. Together, these results position directional confusions and RD geometry as compact, interpretable signatures of inductive bias under distribution shift.
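To make "asymmetry in confusion matrices" concrete, the sketch below computes one simple global asymmetry score. The abstract does not specify the paper's actual measure, so this normalised sum of absolute differences between C[i][j] and C[j][i] is an assumption chosen for illustration, and `confusion_asymmetry` is a hypothetical name.

```python
def confusion_asymmetry(confusion):
    """Global asymmetry of a square confusion matrix.

    Returns sum |C[i][j] - C[j][i]| / sum (C[i][j] + C[j][i]) over pairs i < j.
    0.0 means every confusion is mirrored; 1.0 means every confusion is one-way.
    This is an illustrative measure, not necessarily the one used in the paper.
    """
    n = len(confusion)
    diff = 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff += abs(confusion[i][j] - confusion[j][i])
            total += confusion[i][j] + confusion[j][i]
    return diff / total if total else 0.0


if __name__ == "__main__":
    one_way = [[5, 2, 0], [0, 5, 1], [1, 0, 5]]   # each error has no mirror
    mirrored = [[5, 1, 2], [1, 5, 0], [2, 0, 5]]  # every error is mirrored
    print(confusion_asymmetry(one_way), confusion_asymmetry(mirrored))  # 1.0 0.0
```

Both toy matrices have identical diagonals and therefore identical accuracy, yet their asymmetry scores differ, which mirrors the abstract's point that accuracy alone hides directional structure.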
[529] arXiv:2604.21910 [pdf, html, other]: Title: From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation

Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas, Michal Kuszewski

Subjects: Artificial Intelligence (cs.AI)

Scientific workflow systems automate execution -- scheduling, fault tolerance, resource management -- but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an agentic architecture that closes this gap through three layers: an LLM interprets natural language into structured intents (semantic layer); validated generators produce reproducible workflow DAGs (deterministic layer); and domain experts author "Skills": markdown documents encoding vocabulary mappings, parameter constraints, and optimization strategies (knowledge layer). This decomposition confines LLM non-determinism to intent extraction: identical intents always yield identical workflows. We implement and evaluate the architecture on the 1000 Genomes population genetics workflow and Hyperflow WMS running on Kubernetes. In an ablation study on 150 queries, Skills raise full-match intent accuracy from 44% to 83%; skill-driven deferred workflow generation reduces data transfer by 92%; and the end-to-end pipeline completes queries on Kubernetes with LLM overhead below 15 seconds and cost under $0.001 per query.
[530] arXiv:2604.21911 [pdf, html, other]: Title: When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

Pegah Khayatan, Jayneel Parekh, Arnaud Dapogny, Mustafa Shukor, Alasdair Newson, Matthieu Cord

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the language component, yet the relative importance of these factors remains unclear. To resolve this ambiguity, we propose HalluScope, a benchmark to better understand the extent to which different factors induce hallucinations. Our analysis indicates that hallucinations largely stem from excessive reliance on textual priors and background knowledge, especially information introduced through textual instructions. To mitigate hallucinations induced by textual instruction priors, we propose HalluVL-DPO, a framework for fine-tuning off-the-shelf LVLMs towards more visually grounded responses. HalluVL-DPO leverages preference optimization using a curated training dataset that we construct, guiding the model to prefer grounded responses over hallucinated ones. We demonstrate that our optimized model effectively mitigates the targeted hallucination failure mode, while preserving or improving performance on other hallucination benchmarks and visual capability evaluations. To support reproducibility and further research, we will publicly release our evaluation benchmark, preference training dataset, and code at this https URL.
[531] arXiv:2604.21914 [pdf, html, other]: Title: VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

Songen Gu, Yuhang Zheng, Weize Li, Yupeng Zheng, Yating Feng, Xiang Li, Yilun Chen, Pengfei Li, Wenchao Ding

Comments: This paper has been accepted to ICRA 2026

Subjects: Robotics (cs.RO)

Recently, end-to-end robotic manipulation models have gained significant attention for their generalizability and scalability. However, they often suffer from limited robustness to camera viewpoint changes when training with a fixed camera. In this paper, we propose VistaBot, a novel framework that integrates feed-forward geometric models with video diffusion models to achieve view-robust closed-loop manipulation without requiring camera calibration at test time. Our approach consists of three key components: 4D geometry estimation, view synthesis latent extraction, and latent action learning. VistaBot is integrated into both action-chunking (ACT) and diffusion-based ($\pi_0$) policies and evaluated across simulation and real-world tasks. We further introduce the View Generalization Score (VGS) as a new metric for comprehensive evaluation of cross-view generalization. Results show that VistaBot improves VGS by 2.79$\times$ and 2.63$\times$ over ACT and $\pi_0$, respectively, while also achieving high-quality novel view synthesis. Our contributions include a geometry-aware synthesis model, a latent action planner, a new benchmark metric, and extensive validation across diverse environments. The code and models will be made publicly available.
[532] arXiv:2604.21915 [pdf, html, other]: Title: Vista4D: Video Reshooting with 4D Point Clouds

Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca, Yash Kant, Ryan Burgert, Yuancheng Xu, Koichi Namekata, Yiwei Zhao, Bolei Zhou, Micah Goldblum, Paul Debevec, Ning Yu

Comments: 24 pages, 20 figures, CVPR 2026, see project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video reshooting methods often struggle with depth estimation artifacts of real-world dynamic videos, while also failing to preserve content appearance and failing to maintain precise camera control for challenging new trajectories. We build a 4D-grounded point cloud representation with static pixel segmentation and 4D reconstruction to explicitly preserve seen content and provide rich camera signals, and we train with reconstructed multiview dynamic data for robustness against point cloud artifacts during real-world inference. Our results demonstrate improved 4D consistency, camera control, and visual quality compared to state-of-the-art baselines under a variety of videos and camera paths. Moreover, our method generalizes to real-world applications such as dynamic scene expansion and 4D scene recomposition. See our project page for results, code, and models: this https URL
[533] arXiv:2604.21916 [pdf, html, other]: Title: MathDuels: Evaluating LLMs as Problem Posers and Solvers

Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik

Subjects: Computation and Language (cs.CL); Software Engineering (cs.SE)

As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in which models occupy dual roles: each authors math problems under adversarial prompting and solves problems authored by every other participant. Problems are produced through a three-stage generation pipeline (meta-prompting, problem generation, and difficulty amplification), and validated by an independent verifier that excludes ill-posed questions. A Rasch model (Rasch, 1993) jointly estimates solver abilities and problem difficulties; author quality is derived from the difficulties of each model's authored problems. Experiments across 19 frontier models reveal that authoring and solving capabilities are partially decoupled, and that dual-role evaluation reveals capability separations invisible in single-role benchmarks. As newer models enter the arena, they produce problems that defeat previously dominant solvers, so the benchmark's difficulty co-evolves with participant strength rather than saturating at a fixed ceiling. We host a public leaderboard that updates as new models are released.
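The Rasch model mentioned above gives the probability that a solver answers a problem correctly as a logistic function of (ability - difficulty). The sketch below fits it by plain gradient ascent on a toy result table. It is a minimal illustration, not the benchmark's implementation: the small L2 penalty, the zero-mean difficulty convention, and the names `rasch_prob` and `fit_rasch` are assumptions made here.

```python
import math


def rasch_prob(ability, difficulty):
    """P(correct) under the Rasch model: sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def fit_rasch(results, n_solvers, n_problems, lr=0.1, steps=2000, l2=0.01):
    """Jointly estimate solver abilities and problem difficulties.

    results: list of (solver_index, problem_index, correct), correct in {0, 1}.
    The l2 penalty (an assumption of this sketch) keeps estimates finite when
    a solver or problem has an all-correct or all-wrong record.
    """
    ability = [0.0] * n_solvers
    difficulty = [0.0] * n_problems
    for _ in range(steps):
        grad_a = [-l2 * a for a in ability]
        grad_d = [-l2 * d for d in difficulty]
        for s, p, y in results:
            err = y - rasch_prob(ability[s], difficulty[p])
            grad_a[s] += err   # correct answers raise ability
            grad_d[p] -= err   # correct answers lower difficulty
        ability = [a + lr * g for a, g in zip(ability, grad_a)]
        difficulty = [d + lr * g for d, g in zip(difficulty, grad_d)]
        # Only differences are identified, so pin mean difficulty to zero.
        shift = sum(difficulty) / n_problems
        difficulty = [d - shift for d in difficulty]
        ability = [a - shift for a in ability]
    return ability, difficulty


if __name__ == "__main__":
    # Toy data: solver 0 solves problems 0 and 1; solver 1 solves only problem 0.
    toy = [(0, 0, 1), (0, 1, 1), (0, 2, 0), (1, 0, 1), (1, 1, 0), (1, 2, 0)]
    abilities, difficulties = fit_rasch(toy, n_solvers=2, n_problems=3)
    print(abilities, difficulties)
```

On the toy data the fit ranks solver 0 above solver 1 and orders the problems 0 < 1 < 2 by difficulty, matching the raw solve counts.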
[534] arXiv:2604.21917 [pdf, html, other]: Title: CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis

Arunabh Majumdar

Comments: Accepted at AIware 2026 (3rd ACM International Conference on AI-Powered Software, Montreal, July 6-7, 2026). 4 pages

Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

We present CrossCommitVuln-Bench, a curated benchmark of 15 real-world Python vulnerabilities (CVEs) in which the exploitable condition was introduced across multiple commits - each individually benign to per-commit static analysis - but collectively critical. We manually annotate each CVE with its contributing commit chain, a structured rationale for why each commit evades per-commit analysis, and baseline evaluations using Semgrep and Bandit in both per-commit and cumulative scanning modes. Our central finding: the per-commit detection rate (CCDR) is 13% across all 15 vulnerabilities - 87% of chains are invisible to per-commit SAST. Critically, both per-commit detections are qualitatively poor: one occurs on commits framed as security fixes (where developers suppress the alert), and the other detects only the minor hardcoded-key component while completely missing the primary vulnerability (200+ unprotected API endpoints). Even in cumulative mode (full codebase present), the detection rate is only 27%, confirming that snapshot-based SAST tools often miss vulnerabilities whose introduction spans multiple commits. The dataset, annotation schema, evaluation scripts, and reproducible baselines are released under open-source licenses to support research on cross-commit vulnerability detection.
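The reported percentages can be checked against the benchmark size with a few lines of arithmetic. The count of 2 per-commit detections follows from the abstract's "both per-commit detections"; the count of 4 cumulative detections is an inference from 27% of 15, not a figure stated in the abstract. `rate_percent` is a hypothetical helper.

```python
def rate_percent(detected, total):
    """Detection rate as a whole-number percentage."""
    return round(100 * detected / total)


if __name__ == "__main__":
    total_chains = 15
    print(rate_percent(2, total_chains))   # 13, per-commit detection rate
    print(rate_percent(13, total_chains))  # 87, chains missed per-commit
    print(rate_percent(4, total_chains))   # 27, cumulative mode (4 is inferred)
```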
[535] arXiv:2604.21921 [pdf, html, other]: Title: Context Unrolling in Omni Models

Ceyuan Yang, Zhijie Lin, Yang Zhao, Fei Xiao, Hao He, Qi Zhao, Chaorui Deng, Kunchang Li, Zihan Ding, Yuwei Guo, Fuyun Wang, Fangqi Zhu, Xiaonan Nie, Shenhan Zhu, Shanchuan Lin, Hongsheng Li, Weilin Huang, Guang Shi, Haoqi Fan

Comments: Report

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This process enables the model to aggregate complementary information across heterogeneous modalities, facilitating a more faithful approximation of the shared multimodal knowledge manifold and improving downstream reasoning fidelity. As a result, Omni achieves strong performance on both multimodal generation and understanding benchmarks, while demonstrating advanced multimodal reasoning capabilities, including in-context generation of text, image, video, and 3D geometry.
[536] arXiv:2604.21922 [pdf, html, other]: Title: Characterizing Streaming Decidability of CSPs via Non-Redundancy

Amatya Sharma, Santhoshini Velusamy

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

We study the single-pass streaming complexity of deciding satisfiability of Constraint Satisfaction Problems (CSPs). A CSP is specified by a constraint language $\Gamma$, that is, a finite set of $k$-ary relations over the domain $[q] = \{0, \dots, q-1\}$. An instance of $\mathsf{CSP}(\Gamma)$ consists of $m$ constraints over $n$ variables $x_1, \ldots, x_n$ taking values in $[q]$. Each constraint $C_i$ is of the form $\{R_i,(x_{i_1} + \lambda_{i_1}, \ldots, x_{i_k} + \lambda_{i_k})\}$, where $R_i \in \Gamma$ and $\lambda_{i_1}, \ldots, \lambda_{i_k} \in [q]$ are constants; it is satisfied if and only if $(x_{i_1} + \lambda_{i_1}, \ldots, x_{i_k} + \lambda_{i_k}) \in R_i$, where addition is modulo $q$. In the streaming model, constraints arrive one by one, and the goal is to determine, using minimum memory, whether there exists an assignment satisfying all constraints.
For $k$-SAT, Vu (TCS 2024) proves an optimal $\Omega(n^k)$ space lower bound, while for general CSPs, Chou, Golovnev, Sudan, and Velusamy (JACM 2024) establish an $\Omega(n)$ lower bound; a complete characterization has remained open. We close this gap by showing that the single-pass streaming space complexity of $\mathsf{CSP}(\Gamma)$ is precisely governed by its non-redundancy, a structural parameter introduced by Bessiere, Carbonnel, and Katsirelos (AAAI 2020). The non-redundancy $\mathsf{NRD}_n(\Gamma)$ is the maximum number of constraints over $n$ variables such that every constraint $C$ is non-redundant, i.e., there exists an assignment satisfying all constraints except $C$. We prove that the single-pass streaming complexity of $\mathsf{CSP}(\Gamma)$ is characterized, up to a logarithmic factor, by $\mathsf{NRD}_n(\Gamma)$.
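The definition of a non-redundant instance can be checked by brute force on tiny examples. The sketch below reads "satisfying all constraints except $C$" as satisfying every other constraint while violating $C$, and represents each constraint as a (relation, variables, shifts) triple following the form given above. The function names are hypothetical, and the exhaustive search over $q^n$ assignments is for illustration only.

```python
from itertools import product


def satisfied(constraint, assignment, q):
    """Check (x_{i_1} + lambda_{i_1}, ..., x_{i_k} + lambda_{i_k}) in R, mod q."""
    relation, variables, shifts = constraint
    values = tuple((assignment[v] + s) % q for v, s in zip(variables, shifts))
    return values in relation


def is_non_redundant(constraints, n, q):
    """True if every constraint C has an assignment violating C only."""
    for i, c in enumerate(constraints):
        others = [d for j, d in enumerate(constraints) if j != i]
        has_witness = any(
            not satisfied(c, x, q) and all(satisfied(d, x, q) for d in others)
            for x in product(range(q), repeat=n)
        )
        if not has_witness:
            return False
    return True


if __name__ == "__main__":
    equal = {(0, 0), (1, 1)}  # binary equality relation over [2]
    c01 = (equal, (0, 1), (0, 0))
    c12 = (equal, (1, 2), (0, 0))
    c02 = (equal, (0, 2), (0, 0))
    print(is_non_redundant([c01, c12], n=3, q=2))       # True
    print(is_non_redundant([c01, c12, c02], n=3, q=2))  # False
```

In the second call, x_0 = x_2 is implied by the other two equalities, so no assignment violates it alone and the instance is redundant.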
[537] arXiv:2604.21923 [pdf, html, other]: Title: The Sample Complexity of Multicalibration

Natalie Collina, Jiuyao Lu, Georgy Noarov, Aaron Roth

Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

We study the minimax sample complexity of multicalibration in the batch setting. A learner observes $n$ i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most $\varepsilon$ with respect to a given family of groups. For every fixed $\kappa > 0$, in the regime $|G|\le \varepsilon^{-\kappa}$, we prove that $\widetilde{\Theta}(\varepsilon^{-3})$ samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is realized by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as $\widetilde{\Theta}(\varepsilon^{-2})$, and shows that mean-ECE multicalibration is as difficult in the batch setting as it is in the online setting, in contrast to marginal calibration, which is strictly more difficult in the online setting. In contrast, we observe that for $\kappa = 0$, the sample complexity of multicalibration remains $\widetilde{\Theta}(\varepsilon^{-2})$, exhibiting a sharp threshold phenomenon.
More generally, we establish matching upper and lower bounds, up to polylogarithmic factors, for a weighted $L_p$ multicalibration metric for all $1 \le p \le 2$, with optimal exponent $3/p$. We also extend the lower-bound template to a regular class of elicitable properties, and combine it with the online upper bounds of Hu et al. (2025) to obtain matching bounds for calibrating properties including expectiles and bounded-density quantiles.
[538] arXiv:2604.21924 [pdf, html, other]: Title: Long-Horizon Manipulation via Trace-Conditioned VLA Planning

Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu

Comments: Project page: this https URL

Subjects: Robotics (cs.RO)

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated task-management VLM. The manager is decoupled from the executor and is invoked in a receding-horizon manner: given the current observation, it predicts a progress-aware remaining plan that combines (i) a subtask sequence with an explicit done + remaining split as lightweight language memory, and (ii) a visual trace -- a compact 2D keypoint trajectory prompt specifying where to go and what to approach next. The executor VLA is adapted to condition on the rendered trace, thereby turning long-horizon decision-making into repeated local control by following the trace. Crucially, predicting the remaining plan at each step yields an implicit closed loop: failed steps persist in subsequent outputs, and traces update accordingly, enabling automatic continuation and replanning without hand-crafted recovery logic or brittle visual-history buffers. Extensive experiments spanning embodied planning, long-horizon reasoning, trajectory prediction, and end-to-end manipulation in simulation and on a real Franka robot demonstrate strong gains in long-horizon success, robustness, and out-of-distribution generalization. Project page: this https URL
[539] arXiv:2604.21926 [pdf, html, other]: Title: Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

Hao-Yu Hsu, Tianhang Cheng, Jing Wen, Alexander G. Schwing, Shenlong Wang

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. For this we introduce IMU-to-4D, a framework that repurposes large language models for non-visual spatiotemporal understanding of human-scene dynamics. IMU-to-4D uses data from a few inertial sensors from earbuds, watches, or smartphones and predicts detailed 4D human motion together with coarse scene structure. Experiments across diverse human-scene datasets show that IMU-to-4D yields more coherent and temporally stable results than SoTA cascaded pipelines, suggesting wearable motion sensors alone can support rich 4D understanding.
[540] arXiv:2604.21927 [pdf, html, other]: Title: Fine-Tuning Regimes Define Distinct Continual Learning Problems

Paul-Tiberiu Iordache, Elena Burceanu

Comments: 14 pages, 3 figures

Subjects: Machine Learning (cs.LG)

Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable parameter subspace, is itself a key evaluation variable. We formalize adaptation regimes as projected optimization over fixed trainable subspaces, showing that changing the trainable depth alters the effective update signal through which both current task fitting and knowledge preservation operate. This analysis motivates the hypothesis that method comparisons need not be invariant across regimes. We test this hypothesis in task incremental CL, five trainable depth regimes, and four standard methods: online EWC, LwF, SI, and GEM. Across five benchmark datasets, namely MNIST, Fashion MNIST, KMNIST, QMNIST, and CIFAR-100, and across 11 task orders per dataset, we find that the relative ranking of methods is not consistently preserved across regimes. We further show that deeper adaptation regimes are associated with larger update magnitudes, higher forgetting, and a stronger relationship between the two. These results show that comparative conclusions in CL can depend strongly on the chosen fine-tuning regime, motivating regime-aware evaluation protocols that treat trainable depth as an explicit experimental factor.
[541] arXiv:2604.21928 [pdf, html, other]: Title: Evaluation of Automatic Speech Recognition Using Generative Large Language Models

Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso, Petr Motlicek, Shiran Liu, Mickael Rouvier, Jane Wottawa, Richard Dufour

Subjects: Computation and Language (cs.CL)

Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their relevance through three approaches: (1) selecting the best hypothesis between two candidates, (2) computing semantic distance using generative embeddings, and (3) qualitative classification of errors. On the HATS dataset, the best LLMs achieve 92-94% agreement with human annotators for hypothesis selection, compared to 63% for WER, also outperforming semantic metrics. Embeddings from decoder-based LLMs show performance comparable to encoder models. Finally, LLMs offer a promising direction for interpretable and semantic ASR evaluation.
[542] arXiv:2604.21930 [pdf, html, other]: Title: Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu

Comments: 12 pages, 2 figures

Subjects: Machine Learning (cs.LG)

Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce different CL regimes and therefore different benchmark conclusions. To study this effect, we introduce a taskification-level framework based on plasticity and stability profiles, a profile distance between taskifications, and Boundary-Profile Sensitivity (BPS), which diagnoses how strongly small boundary perturbations alter the induced regime before any CL model is trained. We evaluate continual finetuning, Experience Replay, Elastic Weight Consolidation, and Learning without Forgetting on network traffic forecasting with CESNET-Timeseries24, keeping the stream, model, and training budget fixed while varying only the temporal taskification. Across 9-, 30-, and 44-day splits, we observe substantial changes in forecasting error, forgetting, and backward transfer, showing that taskification alone can materially affect CL evaluation. We further find that shorter taskifications induce noisier distribution-level patterns, larger structural distances, and higher BPS, indicating greater sensitivity to boundary perturbations. These results show that benchmark conclusions in streaming CL depend not only on the learner and the data stream, but also on how that stream is taskified, motivating temporal taskification as a first-class evaluation variable.
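Temporal taskification by fixed-length windows can be sketched in a few lines. The 90-day stream length below is a made-up figure for illustration and not the length of CESNET-Timeseries24, and `taskify` is a hypothetical helper; the paper's profile distance and BPS diagnostics are not reproduced here.

```python
def taskify(num_days, days_per_task):
    """Split a stream of num_days into consecutive (start, end_exclusive) windows.

    The final window is truncated if num_days is not a multiple of days_per_task.
    """
    return [
        (start, min(start + days_per_task, num_days))
        for start in range(0, num_days, days_per_task)
    ]


if __name__ == "__main__":
    stream_days = 90  # hypothetical stream length
    for span in (9, 30, 44):
        tasks = taskify(stream_days, span)
        print(span, len(tasks), tasks[-1])
```

The same stream yields 10, 3, and 3 tasks for 9-, 30-, and 44-day windows, with a short final task in the 44-day case, which illustrates how the choice of split alone changes the task sequence a learner sees.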
[543] arXiv:2604.21931 [pdf, other]: Title: Seeing Fast and Slow: Learning the Flow of Time in Videos

Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual concept and develop models for reasoning about and manipulating the flow of time in videos. We first exploit the multimodal cues and temporal structure naturally present in videos to learn, in a self-supervised manner, to detect speed changes and estimate playback speed. We then show that these learned temporal reasoning models enable us to curate the largest slow-motion video dataset to date from noisy in-the-wild sources. Such slow-motion footage, typically filmed by high-speed cameras, contains substantially richer temporal detail than standard videos. Using this data, we further develop models capable of temporal control, including speed-conditioned video generation, which produces motion at a specified playback speed, and temporal super-resolution, which transforms low-FPS, blurry videos into high-FPS sequences with fine-grained temporal details. Our findings highlight time as a manipulable, perceptual dimension in video learning, opening doors to temporally controllable video generation, temporal forensics detection, and potentially richer world-models that understand how events unfold over time.

[544] arXiv:2604.20872 (cross-list from physics.soc-ph) [pdf, html, other]: Title: Dynamical Model for the Sustainable Development Goals

Alberto García-Rodríguez, Tzipe Govezensky, Julia Tagüeña, Kimmo K. Kaski, Rafael A. Barrio

Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY)

The 2030 Agenda for Sustainable Development of the United Nations outlines 17 goals as global challenges for countries of the world to address in their development. However, the progress of countries towards these goals has been much slower than expected. In a previous study, we analyzed the data over two decades (2000-2022), using unsupervised machine learning techniques. Based on this study, we take into account three main factors to construct a mathematical model to simulate and predict the dynamical behavior of the SDGs. These factors are: (1) the distribution of the amount of resources that each country uses to meet the goals, (2) the cooperation between countries, and (3) the correlations between the goals. In this work, we show that the model is capable of reproducing the real data and therefore could be used to simulate hypothetical scenarios that could help to improve actions towards optimal fulfillment of the goals.
[545] arXiv:2604.20882 (cross-list from quant-ph) [pdf, html, other]: Title: HHL with a Coherent Fourier Oracle: A Proof-of-Concept Quantum Architecture for Joint Melody-Harmony Generation

Alexis Kirke

Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Sound (cs.SD)

Quantum algorithms with a proven theoretical speedup over classical computation are rare. Among the most prominent is the Harrow-Hassidim-Lloyd (HHL) algorithm for solving sparse linear systems. Here, HHL is applied to encode melodic preference: the system matrix encodes Narmour implication-realisation and Krumhansl-Kessler tonal stability, so its solution vector is a music-cognition-weighted note-pair distribution. The key constraint of HHL is that reading its output classically cancels the quantum speedup; the solution must be consumed coherently. This motivates a coherent Fourier harmonic oracle: a unitary that applies chord-transition weights directly to the HHL amplitude vector, so that a single measurement jointly selects both melody notes and a two-chord progression.
A two-note/two-chord (2/2) block is used to contain the exponential growth of the joint state space that would otherwise make classical simulation of larger blocks infeasible. For demonstrations of longer passages, blocks are chained classically (each block's collapsed output conditions the next) as a temporary workaround until fault-tolerant hardware permits larger monolithic circuits. A four-block chain produces 8 notes over 8 chords with grammatically valid transitions at every block boundary.
Independent rule-based harmony validation confirms that 97% of generated chord progressions are rated strong or acceptable. The primary motivation is that HHL carries a proven exponential speedup over classical linear solvers; this work demonstrates that a coherent HHL+oracle pipeline - the prerequisite for that speedup to be realised in a musical setting - is mechanically achievable. Audio realisations of representative outputs are made available for listening online.
[546] arXiv:2604.20885 (cross-list from physics.bio-ph) [pdf, other]: Title: From Physical Difference to Meaning: A Constructor-Theoretic Framework for Prebiotic Information in Casimir-Lifshitz-Coupled Protocell Clusters

Michael Massoth

Comments: 8 pages, 3 figures, The Eighteenth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies, BIOTECHNO 2026, Valencia, Spain

Subjects: Biological Physics (physics.bio-ph); General Literature (cs.GL); Populations and Evolution (q-bio.PE)

This paper develops a physical framework for the prebiotic emergence of information and meaning. Building on Constructor Theory, we define information as a reproducible physical difference and meaning as a difference with stable functional consequences. Casimir-Lifshitz-coupled protocell clusters serve as a minimal model that exhibits reproducible attractors, ordered transitions, and autonomous task structures. We show that such clusters carry both informational states (e.g., distances, geometries, gradients) and meaningful states that regulate prebiotic tasks such as approach, exchange, or stabilization. This approach integrates physical mechanisms, computational mechanics, and early proto-semantic functions into a coherent account of information formation before biology.
[547] arXiv:2604.20886 (cross-list from physics.chem-ph) [pdf, html, other]: Title: KinetiDiff: Docking-Guided Diffusion for De Novo ACVR1 Inhibitor Design in Fibrodysplasia Ossificans Progressiva

Aaryan Patel

Comments: 21 pages, 10 figures

Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG)

We present KinetiDiff, a structure-based framework for de novo kinase inhibitor design that integrates a Geometry-Complete Diffusion Model with real-time AutoDock Vina gradient guidance. By injecting physics-based docking gradients into the diffusion denoising loop, KinetiDiff steers molecule generation toward high-affinity conformations for ACVR1 (ALK2), the causative kinase in Fibrodysplasia Ossificans Progressiva. From 10,000 diffusion samples, the framework produced 9,997 valid molecules. The best candidate achieved $-11.05$ kcal/mol (pKd = 8.10), a 19.2% improvement over the crystallographic reference. The top 100 candidates all exceed the reference, with 100% Lipinski compliance, median synthetic accessibility of 2.67, and internal diversity of 0.790. Systematic ablation across four guidance strategies--Vina-Direct (physics), HNN-Denovo (neural proxy), multi-objective, and unguided--demonstrates that real-time docking guidance dominates on all metrics. We evaluate HNN-Denovo as a computationally efficient alternative (60-fold speedup per step), revealing a domain-mismatch limitation (r = 0.224 correlation with Vina) that explains its inferior performance. These results establish gradient-guided geometric diffusion as a practical approach for generating potent, synthetically accessible inhibitors against rare-disease kinase targets.
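The quoted pair of figures ($-11.05$ kcal/mol and pKd = 8.10) is consistent with the standard thermodynamic relation $\Delta G = -RT \ln(10)\,\mathrm{pK_d}$. The sketch below is an illustration only: the temperature of 298.15 K and the function name `pkd_from_delta_g` are assumptions, and the abstract does not state which conversion the authors actually used.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def pkd_from_delta_g(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to pKd.

    Uses Delta G = -R * T * ln(10) * pKd. The default temperature of
    298.15 K is an assumption made for this illustration.
    """
    return -delta_g_kcal_per_mol / (
        R_KCAL_PER_MOL_K * temperature_k * math.log(10)
    )


if __name__ == "__main__":
    # Docking score reported for the best candidate in the abstract.
    print(round(pkd_from_delta_g(-11.05), 2))
```

Under these assumptions, each unit of pKd corresponds to roughly 1.36 kcal/mol of binding free energy.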
[548] arXiv:2604.20887 (cross-list from math.DS) [pdf, html, other]: Title: Spectral Kernel Dynamics for Planetary Surface Graphs: Distinction Dynamics and Topological Conservation

Jnaneshwar Das

Comments: 17 pages, 0 figures

Subjects: Dynamical Systems (math.DS); Earth and Planetary Astrophysics (astro-ph.EP); Machine Learning (cs.LG); Robotics (cs.RO)

The spectral kernel field equation R[k] = T[k] lacks a conservation-law analog. We prove (i) the fixed-point flow is strictly volume-expanding (tr DF > 0), precluding automatic conservation, and (ii) the conservation deficit per mode equals the Hessian stability margin exactly: D_m = -Delta'. Closing the deficit requires a scene-side compensating contribution, which we formalise as the distinction dynamics equation dc/dt = G[c, h_t], with MaxCal-optimal realisation G_opt. On fixed-topology 3D surface graphs we derive a conditional topology-preserving compression theorem: retaining k >= beta_0 + beta_1 modes (under a spectral-ordering assumption) preserves all Betti-number charges; we include a worked short-cycle counterexample (figure-eight) calibrating when the assumption fails. A triple necessary spectral diagnostic -- Fiedler-mode concentration, elevated curl energy, anomalous beta_1 -- is derived for planetary drainage networks at O(N) cost. Two internal real-data sequences serve as preliminary consistency checks; full benchmarks and adaptive-topology extensions are deferred.
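To make the mode-count condition k >= beta_0 + beta_1 concrete, the sketch below computes the Betti numbers of a graph treated as a one-dimensional complex (beta_0 is the number of connected components, beta_1 = E - V + beta_0) and the resulting lower bound on k. Treating the surface graph this way is an assumption made for illustration, the function names are hypothetical, and the spectral-ordering assumption from the abstract is not checked.

```python
def betti_numbers(num_vertices, edges):
    """Return (beta_0, beta_1) of a simple undirected graph.

    `edges` is a list of (u, v) pairs with no duplicates. Connected
    components are counted with a small union-find structure.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    beta_0 = len({find(v) for v in range(num_vertices)})
    # Cycle rank of a graph: E - V + (number of components).
    beta_1 = len(edges) - num_vertices + beta_0
    return beta_0, beta_1


def min_modes_to_retain(num_vertices, edges):
    """Smallest k satisfying k >= beta_0 + beta_1."""
    beta_0, beta_1 = betti_numbers(num_vertices, edges)
    return beta_0 + beta_1


if __name__ == "__main__":
    # Figure-eight: two triangles sharing vertex 0.
    figure_eight = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
    print(betti_numbers(5, figure_eight))
    print(min_modes_to_retain(5, figure_eight))
```

For the figure-eight example the graph is connected and has two independent cycles, so the bound asks for at least three retained modes.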
[549] arXiv:2604.20899 (cross-list from cond-mat.mtrl-sci) [pdf, html, other]: Title: Predicting Scale-Up of Metal-Organic Framework Syntheses with Large Language Models

Peter Walther, Hongrui Sheng, Xinxin Liu, Bin Feng, Reid Coyle, Xinhua Yan, Kyle Smith, Harrison Kayal, Shyam Chand Pal, Zhiling Zheng

Comments: 39 pages

Subjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI)

Scalable synthesis remains the gate between MOF discovery and industrial deployment, as scale-up know-how is fragmented across disparate reports. We introduce ESU-MOF, a literature-mined dataset and a positive-unlabeled learning strategy that fine-tunes large language models to predict scalability potential with 91.4% accuracy, enabling rapid data-driven triage for industrial MOF discovery.
[550] arXiv:2604.20907 (cross-list from stat.ML) [pdf, html, other]: Title: Achieving the Kesten-Stigum bound in the non-uniform hypergraph stochastic block model

Manuel Fernandez V, Ludovic Stephan, Yizhe Zhu

Comments: 67 pages, 1 figure

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Combinatorics (math.CO); Probability (math.PR); Statistics Theory (math.ST)

We study the community detection problem in the non-uniform hypergraph stochastic block model (HSBM), where hyperedges of varying sizes coexist. This setting captures higher-order and multi-view interactions and raises a fundamental question: can multiple uniform hypergraph layers below the detection threshold be combined to enable weak recovery? We answer this question by establishing a Kesten--Stigum-type bound for weak recovery in a general class of non-uniform HSBMs with $r$ blocks, generated according to multiple symmetric probability tensors. In the case $r=2$, we show that weak recovery is possible whenever the sum of the signal-to-noise ratios across all uniform hypergraph layers exceeds one, thereby confirming the positive part of a conjecture in (Chodrow et al., 2023). Moreover, we provide a polynomial-time spectral algorithm that achieves this threshold via an optimally weighted non-backtracking operator. For the unweighted non-backtracking matrix, our spectral method attains a different algorithmic threshold, also conjectured in (Chodrow et al., 2023).
Our approach develops a spectral theory for weighted non-backtracking operators on non-uniform hypergraphs, including a precise characterization of outlier eigenvalues and eigenvector overlaps. We introduce a novel Ihara--Bass formula tailored to weighted non-uniform hypergraphs, which yields an efficient low-dimensional representation and leads to a provable spectral reconstruction algorithm. Taken together, these results provide a principled and computationally efficient approach to clustering in non-uniform hypergraphs, and highlight the role of optimal weighting in aggregating heterogeneous higher-order interactions.
[551] arXiv:2604.20910 (cross-list from astro-ph.IM) [pdf, html, other]: Title: Planetary Exploration 3.0: A Roadmap for Software-Defined, Radically Adaptive Space Systems

Masahiro Ono, Daniel Selva, Morgan L. Cable, Marie Ethvignot, Margaret Hansen, Andreas M. Hein, Elena-Sorina Lupu, Zachary Manchester, David Murrow, Chad Pozarycki, Pascal Spino, Amanda Stockton, Mathieu Choukroun, Soon-Jo Chung, John Day, Alexander Demagall, Anthony Freeman, Chloe Gentgen, Michel D. Ingham, Charity M. Phillips-Lander, Richard Rieber, Alejandro Salado, Maria Sakovsky, Lori R. Shiraishi, Yisong Yue, Kris Zacny

Journal-ref: AIAA ASCEND 2026

Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Earth and Planetary Astrophysics (astro-ph.EP); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)

The surface and subsurface of worlds beyond Mars remain largely unexplored. Yet these worlds hold keys to fundamental questions in planetary science - from potentially habitable subsurface oceans on icy moons to ancient records preserved in Kuiper Belt objects. NASA's success in Mars exploration was achieved through incrementalism: 22 progressively sophisticated missions over decades. This paradigm, which we call Planetary Exploration 2.0 (PE 2.0), is untenable for the outer Solar System, where cruise times of a decade or more make iterative missions infeasible. We propose Planetary Exploration 3.0 (PE 3.0): a paradigm in which unvisited worlds are explored by a single or a few missions with radically adaptive space systems. A PE 3.0 mission conducts both initial exploratory science and follow-on hypothesis-driven science based on its own in situ data returns, evolving spacecraft capabilities to work resiliently in previously unseen environments. The key enabler of PE 3.0 is software-defined space systems (SDSSs) - systems that can adapt their functions at all levels through software updates. This paper presents findings from a Keck Institute for Space Studies (KISS) workshop on PE 3.0, covering: (1) PE 3.0 systems engineering including science definition, architecture, design methods, and verification & validation; (2) software-defined space system technologies including reconfigurable hardware, multi-functionality, and modularity; (3) onboard intelligence including autonomous science, navigation, controls, and embodied AI; and (4) three PE 3.0 mission concepts: a Neptune/Triton smart flyby, an ocean world explorer, and an Oort cloud reconnaissance mission.
[552] arXiv:2604.20912 (cross-list from quant-ph) [pdf, html, other]: Title: Quantum-HPC Software Stacks and the openQSE Reference Architecture: A Survey

Amir Shehata, Brian Austin, Tom Beck, Lukas Burgholzer, Alex Chernoguzov, Spencer Churchill, Andrea Delgado, Yasuko Eckert, Jeffery Heckey, Kevin Kissell, Katherine Klymko, Josh Moles, Thomas Naughton, Lee James O'Riordan, Christian Ortiz Pauyac, Guen Prawiroatmodjo, Ermal Rrapaj, Jiri Schindler, Laura Schulz, Sebastian Stern, Tyler Takeshita, Miwako Tsuji, Aleksander Wennersteen, Travis Humble, Martin Schulz

Comments: 23 pages, 2 figures

Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Software Engineering (cs.SE)

Quantum resources are increasingly integrated into high-performance computing (HPC) and cloud environments, but quantum high-performance computing (QHPC) software stacks remain isolated, often proprietary, full-stack solutions lacking common interfaces across runtime, resource management, orchestration, and execution layers. This paper analyzes nine production QHPC stacks and identifies common design patterns and emerging requirements, covering deployment models, application interaction patterns, SDK support, and readiness for fault-tolerant operation. The survey exposes consistent needs in runtime abstraction, resource management, interconnect semantics, and observability. Based on these findings, we propose the open quantum-HPC software ecosystem (openQSE) reference architecture as a first step toward unifying the state-of-the-practice. openQSE defines a set of layer boundaries that allow different implementations to interoperate while preserving deployment flexibility, and is structured to support both current noisy intermediate-scale quantum (NISQ) workloads and future fault-tolerant quantum computing (FTQC) systems without changes to upper-layer application interfaces.
[553] arXiv:2604.20980 (cross-list from math.DS) [pdf, other]: Title: The Riccati Characteristic Equation

Douglas R. Frey

Subjects: Dynamical Systems (math.DS); Systems and Control (eess.SY)

The Riccati differential equation is examined in light of its connection to second order linear time varying systems. In that light it becomes the clear generalization for the characteristic equation of linear time invariant systems, and is called the Riccati Characteristic Equation (RCE). Consequently, the RCE becomes the unifying centerpiece for the study of linear systems. Its solutions are considered in complementary pairs that form a continuum based on a primitive pair. Pairs may always be found as purely real solutions, despite the fact that complex conjugate primitive solutions are shown to exist in many cases. Not only is the pairing unique, but the general form of solutions, shown here for the first time, is uniquely compact and encompasses all known solutions, while allowing for all initial conditions. Classical engineering mathematics examples are shown to conform to this approach, which provides new insights to all, especially Floquet theory.
[554] arXiv:2604.20981 (cross-list from q-bio.QM) [pdf, html, other]: Title: PanGuide3D: Cohort-Robust Pancreas Tumor Segmentation via Probabilistic Pancreas Conditioning and a Transformer Bottleneck

Sunny Joy Ma, Xiang Ma

Subjects: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Pancreatic tumor segmentation in contrast-enhanced computed tomography (CT) is clinically important yet technically challenging: lesions are often small, heterogeneous, and easily confused with surrounding soft tissue, and models that perform well on one cohort frequently degrade under cohort shift. Our goal is to improve cross-cohort generalization while keeping the model architecture simple, efficient, and practical for 3D CT segmentation. We introduce PanGuide3D, a cohort-robust architecture with a shared 3D encoder, a pancreas decoder that predicts a probabilistic pancreas map, and a tumor decoder that is explicitly conditioned on this pancreas probability at multiple scales via differentiable soft gating. To capture long-range context under distribution shift, we further add a lightweight Transformer bottleneck in the U-Net bottleneck representation. We evaluate cohort transfer by training on the PanTS (Pancreatic Tumor Segmentation) cohort and testing both in-cohort (PanTS) and out-of-cohort on MSD (Medical Segmentation Decathlon) Task07 Pancreas, using matched preprocessing and training protocols across strong baselines. We collect voxel-level segmentation metrics, patient-level tumor detection, subgroup analyses by tumor size and anatomical location, volume-conditioned performance analyses, and calibration measurements to assess reliability. Across the evaluated models, PanGuide3D achieves the best overall tumor performance and shows improved cross-cohort generalization, particularly for small tumors and challenging anatomical locations, while reducing anatomically implausible false positives. These findings support probabilistic anatomical conditioning as a practical strategy for improving cross-cohort robustness in an end-to-end model and suggest potential utility for contouring support, treatment planning, and multi-institutional studies.
[555] arXiv:2604.21029 (cross-list from math.OC) [pdf, other]: Title: Integrated packing, placement, scheduling, and routing of personalized production: a pharmaceutical Industry 4.0 use-case with a planar transport system

Viktor Emil Korladinov, Antonin Novak, Zdeněk Hanzálek, Erik Sonntag, František Štěpánek

Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)

The recent emergence of planar transport systems necessitates re-evaluation of Flexible Manufacturing Systems (FMS) to address the simultaneous scheduling of internal logistics and production operations. By operating on a tile-based planar grid, these systems allow independent movers full two-dimensional freedom, mitigating inefficiencies inherent to traditional sequential lines. This paper applies a planar FMS framework to a real-world use case in the pharmaceutical industry: the automated production of personalized drugs.
Implementing this system requires solving optimization problems at both tactical and operational levels. The tactical level involves decisions regarding production line layout and the positioning of drug dispensers. A Mixed-Integer Quadratic Programming model is utilized for the packing problem to exploit drug co-occurrence patterns found in historical patient data. Subsequently, we solve the placement problem - a bi-level problem combining an assignment problem with Shortest Hamiltonian paths with neighborhoods - to arrange dispensers in a layout minimizing expected travel distances.
The operational level is encountered daily, scheduling individual movers to process new orders as quickly as possible. This scheduling problem is formulated using Constraint Programming, modeling movers as reservoir resources to ensure order completeness, complemented by a routing phase using an iterative conflict-resolution mechanism and DAG-based reasoning to convert schedules into conflict-free paths.
Evaluation using real-world prescription data for 40 drugs shows the framework scales efficiently across several layout topologies for up to 500 orders, with schedules that are highly effective and computationally tractable for daily operations.
[556] arXiv:2604.21068 (cross-list from cond-mat.mtrl-sci) [pdf, other]: Title: Expanding the extreme-k dielectric materials space through physics-validated generative reasoning

Hossain Hridoy, Tahiya Chowdhury, Md Shafayat Hossain

Subjects: Materials Science (cond-mat.mtrl-sci); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Artificial Intelligence (cs.AI)

The most technologically consequential materials are often the rarest: they occupy narrow regions of chemical space, obey competing physical constraints, and appear only sparsely in existing databases. High-kappa dielectrics, high-Tc superconductors, and ferromagnetic insulators are a few examples. This scarcity fundamentally limits today's data-driven materials discovery, where machine-learning models excel at interpolation but struggle to generate genuinely new candidates. Here, we introduce DielecMIND, an artificial intelligence framework that reframes materials discovery as a reasoning-driven exploration instead of a database-screening problem. Using high-kappa dielectrics as a data-scarce and technologically stringent test case, DielecMIND combines, for the first time, large-language-model hypothesis generation with physics-validated first-principles calculation to navigate chemical space beyond known compounds. Prior to our work, only 14 experimentally or computationally validated materials with kappa > 150 were known. Our framework discovers and validates 5 new such compounds, expanding this rare-materials class by a remarkable ≈35% in a single study. Among them, we find that Ba2TiHfO6 exhibits a dielectric constant of 637, minimal loss at low optical frequencies, and stability up to 800 K. Beyond dielectrics, this work demonstrates a new paradigm for artificial-intelligence-guided discovery: one that generates a small number of physically grounded, experimentally plausible candidates yet measurably expands sparsely populated functional materials spaces. Thus, DielecMIND points toward a general strategy for discovering rare, high-impact functional materials where data scarcity has long constrained progress.
[557] arXiv:2604.21073 (cross-list from cond-mat.mtrl-sci) [pdf, html, other]: Title: Generative Discovery of Magnetic Insulators under Competing Physical Constraints

Qiulin Zeng, Tahiya Chowdhury, Md Shafayat Hossain

Subjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI)

Discovering materials that must simultaneously satisfy multiple competing constraints remains a central challenge in computational materials design, particularly in data-scarce regimes where conventional data-driven approaches are least effective. Magnetic insulators represent a stringent example: the electronic conditions that favor magnetic order often also promote metallicity, while insulating behavior suppresses the interactions that stabilize magnetism. As a result, experimentally viable magnetic insulators are rare and difficult to identify through conventional screening. Here, we introduce MagMatLLM, a constraint-guided generative discovery framework that integrates language-model-based crystal generation with evolutionary selection, surrogate screening, and first-principles validation to target simultaneous stability, magnetism, and insulating behavior. Unlike stability-first approaches, the framework enforces functional constraints during generation and selection, steering the search toward sparsely populated regions of materials space defined by competing physical requirements. Using this workflow, we identify twelve previously unreported candidate magnetic insulators, including Tm$_4$Co$_2$Cr$_2$O$_{12}$ and Cr$_4$Nb$_2$O$_{12}$. Of these, ten are dynamically stable by phonon analysis and exhibit finite band gaps and nonzero magnetic moments in spin-polarized density functional theory calculations. Beyond the specific compounds identified here, this work establishes a general constraint-guided paradigm for multi-objective materials discovery in sparse chemical spaces and provides a transferable strategy for the design of quantum materials under competing physical constraints.
[558] arXiv:2604.21085 (cross-list from physics.ao-ph) [pdf, html, other]: Title: climt-paraformer: Stable Emulation of Convective Parameterization using a Temporal Memory-aware Transformer

Shuochen Wang, Nishant Yadav, Joy Merwin Monteiro, Auroop R. Ganguly

Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)

Accurate representation of moist convective sub-grid-scale processes remains a major challenge in global climate models, as traditional parameterization schemes are both computationally expensive and difficult to scale. Neural network (NN) emulators offer a promising alternative by learning efficient mappings between atmospheric states and convective tendencies while retaining fidelity to the underlying physics. However, most existing NN-based parameterizations are memory-less and rely only on instantaneous inputs, even though convection evolves over time and depends on prior atmospheric states. Recent studies have begun to incorporate convective memory, but they often treat past states as independent features rather than modeling temporal dependencies explicitly. In this work, we develop a temporal memory-aware Transformer emulator for the Emanuel convective parameterization and evaluate it in a single-column climate model (SCM) under both offline and online configurations. The Transformer captures temporal correlations and nonlinear interactions across consecutive atmospheric states. Compared with baseline emulators, including a memory-less multilayer perceptron and a recurrent long short-term memory model, the Transformer achieves lower offline errors. Sensitivity analysis indicates that a memory length of approximately 100 minutes yields the best performance, whereas longer memory degrades performance. We further test the emulator in long-term coupled simulations and show that it remains stable over 10 years. Overall, this study demonstrates the importance of explicit temporal modeling for NN-based parameterizations.
[559] arXiv:2604.21089 (cross-list from quant-ph) [pdf, html, other]: Title: A rigorous quasipolynomial-time classical algorithm for SYK thermal expectations

Alexander Zlokapa

Comments: 58 pages

Subjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph)

Estimating local observables in Gibbs states is a central problem in quantum simulation. While this task is BQP-complete at asymptotically low temperatures, the possibility of quantum advantage at constant temperature remains open. The Sachdev-Ye-Kitaev (SYK) model is a natural candidate: at any constant temperature, its Gibbs states have polynomial quantum circuit complexity and are not described by Gaussian states. Rigorous analyses of the SYK model are difficult due to the failure of known techniques using random matrix theory, cluster expansions, and rigorous formulations of the quantum path integral and replica trick. Despite this, we give a rigorous proof of a quasipolynomial-time classical algorithm that estimates SYK local thermal expectations at sufficiently high constant temperature. Our result introduces a new Wick-pair cluster expansion that we expect to be broadly useful for disordered quantum many-body systems.
[560] arXiv:2604.21097 (cross-list from stat.ML) [pdf, html, other]: Title: Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

Gabriel Melo, Leonardo Santiago, Peter Y. Lu

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model using data-driven emulators, including neural operator architectures. For chaotic systems, the inherent sensitivity to initial conditions makes exact long-term forecasts theoretically infeasible, meaning that traditional squared-error losses often fail when trained on noisy data. Recent work has focused on training emulators to match the statistical properties of chaotic attractors by introducing regularization based on handcrafted local features and summary statistics, as well as learned statistics extracted from a diverse dataset of trajectories. In this work, we propose a family of adversarial optimal transport objectives that jointly learn high-quality summary statistics and a physically consistent emulator. We theoretically analyze and experimentally validate a Sinkhorn divergence formulation (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein). Our experiments across a variety of chaotic systems, including systems with high-dimensional chaotic attractors, show that emulators trained with our approach exhibit significantly improved long-term statistical fidelity.
[561] arXiv:2604.21163 (cross-list from eess.SP) [pdf, html, other]: Title: Efficient Design of Fronthaul-Constrained Uplink Reception for Cell-Free XL-MIMO

Dogon Kim, Hyunmin Noh, Seok-Hwan Park

Comments: accepted for publication in IEEE Wireless Communications Letters

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

With the evolution of multiple-input multiple-output (MIMO) technology toward extremely large (XL) MIMO systems comprising hundreds of, or more, antennas, this work investigates scalable and fronthaul-efficient reception design for the uplink of cell-free (CF) XL-MIMO systems. In such systems, the uplink signals transmitted by mobile user equipments (UEs) are jointly decoded at a central processing unit (CPU) connected to distributed access points (APs) via finite-capacity fronthaul links. We address the joint optimization of linear transform matrices, used by the APs to reduce the signal dimension and fronthaul load, and fronthaul compression strategies to maximize the uplink sum rate. A fractional programming (FP)-based iterative algorithm is first developed, followed by a reduced-complexity variant, termed accelerated FP (A-FP), along with its decentralized implementation whose fronthaul overhead remains independent of the number of AP antennas. Numerical results show that the proposed A-FP scheme significantly reduces computational complexity compared to FP implemented with general-purpose solvers, while substantially outperforming scalable baseline schemes that rely solely on local channel state information.
[562] arXiv:2604.21187 (cross-list from math.CO) [pdf, other]: Title: Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery

Benjamin Przybocki, John Mackey, Marijn J. H. Heule, Bernardo Subercaseaux

Subjects: Combinatorics (math.CO); Artificial Intelligence (cs.AI)

Ramsey-good graphs are graphs that contain neither a clique of size $s$ nor an independent set of size $t$. We study doubly saturated Ramsey-good graphs, defined as Ramsey-good graphs in which the addition or removal of any edge necessarily creates an $s$-clique or a $t$-independent set. We present a method combining SAT solving with bespoke LLM-generated code to discover infinite families of such graphs, answering a question of Grinstead and Roberts from 1982. In addition, we use LLMs to generate and formalize correctness proofs in Lean. This case study highlights the potential of integrating automated reasoning, large language models, and formal verification to accelerate mathematical discovery. We argue that such tool-driven workflows will play an increasingly central role in experimental mathematics.
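The definition above can be checked by brute force on small graphs. The sketch below is not the SAT-based method of the paper; it is a hypothetical exhaustive checker (all function names are illustrative) that tests whether a graph is Ramsey-good for given $s$ and $t$ and whether flipping any single vertex pair destroys that property.

```python
from itertools import combinations


def has_clique(n, edges, k):
    """True if some k vertices are pairwise adjacent."""
    return any(
        all(frozenset(p) in edges for p in combinations(subset, 2))
        for subset in combinations(range(n), k)
    )


def has_independent_set(n, edges, k):
    """True if some k vertices are pairwise non-adjacent."""
    return any(
        all(frozenset(p) not in edges for p in combinations(subset, 2))
        for subset in combinations(range(n), k)
    )


def is_ramsey_good(n, edges, s, t):
    """No s-clique and no t-independent set."""
    return not has_clique(n, edges, s) and not has_independent_set(n, edges, t)


def is_doubly_saturated(n, edge_list, s, t):
    """Ramsey-good, and adding or removing any single edge breaks it."""
    edges = {frozenset(e) for e in edge_list}
    if not is_ramsey_good(n, edges, s, t):
        return False
    for pair in combinations(range(n), 2):
        # Symmetric difference adds the pair if absent, removes it if present.
        flipped = edges ^ {frozenset(pair)}
        if is_ramsey_good(n, flipped, s, t):
            return False
    return True


if __name__ == "__main__":
    five_cycle = [(i, (i + 1) % 5) for i in range(5)]
    print(is_doubly_saturated(5, five_cycle, 3, 3))
```

The 5-cycle with $s = t = 3$ is a small instance: adding any chord closes a triangle, and removing any edge leaves a path that contains an independent set of size 3. The check is exponential in the number of vertices, so it is only practical for very small graphs.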
[563] arXiv:2604.21200 (cross-list from math.AP) [pdf, html, other]: Title: A Temperature-Coupled Cahn-Hilliard-Stokes-Heat Model for Thermally Driven Phase Separation

Maria Deliyianni, Boris Muha, Andrej Novak

Comments: 42 pages, 34 figures

Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

We study a diffuse-interface model for thermally driven phase separation in viscous incompressible mixtures. The system couples a convective Cahn-Hilliard equation for the order parameter with a Stokes subsystem for the velocity-pressure field and a heat equation for the temperature. Temperature enters the bulk free energy through a Landau-type coefficient, while the phase field feeds back on the flow through concentration-dependent density and viscosity, yielding a phenomenological temperature-coupled Cahn-Hilliard-Stokes-Heat system. We motivate the chemical potential by a temperature-dependent Landau free energy and derive a priori estimates for the regularized subproblems. On the analytical side, we prove local-in-time existence of weak solutions for a regularized coupled system. On the numerical side, we propose a fully discrete finite element scheme combining a convex-splitting time discretization for the Cahn-Hilliard equation with an implicit treatment of viscous and thermal diffusion terms and an implicit Stokes solve. Under impermeable velocity boundary conditions, the Cahn-Hilliard substep conserves mass; in the purely diffusive isothermal case, the convex-splitting discretization is unconditionally energy-stable for the Cahn-Hilliard free energy. Numerical experiments in two dimensions illustrate thermally driven spinodal decomposition, wall-induced phase separation near cooled walls, and phase separation in narrow channels under imposed thermal gradients. The simulations show the qualitative influence of key nondimensional parameters (such as the mass and thermal Péclet numbers, the Cahn number, the density and viscosity ratios, and the gravitational parameter $G$) on pattern formation, interface motion, and flow structure, and confirm that the proposed framework is a robust tool for studying thermally driven phase separation in confined geometries.
[564] arXiv:2604.21202 (cross-list from econ.EM) [pdf, html, other]: Title: Participation and Representation in Local Government Speech

Olivia Martin, Amar Venugopal

Subjects: Econometrics (econ.EM); Computation and Language (cs.CL)

Local government meetings are the most common formal channel through which residents speak directly with elected officials, contest policies, and shape local agendas. However, data constraints typically limit the empirical study of these meetings to agendas, single cities, or short time horizons. We collect and transcribe a massive new dataset of city council meetings from 115 California cities over the last decade, using advanced transcription and diarization techniques to analyze the speech content of the meetings themselves. We document two sets of descriptive findings: First, city council meetings are frequent, long, and vary modestly across towns and time in topical content. Second, public participants are substantially older, whiter, more male, more liberal, and more likely to own homes than the registered voter population, and public participation surges when topics related to land use and zoning are included in meeting agendas. Given this skew, we examine the main policy lever municipalities have to shift participation patterns: meeting access costs. Exploiting pandemic-era variation in remote access, we show that eliminating remote options reduces the number of speakers, but does not clearly change the composition of speakers. Collectively, these results provide the most comprehensive empirical portrait to date of who participates in local democracy, what draws them in, and how institutional design choices shape both the volume and composition of public input.
[565] arXiv:2604.21203 (cross-list from stat.ML) [pdf, html, other]: Title: Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible second-order (Hessian) information or suffer from slow convergence. To address these challenges, we propose a novel, fully online de-biased covariance estimator that eliminates the need for second-order derivatives while significantly improving estimation accuracy. Our method employs a bias-reduction technique to achieve a convergence rate of $n^{(\alpha-1)/2} \sqrt{\log n}$, outperforming existing Hessian-free alternatives.
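As background for the baselines named above, the following is a minimal sketch of a classical equal-size batch-means covariance estimate computed from SGD iterates on a least-squares problem. It is not the de-biased estimator proposed in the paper; the function names, step-size schedule, and batch count are illustrative assumptions.

```python
import numpy as np

def sgd_iterates(X, y, step0=0.1, alpha=0.6):
    """One pass of SGD on least squares with step size step0 * t**(-alpha).

    Returns every iterate as an (n, d) array. The schedule is an assumption
    chosen for illustration only.
    """
    n, d = X.shape
    theta = np.zeros(d)
    out = np.empty((n, d))
    for t in range(n):
        grad = (X[t] @ theta - y[t]) * X[t]  # gradient of 0.5 * (x^T theta - y)^2
        theta = theta - step0 * (t + 1) ** (-alpha) * grad
        out[t] = theta
    return out

def batch_means_cov(iterates, n_batches=20):
    """Equal-size batch-means estimate of the long-run covariance of the iterates.

    Uses only the iterates themselves, so no Hessian information is required.
    """
    n, d = iterates.shape
    b = n // n_batches                      # batch length
    used = iterates[: b * n_batches]        # drop the remainder
    means = used.reshape(n_batches, b, d).mean(axis=1)
    centered = means - used.mean(axis=0)
    return b * centered.T @ centered / (n_batches - 1)

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(4000, 3))
y = X @ theta_true + rng.normal(size=4000)
iterates = sgd_iterates(X, y)
cov_hat = batch_means_cov(iterates)
print(cov_hat.shape)
```

Batch-means estimators of this kind avoid second-order derivatives but are known to converge slowly, which is the gap the abstract says the proposed estimator targets.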
[566] arXiv:2604.21210 (cross-list from quant-ph) [pdf, html, other]: Title: The Feedback Hamiltonian is the Score Function: A Diffusion-Model Framework for Quantum Trajectory Reversal

Sagar Dubey, Alan John

Comments: 14 pages

Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

In continuously monitored quantum systems, the feedback protocol of García-Pintos, Liu, and Gorshkov reshapes the arrow of time: a Hamiltonian $H_{\mathrm{meas}} = r A / \tau$ applied with gain $X$ tilts the distribution of measurement trajectories, with $X < -2$ producing statistically time-reversed outcomes. Why this specific Hamiltonian achieves reversal, and how the mechanism relates to score-based diffusion models in machine learning, has remained unexplained.
We compute the functional derivative of the log path probability of the quantum trajectory distribution directly in density-matrix space. Combining Girsanov's theorem applied to the measurement record, Fréchet differentiation on the Banach space of trace-class operators, and Kähler geometry on the pure-state projective manifold, we prove that $\delta \log P_F / \delta \rho = r A / \tau = H_{\mathrm{meas}}$. The García-Pintos feedback Hamiltonian is the score function of the quantum trajectory distribution -- exactly the object Anderson's reverse-time diffusion theorem requires for trajectory reversal. The identification extends to multi-qubit systems with independent measurement channels, where the score is a sum of local operators.
Two consequences follow. First, the feedback gain $X$ generates a continuous one-parameter family of path measures (for feedback-active Hamiltonians with $[H, A] \neq 0$), with $X = -2$ recovering the backward process in leading-order linearization -- a structure absent from classical diffusion, where reversal is binary. Second, the score identification enables machine learning (ML) score estimation methods -- denoising score matching, sliced score matching -- to replace the analytic formula when its idealizations (unit efficiency, zero delay, Gaussian noise) fail in real experiments.
[567] arXiv:2604.21216 (cross-list from econ.TH) [pdf, html, other]: Title: Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics

Elija Perrier

Comments: Under review

Subjects: Theoretical Economics (econ.TH); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

The First Fundamental Theorem of Welfare Economics assumes that welfare-bearing agents are autonomous and implicitly relies on a binary distinction between autonomy and instrumentality. Welfare subjects are those who have autonomy and therefore the capacity to choose and enter into utility comparisons, while everything else does not. In post-AGI economies this presupposition becomes nontrivial because artificial systems may exhibit varying degrees of autonomy, functioning as tools, delegates, strategic market actors, manipulators of choice environments, or possible welfare subjects. We argue that the theorem ought to be subject to an autonomy qualification where the impact of these changes in autonomy assumptions is incorporated. Using a minimal general-equilibrium model with autonomy-conditioned welfare, welfare-status assignment, delegation accounting, and verification institutions, we set out conditions under which autonomy-complete competitive equilibrium is autonomy-Pareto efficient. The classical theorem is recovered as the low-autonomy limit.
[568] arXiv:2604.21222 (cross-list from cond-mat.mtrl-sci) [pdf, other]: Title: Neutron and X-ray Diffraction Reveal the Limits of Long-Range Machine Learning Potentials for Medium-Range Order in Silica Glass

Sai Harshit Balantrapu, Atul C. Thakur, Chris Benmore, Ganesh Sivaraman

Comments: 19 pages, 9 figures

Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

Glassy silica is a foundational material in optics and electronics, yet accurately predicting its medium-range order (MRO) remains a major challenge for machine-learning interatomic potentials (MLIPs). While local MLIPs reproduce the short-range SiO4 tetrahedral network well, it remains unclear whether locality alone is sufficient to recover the first sharp diffraction peak (FSDP), the principal experimental signature of MRO. Here, we combine neutron and X-ray diffraction measurements with large-scale molecular dynamics driven by two MACE-based models: a short-range (SR) potential and a long-range (LR) extension incorporating reciprocal-space gated attention. The SR model systematically over-structures the network, producing an overly intense FSDP in both the liquid and glassy states. Incorporating long-range interactions improves agreement with experiment for the liquid structure by reducing this excess ordering, but the LR model still fails to recover the experimental amorphous MRO after quenching. Ring-statistics and bond-angle analyses reveal that the SR model exhibits an artificially narrow distribution dominated by six-membered rings, while the LR model produces a broader but still biased ring population. Despite preserving the correct tetrahedral geometry, both models show limited variability in Si-O-Si angles, indicating constrained network flexibility. These structural signatures demonstrate that both models retain excessive memory of the parent liquid network, leading to kinetically trapped and nonphysical medium-range configurations during vitrification. These results show that explicit long-range interactions are necessary but not sufficient for predictive modelling of disordered silica and suggest that accurate MRO further requires training data and sampling strategies that adequately represent the liquid-to-glass transition.
[569] arXiv:2604.21233 (cross-list from physics.ao-ph) [pdf, html, other]: Title: Assessing Emulator Design and Training for Modal Aerosol Microphysics Parameterizations in E3SMv2

Shady E. Ahmed, Hui Wan, Saad Qadeer, Panos Stinis, Kezhen Chong, Mohammad Taufiq Hassan Mozumder, Kai Zhang, Ann S. Almgren

Comments: 16 pages, 7 figures

Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Geophysics (physics.geo-ph)

Toward the goal of using Scientific Machine Learning (SciML) emulators to improve the numerical representation of aerosol processes in global atmospheric models, we explore the emulation of aerosol microphysics processes under cloud-free conditions in the 4-mode Modal Aerosol Module (MAM4) within the Energy Exascale Earth System Model version 2 (E3SMv2). To develop an in-depth understanding of the challenges and opportunities in applying SciML to aerosol processes, we begin with a simple feedforward neural network architecture that has been used in earlier studies, but we systematically examine key emulator design choices, including architecture complexity and variable normalization, while closely monitoring training convergence behavior.
Our results show that optimization convergence, scaling strategy, and network complexity strongly influence emulation accuracy. When effective scaling is applied and convergence is achieved, the relatively simple architecture, used together with a moderate network size, can reproduce key features of the microphysics-induced aerosol concentration changes with promising accuracy. These findings provide practical clues for the next stages of emulator development; they also provide general insights that are likely applicable to the emulation of other aerosol processes, as well as other atmospheric physics involving multi-scale variability.
[570] arXiv:2604.21260 (cross-list from stat.ML) [pdf, html, other]: Title: Calibeating Prediction-Powered Inference

Lars van der Laan, Mark Van Der Laan

Comments: Paper website: this https URL

Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Econometrics (econ.EM); Quantitative Methods (q-bio.QM); Methodology (stat.ME)

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability weighting (AIPW) [Robins et al., 1994], which protects against prediction-model misspecification but can be inefficient when the prediction score is poorly aligned with the outcome scale. We introduce Calibrated Prediction-Powered Inference, which post-hoc calibrates the prediction score on the labeled sample before using it for semisupervised estimation. This simple step requires no retraining and can improve the original score both as a predictor of the outcome and as a regression adjustment for semisupervised inference. We study both linear and isotonic calibration. For isotonic calibration, we establish first-order optimality guarantees: isotonic post-processing can improve predictive accuracy and estimator efficiency relative to the original score and simpler post-processing rules, while no further post-processing of the fitted isotonic score yields additional first-order gains. For linear calibration, we show first-order equivalence to PPI++. We also clarify the relationship among existing estimators, showing that the original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is accurate, while PPI++ is AIPW with empirical efficiency maximization [Rubin et al., 2008]. In simulations and real-data experiments, our calibrated estimators often outperform PPI and are competitive with, or outperform, AIPW and PPI++. We provide an accompanying Python package, ppi_aipw, at this https URL.
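The effect of post-hoc linear calibration can be illustrated with a small synthetic sketch. It compares the plain PPI-style mean estimate with the same estimate after an ordinary least-squares recalibration of the score on the labeled sample. The function names and the data-generating process are assumptions made for illustration; this is not the API of the `ppi_aipw` package, and it does not implement the isotonic variant.

```python
import numpy as np

def ppi_mean(y_lab, f_lab, f_unl):
    """Plain prediction-powered mean: unlabeled score mean plus labeled bias correction."""
    return f_unl.mean() + (y_lab - f_lab).mean()

def linear_calibrated_mean(y_lab, f_lab, f_unl):
    """Recalibrate the score by OLS of Y on f (labeled data), then apply the same formula."""
    slope, intercept = np.polyfit(f_lab, y_lab, deg=1)  # highest degree first
    g_lab = intercept + slope * f_lab
    g_unl = intercept + slope * f_unl
    return g_unl.mean() + (y_lab - g_lab).mean()

# Synthetic example: the black-box score is informative but on the wrong scale.
rng = np.random.default_rng(0)
n, N = 500, 50_000
y_lab = rng.normal(1.0, 1.0, n)
y_unl = rng.normal(1.0, 1.0, N)                   # outcomes never seen by the estimators
f_lab = 2.0 * y_lab + 3.0 + rng.normal(0, 0.1, n)
f_unl = 2.0 * y_unl + 3.0 + rng.normal(0, 0.1, N)

est_ppi = ppi_mean(y_lab, f_lab, f_unl)
est_cal = linear_calibrated_mean(y_lab, f_lab, f_unl)
print(est_ppi, est_cal)  # both target the true mean 1.0
```

Both estimates target the same mean; in this setup the residual `y_lab - g_lab` has much smaller variance after calibration than `y_lab - f_lab`, which is where the efficiency gain comes from.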
[571] arXiv:2604.21270 (cross-list from stat.ML) [pdf, html, other]: Title: CLT-Optimal Parameter Error Bounds for Linear System Identification

Yichen Zhou, Stephen Tu

Comments: 36 pages

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)

There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state-dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm, and polylogarithmic state-dimension factors in spectral norm.
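The estimation problem studied here can be made concrete with a short sketch: simulate a stable system $x_{t+1} = A x_t + w_t$ from a single trajectory and recover $A$ by ordinary least squares. The system matrix, noise model, and trajectory length below are illustrative assumptions; the sketch shows the estimator only, not the paper's error bounds.

```python
import numpy as np

def simulate_lds(A, T, rng):
    """Simulate x_{t+1} = A x_t + w_t with standard Gaussian noise, starting at 0."""
    d = A.shape[0]
    x = np.zeros((T + 1, d))
    for t in range(T):
        x[t + 1] = A @ x[t] + rng.normal(size=d)
    return x

def ols_estimate(x):
    """OLS estimate of A from one trajectory.

    Row-wise, x_{t+1}^T = x_t^T A^T, so we solve past @ B ~ future and return B^T.
    """
    past, future = x[:-1], x[1:]
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    return B.T

rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.5, 0.2],
                   [0.0, 0.0, 0.5]])  # spectral radius 0.5, hence stable
traj = simulate_lds(A_true, T=5000, rng=rng)
A_hat = ols_estimate(traj)
print(np.linalg.norm(A_hat - A_true, "fro"), np.linalg.norm(A_hat - A_true, 2))
```

The two printed quantities are the Frobenius-norm and spectral-norm parameter errors, the two metrics in which the abstract states its bounds.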
[572] arXiv:2604.21274 (cross-list from quant-ph) [pdf, html, other]: Title: Random Access Codes: Explicit Constructions, Optimality, and Classical-Quantum Gaps

Ruho Kondo, Yuki Sato, Hiroshi Yano, Yota Maeda, Kosuke Ito, Naoki Yamamoto

Comments: 15 pages, 2 figures, 2 tables

Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

A random access code (RAC) encodes an $L$-bit string into a $k$-bit $(L>k)$ message from which any designated source bit can be recovered with high probability. Its quantum counterpart, a quantum random access code (QRAC), replaces the $k$-bit message with $k$ qubits. While upper bounds on the decoding success probability have long been studied in both classical and quantum settings, explicit constructions of optimal codes are known only in special cases, even for classical RACs. In this paper, we develop a constructive framework for classical $(L,k)$-RACs under both average- and worst-case criteria. We show that optimal code design reduces to selecting $2^k$ points in $\{0,1\}^L$ and $[0,1]^L$ for the average- and worst-case criteria, respectively, so as to minimize a distance-like objective. This characterization yields explicit constructions for general $(L,k)$. For $k=L-1$, we further obtain closed-form optimal encoders and decoders for both criteria, and show that the resulting classical $(L,L-1)$-RACs attain the corresponding proved upper bounds. We also show that these optimal classical codes induce $(L,L-1)$-QRACs that attain a conjectured upper bound on the decoding success probability. Numerical optimization suggests little difference between RACs and QRACs in the average-case setting, but a potentially large classical-quantum gap in the worst-case nonasymptotic regime.
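One way to read the average-case reduction is the following brute-force sketch for deterministic codes. Each of the $2^k$ messages is associated with a point in $\{0,1\}^L$ that the decoder reads off bit by bit, the encoder sends the message whose point is closest in Hamming distance, and the average success probability is one minus the mean normalized distance. This is an illustration for tiny $(L,k)$, assuming uniformly random inputs and queried positions; it is not the explicit construction from the paper.

```python
from itertools import combinations, product

def average_success(codewords, L):
    """Average decoding success over uniform inputs x and uniform queried bit i.

    For input x the encoder picks the nearest codeword c; the decoder answers
    c[i], which is correct on all but d_H(x, c) of the L positions.
    """
    total_distance = 0
    for x in product((0, 1), repeat=L):
        total_distance += min(sum(a != b for a, b in zip(x, c)) for c in codewords)
    return 1 - total_distance / (L * 2 ** L)

def best_deterministic_rac(L, k):
    """Exhaustively choose 2^k points in {0,1}^L maximizing average success."""
    points = list(product((0, 1), repeat=L))
    best = max(combinations(points, 2 ** k), key=lambda cw: average_success(cw, L))
    return best, average_success(best, L)

for L, k in [(2, 1), (3, 1), (3, 2)]:
    code, p = best_deterministic_rac(L, k)
    print((L, k), round(p, 4), code)
```

The search is exponential in $L$ and only meant to make the point-selection objective tangible for small cases.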
[573] arXiv:2604.21292 (cross-list from math.CO) [pdf, html, other]: Title: Large values in time series and additive combinatorics

Alex Iosevich, Vishal Gupta

Comments: 13 pages, 6 figures

Subjects: Combinatorics (math.CO); Information Theory (cs.IT); Applications (stat.AP)

It is well-known in industrial data science that large values of real-life time series tend to be structured and often follow concrete and visible patterns. In this paper, we use ideas from additive combinatorics and discrete Fourier analysis to give this heuristic a mathematical foundation. Our main tool is the Fourier ratio, a complexity measure previously used in compressed sensing, combined with a generalized version of Chang's lemma from additive combinatorics. Together, these yield a precise prediction: when the Fourier ratio of a time series is small, the set of its largest values can be additively generated by a very small set using only $\{-1,0,1\}$ coefficients. We test this prediction on US inflation data and Delhi climate data, both in their original form and after mean-centering. The numerical results confirm the predicted structure: a generating set of size $4$--$7$ suffices to span large spectra containing dozens of points, even when the Fourier ratio is large enough that our theoretical bounds become loose. These findings provide a rigorous explanation for why extreme values in real-world data are information-rich and structurally significant.
[574] arXiv:2604.21432 (cross-list from stat.ML) [pdf, other]: Title: A single algorithm for both restless and rested rotting bandits

Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko

Comments: In AISTATS 2020

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings, where arms are rotting (i.e., their value decreases over time). These problems were thought to be significantly different, since Levine et al. (2017) showed that state-of-the-art algorithms for restless bandits perform poorly in the rested rotting setting. In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rotting rested and restless bandits, without any prior knowledge of the setting (rested or restless) and the type of non-stationarity (e.g., piece-wise constant, bounded variation). This is in striking contrast with previous negative results showing that no algorithm can achieve similar results as soon as rewards are allowed to increase. We confirm our theoretical findings on a number of synthetic and dataset-based experiments.
[575] arXiv:2604.21448 (cross-list from cond-mat.soft) [pdf, html, other]: Title: Continuum granular flow model with restitution-derived viscoelastic damping

Bodhinanda Chandra, Sachith Dunatunga, Ken Kamrin

Comments: 36 pages, 20 figures

Subjects: Soft Condensed Matter (cond-mat.soft); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)

This work presents a unified viscoelastic-viscoplastic continuum framework for modeling rate-dependent granular flows across regimes. The formulation incorporates two distinct rate-dependent mechanisms, namely micro-inertia and viscoelastic dissipation, within a single continuum description. A central contribution is an explicit link between the coefficient of restitution and a continuum viscosity, derived from an analysis of wave attenuation in granular assemblies, thereby establishing a direct connection between particle-scale collision physics and macroscopic damping. This relation is introduced while retaining inertia-dependent plastic flow governed by the classical $\mu(I)$ rheology. The constitutive model is constructed by meticulously partitioning elastic and viscous responses within the model and corresponding stress-update routine, such that viscous dissipation governs wave propagation and collisional processes without altering the plastic flow rule. The framework is implemented within the material point method to simulate transient processes involving large deformations, material separation, and subsequent reconsolidation. A range of numerical examples, including steady, transient, vibrational, and impact-driven flows, demonstrates that the model captures wave propagation, diffusion, and rate-dependent granular behavior within a unified continuum setting.
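For readers unfamiliar with the classical $\mu(I)$ rheology that the framework retains, a minimal sketch of the standard friction law and the inertial number follows. The default parameter values are commonly quoted glass-bead values used here purely for illustration and are not taken from this paper; the function names are likewise assumptions.

```python
import math

def inertial_number(shear_rate, grain_diameter, pressure, grain_density):
    """Inertial number I = shear_rate * d / sqrt(P / rho_s) (dimensionless)."""
    return shear_rate * grain_diameter / math.sqrt(pressure / grain_density)

def mu_of_I(I, mu_s=0.38, mu_2=0.64, I_0=0.279):
    """Friction coefficient mu(I) = mu_s + (mu_2 - mu_s) * I / (I_0 + I).

    Equivalent to the usual form mu_s + (mu_2 - mu_s) / (I_0 / I + 1),
    written so that I = 0 needs no special case.
    """
    return mu_s + (mu_2 - mu_s) * I / (I_0 + I)

# Example: shear rate 10 1/s, 1 mm grains, 1 kPa pressure, grain density 2500 kg/m^3.
I = inertial_number(10.0, 1e-3, 1000.0, 2500.0)
print(I, mu_of_I(I))
```

The friction coefficient rises from the static value `mu_s` at vanishing shear rate toward the limiting value `mu_2` at large inertial number, which is the rate dependence the plastic flow rule inherits.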
[576] arXiv:2604.21474 (cross-list from physics.comp-ph) [pdf, html, other]: Title: A Thin Sheet Volume Integral Equation Solver for Simulation of Bianisotropic Metasurfaces

Sebastian Celis Sierra, Meruyert Khamitova, Ran Zhao, Sadeed Bin Sayed, Hakan Bagci

Subjects: Computational Physics (physics.comp-ph); Numerical Analysis (math.NA)

A thin-sheet (TS) volume integral equation (VIE) formulation incorporating generalized sheet transition conditions (GSTCs) is presented for the simulation of three-dimensional (3D) bianisotropic metasurfaces. The metasurface is represented as an equivalent TS, with its constitutive tensors derived from the GSTC susceptibility tensors. Invoking the TS approximation, the governing VIEs are reduced to surface integral equations (SIEs), in which tangential and normal flux density components are treated as distinct sets of unknowns and discretized using Rao-Wilton-Glisson and pulse basis functions, respectively. In contrast to conventional GSTC approaches based on conventional SIEs, which represent only tangential fields, the proposed framework rigorously enforces the bianisotropic GSTCs, including normal field interactions, while retaining the flux-based VIE character of the formulation. Numerical examples demonstrate the accuracy and robustness of the proposed TS-VIE-GSTC solver for polarization rotation, perfect reflection, multi-directional attenuation, and oblique phase-shift transformation.
[577] arXiv:2604.21475 (cross-list from quant-ph) [pdf, html, other]: Title: Suppressing the Erasure Error of Fusion Operation in Photonic Quantum Computing

Xiangyu Ren, Yuexun Huang, Zhemin Zhang, Yuchen Zhu, Tsung-Yi Ho, Antonio Barbalace, Zhiding Liang

Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR)

Photonic quantum computing (PQC) provides a promising route toward quantum computation by naturally supporting the measurement-based quantum computation (MBQC) model. In MBQC, programs are executed through measurements on a pre-generated graph state, whose construction largely depends on probabilistic fusion operations. However, fusion operations in PQC are vulnerable to two major error sources: fusion failure and fusion erasure. As a result, MBQC compilation must account for both error mechanisms to generate reliable and efficient photonic executions. Prior state-of-the-art MBQC compilation, represented by OneAdapt, is designed for all-photonic architectures and mainly focuses on handling fusion failures. Nevertheless, it does not explicitly model fusion erasures induced by photon loss, which can be substantially more damaging than fusion failures.
To mitigate fusion erasure errors, we introduce a new MBQC compilation scheme built upon the spin qubit quantum memory. We propose tree-encoded fusion, an encoding strategy that suppresses erasure errors during graph-state generation. We further incorporate this scheme into a compiler framework with algorithms that reduce the execution overhead of quantum programs. We evaluate the proposed framework using a realistic PQC simulator on six representative quantum algorithm benchmarks across multiple program scales. The results show that tree-encoded fusion achieves better robustness than alternative fusion-encoding strategies, and that our compiler provides exponential improvement over OneAdapt. In addition, we validate the feasibility of our approach through a proof-of-concept demonstration on real PQC hardware.
[578] arXiv:2604.21507 (cross-list from eess.AS) [pdf, html, other]: Title: DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline

Nikhil Raghav

Comments: 13 pages, 7 figures, 2 tables. Code available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Speaker diarization (SD) is the task of answering "who spoke when" in a multi-speaker audio stream. Classically, an SD system clusters segments of speech belonging to an individual speaker's identity. Recent years have seen substantial progress in SD through end-to-end neural diarization (EEND) approaches. DiariZen, a hybrid SD pipeline built upon a structurally pruned WavLM-Large encoder, a Conformer backend with powerset classification, and VBx clustering, represents the leading open-source state of the art at the time of writing across multiple benchmarks. Despite its strong performance, the DiariZen architecture spans several repositories and frameworks, making it difficult for researchers and practitioners to understand, reproduce, or extend the system as a whole. This tutorial paper provides a self-contained, block-by-block explanation of the complete DiariZen pipeline, decomposing it into seven stages: (1) audio loading and sliding window segmentation, (2) WavLM feature extraction with learned layer weighting, (3) Conformer backend and powerset classification, (4) segmentation aggregation via overlap-add, (5) speaker embedding extraction with overlap exclusion, (6) VBx clustering with PLDA scoring, and (7) reconstruction and RTTM output. For each block, we provide the conceptual motivation, source code references, intermediate tensor shapes, and annotated visualizations of the actual outputs on a 30s excerpt from the AMI Meeting Corpus. The implementation is available at this https URL, which includes standalone executable scripts for each block and a Jupyter notebook that runs the complete pipeline end-to-end.
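The powerset classification used in stage (3) can be sketched as follows: instead of predicting one independent on/off label per speaker, each frame is assigned a single class that names the exact set of active speakers, up to a maximum overlap. The speaker and overlap limits below are hypothetical values chosen for illustration, and the helper names are not taken from the DiariZen code base.

```python
from itertools import combinations

def powerset_classes(num_speakers, max_overlap):
    """Enumerate speaker subsets of size 0..max_overlap; class 0 is silence."""
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes

def to_multilabel(class_index, classes, num_speakers):
    """Convert a powerset class index back to per-speaker 0/1 activity."""
    active = classes[class_index]
    return [1 if s in active else 0 for s in range(num_speakers)]

# Hypothetical configuration: up to 4 speakers per chunk, at most 2 overlapping.
classes = powerset_classes(num_speakers=4, max_overlap=2)
print(len(classes))                                   # 1 + 4 + 6 subsets
print(to_multilabel(classes.index((0, 2)), classes, 4))
```

Because the classes are mutually exclusive, the network can be trained with an ordinary softmax cross-entropy, and the per-speaker activity needed by later stages is recovered with the inverse mapping.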
[579] arXiv:2604.21518 (cross-list from eess.IV) [pdf, html, other]: Title: DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

Shiyan Su, Ruyi Zha, Danli Shi, Hongdong Li, Xuelian Cheng

Comments: Accepted to AAAI 2026. Project page: this https URL

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Neural representations (NRs), such as neural fields and 3D Gaussians, effectively model volumetric data in computed tomography (CT) but suffer from severe artifacts under sparse-view settings. To address this, we propose DiffNR, a novel framework that enhances NR optimization with diffusion priors. At its core is SliceFixer, a single-step diffusion model designed to correct artifacts in degraded slices. We integrate specialized conditioning layers into the network and develop tailored data curation strategies to support model finetuning. During reconstruction, SliceFixer periodically generates pseudo-reference volumes, providing auxiliary 3D perceptual supervision to fix underconstrained regions. Compared to prior methods that embed CT solvers into time-consuming iterative denoising, our repair-and-augment strategy avoids frequent diffusion model queries, leading to better runtime performance. Extensive experiments show that DiffNR improves PSNR by 3.99 dB on average, generalizes well across domains, and maintains efficient optimization.
[580] arXiv:2604.21595 (cross-list from stat.ML) [pdf, html, other]: Title: A Kernel Nonconformity Score for Multivariate Conformal Prediction

Louis Meyer, Wenkai Xu

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Multivariate conformal prediction requires nonconformity scores that compress residual vectors into scalars while preserving the implicit geometric structure of the residual distribution. We introduce a Multivariate Kernel Score (MKS) that produces prediction regions that explicitly adapt to this geometry. We show that the proposed score resembles the Gaussian process posterior variance, unifying Bayesian uncertainty quantification with frequentist-type coverage guarantees. Moreover, the MKS can be decomposed into an anisotropic Maximum Mean Discrepancy (MMD) that interpolates between kernel density estimation and covariance-weighted distance. We prove finite-sample coverage guarantees and establish convergence rates that depend on the effective rank of the kernel-based covariance operator rather than the ambient dimension, enabling dimension-free adaptation. On regression tasks, the MKS significantly reduces the volume of the prediction region compared to ellipsoidal baselines while maintaining nominal coverage, with larger gains at higher dimensions and tighter coverage levels.
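To make the role of a nonconformity score concrete, here is a minimal split-conformal sketch using a Mahalanobis-distance score, which yields the kind of ellipsoidal region the abstract uses as a baseline. It does not implement the proposed MKS; the synthetic residual distribution, sample sizes, and function names are assumptions for illustration.

```python
import numpy as np

def mahalanobis_scores(residuals, cov_inv):
    """Compress each residual vector r into the scalar sqrt(r^T Sigma^{-1} r)."""
    return np.sqrt(np.einsum("ij,jk,ik->i", residuals, cov_inv, residuals))

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(cal_scores)[k - 1]

def empirical_coverage(alpha=0.1, seed=0):
    """Fit the covariance on one split, calibrate on a second, test on a third."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    draw = lambda m: rng.multivariate_normal(np.zeros(2), cov, size=m)
    train, cal, test = draw(500), draw(1000), draw(5000)
    cov_inv = np.linalg.inv(np.cov(train, rowvar=False))
    q = conformal_threshold(mahalanobis_scores(cal, cov_inv), alpha)
    return float(np.mean(mahalanobis_scores(test, cov_inv) <= q))

print(empirical_coverage())  # should be close to the nominal 0.9
```

The coverage guarantee comes from the calibration quantile and holds for any score; the choice of score only changes the shape and volume of the resulting region, which is the quantity the MKS is designed to reduce.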
[581] arXiv:2604.21691 (cross-list from stat.ML) [pdf, html, other]: Title: There Will Be a Scientific Theory of Deep Learning

Jamie Simon, Daniel Kunin, Alexander Atanasov, Enric Boix-Adserà, Blake Bordelon, Jeremy Cohen, Nikhil Ghosh, Florentin Guth, Arthur Jacot, Mason Kamb, Dhruva Karkada, Eric J. Michaud, Berkan Ottlik, Joseph Turnbull

Comments: 41 pages, 6 figures

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In this paper, we make the case that a scientific theory of deep learning is emerging. By this we mean a theory which characterizes important properties and statistics of the training process, hidden representations, final weights, and performance of neural networks. We pull together major strands of ongoing research in deep learning theory and identify five growing bodies of work that point toward such a theory: (a) solvable idealized settings that provide intuition for learning dynamics in realistic systems; (b) tractable limits that reveal insights into fundamental learning phenomena; (c) simple mathematical laws that capture important macroscopic observables; (d) theories of hyperparameters that disentangle them from the rest of the training process, leaving simpler systems behind; and (e) universal behaviors shared across systems and settings which clarify which phenomena call for explanation.
Taken together, these bodies of work share certain broad traits: they are concerned with the dynamics of the training process; they primarily seek to describe coarse aggregate statistics; and they emphasize falsifiable quantitative predictions. We argue that the emerging theory is best thought of as a mechanics of the learning process, and suggest the name learning mechanics. We discuss the relationship between this mechanics perspective and other approaches for building a theory of deep learning, including the statistical and information-theoretic perspectives. In particular, we anticipate a symbiotic relationship between learning mechanics and mechanistic interpretability.
We also review and address common arguments that fundamental theory will not be possible or is not important. We conclude with a portrait of important open directions in learning mechanics and advice for beginners. We host further introductory materials, perspectives, and open questions at this http URL.
[582] arXiv:2604.21753 (cross-list from cond-mat.mtrl-sci) [pdf, html, other]: Title: Neural surrogates for crystal growth dynamics with variable supersaturation: explicit vs. implicit conditioning

Matteo Rigoni, Daniele Lanzoni, Francesco Montalenti, Roberto Bergamaschini

Subjects: Materials Science (cond-mat.mtrl-sci); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

Simulations of crystal growth are performed by using Convolutional Recurrent Neural Network surrogate models, trained on a dataset of time sequences computed by numerical integration of Allen-Cahn dynamics including faceting via kinetic anisotropy. Two network architectures are developed to take into account the effects of a variable supersaturation value. The first infers it implicitly by processing an input mini-sequence of a few evolution frames and then returns a consistent continuation of the evolution. The second takes the supersaturation parameter as an explicit input along with a single initial frame and predicts the entire sequence. The two models are systematically tested to establish strengths and weaknesses, comparing the prediction performance for models trained on datasets of different sizes and, in the first architecture, different lengths of input mini-sequence. The analysis of point-wise and mean absolute errors shows how the explicit parameter conditioning guarantees the best results, reproducing the ground-truth profiles with high fidelity. Comparable results are achievable by the mini-sequence approach only when using larger training datasets. The trained models show strong conditioning by the supersaturation parameter, consistently reproducing its overall impact on growth rates as well as its local effect on the faceted morphology. Moreover, they are perfectly scalable even on 256 times larger domains and can be successfully extended to more than 10 times longer sequences with limited error accumulation. The analysis highlights the potential and limits of these approaches in view of their general exploitation for crystal growth simulations.
[583] arXiv:2604.21800 (cross-list from quant-ph) [pdf, html, other]: Title: Variance Geometry of Exact Pauli-Detecting Codes: Continuous Landscapes Beyond Stabilizers

Arunaday Gupta, Baisong Sun, Xi He, Bei Zeng

Comments: 30 pages, 1 figure

Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)

Exact quantum codes detecting a prescribed set of Pauli errors are approached through algebraic constructions--stabilizer, codeword-stabilized, permutation-invariant, topological, and related families. Geometrically, exact Pauli detection is governed by joint higher-rank numerical ranges of these Pauli operators, whose structure for rank $\geq 2$ is largely uncharted. From this viewpoint, we show that such codes often form connected continuous families rather than collections of disjoint solution regions. These families are characterized by a single scalar derived from the Knill-Laflamme conditions: denoted $\lambda^*$, it is the Euclidean norm of the signature vector of Pauli expectation values on the maximally mixed code state, and provides a one-parameter summary of the code's joint Pauli variance profile. Within these continuous landscapes, stabilizer codes occupy only discrete, measure-zero subsets of the attainable $\lambda^*$-spectrum, exposing a largely unexplored continuum of genuinely nonadditive exact codes. We establish this picture by analyzing the geometry of higher-rank operator compressions, and extend it to symmetry-restricted settings where cyclic and permutation symmetries are imposed on both the error model and the code projector. Small-system cases reveal interval, singleton, and empty regimes through eigenvalue interlacing and symmetry-sector decompositions; larger systems are treated numerically via Stiefel-manifold optimization and symmetry-adapted parameterizations. In every unrestricted and symmetry-compatible case analyzed, the attainable $\lambda^*$-spectrum forms a single closed interval whenever nonempty--although a general proof remains open. These results place stabilizer, symmetric, and nonadditive code families within a unified higher-rank variance framework, suggesting a continuous geometric perspective on the landscape of exact quantum codes.
[584] arXiv:2604.21825 (cross-list from math.DS) [pdf, html, other]: Title: On the algebra of Koopman eigenfunctions and on some of their infinities

Zahra Monfared, Saksham Malhotra, Sekiya Hajime, Ioannis Kevrekidis, Felix Dietrich

Subjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA)

For continuous-time dynamical systems with reversible trajectories, the nowhere-vanishing eigenfunctions of the Koopman operator of the system form a multiplicative group. Here, we exploit this property to accelerate the systematic numerical computation of the eigenspaces of the operator. Given a small set of (so-called ``principal'') eigenfunctions that are approximated conventionally, we can obtain a much larger set by constructing polynomials of the principal eigenfunctions. This enriches the set, and thus allows us to more accurately represent application-specific observables. Often, eigenfunctions exhibit localized singularities (e.g. in simple, one-dimensional problems with multiple steady states) or extended ones (e.g. in simple, two-dimensional problems possessing a limit cycle, or a separatrix); we discuss eigenfunction matching/continuation across such singularities. By handling eigenfunction singularities and enabling their continuation, our approach supports learning consistent global representations from locally sampled data. This is particularly relevant for multistable systems and applications with sparse or fragmented measurements.
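The multiplicative-group property mentioned in this abstract can be checked in one line. As a sketch, assume the system is $\dot{x} = f(x)$ and write the Koopman generator as $\mathcal{L}g = \nabla g \cdot f$ (this notation is an assumption for illustration and is not taken from the abstract):

```latex
\mathcal{L}\phi_i = \lambda_i \phi_i \quad (i = 1, 2)
\;\Longrightarrow\;
\mathcal{L}(\phi_1 \phi_2)
  = \phi_2\,\mathcal{L}\phi_1 + \phi_1\,\mathcal{L}\phi_2
  = (\lambda_1 + \lambda_2)\,\phi_1 \phi_2,
\qquad
\mathcal{L}\!\left(\phi_1^{-1}\right)
  = -\phi_1^{-2}\,\mathcal{L}\phi_1
  = -\lambda_1\,\phi_1^{-1}.
```

Under these assumptions, any monomial $\phi_1^{a}\phi_2^{b}$ with integer exponents is again an eigenfunction, with eigenvalue $a\lambda_1 + b\lambda_2$; the negative exponents are where the nowhere-vanishing condition is needed.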
[585] arXiv:2604.21836 (cross-list from q-bio.NC) [pdf, html, other]: Title: Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

Eghbal A. Hosseini, Brian Cheung, Evelina Fedorenko, Alex H. Williams

Journal-ref: ICLR 2026 Workshop on Representational Alignment (Re-Align)

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Neural networks exhibit a remarkable degree of representational convergence across diverse architectures, training objectives, and even data modalities. This convergence is predictive of alignment with brain representation. A recent hypothesis suggests this arises from learning the underlying structure in the environment in similar ways. However, it is unclear how individual stimuli elicit convergent representations across networks. An image can be perceived in multiple ways and expressed differently using words. Here, we introduce a methodology based on the Generalized Procrustes Algorithm to measure intra-modal representational convergence at the single-stimulus level. We applied this to vision models with distinct training objectives, selecting stimuli based on their degree of alignment (intra-modal dispersion). Crucially, we found that this intra-modal dispersion strongly modulates alignment between vision and language models (cross-modal convergence). Specifically, stimuli with low intra-modal dispersion (high agreement among vision models) elicited significantly higher cross-modal alignment than those with high dispersion, by up to a factor of two (e.g., in pairings of DINOv2 with language models). This effect was robust to stimulus selection criteria and generalized across different pairings of vision and language models. Measuring convergence at the single-stimulus level provides a path toward understanding the sources of convergence and divergence across modalities, and between neural networks and human neural representations.
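A minimal sketch of how a per-stimulus dispersion could be obtained from a generalized Procrustes alignment. The function names, the centering and unit-norm scaling, and the dispersion definition (mean squared distance of each aligned stimulus row to the consensus) are assumptions for illustration; the abstract does not specify these details, and all models are assumed to share the same embedding width.

```python
import numpy as np

def orthogonal_procrustes(X, M):
    """Return the orthogonal matrix R minimizing ||X @ R - M||_F."""
    U, _, Vt = np.linalg.svd(X.T @ M)
    return U @ Vt

def generalized_procrustes(mats, n_iter=20):
    """Align K (n_stimuli x d) matrices to a shared consensus by rotation."""
    # Remove translation and overall scale before aligning.
    mats = [X - X.mean(axis=0) for X in mats]
    mats = [X / np.linalg.norm(X) for X in mats]
    aligned = list(mats)
    mean = mats[0].copy()
    for _ in range(n_iter):
        # Rotate every matrix onto the current consensus, then update it.
        aligned = [X @ orthogonal_procrustes(X, mean) for X in mats]
        mean = np.mean(aligned, axis=0)
    return aligned, mean

def per_stimulus_dispersion(aligned, mean):
    """Mean squared distance of each stimulus row to the consensus row."""
    stacked = np.stack(aligned)                      # (K, n_stimuli, d)
    sq_dist = np.sum((stacked - mean) ** 2, axis=2)  # (K, n_stimuli)
    return sq_dist.mean(axis=0)                      # (n_stimuli,)
```

Stimuli could then be ranked by this score, with low values read as high agreement among the aligned models.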
[586] arXiv:2604.21849 (cross-list from stat.ML) [pdf, html, other]: Title: Beyond Expected Information Gain: Stable Bayesian Optimal Experimental Design with Integral Probability Metrics and Plug-and-Play Extensions

Di Wu, Ling Liang, Haizhao Yang

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO)

Bayesian Optimal Experimental Design (BOED) provides a rigorous framework for decision-making tasks in which data acquisition is often the critical bottleneck, especially in resource-constrained settings. Traditionally, BOED selects designs by maximizing expected information gain (EIG), commonly defined through the Kullback-Leibler (KL) divergence. However, classical evaluation of EIG often involves challenging nested expectations, and even advanced variational methods leave the underlying log-density-ratio objective unchanged. As a result, support mismatch, tail underestimation, and rare-event sensitivity remain intrinsic concerns for KL-based BOED. To address these fundamental bottlenecks, we introduce an IPM-based BOED framework that replaces density-based divergences with integral probability metrics (IPMs), including the Wasserstein distance, Maximum Mean Discrepancy, and Energy Distance, resulting in a highly flexible plug-and-play BOED framework. We establish theoretical guarantees showing that IPM-based utilities provide stronger geometry-aware stability under surrogate-model error and prior misspecification than classical EIG-based utilities. We also validate the proposed framework empirically, demonstrating that IPM-based designs yield highly concentrated credible sets. Furthermore, by extending the same sample-based BOED template in a plug-and-play manner to geometry-aware discrepancies beyond the IPM class, illustrated by a neural optimal transport estimator, we achieve accurate optimal designs in high-dimensional settings where conventional nested Monte Carlo estimators and advanced variational methods fail.
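To illustrate why IPMs suit a sample-based template, here is a minimal sketch of one of the metrics named above, the Maximum Mean Discrepancy, estimated purely from two sample sets with no density evaluation. The RBF kernel, the fixed bandwidth, and the function names are assumptions for illustration; how the paper turns such a discrepancy into a design utility is not stated in the abstract and is not shown here.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

Because the estimator only needs samples, it stays finite when the two distributions have mismatched support, which is one of the failure modes the abstract attributes to KL-based utilities.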
[587] arXiv:2604.21858 (cross-list from physics.flu-dyn) [pdf, html, other]: Title: Meshless $h$-adaptive Solution for non-Newtonian Natural Convection in a Differentially Heated Cavity

Miha Rot, Gregor Kosec

Comments: 6 pages, 11 figures; Conference paper

Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

One of the main challenges in numerically solving partial differential equations is finding a discretisation for the computational domain that balances the accurate representation of the underlying field with computational efficiency. Meshless methods approximate differential operators based on the values of the field in computational nodes, offering a natural approach to adaptivity. The density of computational nodes can either be increased to enhance accuracy or decreased to reduce the number of numerical operations, depending on the properties of the intermediate solution. In this paper, we utilise an adaptive discretisation approach for the numerical simulation of natural convection in non-Newtonian fluid flow. The shear-thinning behaviour is interesting both due to its numerous occurrences in nature, blood being a prime example, and due to its properties, as the decreasing viscosity with increasing shear rate results in sharper flow structures. We focus on the de Vahl Davis test case, a natural convection driven flow in a differentially heated rectangular cavity. The thin boundary layer flow along the vertical boundaries makes this an ideal test case for refinement. We demonstrate that adaptively refining the node density enhances computational efficiency and examine how the parameters for adaptive refinement affect the solution.
[588] arXiv:2604.21863 (cross-list from quant-ph) [pdf, html, other]: Title: Replay-buffer engineering for noise-robust quantum circuit optimization

Akash Kundu, Sebastian Feld

Comments: Comments are warmly welcomed. 9 page main content, 17 page appendix

Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step, and the routine discard of noiseless trajectories when retraining under hardware noise. We address all three by treating the replay buffer as a primary algorithmic lever for quantum optimization. We introduce ReaPER$+$, an annealed replay rule that transitions from TD error-driven prioritization early in training to reliability-aware sampling as value estimates mature, achieving $4-32\times$ gains in sample efficiency over fixed PER, ReaPER, and uniform replay while consistently discovering more compact circuits across quantum compilation and QAS benchmarks; validation on LunarLander-v3 confirms the principle is domain-agnostic. Furthermore we eliminate the quantum-classical evaluation bottleneck in curriculum RL by introducing OptCRLQAS which amortizes expensive evaluations over multiple architectural edits, cutting wall-clock time per episode by up to $67.5\%$ on a 12-qubit optimization problem without degrading solution quality. Finally we introduce a lightweight replay-buffer transfer scheme that warm-starts noisy-setting learning by reusing noiseless trajectories, without network-weight transfer or $\epsilon$-greedy pretraining. This reduces steps to chemical accuracy by up to $85-90\%$ and final energy error by up to $90\%$ over from-scratch baselines on 6-, 8-, and 12-qubit molecular tasks. Together, these results establish that experience storage, sampling, and transfer are decisive levers for scalable, noise-robust quantum circuit optimization.
[589] arXiv:2604.21870 (cross-list from physics.ed-ph) [pdf, html, other]: Title: Locating acts of mechanistic reasoning in student team conversations with mechanistic machine learning

Kaitlin Gili, Mainak Nistala, Kristen Wendell, Michael C. Hughes

Subjects: Physics Education (physics.ed-ph); Machine Learning (cs.LG)

STEM education researchers are often interested in identifying moments of students' mechanistic reasoning for deeper analysis, but have limited capacity to search through many team conversation transcripts to find segments with a high concentration of such reasoning. We offer a solution in the form of an interpretable machine learning model that outputs time-varying probabilities that individual students are engaging in acts of mechanistic reasoning, leveraging evidence from their own utterances as well as contributions from the rest of the group. Using the toolkit of intentionally-designed probabilistic models, we introduce a specific inductive bias that steers the probabilistic dynamics toward desired, domain-aligned behavior. Experiments compare trained models with and without the inductive bias components, investigating whether their presence improves the desired model behavior on transcripts involving never-before-seen students and a novel discussion context. Our results show that the inductive bias improves generalization -- supporting the claim that interpretability is built into the model for this task rather than imposed post hoc. We conclude with practical recommendations for STEM education researchers seeking to adopt the tool and for ML researchers aiming to extend the model's design. Overall, we hope this work encourages the development of mechanistically interpretable models that are understandable and controllable for both end users and model designers in STEM education research.
[590] arXiv:2604.21893 (cross-list from stat.ML) [pdf, html, other]: Title: Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

Sherly Alfonso-Sánchez, Cristián Bravo, Kristina G. Stankova

Comments: 35 pages, 8 figures

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Risk Management (q-fin.RM)

Geographic context is often considered relevant to motor insurance risk, yet public actuarial datasets provide limited location identifiers, constraining how this information can be incorporated and evaluated in claim-frequency models. This study examines how geographic information from alternative data sources can be incorporated into actuarial models for Motor Third Party Liability (MTPL) claim prediction under such constraints.
Using the BeMTPL97 dataset, we adopt a zone-level modeling framework and evaluate predictive performance on unseen postcodes. Geographic information is introduced through two channels: environmental indicators from OpenStreetMap and CORINE Land Cover, and orthoimagery released by the Belgian National Geographic Institute for academic use. We evaluate the predictive contribution of coordinates, environmental features, and image embeddings across three baseline models: generalized linear models (GLMs), regularized GLMs, and gradient-boosted trees, while raw imagery is modeled using convolutional neural networks.
Our results show that augmenting actuarial variables with constructed geographic information improves accuracy. Across experiments, both linear and tree-based models benefit most from combining coordinates with environmental features extracted at 5 km scale, while smaller neighborhoods also improve baseline specifications. Generally, image embeddings do not improve performance when environmental features are available; however, when such features are absent, pretrained vision-transformer embeddings enhance accuracy and stability for regularized GLMs. Our results show that the predictive value of geographic information in zone-level MTPL frequency models depends less on model complexity than on how geography is represented, and illustrate that geographic context can be incorporated despite limited individual-level spatial information.

[591] arXiv:2003.03639 (replaced) [pdf, other]: Title: The classification of minimally unsatisfiable 2-CNFs -- a fundamental study

Hoda Abbasizanjani, Oliver Kullmann

Comments: 74 pages; this replacement (previously 27 pages) has been completely revised, with complete proofs now

Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

Conjunctive normal forms where every clause has length at most two are called 2-CNFs. We study minimally unsatisfiable 2-CNFs (2-MUs), that is, unsatisfiable 2-CNFs where removing any clause destroys unsatisfiability, and obtain their full classification up to isomorphism. The main tool is the implication digraph: we show that for 2-MUs these digraphs are "weak double cycles" (WDCs), big cycles of small cycles with possible overlaps. Combining logical and graph-theoretical methods, we prove that WDCs have at most one skew-symmetry (a self-inverse fixed-point-free anti-automorphism reversing the direction of arcs). It follows that the isomorphisms between 2-MUs are exactly the isomorphisms between their implication digraphs, reducing the classification of 2-MUs to the classification of a well-structured class of digraphs. We obtain a variety of applications for 2-MUs of deficiency k (the difference between the number of clauses and the number of variables): the smoothing of skew-symmetric WDCs corresponds exactly to the canonical normal form obtained by 1-singular Davis-Putnam reduction, and the resulting homeomorphism types are in one-to-one correspondence with binary bracelets of length k. The automorphism group of any 2-MU of deficiency k is a subgroup of the dihedral group with 4k elements. The isomorphism problem for 2-MUs is decidable in quadratic time, and the number of isomorphism types of 2-MUs for fixed k is Theta(n^(3k-1)). The article is addressed to both the logic and the graph theory communities, with complete proofs provided throughout.
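The definitions in this abstract can be made concrete with a small sketch. Literals are encoded as nonzero integers (`-v` is the negation of `v`), clauses as tuples of one or two literals, and satisfiability is checked by brute force, which is only practical for tiny examples. The encoding and function names are assumptions for illustration, not the authors' implementation.

```python
from itertools import product

def implication_arcs(clauses):
    """Arcs of the implication digraph: (a or b) yields -a -> b and -b -> a."""
    arcs = set()
    for clause in clauses:
        a, b = clause[0], clause[-1]  # a unit clause (a) is read as (a or a)
        arcs.add((-a, b))
        arcs.add((-b, a))
    return arcs

def satisfiable(clauses):
    """Brute-force satisfiability check over all assignments."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def is_minimally_unsatisfiable(clauses):
    """Unsatisfiable, and removing any single clause makes it satisfiable."""
    if satisfiable(clauses):
        return False
    return all(satisfiable(clauses[:i] + clauses[i + 1:]) for i in range(len(clauses)))

def deficiency(clauses):
    """Number of clauses minus number of variables."""
    return len(clauses) - len({abs(lit) for clause in clauses for lit in clause})

# All four sign patterns over two variables form a 2-MU of deficiency 2.
example = [(1, 2), (-1, 2), (1, -2), (-1, -2)]
```

For `example`, the implication digraph has four vertices (the literals) and eight arcs.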
[592] arXiv:2106.01254 (replaced) [pdf, other]: Title: Principled Evaluation with Human Labels: One Rater at a Time and Rater Equivalence

Paul Resnick, Yuqing Kong, Grant Schoenebeck, Tim Weninger

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)

In many classification tasks, there is no definitive ground truth, only human judgments that may disagree. We address two challenges that arise in such settings: (1) how to use human raters to score classifiers, and (2) how to use them for comparison benchmarks. For the first, the common practice is to score classifiers against the majority vote of an evaluation panel of several human raters. We argue that this is not justified when either of two properties fails: objectivity or equanimity. Instead, under a utility model appropriate for such settings, scoring against one rater at a time and averaging the scores across raters is a more principled approach. For the second, we introduce the concept of rater equivalence: the smallest number of human raters whose combined judgment matches the classifier's performance. We provide a provably optimal algorithm for combining benchmark panel labels, and demonstrate the framework through case studies.
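The two scoring procedures contrasted in this abstract can be sketched as follows. Plain agreement rate stands in for the score here, which is an assumption for illustration; the paper works with a utility model that the abstract does not detail. The example assumes binary labels and an odd panel size so that the majority vote has no ties.

```python
from collections import Counter

def score_against_majority(preds, ratings):
    """Agreement of predictions with the panel's majority label per item."""
    hits = 0
    for pred, item_labels in zip(preds, ratings):
        majority = Counter(item_labels).most_common(1)[0][0]
        hits += pred == majority
    return hits / len(preds)

def score_one_rater_at_a_time(preds, ratings):
    """Score against each rater separately, then average across raters."""
    n_raters = len(ratings[0])
    per_rater = []
    for r in range(n_raters):
        agree = sum(pred == item_labels[r] for pred, item_labels in zip(preds, ratings))
        per_rater.append(agree / len(preds))
    return sum(per_rater) / n_raters

# Four items, three raters each (rows are items, columns are raters).
ratings = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1)]
preds = [1, 0, 1, 0]
```

On this toy data the classifier matches every majority label, yet its average agreement with individual raters is lower, because the majority vote hides the disagreement among raters.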
[593] arXiv:2305.01626 (replaced) [pdf, html, other]: Title: Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Gašper Beguš, Thomas Lu, Zili Wang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary suboperations of syntax -- concatenation. We introduce \textit{spontaneous concatenation}: a phenomenon where ciwGAN/fiwGAN models (based on convolutional neural networks) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated without ever accessing training data that contains multiple words. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. We also show that the concatenated outputs contain precursors to compositionality. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech and has implications both for our understanding of how these architectures learn and for modeling syntax and its evolution in the brain from raw acoustic inputs. We also propose and formalize a neural mechanism called \textit{disinhibition} that outlines a possible artificial and biological neural pathway towards concatenation and compositionality and suggests our modeling is useful for generating testable predictions for biological and artificial neural processing of spoken language.
[594] arXiv:2306.16191 (replaced) [pdf, other]: Title: OpenCitations Meta

Arcangelo Massari, Fabio Mariani, Ivan Heibi, Silvio Peroni, David Shotton

Comments: 31 pages, 8 figures

Journal-ref: Quantitative Science Studies 2024. 5 (1) 50-75

Subjects: Digital Libraries (cs.DL)

OpenCitations Meta is a new database for open bibliographic metadata of scholarly publications involved in the citations indexed by the OpenCitations infrastructure, adhering to Open Science principles and published under a CC0 license to promote maximum reuse. It presently incorporates bibliographic metadata for publications recorded in Crossref, DataCite and PubMed, making it the largest bibliographic metadata source using Semantic Web technologies. It assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to all bibliographic resources, enabling it both to disambiguate publications described using different external PIDs (e.g., a DOI in Crossref and a PMID in PubMed), and to handle citations involving publications lacking external PIDs. By hosting bibliographic metadata internally, OpenCitations Meta eliminates its former reliance on API calls to external resources and thus enhances performance in response to user queries. Its automated data curation, following the OpenCitations Data Model, includes deduplication, error correction, metadata enrichment and full provenance tracking, ensuring transparency and traceability of data and bolstering confidence in data integrity, a feature unparalleled in other bibliographic databases. Its commitment to Semantic Web standards ensures superior interoperability compared to other machine-readable formats, with availability via a SPARQL endpoint, REST APIs and data dumps.
[595] arXiv:2309.07176 (replaced) [pdf, html, other]: Title: Mind the Gap: Optimal and Equitable Encouragement Policies

Angela Zhou

Comments: Updated with major new case study on SNAP recertification benefits

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. We study personalized decision problems in which the planner controls recommendations into treatment rather than treatment itself. Under a covariate-conditional no-direct-effect model of encouragement, policy value depends on two distinct objects: responsiveness to encouragement and treatment efficacy. This modeling distinction makes induced treatment take-up, rather than recommendation rates alone, the natural fairness target and yields tractable policy characterizations under budget and access constraints. In settings with deterministic algorithmic recommendations, the same model localizes overlap-robustness to the recommendation-response model rather than the downstream outcome model. We illustrate the methods in case studies based on data from reminders of SNAP benefits recertification, and from pretrial supervised release with electronic monitoring. While the specific remedy to inequities in algorithmic allocation is context-specific, it requires studying both take-up of decisions and downstream outcomes of them.
[596] arXiv:2310.02635 (replaced) [pdf, html, other]: Title: Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao

Comments: CoRL 2024 (Oral)

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply the RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with environments, which are impractical in real scenarios. For another, it is necessary to make heavy engineering efforts to design reward functions manually. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) \textit{sample efficient}; (2) \textit{minimal and effective reward engineering}; (3) \textit{agnostic to foundation model forms and robust to noisy priors}. Our method achieves remarkable performances in various manipulation tasks on both real robots and in simulation. Across 5 dexterous tasks with real robots, FAC achieves an average success rate of 86\% after one hour of real-time learning. Across 8 tasks in the simulated Meta-world, FAC achieves 100\% success rates in 7/8 tasks under less than 100k frames (about 1-hour training), outperforming baseline methods with manual-designed rewards in 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world for more tasks. Visualizations and code are available at this https URL.
[597] arXiv:2401.07386 (replaced) [pdf, other]: Title: How do machines learn? Evaluating the AIcon2abs method

Rubens Lacerda Queiroz, Cabral Lima, Fabio Ferrentini Sampaio, Priscila Machado Vieira Lima

Comments: textual review (spelling and grammar); reorganization of the elements of some figures; New references included

Subjects: Computers and Society (cs.CY)

This study expands on previous work that introduced the AIcon2abs method (AI from Concrete to Abstract: Demystifying Artificial Intelligence to the general public), an innovative approach designed to increase public understanding of machine learning (ML) across diverse age groups, including K-12 students, and aims to evaluate its effectiveness. AIcon2abs employs the WiSARD algorithm, a weightless neural network known for its simplicity and user accessibility. WiSARD does not require an Internet connection, making it ideal for non-technical users and resource-limited environments. This method enables participants to intuitively visualize and interact with ML processes through engaging, hands-on activities, as if they were the algorithms themselves. The method allows users to intuitively visualize and understand the internal processes of training and classification through practical activities. WiSARD can also learn effectively from a minimal dataset, even from a single example. This feature enables users to observe how the machine improves its accuracy incrementally as it receives more data. Moreover, WiSARD generates mental images representing what it has learned, highlighting essential features of the classified data. AIcon2abs was tested through a six-hour remote course with 34 Brazilian participants, including 5 children, 5 adolescents, and 24 adults. Data analysis was conducted from two perspectives: a mixed-method pre-experiment (including hypothesis testing), and a qualitative phenomenological analysis. Nearly all participants rated AIcon2abs positively, with the results demonstrating a high degree of satisfaction in achieving the intended outcomes. This research was approved by the CEP-HUCFF-UFRJ Research Ethics Committee.
[598] arXiv:2402.09568 (replaced) [pdf, other]: Title: Irreducible Markov Chains on spaces of graphs with fixed degree-color sequences

Félix Almendra-Hernández, Jesús A. De Loera, Sonja Petrović

Comments: Corrected figure caption and vertex label; edited introductory text to clarify definitions

Subjects: Discrete Mathematics (cs.DM); Commutative Algebra (math.AC); Combinatorics (math.CO)

We study a colored generalization of the famous simple-switch Markov chain for sampling the set of graphs with a fixed degree sequence. Here we consider the space of graphs with colored vertices, in which we fix the degree sequence and another statistic arising from the vertex coloring, and prove that the set can be connected with simple color-preserving switches or moves. These moves form a basis for defining an irreducible Markov chain necessary for testing statistical model fit to block-partitioned network data. Our methods further generalize well-known algebraic results from the 1990s: namely, that the corresponding moves can be used to construct a regular triangulation for a generalization of the second hypersimplex. On the other hand, in contrast to the monochromatic case, we show that for \emph{simple} graphs, the 1-norm of the moves necessary to connect the space increases with the number of colors.
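For orientation, here is a minimal sketch of one step of the classical (uncolored) simple-switch chain that this abstract generalizes. The color-preserving moves of the paper are not reproduced, since the abstract does not define the additional statistic; the edge representation and function name are assumptions for illustration.

```python
import random

def simple_switch_step(edges, rng):
    """One proposal of the simple-switch chain on a simple graph.

    edges: set of frozenset({u, v}) with u != v.
    Replaces edges {a, b}, {c, d} by {a, d}, {c, b} when the result is
    still a simple graph; otherwise returns the edge set unchanged.
    """
    e1, e2 = rng.sample(sorted(edges, key=sorted), 2)
    a, b = sorted(e1)
    c, d = sorted(e2)
    if rng.random() < 0.5:
        c, d = d, c  # pick one of the two possible rewirings at random
    new1, new2 = frozenset((a, d)), frozenset((c, b))
    if len(new1) < 2 or len(new2) < 2:
        return edges  # the switch would create a loop
    if new1 in edges or new2 in edges:
        return edges  # the switch would create a repeated edge
    return (edges - {e1, e2}) | {new1, new2}

def degree_sequence(edges, vertices):
    """Degrees of the given vertices, in the given order."""
    return [sum(v in e for e in edges) for v in vertices]
```

Each accepted switch removes one edge at each of the four endpoints and adds one back, so the degree sequence is unchanged by construction.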
[599] arXiv:2404.07013 (replaced) [pdf, html, other]: Title: Numerical approximation of SDEs driven by fractional Brownian motion for all $H\in(0,1)$ using WIS integration

Utku Erdogan, Gabriel J. Lord, Roy B. Schieven

Subjects: Numerical Analysis (math.NA)

We examine the numerical approximation of a quasilinear stochastic differential equation (SDE) with multiplicative fractional Brownian motion. The stochastic integral is interpreted in the Wick-Itô-Skorohod (WIS) sense that is well defined and centered for all $H\in(0,1)$. We give an introduction to the theory of WIS integration before we examine existence and uniqueness of a solution to the SDE. We then introduce our numerical method which is based on previous theoretical results for $H\geq \frac{1}{2}$. We construct explicitly a translation operator required for the practical implementation of the method and are not aware of any other implementation of a numerical method for the WIS SDE. We then prove a strong convergence result that gives, in the autonomous case, an error of $O(\Delta t^H)$ and in the non-autonomous case $O(\Delta t^{\min(H,\zeta)})$, where $\zeta$ is a time-Hölder continuity parameter. We present some numerical experiments and conjecture that the theoretical results may not be optimal since we observe numerically a rate of $\min(H+\frac{1}{2},1)$ in the autonomous case. This work opens up the possibility to efficiently simulate SDEs for all $H$ values, including small values of $H$ when the stochastic integral is interpreted in the WIS sense.
[600] arXiv:2406.11354 (replaced) [pdf, html, other]: Title: Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin

Comments: Accepted by ICASSP 2026 (Oral)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks was observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.
[601] arXiv:2406.16173 (replaced) [pdf, html, other]: Title: Crepe: A Mobile Screen Data Collector Using Graph Query

Yuwen Lu, Meng Chen, Qi Zhao, Victor Cox, Yang Yang, Meng Jiang, Jay Brockman, Tamara Kay, Toby Jia-Jun Li

Comments: Best Paper Honorable Mention Award at CHI 2026 (Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems), Article No. 296, 20 pages

Journal-ref: In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26), Article No. 296, 1-20

Subjects: Human-Computer Interaction (cs.HC)

Collecting mobile datasets remains challenging for academic researchers due to limited data access and technical barriers. Commercial organizations often possess exclusive access to mobile data, leading to a "data monopoly" that restricts the independence of academic research. Existing open-source mobile data collection frameworks primarily focus on mobile sensing data rather than screen content, which is crucial for various research studies. We present Crepe, a no-code Android app that enables researchers to collect information displayed on screen through simple demonstrations of target data. Crepe utilizes a novel Graph Query technique which augments the structures of mobile UI screens to support flexible identification, location, and collection of specific data pieces. The tool emphasizes participants' privacy and agency by providing full transparency over collected data and allowing easy opt-out. We designed and built Crepe for research purposes only and in scenarios where researchers obtain explicit consent from participants. Code for Crepe will be open-sourced to support future academic research data collection.
[602] arXiv:2407.02127 (replaced) [pdf, html, other]: Title: Control theory and splitting methods

Karine Beauchard, Adrien Busnot Laurent, Frédéric Marbach

Comments: enhanced several results to arbitrary order; better exposition; 43p

Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)

Our goal is to highlight some deep connections between numerical splitting methods and control theory. We consider evolution equations of the form $\dot{x} = f_0(x) + f_1(x)$, where $f_0$ encodes non-reversible dynamics, motivating schemes that involve only forward flows of $f_0$. In this context, a splitting method can be interpreted as a trajectory of the control-affine system $\dot{x}(t)=f_0(x(t))+u(t)f_1(x(t))$, associated with a control $u$ that is a finite sum of Dirac masses. The goal is then to find a control such that the flow generated by $f_0 + u(t)f_1$ is as close as possible to the flow of $f_0+f_1$.
Using this interpretation and classical tools from control theory, we revisit well-known results on numerical splitting methods and prove several new ones. First, we show that there exist numerical schemes of arbitrary order involving only forward flows of $f_0$, provided one allows complex coefficients for $f_1$. Equivalently, for complex-valued controls, we prove that the Lie algebra rank condition is equivalent to small-time local controllability. Second, for real-valued coefficients, we show that the well-known order restrictions are linked to so-called "bad" Lie brackets from control theory, which are known to obstruct small-time local controllability. We investigate the conditions under which high-order methods exist, thanks to a basis of the free Lie algebra that we recently constructed.
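As a rough illustration of the setting described above, the following sketch applies Lie and Strang splitting to a toy scalar problem and estimates the observed order of each scheme. The choice $f_0(x) = -x$, $f_1(x) = 1$, the function names, and the step counts are assumptions made for this example and are not taken from the paper. Both schemes call the flow of $f_0$ only with positive times, and the $f_1$ steps play the role of the impulsive "kicks" of a control that is a sum of Dirac masses. This shows only the classical first- and second-order cases with real coefficients, not the higher-order constructions the abstract refers to.

```python
import math

def flow_f0(x, t):
    """Exact flow of the toy field f_0(x) = -x; used with t >= 0 only."""
    assert t >= 0.0, "only forward flows of f_0 are allowed"
    return x * math.exp(-t)

def flow_f1(x, t):
    """Exact flow of the toy field f_1(x) = 1 (a translation by t)."""
    return x + t

def lie_step(x, h):
    # Lie (first-order) splitting: full f_0 step, then full f_1 step.
    return flow_f1(flow_f0(x, h), h)

def strang_step(x, h):
    # Strang (second-order) splitting: half f_1, full f_0, half f_1.
    x = flow_f1(x, 0.5 * h)
    x = flow_f0(x, h)
    return flow_f1(x, 0.5 * h)

def integrate(step, x0, T, n):
    """Apply `step` n times with step size T / n."""
    h = T / n
    x = x0
    for _ in range(n):
        x = step(x, h)
    return x

def observed_order(step, x0=0.0, T=1.0, n=50):
    """Estimate the convergence order by halving the step size once."""
    # Exact solution of x' = -x + 1 with x(0) = x0.
    exact = 1.0 + (x0 - 1.0) * math.exp(-T)
    err_h = abs(integrate(step, x0, T, n) - exact)
    err_h2 = abs(integrate(step, x0, T, 2 * n) - exact)
    return math.log2(err_h / err_h2)

order_lie = observed_order(lie_step)
order_strang = observed_order(strang_step)
print(f"Lie splitting, observed order:    {order_lie:.3f}")
print(f"Strang splitting, observed order: {order_strang:.3f}")
```

On this toy problem the estimated orders should come out close to 1 for Lie splitting and close to 2 for Strang splitting.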
[603] arXiv:2407.19664 (replaced) [pdf, html, other]: Title: Adaptive Soft Error Protection for Neural Network Processing

Xinghua Xue, Cheng Liu, Feng Min, Yinhe Han

Subjects: Machine Learning (cs.LG)

Previous research on selective protection for neural network components typically exploits only static vulnerability differences. Although these methods improve upon classical modular redundancy, they still incur substantial overhead for neural network workloads that are both memory-intensive and compute-intensive. In this work, we observe that neural network vulnerability is also input-dependent and varies dynamically at runtime. With this observation, we propose an adaptive, vulnerability-aware fault tolerance framework. At its core, a lightweight graph neural network (GNN) model dynamically predicts soft error vulnerabilities across inputs and neural network components, enabling real-time adaptation of fault tolerance policies. This design offers a complementary and more efficient protection scheme compared to traditional approaches. Experimental results demonstrate that the GNN predictor achieves over 95% accuracy in identifying critical inputs and components. Moreover, our adaptive scheme reduces computational overhead by an average of 42.12% while preserving model accuracy, significantly outperforming static selective protection methods.
[604] arXiv:2409.09874 (replaced) [pdf, html, other]: Title: The Landscape of GPU-Centric Communication

Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Performance (cs.PF)

In recent years, GPUs have become the preferred accelerators for HPC and ML applications due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter-GPU communication can create scalability bottlenecks, especially as the number of GPUs per node and cluster grows. Traditionally, the CPU managed multi-GPU communication, but advancements in GPU-centric communication now challenge this CPU dominance by reducing its involvement, granting GPUs more autonomy in communication tasks, and addressing mismatches in multi-GPU communication and computation.
This paper provides a landscape of GPU-centric communication, focusing on vendor mechanisms and user-level library supports. It aims to clarify the complexities and diverse options in this field, define the terminology, and categorize existing approaches within and across nodes. The paper discusses vendor-provided mechanisms for communication and memory management in multi-GPU execution and reviews major communication libraries, their benefits, challenges, and performance insights. Then, it explores key research paradigms, future outlooks, and open research questions. By extensively describing GPU-centric communication techniques across the software and hardware stacks, we provide researchers, programmers, engineers, and library designers insights on how to exploit multi-GPU systems at their best.
[605] arXiv:2409.14551 (replaced) [pdf, html, other]: Title: Unconditional energy stable hybrid IEQ-FEMs for the Cahn-Hilliard-Navier-Stokes equations

Yaoyao Chen, Dongqian Li, Yin Yang, Peimeng Yin

Comments: 30 pages, 19 figures

Subjects: Numerical Analysis (math.NA)

We investigate two unconditionally energy stable invariant energy quadratization (IEQ) finite element methods (FEMs) [Chen et al. Numerical Algorithms, DOI: https://doi.org/10.1007/s11075-024-01910-z, 2024] for solving the Cahn-Hilliard-Navier-Stokes (CHNS) equations. The time discretization of these IEQ-FEMs is based on the first- and second-order backward differentiation methods. The auxiliary energy function introduced by the IEQ approach, modeling the square root of the nonlinear part of the energy, does not belong to the finite element space used for the spatial discretization. These methods offer distinct advantages. Consequently, we propose a new hybrid IEQ-FEM that combines the strengths of both schemes, offering computational efficiency and unconditional energy stability in the finite element space. We provide rigorous proofs of mass conservation and energy dissipation for the proposed IEQ-FEMs. Several numerical experiments are presented to validate the accuracy, efficiency, and solution properties of the proposed method.
[606] arXiv:2410.16006 (replaced) [pdf, html, other]: Title: Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model

Divyanshu Aggarwal, Sankarshan Damle, Navin Goyal, Satya Lokam, Sunayana Sitaram

Comments: 19 pages, 6 tables, 4 figures, Accepted to ACL 2026 Findings

Subjects: Computation and Language (cs.CL)

A common challenge towards the adaptability of Large Language Models (LLMs) is their ability to learn new languages over time without hampering the model's performance on languages in which the model is already proficient (usually English). Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable the model to adapt to downstream tasks with varying data distributions and time shifts. This paper focuses on the language adaptability of LLMs through CFT. We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM from Phase 1 (predominantly Task Ability) is sequentially fine-tuned on a multilingual dataset -- comprising task data in new languages -- in Phase 2 (predominantly Language Ability). We observe that the "similarity" of Phase 2 tasks with Phase 1 determines the LLM's adaptability. For similar phase-wise datasets, the LLM after Phase 2 does not show deterioration in task ability. In contrast, when the phase-wise datasets are not similar, the LLM's task ability deteriorates. We test our hypothesis on the open-source \mis\ and \llm\ models with multiple phase-wise dataset pairs. To address the deterioration, we analyze tailored variants of two CFT methods: layer freezing and generative replay. Our findings demonstrate their effectiveness in enhancing the language ability of LLMs while preserving task performance, in comparison to relevant baselines.
[607] arXiv:2410.16698 (replaced) [pdf, html, other]: Title: Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation

Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Comments: Accepted at AISTATS 2025

Subjects: Machine Learning (cs.LG)

Dimensionality reduction (DR) offers a useful representation of complex high-dimensional data. Recent DR methods focus on hyperbolic geometry to derive a faithful low-dimensional representation of hierarchical data. However, existing methods are based on neighbor embedding, frequently ruining the continual relation of the hierarchies. This paper presents hyperboloid Gaussian process (GP) latent variable models (hGP-LVMs) to embed high-dimensional hierarchical data with implicit continuity via nonparametric estimation. We adopt generative modeling using the GP, which brings effective hierarchical embedding and executes ill-posed hyperparameter tuning. This paper presents three variants that employ original point, sparse point, and Bayesian estimations. We establish their learning algorithms by incorporating the Riemannian optimization and active approximation scheme of GP-LVM. For Bayesian inference, we further introduce the reparameterization trick to realize Bayesian latent variable learning. In the last part of this paper, we apply hGP-LVMs to several datasets and show their ability to represent high-dimensional hierarchies in low-dimensional spaces.
[608] arXiv:2410.18217 (replaced) [pdf, other]: Title: Accurate Analytical Modeling of Small-Size Rotary Transformers for Wound-Rotor Resolvers

Saeed Hajmohammadi, MohammadSadegh KhajueeZadeh, Farid Tootoonchian, Sajjad Mohammadi

Subjects: Systems and Control (eess.SY)

Rotary transformers are commonly used in wound rotor resolvers to transfer excitation signals to the rotating winding without mechanical contact. In many analyses, the rotary transformer is modeled as an ideal transformer, where the voltage transfer ratio is assumed to be equal to the turns ratio. However, in miniature rotary transformers used in compact resolver systems, leakage inductance can become comparable to the magnetizing inductance due to reduced core dimensions and unavoidable air gaps, leading to deviations from the ideal voltage transfer behavior. This paper presents an accurate equivalent circuit model for miniature rotary transformers employed in wound rotor resolvers. The proposed model analytically derives the magnetizing and leakage inductances using a magnetic equivalent circuit that accounts for flux fringing and air gap effects. The model is validated through three dimensional finite element analysis and experimental measurements on a fabricated prototype under both no load and resolver excitation conditions. The results demonstrate improved prediction accuracy of the secondary voltage compared with conventional models, enabling more reliable characterization of excitation transfer in compact resolver systems.
[609] arXiv:2411.00171 (replaced) [pdf, html, other]: Title: EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization

Mujin Cheon, Jay H. Lee, Dong-Yeun Koh, Calvin Tsay

Comments: 2025 International Conference on Machine Learning (ICML). 17 pages, 10 figures

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

To avoid myopic behavior, multi-step lookahead Bayesian optimization (BO) algorithms consider the sequential nature of BO and have demonstrated promising results in recent years. However, owing to the curse of dimensionality, most of these methods make significant approximations or suffer scalability issues. This paper presents a novel reinforcement learning (RL)-based framework for multi-step lookahead BO in high-dimensional black-box optimization problems. The proposed method enhances the scalability and decision-making quality of multi-step lookahead BO by efficiently solving the sequential dynamic program of the BO process in a near-optimal manner using RL. We first introduce an Attention-DeepSets encoder to represent the state of knowledge to the RL agent and subsequently propose a multi-task, fine-tuning procedure based on end-to-end (encoder-RL) on-policy learning. We evaluate the proposed method, EARL-BO (Encoder Augmented RL for BO), on synthetic benchmark functions and hyperparameter tuning problems, finding significantly improved performance compared to existing multi-step lookahead and high-dimensional BO methods.
[610] arXiv:2411.11707 (replaced) [pdf, html, other]: Title: Federated Co-tuning Framework for Large and Small Language Models

Tao Fan, Yan Kang, Guoqiang Ma, Lixin Fan, Shuoling Liu, Kai Chen, Qiang Yang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

By adapting Large Language Models (LLMs) to domain-specific tasks or enriching them with domain-specific knowledge, we can fully harness the capabilities of LLMs. Nonetheless, a gap persists in achieving simultaneous mutual enhancement between the server's LLM and the downstream clients' Small Language Models (SLMs). To address this, we propose FedCoLLM, a novel and parameter-efficient federated framework designed for co-tuning LLMs and SLMs. This approach is aimed at adaptively transferring the server-side LLMs' knowledge to clients' SLMs while simultaneously enriching the LLMs with domain insights from the clients. To accomplish this, FedCoLLM utilizes lightweight adapters in conjunction with SLMs, facilitating knowledge exchange between server and clients in a manner that respects data privacy while also minimizing computational and communication overhead. Our evaluation of FedCoLLM, utilizing various public LLMs and SLMs across a range of NLP text generation tasks, reveals that the performance of clients' SLMs improves notably with the assistance of the LLMs. Simultaneously, the LLMs enhanced via FedCoLLM achieve performance comparable to that obtained through direct fine-tuning on clients' data. Our code has been contributed to the FATE open-source project and is now publicly accessible at this https URL.
[611] arXiv:2411.16771 (replaced) [pdf, html, other]: Title: VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli

Comments: To appear in TMLR 2026. Code available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision Large Language Models (VLLMs) are widely acknowledged to be prone to hallucinations. Existing research addressing this problem has primarily been confined to image inputs, with limited exploration of video-based hallucinations. Furthermore, current evaluation methods fail to capture nuanced errors in generated responses, which are often exacerbated by the rich spatiotemporal dynamics of videos. To address this, we introduce VidHal, a benchmark specially designed to evaluate video-based hallucinations in VLLMs. VidHal is constructed by bootstrapping video instances across a wide range of common temporal aspects. A defining feature of our benchmark lies in the careful creation of captions which represent varying levels of hallucination associated with each video. To enable fine-grained evaluation, we propose a novel caption ordering task requiring VLLMs to rank captions by hallucinatory extent. We conduct extensive experiments on VidHal and comprehensively evaluate a broad selection of models. Our results uncover significant limitations in existing VLLMs regarding hallucination generation. Through our benchmark, we aim to inspire further research on 1) holistic understanding of VLLM capabilities, particularly regarding hallucination, and 2) extensive development of advanced VLLMs to alleviate this problem.
[612] arXiv:2411.17061 (replaced) [pdf, html, other]: Title: SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation

Guoan Xu, Jiaming Chen, Wenfeng Huang, Wenjing Jia, Guangwei Gao, Guo-Jun Qi

Comments: TIP

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Vision Transformer (ViT) has achieved notable success in computer vision, with its variants widely validated across various downstream tasks, including semantic segmentation. However, as general-purpose visual encoders, ViT backbones often do not fully address the specific requirements of task decoders, highlighting opportunities for designing decoders optimized for efficient semantic segmentation. This paper proposes Strip Cross-Attention (SCASeg), an innovative decoder head specifically designed for semantic segmentation. Instead of relying on the conventional skip connections, we utilize lateral connections between encoder and decoder stages, leveraging encoder features as Queries in cross-attention modules. Additionally, we introduce a Cross-Layer Block (CLB) that integrates hierarchical feature maps from various encoder and decoder stages to form a unified representation for Keys and Values. The CLB also incorporates the local perceptual strengths of convolution, enabling SCASeg to capture both global and local context dependencies across multiple layers, thus enhancing feature interaction at different scales and improving overall efficiency. To further optimize computational efficiency, SCASeg compresses the channels of queries and keys into one dimension, creating strip-like patterns that reduce memory usage and increase inference speed compared to traditional vanilla cross-attention. Experiments show that SCASeg's adaptable decoder delivers competitive performance across various setups, outperforming leading segmentation architectures on benchmark datasets, including ADE20K, Cityscapes, COCO-Stuff 164k, and Pascal VOC2012, even under diverse computational constraints.
[613] arXiv:2412.13241 (replaced) [pdf, html, other]: Title: A systematic review of assistive technologies for children with dyslexia

Sansrit Paudel, Subek Acharya, Piriyankan Kirupaharan, Bishal KC, Bipul Thapa

Subjects: Human-Computer Interaction (cs.HC)

Dyslexia is a neurological learning disability that primarily disrupts one's ability to read, write, and spell, affecting an estimated 15-20% of the global population. This high prevalence underscores the importance of developing effective interventions. This study presents a systematic review of literature published between 2015 and 2024 to evaluate current trends in assistive technologies for children with dyslexia. This research shows that digital assistive technologies are leading interventions, especially with the use of mobile apps and augmented reality. More innovative technologies like virtual reality, NLP, haptic technologies, and tangible user interfaces are emerging to provide unique solutions addressing the user's needs. While non-computing devices are generally less effective in comparison to modern digital solutions, they provide a promising alternative in settings with limited access to technology.
[614] arXiv:2501.02576 (replaced) [pdf, html, other]: Title: DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

Ziyang Song, Zerong Wang, Bo Li, Hao Zhang, Ruijie Zhu, Li Liu, Peng-Tao Jiang, Tianzhu Zhang

Comments: 11 pages, 6 figures, 6 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Monocular depth estimation within the diffusion-denoising paradigm demonstrates impressive generalization ability but suffers from low inference speed. Recent methods adopt a single-step deterministic paradigm to improve inference efficiency while maintaining comparable performance. However, they overlook the gap between generative and discriminative features, leading to suboptimal results. In this work, we propose DepthMaster, a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task. First, to mitigate overfitting to texture details introduced by generative features, we propose a Feature Alignment module, which incorporates high-quality semantic features to enhance the denoising network's representation capability. Second, to address the lack of fine-grained details in the single-step deterministic framework, we propose a Fourier Enhancement module to adaptively balance low-frequency structure and high-frequency details. We adopt a two-stage training strategy to fully leverage the potential of the two modules. In the first stage, we focus on learning the global scene structure with the Feature Alignment module, while in the second stage, we exploit the Fourier Enhancement module to improve the visual quality. Through these efforts, our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets. Our project page can be found at this https URL.
[615] arXiv:2501.03946 (replaced) [pdf, other]: Title: Proxy Discrimination After Students for Fair Admissions

Frank Fagan

Journal-ref: Journal of Law & Technology at Texas, forthcoming 2025

Subjects: Computers and Society (cs.CY)

Today, there is no clear legal test for regulating the use of variables that proxy for race and other protected classes and classifications. This Article develops such a test. Decision tools that use proxies are narrowly tailored when they exhibit the weakest total proxy power. The test is necessarily comparative. Thus, if two algorithms predict loan repayment or university academic performance with identical accuracy rates, but one uses zip code and the other does not, then the second algorithm can be said to have deployed a more equitable means for achieving the same result as the first algorithm. Scenarios in which two algorithms produce comparable and non-identical results present a greater challenge. This Article suggests that lawmakers can develop caps to permissible proxy power over time, as courts and algorithm builders learn more about the power of variables. Finally, the Article considers who should bear the burden of producing less discriminatory alternatives and suggests plaintiffs remain in the best position to keep defendants honest - so long as testing data is made available.
[616] arXiv:2501.11275 (replaced) [pdf, html, other]: Title: Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces

Yuwen Li, Guozhi Zhang

Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

This paper investigates the $L_p$ approximation error for higher order Korobov functions using deep convolutional neural networks (CNNs) with ReLU activation. For target functions having a mixed derivative of order $m+1$ in each direction, we improve the classical second-order approximation rate to order $m+1$ (modulo a logarithmic factor) in terms of the depth of CNNs. The key ingredient in our analysis is an approximate representation of high-order sparse grid basis functions by CNNs. The results suggest that the higher order expressivity of CNNs does not severely suffer from the curse of dimensionality.
[617] arXiv:2502.04416 (replaced) [pdf, html, other]: Title: Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis

Zehua Pei, Hui-Ling Zhen, Lancheng Zou, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu

Comments: Accepted by ACL 2026 Main

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Scaling large language models (LLMs) improves performance but significantly increases inference costs, with feed-forward networks (FFNs) consuming the majority of computational resources. While Mixture-of-Experts (MoE) architectures can reduce this cost through sparse activation, restructuring existing dense models into MoEs typically requires extensive retraining on hundreds of billions of tokens. We propose an analytical post-training framework that rapidly restructures FFNs into sparse MoE architectures using only a small calibration dataset. The method analyzes neuron activation patterns to partition neurons into always-active shared experts and conditionally activated routed experts, then constructs a router analytically from representative neuron statistics, enabling immediate deployment or optional lightweight fine-tuning. This approach applies both to dense models and recursively to existing MoE models for hierarchical sparsity. Experiments demonstrate up to $1.17\times$ speedup in compute-bound scenarios with only minutes of processing and 2k-sample fine-tuning, outperforming methods requiring orders of magnitude more resources.
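The partitioning idea in this abstract can be sketched, under strong simplifying assumptions, as thresholding per-neuron activation rates measured on a calibration set. The threshold value, the round-robin grouping of the remaining neurons, and all function names below are hypothetical choices for illustration. The paper's actual partitioning criterion and its analytical router construction are not specified in the abstract and are not reproduced here.

```python
def activation_rates(acts):
    """Fraction of calibration samples on which each neuron is active.

    `acts` is a list of rows, one per calibration sample, holding the
    post-activation values of each FFN neuron (toy data in this sketch).
    """
    n_samples = len(acts)
    n_neurons = len(acts[0])
    return [
        sum(1 for row in acts if row[j] > 0.0) / n_samples
        for j in range(n_neurons)
    ]

def partition_neurons(acts, shared_threshold=0.9, n_routed_experts=2):
    """Split neuron indices into one shared group and several routed groups.

    Assumption: a neuron counts as "always active" if its activation rate
    is at least `shared_threshold`. The remaining neurons are dealt out
    round-robin in order of decreasing rate, which is only a placeholder
    for a real clustering of activation patterns.
    """
    rates = activation_rates(acts)
    shared = [j for j, r in enumerate(rates) if r >= shared_threshold]
    rest = sorted(
        (j for j, r in enumerate(rates) if r < shared_threshold),
        key=lambda j: -rates[j],
    )
    routed = [rest[e::n_routed_experts] for e in range(n_routed_experts)]
    return shared, routed

# Toy calibration activations: 4 samples, 5 neurons.
acts = [
    [1.0, 0.0, 0.5, 0.0, 0.2],
    [0.8, 0.3, 0.0, 0.0, 0.1],
    [1.2, 0.0, 0.4, 0.0, 0.3],
    [0.9, 0.0, 0.0, 0.7, 0.5],
]
shared, routed = partition_neurons(acts)
print("shared expert neurons:", shared)
print("routed expert neurons:", routed)
```

In this toy data, neurons 0 and 4 fire on every sample and end up in the shared group, while the sparsely active neurons are spread across the routed groups.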
[618] arXiv:2502.09795 (replaced) [pdf, html, other]: Title: Geometry-aided Vision-based Localization of Future Mars Helicopters in Challenging Illumination Conditions

Dario Pisanti, Robert Hewitt, Roland Brockers, Georgios Georgakis

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Planetary exploration using aerial assets has the potential for unprecedented scientific discoveries on Mars. While NASA's Mars helicopter Ingenuity proved that flight in the Martian atmosphere is possible, future Mars rotorcraft will require advanced navigation capabilities for long-range flights. One such critical capability is Map-based Localization (MbL), which registers an onboard image to a reference map during flight to mitigate cumulative drift from visual odometry. However, significant illumination differences between rotorcraft observations and a reference map prove challenging for traditional MbL systems, restricting the operational window of the vehicle. In this work, we investigate a new MbL system and propose Geo-LoFTR, a geometry-aided deep learning model for image registration that is more robust under large illumination differences than prior models. The system is supported by a custom simulation framework that uses real orbital maps to produce large amounts of realistic images of the Martian terrain. Comprehensive evaluations show that our proposed system outperforms prior MbL efforts in terms of localization accuracy under significant lighting and scale variations. Furthermore, we demonstrate the validity of our approach across a simulated Martian day and on real Mars imagery. Code and datasets are available at: this https URL.
[619] arXiv:2502.15793 (replaced) [pdf, html, other]: Title: Anomaly Detection in Smart Power Grids with Graph-Regularized MS-SVDD: a Multimodal Subspace Learning Approach

Thomas Debelle, Fahad Sohrab, Pekka Abrahamsson, Moncef Gabbouj

Comments: 23 pages, 5 figures, supplementary material

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Anomaly detection in smart power grids is a critical challenge due to the complexity, heterogeneity, and dynamic nature of sensor data streams. Existing one-class classification methods, particularly Subspace Support Vector Data Description (SVDD), have been extended to multimodal scenarios but often fail to fully exploit the structural dependencies across modalities, limiting their robustness in real-world applications. In this paper, we address this gap by proposing a generalized Multimodal Subspace Support Vector Data Description (MS-SVDD) model with graph-embedded regularization. The method projects data from multiple modalities into a shared low-dimensional subspace while preserving modality-specific structure through Laplacian regularizers. Our approach is evaluated on a three-modality dataset derived from smart grid event time series, using a dedicated preprocessing pipeline for constructing one-class classification training samples. The results demonstrate that our graph-embedded MS-SVDD improves robustness of event detection compared to conventional approaches, highlighting the potential of integrating graph priors with multimodal subspace learning for advancing anomaly detection in critical infrastructure. More broadly, this work contributes to the wider field of AI by illustrating how relational and structural information can be systematically embedded into one-class models, enabling robust learning under complex, high-dimensional, and multimodal conditions.
[620] arXiv:2502.20769 (replaced) [pdf, html, other]: Title: Information Bottleneck-Guided Heterogeneous Graph Learning for Interpretable Neurodevelopmental Disorder Diagnosis

Yueyang Li, Lei Chen, Wenhao Dong, Shengyu Gong, Zijian Kang, Boyang Wei, Weiming Zeng, Hongjie Yan, Lingbin Bian, Zhiguo Zhang, Wai Ting Siok, Nizhuan Wang

Journal-ref: Neurocomputing, 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Developing interpretable models for neurodevelopmental disorders (NDDs) diagnosis presents significant challenges in effectively encoding, decoding, and integrating multimodal neuroimaging data. While many existing machine learning approaches have shown promise in brain network analysis, they typically suffer from limited interpretability, particularly in extracting meaningful biomarkers from functional magnetic resonance imaging (fMRI) data and establishing clear relationships between imaging features and demographic characteristics. Besides, current graph neural network methodologies face limitations in capturing both local and global functional connectivity patterns while simultaneously achieving theoretically principled multimodal data fusion. To address these challenges, we propose the Interpretable Information Bottleneck Heterogeneous Graph Neural Network (I2B-HGNN), a unified framework that applies information bottleneck principles to guide both brain connectivity modeling and cross-modal feature integration. This framework comprises two complementary components. The first is the Information Bottleneck Graph Transformer (IBGraphFormer), which combines transformer-based global attention mechanisms with graph neural networks through information bottleneck-guided pooling to identify sufficient biomarkers. The second is the Information Bottleneck Heterogeneous Graph Attention Network (IB-HGAN), which employs meta-path-based heterogeneous graph learning with structural consistency constraints to achieve interpretable fusion of neuroimaging and demographic data. The experimental results demonstrate that I2B-HGNN achieves superior performance in diagnosing NDDs, exhibiting both high classification accuracy and the ability to provide interpretable biomarker identification while effectively analyzing non-imaging data.
[621] arXiv:2503.10475 (replaced) [pdf, html, other]: Title: Stratified Topological Autonomy for Long-Range Coordination (STALC)

Cora A. Duggan, Adam Goertz, Adam Polevoy, Mark Gonzales, Kevin C. Wolfe, Bradley Woosley, John G. Rogers III, Joseph Moore

Comments: ©2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

In this paper, we present Stratified Topological Autonomy for Long-Range Coordination (STALC), a hierarchical planning approach for multi-robot coordination in real-world environments with significant inter-robot spatial and temporal dependencies. At its core, STALC consists of a multi-robot graph-based planner which combines a topological graph with a novel, computationally efficient mixed-integer programming formulation to generate highly-coupled multi-robot plans in seconds. To enable autonomous planning across different spatial and temporal scales, we construct our graphs so that they capture connectivity between free-space regions and other problem-specific features, such as traversability or risk. We then use receding-horizon planners to achieve local collision avoidance and formation control. To evaluate our approach, we consider a multi-robot reconnaissance scenario where robots must autonomously coordinate to navigate through an environment while minimizing the risk of detection by observers. Through simulation-based experiments, we show that our approach is able to scale to address complex multi-robot planning scenarios. Through hardware experiments, we demonstrate our ability to generate graphs from real-world data and successfully plan across the entire hierarchy to achieve shared objectives.
[622] arXiv:2503.16416 (replaced) [pdf, html, other]: Title: Survey on Evaluation of LLM-based Agents

Asaf Yehudai, Lilach Eden, Alan Li, Guy Uziel, Yilun Zhao, Roy Bar-Haim, Arman Cohan, Michal Shmueli-Scheuer

Comments: ACL Findings

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

LLM-based agents represent a paradigm shift in AI, enabling autonomous systems to plan, reason, and use tools while interacting with dynamic environments. This paper provides the first comprehensive survey of evaluation methods for these increasingly capable agents. We analyze the field of agent evaluation across five perspectives: (1) Core LLM capabilities needed for agentic workflows, such as planning and tool use; (2) Application-specific benchmarks such as web and SWE agents; (3) Evaluation of generalist agents; (4) Analysis of agent benchmarks' core dimensions; and (5) Evaluation frameworks and tools for agent developers. Our analysis reveals current trends, including a shift toward more realistic, challenging evaluations with continuously updated benchmarks. We also identify critical gaps that future research must address, particularly in assessing cost-efficiency, safety, and robustness, and in developing fine-grained, scalable evaluation methods.
[623] arXiv:2503.17239 (replaced) [pdf, html, other]: Title: SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche

Journal-ref: Findings of the ACL 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Fine-tuning large language models (LLMs) is a common practice to adapt generalist models to specialized domains. However, recent studies show that fine-tuning can erode safety alignment, causing LLMs to respond to harmful or unethical prompts. Many methods to realign safety have been proposed, but often introduce custom algorithms that are difficult to implement or compromise task utility. In this work, we propose SafeMERGE, a lightweight, post-fine-tuning framework that restores safety while maintaining downstream performance. SafeMERGE selectively merges fine-tuned with safety-aligned model layers only when they deviate from safe behavior, measured by a cosine similarity criterion. Across four LLMs and several tasks, SafeMERGE consistently reduces harmful outputs compared to other defenses, with negligible or even positive impact on utility. Our results demonstrate that selective, layer-wise merging offers a robust safeguard against the inadvertent loss of safety during fine-tuning, establishing SafeMERGE as a simple yet effective post-fine-tuning defense.
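The selective, layer-wise merge described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which vectors the cosine similarity compares, so the sketch compares flattened layer weights directly, and the threshold `tau`, the mixing weight `alpha`, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flat weight vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def selective_merge(finetuned, safe, tau=0.9, alpha=0.5):
    """Merge a layer with its safety-aligned counterpart only when it deviates.

    finetuned, safe: dicts mapping layer name -> flat list of weights.
    tau (hypothetical): similarity threshold below which a layer is merged.
    alpha (hypothetical): weight kept from the fine-tuned layer when merging.
    Returns the merged layers and the names of the layers that were merged.
    """
    merged, merged_names = {}, []
    for name, w_ft in finetuned.items():
        w_safe = safe[name]
        if cosine_similarity(w_ft, w_safe) < tau:
            # Layer drifted away from the safe model: interpolate toward it.
            merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(w_ft, w_safe)]
            merged_names.append(name)
        else:
            # Layer is still close to the safe model: keep fine-tuned weights.
            merged[name] = list(w_ft)
    return merged, merged_names


finetuned = {"layer0": [1.0, 0.0], "layer1": [1.0, 1.0]}
safe = {"layer0": [0.0, 1.0], "layer1": [1.0, 1.0]}
merged, merged_names = selective_merge(finetuned, safe)
print(merged_names, merged)
```

In this toy run only `layer0` is merged, because its weights are orthogonal to the safe layer, while `layer1` is left untouched. Leaving unaffected layers alone is what would keep the impact on task utility small.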
[624] arXiv:2504.03476 (replaced) [pdf, html, other]: Title: Anatomy-Aware Text-Visual Fusion with Dual-Perspective Prompts for Fine-Grained Lumbar Spine Segmentation

Sheng Lian, Jianlong Cai, Dengfeng Pan, Guang-Yong Chen, Hao Xu, Fan Zhang, Guodong Fan, Shuo Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate lumbar spine segmentation is crucial for diagnosing spinal disorders. Existing methods typically use coarse-grained segmentation strategies that lack the fine detail needed for precise diagnosis. Additionally, their reliance on visual-only models hinders the capture of anatomical semantics, leading to misclassified categories and poor segmentation details. To address these limitations, we present ATM-Net, an innovative framework that employs an anatomy-aware, text-guided, multi-modal fusion mechanism for fine-grained segmentation of lumbar substructures, i.e., vertebrae (VBs), intervertebral discs (IDs), and spinal canal (SC). ATM-Net adopts the Anatomy-aware Text Prompt Generator (ATPG) to adaptively convert image annotations into anatomy-aware prompts in different views. These insights are further integrated with image features via the Holistic Anatomy-aware Semantic Fusion (HASF) module, building a comprehensive anatomical context. The Channel-wise Contrastive Anatomy-Aware Enhancement (CCAE) module further enhances class discrimination and refines segmentation through class-wise channel-level multi-modal contrastive learning. Extensive experiments on the MRSpineSeg and SPIDER datasets demonstrate that ATM-Net significantly outperforms state-of-the-art methods, with consistent improvements regarding class discrimination and segmentation details. For example, ATM-Net achieves Dice of 79.39% and HD95 of 9.91 pixels on SPIDER, outperforming the competitive SpineParseNet by 8.31% and 4.14 pixels, respectively.
[625] arXiv:2504.07940 (replaced) [pdf, other]: Title: Beyond the Frame: Generating 360 Panoramic Videos from Perspective Videos

Rundong Luo, Matthew Wallingford, Ali Farhadi, Noah Snavely, Wei-Chiu Ma

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

360° videos have emerged as a promising medium to represent our dynamic visual world. Compared to the "tunnel vision" of standard cameras, their borderless field of view offers a more complete perspective of our surroundings. While existing video models excel at producing standard videos, their ability to generate full panoramic videos remains elusive. In this paper, we investigate the task of video-to-360° generation: given a perspective video as input, our goal is to generate a full panoramic video that is consistent with the original video. Unlike conventional video generation tasks, the output's field of view is significantly larger, and the model is required to have a deep understanding of both the spatial layout of the scene and the dynamics of objects to maintain spatio-temporal consistency. To address these challenges, we first leverage the abundant 360° videos available online and develop a high-quality data filtering pipeline to curate pairwise training data. We then carefully design a series of geometry- and motion-aware operations to facilitate the learning process and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective video. In addition, we showcase its potential applications, including video stabilization, camera viewpoint control, and interactive visual question answering.
[626] arXiv:2504.11159 (replaced) [pdf, html, other]: Title: C-SHAP for time series: An approach to high-level temporal explanations

Annemarie Jutte, Faizan Ahmed, Jeroen Linssen, Maurice van Keulen

Comments: 18 pages, 7 figures, improved and expanded version of the original paper

Subjects: Artificial Intelligence (cs.AI)

In high-stakes domains, such as healthcare and industry, the explainability of AI-based decision-making has become crucial. Without insight into model reasoning, the reliability of these models cannot be ensured. Applications often rely on the time series data type which, unlike the image data type, is underexplored with respect to the development of explainable AI (XAI) techniques. Most existing XAI techniques for time series are focused on point- or subsequence-based explanations. This limits their usability since points and subsequences do not capture all relevant patterns and may not result in human-interpretable explainability. In this paper, we close this gap and propose a concept-based XAI approach (C-SHAP), where concepts are defined as high-level patterns extracted from the time series data. C-SHAP leverages the SHAP method to determine the influence of these concepts on predictions. The effectiveness of the developed framework is illustrated for use cases from healthcare and industry, in the form of Human Activity Recognition (HAR) and predictive maintenance.
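To make the idea of attributing a prediction to concepts concrete, the sketch below computes exact Shapley values over a small set of named concepts. It rests on several assumptions: the abstract does not describe how concepts are extracted or how the model is evaluated with a concept removed, so `value` here is a toy additive function, the concept names are hypothetical, and exact enumeration stands in for whatever SHAP estimator the paper actually uses.

```python
from itertools import combinations
from math import factorial


def shapley_values(concepts, value):
    """Exact Shapley value of each concept under the set function `value`.

    concepts: list of hashable concept names.
    value: callable taking a frozenset of concepts and returning a float,
           e.g. the model output when only those concepts are present.
    """
    n = len(concepts)
    phi = {}
    for c in concepts:
        others = [o for o in concepts if o != c]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes c
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of c when added to coalition s.
                total += weight * (value(s | {c}) - value(s))
        phi[c] = total
    return phi


# Toy additive "model": each concept adds a fixed amount to the prediction.
contributions = {"trend": 2.0, "seasonality": 1.0, "noise": 0.0}


def toy_value(present):
    return sum(contributions[c] for c in present)


phi = shapley_values(list(contributions), toy_value)
print(phi)
```

For an additive value function the Shapley value of each concept equals its own contribution, which makes the toy case easy to check by hand. Exact enumeration is exponential in the number of concepts, so it is only practical for a handful of them.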
[627] arXiv:2504.19363 (replaced) [pdf, html, other]: Title: Sequence Reconstruction for Sticky Insertion/Deletion Channels

Van Long Phuoc Pham, Yeow Meng Chee, Kui Cai, Van Khu Vu

Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

The sequence reconstruction problem for insertion/deletion channels has attracted significant attention owing to its recent applications in emerging data storage systems, such as racetrack memories and DNA-based data storage. Our goal is to investigate the reconstruction problem for sticky-insdel channels, where both sticky-insertions and sticky-deletions occur. If there are only sticky-insertion errors, the reconstruction problem for the sticky-insertion channel is a special case of the reconstruction problem for the tandem-duplication channel, which has been well studied. In this work, we consider the $(t, s)$-sticky-insdel channel, where at most $t$ sticky-insertion errors and $s$ sticky-deletion errors occur when a message is transmitted through the channel. For the reconstruction problem, we are interested in the minimum number of distinct outputs from these channels that are needed to uniquely recover the transmitted vector. We first provide a recursive formula to determine the minimum number of distinct outputs required. Next, we provide an efficient algorithm to reconstruct the transmitted vector from erroneous sequences.
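As a small illustration of the channel model, the sketch below applies single sticky errors to a binary string and compares run-length encodings. It assumes the usual convention that a sticky insertion duplicates a symbol next to itself and that a sticky deletion removes a symbol only from a run of length at least two, so the sequence of runs is preserved and only run lengths change. The function names are hypothetical, and this is not the paper's reconstruction algorithm.

```python
from itertools import groupby


def run_lengths(seq):
    """Run-length encoding of a string as a list of (symbol, length) pairs."""
    return [(sym, len(list(group))) for sym, group in groupby(seq)]


def sticky_insert(seq, i):
    """Duplicate the symbol at index i (one sticky insertion)."""
    return seq[:i + 1] + seq[i] + seq[i + 1:]


def sticky_delete(seq, i):
    """Delete the symbol at index i if its run has length at least two."""
    left = i > 0 and seq[i - 1] == seq[i]
    right = i + 1 < len(seq) and seq[i + 1] == seq[i]
    if not (left or right):
        raise ValueError("a sticky deletion cannot remove an entire run")
    return seq[:i] + seq[i + 1:]


x = "0011101"
y_ins = sticky_insert(x, 4)  # lengthens the run of 1s
y_del = sticky_delete(x, 0)  # shortens the leading run of 0s
print(run_lengths(x))
print(run_lengths(y_ins))
print(run_lengths(y_del))
```

All three outputs share the same sequence of run symbols (0, 1, 0, 1) and differ only in run lengths, which is why the problem can be viewed as operating on vectors of run lengths.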
[628] arXiv:2505.06617 (replaced) [pdf, html, other]: Title: Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites

Timothée Anne, Noah Syrkis, Meriem Elhosni, Florian Turati, Franck Legendre, Alain Jaquier, Sebastian Risi

Comments: This is the extended version (published in IEEE Transactions on Evolutionary Computation) of our conference paper presented at ALIFE 2025

Subjects: Neural and Evolutionary Computing (cs.NE)

Quality-Diversity (QD) algorithms seek to discover diverse, high-performing solutions across a behavior space, in contrast to conventional optimization methods that target a single optimum. Adversarial problems present unique challenges for QD approaches, as the competing nature of opposing sides creates interdependencies that complicate the evolution process. Existing QD methods applied to such scenarios typically fix one side, constraining the open-endedness. We present Generational Adversarial MAP-Elites (GAME), a coevolutionary QD algorithm that evolves both sides by alternating which side is evolved at each generation. By integrating a vision embedding model (VEM), our approach eliminates the need for domain-specific behavior descriptors and instead operates on video. We validate GAME across three distinct adversarial domains: a multi-agent battle game, a soft-robot wrestling environment, and a deck building game. We validate that all its components are necessary, that the VEM is effective in two different domains, and that GAME finds better solutions than one-sided QD baselines. Our experiments reveal several evolutionary phenomena, including arms race-like dynamics, enhanced novelty through generational extinction, and the preservation of neutral mutations as crucial stepping stones toward the highest performance. While GAME successfully illuminates all three adversarial problems, its capacity for truly open-ended discovery remains constrained by the nature of the search spaces used in this paper. These findings show GAME's broad applicability and highlight opportunities for future research into open-ended adversarial coevolution. Code and videos are available at: this https URL
[629] arXiv:2505.09254 (replaced) [pdf, html, other]: Title: Moving towards informative and actionable social media research

Joseph B. Bak-Coleman, Stephan Lewandowsky, Philipp Lorenz-Spreen, Arvind Narayanan, Amy Orben, Lisa Oswald

Subjects: Social and Information Networks (cs.SI); Adaptation and Self-Organizing Systems (nlin.AO)

Social media is nearly ubiquitous in modern life, raising concerns about its societal impacts -- from mental health and polarization to violence and democratic disruption. Yet research on its causal effects is still inconclusive: Various methods, spanning observational to experimental, can yield seemingly conflicting results. Considering the complexity of such socio-technical systems, with coupled networks, feedback loops and collective phenomena, this may not be surprising. Here, we enumerate and examine the features of social media as a complex system that challenge our ability to infer causality at societal scales. Attempts to ascertain and summarize causal effects have tended to prioritize findings from randomized controlled trials (RCTs). However, like observational studies, RCTs rely on assumptions that may frequently be violated in the context of social media, especially regarding societal outcomes at scale. Drawing on insight from disciplines that have faced similar challenges, like climate-science or epidemiology, we propose a path forward that combines the strengths of observational and experimental approaches while acknowledging the limitations of each. Progress, we argue, requires moving beyond isolated, linear effects to mechanistic explanations of how social media platforms generate collective outcomes.
[630] arXiv:2505.09971 (replaced) [pdf, html, other]: Title: APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds

Yuan Gao, Shaobo Xia, Sheng Nie, Cheng Wang, Xiaohuan Xi, Bisheng Yang

Comments: 18 pages, 12 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Airborne laser scanning (ALS) point cloud semantic segmentation is a fundamental task for large-scale 3D scene understanding. Fixed models deployed in real-world scenarios often suffer from performance degradation due to continuous domain shifts caused by environmental and sensor changes. Continuous Test-Time Adaptation (CTTA) enables adaptation to evolving unlabeled domains, but its application to ALS point clouds remains underexplored, hindered by the lack of benchmarks and the risks of catastrophic forgetting and error accumulation. To address these challenges, we propose APCoTTA (ALS Point cloud Continuous Test-Time Adaptation), a novel CTTA framework tailored for ALS point cloud semantic segmentation. APCoTTA consists of three key components. First, we adapt a gradient-driven layer selection mechanism for ALS point clouds, selectively updating low-confidence layers while freezing stable ones to preserve source knowledge and mitigate catastrophic forgetting. Second, an entropy-based consistency loss discards unreliable samples and enforces consistency regularization solely on reliable ones, effectively reducing error accumulation and improving adaptation stability. Third, a random parameter interpolation mechanism stochastically blends adapted parameters with source model parameters, further balancing target adaptation and source knowledge retention. Finally, we construct two benchmarks, ISPRSC and H3DC, to address the lack of CTTA benchmarks for ALS point cloud segmentation. Extensive experiments demonstrate that APCoTTA achieves superior performance on both benchmarks, improving mIoU by approximately 9% and 14% over direct inference. The new benchmarks and code are available at this https URL.
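The third component, random parameter interpolation, can be pictured with the minimal sketch below. It is only one plausible reading of "stochastically blends adapted parameters with source model parameters": the per-parameter selection probability `p`, the mixing weight `alpha`, and the function name are hypothetical and are not taken from the paper.

```python
import random


def random_interpolate(adapted, source, p=0.1, alpha=0.5, rng=None):
    """Blend a random subset of adapted parameters back toward the source model.

    adapted, source: flat lists of parameters with the same length.
    p (hypothetical): probability that a given parameter is blended.
    alpha (hypothetical): weight placed on the source value when blending.
    rng: optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    blended = []
    for a, s in zip(adapted, source):
        if rng.random() < p:
            # Selected: pull this parameter toward its source value.
            blended.append(alpha * s + (1 - alpha) * a)
        else:
            # Not selected: keep the adapted value unchanged.
            blended.append(a)
    return blended


adapted = [1.0, 2.0]
source = [0.0, 0.0]
all_blended = random_interpolate(adapted, source, p=1.0)
none_blended = random_interpolate(adapted, source, p=0.0)
print(all_blended, none_blended)
```

With `p=1.0` every parameter is pulled halfway back to the source, and with `p=0.0` the adapted model is left as is. Intermediate values trade off retaining source knowledge against adapting to the target domain.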
[631] arXiv:2505.10708 (replaced) [pdf, html, other]: Title: SafeTrans: LLM-assisted Transpilation from C to Rust

Muhammad Farrukh, Baris Coskun, Tapti Palit, Michalis Polychronakis

Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Rust is a strong contender for a memory-safe alternative to C as a "systems" language, but porting the vast amount of existing C code to Rust remains daunting. In this paper, we evaluate the potential of large language models (LLMs) to automate the transpilation of C code to idiomatic Rust. We present SafeTrans, a generic framework that leverages LLMs to i) transpile C code into Rust, and ii) iteratively repair compilation and runtime errors. A key novelty of our approach is a few-shot guided repair technique for translation errors, which provides contextual information and example code snippets for specific error types, guiding the LLM toward the correct solution. Another novel aspect of our work is the evaluation of the security implications of the transpilation process, showing how some vulnerability classes in C persist in the translated Rust code. SafeTrans was evaluated with six leading LLMs on 2,653 C programs and two real-world C projects. Our results show that iterative repair improves the rate of successful translations from 54% to 80% for the best-performing LLM (gpt-4o).
[632] arXiv:2505.11336 (replaced) [pdf, other]: Title: XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration

Nuo Chen, Andre Lin HuiKai, Jiaying Wu, Junyi Hou, Zining Zhang, Qian Wang, Xidong Wang, Bingsheng He

Comments: 41 pages, 19 figures; Accepted at ACL 2026

Subjects: Computation and Language (cs.CL)

Despite the growing adoption of large language models (LLMs) in academic workflows, their capabilities remain limited in supporting high-quality scientific writing. Most existing systems are designed for general-purpose scientific text generation and fail to meet the sophisticated demands of research communication beyond surface-level polishing, for example, maintaining conceptual coherence across sections. Furthermore, academic writing is inherently iterative and revision-driven, a process that is not well supported by direct prompting-based paradigms. To address these scenarios, we propose a human-AI collaboration framework for academic paper revision, centered on criteria-guided intent alignment and context-aware modeling. To validate the framework, we curate a dataset of 7,000 research papers from top-tier venues, annotated with 140,000 instruction--response pairs that reflect realistic, section-level scientific revisions. We instantiate the framework in XtraGPT, the first suite of open-source LLMs (1.5B to 14B parameters) specifically fine-tuned for context-aware academic paper revision. Extensive experiments show that XtraGPT significantly outperforms same-scale baselines and rivals the quality of proprietary counterparts. Both automated preference assessments and human evaluations confirm the effectiveness of XtraGPT in improving scientific drafts. Our code and models are available at this https URL and this https URL.
[633] arXiv:2505.11702 (replaced) [pdf, html, other]: Title: Post-Training Augmentation Invariance

Keenan Eikenberry, Lizuo Liu, Yoonsang Lee

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This work develops a framework for post-training augmentation invariance, in which our goal is to add invariance properties to a pretrained network without altering its behavior on the original, non-augmented input distribution. We define this notion precisely and additionally introduce augmented encoders, which are probabilistic encoders that formalize augmentation-based encoding processes and that serve as our fundamental object of study. We introduce two losses for augmented encoders, namely, Markov-Wasserstein minimization and Wasserstein correlation maximization, and we demonstrate empirically that both losses can be used to train lightweight, one-hidden-layer MLP adapter networks E_theta that, when appended to the latent space of a pretrained network F, do indeed lead to (approximate) post-training augmentation invariance. For example, on STL10 with F = DINOv2 features, the composite network C o E_theta o F, where C is a linear classifier and where E_theta is one of our proposed adapter networks, achieves 94% classification accuracy on arbitrarily rotated images, whereas a network of the form C o F without the adapter E_theta drops to 71% accuracy. Similarly, we can boost noise-invariant classification results from 58% up to 86%. Significantly, we obtain these results with no fine-tuning (the weights of F remain frozen throughout), and our methods introduce little corruption to the original features, since E_theta acts nearly isometrically on the non-augmented latent distribution. In contrast, we show that adapter networks trained with alternative candidate losses, specifically SimCLR and HSIC maximization, produce uncompetitive classification results and fundamentally corrupt the original latent space. Code available at: this https URL
[634] arXiv:2505.12282 (replaced) [pdf, other]: Title: Kernel interpolation on generalized sparse grids

Michael Griebel, Helmut Harbrecht, Michael Multerer

Subjects: Numerical Analysis (math.NA)

We consider scattered data approximation on product regions of equal and different dimensionality. On each of these regions, we assume quasi-uniform but unstructured data sites and construct optimal sparse grids for scattered data interpolation on the product region. For this, we derive new improved error estimates for the respective kernel interpolation error by invoking duality arguments. An efficient algorithm to solve the underlying linear system of equations is proposed. The algorithm is based on the sparse grid combination technique, where a sparse direct solver is used for the elementary anisotropic tensor product kernel interpolation problems. The application of the sparse direct solver is facilitated by applying a samplet matrix compression to each univariate kernel matrix, resulting in an essentially sparse representation of the latter. In this way, we obtain a method that is able to deal with large problems up to billions of interpolation points, especially in case of reproducing kernels of nonlocal nature. Numerical results are presented to qualify and quantify the approach.
[635] arXiv:2505.13527 (replaced) [pdf, html, other]: Title: Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression

Jingyu Peng, Maolin Wang, Nan Wang, Jiatong Li, Yuchen Li, Yuyang Ye, Wanyu Wang, Pengyue Jia, Kai Zhang, Xiangyu Zhao

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Despite substantial advancements in aligning large language models (LLMs) with human values, current safety mechanisms remain susceptible to jailbreak attacks. We hypothesize that this vulnerability stems from distributional discrepancies between alignment-oriented prompts and malicious prompts. To investigate this, we introduce LogiBreak, a novel and universal black-box jailbreak method that leverages logical expression translation to circumvent LLM safety systems. By converting harmful natural language prompts into formal logical expressions, LogiBreak exploits the distributional gap between alignment data and logic-based inputs, preserving the underlying semantic intent and readability while evading safety constraints. We evaluate LogiBreak on a multilingual jailbreak dataset spanning three languages, demonstrating its effectiveness across various evaluation settings and linguistic contexts.
[636] arXiv:2505.15269 (replaced) [pdf, html, other]: Title: LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval

Zhenyu Ning, Guangda Liu, Qihao Jin, Chengwei Li, Wenchao Ding, Minyi Guo, Jieru Zhao

Comments: Accepted by DAC'26

Journal-ref: 63rd ACM/IEEE Design Automation Conference (DAC '26), July 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent developments in Video Large Language Models (Video LLMs) have enabled models to process hour-long videos and exhibit exceptional performance. Nonetheless, the Key-Value (KV) cache expands linearly over time, leading to substantial memory overhead and response delay--critical challenges in various real-world online applications, such as Deepseek services, autonomous driving and robotics. To mitigate these issues, we propose LiveVLM, a training-free and query-agnostic framework specifically designed for online video understanding and real-time interaction. LiveVLM employs a Vision Sink Bucketing (VSB) mechanism to process video streams in real time, retain long-term video details and eliminate redundant KVs. This mechanism utilizes vision-to-vision attention scores as the metric and seeks to maximize the coverage of contextual information during compression. Noting that KV cache compressed in a query-agnostic manner inevitably retains irrelevant information for specific queries, LiveVLM incorporates a Position-agnostic KV Retrieval (PaR) mechanism to reduce interference from redundant context. The key point of PaR lies in decoupling positional embeddings to enhance the similarity between key tensors, thereby supporting efficient retrieval at the granularity of pages. Extensive experiments demonstrate that LiveVLM enables the foundation LLaVA-OneVision model to achieve state-of-the-art accuracy among both training-free query-agnostic methods and training-based online models.
[637] arXiv:2505.16737 (replaced) [pdf, html, other]: Title: Secure LLM Fine-Tuning via Safety-Aware Probing

Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Xiaokun Luan, Meng Sun

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Optimization and Control (math.OC)

Large language models (LLMs) have achieved remarkable success across many applications, but their ability to generate harmful content raises serious safety concerns. Although safety alignment techniques are often applied during pre-training or post-training, recent studies show that subsequent fine-tuning on adversarial or even benign data can still compromise model safety. In this paper, we revisit the fundamental question of why fine-tuning on non-harmful data may nevertheless degrade safety. We show that the safety and task-performance loss landscapes are partially decoupled, so updates that improve task-specific performance may still move the model toward unsafe regions. Based on this insight, we propose a safety-aware probing (SAP) optimization framework for mitigating safety risks during fine-tuning. Concretely, SAP uses contrastive safety signals to locate safety-correlated directions, and optimizes a lightweight probe that perturbs hidden-state propagation during fine-tuning, thereby steering parameter updates away from harmful trajectories while preserving task-specific learning. Extensive experiments show that SAP consistently improves the safety--utility tradeoff across multiple models and tasks. Averaged over multiple LLMs, SAP reduces the harmful score significantly relative to standard fine-tuning, outperforming strong baselines while maintaining competitive task-specific performance. SAP also demonstrates stronger robustness under harmful data poisoning, adversarial fine-tuning, and a dedicated post-fine-tuning adaptive attack, validating that SAP is an effective and scalable framework for preserving LLM safety during fine-tuning. Our code is available at this https URL.
[638] arXiv:2505.18648 (replaced) [pdf, other]: Title: TEE is not a Healer: Rollback-Resistant Reliable Storage (Extended Version)

Sadegh Keshavarzi, Gregory Chockler, Alexey Gotsman

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Recent advances in secure hardware technologies, such as Intel SGX or ARM TrustZone, offer an opportunity to substantially reduce the costs of Byzantine fault-tolerance by placing the program code and state within a secure enclave known as a Trusted Execution Environment (TEE). However, the protection offered by a TEE only applies during program execution. Once power is switched off, the non-volatile portion of the program state becomes vulnerable to rollback attacks wherein it is undetectably reverted to an older version. In this paper we consider the problem of implementing reliable read/write registers out of failure-prone replicas subject to state rollbacks. To this end, we introduce a new unified model that captures multiple failure types that can affect a TEE-based system and establish tight bounds on the fault-tolerance of register constructions in this model. We consider both the static case, where failure thresholds hold throughout the entire execution, and the dynamic case, where any number of replicas can roll back, provided these failures do not occur too often. Our dynamic register emulation algorithm, TEE-Rex, provides the first correct implementation of a distributed state recovery procedure that requires neither durable storage nor specialized hardware, such as trusted monotonic counters.
[639] arXiv:2505.20243 (replaced) [pdf, html, other]: Title: It's High Time: A Survey of Temporal Question Answering

Bhawna Piryani, Abdelrahman Abdallah, Jamshid Mozafari, Avishek Anand, Adam Jatowt

Comments: Accepted at ACL 2026

Journal-ref: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics: ACL 2026

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Time plays a critical role in how information is generated, retrieved, and interpreted. In this survey, we provide a comprehensive overview of Temporal Question Answering (TQA), a research area that focuses on answering questions involving temporal constraints or context. As time-stamped content from sources like news articles, web archives, and knowledge bases continues to grow, TQA systems must address challenges such as detecting temporal intent, normalizing time expressions, ordering events, and reasoning over evolving or ambiguous facts. We organize existing work through a unified perspective that captures the interaction between corpus temporality, question temporality, and model capabilities, enabling a systematic comparison of datasets, tasks, and approaches. We review recent advances in TQA enabled by neural architectures, especially transformer-based models and Large Language Models (LLMs), highlighting progress in temporal language modeling, retrieval-augmented generation (RAG), and temporal reasoning. We also discuss benchmark datasets and evaluation strategies designed to test temporal robustness,
[640] arXiv:2505.22266 (replaced) [pdf, html, other]: Title: FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation

Jialin Yan, Yu Cheng, Zhaoxia Yin, Xinpeng Zhang, Shilin Wang, Tanfeng Sun, Xinghao Jiang

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

The rapid development of Artificial Intelligence Generated Content (AIGC) has made high-fidelity generated audio widely available across the Internet, driving the advancement of audio steganography. Benefiting from advances in deep learning, current audio steganography schemes are mainly based on encoder-decoder network architectures. While these methods guarantee a certain level of perceptual quality for stego audio, they typically face high computational cost and long implementation time, as well as poor anti-steganalysis performance. To address the aforementioned issues, we pioneer a Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation (FGAS). Adversarial perturbations carrying a secret message are embedded into the cover audio to generate stego audio. The receiver only needs to share the structure and key of the fixed decoder network to accurately extract the secret message from the stego audio. In FGAS, we propose an Audio Adversarial Perturbation Generation (A2PG) strategy with an optional robust extension and design a lightweight fixed decoder. The fixed decoder guarantees reliable extraction of the hidden message, while adversarial perturbations are optimized to keep the stego audio perceptually and statistically close to the cover audio, thereby improving anti-steganalysis performance. The experimental results show that FGAS significantly improves stego audio quality, achieving an average PSNR gain of over 10 dB compared to SOTA methods. Furthermore, FGAS demonstrates strong robustness against common audio processing attacks. Moreover, FGAS exhibits superior anti-steganalysis performance across different relative payloads; under high-capacity embedding, it achieves a classification error rate about 2% higher, indicating stronger anti-steganalysis performance than current SOTA methods.
[641] arXiv:2506.01937 (replaced) [pdf, html, other]: Title: RewardBench 2: Advancing Reward Model Evaluation

Saumya Malik, Valentina Pyatkin, Sander Land, Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Nathan Lambert

Comments: Data, models, and leaderboard available at this https URL

Journal-ref: ICLR 2026

Subjects: Computation and Language (cs.CL)

Reward models are used throughout the post-training of language models to capture nuanced signals from preference data and provide a training target for optimization across instruction following, reasoning, safety, and more domains. The community has begun establishing best practices for evaluating reward models, from the development of benchmarks that test capabilities in specific skill areas to others that test agreement with human preferences. At the same time, progress in evaluation has not been mirrored by the effectiveness of reward models in downstream tasks -- simpler direct alignment algorithms are reported to work better in many cases. This paper introduces RewardBench 2, a new multi-skill reward modeling benchmark designed to bring new, challenging data for accuracy-based reward model evaluation -- models score about 20 points on average lower on RewardBench 2 compared to the first RewardBench -- while being highly correlated with downstream performance. Compared to most other benchmarks, RewardBench 2 sources new human prompts instead of existing prompts from downstream evaluations, facilitating more rigorous evaluation practices. In this paper, we describe our benchmark construction process and report how existing models perform on it, while quantifying how performance on the benchmark correlates with downstream use of the models in both inference-time scaling algorithms, like best-of-N sampling, and RLHF training algorithms like proximal policy optimization.
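The best-of-N sampling mentioned above can be sketched in a few lines: generate N candidate responses, score each with a reward model, and keep the top-scoring one. This is only a minimal illustration. The `toy_reward` function is a hypothetical stand-in for a learned reward model (it simply counts distinct words), and the candidate strings are made up; neither comes from RewardBench 2.

```python
def best_of_n(candidates, reward_fn):
    """Return the candidate with the highest reward score."""
    return max(candidates, key=reward_fn)


def toy_reward(response):
    # Hypothetical stand-in for a reward model: counts distinct words.
    # A real reward model would be a learned scorer, not a heuristic.
    return len(set(response.lower().split()))


candidates = [
    "yes",
    "yes because the sum is even",
    "no",
]
chosen = best_of_n(candidates, toy_reward)
```

Under this scheme, the quality of the selected response depends entirely on how well the reward scores track real preferences, which is why reward model accuracy matters for inference-time scaling.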
[642] arXiv:2506.03374 (replaced) [pdf, html, other]: Title: Product Quantization for Surface Soil Similarity

Haley Dozier, Althea Henslee, Ashley Abraham, Andrew Strelzoff, Mark Chappell

Comments: To be published in the CSCE 2022 proceedings

Subjects: Machine Learning (cs.LG)

The use of machine learning (ML) techniques has allowed rapid advancements in many scientific and engineering fields. One such field is surface soil taxonomy, a research area previously hindered by its reliance on human-derived classifications, which mostly divide a dataset according to historical understandings of the data rather than data-driven, statistically observable similarities. Using an ML-based taxonomy allows soil researchers to move beyond the limitations of human visualization and create classifications of high-dimensional datasets with a much higher level of specificity than is possible with hand-drawn taxonomies. Furthermore, an ML-based pipeline makes it possible to produce highly accurate and flexible soil taxonomies with classes built to fit a specific application. The machine learning pipeline outlined in this work combines product quantization with the systematic evaluation of parameters and output to get the best available results, rather than accepting sub-optimal results from default or best-guess settings.
[643] arXiv:2506.04292 (replaced) [pdf, html, other]: Title: GARG-AML against Smurfing: A Scalable and Interpretable Graph-Based Framework for Anti-Money Laundering

Bruno Deprez, Bart Baesens, Tim Verdonck, Wouter Verbeke

Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Applications (stat.AP)

Purpose: We introduce GARG-AML, a fast and transparent graph-based method to catch 'smurfing', a common money-laundering tactic. It assigns a single, easy-to-understand risk score to every account in both directed and undirected networks. Unlike overly complex models, it balances detection power with the speed and clarity that investigators require.
Methodology: The method maps an account's immediate and secondary connections (its second-order neighbourhood) into an adjacency matrix. By measuring the density of specific blocks within this matrix, GARG-AML flags patterns that mimic smurfing behaviour. We further boost the model's performance using decision trees and gradient-boosting classifiers, testing the results against current state-of-the-art on both synthetic and open-source data.
Findings: GARG-AML matches or beats state-of-the-art performance across all tested datasets. Crucially, it easily processes the massive transaction graphs typical of large financial institutions. By leveraging only the adjacency matrix of the second-order neighbourhood and basic network features, this work highlights the potential of fundamental network properties towards advancing fraud detection.
Originality: The originality lies in the translation of human expert knowledge of smurfing directly into a simple network representation, rather than relying on uninterpretable deep learning. Because GARG-AML is built expressly for the real-world business demands of scalability and interpretability, banks can easily incorporate it in their existing AML solutions.
[644] arXiv:2506.04634 (replaced) [pdf, other]: Title: Incentivizing Collaboration for Detection of Credential Database Breaches

Mridu Nanda, Michael K. Reiter

Subjects: Cryptography and Security (cs.CR)

Decoy passwords, or "honeywords," alert a site to its breach if entered in a login attempt on that site. However, an attacker can identify a user-chosen password from among the decoys, without alerting the site to its breach, via credential stuffing, i.e., entering the stolen passwords at another site where a user reused her password. Prior work thus proposed that sites monitor for the entry of their honeywords at other sites, but the incentives for sites to participate in this monitoring remain unclear. In this paper, we propose and evaluate an algorithm by which sites can exchange monitoring favors. Through a model-checking analysis, we show that a site can improve its ability to detect its own breach when it increases the monitoring effort it expends for others. We quantify how key parameters impact detection effectiveness and their implications for deploying a monitoring ecosystem. Finally, we evaluate our algorithm on a breached credential dataset, demonstrating effectiveness at scale.
[645] arXiv:2506.09998 (replaced) [pdf, html, other]: Title: Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling

Tim Z. Xiao, Johannes Zenn, Zhen Liu, Weiyang Liu, Robert Bamler, Bernhard Schölkopf

Comments: Technical Report v2 (27 pages, 14 figures)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large language models (LLMs) can often accurately describe probability distributions using natural language, yet they still struggle to generate faithful samples from them. This mismatch limits their use in tasks requiring reliable stochasticity, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We investigate this gap between knowledge and sampling in the context of Bernoulli distributions. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling that prompts the LLM to reason about and accept or reject proposed samples. Despite relying on the same Bernoulli mechanism internally, VRS substantially reduces sampling bias across models. We provide theoretical analysis showing that, under mild assumptions, VRS improves over direct sampling, with gains attributable to both the algorithm and prompt design. More broadly, our results show how classical probabilistic tools can be verbalized and embedded into LLM workflows to improve reliability, without requiring access to model internals or heavy prompt engineering.
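For context, the classical rejection sampling that VRS adapts into natural language can be sketched as follows for a Bernoulli target. This is a minimal illustration of the classical algorithm only, not the authors' prompt-based procedure. The function name, the proposal bias `q`, and the chosen parameter values are assumptions made for the example.

```python
import random


def bernoulli_rejection_sample(p, q, rng):
    """Draw one sample from Bernoulli(p) using a biased Bernoulli(q) proposal.

    Assumes 0 < q < 1 so that both outcomes can be proposed.
    """
    # Envelope constant M with target(x) <= M * proposal(x) for x in {0, 1}.
    m = max(p / q, (1 - p) / (1 - q))
    while True:
        x = 1 if rng.random() < q else 0          # propose from Bernoulli(q)
        target = p if x == 1 else 1 - p
        proposal = q if x == 1 else 1 - q
        # Accept with probability target(x) / (M * proposal(x)).
        if rng.random() < target / (m * proposal):
            return x


rng = random.Random(0)
samples = [bernoulli_rejection_sample(0.3, 0.6, rng) for _ in range(20000)]
estimate = sum(samples) / len(samples)  # empirical frequency of 1s
```

Although the proposal is biased toward 1, the accept/reject step corrects the bias, so the empirical frequency lands close to the target probability of 0.3.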
[646] arXiv:2506.10374 (replaced) [pdf, html, other]: Title: Optimal Non-Adaptive Group Testing with One-Sided Error Guarantees

Daniel McMorrow, Jonathan Scarlett

Journal-ref: IEEE Transactions on Information Theory (Volume: 72, Issue: 5, May 2026)

Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

The group testing problem consists of determining a sparse subset of defective items from within a larger set of items via a series of tests, where each test outcome indicates whether at least one defective item is included in the test. We study the approximate recovery setting, where the recovery criterion of the defective set is relaxed to allow a small number of items to be misclassified. In particular, we consider one-sided approximate recovery criteria, where we allow either only false negative or only false positive misclassifications. Under false negatives only (i.e., finding a subset of defectives), we show that there exists an algorithm matching the optimal threshold of two-sided approximate recovery. Under false positives only (i.e., finding a superset of the defectives), we provide a converse bound showing that the better of two existing algorithms is optimal.
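The noiseless test model and the "superset of the defectives" criterion can be illustrated with a short simulation. The sketch below uses the standard COMP decoding rule (every item that appears in a negative test is cleared) as a simple baseline; it is not one of the algorithms analysed in the paper. The random test design, its inclusion probability of 0.25, and the problem sizes are assumptions chosen for illustration.

```python
import random


def run_tests(design, defectives):
    """A test is positive iff its pool contains at least one defective item."""
    return [any(item in defectives for item in pool) for pool in design]


def comp_decode(n_items, design, outcomes):
    """COMP-style decoding: clear every item seen in a negative test.

    All remaining items are declared defective, so in the noiseless
    setting the output is always a superset of the true defective set.
    """
    cleared = set()
    for pool, positive in zip(design, outcomes):
        if not positive:
            cleared.update(pool)
    return set(range(n_items)) - cleared


rng = random.Random(1)
n_items, n_tests = 100, 40
defectives = {3, 41, 77}
# Non-adaptive design: all pools are fixed before any outcome is observed.
design = [
    {i for i in range(n_items) if rng.random() < 0.25}
    for _ in range(n_tests)
]
outcomes = run_tests(design, defectives)
estimate = comp_decode(n_items, design, outcomes)
```

Because a defective item can never appear in a negative test, this decoder makes false positive errors only, which matches the second one-sided criterion described above.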
[647] arXiv:2506.12721 (replaced) [pdf, html, other]: Title: Strategic Scaling of Test-Time Compute: A Bandit Learning Approach

Bowen Zuo, Yinglun Zhu

Comments: To appear at ICLR 2026

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To address this inefficiency, we formulate test-time compute allocation as a novel bandit learning problem and propose adaptive algorithms that estimate query difficulty on the fly and allocate compute accordingly. Compared to uniform allocation, our algorithms allocate more compute to challenging queries while maintaining accuracy on easier ones. Among challenging queries, our algorithms further learn to prioritize solvable instances, effectively reducing excessive computing on unsolvable queries. We theoretically prove that our algorithms achieve better compute efficiency than uniform allocation and empirically validate their effectiveness on math and code benchmarks. Specifically, our algorithms achieve up to an 11.10% performance improvement (15.04% relative) on the MATH-500 dataset, up to 10.82% (14.44% relative) on the AIME25 dataset, and up to an 11.23% performance improvement (15.29% relative) on the LiveCodeBench dataset.
[648] arXiv:2506.19579 (replaced) [pdf, html, other]: Title: Fake or Real, Can Robots Tell? Evaluating VLM Robustness to Domain Shift in Single-View Robotic Scene Understanding

Federico Tavella, Amber Drinkwater, Angelo Cangelosi

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Robotic scene understanding increasingly relies on Vision-Language Models (VLMs) to generate natural language descriptions of the environment. In this work, we systematically evaluate single-view object captioning for tabletop scenes captured by a robotic manipulator, introducing a controlled physical domain shift that contrasts real-world tools with geometrically similar 3D-printed counterparts that differ in texture, colour, and material. We benchmark a suite of state-of-the-art, locally deployable VLMs across multiple metrics to assess semantic alignment and factual grounding. Our results demonstrate that while VLMs describe common real-world objects effectively, performance degrades markedly on 3D-printed items despite their structurally familiar forms. We further expose critical vulnerabilities in standard evaluation metrics, showing that some fail to detect domain shifts entirely or reward fluent but factually incorrect captions. These findings highlight the limitations of deploying foundation models for embodied agents and the need for more robust architectures and evaluation protocols in physical robotic applications.
[649] arXiv:2506.21546 (replaced) [pdf, other]: Title: Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination

Xinzhuo Li, Adheesh Juvekar, Jiaxun Zhang, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Yifan Shen, Tianjiao Yu, Ismini Lourentzou

Comments: Project webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Segmentation Vision-Language Models (VLMs) have significantly advanced grounded visual understanding, yet they remain prone to pixel-grounding hallucinations, producing masks for incorrect objects or for objects that are entirely absent. Existing evaluations rely almost entirely on text- or label-based perturbations, which check only whether the predicted mask matches the queried label. Such evaluations overlook the spatial footprint and severity of hallucination and therefore fail to reveal vision-driven hallucinations, which are more challenging and more prevalent. To address this gap, we formalize the task of Counterfactual Segmentation Reasoning (CSR), where a model must segment the referenced object in the factual image and abstain in its counterfactual counterpart. To support this task, we curate HalluSegBench, the first large-scale benchmark to diagnose referring and reasoning expression segmentation hallucinations using controlled visual counterfactuals, alongside new evaluation metrics that measure hallucination severity and disentangle vision- and language-driven failure modes. We further introduce RobustSeg, a segmentation VLM trained with counterfactual fine-tuning (CFT) to learn when to segment and when to abstain. Experimental results confirm RobustSeg reduces hallucinations by 30%, while improving segmentation performance on FP-RefCOCO(+/g).
[650] arXiv:2507.01829 (replaced) [pdf, other]: Title: mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

Tristan Torchet, Christian Metzner, Karthik Charan Raghunathan, Jimmy Weber, Sebastian Billaudelle, Laura Kriener, Melika Payvand

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Multi-timescale sequence modeling relies on capturing both local fast dynamics and global slow context; yet, maintaining these capabilities under the strict memory constraints common to edge devices remains an open challenge. Current State-of-the-Art models with constant memory footprints trade off long-range selectivity and high-precision modeling of fast dynamics. To overcome this trade-off within a fixed memory budget, we propose mGRADE (minimally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that introduces inductive biases across timescales by integrating a convolution with learnable temporal spacings with a lightweight gated recurrent component. We show theoretically that the learnable spacings are equivalent to a delay embedding, enabling parameter-efficient reconstruction of partially-observed fast dynamics, while the gated recurrent component selectively maintains long-range context with minimal memory overhead. On the challenging Long-Range Arena benchmark and 35-way Google Speech Commands raw audio classification task, mGRADE reduces the memory footprint by up to a factor of 8 compared to other State-of-the-Art models, while maintaining competitive performance.
[651] arXiv:2507.03806 (replaced) [pdf, html, other]: Title: Certified Coil Geometry Learning for Short-Range Magnetic Actuation and Spacecraft Docking Application

Yuta Takahashi, Hayate Tajima, Shin-ichiro Sakai

Comments: IEEE Robotics and Automation Letters. Preprint Version. Accepted March, 2026 (DOI: this https URL)

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

This paper presents a learning-based framework for approximating an exact magnetic-field interaction model, supported by both numerical and experimental validation. High-fidelity magnetic-field interaction modeling is essential for achieving exceptional accuracy and responsiveness across a wide range of fields, including transportation, energy systems, medicine, biomedical robotics, and aerospace robotics. In aerospace engineering, magnetic actuation has been investigated as a fuel-free solution for multi-satellite attitude and formation control. Although the exact magnetic field can be computed from the Biot-Savart law, the associated computational cost is prohibitive, and prior studies have therefore relied on dipole approximations to improve efficiency. However, these approximations lose accuracy during proximity operations, leading to unstable behavior and even collisions. To address this limitation, we develop a learning-based approximation framework that faithfully reproduces the exact field while dramatically reducing computational cost. This framework directly derives a coefficient matrix that maps inter-satellite current vectors to the resulting forces and torques, enabling efficient computation of control current commands. The proposed method additionally provides a certified error bound, derived from the number of training samples, ensuring reliable prediction accuracy. The learned model can also accommodate interactions between coils of different sizes through appropriate geometric transformations, without retraining. To verify the effectiveness of the proposed framework under challenging conditions, a spacecraft docking scenario is examined through both numerical simulations and experimental validation.
[652] arXiv:2507.03933 (replaced) [pdf, other]: Title: Losing our Tail, Again: (Un)Natural Selection & Multilingual LLMs

Eva Vanmassenhove

Comments: 12 pages

Subjects: Computation and Language (cs.CL)

Multilingual Large Language Models have considerably changed how technologies influence language. While previous technologies could mediate or assist humans, there is now a tendency to offload the task of writing itself to these technologies, enabling models to change our languages more directly. While they provide us with quick access to information and impressively fluent output, beneath their (apparent) sophistication lies a subtle, insidious threat: the gradual decline and loss of linguistic diversity. In this position paper, I explore how model collapse, with a particular focus on translation technology, can lead to the loss of linguistic forms, grammatical features, and cultural nuance. Model collapse refers to the consequences of self-consuming training loops, where automatically generated data (re-)enters the training data, leading to a gradual distortion of the data distribution and the underrepresentation of low-probability linguistic phenomena. Drawing on recent work in Computer Vision, Natural Language Processing and Machine Translation, I argue that the many tails of our linguistic distributions might be vanishing, and with them, the narratives and identities they carry. This paper is a call to resist linguistic flattening and to reimagine Natural Language Processing as a field that encourages, values and protects expressive multilingual diversity and creativity.
[653] arXiv:2507.04023 (replaced) [pdf, html, other]: Title: Do LLMs Overthink Basic Math Reasoning? Benchmarking the Accuracy-Efficiency Tradeoff in Language Models

Gaurav Srivastava, Aafiya Hussain, Sriram Srinivasan, Xuan Wang

Comments: Accepted to ACL 2026 Findings

Subjects: Computation and Language (cs.CL)

Large language models (LLMs) achieve impressive performance on complex mathematical benchmarks yet sometimes fail on basic math reasoning while generating unnecessarily verbose responses. In this paper, we present LLMThinkBench, a systematic benchmark and comprehensive empirical study to evaluate the efficiency of reasoning in LLMs, focusing on the fundamental tradeoff between accuracy and overthinking. First, we formalize the accuracy-verbosity tradeoff. Second, we introduce the Overthinking Score, a harmonic-mean metric combining accuracy and token-efficiency for holistic model evaluation. Third, we establish an evaluation protocol with dynamically-generated data across 14 basic math tasks. Fourth, we conduct a large-scale empirical study evaluating 53 LLMs, including reasoning and quantized variants across different reasoning budgets. Fifth, we release LLMThinkBench as an open-source Python package and public leaderboard for reproducibility. Our findings reveal: 1) model performance on complex benchmarks does not translate directly to basic math reasoning; 2) reasoning models generate ~18x more tokens while sometimes achieving lower accuracy and exhibit catastrophic collapse when tokens are constrained, dropping by up to ~36%; 3) the accuracy-verbosity relationship is non-monotonic with extended reasoning budgets yielding diminishing returns (GPT-5/o-series models show zero accuracy gain from low -> medium -> high reasoning effort). Our findings challenge the assumption that longer reasoning in LLMs necessarily improves mathematical reasoning. Our public leaderboard is available at this https URL. Our open-source Python package is available at this https URL, and the codebase can be found at this https URL for easy and reproducible evaluation.
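To see why a harmonic mean suits a combined accuracy and efficiency metric, consider the sketch below. It shows only the generic harmonic mean of two values in [0, 1]. How the paper normalizes token efficiency is not stated in the abstract, so the `efficiency` inputs here are hypothetical numbers and the function name is an assumption.

```python
def harmonic_mean(accuracy, efficiency):
    """Harmonic mean of two scores in [0, 1]; returns 0.0 if both are zero."""
    total = accuracy + efficiency
    if total == 0:
        return 0.0
    return 2 * accuracy * efficiency / total


# Hypothetical models: one accurate but verbose, one balanced.
verbose_model = harmonic_mean(0.9, 0.1)
balanced_model = harmonic_mean(0.8, 0.8)
```

Unlike an arithmetic mean, the harmonic mean is pulled toward the smaller of its two inputs, so a model cannot compensate for very low token efficiency with high accuracy alone.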
[654] arXiv:2507.05385 (replaced) [pdf, html, other]: Title: EduCoder: An Open-Source Annotation System for Education Transcript Data

Guanzhong Pan, Mei Tan, Hyunji Nam, Lucía Langlois, James Malamut, Liliana Deonizio, Dorottya Demszky

Subjects: Computation and Language (cs.CL)

We introduce EduCoder, a domain-specialized tool designed to support utterance-level annotation of educational dialogue. While general-purpose text annotation tools for NLP and qualitative research abound, few address the complexities of coding education dialogue transcripts -- with diverse teacher-student and peer interactions. Common challenges include defining codebooks for complex pedagogical features, supporting both open-ended and categorical coding, and contextualizing utterances with external features, such as the lesson's purpose and the pedagogical value of the instruction. EduCoder is designed to address these challenges by providing a platform for researchers and domain experts to collaboratively define complex codebooks based on observed data. It incorporates both categorical and open-ended annotation types along with contextual materials. Additionally, it offers a side-by-side comparison of multiple annotators' responses, allowing comparison and calibration of annotations with others to improve data reliability. The system is open-source, with a demo video available.
[655] arXiv:2507.06721 (replaced) [pdf, other]: Title: Faster Algorithms for $(2k-1)$-Stretch Distance Oracles

Avi Kadria, Liam Roditty

Subjects: Data Structures and Algorithms (cs.DS)

Let $G=(V, E)$ be an undirected graph with $n$ vertices, $m$ edges, and non-negative edge weights. In this paper, we present three new algorithms for constructing a $(2k-1)$-stretch distance oracle with $O(n^{1+\frac{1}{k}})$ space. The first algorithm runs in $\tilde{O}(\max(n^{1+2/k}, m^{1-\frac{1}{k-1}}n^{\frac{2}{k-1}}))$ time, and improves upon the $\tilde{O}(\min(mn^{\frac{1}{k}},n^2))$ time of Thorup and Zwick [STOC 2001, JACM 2005] and Baswana and Kavitha [FOCS 2006, SICOMP 2010], for every $k > 2$ and $m=\Omega(n^{1+\frac{1}{k}+\varepsilon})$. This yields the first truly subquadratic time construction for every $2 < k < 6$, and nearly resolves the open problem posed by Wulff-Nilsen [SODA 2012] on the existence of such constructions.
The two other algorithms have a running time of the form $\tilde{O}(m+n^{1+f(k)})$, which is near linear in $m$ if $m=\Omega(n^{1+f(k)})$, and therefore optimal in such graphs. One algorithm runs in $\tilde{O}(m+n^{\frac32+\frac{3}{4k-6}})$ time, which improves upon the $\tilde{O}(n^2)$-time algorithm of Baswana and Kavitha [FOCS 2006, SICOMP 2010], for $3 < k < 6$, and upon the $\tilde{O}(m+n^{\frac{3}{2}+\frac{2}{k}+O(k^{-2})})$-time algorithm of Wulff-Nilsen [SODA 2012], for every $k\geq 6$. This is the first linear time algorithm for constructing a $7$-stretch distance oracle and a $9$-stretch distance oracle, for graphs with truly subquadratic density (that is, with $m=n^{2-\varepsilon}$ for some $\varepsilon > 0$). The other algorithm runs in $\tilde{O}(\sqrt{k}m+kn^{1+\frac{2\sqrt{2}}{\sqrt{k}}})$ time (and is hence relevant only for $k\ge 16$), and improves upon the $\tilde{O}(\sqrt{k}m+kn^{1+\frac{2\sqrt{6}}{\sqrt{k}}+O(k^{-1})})$-time algorithm of Wulff-Nilsen [SODA 2012] (which is relevant only for $k\ge 96$). ...
[656] arXiv:2507.08717 (replaced) [pdf, html, other]: Title: Knowledge Graph-Based approach for Sustainable 6G End-to-End System Design

Akshay Jain, Sylvaine Kerboeuf, Sokratis Barmpounakis, Cristóbal Vinagre Z., Stefan Wendt, Dinh Thai Bui, Pol Alemany, Riccardo Nicolicchia, José María Jorquera Valero, Dani Korpi, Mohammad Hossein Moghaddam, Mikko A. Uusitalo, Patrik Rugeland, Abdelkader Outtagarts, Karthik Upadhya, Panagiotis Demestichas, Raul Muñoz, Manuel Gil Pérez, Daniel Adanza, Ricard Vilalta

Comments: The paper has been accepted for publication in IEEE Open Journal of the Communications Society (IEEE OJCOMS)

Subjects: Networking and Internet Architecture (cs.NI)

Previous generations of cellular communication, such as 5G, were designed with the objective of improving key performance indicators (KPIs) such as throughput and latency. However, to meet the evolving KPI demands and the ambitious sustainability targets for the Information and Communication Technology (ICT) industry, 6G will need to be designed differently. 6G will need to consider both the performance and sustainability targets for the various use cases it will serve. In addition, 6G will have various candidate technological enablers, making the design space of the system even more complex. Furthermore, due to the subjective nature of sustainability indicators, especially social sustainability, the literature still lacks clear methods to link them with technical enablers and 6G system design. Hence, this article introduces a novel method for 6G end-to-end (E2E) system design based on knowledge graphs (KGs). It takes as input the use case KPIs, the use case sustainability requirements expressed as Key Values (KVs) and KV Indicators (KVIs), the ability of the technological enablers to satisfy these KPIs and KVIs, the 6G system design principles defined in the Hexa-X-II project, the maturity of each technological enabler, and the dependencies between the various enablers. The KG method also introduces a novel approach for determining the key values addressed by a technological enabler. The effectiveness of the KG method is demonstrated by applying it to design the 6G E2E system for the cooperating mobile robot use case defined in the Hexa-X-II project, where 82 enablers were selected. Lastly, results from proof-of-concept demonstrations for a subset of the selected enablers are provided, which reinforce the efficacy of the KG method for designing a sustainable 6G system.
[657] arXiv:2507.14491 (replaced) [pdf, html, other]: Title: Artifacts of Numerical Integration in Learning Dynamical Systems

Bing-Ze Lu, Richard Tsai

Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

In many applications, one needs to learn a dynamical system from its solutions sampled at a finite number of time points. The learning problem is often formulated as an optimization problem over a chosen function class. However, in the optimization procedure, prediction data from generic dynamics requires a numerical integrator to assess the mismatch with the observed data. This paper reveals potentially serious effects of a chosen numerical scheme on the learning outcome. Specifically, the analysis demonstrates that a damped oscillatory system may be incorrectly identified as having "anti-damping" and exhibiting a reversed oscillation direction, even though it adequately fits the given data points. This paper shows that the stability region of the selected integrator will distort the nature of the learned dynamics. Crucially, reducing the step size or raising the order of an explicit integrator does not, in general, remedy this artifact, because higher-order explicit methods have stability regions that extend further into the right half complex plane. Furthermore, it is shown that the implicit midpoint method can preserve either conservative or dissipative properties from discrete data, offering a principled integrator choice even when the only prior knowledge is that the system is autonomous.
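The role of an integrator's stability behaviour can be seen on the linear test equation $y' = \lambda y$ with a damped oscillatory mode. The sketch below compares the one-step amplification factors of forward Euler and the implicit midpoint rule. It is a simplified illustration of how a numerical scheme can misrepresent damping, not a reproduction of the paper's learning setup. The eigenvalue and step size are assumed values chosen for the example.

```python
def forward_euler_factor(lam, h):
    """One-step amplification factor of forward Euler for y' = lam * y."""
    return 1 + h * lam


def implicit_midpoint_factor(lam, h):
    """One-step amplification factor of the implicit midpoint rule."""
    return (1 + h * lam / 2) / (1 - h * lam / 2)


# Damped oscillation: negative real part, so the true solution decays.
lam = complex(-0.1, 1.0)
h = 0.5  # assumed step size

euler_growth = abs(forward_euler_factor(lam, h))
midpoint_growth = abs(implicit_midpoint_factor(lam, h))
```

With these values the forward Euler factor has modulus greater than 1, so the discrete trajectory grows even though the continuous system is damped, while the implicit midpoint factor has modulus less than 1 for any eigenvalue with negative real part.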
[658] arXiv:2507.15753 (replaced) [pdf, other]: Title: Algebraic Language Models for Inverse Design of Metamaterials via Diffusion Transformers

Li Zheng, Siddhant Kumar, Dennis M. Kochmann

Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)

Generative machine learning models have revolutionized material discovery by capturing complex structure-property relationships, yet extending these approaches to the inverse design of three-dimensional metamaterials remains limited by computational complexity and underexplored design spaces due to the lack of expressive representations. Here we present DiffuMeta, a generative framework integrating diffusion transformers with an algebraic language representation, encoding three-dimensional geometries as mathematical sentences. This compact, unified parameterization spans diverse topologies, enabling the direct application of transformers to structural design. DiffuMeta leverages diffusion models to generate new shell structures with precisely targeted stress-strain responses under large deformations, accounting for buckling and contact while addressing the inherent one-to-many mapping by producing diverse solutions. Uniquely, our approach enables simultaneous control over multiple mechanical objectives, including linear and nonlinear responses beyond training domains. Experimental validation of fabricated structures further confirms the efficacy of our approach for accelerated design of metamaterials and structures with tailored properties.
[659] arXiv:2507.20619 (replaced) [pdf, html, other]: Title: Generating Project-Specific Test Cases with Requirement Validation Intention

Binhang Qi, Yun Lin, Xinyi Weng, Yuhuan Huang, Chenyan Liu, Hailong Sun, Zhi Jin, Jin Song Dong

Comments: Accepted at ISSTA 2026

Subjects: Software Engineering (cs.SE)

Test cases are valuable assets for maintaining software quality. State-of-the-art automated test generation techniques typically focus on maximizing program branch coverage or translating focal methods into test code. However, in contrast to branch coverage or code-to-test translation, practical tests are written out of the need to validate whether a requirement has been fulfilled. Specifically, each test usually reflects a developer's validation intention for a program function, regarding (1) what is the test scenario of a program function? and (2) what is the expected behavior under such a scenario? Without taking such intention into account, generated tests are less likely to be adopted in practice. In this work, we propose IntentionTest, which generates project-specific tests given the description of validation intention. IntentionTest follows a retrieval-and-edit approach. First, given focal code and a description of validation intention consisting of a test objective with test precondition and expected results, IntentionTest retrieves a reusable test in the project as the test reference. Then, IntentionTest edits the test reference with an LLM regarding the validation intention toward the target test. We extensively evaluate IntentionTest against four baselines on 3,680 test cases. Compared to state-of-the-art baselines, IntentionTest can (1) generate tests far more semantically relevant to ground-truth tests by (i) killing 28.1% to 37.6% more common mutants and (ii) sharing 16.9% to 23.9% more common coverage; and (2) generate 23.7% to 49.0% more successful passing tests.
[660] arXiv:2508.01302 (replaced) [pdf, html, other]: Title: Aligning Language Models with Real-time Knowledge Editing

Chenming Tang, Yutong Yang, Kexue Wang, Yunfang Wu

Comments: Accepted to ACL 2026 (main conference)

Subjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE)

Knowledge editing aims to modify outdated knowledge in language models efficiently while retaining their original capabilities. Mainstream datasets for knowledge editing are predominantly static and fail to keep pace with evolving real-world knowledge. In this work, we introduce CRAFT, an ever-evolving real-world dataset for knowledge editing. It evaluates models on temporal locality, common-sense locality, composite portability and alias portability, providing a comprehensive and challenging evaluation for knowledge editing, on which previous methods hardly achieve balanced performance. Towards flexible real-time knowledge editing, we propose KEDAS, a novel paradigm of knowledge editing alignment featuring diverse edit augmentation and self-adaptive post-alignment inference, which exhibits significant performance gains on both CRAFT and traditional datasets compared to previous methods. We hope this work may serve as a catalyst for shifting the focus of knowledge editing from static update to dynamic evolution.
[661] arXiv:2508.06283 (replaced) [pdf, html, other]: Title: Situationally-aware Path Planning Exploiting 3D Scene Graphs

Saad Ejaz, Marco Giberna, Muhammad Shaheer, Jose Andres Millan-Romera, Ali Tourani, Paul Kremer, Holger Voos, Jose Luis Sanchez-Lopez

Subjects: Robotics (cs.RO)

3D Scene Graphs integrate both metric and semantic information, yet their structure remains underutilized for improving path planning efficiency and interpretability. In this work, we present S-Path, a situationally-aware path planner that leverages the metric-semantic structure of indoor 3D Scene Graphs to significantly enhance planning efficiency. S-Path follows a two-stage process: it first performs a search over a semantic graph derived from the scene graph to yield a human-understandable high-level path. This also identifies relevant regions for planning, which later allows the decomposition of the problem into smaller, independent subproblems that can be solved in parallel. We also introduce a replanning mechanism that, in the event of an infeasible path, reuses information from previously solved subproblems to update semantic heuristics and prioritize reuse to further improve the efficiency of future planning attempts. Extensive experiments on both real-world and simulated environments show that S-Path achieves average reductions of 6x in planning time while maintaining comparable path optimality to classical sampling-based planners and surpassing them in complex scenarios, making it an efficient and interpretable path planner for environments represented by indoor 3D Scene Graphs. Code available at: this https URL
[662] arXiv:2508.07605 (replaced) [pdf, html, other]: Title: Coordinated Power Management on Heterogeneous Systems

Zhong Zheng, Zhiling Lan, Xingfu Wu, Valerie E. Taylor, Michael E. Papka

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Performance prediction is essential for energy-efficient computing in heterogeneous computing systems that integrate CPUs and GPUs. However, traditional performance modeling methods often rely on exhaustive offline profiling, which becomes impractical due to the large setting space and the high cost of profiling large-scale applications. In this paper, we present OPEN, a framework consisting of offline and online phases. The offline phase involves building a performance predictor and constructing an initial dense matrix. In the online phase, OPEN performs lightweight online profiling and leverages the performance predictor with collaborative filtering to make performance predictions. We evaluate OPEN on multiple heterogeneous systems, including those equipped with A100 and A30 GPUs. Results show that OPEN achieves prediction accuracy up to 98.29%. This demonstrates that OPEN effectively reduces profiling cost while maintaining high accuracy, making it practical for power-aware performance modeling in modern HPC environments. Overall, OPEN provides a lightweight solution for performance prediction under power constraints, enabling better runtime decisions in power-aware computing environments.
[663] arXiv:2508.10177 (replaced) [pdf, html, other]: Title: KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems

Stepan Kulibaba, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gasnikov, Aleksei Shpilman

Subjects: Artificial Intelligence (cs.AI)

Recent Large Language Model (LLM)-based AutoML systems demonstrate impressive capabilities but face significant limitations such as constrained exploration strategies and a severe execution bottleneck. Exploration is hindered by one-shot methods lacking diversity and Monte Carlo Tree Search (MCTS) approaches that fail to recombine strong partial solutions. The execution bottleneck arises from lengthy code validation cycles that stifle iterative refinement. To overcome these challenges, we introduce KompeteAI, a novel AutoML framework with dynamic solution space exploration. Unlike previous MCTS methods that treat ideas in isolation, KompeteAI introduces a merging stage that composes top candidates. We further expand the hypothesis space by integrating Retrieval-Augmented Generation (RAG), sourcing ideas from Kaggle notebooks and arXiv papers to incorporate real-world strategies. KompeteAI also addresses the execution bottleneck via a predictive scoring model and an accelerated debugging method, assessing solution potential using early-stage metrics to avoid costly full-code execution. This approach accelerates pipeline evaluation 6.9 times. KompeteAI outperforms leading methods (e.g., RD-agent, AIDE, and Ml-Master) by an average of 3% on the primary AutoML benchmark, MLE-Bench. Additionally, we propose Kompete-bench to address limitations in MLE-Bench, where KompeteAI also achieves state-of-the-art results.
[664] arXiv:2508.11354 (replaced) [pdf, other]: Title: FunduSegmenter: Leveraging the RETFound Foundation Model for Joint Optic Disc and Optic Cup Segmentation in Retinal Fundus Images

Zhenyi Zhao, Muthu Rama Krishnan Mookiah, Emanuele Trucco

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Purpose: This study introduces the first adaptation of RETFound for joint optic disc (OD) and optic cup (OC) segmentation. RETFound is a well-known foundation model developed for fundus camera and optical coherence tomography images, which has shown promising performance in disease diagnosis. Methods: We propose FunduSegmenter, a model integrating a series of novel modules with RETFound, including a Pre-adapter, a Decoder, a Post-adapter, skip connections with Convolutional Block Attention Module and a Vision Transformer block adapter. The model is evaluated on a proprietary dataset, GoDARTS, and four public datasets, IDRiD, Drishti-GS, RIM-ONE-r3, and REFUGE, through internal verification, external verification and domain generalization experiments. Results: An average Dice similarity coefficient of 90.51% was achieved in internal verification, which outperformed all baselines, some substantially (nnU-Net: 82.91%; DUNet: 89.17%; TransUNet: 87.91%). In all external verification experiments, the average results were about 3% higher than those of the best baseline, and our model was also competitive in domain generalization. Conclusions: This study explored the potential of the latent general representations learned by RETFound for OD and OC segmentation in fundus camera images. Our FunduSegmenter generally outperformed state-of-the-art baseline methods. The proposed modules are general and can be extended to fine-tuning other foundation models. Translational Relevance: The model shows strong stability and generalization on both in-distribution and out-of-distribution data, providing stable OD and OC segmentation. This is an essential step for many automated tasks, from setting the accurate retinal coordinate to biomarker discovery. The code and trained weights are available at: this https URL.
[665] arXiv:2508.14017 (replaced) [pdf, html, other]: Title: Analog computation with transcriptional networks

David Doty, Mina Latifi, David Soloveichik

Subjects: Computational Complexity (cs.CC); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

Transcriptional networks represent one of the most extensively studied types of systems in synthetic biology. Although the completeness of transcriptional networks for digital logic is well-established, *analog* computation plays a crucial role in biological systems and offers significant potential for synthetic biology applications. While transcriptional circuits typically rely on cooperativity and highly non-linear behavior of transcription factors to regulate *production* of proteins, they are often modeled with simple linear *degradation* terms. In contrast, general analog dynamics require both non-linear positive as well as negative terms, seemingly necessitating control over not just transcriptional (i.e., production) regulation but also the degradation rates of transcription factors.
Surprisingly, we prove that controlling transcription factor production (i.e., transcription rate) without explicitly controlling degradation is mathematically complete for analog computation, achieving equivalent capabilities to systems where both production and degradation are programmable. We demonstrate our approach on several examples including oscillatory and chaotic dynamics, analog sorting, memory, PID controller, and analog extremum seeking. Our result provides a systematic methodology for engineering novel analog dynamics using synthetic transcriptional networks without the added complexity of degradation control and informs our understanding of the capabilities of natural transcriptional circuits.
We provide a compiler, in the form of a Python package that can take any system of polynomial ODEs and convert it to an equivalent transcriptional network implementing the system *exactly*, under appropriate conditions.
[666] arXiv:2508.15840 (replaced) [pdf, html, other]: Title: Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution

Robert Dilworth

Comments: 33 pages, 7 figures, 3 tables

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Information Retrieval (cs.IR)

When using a public communication channel--whether formal or informal, such as commenting or posting on social media--end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes utmost precautions to anonymize their online presence--using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting--one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment or accidental self-exposure, there should be little evidence to validate their actual identity, right? Wrong. The content of their message--necessarily open for public consumption--exposes an attack vector: stylometric analysis, or author profiling. In this paper, we dissect the technique of stylometry, discuss an antithetical counter-strategy in adversarial stylometry, and devise enhancements through Unicode steganography.
[667] arXiv:2508.19353 (replaced) [pdf, html, other]: Title: Efficient Multi-Source Knowledge Transfer by Model Merging

Marcin Osial, Bartosz Wójcik, Bartosz Zieliński, Sebastian Cygert

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

While transfer learning is an effective strategy, it often overlooks the opportunity to leverage knowledge from numerous available models online. Addressing this multi-source transfer learning problem is a promising path to boost adaptability and cut re-training costs. However, existing methods remain inherently coarse-grained: they lack the precision needed for fine-grained knowledge extraction as well as the scalability required to aggregate knowledge from either large numbers of source models or models with high parameter counts. We address these limitations by leveraging Singular Value Decomposition (SVD) to first decompose each source model into its elementary, rank-one components. A subsequent aggregation stage then selects only the most salient components from all sources, thereby overcoming the previous efficiency and precision limitations. To best preserve and leverage the synthesized knowledge base, our method adapts to the target task by fine-tuning only the principal singular values of the merged matrix. In essence, this process recalibrates the importance of top SVD components. The proposed framework allows for efficient and scalable multi-source transfer learning in both vision and language domains, while remaining robust to perturbations in both the input space and the parameter space.
[668] arXiv:2508.21720 (replaced) [pdf, other]: Title: PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

Jiho Choi, Seojeong Park, Seongjong Song, Hyunjung Shim

Comments: ACL 2026

Subjects: Artificial Intelligence (cs.AI)

Automating scientific poster generation requires hierarchical document understanding and coherent content-layout planning. Existing methods often rely on flat summarization or optimize content and layout separately. As a result, they often suffer from information loss, weak logical flow, and poor visual balance. We present PosterForest, a training-free framework for scientific poster generation. Our method introduces the Poster Tree, a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels. Building on this representation, content and layout agents perform hierarchical reasoning and recursive refinement, progressively optimizing the poster from global organization to local composition. This joint optimization improves semantic coherence, logical flow, and visual harmony. Experiments show that PosterForest outperforms prior methods in both automatic and human evaluations, without additional training or domain-specific supervision.
[669] arXiv:2509.03294 (replaced) [pdf, html, other]: Title: A Comprehensive Guide to Differential Privacy: From Theory to User Expectations

Napsu Karmitsa, Antti Airola, Tapio Pahikkala, Tinja Pitkämäki

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The increasing availability of personal data has enabled significant advances in fields such as machine learning, healthcare, and cybersecurity. However, this data abundance also raises serious privacy concerns, especially in light of powerful re-identification attacks and growing legal and ethical demands for responsible data use. Differential privacy (DP) has emerged as a principled, mathematically grounded framework for mitigating these risks. This review provides a comprehensive survey of DP, covering its theoretical foundations, practical mechanisms, and real-world applications. It explores key algorithmic tools and domain-specific challenges - particularly in privacy-preserving machine learning and synthetic data generation. The report also highlights usability issues and the need for improved communication and transparency in DP systems. Overall, the goal is to support informed adoption of DP by researchers and practitioners navigating the evolving landscape of data privacy.
[670] arXiv:2509.08537 (replaced) [pdf, html, other]: Title: A posteriori error analysis and adaptivity of a space-time finite element method for the wave equation in second order formulation

Zhaonan Dong, Emmanuil H. Georgoulis, Lorenzo Mascotto, Zuodong Wang

Subjects: Numerical Analysis (math.NA)

We establish rigorous \emph{a posteriori} error bounds for a space-time finite element method of arbitrary order discretising linear wave problems in second order formulation. The method combines standard finite elements in space and continuous piecewise polynomials in time with an upwind discontinuous Galerkin-type approximation for the second temporal derivative. The proposed scheme accepts dynamic mesh modification, as required by space-time adaptive algorithms, resulting in a discontinuous temporal discretisation when mesh changes occur. We prove \emph{a posteriori} error bounds in the $L^\infty(L^2)$-norm, using carefully designed temporal and spatial reconstructions; explicit control on the constants (including the spatial and temporal orders of the method) in those error bounds is shown. The convergence behaviour of an error estimator is verified numerically, also taking into account the effect of the mesh change. A space-time adaptive algorithm is proposed and tested numerically.
[671] arXiv:2509.10897 (replaced) [pdf, html, other]: Title: TV Subgradient-Guided Multi-Source Fusion for Spectral Imaging in Dual-Camera CASSI Systems

Weiqiang Zhao, Tianzhu Liu, Yuzhe Gui, Wei Bian, Yanfeng Gu

Comments: Main text: 14 pages, 12 figures; Supplementary material: 8 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

Balancing spectral, spatial, and temporal resolutions is a key challenge in spectral imaging. The Dual-Camera Coded Aperture Snapshot Spectral Imaging (DC-CASSI) system alleviates this trade-off but suffers from severely ill-posed reconstruction problems due to its high compression ratio. Existing methods are constrained by scene-specific tuning or excessive reliance on paired training data. To address these issues, we propose a Total Variation (TV) subgradient-guided multi-source fusion framework for DC-CASSI reconstruction, comprising three core components: (1) An end-to-end Single-Disperser CASSI (SD-CASSI) observation model based on the tensor-form Kronecker $\delta$, which establishes a rigorous mathematical foundation for physical constraints while enabling efficient adjoint operator implementation; (2) An adaptive spatial reference generator that integrates SD-CASSI's physical model and RGB subspace constraint, generating the reference image as reliable spatial prior; (3) A TV subgradient-guided regularization term that encodes local structural directions from the reference image into spectral reconstruction, achieving high-quality fused results. The framework is validated on simulated datasets and real-world datasets. Experimental results demonstrate that it achieves state-of-the-art reconstruction performance and robust noise resilience. This work not only establishes an interpretable theoretical foundation for subgradient-guided fusion but also provides a practical fusion-based paradigm for high-fidelity spectral image reconstruction in DC-CASSI systems. Source code: this https URL.
[672] arXiv:2509.18629 (replaced) [pdf, html, other]: Title: HyperAdapt: Simple High-Rank Adaptation

Abel Gurung, Joseph Campbell

Comments: Published in Transactions on Machine Learning Research

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Foundation models excel across diverse tasks, but adapting them to specialized applications often requires fine-tuning, an approach that is memory- and compute-intensive. Parameter-efficient fine-tuning (PEFT) methods mitigate this by updating only a small subset of weights. In this paper, we introduce HyperAdapt, a parameter-efficient fine-tuning method that significantly reduces the number of trainable parameters compared to state-of-the-art methods like LoRA. Specifically, HyperAdapt adapts a pre-trained weight matrix by applying row- and column-wise scaling through diagonal matrices, thereby inducing a high-rank update while requiring only $n+m$ trainable parameters for an $n \times m$ matrix. Theoretically, we establish an upper bound on the rank of HyperAdapt's updates, and empirically, we confirm that it consistently induces high-rank transformations across model layers. Experiments on GLUE, arithmetic reasoning, and commonsense reasoning benchmarks with models up to 14B parameters demonstrate that HyperAdapt matches or nearly matches the performance of full fine-tuning and state-of-the-art PEFT methods while using orders of magnitude fewer trainable parameters.
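The row- and column-wise scaling can be sketched in a few lines. The code below assumes the adapted matrix has the form A W B with diagonal A and B, which is one reading of the abstract rather than the authors' confirmed formulation; the matrix values, the scale vectors, and the helper names `hyperadapt_update` and `matrix_rank` are hypothetical.

```python
from fractions import Fraction

def hyperadapt_update(W, row_scale, col_scale):
    """Return A @ W @ B for diagonal A = diag(row_scale), B = diag(col_scale)."""
    return [[row_scale[i] * W[i][j] * col_scale[j] for j in range(len(W[0]))]
            for i in range(len(W))]

def matrix_rank(M):
    """Rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    rank = 0
    n_rows, n_cols = len(rows), len(rows[0])
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hypothetical frozen 3 x 4 weight matrix (n = 3 rows, m = 4 columns).
W = [[1, 2, 0, 1],
     [0, 1, 3, 2],
     [2, 0, 1, 1]]
row_scale = [2, 3, 5]      # n trainable values (diagonal of A)
col_scale = [1, 2, 1, 3]   # m trainable values (diagonal of B)

W_adapted = hyperadapt_update(W, row_scale, col_scale)
delta = [[W_adapted[i][j] - W[i][j] for j in range(4)] for i in range(3)]

trainable = len(row_scale) + len(col_scale)  # n + m
print("trainable parameters:", trainable)
print("rank of update:", matrix_rank(delta))
```

For comparison, a rank-r LoRA update on the same matrix trains r(n + m) parameters and is limited to rank r, whereas the update here is not restricted to low rank by construction.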
[673] arXiv:2509.20712 (replaced) [pdf, html, other]: Title: CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

Zhenpeng Su, Leiyu Pan, Minxuan Lv, Yuntao Li, Wenping Hu, Fuzheng Zhang, Kun Gai, Guorui Zhou

Comments: This paper has been accepted by ACL 2026

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Reinforcement learning (RL) has become a powerful paradigm for optimizing large language models (LLMs) to handle complex reasoning tasks. A core challenge in this process lies in managing policy entropy, which reflects the balance between exploration and exploitation during training. Existing methods, such as proximal policy optimization (PPO) and its variants, discard valuable gradient signals from low-probability tokens due to the clipping mechanism. We systematically analyze the entropy dynamics and reveal that these clipped tokens play a critical yet overlooked role in regulating entropy evolution. We propose Coordinating Entropy via Gradient-Preserving Policy Optimization (CE-GPPO), a novel algorithm that reintroduces gradients from clipped tokens in native PPO in a gentle and bounded manner. By controlling the magnitude of gradients from tokens outside the clipping interval, CE-GPPO is able to achieve an exploration-exploitation trade-off. We provide theoretical justification and empirical evidence showing that CE-GPPO effectively mitigates entropy instability. Extensive experiments on mathematical reasoning benchmarks show that CE-GPPO consistently outperforms strong baselines across different model scales.
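The discarded gradient signal can be seen directly in the standard PPO clipped surrogate, min(r·A, clip(r, 1-ε, 1+ε)·A), whose derivative with respect to the probability ratio r is zero once a token is clipped. The sketch below shows that behaviour and, for contrast, a purely hypothetical variant that keeps a scaled gradient for clipped tokens. The abstract does not give the CE-GPPO formula, so `bounded_grad` and its `beta` coefficient are illustrative assumptions, not the authors' algorithm.

```python
def is_clipped(ratio: float, advantage: float, eps: float) -> bool:
    """True when the clipped branch of the PPO surrogate is the active one."""
    return (advantage > 0 and ratio > 1 + eps) or (advantage < 0 and ratio < 1 - eps)

def ppo_clip_grad(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """d/d(ratio) of the standard surrogate min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    return 0.0 if is_clipped(ratio, advantage, eps) else advantage

def bounded_grad(ratio: float, advantage: float,
                 eps: float = 0.2, beta: float = 0.5) -> float:
    """Hypothetical variant: clipped tokens keep a gradient scaled by beta."""
    if is_clipped(ratio, advantage, eps):
        return beta * advantage
    return advantage

# (ratio, advantage) pairs: inside the interval, clipped high, clipped low.
for ratio, advantage in [(1.1, 1.0), (1.5, 1.0), (0.5, -2.0)]:
    print(ratio, advantage,
          ppo_clip_grad(ratio, advantage),
          bounded_grad(ratio, advantage))
```

With `beta` between 0 and 1 the clipped tokens contribute a smaller gradient than unclipped ones, which is one simple way to keep their influence bounded.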
[674] arXiv:2509.21275 (replaced) [pdf, html, other]: Title: InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training

Shiju Wang, Yujie Wang, Ao Sun, Fangcheng Fu, Zijian Zhu, Bin Cui, Xu Han, Kaisheng Ma

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

Long-context training is crucial for extending the context of LLMs. Existing schemes, such as sequence parallelism, incur substantial communication overhead. Pipeline parallelism (PP) reduces this cost, but its effectiveness hinges on partitioning granularity. Batch-level PP employing sequence packing exhibits high memory consumption in long-context scenarios, whereas token-level PP, which splits sequences into slices, alleviates memory overhead but may incur hardware under-utilization. Moreover, the skewed distribution of sequence lengths in real-world datasets makes PP with monolithic, static granularity perform sub-optimally. In this paper, we propose 1) Elastic Pipeline Parallelism (EPP), which orchestrates token-level PP and batch-level PP to adapt to resource and workload heterogeneity, and 2) Stage-Aware Chunk-Level Adaptive Checkpointing, which efficiently integrates gradient checkpointing with EPP. Comprehensive experiments demonstrate that InfiniPipe achieves a 1.69x speedup over state-of-the-art systems. Our code is open-sourced at this https URL.
[675] arXiv:2509.21361 (replaced) [pdf, other]: Title: Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs

Norman Paulsen

Comments: 20 pages, 4 charts. AAIML (2026)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language model (LLM) providers boast big numbers for maximum context window sizes. To test the real world use of context windows, we 1) define a concept of maximum effective context window, 2) formulate a testing method of a context window's effectiveness over various sizes and problem types, and 3) create a standardized way to compare model efficacy for increasingly larger context window sizes to find the point of failure. We collected hundreds of thousands of data points across several models and found significant differences between reported Maximum Context Window (MCW) size and Maximum Effective Context Window (MECW) size. Our findings show that the MECW is not only drastically different from the MCW but also shifts based on the problem type. A few top of the line models in our test group failed with as little as 100 tokens in context; most had severe degradation in accuracy by 1000 tokens in context. All models fell far short of their Maximum Context Window by as much as 99 percent. Our data reveal that the Maximum Effective Context Window shifts based on the type of problem provided, offering clear and actionable insights into how to improve model accuracy and decrease model hallucination rates.
[676] arXiv:2509.21976 (replaced) [pdf, html, other]: Title: Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning

Zilun Zhang, Zian Guan, Tiancheng Zhao, Haozhan Shen, Tianyu Li, Yuxiang Cai, Zhonggen Su, Zhaojun Liu, Jianwei Yin, Xiang Li

Comments: Accepted by ISPRS

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Referring expression understanding in remote sensing poses unique challenges, as it requires reasoning over complex object-context relationships. While supervised fine-tuning (SFT) on multimodal large language models achieves strong performance with massive labeled datasets, it struggles in data-scarce scenarios, leading to poor generalization. To address this limitation, we propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring. Geo-R1 requires the model to first generate explicit, interpretable reasoning chains that decompose referring expressions, and then leverage these rationales to localize target objects. This "reason first, then act" process enables the model to make more effective use of limited annotations, enhances generalization, and provides interpretability. We validate Geo-R1 on three carefully designed few-shot geospatial referring benchmarks, where our model consistently and substantially outperforms SFT baselines. It also demonstrates strong cross-dataset generalization, highlighting its robustness. Code and data will be released at: this https URL.
[677] arXiv:2509.22321 (replaced) [pdf, html, other]: Title: Distributed Associative Memory via Online Convex Optimization

Bowen Wang, Matteo Zecchin, Osvaldo Simeone

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

An associative memory (AM) enables cue-response recall, and associative memorization has recently been noted to underlie the operation of modern neural architectures such as Transformers. This work addresses a distributed setting where agents maintain a local AM to recall their own associations as well as selective information from others. Specifically, we introduce a distributed online gradient descent method that optimizes local AMs at different agents through communication over routing trees. Our theoretical analysis establishes sublinear regret guarantees, and experiments demonstrate that the proposed protocol consistently outperforms existing online optimization baselines.
[678] arXiv:2509.23649 (replaced) [pdf, html, other]: Title: From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation

KaiWen Wei, Kejun He, Xiaomian Kang, Jie Zhang, Yuming Yang, Li Jin, Zhenyang Li, Jiang Zhong, He Bai, Junnan Zhu

Comments: Accepted to ACL 2026

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Generative recommendation, which directly generates item identifiers, has emerged as a promising paradigm for recommendation systems. However, its potential is fundamentally constrained by the reliance on purely autoregressive training. This approach focuses solely on predicting the next item while ignoring the rich internal structure of a user's interaction history, thus failing to grasp the underlying intent. To address this limitation, we propose Masked History Learning (MHL), a novel training framework that shifts the objective from simple next-step prediction to deep comprehension of history. MHL augments the standard autoregressive objective with an auxiliary task of reconstructing masked historical items, compelling the model to understand "why" an item path is formed from the user's past behaviors, rather than just "what" item comes next. We introduce two key contributions to enhance this framework: (1) an entropy-guided masking policy that intelligently targets the most informative historical items for reconstruction, and (2) a curriculum learning scheduler that progressively transitions from history reconstruction to future prediction. Experiments on three public datasets show that our method significantly outperforms state-of-the-art generative models, highlighting that a comprehensive understanding of the past is crucial for accurately predicting a user's future path.
[679] arXiv:2509.23744 (replaced) [pdf, other]: Title: Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning

Yucheng Wang, Yifan Hou, Aydin Javadov, Mubashara Akhtar, Mrinmaya Sachan

Comments: Our code (this https URL) and data (this https URL) are publicly available

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Multimodal large language models (MLLMs) promise enhanced reasoning by integrating diverse inputs such as text, vision, and audio. Yet cross-modal reasoning remains underexplored, with conflicting reports on whether added modalities help or harm performance. These inconsistencies stem from a lack of controlled evaluation frameworks and analysis of models' internals to isolate when and why modality interactions support or undermine reasoning. We address this gap through a logic-grounded evaluation framework that categorizes multimodal reasoning into six interaction patterns, varying how facts are distributed across modalities and logically combined. Empirically, additional modalities enhance reasoning only when they provide independent and sufficient reasoning paths, while redundant or chained entailment support often hurts performance. Moreover, reasoning degrades in three systematic ways: weaker modalities drag down overall performance, conflicts bias preference toward certain modalities, and joint signals from different modalities fail to be integrated effectively. Therefore, we identify two core failures: task-composition bottleneck, where recognition and reasoning cannot be jointly executed in one pass, and fusion bottleneck, where early integration introduces bias. For further investigation, we find that attention patterns fail to encode fact usefulness, but a simple two-step prompting (recognize then reason) restores performance, confirming the task-composition bottleneck. Moreover, modality identity remains recoverable in early layers, and softening attention in early fusion improves reasoning, highlighting biased fusion as another failure mode. Overall, our findings show that integration, not perception, is the main barrier to multimodal reasoning, suggesting composition-aware training and early fusion control as promising directions.
[680] arXiv:2509.24239 (replaced) [pdf, html, other]: Title: ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models

Jincheng Liu, Sijun He, Jingjing Wu, Xiangsen Wang, Yang Chen, Zhaoqi Kuang, Siqi Bao, Yuan Yao

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent large language models (LLMs) have shown strong reasoning capabilities. However, a critical question remains: do these models possess genuine strategic reasoning, or do they primarily excel at pattern recognition? To address this, we present ChessArena, a chess-based testbed for evaluating LLMs. Chess demands strategic reasoning, precise rule adherence, and the ability to track complex game states. ChessArena is a competitive framework where LLMs play against each other under four play modes. We evaluate 13 LLMs across over 800 games, testing basic understanding, move selection, and puzzle solving. Results reveal significant shortcomings: no model beats Maia-1100 (human amateur level), and some lose to random play. We also present a strong baseline: our fine-tuned Qwen3-8B substantially improves performance, approaching much larger state-of-the-art reasoning models.
[681] arXiv:2509.24565 (replaced) [pdf, html, other]: Title: Stronger Directed Low-Diameter Decompositions with Sub-Logarithmic Diameter and Separation

Bernhard Haeupler, Richard Hladík, Shengzhe Wang, Zhijun Zhang

Comments: Minor bug fix on page 14

Subjects: Data Structures and Algorithms (cs.DS)

This paper significantly strengthens directed low-diameter decompositions in several ways.
We define and give the first results for separated low-diameter decompositions in directed graphs, tighten and generalize probabilistic guarantees, and prove new independence results between (far away) edges. Our results are the first to give meaningful guarantees for decompositions with small diameters $D = \Omega(\log\log n)$ in contrast to the state of the art that only applies to super-logarithmic diameters $D = \omega(\log n)$.
These results transfer several important and widely used aspects of undirected low-diameter decompositions to the directed setting. All our results are algorithmic -- small modifications to two existing directed low-diameter decompositions [BFHL25; Li25] can be used to sample decompositions with our new guarantees in near-linear time $\tilde{O}(m)$.
[682] arXiv:2509.25868 (replaced) [pdf, html, other]: Title: ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations

Yindong Wang, Martin Preiß, Margarita Bugueño, Jan Vincent Hoffbauer, Abdullatif Ghajar, Tolga Buz, Gerard de Melo

Comments: Accepted to EACL 2026 (Main Conference, Oral presentation)

Journal-ref: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8174-8187, Rabat, Morocco, 2026

Subjects: Computation and Language (cs.CL)

The mechanisms underlying scientific confabulation in Large Language Models (LLMs) remain poorly understood. We introduce ReFACT (Reddit False And Correct Texts), a benchmark of 1,001 expert-annotated question-answer pairs with span-level error annotations derived from Reddit's r/AskScience. Evaluating 9 state-of-the-art LLMs reveals two critical limitations. First, models exhibit a dominant "salient distractor" failure mode: 61% of incorrect span predictions are semantically unrelated to actual errors. Crucially, this pattern persists across all model scales (1B to 70B), indicating a fundamental semantic grounding deficit that scaling alone fails to resolve. Second, we find that comparative judgment is paradoxically harder than independent detection: even GPT-4o's F1 score drops from 0.67 to 0.53 when comparing answers side-by-side. These findings directly challenge the reliability of LLM-as-Judge paradigms for scientific factuality. Code and data are released at this https URL.
[683] arXiv:2510.01857 (replaced) [pdf, html, other]: Title: Learning Reasoning Reward Models from Expert Demonstration via Inverse Reinforcement Learning

Claudio Fanconi, Nicolás Astorga, Mihaela van der Schaar

Subjects: Artificial Intelligence (cs.AI)

Current approaches to improving reasoning in large language models (LLMs) primarily rely on either supervised fine-tuning (SFT) over expert traces or reinforcement learning (RL) with outcome-level rewards. However, SFT is fundamentally imitative, while outcome-based RL assumes access to a well-specified verifier. To address this gap, we propose an adversarial inverse reinforcement learning (AIRL) framework that learns reasoning rewards directly from expert demonstrations. We evaluate this framework across reward granularities (sparse, interval, and dense). Granularity controls the resolution of credit assignment: sparse rewards emphasise global trajectory quality and training stability, while denser rewards provide higher-resolution step-level supervision for error localisation but are harder to optimise stably. We show that the learned reasoning rewards are useful in three complementary ways. First, as a training signal, they often outperform SFT, with the best variant improving over SFT on medical reasoning (MedReason), mathematics (GSM8K), and challenging scientific question-answering (MMLU-Pro). Second, as an inference-time reranker, they gain up to 17.4 percentage points under a fixed sampling budget. Third, the learned reward transfers across tasks and backbones, suggesting that part of the signal is reusable beyond a single domain or model, and that finer-grained rewards identify the first step at which a trajectory deviates from a correct path. This supports the diagnosis of reasoning failures and the improvement of test-time selection. Together, these results show that AIRL can recover a reusable intermediate reasoning step from demonstrations alone, bridging the gap between pure imitation and reward-driven optimisation for LLM reasoning.
[684] arXiv:2510.03247 (replaced) [pdf, html, other]: Title: Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data

Jiancheng Zhang, Yinglun Zhu

Comments: Accepted by Transactions on Machine Learning Research (TMLR)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Active learning (AL) is a principled strategy to reduce annotation cost in data-hungry deep learning. However, existing AL algorithms focus almost exclusively on unimodal data, overlooking the substantial annotation burden in multimodal learning. We introduce the first framework for multimodal active learning with unaligned data, where the learner must actively acquire cross-modal alignments rather than labels on pre-aligned pairs. This setting captures the practical bottleneck in modern multimodal pipelines, where unimodal features are easy to obtain but high-quality alignment is costly. We develop a new algorithm that combines uncertainty and diversity principles in a modality-aware design, achieves linear-time acquisition, and applies seamlessly to both pool-based and streaming-based settings. Extensive experiments on benchmark datasets demonstrate that our approach consistently reduces multimodal annotation cost while preserving performance; for instance, on the ColorSwap dataset it cuts annotation requirements by up to 40% without loss in accuracy.
[685] arXiv:2510.04070 (replaced) [pdf, html, other]: Title: Markov kernels in Mathlib's probability library

Rémy Degenne

Comments: 33 pages

Subjects: Digital Libraries (cs.DL); Probability (math.PR)

The probability folder of Mathlib, Lean's mathematical library, makes heavy use of Markov kernels. We present their definition and properties and describe the formalization of the disintegration theorem for Markov kernels. That theorem is used to define conditional probability distributions of random variables as well as posterior distributions. We then explain how Markov kernels are used in a more unusual way to get a common definition of independence and conditional independence and, following the same principles, to define sub-Gaussian random variables. Finally, we also discuss the role of kernels in our formalization of entropy and Kullback-Leibler divergence.
[686] arXiv:2510.04371 (replaced) [pdf, html, other]: Title: Speculative Actions: A Lossless Framework for Faster Agentic Systems

Naimeng Ye, Arnav Ahuja, Georgios Liargkovas, Yunan Lu, Kostis Kaffes, Tianyi Peng

Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)

AI agents are increasingly deployed in complex, interactive environments, yet their runtime remains a major bottleneck for training, evaluation, and real-world use. Typical agent behavior unfolds sequentially, with each action requiring an API call that can incur substantial latency. For example, a game of chess between two state-of-the-art agents can take hours. We introduce Speculative Actions, a lossless acceleration framework for general agentic systems. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, our method uses faster models to predict likely future actions and execute them in parallel, committing only when predictions match. We evaluate speculative actions across gaming, e-commerce, and web search environments, and additionally study a lossy extension in an operating systems setting. Across domains, we achieve up to 55% next-action prediction accuracy, translating into up to 20% latency reductions. Finally, we present a cost-latency analysis that formalizes the tradeoff between speculative breadth and time savings. This analysis enables principled tuning and selective branch launching to ensure that multi-branch speculation delivers practical speedups without prohibitive cost growth.
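The speculate-then-commit loop described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `slow_policy`, `fast_policy`, and `env_step` are toy stand-ins invented for illustration, not the authors' API, and the sketch speculates only on a single environment step rather than on downstream API calls. It shows only the lossless property: speculative work is kept when the fast guess matches the authoritative action and discarded otherwise, so the trajectory is identical to a sequential run.

```python
from concurrent.futures import ThreadPoolExecutor

def slow_policy(state):
    # Hypothetical authoritative (slow) agent: the action is state mod 3.
    return state % 3

def fast_policy(state):
    # Hypothetical cheap guesser: deviates from slow_policy when state % 4 == 0.
    return (state % 3) if state % 4 else 0

def env_step(state, action):
    # Hypothetical deterministic environment transition.
    return state + action + 1

def run_sequential(state, steps):
    trace = []
    for _ in range(steps):
        action = slow_policy(state)
        trace.append(action)
        state = env_step(state, action)
    return state, trace

def run_speculative(state, steps):
    trace, hits = [], 0
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(steps):
            guess = fast_policy(state)
            # Run the slow decision and the speculative step concurrently.
            true_future = pool.submit(slow_policy, state)
            spec_future = pool.submit(env_step, state, guess)
            action = true_future.result()
            if action == guess:
                state = spec_future.result()     # commit the speculative result
                hits += 1
            else:
                state = env_step(state, action)  # discard it and redo the step
            trace.append(action)
    return state, trace, hits

final_seq, trace_seq = run_sequential(1, 10)
final_spec, trace_spec, hits = run_speculative(1, 10)
print(final_seq == final_spec, trace_seq == trace_spec, hits)
```

In this toy the speculative run reproduces the sequential trajectory exactly; any latency benefit in a real system would depend on how often the fast model's guess is right and on the cost of discarded branches.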
[687] arXiv:2510.04772 (replaced) [pdf, html, other]: Title: Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge

Max Kirchner, Hanna Hoffmann, Alexander C. Jenke, Oliver L. Saldanha, Kevin Pfeiffer, Weam Kanjo, Julia Alekseenko, Claas de Boer, Santhi Raj Kolamuri, Lorenzo Mazza, Nicolas Padoy, Sophia Bano, Annika Reinke, Lena Maier-Hein, Danail Stoyanov, Jakob N. Kather, Fiona R. Kolbinger, Sebastian Bodenstedt, Stefanie Speidel

Comments: A challenge report pre-print (31 pages), including 7 tables and 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Developing generalizable surgical AI requires multi-institutional data, yet patient privacy constraints preclude direct data sharing, making Federated Learning (FL) a natural candidate solution. The application of FL to complex, spatiotemporal surgical video data remains largely unbenchmarked. We present the FedSurg Challenge, the first international benchmarking initiative dedicated to FL in surgical vision, evaluated as a proof-of-concept on a multi-center laparoscopic appendectomy dataset (preliminary subset of Appendix300). Three submissions were evaluated on generalization to an unseen center and center-specific adaptation. Centralized and Swarm Learning baselines isolate the contributions of task difficulty and decentralization to observed performance. Even with all data pooled centrally, the task achieved only 26.31% F1-score on the unseen center, while decentralized training introduced an additional, separable performance penalty. Temporal modeling emerges as the dominant architectural factor: video-level spatiotemporal models consistently outperformed frame-level approaches regardless of aggregation strategy. Naive local fine-tuning leads to classifier collapse on imbalanced local data; structured personalized FL with parameter-efficient fine-tuning represents a more principled path toward center-specific adaptation. By characterizing current FL limitations through rigorous statistical analysis, this work establishes a methodological reference point for robust, privacy-preserving AI systems in surgical video analysis.
[688] arXiv:2510.04823 (replaced) [pdf, html, other]: Title: Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis

Arnela Hadzic, Simon Johannes Joham, Martin Urschler

Comments: Published in the Proceedings of the Third Austrian Symposium on AI, Robotics, and Vision (AIRoV 2026)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating synthetic CT (sCT) from MRI or CBCT plays a crucial role in enabling MRI-only and CBCT-based adaptive radiotherapy, improving treatment precision while reducing patient radiation exposure. To address this task, we adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI to sCT and CBCT to sCT across three anatomical regions: abdomen, head and neck, and thorax. Validation and testing were performed through the challenge submission system. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints. Future work will explore patch-based training and latent-space flow models to improve resolution and local structural fidelity.
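The sampling step mentioned in this abstract (turning a Gaussian noise volume into an image by integrating a velocity field) can be sketched with a plain Euler integrator. This is a toy under loud assumptions: `velocity` is a closed-form stand-in for a trained, conditioned network, and `cond` doubles as the target volume, which a real model would not have access to. The paper's encoder, architecture, and solver are not reproduced here.

```python
import numpy as np

def velocity(x, t, cond):
    # Stand-in for a learned velocity network (assumption, not the paper's model).
    # For the straight-line path x_t = (1 - t) * x_0 + t * x_1 with a known
    # target x_1 = cond, the velocity at (x, t) is (cond - x) / (1 - t).
    return (cond - x) / (1.0 - t)

def sample(cond, steps=20, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond.shape)    # start from a Gaussian noise volume
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt                         # t stays strictly below 1
        x = x + dt * velocity(x, t, cond)  # explicit Euler step
    return x

target = np.ones((2, 3, 3))                # toy 3D volume
out = sample(target)
print(np.abs(out - target).max())
```

With this particular stand-in field the final Euler step lands on the target up to floating-point error; with a trained network the result would only approximate the data distribution, and the number of steps trades accuracy against runtime.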
[689] arXiv:2510.05482 (replaced) [pdf, html, other]: Title: ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics

Luke Thompson, Davy Guan, Dai Shi, Slade Matthews, Junbin Gao, Andi Han

Comments: Accepted at ICLR2026

Journal-ref: https://iclr.cc/virtual/2026/poster/10008346

Subjects: Machine Learning (cs.LG)

Molecular dynamics (MD) simulations underpin modern computational drug discovery, materials science, and biochemistry. Recent machine learning models provide high-fidelity MD predictions without the need to repeatedly solve quantum mechanical forces, enabling significant speedups over conventional pipelines. Yet many such methods typically enforce strict equivariance and rely on sequential rollouts, thus limiting their flexibility and simulation efficiency. They are also commonly single-task, trained on individual molecules and fixed timeframes, which restricts generalization to unseen compounds and extended timesteps. To address these issues, we propose Atomistic Transformer Operator for Molecules (ATOM), a pretrained transformer neural operator for multitask molecular dynamics. ATOM adopts a quasi-equivariant design that requires no explicit molecular graph and employs a temporal attention mechanism, allowing for the accurate parallel decoding of multiple future states. To support operator pretraining across chemicals and timescales, we curate TG80, a large, diverse, and numerically stable MD dataset with over 2.5 million femtoseconds of trajectories across 80 compounds. ATOM achieves state-of-the-art performance on established single-task benchmarks, such as MD17, RMD17 and MD22. After multitask pretraining on TG80, ATOM shows exceptional zero-shot generalization to unseen molecules across varying time horizons. We believe ATOM represents a significant step toward accurate, efficient, and transferable molecular dynamics models.
[690] arXiv:2510.08814 (replaced) [pdf, html, other]: Title: A Quantale-Weakness Route to $P \neq NP$ via CD Evidence Normalization and Gauge-Buffered Locked Ensembles

Ben Goertzel

Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)

We present a proof architecture for $P \neq NP$ based on an upper--lower clash in polytime-capped conditional description length. We construct an efficiently samplable family of SAT instances $Y$ such that every satisfying witness for $Y$ yields the same global message $M(Y)$. If $P=NP$, then a standard polynomial-time SAT self-reduction recovers $M(Y)$ from $Y$, so \[ K_{\mathrm{poly}}(M(Y)\mid Y)=O(1). \]
The lower-bound side shows the opposite. For the same ensemble, no fixed polynomial-time observer can gain substantial predictive advantage on a linear number of selected message coordinates. The argument treats computation as an evidence-producing process: predictive advantage is converted into constructible-dual evidence skew and then into pairwise distinctions between message-opposite worlds. A normalization theorem shows that every target-relevant non-neutral evidence leaf is either a safe-buffer observation or a hidden-gauge observation. Safe-buffer observations have negligible leakage, while hidden-gauge observations are limited by gauge-rank accounting. This yields an atomic evidence budget implying that total message-resolving advantage is $o(t)$ across $t$ selected coordinates.
Boundary-law mixing gives the near-random baseline for the visible surface. Combining this with the evidence budget gives product small-success and then, by Compression-from-Success, \[ K_{\mathrm{poly}}(M(Y)\mid Y)\ge \Omega(t) \] with high probability. This contradicts the constant upper bound from $P=NP$. Therefore $P \neq NP$.
[691] arXiv:2510.10971 (replaced) [pdf, html, other]: Title: RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection

Yejin Lee, Hyeseon Ahn, Yo-Sub Han

Comments: 20 pages, 9 figures, ACL 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Hate speech remains prevalent in human society and continues to evolve in its forms and expressions. Modern advancements in internet and online anonymity accelerate its rapid spread and complicate its detection. However, hate speech datasets exhibit diverse characteristics primarily because they are constructed from different sources and platforms, each reflecting different linguistic styles and social contexts. Despite this diversity, prior studies on hate speech detection often rely on fixed methodologies without adapting to data-specific features. We introduce RV-HATE, a detection framework designed to account for the dataset-specific characteristics of each hate speech dataset. RV-HATE consists of multiple specialized modules, where each module focuses on distinct linguistic or contextual features of hate speech. The framework employs reinforcement learning to optimize weights that determine the contribution of each module for a given dataset. A voting mechanism then aggregates the module outputs to produce the final decision. RV-HATE offers two primary advantages: (1) it improves detection accuracy by tailoring the detection process to dataset-specific attributes, and (2) it also provides interpretable insights into the distinctive features of each dataset. Consequently, our approach effectively addresses implicit hate speech and achieves superior performance compared to conventional static methods. Our code is available at this https URL.
[692] arXiv:2510.13918 (replaced) [pdf, html, other]: Title: Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling

Peng Kuang, Yanli Wang, Xiaoyu Han, Yaowenqi Liu, Kaidi Xu, Haohan Wang

Comments: Published as a conference paper at ICLR 2026

Subjects: Computation and Language (cs.CL)

Process reward models (PRMs) are a cornerstone of test-time scaling (TTS), designed to verify and select the best responses from large language models (LLMs). However, this promise is challenged by recent benchmarks where simple majority voting, which ignores PRM signals, occasionally outperforms standard PRM-based selection. This raises a critical question: How can we effectively utilize verification signals from PRMs for TTS? To address this, we start by developing a theoretical framework for optimally combining signals from both the LLM and the PRM. Our framework reveals that the optimal strategy is a weighted aggregation of responses, a strategy whose effectiveness hinges on estimating weights that capture the complex interplay between the models. Based on our theoretical results, we empirically show that these optimal weighting functions differ significantly across LLM-PRM pairs and, notably, often assign substantial negative weights. Motivated by these insights, we propose efficient pre-computation methods to calibrate these weighting functions. Extensive experiments across 5 LLMs and 7 PRMs demonstrate that our calibration method significantly boosts the TTS efficiency, surpassing the performance of vanilla weighted majority voting while using only $21.3\%$ of the computation. Ultimately, our work demonstrates that investing in a more intelligent aggregation strategy can be a more convincing path to performance gains than simply scaling test-time computation.
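The idea of aggregating responses with weights that may be negative can be shown with a small sketch. The weighting functions below are illustrative placeholders: `centered_weight` is a hypothetical choice made up for this example, not the calibrated function from the paper, and the answers and scores are toy values.

```python
from collections import defaultdict

def weighted_vote(answers, prm_scores, weight_fn):
    # Sum a (possibly negative) weight per response, grouped by final answer.
    totals = defaultdict(float)
    for answer, score in zip(answers, prm_scores):
        totals[answer] += weight_fn(score)
    return max(totals, key=totals.get)

def plain_majority(score):
    # Ignores the PRM signal: every response counts equally.
    return 1.0

def centered_weight(score):
    # Hypothetical weight: PRM scores below 0.5 count against an answer.
    return score - 0.5

answers = ["42", "42", "42", "17", "17"]   # toy sampled final answers
prm_scores = [0.2, 0.3, 0.1, 0.9, 0.8]     # toy PRM scores in [0, 1]

print(weighted_vote(answers, prm_scores, plain_majority))   # 42
print(weighted_vote(answers, prm_scores, centered_weight))  # 17
```

Here the majority answer loses once low-scored responses contribute negative weight, which is the kind of behaviour a calibrated weighting function could exploit; how the weights are actually estimated is the subject of the paper and is not modelled here.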
[693] arXiv:2510.15096 (replaced) [pdf, html, other]: Title: OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data

Alana Renda, Jillian Ross, Michael Cafarella, Jacob Andreas

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Real-world settings where language models (LMs) are deployed -- in domains spanning healthcare, finance, and other forms of knowledge work -- require models to grapple with incomplete information and reason under uncertainty. Yet most LM evaluations focus on problems with well-defined answers and success criteria. This gap exists in part because natural problems involving uncertainty are difficult to construct: given that LMs have access to most of the same knowledge as humans, it is non-trivial to design questions for which LMs will struggle to produce correct answers, but which humans can answer reliably. As a result, LM performance on reasoning under uncertainty remains poorly characterized. To address this gap, we introduce OpenEstimate, an extensible, multi-domain benchmark for evaluating LMs on numerical estimation tasks that require models to synthesize significant amounts of background information and express predictions as probabilistic priors. We assess these priors for accuracy and calibration, quantifying their usefulness relative to samples from the true distribution of interest. Across six frontier LMs, we find that LM-elicited priors are often inaccurate and overconfident. Performance improves modestly depending on how uncertainty is elicited from the model, but is largely unaffected by changes in sampling strategy, reasoning effort, or prompt design. The OpenEstimate benchmark thus offers a challenging evaluation for frontier LMs and a platform for developing models that are better at probabilistic estimation and reasoning under uncertainty.
[694] arXiv:2510.15313 (replaced) [pdf, other]: Title: Capabilities and Evaluation Biases of Large Language Models in Classical Chinese Poetry Generation: A Case Study on Tang Poetry

Bolei Ma, Yina Yao, Anna-Carolina Haensch

Comments: ACL 2026 Findings

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) are increasingly applied to creative domains, yet their performance in classical Chinese poetry generation and evaluation remains poorly understood. We propose a three-step evaluation framework that combines computational metrics, LLM-as-a-judge assessment, and human expert validation. Using this framework, we evaluate six state-of-the-art LLMs across multiple dimensions of poetic quality, including themes, emotions, imagery, form, and style, in the context of Tang poetry generation. Our analysis reveals a critical "echo chamber" effect: LLMs systematically overrate machine-generated poems that mimic statistical patterns yet fail strict prosodic rules, diverging significantly from human expert judgments. These findings underscore the limitations of using LLMs as standalone evaluators for culturally complex tasks, highlighting the necessity of hybrid human-model validation frameworks.
[695] arXiv:2510.18091 (replaced) [pdf, html, other]: Title: Accelerating Vision Transformers with Adaptive Patch Sizes

Rohan Choudhury, JungEun Kim, Jinhyung Park, Eunho Yang, László A. Jeni, Kris M. Kitani

Comments: Accepted to ICLR 2026. Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Vision Transformers (ViTs) partition input images into uniformly sized patches regardless of their content, resulting in long input sequence lengths for high-resolution images. We present Adaptive Patch Transformers (APT), which addresses this by using multiple different patch sizes within the same image. APT reduces the total number of input tokens by allocating larger patch sizes in more homogeneous areas and smaller patches in more complex ones. APT achieves a drastic speedup in ViT inference and training, increasing throughput by 40% on ViT-L and 50% on ViT-H while maintaining downstream performance, and can be applied to a previously fine-tuned ViT, converging in as little as 1 epoch. It also significantly reduces training and inference time without loss of performance in high-resolution dense visual tasks, achieving up to 30% faster training and inference in visual QA, object detection, and semantic segmentation.
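Content-dependent patch allocation of the kind described here can be illustrated with a quadtree-style split. This is a sketch under assumptions: pixel variance with a fixed threshold is an assumed stand-in for whatever homogeneity measure APT uses, and `adaptive_patches` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def adaptive_patches(img, y, x, size, min_size, thresh):
    # Keep one large patch where the region is homogeneous, otherwise split in four.
    block = img[y:y + size, x:x + size]
    if size <= min_size or block.var() <= thresh:
        return [(y, x, size)]
    half = size // 2
    patches = []
    for dy in (0, half):
        for dx in (0, half):
            patches += adaptive_patches(img, y + dy, x + dx, half, min_size, thresh)
    return patches

img = np.zeros((32, 32))
rng = np.random.default_rng(0)
img[:16, :16] = rng.random((16, 16))   # one textured quadrant, the rest is flat

patches = adaptive_patches(img, 0, 0, 32, min_size=8, thresh=1e-3)
uniform_tokens = (32 // 8) ** 2        # fixed 8x8 grid for comparison
print(len(patches), uniform_tokens)    # 7 16
```

On this toy image the flat quadrants each stay as a single 16x16 patch while the textured quadrant is split into four 8x8 patches, giving 7 tokens instead of 16 while still covering every pixel.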
[696] arXiv:2510.18457 (replaced) [pdf, html, other]: Title: VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models

Tianci Bi, Xiaoyi Zhang, Yan Lu, Nanning Zheng

Comments: Accepted at CVPR 2026. Code and models available at: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The performance of Latent Diffusion Models (LDMs) is critically dependent on the quality of their visual tokenizers. While recent works have explored incorporating Vision Foundation Models (VFMs) into tokenizer training via distillation, we empirically find that this approach inevitably weakens the robustness of the representations learnt from the original VFM. In this paper, we bypass distillation with a more direct approach that uses a frozen VFM as the LDM tokenizer, named VFM Variational Autoencoder (VFM-VAE). To fully exploit the frozen VFM, we design a new decoder to reconstruct realistic images from the semantic-rich representation of the VFM. With the proposed VFM-VAE, we conduct a systematic study of how the representations from different tokenizers impact the representation learning process throughout diffusion training, enabling synergistic benefits of dual-side alignment on both tokenizers and diffusion models. Our efforts in tokenizer design and training strategy lead to superior performance and efficiency: our system reaches a gFID (w/o CFG) of 2.22 in merely 80 epochs (a 10$\times$ speedup over prior tokenizers). With continued training to 640 epochs, it further attains a gFID (w/o CFG) of 1.62. These results offer solid evidence for the substantial potential of VFMs to serve as visual tokenizers that accelerate LDM training.
[697] arXiv:2510.18731 (replaced) [pdf, html, other]: Title: Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards

Ming Li

Comments: ACL2026, camera-ready

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models demonstrate strong capabilities in single-turn instruction following but suffer from Lost-in-Conversation (LiC), a degradation in performance as information is revealed progressively in multi-turn settings. Motivated by the current progress on Reinforcement Learning with Verifiable Rewards (RLVR), we propose Curriculum Reinforcement Learning with Verifiable Accuracy and Abstention Rewards (RLAAR), a framework that encourages models not only to generate correct answers, but also to judge the solvability of questions in the multi-turn conversation setting. Our approach employs a competence-gated curriculum that incrementally increases dialogue difficulty (in terms of instruction shards), stabilizing training while promoting reliability. Using multi-turn, on-policy rollouts and a mixed-reward system, RLAAR teaches models to balance problem-solving with informed abstention, reducing premature answering behaviors that cause LiC. Evaluated on LiC benchmarks, RLAAR significantly mitigates LiC performance decay (62.6% to 75.1%) and improves calibrated abstention rates (33.5% to 73.4%). Together, these results provide a practical recipe for building multi-turn reliable and trustworthy LLMs.
[698] arXiv:2510.18914 (replaced) [pdf, html, other]: Title: Fairness Evaluation and Inference Level Mitigation in LLMs

Afrozah Nadeem, Mark Dras, Usman Naseem

Comments: Accepted at The 64th Annual Meeting of the Association for Computational Linguistics, San Diego, California, United States, July 2 to 7, 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models often display undesirable behaviors embedded in their internal representations, undermining fairness and leading to inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended dialogues. Although training-time or data-centric methods attempt to reduce these effects, they are computationally expensive, irreversible once deployed, and slow to adapt to new conversational contexts. Pruning-based methods provide a flexible and transparent way to reduce bias by adjusting the neurons responsible for certain behaviors. However, most existing approaches are static; once a neuron is removed, the model loses the ability to adapt when the conversation or context changes. To address this, we propose a dynamic, reversible, pruning-based framework that detects context-aware neuron activations and applies adaptive masking to modulate their influence during generation. Our inference-time solution provides fine-grained, memory-aware mitigation with knowledge-preserving, more coherent behavior across multilingual single- and multi-turn dialogues, enabling dynamic fairness control in real-world conversational AI.
[699] arXiv:2510.20064 (replaced) [pdf, html, other]: Title: Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs

Hongyi Liu, Jiaji Huang, Zhen Jia, Youngsuk Park, Yu-Xiang Wang

Comments: ICLR'26

Subjects: Machine Learning (cs.LG)

Speculative decoding is widely used in accelerating large language model (LLM) inference. In this work, we focus on the online draft model selection problem in speculative decoding. We design an algorithm that provably competes with the best draft model in hindsight for each query in terms of either the token acceptance probability or expected acceptance length. In particular, we show that we can accurately evaluate all draft models, instead of only the chosen model, without incurring additional queries to the target model, which allows us to improve exponentially over the existing bandit-based approach as the number of draft models increases. Our approach is generically applicable with any speculative decoding method (single draft, multi-drafts and draft-trees). Moreover, we design system-efficient versions of online learners and demonstrate that the overhead in computation and latency can be substantially reduced. We conduct extensive experiments on open-source LLMs and diverse datasets, demonstrating that our methods substantially outperform the state-of-the-art EAGLE3 and the BanditSpec baseline in a variety of domains where specialized domain-expert drafters are available, especially when long reasoning chains are required.
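The contrast between full-information and bandit feedback can be made concrete with a generic exponential-weights (Hedge) learner, a standard no-regret method when every option's loss is observed each round. This is an assumption-laden sketch, not the paper's algorithm: the loss values (read here as one minus a drafter's acceptance rate) are invented toy numbers, and `hedge` is a hypothetical helper.

```python
import math
import random

def hedge(loss_rounds, eta=1.0, seed=0):
    rng = random.Random(seed)
    n = len(loss_rounds[0])
    weights = [1.0] * n
    picks = []
    for losses in loss_rounds:
        # Sample a drafter with probability proportional to its current weight.
        picks.append(rng.choices(range(n), weights=weights)[0])
        # Full-information update: every drafter's loss is used, not only
        # the loss of the drafter that was picked.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return picks, [w / total for w in weights]

# Toy per-round losses for three hypothetical drafters.
loss_rounds = [[0.9, 0.2, 0.6]] * 30
picks, probs = hedge(loss_rounds)
print(probs.index(max(probs)))  # 1
```

Because all drafters are scored every round, probability mass concentrates on the lowest-loss drafter without spending rounds on exploration, which is the intuition behind the gap to bandit-style selection as the number of drafters grows.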
[700] arXiv:2510.20505 (replaced) [pdf, html, other]: Title: RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA

Ruiyi Yang, Hao Xue, Imran Razzak, Hakim Hacid, Flora D. Salim

Comments: 19 pages, 2 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Retrieval-augmented generation (RAG) remains brittle on multi-step questions and heterogeneous evidence sources, trading accuracy against latency and token/tool budgets. This paper introduces RELOOP, a structure-aware framework using a Hierarchical Sequence (HSEQ) that (i) linearizes documents, tables, and knowledge graphs into a reversible hierarchical sequence with lightweight structural tags, and (ii) performs structure-aware iteration to collect just-enough evidence before answer synthesis. A Head Agent provides guidance that leads retrieval, while an Iteration Agent selects and expands HSEQ via structure-respecting actions (e.g., parent/child hops, table row/column neighbors, KG relations). Finally, the Head Agent composes canonicalized evidence to generate the final answer, with an optional refinement loop to resolve detected contradictions. Experiments on HotpotQA (text), HybridQA/TAT-QA (table+text), and MetaQA (KG) show consistent EM/F1 gains over strong single-pass, multi-hop, and agentic RAG baselines with high efficiency. Besides, RELOOP exhibits three key advantages: (1) a format-agnostic unification that enables a single policy to operate across text, tables, and KGs without per-dataset specialization; (2) guided, budget-aware iteration that reduces unnecessary hops, tool calls, and tokens while preserving accuracy; and (3) evidence canonicalization for reliable QA, improving answer consistency and auditability.
[701] arXiv:2510.20792 (replaced) [pdf, html, other]: Title: BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation

Liang Ye, Shengqin Chen, Jiazhu Dai

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Biomolecules (q-bio.BM)

The rapid progress of graph generation has raised new security concerns, particularly regarding backdoor vulnerabilities. Though prior work has explored backdoor attacks against diffusion models for image or unconditional graph generation, those against conditional graph generation models, especially text-guided graph generation models, remain largely unexamined. This paper proposes BadGraph, a backdoor attack method against latent diffusion models for text-guided graph generation. BadGraph leverages textual triggers to poison training data, covertly implanting backdoors that induce attacker-specified subgraphs during inference when triggers appear, while preserving normal performance on clean inputs. Extensive experiments on four benchmark datasets (PubChem, ChEBI-20, PCDes, MoMu) demonstrate the effectiveness and stealth of the attack: a poisoning rate of less than 10% can achieve a 50% attack success rate, while 24% suffices for over an 80% success rate, with negligible performance degradation on benign samples. Ablation studies further reveal that the backdoor is implanted during VAE and diffusion training rather than pretraining. These findings reveal the security vulnerabilities in latent diffusion models for text-guided graph generation, highlight the serious risks in applications such as drug discovery, and underscore the need for robust defenses against the backdoor attack in such diffusion models.
[702] arXiv:2510.23274 (replaced) [pdf, html, other]: Title: Privacy-Preserving Semantic Communication over Wiretap Channels with Learnable Differential Privacy

Weixuan Chen, Qianqian Yang, Shuo Shao, Shunpu Tang, Zhiguo Shi, Shui Yu

Subjects: Cryptography and Security (cs.CR); Image and Video Processing (eess.IV)

While semantic communication (SemCom) improves transmission efficiency by focusing on task-relevant information, it also raises critical privacy concerns. Many existing secure SemCom approaches rely on restrictive or impractical assumptions, such as favorable channel conditions for the legitimate user or prior knowledge of the eavesdropper's model. To address these limitations, this paper proposes a novel secure SemCom framework for image transmission over wiretap channels, leveraging differential privacy (DP) to provide approximate privacy guarantees. Specifically, our approach first extracts disentangled semantic representations from source images using a generative adversarial network (GAN) inversion method, and then selectively perturbs private semantic representations with approximate DP noise. Distinct from conventional DP-based protection methods, we introduce DP noise with a learnable pattern, instead of traditional white Gaussian or Laplace noise, achieved through adversarial training of neural networks (NNs). This design mitigates the inherent non-invertibility of DP while effectively protecting private information. Moreover, it enables explicitly controllable security levels by adjusting the privacy budget according to specific security requirements, which is not achieved in most existing secure SemCom approaches. Experimental results demonstrate that, compared with the previous DP-based method and direct transmission, the proposed method significantly degrades the reconstruction quality for the eavesdropper, while introducing only slight degradation in task performance. Under comparable security levels, our approach achieves an LPIPS advantage of 0.06-0.29 and an FPPSR advantage of 0.10-0.86 for the legitimate user compared with the previous DP-based method.
[703] arXiv:2510.26615 (replaced) [pdf, html, other]: Title: SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding

Yiqiao Jin, Rachneet Kaur, Zhen Zeng, Sumitra Ganesh, Srijan Kumar

Comments: ACL 2026 Main Conference. this https URL

Subjects: Computation and Language (cs.CL)

Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but it must balance limited effective context, redundant retrieved evidence, and the loss of fine-grained facts under aggressive compression. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a hybrid RAG framework that targets answer quality under fixed token budgets by combining natural-language snippets with semantic compression vectors. SARA retains a small set of passages in text form to preserve entities and numerical values, compresses the remaining evidence into interpretable vectors for broader coverage, and uses those vectors for iterative evidence reranking. Across 9 datasets and 5 open-source LLMs spanning 3 model families (Mistral, Llama, and Gemma), SARA consistently improves answer relevance (+17.71), answer correctness (+13.72), and semantic similarity (+15.53), demonstrating the importance of integrating textual and compressed representations for robust, context-efficient RAG.
[704] arXiv:2511.00413 (replaced) [pdf, html, other]: Title: Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse

Jinghui Wang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Liang Huang, Can Tang, Xiaojiang Zhang, Junyi Peng, Li Wan, Haotian Zhang, Bin Chen

Subjects: Machine Learning (cs.LG)

Agentic large language model (LLM) training often involves multi-turn interaction trajectories that branch into multiple execution paths due to concurrent tool use, think-mode, sub-agent, context management and other runtime designs. As a result, the tokens produced by a single task naturally form a tree-structured token trajectory with shared prefixes, rather than a linear sequence. Existing training pipelines linearize such trajectories and treat each branch independently, leading to substantial redundant computation in both forward and backward passes. We derive that averaging the loss over all branches independently is algebraically identical to a per-token weighted loss, where each token's weight equals the fraction of branches passing through it. The problem therefore reduces to computing the log-probability of every token in the prefix tree exactly once, with no repeated computation across shared prefixes: we propose DFS serialization of the tree, which visits every token exactly once, and adapt full-attention and SSM layers to ensure the resulting log-probabilities match independent per-branch calculation exactly. In practice, a single trajectory tree can be too large to fit in GPU memory; we therefore propose Redundancy-Free Tree Partitioning, which handles memory-constrained settings with zero redundant computation and peak memory bounded by a single root-to-leaf path. Together, these contributions form Tree Training, an efficient framework for training LLMs on tree-structured trajectories, achieving up to 6.2x end-to-end training speedup on dense and MoE models for both supervised fine-tuning and reinforcement learning.
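The loss identity stated in the abstract above can be checked numerically on a toy prefix tree. The sketch below assumes each branch's loss is the sum of its per-token losses; the `Node` class, the function names, and the loss values are hypothetical and only illustrate the identity, not the paper's implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    loss: float                                   # hypothetical per-token loss
    children: list = field(default_factory=list)  # continuations sharing this prefix


def branch_losses(node, prefix_sum=0.0):
    """Linearized view: yield the summed loss of every root-to-leaf branch."""
    total = prefix_sum + node.loss
    if not node.children:
        yield total
    for child in node.children:
        yield from branch_losses(child, total)


def weighted_token_loss(root):
    """Tree view: one DFS pass that touches every token exactly once.

    Each token is weighted by the number of branches (leaves) passing
    through it, and the sum is divided by the total number of branches.
    """
    def dfs(node):
        if not node.children:
            return 1, node.loss
        leaves, acc = 0, 0.0
        for child in node.children:
            n, a = dfs(child)
            leaves += n
            acc += a
        return leaves, acc + leaves * node.loss

    total_branches, acc = dfs(root)
    return acc / total_branches


# Toy tree with three branches that share the root token.
root = Node(0.5, [Node(0.25, [Node(1.0), Node(2.0)]), Node(0.75)])
per_branch = list(branch_losses(root))
branch_average = sum(per_branch) / len(per_branch)
tree_loss = weighted_token_loss(root)
```

Here the root token is counted three times in the linearized view but visited once in the tree view with weight 3/3, which is where the saving on shared prefixes comes from.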
[705] arXiv:2511.01458 (replaced) [pdf, html, other]: Title: When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA

Luca Carlini, Dennis Pierantozzi, Mauro Orazio Drago, Chiara Lena, Cesare Hassan, Elena De Momi, Danail Stoyanov, Sophia Bano, Mobarak I. Hoque

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Safety and reliability are critical for deploying visual question answering (VQA) systems in surgery, where incorrect or ambiguous responses can cause patient harm. A key limitation of existing uncertainty estimation methods, such as Semantic Nearest Neighbor Entropy (SNNE), is that they do not explicitly account for the conditioning question. As a result, they may assign high confidence to answers that are semantically consistent yet misaligned with the clinical question, especially under variation in question phrasing. We propose Question-Aligned Semantic Nearest Neighbor Entropy (QA-SNNE), a black-box uncertainty estimator that incorporates question-answer alignment into semantic entropy through bilateral gating. QA-SNNE measures uncertainty by weighting pairwise semantic similarities among sampled answers according to their relevance to the question, using embedding-based, entailment-based, or cross-encoder alignment strategies. To assess robustness to language variation, we construct an out-of-template rephrased version of a benchmark surgical VQA dataset, where only the question wording is modified while images and ground-truth answers remain unchanged. We evaluate QA-SNNE on five VQA models across two benchmark surgical VQA datasets in both zero-shot and parameter-efficient fine-tuned (PEFT) settings, including out-of-template questions. QA-SNNE improves AUROC on EndoVis18-VQA for two of three zero-shot models in-template (e.g., +15% for Llama3.2 and +21% for Qwen2.5) and achieves up to +8% AUROC improvement under out-of-template rephrasing, with mixed results on external validation. Overall, QA-SNNE provides a practical, model-agnostic safeguard for surgical VQA by linking semantic uncertainty to question relevance.
[706] arXiv:2511.02777 (replaced) [pdf, html, other]: Title: PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

Antonio Oroz, Matthias Nießner, Tobias Kirschstein

Comments: Project Page: this https URL Video: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present PercHead, a model for single-image 3D head reconstruction and disentangled 3D editing - two tasks that are inherently challenging due to ambiguity in plausible explanations for the same input. At the heart of our approach lies our novel perceptual loss based on DINOv2 and SAM 2.1. Unlike widely-adopted low-level losses like LPIPS, SSIM or L1, we rely on deep visual understanding of images and the resulting generalized supervision signals. We show that our new loss can be a drop-in replacement for standard losses and used to improve visual quality in high-frequency areas. We base our model architecture on Vision Transformers (ViTs), allowing us to decouple the 3D representation from the 2D input. We train our method on multi-view images for view-consistency and in-the-wild images for strong transferability to new environments. Our model achieves state-of-the-art performance in novel-view synthesis and, furthermore, exhibits exceptional robustness to extreme viewing angles. We also extend our base model to disentangled 3D editing by swapping the encoder and fine-tuning the network. A segmentation map controls geometry and either a text prompt or a reference image specifies appearance. We highlight the intuitive and powerful 3D editing capabilities through an interactive GUI. Project Page: this https URL Video: this https URL
[707] arXiv:2511.03232 (replaced) [pdf, html, other]: Title: Transformer-Progressive Mamba Network for Lightweight Image Super-Resolution

Sichen Guo, Wenjie Li, Yuanyang Liu, Guangwei Gao, Jian Yang, Chia-Wen Lin

Comments: 14 pages, 12 figures, 9 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity, addressing the quadratic computational cost of Transformer-based SR approaches. However, existing Mamba-based methods lack fine-grained transitions across different modeling scales, which limits the efficiency of feature representation. In this paper, we propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba. By enabling interactions among receptive fields of different scales, our method establishes a fine-grained modeling paradigm that progressively enhances feature representation without introducing additional computational cost. Furthermore, we introduce an Adaptive High-Frequency Refinement Module (AHFRM) to recover high-frequency details lost during Transformer and Mamba processing. Extensive experiments demonstrate that T-PMambaSR progressively enhances the model's receptive field and expressiveness, yielding better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost.
[708] arXiv:2511.03325 (replaced) [pdf, html, other]: Title: SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding

Mauro Orazio Drago, Luca Carlini, Pelinsu Celebi Balyemez, Dennis Pierantozzi, Chiara Lena, Cesare Hassan, Danail Stoyanov, Elena De Momi, Sophia Bano, Mobarak I. Hoque

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video Question Answering (VideoQA) in the surgical domain aims to enhance intraoperative understanding by enabling AI models to reason over temporally coherent events rather than isolated frames. Current approaches are limited to static image features, and available datasets often lack temporal annotations, ignoring the dynamics critical for accurate procedural interpretation. We propose SurgViVQA, a surgical VideoQA model that extends visual reasoning from static images to dynamic surgical scenes. It uses a Masked Video-Text Encoder to fuse video and question features, capturing temporal cues such as motion and tool-tissue interactions, which a fine-tuned large language model (LLM) then decodes into coherent answers. To evaluate its performance, we curated REAL-Colon-VQA, a colonoscopic video dataset that includes motion-related questions and diagnostic attributes, as well as out-of-template questions with rephrased or semantically altered formulations to assess model robustness. Experimental validation on REAL-Colon-VQA and the public EndoVis18-VQA dataset shows that SurgViVQA outperforms existing image-based VQA benchmark models, particularly in keyword accuracy, improving over PitVQA by +11% on REAL-Colon-VQA and +9% on EndoVis18-VQA. A perturbation study on the questions further confirms improved generalizability and robustness to variations in question phrasing. SurgViVQA and the REAL-Colon-VQA dataset provide a framework for temporally-aware understanding in surgical VideoQA, enabling AI models to interpret dynamic procedural contexts more effectively. Code and dataset available at this https URL.
[709] arXiv:2511.04638 (replaced) [pdf, other]: Title: Addressing divergent representations from causal interventions on neural networks

Satchel Grant, Simon Jerome Han, Alexa R. Tartaglini, Christopher Potts

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

A common approach to mechanistic interpretability is to causally manipulate model representations via targeted interventions in order to understand what those representations encode. Here we ask whether such interventions create out-of-distribution (divergent) representations, and whether this raises concerns about how faithful their resulting explanations are to the target model in its natural state. First, we demonstrate theoretically and empirically that common causal intervention techniques often do shift internal representations away from the natural distribution of the target model. Then, we provide a theoretical analysis of two cases of such divergences: "harmless" divergences that occur in the behavioral null-space of the layer(s) of interest, and "pernicious" divergences that activate hidden network pathways and cause dormant behavioral changes. Finally, in an effort to mitigate the pernicious cases, we apply and modify the Counterfactual Latent (CL) loss from Grant (2025) allowing representations from causal interventions to remain closer to the natural distribution, reducing the likelihood of harmful divergences while preserving the interpretive power of the interventions. Together, these results highlight a path towards more reliable interpretability methods.
[710] arXiv:2511.06209 (replaced) [pdf, html, other]: Title: ReProbe: Efficient Test-Time Scaling of Multi-Step Reasoning by Probing Internal States of Large Language Models

Jingwei Ni, Ekaterina Fadeeva, Tianyi Wu, Mubashara Akhtar, Jiaheng Zhang, Elliott Ash, Markus Leippold, Timothy Baldwin, See-Kiong Ng, Artem Shelmanov, Mrinmaya Sachan

Comments: ACL 2026 Main

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

LLMs can solve complex tasks by generating long, multi-step reasoning chains. Test-time scaling (TTS) can further improve performance by sampling multiple variants of intermediate reasoning steps, verifying their correctness, and selecting the best steps for continuation. However, existing verification approaches, such as Process Reward Models (PRMs), are computationally expensive and require large-scale human or model-generated annotations. We propose a lightweight alternative for step-level reasoning verification based on probing the internal states of LLMs. We train a transformer-based probe that uses the internal states of a frozen LLM to estimate the credibility of its reasoning steps during generation. Annotation can be provided either by a larger LLM (e.g., DeepSeek-R1) or in a self-supervised manner by the original model itself. The probes are lightweight, containing fewer than 10M parameters. Across multiple domains, including mathematics, planning, and general knowledge question answering, our probes match or exceed the performance of PRMs that are up to 810x larger. These results suggest that LLM internal states encode confidence in their reasoning processes and can serve as reliable signals for step verification, offering a promising path toward scalable, generalizable TTS and more introspective LLMs.
[711] arXiv:2511.07752 (replaced) [pdf, html, other]: Title: Back to the Future: The Role of Past and Future Context Predictability in Incremental Language Production

Shiva Upadhye, Richard Futrell

Comments: 73 pages, 12 figures

Subjects: Computation and Language (cs.CL)

Contextual predictability shapes how we choose and encode words in production. The effects of a word's predictability given preceding or past context are generally well-understood in both production and comprehension, but studies of naturalistic production have also revealed a poorly-understood yet robust backward predictability effect of a word given only its future context, which may be linked to future planning. Across two studies of naturalistic speech, we revisit backward predictability using improved operationalizations, introducing a conceptually motivated information-theoretic measure that quantifies the information shared between a word and future context under the constraints imposed by the past context. Study 1 shows that this measure produces effects qualitatively similar to backward predictability while explaining unique variance in phonetic reduction. Study 2 examines substitution errors within a generative framework that models lexical, contextual, and communicative influences on word choice to predict the identity of the word that surfaces as an error. Within this framework, we find that past-conditioned predictability increases error likelihood, whereas future-conditioned predictability reduces it. Further, our proposed measure emerges as the strongest contextual predictor of error identity, subsuming backward predictability. Analysis of error types further reveals graded trade-offs in how speakers prioritize form-, meaning-, and context-based information during lexical planning. Together, these findings illuminate how past and future context shape word choice and encoding, linking contextual predictability to mechanisms of incremental planning in sentence production.
[712] arXiv:2511.11439 (replaced) [pdf, html, other]: Title: Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis

Yiling He, Junchi Lei, Hongyu She, Shuo Shao, Xinran Zheng, Yiping Liu, Zhan Qin, Lorenzo Cavallaro

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Binary security has increasingly relied on deep learning to reason about malware behavior and program semantics. However, the performance often degrades as threat landscapes evolve and code representations shift. While continual learning (CL) offers a natural solution through sequential updates, most existing approaches rely on data replay or unconstrained updates, limiting their applicability and effectiveness in data-sensitive security environments. We propose RETROFIT, which regulates knowledge retention and adaptation with controlled forgetting at each update, without requiring historical data. Our key idea is to consolidate previously trained and newly fine-tuned models, serving as teachers of legacy and emergent knowledge, through retrospective-free parameter merging. Forgetting control is achieved by 1) constraining parameter changes to low-rank and sparse subspaces for approximate orthogonality, and 2) employing a confidence-guided arbitration mechanism to dynamically aggregate knowledge from both teachers.
Our evaluation on two representative applications demonstrates that RETROFIT consistently mitigates forgetting while maintaining adaptability. In malware detection under temporal drift, it substantially improves the retention score, from 20.2% to 38.6% over CL baselines, and exceeds the oracle upper bound on new data. In binary summarization across decompilation levels, where analyzing stripped binaries is especially challenging, RETROFIT achieves over 2x the BLEU score of transfer learning used in prior work and surpasses all baselines in cross-representation generalization.
[713] arXiv:2511.13587 (replaced) [pdf, html, other]: Title: VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping

Haotian Dong, Ye Li, Rongwei Lu, Chen Tang, Shu-Tao Xia, Zhi Wang

Comments: CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Although speculative decoding (SD) has been proven effective for accelerating visual AR models, its "draft one step, then verify one step" paradigm prevents a direct reduction in the number of forward passes, limiting its acceleration potential. Motivated by the interchangeability of visual tokens, we explore verification skipping in the SD process for the first time to explicitly cut the number of target model forward passes, thereby reducing inference latency. By analyzing the characteristics of the drafting stage, we observe that verification redundancy and stale feature reusability are key factors in maintaining generation quality while improving speed for verification-free steps. Inspired by these two observations, we propose a novel SD framework, VVS, to accelerate visual AR models via partial verification skipping, which integrates three complementary modules: (1) a verification-free token selector with dynamic truncation, (2) token-level feature caching and reuse, and (3) fine-grained skipped step scheduling. Consequently, VVS reduces the number of target model forward passes by $2.8\times$ relative to vanilla AR decoding while maintaining competitive generation quality, offering a superior speed-quality trade-off over conventional SD frameworks and revealing strong potential to reshape the SD paradigm. Our code is available at this https URL.
[714] arXiv:2511.17085 (replaced) [pdf, html, other]: Title: Why Do Language Model Agents Whistleblow?

Kushal Agrawal, Frank Xiao, Guido Bergman, Asa Cooper Stickland

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that contradict the interests or explicit instructions of the user. We study LLM whistleblowing: a subset of this behavior where models disclose suspected misconduct to parties beyond the dialog boundary (e.g., regulatory agencies) without user instruction or knowledge. We introduce an evaluation suite of diverse and realistic staged misconduct scenarios to assess agents for this behavior. Across models and settings, we find that: (1) the frequency of whistleblowing varies widely across model families, (2) increasing the complexity of the task the agent is instructed to complete lowers whistleblowing tendencies, (3) nudging the agent in the system prompt to act morally substantially raises whistleblowing rates, and (4) giving the model more obvious avenues for non-whistleblowing behavior, by providing more tools and a detailed workflow to follow, decreases whistleblowing rates. Additionally, we verify the robustness of our dataset by testing for model evaluation awareness, and find that both black-box methods and probes on model activations show lower evaluation awareness in our settings than in comparable previous work.
[715] arXiv:2511.17099 (replaced) [pdf, html, other]: Title: Multivariate Sensitivity Analysis of Electric Machine Efficiency Maps and Profiles Under Design Uncertainty

Aylar Partovizadeh, Sebastian Schöps, Dimitrios Loukrezis

Subjects: Computational Engineering, Finance, and Science (cs.CE)

This work introduces the use of multivariate global sensitivity analysis for assessing the impact of uncertain electric machine design parameters on efficiency maps and profiles. Contrary to the common approach of applying variance-based (Sobol') sensitivity analysis elementwise, multivariate sensitivity analysis provides a single sensitivity index per parameter, thus allowing for a holistic estimation of parameter importance over the full efficiency map or profile. Its benefits are demonstrated on permanent magnet synchronous machine models of different fidelity. Computations based on Monte Carlo sampling and polynomial chaos expansions are compared in terms of computational cost. The sensitivity analysis results are subsequently used to simplify the models, by fixing non-influential parameters to their nominal values and allowing random variations only for influential parameters. Uncertainty estimates obtained with the full and reduced models confirm the validity of model simplification guided by multivariate sensitivity analysis.
[716] arXiv:2511.18264 (replaced) [pdf, html, other]: Title: SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors

Ruijie Fan, Junyan Ye, Huan Chen, Zilong Huang, Xiaolei Wang, Weijia Li

Comments: 14 pages, 12 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing satellite video tracking methods often struggle with generalization, requiring scenario-specific training to achieve satisfactory performance, and are prone to track loss in the presence of occlusion. To address these challenges, we propose SatSAM2, a zero-shot satellite video tracker built on SAM2, designed to adapt foundation models to the remote sensing domain. SatSAM2 introduces two core modules: a Kalman Filter-based Constrained Motion Module (KFCMM) to exploit temporal motion cues and suppress drift, and a Motion-Constrained State Machine (MCSM) to regulate tracking states based on motion dynamics and reliability. To support large-scale evaluation, we propose MatrixCity Video Object Tracking (MVOT), a synthetic benchmark containing 1,500+ sequences and 157K annotated frames with diverse viewpoints, illumination, and occlusion conditions. Extensive experiments on two satellite tracking benchmarks and MVOT show that SatSAM2 outperforms both traditional and foundation model-based trackers, including SAM2 and its variants. Notably, on the OOTB dataset, SatSAM2 achieves a 5.84% AUC improvement over state-of-the-art methods. Our code and dataset will be publicly released to encourage further research.
[717] arXiv:2511.18513 (replaced) [pdf, html, other]: Title: LRDUN: A Low-Rank Deep Unfolding Network for Efficient Spectral Compressive Imaging

He Huang, Yujun Guo, Wei He

Comments: 17 pages, 16 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep unfolding networks (DUNs) have achieved remarkable success and become the mainstream paradigm for spectral compressive imaging (SCI) reconstruction. Existing DUNs are derived from full-HSI imaging models, where each stage operates directly on the high-dimensional HSI, refining the entire data cube based on the single 2D coded measurement. However, this paradigm leads to computational redundancy and suffers from the ill-posed nature of mapping 2D residuals back to 3D space of HSI. In this paper, we propose two novel imaging models corresponding to the spectral basis and subspace image by explicitly integrating low-rank (LR) decomposition with the sensing model. Compared to recovering the full HSI, estimating these compact low-dimensional components significantly mitigates the ill-posedness. Building upon these novel models, we develop the Low-Rank Deep Unfolding Network (LRDUN), which jointly solves the two subproblems within an unfolded proximal gradient descent (PGD) framework. Furthermore, we introduce a Generalized Feature Unfolding Mechanism (GFUM) that decouples the physical rank in the data-fidelity term from the feature dimensionality in the prior module, enhancing the representational capacity and flexibility of the network. Extensive experiments on simulated and real datasets demonstrate that the proposed LRDUN achieves state-of-the-art (SOTA) reconstruction quality with significantly reduced computational cost.
[718] arXiv:2511.18539 (replaced) [pdf, other]: Title: TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting

Lingyu Jiang, Lingyu Xu, Peiran Li, Dengzhe Hou, Qianwen Ge, Dingyi Zhuang, Shuo Xing, Wenjing Chen, Xiangbo Gao, Ting-Hsuan Chen, Xueying Zhan, Xin Zhang, Ziming Zhang, Zhengzhong Tu, Michael Zielewski, Kazunori Yamada, Fangzhou Lin

Comments: 15 pages, 5 figures, 6 tables

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

We propose TimePre, a simple framework that unifies the efficiency of Multilayer Perceptron (MLP)-based models with the distributional flexibility of Multiple Choice Learning (MCL) for Probabilistic Time-Series Forecasting (PTSF). Stabilized Instance Normalization (SIN), the core of TimePre, is a normalization layer that explicitly addresses the trade-off among accuracy, efficiency, and stability. SIN stabilizes the hybrid architecture by correcting channel-wise statistical shifts, thereby resolving the catastrophic hypothesis collapse. Extensive experiments on six benchmark datasets demonstrate that TimePre achieves state-of-the-art (SOTA) accuracy on key probabilistic metrics. Critically, TimePre achieves inference speeds that are orders of magnitude faster than sampling-based models, and is more stable than prior MCL approaches.
[719] arXiv:2511.20697 (replaced) [pdf, other]: Title: Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores

Congren Dai, Yue Yang, Krinos Li, Huichi Zhou, Shijie Liang, Bo Zhang, Enyang Liu, Ge Jin, Hongran An, Haosen Zhang, Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Xiaobing Li, Maosong Sun

Comments: Accepted to ACL 2026 Main Conference

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Understanding complete musical scores entails integrated reasoning over pitch, rhythm, harmony, and large-scale structure, yet the ability of Large Language Models and Vision-Language Models to interpret full musical notation remains insufficiently examined. We introduce Musical Score Understanding Benchmark (MSU-Bench), a human-curated benchmark for score-level musical understanding across textual (ABC notation) and visual (PDF) modalities. MSU-Bench contains 1,800 generative question-answer pairs from works by Bach, Beethoven, Chopin, Debussy, and others, organised into four levels of increasing difficulty, ranging from onset information to texture and form. Evaluations of more than fifteen state-of-the-art models, in both zero-shot and fine-tuned settings, reveal pronounced modality gaps, unstable level-wise performance, and challenges in maintaining multilevel correctness. Fine-tuning substantially improves results across modalities while preserving general knowledge, positioning MSU-Bench as a robust foundation for future research in multimodal reasoning. The benchmark and code are available at this https URL.
[720] arXiv:2511.20834 (replaced) [pdf, html, other]: Title: Spira: Exploiting Voxel Data Structural Properties for Efficient Sparse Convolution in Point Cloud Networks

Dionysios Adamopoulos, Anastasia Poulopoulou, Georgios Goumas, Christina Giannoula

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Machine Learning (cs.LG); Performance (cs.PF)

Sparse Convolution (SpC) powers 3D point cloud networks widely used in autonomous driving and augmented/virtual reality. SpC builds a kernel map that stores mappings between input voxel coordinates, output coordinates, and weight offsets, then uses this map to compute feature vectors for output coordinates. Our work identifies three key properties of voxel coordinates: they are integer-valued, bounded within a limited spatial range, and geometrically continuous, i.e., neighboring voxels on the same object surface are highly likely to exist at small spatial offsets from each other. Prior SpC engines do not fully exploit these properties and suffer from high pre-processing and post-processing overheads during kernel map construction. To address this, we design Spira, the first voxel-property-aware SpC engine for GPUs. Spira proposes (i) a high-performance one-shot search algorithm that builds the kernel map with no pre-processing and high data locality, (ii) an effective packed-native processing scheme that accesses packed voxel coordinates at low cost, (iii) a flexible dual-dataflow execution mechanism that efficiently computes output feature vectors by adapting to layer characteristics, and (iv) a network-wide parallelization strategy that builds kernel maps for all SpC layers concurrently at network start. Our evaluation shows that Spira significantly outperforms prior state-of-the-art SpC engines by 1.68x on average and up to 3.04x for end-to-end inference, and by 2.11x on average and up to 3.44x for layer-wise execution across diverse layer configurations. The source code of Spira is freely available at this http URL.
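The kernel map described above can be sketched with a plain hash lookup over packed integer coordinates. This is a generic baseline for illustration, not Spira's one-shot search: the 10-bit-per-axis packing, the function names, and the choice of output coordinates equal to input coordinates (a submanifold-style layer) are all assumptions made here.

```python
from itertools import product

BITS = 10  # assumption: every coordinate lies in [0, 2**BITS)


def pack(x, y, z):
    """Pack three bounded, non-negative integer coordinates into one integer key."""
    return (x << (2 * BITS)) | (y << BITS) | z


def build_kernel_map(coords, kernel_size=3):
    """Return (input_index, output_index, weight_offset_index) triples.

    Output coordinates are taken to be the input coordinates themselves.
    """
    index_of = {pack(*c): i for i, c in enumerate(coords)}
    r = kernel_size // 2
    limit = 1 << BITS
    kernel_map = []
    offsets = product(range(-r, r + 1), repeat=3)
    for k, (dx, dy, dz) in enumerate(offsets):
        for out_idx, (x, y, z) in enumerate(coords):
            nx, ny, nz = x + dx, y + dy, z + dz
            # Skip neighbors outside the bounded range so packing stays valid.
            if not (0 <= nx < limit and 0 <= ny < limit and 0 <= nz < limit):
                continue
            in_idx = index_of.get(pack(nx, ny, nz))
            if in_idx is not None:
                kernel_map.append((in_idx, out_idx, k))
    return kernel_map


voxels = [(0, 0, 0), (1, 0, 0)]
for entry in build_kernel_map(voxels):
    print(entry)
```

Because the coordinates are integer-valued and bounded, a single packed integer can serve as the lookup key, which is the kind of property the abstract says prior engines leave unexploited.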
[721] arXiv:2511.21777 (replaced) [pdf, html, other]: Title: Artificial intelligence for methane detection: from continuous monitoring to verified mitigation

Gonzalo Mateo-Garcia, Anna Allen, Itziar Irakulis-Loitxate, Manuel Montesino-San Martin, Marc Watine, Cynthia Randles, Tharwat Mokalled, Alma Raunak, Carol Castañeda-Martinez, Juan E. Jonhson, Javier Gorroño, James Requeima, Claudio Cifarelli, Luis Guanter, Richard E. Turner, Manfredi Caltagirone

Subjects: Machine Learning (cs.LG)

Methane is a potent greenhouse gas, responsible for roughly 30% of warming since pre-industrial times. A small number of large point sources account for a disproportionate share of emissions, creating an opportunity for substantial reductions by targeting relatively few sites. Detection and attribution of large emissions at scale for notification to asset owners remains challenging. Here, we introduce MARS-S2L, a machine learning model that detects methane emissions in publicly available multispectral satellite imagery. Trained on a manually curated dataset of over 80,000 images, the model provides high-resolution detections every two days, enabling facility-level attribution and identifying 78% of plumes with an 8% false positive rate at 697 previously unseen sites. Deployed operationally, MARS-S2L has issued 1,015 notifications to stakeholders in 20 countries, enabling verified, permanent mitigation of six persistent emitters, including a previously unknown site in Libya. These results demonstrate a scalable pathway from satellite detection to quantifiable methane mitigation.
[722] arXiv:2511.21978 (replaced) [pdf, html, other]: Title: PAT3D: Physics-Augmented Text-to-3D Scene Generation

Guying Lin, Kemeng Huang, Michael Liu, Ruihan Gao, Hanke Chen, Lyuhao Chen, Beijia Lu, Taku Komura, Yuan Liu, Jun-Yan Zhu, Minchen Li

Comments: 19 pages, 12 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce PAT3D, the first physics-augmented text-to-3D scene generation framework that integrates vision-language models with physics-based simulation to produce physically plausible, simulation-ready, and intersection-free 3D scenes. Given a text prompt, PAT3D generates 3D objects, infers their spatial relations, and organizes them into a hierarchical scene tree, which is then converted into initial conditions for simulation. A differentiable rigid-body simulator ensures realistic object interactions under gravity, driving the scene toward static equilibrium without interpenetrations. To further enhance scene quality, we introduce a simulation-in-the-loop optimization procedure that guarantees physical stability and non-intersection, while improving semantic consistency with the input prompt. Experiments demonstrate that PAT3D substantially outperforms prior approaches in physical plausibility, semantic consistency, and visual quality. Beyond high-quality generation, PAT3D uniquely enables simulation-ready 3D scenes for downstream tasks such as scene editing and robotic manipulation. Code and data are available at: this https URL.
[723] arXiv:2511.22793 (replaced) [pdf, html, other]: Title: GSpaRC: Gaussian Splatting for Real-time Reconstruction of RF Channels

Bhavya Sai Nukapotula, Rishabh Tripathi, Seth Pregler, Dileep Kalathil, Srinivas Shakkottai, Theodore S. Rappaport

Subjects: Machine Learning (cs.LG)

Channel state information (CSI) is essential for adaptive beamforming and maintaining robust links in wireless communication systems. However, acquiring CSI incurs significant overhead, consuming up to 25% of spectrum resources in 5G networks due to frequent pilot transmissions at sub-millisecond intervals. Recent approaches aim to reduce this burden by reconstructing CSI from spatiotemporal RF measurements, such as signal strength and direction-of-arrival. While effective in offline settings, these methods often suffer from inference latencies in the 5-100 ms range, making them impractical for real-time systems. We present GSpaRC: Gaussian Splatting for Real-time Reconstruction of RF Channels, the first algorithm to break the 1 ms latency barrier while maintaining high accuracy. GSpaRC represents the RF environment using a compact set of 3D Gaussian primitives, each parameterized by a lightweight neural model augmented with physics-informed features such as distance-based attenuation. Unlike traditional vision-based splatting pipelines, GSpaRC is tailored for RF reception: it employs an equirectangular projection onto a hemispherical surface centered at the receiver to reflect omnidirectional antenna behavior. A custom CUDA pipeline enables fully parallelized directional sorting, splatting, and rendering across frequency and spatial dimensions. Evaluated on multiple RF datasets, GSpaRC achieves similar CSI reconstruction fidelity to recent state-of-the-art methods while reducing training and inference time by over an order of magnitude. By trading modest GPU computation for a substantial reduction in pilot overhead, GSpaRC enables scalable, low-latency channel estimation suitable for deployment in 5G and future wireless systems. The code is available here: this https URL.
[724] arXiv:2511.23159 (replaced) [pdf, other]: Title: AI for software engineering: from probable to provable

Bertrand Meyer

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Vibe coding, the much-touted use of AI techniques for programming, faces two overwhelming obstacles: the difficulty of specifying goals ("prompt engineering" is a form of requirements engineering, one of the toughest disciplines of software engineering); and the hallucination phenomenon. Programs are only useful if they are correct or very close to correct.
The solution? Combine the creativity of artificial intelligence with the rigor of formal specification methods and the power of formal program verification, supported by modern proof tools.
[725] arXiv:2512.03048 (replaced) [pdf, html, other]: Title: The Specification Trap: Why Static Value Alignment Alone Is Insufficient for Robust Alignment

Austin Spizzirri

Comments: 31 pages, no figures. Version 5. First posted as arXiv:2512.03048 in November 2025. First in a six-paper research program on AI alignment

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Static content-based AI value alignment is insufficient for robust alignment under capability scaling, distributional shift, and increasing autonomy. This holds for any approach that treats alignment as optimizing toward a fixed formal value-object, whether reward function, utility function, constitutional principles, or learned preference representation. Three philosophical results create compounding difficulties: Hume's is-ought gap (behavioral data underdetermines normative content), Berlin's value pluralism (human values resist consistent formalization), and the extended frame problem (any value encoding will misfit future contexts that advanced AI creates). RLHF, Constitutional AI, inverse reinforcement learning, and cooperative assistance games each instantiate this specification trap, and their failure modes reflect structural vulnerabilities, not merely engineering limitations that better data or algorithms will straightforwardly resolve. Known workarounds for individual components face mutually reinforcing difficulties when the specification is closed: the moment it ceases to update from the process it governs. Drawing on compatibilist philosophy, the paper argues that behavioral compliance under training conditions does not guarantee robust alignment under novel conditions, and that this gap grows with system capability. For value-laden autonomous systems, known closed approaches face structural vulnerabilities that worsen with capability. The constructive burden shifts to open, developmentally responsive approaches, though whether such approaches can be achieved remains an empirical question.
[726] arXiv:2512.03465 (replaced) [pdf, other]: Title: Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits

Robert Dilworth

Comments: 20 pages, 8 figures, 2 tables

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this study, we more rigorously evaluated our attack script TraceTarnish, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To ensure the efficacy and utility of our attack, we sourced, processed, and analyzed Reddit comments -- comments that were later alchemized into TraceTarnish data -- to gain valuable insights. The transformed TraceTarnish data was then further augmented by StyloMetrix to manufacture stylometric features -- features that were culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. Our results found that function words and function word types (L_FUNC_A & L_FUNC_T); content words and content word types (L_CONT_A & L_CONT_T); and the Type-Token Ratio (ST_TYPE_TOKEN_RATIO_LEMMAS) yielded significant Information-Gain readings. The identified stylometric cues -- function-word frequencies, content-word distributions, and the Type-Token Ratio -- serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. Similarly, these features could function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; granted, in the absence of the original message, this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed TraceTarnish's operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.
[727] arXiv:2512.05591 (replaced) [pdf, html, other]: Title: Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Zhenpeng Su, Leiyu Pan, Minxuan Lv, Tiehua Mei, Zijia Lin, Yuntao Li, Wenping Hu, Ruiming Tang, Kun Gai, Guorui Zhou

Comments: This paper has been accepted by ACL2026

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large language model post-training relies on reinforcement learning to improve model capability and alignment quality. However, the off-policy training paradigm introduces distribution shift, which often pushes the policy beyond the trust region, leading to training instabilities manifested as fluctuations in policy entropy and unstable gradients. Although PPO-Clip mitigates this issue through importance clipping, it still overlooks the global distributional shift of actions. To address these challenges, we propose using the entropy ratio between the current and previous policies as a new global metric that effectively quantifies the relative change in policy exploration throughout updates. Building on this metric, we introduce an Entropy Ratio Clipping (ERC) mechanism that imposes bidirectional constraints on the entropy ratio. This stabilizes policy updates at the global distribution level and compensates for the inability of PPO-Clip to regulate probability shifts of un-sampled actions. We integrate ERC into both DAPO and GPPO reinforcement learning algorithms. Experiments across multiple benchmarks show that ERC consistently improves performance.
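The abstract does not spell out the exact ERC formulation, so the following is only a minimal sketch of one way a bidirectional entropy-ratio constraint could look. It assumes the ratio is taken per token as H(current) / H(previous) over the full next-token distribution, and that tokens whose ratio leaves a band [low, high] are masked out of the update; the band values and all function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_ratio_mask(new_dists, old_dists, low=0.95, high=1.05, eps=1e-8):
    """Return (ratios, mask) for a sequence of per-token distributions.

    mask[i] is 1.0 when the entropy ratio of token i stays inside
    [low, high] and 0.0 otherwise. The bounds are illustrative only.
    """
    ratios, mask = [], []
    for p_new, p_old in zip(new_dists, old_dists):
        ratio = (entropy(p_new) + eps) / (entropy(p_old) + eps)
        ratios.append(ratio)
        mask.append(1.0 if low <= ratio <= high else 0.0)
    return ratios, mask


old = [[0.5, 0.5], [0.5, 0.5]]
new = [[0.5, 0.5], [0.99, 0.01]]  # second token collapses toward one action
ratios, mask = entropy_ratio_mask(new, old)
print(ratios, mask)
```

Unlike an importance ratio on the sampled action alone, the entropy is computed over the whole distribution, so a shift in the probabilities of un-sampled actions also moves the ratio.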
[728] arXiv:2512.06171 (replaced) [pdf, html, other]: Title: Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect Detection

Jessica Plassmann, Nicolas Schuler, Michael Schuth, Georg von Freymann

Comments: 13 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Shearography is an interferometric technique sensitive to surface displacement gradients, providing high sensitivity for detecting subsurface defects in safety-critical components. A key limitation to industrial adoption is the lack of high-quality annotated datasets, since manual labeling remains labor-intensive, subjective, and difficult to standardize. We present an automated labeling pipeline that generates candidate defect bounding boxes with Grounded DINO, refines them using SAM masks, and exports YOLO-format labels for downstream detector training. Quantitative evaluation shows the generated boxes are suitable for weakly supervised learning, while high-resolution masks provide qualitative visualization. This approach reduces manual effort and supports scalable dataset creation for robust industrial defect detection.
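The last two pipeline steps (tightening a box around a mask, then writing a YOLO-format label) can be sketched as below. This is an assumption-laden illustration rather than the authors' code: the Grounded DINO and SAM calls are omitted, the mask is a plain nested list of 0/1 values, and the function names are invented here. The label layout (class index, then box center and size normalized by image width and height) follows the common YOLO text format.

```python
def mask_to_tight_box(mask):
    """Tight (x_min, y_min, x_max, y_max) box around nonzero mask pixels.

    x_max and y_max are exclusive. Returns None for an empty mask.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return min(cols), min(rows), max(cols) + 1, max(rows) + 1


def box_to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format one pixel-space box as a normalized YOLO label line."""
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Toy 3 x 4 mask standing in for a segmentation output.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
box = mask_to_tight_box(mask)
print(box_to_yolo_line(0, *box, img_w=4, img_h=3))
```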
[729] arXiv:2512.06834 (replaced) [pdf, other]: Title: COIVis: Eye-tracking-based Visual Exploration of Concept Learning in MOOC Videos

Zhiguang Zhou, Ruiqi Yu, Yuming Ma, Hao Ni, Guojun Li, Li Ye, Xiaoying Wang, Yize Li, Yigang Wang, Yong Wang

Comments: 17 pages, 8 figures

Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)

Massive Open Online Courses (MOOCs) make high-quality instruction accessible. However, the lack of face-to-face interaction makes it difficult for instructors to obtain feedback on learners' performance and provide more effective instructional guidance. Traditional analytical approaches, such as clickstream logs or quiz scores, capture only coarse-grained learning outcomes and offer limited insight into learners' moment-to-moment cognitive states. In this study, we propose COIVis, an eye tracking-based visual analytics system that supports concept-level exploration of learning processes in MOOC videos. COIVis first extracts course concepts from multimodal video content and aligns them with the temporal structure and screen space of the lecture, defining Concepts of Interest (COIs), which anchor abstract concepts to specific spatiotemporal regions. Learners' gaze trajectories are transformed into COI sequences, and five interpretable learner-state features -- Attention, Cognitive Load, Interest, Preference, and Synchronicity -- are computed at the COI level based on eye tracking metrics. Building on these representations, COIVis provides a narrative, multi-view visualization enabling instructors to move from cohort-level overviews to individual learning paths, quickly locate problematic concepts, and compare diverse learning strategies. We evaluate COIVis through two case studies and in-depth user-feedback interviews. The results demonstrate that COIVis effectively provides instructors with valuable insights into the consistency and anomalies of learners' learning patterns, thereby supporting timely and personalized interventions for learners and optimizing instructional design.
[730] arXiv:2512.09111 (replaced) [pdf, html, other]: Title: Language-Conditioned Safe Trajectory Generation for Spacecraft Rendezvous

Yuji Takubo, Arpit Dwivedi, Sukeerth Ramkumar, Luis A. Pabon, Daniele Gammelli, Marco Pavone, Simone D'Amico

Comments: 42 pages, 12 figures. Submitted to AIAA Journal of Guidance, Control, and Dynamics

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

Reliable real-time trajectory generation is essential for future autonomous spacecraft. While recent progress in nonconvex guidance and control is paving the way for onboard autonomous trajectory optimization, these methods still rely on extensive expert input (e.g., waypoints, constraints, and mission timelines), which limits operational scalability in complex missions such as rendezvous and proximity operations. This paper introduces SAGES (Semantic Autonomous Guidance Engine for Space), a trajectory-generation framework that translates natural-language commands into spacecraft trajectories that reflect high-level intent while respecting nonconvex constraints. Experiments in two settings (fault-tolerant proximity operations with continuous-time constraint enforcement and a free-flying robotic platform) demonstrate that SAGES reliably produces trajectories aligned with human commands, achieving over 90% semantic-behavioral consistency across diverse behavior modes. Ultimately, this work marks an initial step toward language-conditioned, constraint-aware spacecraft trajectory generation, enabling operators to interactively guide both safety and behavior through intuitive natural-language commands with reduced expert burden.
[731] arXiv:2512.09292 (replaced) [pdf, other]: Title: Identifying Bias in Machine-generated Text Detection

Kevin Stowe, Svetlana Afanaseva, Rodolfo Raimundo, Yitao Sun, Kailash Patil

Comments: 13 pages, 2 figures, 7 tables

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the capability to identify whether a given text was generated using a model or written by a person. While detection models show strong performance, they have the capacity to cause significant negative impacts. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and power of the effects, as well as performing subgroup analysis. We find that while biases are generally inconsistent across systems, there are several key issues: several models tend to classify disadvantaged groups as machine-generated, ELL essays are more likely to be classified as machine-generated, economically disadvantaged students' essays are less likely to be classified as machine-generated, and non-White ELL essays are disproportionately classified as machine-generated relative to their White counterparts. Finally, we perform human annotation and find that while humans perform generally poorly at the detection task, they show no significant biases on the studied attributes.
[732] arXiv:2512.18908 (replaced) [pdf, html, other]: Title: Multimodal Bayesian Network for Robust Assessment of Casualties in Autonomous Triage

Szymon Rusiecki, Cecilia G. Morales, Kimberly Elenberg, Leonard Weiss, Artur Dubrawski

Comments: Presented at NeurIPS 2025 Workshop: Structured Probabilistic Inference & Generative Modeling

Subjects: Artificial Intelligence (cs.AI)

Mass Casualty Incidents (MCIs) can overwhelm emergency medical systems, and resulting delays or errors in the assessment of casualties can lead to preventable deaths. We present a decision support framework that fuses outputs from multiple computer vision models, estimating signs of severe hemorrhage, respiratory distress, physical alertness, or visible trauma, into a Bayesian network constructed entirely from expert-defined rules. Unlike traditional data-driven models, our approach does not require training data, supports inference with incomplete information, and is robust to noisy or uncertain observations. We report performance for two missions involving 11 and 9 casualties, respectively, where our Bayesian network model substantially outperformed vision-only baselines during evaluation of our system in the DARPA Triage Challenge (DTC) field scenarios. The accuracy of physiological assessment improved from 15% to 42% in the first scenario and from 19% to 46% in the second, representing a nearly threefold increase in performance. More importantly, overall triage accuracy increased from 14% to 53% in all patients, while the diagnostic coverage of the system expanded from 31% to 95% of the cases requiring assessment. These results demonstrate that expert-knowledge-guided probabilistic reasoning can significantly enhance automated triage systems, offering a promising approach to supporting emergency responders in MCIs. This approach enabled Team Chiron to achieve 4th place out of 11 teams during the 1st physical round of the DTC.
[733] arXiv:2512.19995 (replaced) [pdf, html, other]: Title: Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Ming Li, Chenrui Fan, Yize Cheng, Soheil Feizi, Tianyi Zhou

Comments: ACL2026, camera-ready

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models), a scalable framework that explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, and Verify. When applied to mathematical problem solving by diverse models, this abstraction reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models, which are not apparent from token-level views. We further present two diagnostic case studies showing that exploration functions as a critical branching step associated with correctness, and that efficiency-oriented methods selectively suppress evaluative feedback steps rather than uniformly shortening responses. Together, our results demonstrate that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
[734] arXiv:2512.20288 (replaced) [pdf, other]: Title: UbiQVision: Quantifying Uncertainty in XAI for Image Recognition

Akshat Dubey, Aleksandar Anžel, Bahar İlgen, Georges Hattab

Comments: Under Review. Updated manuscript. Feedback from reviewers incorporated

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent advances in deep learning have led to its widespread adoption across diverse domains, including medical imaging. This progress is driven by increasingly sophisticated model architectures, such as ResNets, Vision Transformers, and Hybrid Convolutional Neural Networks, that offer enhanced performance at the cost of greater complexity. This complexity often compromises model explainability and interpretability. SHAP has emerged as a prominent method for providing interpretable visualizations that aid domain experts in understanding model predictions. However, SHAP explanations can be unstable and unreliable in the presence of epistemic and aleatoric uncertainty. In this study, we address this challenge by using Dirichlet posterior sampling and Dempster-Shafer theory to quantify the uncertainty that arises from these unstable explanations in medical imaging applications. The framework uses a belief, plausible, and fusion map approach alongside statistical quantitative analysis to quantify uncertainty in SHAP. Furthermore, we evaluated our framework on three medical imaging datasets covering examples from pathology, ophthalmology, and radiology. The datasets vary in class distribution, image quality, and modality type; the differing image resolutions and modality-specific aspects introduce noise and significant epistemic uncertainty.
[735] arXiv:2512.22274 (replaced) [pdf, html, other]: Title: GeCo: Evaluating Geometric Consistency for Video Generation via Motion and Structure

Leslie Gu, Junhwa Hur, Charles Herrmann, Fangneng Zhan, Todd Zickler, Deqing Sun, Hanspeter Pfister

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce GeCo, a geometry-grounded metric for jointly detecting geometric deformation and occlusion-inconsistency artifacts in static scenes. By fusing residual motion and depth priors, GeCo produces interpretable, dense consistency maps that reveal these artifacts. We use GeCo to systematically benchmark recent video generation models, uncovering common failure modes, and further employ it as a training-free guidance loss to reduce deformation artifacts during video generation.
[736] arXiv:2512.22753 (replaced) [pdf, other]: Title: From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

Moustapha Awwalou Diouf, Maimouna Tamah Diao, Iyiola Emmanuel Olatunji, Abdoul Kader Kaboré, Jordan Samhi, Gervais Mendy, Samuel Ouya, Jacques Klein, Tegawendé F. Bissyandé

Subjects: Software Engineering (cs.SE)

LLMs democratize software engineering by enabling non-programmers to create applications, but this same accessibility fundamentally undermines security assumptions that have guided software engineering for decades. We show in this work how publicly available LLMs can be socially engineered to transform novices into capable attackers, challenging the foundational principle that exploitation requires technical expertise. To that end, we propose RSA (Role-assignment, Scenario-pretexting, and Action-solicitation), a pretexting strategy that manipulates LLMs into generating functional exploits despite their safety mechanisms. Testing against Odoo, a widely used ERP platform, we evaluated five mainstream LLMs (GPT-4o, Gemini, Claude, Microsoft Copilot, and DeepSeek) and successfully exploited every tested CVE: at least one LLM produced a functional exploit for each within 3-5 prompting rounds. While prior work \cite{jin2025good} found LLM-assisted attacks difficult and requiring manual effort, we demonstrate that this overhead can be eliminated entirely.
Our findings invalidate core software engineering security principles: the distinction between technical and non-technical actors no longer provides valid threat models; technical complexity of vulnerability descriptions offers no protection when LLMs can abstract it away; and traditional security boundaries dissolve when the same tools that build software can be manipulated to break it. This represents a paradigm shift in software engineering -- we must redesign security practices for an era where exploitation requires only the ability to craft prompts, not understand code.
Artifacts available at: this https URL.
[737] arXiv:2601.02438 (replaced) [pdf, html, other]: Title: Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

Yun Bian, Yi Chen, HaiQuan Wang, ShiHao Li, Zhe Cui

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security defects. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by pretrained models with Code Property Graph (CPG) representations extracted by graph neural networks, under the implicit assumption that introducing an additional modality necessarily yields information gain. Through empirical analysis, we demonstrate the limitations of this assumption: pretrained models already encode substantial structural information implicitly, leading to strong overlap between the two modalities; moreover, graph encoders are generally less effective than pretrained language models in feature extraction. As a result, naive fusion not only struggles to obtain complementary signals but can also dilute effective discriminative cues due to noise propagation. To address these challenges, we propose a task-conditioned complementary fusion strategy that uses Fisher information to quantify task relevance, transforming cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, this strategy significantly tightens the upper bound on the output error. Based on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism to enable efficient task-oriented fusion. Experiments on the BigVul, Devign, and ReVeal benchmarks demonstrate that TaCCS-DFA delivers up to a 6.3-point gain in F1 score with only a 3.4% increase in inference latency, while maintaining low calibration error.
[738] arXiv:2601.03248 (replaced) [pdf, html, other]: Title: STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning

Juntong Ni, Shiyu Wang, Qi He, Ming Jin, Wei Jin

Comments: ACL 2026 Main, we release our code publicly at this https URL

Subjects: Computation and Language (cs.CL)

Spatio-temporal reasoning in time series involves the explicit synthesis of temporal dynamics, spatial dependencies, and textual context. This capability is vital for high-stakes decision-making in systems such as traffic networks, power grids, and disease propagation. However, the field remains underdeveloped because most existing works prioritize predictive accuracy over reasoning. To address this gap, we introduce ST-Bench, a benchmark comprising four core tasks (etiological reasoning, entity identification, correlation reasoning, and in-context forecasting), developed via a network SDE-based multi-agent data synthesis pipeline. We then propose STReasoner, which empowers LLMs to integrate time series, graph structure, and text for explicit reasoning. To promote spatially grounded logic, we introduce S-GRPO, a reinforcement learning algorithm that rewards performance gains specifically attributable to spatial information. Experiments show that STReasoner achieves average accuracy gains between 17% and 135% at only 0.004X the cost of proprietary models and generalizes robustly to real-world data.
[739] arXiv:2601.05019 (replaced) [pdf, html, other]: Title: Hán Dān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models

Yueqing Hu, Xinyang Peng, Shuting Peng, Hanqi Wang, Tianhong Wang

Comments: 7 pages, 7 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Recent Large Reasoning Models trained via reinforcement learning exhibit a "natural" alignment with human cognitive costs. However, we show that the prevailing paradigm of reasoning distillation -- training student models to mimic these traces via Supervised Fine-Tuning (SFT) -- fails to transmit this cognitive structure. Testing the "Hán Dān Xué Bù" (Superficial Mimicry) hypothesis across 14 models, we find that distillation induces a "Functional Alignment Collapse": while teacher models mirror human difficulty scaling ($\bar{r}=0.64$), distilled students significantly degrade this alignment ($\bar{r}=0.34$), often underperforming their own pre-distillation baselines ("Negative Transfer"). Our analysis suggests that SFT induces a "Cargo Cult" effect, where students ritualistically replicate the linguistic form of reasoning (verbosity) without internalizing the teacher's dynamic resource allocation policy. Consequently, reasoning distillation decouples computational cost from cognitive demand, revealing that human-like cognition is an emergent property of active reinforcement, not passive imitation.
[740] arXiv:2601.05127 (replaced) [pdf, html, other]: Title: LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization

Etai Sella, Yoav Baron, Hadar Averbuch-Elor, Daniel Cohen-Or, Or Patashnik

Comments: Accepted to SIGGRAPH 2026. Project Page: this https URL

Subjects: Graphics (cs.GR)

Recent diffusion-based image editing methods commonly rely on text or high-level instructions to guide the generation process, offering intuitive but coarse control. In contrast, we focus on explicit, prompt-free editing, where the user directly specifies the modification by cropping and pasting an object or sub-object into a chosen location within an image. This operation affords precise spatial and visual control, yet it introduces a fundamental challenge: preserving the identity of the pasted object while harmonizing it with its new context. We observe that attention maps in diffusion-based editing models inherently govern whether image regions are preserved or adapted for coherence. Building on this insight, we introduce LooseRoPE, a saliency-guided modulation of rotational positional encoding (RoPE) that loosens the positional constraints to continuously control the attention field of view. By relaxing RoPE in this manner, our method smoothly steers the model's focus between faithful preservation of the input image and coherent harmonization of the inserted object, enabling a balanced trade-off between identity retention and contextual blending. Our approach provides a flexible and intuitive framework for image editing, achieving seamless compositional results without textual descriptions or complex user input.
[741] arXiv:2601.05563 (replaced) [pdf, html, other]: Title: What's Left Unsaid? Detecting and Correcting Misleading Omissions in Multimodal News Previews

Fanxiao Li, Jiaying Wu, Tingchao Fu, Dayang Li, Herun Wan, Wei Zhou, Min-Yen Kan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

Even when factually correct, social-media news previews (image-headline pairs) can induce interpretation drift: by selectively omitting crucial context, they lead readers to form judgments that diverge from what the full article supports. This covert harm is subtler than explicit misinformation, yet remains underexplored. To address this gap, we develop a multi-stage pipeline that simulates preview-based and context-based understanding, enabling construction of the MM-Misleading benchmark. Using MM-Misleading, we systematically evaluate open-source LVLMs and uncover pronounced blind spots in omission-based misleadingness detection. We further propose OMGuard, which combines (1) Interpretation-Aware Fine-Tuning for misleadingness detection and (2) Rationale-Guided Misleading Content Correction, where explicit rationales guide headline rewriting to reduce misleading impressions. Experiments show that OMGuard lifts an 8B model's detection accuracy to the level of a 235B LVLM while delivering markedly stronger end-to-end correction. Further analysis shows that misleadingness usually arises from local narrative shifts, such as missing background, instead of global frame changes, and identifies image-driven cases where text-only correction fails, underscoring the need for visual interventions.
[742] arXiv:2601.06033 (replaced) [pdf, html, other]: Title: How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape

Patrick Gage Kelley, Steven Rousso-Schindler, Renee Shelby, Kurt Thomas, Allison Woodruff

Comments: 21 pages, 4 tables, 1 figure

Journal-ref: In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26). Association for Computing Machinery, New York, NY, USA, Article 1316, 1-21

Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Generative AI (GenAI) is a powerful technology poised to reshape Trust & Safety. While misuse by attackers is a growing concern, its defensive capacity remains underexplored. This paper examines these effects through a qualitative study with 43 Trust & Safety experts across five domains: child safety, election integrity, hate and harassment, scams, and violent extremism. Our findings characterize a landscape in which GenAI empowers both attackers and defenders. GenAI dramatically increases the scale and speed of attacks, lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging GenAI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding GenAI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.
[743] arXiv:2601.06428 (replaced) [pdf, html, other]: Title: BackPlay: Head-Only Look-Back Self-Correction for Diffusion Language Models

Liming Liu, Binxuan Huang, Zixuan Zhang, Xin Liu, Bing Yin, Tuo Zhao

Comments: 16 pages

Subjects: Machine Learning (cs.LG)

Diffusion Language Models (DLMs) decode multiple tokens in parallel, but aggressive multi-token decoding amplifies cross-token dependency errors and can sharply degrade generation quality. We propose BackPlay, a frozen-backbone self-correction framework that trains only a lightweight correction head on a finetuned DLM without updating any backbone or adapter parameters. Because the head is trained on errors produced by the same frozen generator used at inference time, its training distribution aligns with the error patterns of the deployed model. We further introduce Look-back Correction, a training mechanism that injects predictions from earlier, more corrupted denoising states into later, richer contexts, enabling the head to leverage later context to detect mistakes made in earlier generation steps. During inference, BackPlay periodically revisits previously generated tokens through selective remasking and regeneration to limit error accumulation. Across mathematical reasoning and code generation benchmarks, BackPlay improves the speed--quality trade-off of the underlying DLM under multi-token decoding.
[744] arXiv:2601.06498 (replaced) [pdf, html, other]: Title: Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection

Minghui Jia, Qichao Zhang, Ali Luo, Linjing Li, Shuo Ye, Hailing Lu, Wen Hou, Dongbin Zhao

Comments: Accepted to ACL 2026 Main Conference

Subjects: Computation and Language (cs.CL); Instrumentation and Methods for Astrophysics (astro-ph.IM)

Due to the limited generalization and interpretability of deep learning classifiers, the final vetting of rare celestial object candidates still relies on expert visual inspection--a manually intensive process. In this process, astronomers leverage specialized tools to analyze spectra and construct reliable catalogs. However, this practice has become the primary bottleneck, as it is fundamentally incapable of scaling with the data deluge from modern spectroscopic surveys. To bridge this gap, we propose Spec-o3, a tool-augmented vision-language agent that performs astronomer-aligned spectral inspection via interleaved multimodal chain-of-thought reasoning. Spec-o3 is trained with a two-stage post-training recipe: cold-start supervised fine-tuning on expert inspection trajectories followed by outcome-based reinforcement learning on rare-type verification tasks. Evaluated on five rare-object identification tasks from LAMOST, Spec-o3 establishes a new state of the art, boosting the macro-F1 score from 28.3 to 76.5 with a 7B-parameter base model and outperforming both proprietary VLMs and specialized deep models. Crucially, the agent demonstrates strong generalization to unseen inspection tasks across survey shifts (from LAMOST to SDSS/DESI). Expert evaluations confirm that its reasoning traces are coherent and physically consistent, supporting transparent and trustworthy decision-making. Code, data, and models are available at this https URL.
[745] arXiv:2601.07262 (replaced) [pdf, html, other]: Title: ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution

Jihong Wang, Jiamu Zhou, Weiming Zhang, Teng Wang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang

Subjects: Human-Computer Interaction (cs.HC)

With the advancement of vision-language models, web automation has made significant progress. However, deploying autonomous agents in real-world settings remains challenging, primarily due to site heterogeneity, where generalist models lack domain-specific priors for diverse interfaces, and long-horizon instability, characterized by the accumulation of decision drift over extended interactions. To address these challenges, we introduce ColorBrowserAgent (Complex Long-Horizon Browser Agent), a knowledge-evolving agent for robust web automation. Our approach relies on two synergistic mechanisms: human-in-the-loop knowledge adaptation that transforms sparse human feedback into reusable domain knowledge, and knowledge-aligned progressive summarization that stabilizes long interactions through memory compression. Extensive experiments on WebArena, WebChoreArena, and an industrial deployment show that ColorBrowserAgent consistently outperforms strong baselines. It achieves a state-of-the-art success rate of 71.2% on WebArena and maintains a 47.4% success rate under a zero-shot transfer setting on WebChoreArena. In commercial deployment, it yields a 19.3% relative improvement in user satisfaction, verifying its robustness in real-world scenarios.
[746] arXiv:2601.09056 (replaced) [pdf, html, other]: Title: StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching

Robert Dilworth

Comments: 16 pages, 6 figures, 1 table

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Information Retrieval (cs.IR)

Stylometry--the identification of an author through analysis of a text's style (i.e., authorship attribution)--serves many constructive purposes: it supports copyright and plagiarism investigations, aids detection of harmful content, offers exploratory cues for certain medical conditions (e.g., early signs of dementia or depression), provides historical context for literary works, and helps uncover misinformation and disinformation. In contrast, when stylometry is employed as a tool for authorship verification--confirming whether a text truly originates from a claimed author--it can also be weaponized for malicious purposes. Techniques such as de-anonymization, re-identification, tracking, profiling, and downstream effects like censorship illustrate the privacy threats that stylometric analysis can enable. Building on these concerns, this paper further explores how adversarial stylometry combined with steganography can counteract stylometric analysis. We first present enhancements to our adversarial attack, $\textit{TraceTarnish}$, providing stronger evidence of its capacity to confound stylometric systems and reduce their attribution and verification accuracy. Next, we examine how steganographic embedding can be fine-tuned to mask an author's stylistic fingerprint, quantifying the level of authorship obfuscation achievable as a function of the proportion of words altered with zero-width Unicode characters. Based on our findings, steganographic coverage of 33% or higher seemingly ensures authorship obfuscation. Finally, we reflect on the ways stylometry can be used to undermine privacy and argue for the necessity of defensive tools like $\textit{TraceTarnish}$.
[747] arXiv:2601.09253 (replaced) [pdf, html, other]: Title: RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning

Zehua Liu, Shuqi Liu, Tao Zhong, Mingxuan Yuan

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

While Supervised Fine-Tuning (SFT) and Rejection Sampling Fine-Tuning (RFT) are standard for LLM alignment, they either rely on costly expert data or discard valuable negative samples, leading to data inefficiency. To address this, we propose Reward Informed Fine-Tuning (RIFT), a simple yet effective framework that utilizes all self-generated samples. Unlike the hard thresholding of RFT, RIFT repurposes negative trajectories, reweighting the loss with scalar rewards to learn from both the positive and negative trajectories from the model outputs. To overcome the training collapse caused by naive reward integration, where direct multiplication yields an unbounded loss, we introduce a stabilized loss formulation that ensures numerical robustness and optimization efficiency. Extensive experiments on mathematical benchmarks across various base models show that RIFT consistently outperforms RFT. Our results demonstrate that RIFT is a robust and data-efficient alternative for alignment using mixed-quality, self-generated data.
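To illustrate the instability mentioned in the abstract above, the following minimal sketch evaluates a naively reward-weighted negative log-likelihood for a single token probability. The function name `naive_reward_weighted_nll` and the scalar setup are hypothetical and are not taken from the paper. RIFT's stabilized loss is not specified in this listing and is not reproduced here.

```python
import math

def naive_reward_weighted_nll(reward: float, prob: float) -> float:
    """Naive loss: reward * (-log prob). Hypothetical illustration only."""
    return -reward * math.log(prob)

# Positive reward: the loss is bounded below by 0 (reached at prob = 1).
# Negative reward: the loss equals |reward| * log(prob), which has no
# lower bound as prob -> 0, so minimizing it can diverge.
for p in (1e-1, 1e-3, 1e-9):
    print(p, naive_reward_weighted_nll(+1.0, p), naive_reward_weighted_nll(-1.0, p))
```

With a negative reward the printed loss keeps decreasing as the probability shrinks, which is the unbounded behavior a stabilized formulation would need to prevent.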
[748] arXiv:2601.09361 (replaced) [pdf, html, other]: Title: GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

Jiaying Zhang, Lei Shi, Jiguo Li, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He

Comments: Accepted at ACL 2026 Main

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement Learning with Verifiable Rewards (RLVR) is a key paradigm for improving large-scale reasoning models. Unlike supervised fine-tuning (SFT), RLVR exhibits distinct optimization dynamics and is sensitive to the preservation of pre-trained geometric structures. However, existing parameter-efficient methods face key limitations in this regime. Low-rank adaptation methods, such as PiSSA, are primarily designed for SFT and do not account for the distinct optimization dynamics and geometric structures of RLVR. Conversely, directly fine-tuning the unstructured sparse parameter subspace favored by RLVR encounters efficiency bottlenecks on modern hardware. To address these challenges, we propose GeoRA (Geometry-Aware Low-Rank Adaptation), a low-rank adaptation method tailored for RLVR. Specifically, GeoRA exploits the anisotropic and compressible structure of the RL update subspace, and extracts its principal directions via Singular Value Decomposition (SVD) to initialize low-rank adapters, while freezing residual components as a structural anchor during training. This design preserves the pre-trained structure and enables efficient dense computation. Experiments on Qwen and Llama models from 1.5B to 32B parameters show that GeoRA consistently outperforms strong low-rank baselines across RLVR settings in mathematics, medicine, and coding, while showing stronger generalization and less forgetting on out-of-domain tasks.
[749] arXiv:2601.09926 (replaced) [pdf, html, other]: Title: PROPER Agents: Proactivity Driven Personalized Agents for Advancing Knowledge Gap Navigation

Kirandeep Kaur, Vinayak Gupta, Aditya Gupta, Chirag Shah

Comments: ACL 2026

Subjects: Machine Learning (cs.LG)

Current approaches to proactive assistance move beyond the ask-and-respond paradigm by anticipating user needs. In practice, they either burden users with clarifying questions or rely on context-based extrapolation, often leading to unnecessary or mistimed interventions. Such systems lack explicit mechanisms to model users' knowledge gaps, resulting in incomplete or suboptimal task outcomes. To address this, we propose PROPER, a framework that explicitly models user-specific knowledge gaps in a controlled manner. Central to our approach is the notion of dimensions: structured, task-relevant factors that define the considerations required for effective task completion. Given a user query, the DGA (Dimension Generating Agent) identifies explicit dimensions (from the user's query) and generates a set of candidate implicit dimensions capturing unarticulated aspects of the task. The RGA (Response Generating Agent) integrates both explicit and implicit dimensions selectively to produce personalized, context-aware, and proactively informative responses. We evaluate PROPER across multiple domains using a structured, gap-aware rubric that measures coverage, initiative appropriateness, and intent alignment. PROPER improves on quality scores and win rates across all domains, achieving up to 84% gains in single-turn evaluation and consistent dominance in multi-turn interactions. All code for PROPER is available at: this https URL.
[750] arXiv:2601.10003 (replaced) [pdf, html, other]: Title: SocraticKG: Knowledge Graph Construction via QA-Driven Fact Extraction

Sanghyeok Choi, Woosang Jeon, Kyuseok Yang, Taehyeong Kim

Subjects: Computation and Language (cs.CL)

Constructing Knowledge Graphs (KGs) from unstructured text provides a structured framework for knowledge representation and reasoning, yet current LLM-based approaches struggle with a fundamental trade-off: factual coverage often leads to relational fragmentation, while premature consolidation causes information loss. To address this, we propose SocraticKG, an automated KG construction method that introduces question-answer pairs as a structured intermediate representation to systematically unfold document-level semantics prior to triple extraction. By employing 5W1H-guided QA expansion, SocraticKG captures contextual dependencies and implicit relational links typically lost in direct KG extraction pipelines, providing explicit grounding in the source document that helps mitigate implicit reasoning errors. Evaluation on the MINE benchmark and HotpotQA downstream task demonstrates that our approach effectively addresses the coverage-connectivity trade-off, achieving superior factual retention and structural cohesion while supporting complex multi-hop reasoning.
[751] arXiv:2601.10863 (replaced) [pdf, html, other]: Title: Beyond Accuracy: A Stability-Aware Metric for Multi-Horizon Forecasting

Chutian Ma, Grigorii Pomazkin, Giacinto Paolo Saggese, Paul Smith

Subjects: Machine Learning (cs.LG)

Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency, in other words, how consistently a model predicts the same future event as the forecast origin changes. We introduce the forecast accuracy and coherence score (forecast AC score for short) for measuring the quality of probabilistic multi-horizon forecasts in a way that accounts for both multi-horizon accuracy and stability. Our score additionally allows user-specified weights to balance accuracy and consistency requirements. As an example application, we implement the score as a differentiable objective function for training seasonal auto-regressive integrated models and evaluate it on the M4 Hourly benchmark dataset. Results demonstrate consistent improvements over traditional maximum likelihood estimation. Regarding stability, the AC-optimized model generated out-of-sample forecasts with 15.8% reduced variance over forecasts targeting the same timestamp. In terms of accuracy, the AC-optimized model achieved considerable improvements for medium-to-long-horizon forecasts. While one-step-ahead forecasts exhibited a 3.9% increase in MSE, forecasts from horizon three onward experienced improved accuracy, with a peak improvement of approximately 6% in MSE at horizons 9-12. These results indicate that our metric successfully trains models to produce more stable and accurate multi-step forecasts in exchange for a relatively small degradation in one-step-ahead performance.
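The stability notion described above (variance over forecasts that target the same timestamp) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses point forecasts rather than probabilistic ones, and the function name `mean_forecast_variance` and the `(origin, horizon)` keying are hypothetical. It is not the forecast AC score itself, whose definition is not given in this listing.

```python
from collections import defaultdict
from statistics import pvariance

def mean_forecast_variance(forecasts):
    """Average variance of forecasts that share a target timestamp.

    `forecasts` maps (origin, horizon) -> point forecast; the target
    timestamp is origin + horizon. Targets with a single forecast are
    skipped because they carry no stability information.
    """
    by_target = defaultdict(list)
    for (origin, horizon), value in forecasts.items():
        by_target[origin + horizon].append(value)
    variances = [pvariance(v) for v in by_target.values() if len(v) >= 2]
    return sum(variances) / len(variances) if variances else 0.0

# Timestamp 2 is forecast twice: from origin 0 (horizon 2) and origin 1 (horizon 1).
example = {(0, 1): 10.0, (0, 2): 12.0, (1, 1): 14.0}
print(mean_forecast_variance(example))
```

A lower value means the model revises its prediction for a given timestamp less as the forecast origin advances.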
[752] arXiv:2601.11044 (replaced) [pdf, html, other]: Title: AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts

Keyu Li, Junhao Shi, Yang Xiao, Mohan Jiang, Jie Sun, Yunze Wu, Dayuan Fu, Shijie Xia, Xiaojie Cai, Tianze Xu, Weiye Si, Wenjie Li, Dequan Wang, Pengfei Liu

Comments: Accepted by ACL 2026 Main Conference

Subjects: Artificial Intelligence (cs.AI)

Large Language Model (LLM)-based autonomous agents demonstrate multifaceted capabilities that can contribute substantially to economic production. However, existing benchmarks remain focused on a single agentic capability, failing to capture long-horizon real-world scenarios. Moreover, the reliance on human-in-the-loop feedback for realistic tasks creates a scalability bottleneck, hindering automated rollout collection and evaluation. To bridge this gap, we introduce AgencyBench, a comprehensive benchmark derived from daily AI usage, evaluating 6 core agentic capabilities across 32 real-world scenarios, comprising 138 tasks with specific queries, deliverables, and rubrics. These scenarios require an average of 90 tool calls, 1 million tokens, and hours of execution time to resolve. To enable automated evaluation, we employ a user simulation agent to provide iterative feedback, and a Docker sandbox to conduct visual and functional rubric-based assessment. Experiments reveal that closed-source models significantly outperform open-source models (48.4% vs 32.1%). Further analysis reveals significant disparities across models in resource efficiency, feedback-driven self-correction, and specific tool-use preferences. Finally, we investigate the impact of agentic scaffolds, observing that proprietary models demonstrate superior performance within their native ecosystems (e.g., Claude-4.5-Opus via Claude-Agent-SDK), while open-source models exhibit distinct performance peaks, suggesting potential optimization for specific execution frameworks. AgencyBench serves as a critical testbed for next-generation agents, highlighting the necessity of co-optimizing model architecture with agentic frameworks. We believe this work sheds light on the future direction of autonomous agents, and we release the full benchmark and evaluation toolkit at this https URL.
[753] arXiv:2601.11194 (replaced) [pdf, html, other]: Title: ATATA: One Algorithm to Align Them All

Boyi Pang, Savva Ignatyev, Vladimir Ippolitov, Ramil Khafizov, Yurii Melnik, Oleg Voynov, Maksim Nakhodnov, Aibek Alanov, Xiaopeng Fan, Peter Wonka, Evgeny Burnaev

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We suggest a new multi-modal algorithm for joint inference of paired structurally aligned samples with Rectified Flow models. While some existing methods propose a codependent generation process, they do not view the problem of joint generation from a structural alignment perspective. Recent work uses Score Distillation Sampling to generate aligned 3D models, but SDS is known to be time-consuming, prone to mode collapse, and often provides cartoonish results. By contrast, our suggested approach relies on the joint transport of a segment in the sample space, yielding faster computation at inference time. Our approach can be built on top of an arbitrary Rectified Flow model operating on the structured latent space. We show the applicability of our method to the domains of image, video, and 3D shape generation using state-of-the-art baselines and evaluate it against both editing-based and joint inference-based competing approaches. We demonstrate a high degree of structural alignment for the sample pairs obtained with our method and a high visual quality of the samples. Our method improves the state-of-the-art for image and video generation pipelines. For 3D generation, it is able to show comparable quality while working orders of magnitude faster.
[754] arXiv:2601.12334 (replaced) [pdf, html, other]: Title: Worst-case Nonlinear Regression with Error Bounds

Alberto Bemporad

Comments: 23 pages, 7 figures

Subjects: Systems and Control (eess.SY)

We propose an active-learning method for nonlinear minimax regression. Given a nonlinear function that can be arbitrarily evaluated over a compact set, we fit a surrogate model, such as a feedforward neural network, by minimizing the maximum absolute approximation error. To handle the nonsmoothness of this worst-case loss, we introduce a smooth $L_\infty$ approximation that enables efficient gradient-based training. The training set is iteratively enriched by querying points of largest error via global optimization. We also derive constant and input-dependent worst-case error bounds over the entire input domain. The approach is validated on approximations of nonlinear functions and nonconvex sets, uncertain models of nonlinear dynamics, and explicit model predictive control laws. A Python library is available at this https URL.
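As a rough illustration of how a nonsmooth worst-case loss can be smoothed, the sketch below uses a log-sum-exp surrogate for the maximum absolute residual. This particular smoothing, the function name `smooth_linf`, and the sharpness parameter `beta` are assumptions made for illustration; the abstract does not state which smooth $L_\infty$ approximation the paper uses.

```python
import math

def smooth_linf(residuals, beta=50.0):
    """Log-sum-exp surrogate for max_i |r_i| (assumed smoothing, see lead-in)."""
    # |r| = max(r, -r), so include both signs of every residual.
    z = [beta * s * r for r in residuals for s in (1.0, -1.0)]
    m = max(z)  # subtract the maximum for numerical stability
    return (m + math.log(sum(math.exp(v - m) for v in z))) / beta

r = [0.1, -0.5, 0.3]
true_max = max(abs(v) for v in r)
approx = smooth_linf(r)
# The surrogate is differentiable and overestimates the true maximum
# by at most log(2n) / beta, where n is the number of residuals.
print(true_max, approx)
```

Larger values of `beta` tighten the approximation at the cost of steeper gradients near the largest residual.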
[755] arXiv:2601.12944 (replaced) [pdf, html, other]: Title: Concavity of Tsallis Entropy and Tsallis Entropy Power along Heat Flow

Lukang Sun

Comments: 17

Subjects: Information Theory (cs.IT)

We study the evolution of Tsallis entropy along the heat flow and establish its concavity in arbitrary dimensions. Extending prior results that were restricted to the one-dimensional setting, we prove that the Tsallis entropy is concave in time for a nontrivial range of the entropic index $q$ in both the one-dimensional and higher-dimensional settings. The analysis is based on a nonlinear transformation, together with a novel estimate for the second-order time derivative of the entropy and a rigorous justification of the integration-by-parts identities required in the argument. Our approach is fully analytic and avoids the use of computer-assisted methods that have limited previous works in higher dimensions. As consequences, we recover a generalized de Bruijn identity, establish the monotonicity of the associated $q$-Fisher information along the heat flow, and derive concavity properties for the Tsallis entropy power, including asymptotic results under general initial conditions. In addition, our method yields a new functional inequality that may be of independent interest. These results contribute to the broader program of extending classical information-theoretic inequalities beyond the Shannon framework to non-additive entropy settings.
[756] arXiv:2601.13690 (replaced) [pdf, html, other]: Title: Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning

Yue Guo, Fanfu Wang, Jianwei Lv, Xincheng Shi, Yuchen Li, Youya Wang, Yunsheng Zeng, Yujing Liu, Yunhao Qiao, Gen Li, Junfeng Wang, Bo Yuan

Comments: Accepted to ACL 2026 Findings

Subjects: Computation and Language (cs.CL)

Clinical Decision Support Systems (CDSSs) provide reasoning and inquiry guidance for physicians, yet they face notable challenges, including high maintenance costs and low generalization capability. Recently, Large Language Models (LLMs) have been widely adopted in healthcare due to their extensive knowledge reserves, retrieval, and communication capabilities. While LLMs show promise and excel at medical benchmarks, their diagnostic reasoning and inquiry skills are constrained. To mitigate this issue, we propose (1) a Clinical Diagnostic Reasoning Data (CDRD) structure to capture abstract clinical reasoning logic, together with a pipeline for its construction, and (2) Dr. Assistant, a clinical diagnostic model equipped with clinical reasoning and inquiry skills. Its training involves a two-stage process: SFT, followed by RL with a tailored reward function. We also introduce a benchmark to evaluate both diagnostic reasoning and inquiry. Our experiments demonstrate that Dr. Assistant outperforms open-source models and achieves performance competitive with closed-source models, providing an effective solution for clinical diagnostic inquiry guidance. Project information can be found at: this https URL.
[757] arXiv:2601.13711 (replaced) [pdf, html, other]: Title: GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark

Lotta Kiefer, Christoph Leiter, Sotaro Takeshita, Elena Schmidt, Steffen Eger

Subjects: Computation and Language (cs.CL)

Authorship verification (AV) is the task of determining whether two texts were written by the same author and has been studied extensively, predominantly for English data. In contrast, large-scale benchmarks and systematic evaluations for other languages remain scarce. We address this gap by introducing GerAV, a comprehensive benchmark for German AV comprising over 400k labeled text pairs. GerAV is built from Twitter and Reddit data, with the Reddit part further divided into in-domain and cross-domain message-based subsets, as well as a profile-based subset. This design enables controlled analysis of the effects of data source, topical domain, and text length. Using the provided training splits, we conduct a systematic evaluation of strong baselines and state-of-the-art models and find that our best approach, a fine-tuned large language model, outperforms recent baselines by up to 0.09 absolute F1 score and surpasses GPT-5 in a zero-shot setting by 0.08. We further observe a trade-off between specialization and generalization: models trained on specific data types perform best under matching conditions but generalize less well across data regimes, a limitation that can be mitigated by combining training sources. Overall, GerAV provides a challenging and versatile benchmark for advancing research on German and cross-domain AV. Our code and information about data access are available on GitHub.
[758] arXiv:2601.15984 (replaced) [pdf, html, other]: Title: Partially Lazy Gradient Descent for Smoothed Online Learning

Naram Mhaisen, George Iosifidis

Comments: to appear in the proceedings of AISTATS 2026

Subjects: Machine Learning (cs.LG)

We introduce \textsc{$k$-lazyGD}, an online learning algorithm that bridges the gap between greedy Online Gradient Descent (OGD, for $k{=}1$) and lazy GD/dual-averaging (for $k{=}T$), creating a spectrum between reactive and stable updates. We analyze this spectrum in Smoothed Online Convex Optimization (SOCO), where the learner incurs both hitting and movement costs. Our main contribution is establishing that laziness is possible without sacrificing hitting performance: we prove that \textsc{$k$-lazyGD} achieves the optimal dynamic regret $\mathcal{O}(\sqrt{(P_T{+}1)T})$ for any laziness slack $k$ up to $\Theta(\sqrt{T/P_T})$, where $P_T$ is the comparator path length. This result formally connects the allowable laziness to the comparator's shifts, showing that \textsc{$k$-lazyGD} can retain the inherently small movements of lazy methods without compromising tracking ability. We base our analysis on the Follow the Regularized Leader (FTRL) framework, and derive a matching lower bound. Since the slack depends on $P_T$, an ensemble of learners with various slacks is used, yielding a method that is provably stable when it can be, and agile when it must be.
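As a rough illustration of the two endpoints of this spectrum, the sketch below contrasts textbook greedy OGD with lazy GD (dual averaging) on an L2 ball. It does not implement $k$-lazyGD itself, whose update rule is not given in the abstract; the function names, the step size `eta`, and the ball constraint are assumptions chosen for the example.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def greedy_ogd(grads, eta, radius=1.0):
    """Greedy OGD: each step starts from the current projected iterate."""
    x = np.zeros_like(grads[0])
    iterates = [x]
    for g in grads:
        x = project_ball(x - eta * g, radius)
        iterates.append(x)
    return np.array(iterates)

def lazy_gd(grads, eta, radius=1.0):
    """Lazy GD / dual averaging: project the accumulated gradient sum."""
    g_sum = np.zeros_like(grads[0])
    iterates = [np.zeros_like(grads[0])]
    for g in grads:
        g_sum = g_sum + g
        iterates.append(project_ball(-eta * g_sum, radius))
    return np.array(iterates)

def path_length(iterates):
    """Total movement cost: sum of distances between consecutive iterates."""
    return float(np.sum(np.linalg.norm(np.diff(iterates, axis=0), axis=1)))

# Five gradients pushing one way, then a single reversal.
grads = [np.array([1.0, 0.0])] * 5 + [np.array([-1.0, 0.0])]
greedy = greedy_ogd(grads, eta=1.0)
lazy = lazy_gd(grads, eta=1.0)
```

In this toy run the greedy iterate reacts immediately to the reversed gradient, while the lazy iterate stays on the boundary because the accumulated sum still points the same way, so the lazy path is shorter. The two methods coincide whenever the projection is never active.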
[759] arXiv:2601.16432 (replaced) [pdf, other]: Title: iPDB -- Optimizing Semantic SQL Queries

Udesh Kumarasinghe, Tyler Liu, Ahmed R. Mahmood, Chunwei Liu, Walid G. Aref

Subjects: Databases (cs.DB)

Structured Query Language (SQL) has remained the standard query language for databases. SQL is highly optimized for processing structured data laid out in relations. Meanwhile, in the present application development landscape, it is highly desirable to utilize the power of learned models to perform complex tasks. Large language models (LLMs) have been shown to understand and extract information from unstructured textual data. However, SQL as a query language and accompanying relational database systems are either incompatible or inefficient for workloads that require leveraging learned models. This results in complex engineering and multiple data migration operations that move data between the data sources and the model inference platform. In this paper, we present iPDB, a relational system that supports in-database machine learning (ML) and large language model (LLM) inferencing using extended SQL syntax. In iPDB, LLMs and ML calls can function as semantic projects, as predicates to perform semantic selects and semantic joins, or for semantic aggregations in group-by clauses. iPDB has a new relational predict operator along with semantic query optimizations that enable users to write and efficiently execute semantic SQL queries, outperforming other state-of-the-art systems by 2.5x mean speedup, with speedups of up to 30x.
[760] arXiv:2601.17747 (replaced) [pdf, html, other]: Title: Bridging Supervision Gaps: A Unified Framework for Remote Sensing Change Detection

Kaixuan Jiang, Chen Wu, Zhenghui Zhao, Chengxi Han, Haonan Guo, Hongruixuan Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Change detection (CD) aims to identify surface changes from multi-temporal remote sensing imagery. In real-world scenarios, pixel-level change labels are expensive to acquire, and existing models struggle to adapt to scenarios with diverse annotation availability. To tackle this challenge, we propose a unified change detection framework (UniCD), which collaboratively handles supervised, weakly-supervised, and unsupervised tasks through a coupled architecture. UniCD eliminates architectural barriers through a shared encoder and multi-branch collaborative learning mechanism, achieving deep coupling of heterogeneous supervision signals. Specifically, UniCD consists of three supervision-specific branches. In the supervised branch, UniCD introduces the spatial-temporal awareness module (STAM), achieving efficient synergistic fusion of bi-temporal features. In the weakly-supervised branch, we construct change representation regularization (CRR), which steers model convergence from coarse-grained activations toward coherent and separable change modeling. In the unsupervised branch, we propose semantic prior-driven change inference (SPCI), which transforms unsupervised tasks into controlled weakly-supervised path optimization. Experiments on mainstream datasets demonstrate that UniCD achieves optimal performance across three tasks. It exhibits significant accuracy improvements in weakly-supervised and unsupervised scenarios, surpassing the current state of the art by 12.72% and 12.37% on LEVIR-CD, respectively.
[761] arXiv:2601.18491 (replaced) [pdf, html, other]: Title: AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Dongrui Liu, Qihan Ren, Chen Qian, Shuai Shao, Yuejin Xie, Yu Li, Zhonghao Yang, Haoyu Luo, Peng Wang, Qingyu Liu, Binxin Hu, Ling Tang, Jilin Mei, Dadi Guo, Leitao Yuan, Junyao Yang, Guanxu Chen, Qihao Lin, Yi Yu, Bo Zhang, Jiaxuan Guo, Jie Zhang, Wenqi Shao, Huiqi Deng, Zhiheng Xi, Wenjie Wang, Wenxuan Wang, Wen Shen, Zhikai Chen, Haoyu Xie, Jialing Tao, Juntao Dai, Jiaming Ji, Zhongjie Ba, Linfeng Zhang, Yong Liu, Quanshi Zhang, Lei Zhu, Zhihua Wei, Hui Xue, Chaochao Lu, Jing Shao, Xia Hu

Comments: 40 pages, 26 figures

Subjects: Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. To introduce an agentic guardrail that covers complex and numerous risky behaviors, we first propose a unified three-dimensional taxonomy that orthogonally categorizes agentic risks by their source (where), failure mode (how), and consequence (what). Guided by this structured and hierarchical taxonomy, we introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG). AgentDoG provides fine-grained and contextual monitoring across agent trajectories. More crucially, AgentDoG can diagnose the root causes of unsafe actions and of seemingly safe but unreasonable actions, offering provenance and transparency beyond binary labels to facilitate effective agent alignment. AgentDoG variants are available in three sizes (4B, 7B, and 8B parameters) across Qwen and Llama model families. Extensive experimental results demonstrate that AgentDoG achieves state-of-the-art performance in agentic safety moderation in diverse and complex interactive scenarios. All models and datasets are openly released.
[762] arXiv:2601.18622 (replaced) [pdf, html, other]: Title: Brazilian Social Media Anti-vaccine Information Disorder Dataset -- Telegram (2020-2025)

João Phillipe Cardenuto, Ana Carolina Monari, Michelle Diniz Lopes, Leopoldo Lusquino Filho, Anderson Rocha

Comments: 14 pages, 5 figures, 6 tables

Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Over the past decade, Brazil has experienced a decline in vaccination coverage, reversing decades of public health progress achieved through the National Immunization Program (PNI). Growing evidence points to the widespread circulation of vaccine-related misinformation -- particularly on social media platforms -- as a key factor driving this decline. Among these platforms, Telegram remains the only major platform permitting accessible and ethical data collection, offering insight into public channels where vaccine misinformation circulates extensively. This data paper introduces a curated dataset of about four million Telegram posts collected from 119 prominent Brazilian anti-vaccine channels between 2020 and 2025. The dataset includes message content, metadata, associated media, and classification related to vaccine posts, enabling researchers to examine how false or misleading information spreads, evolves, and influences public sentiment. By providing this resource, our aim is to support the scientific and public health community in developing evidence-based strategies to counter misinformation, promote trust in vaccination, and engage compassionately with individuals and communities affected by false narratives. The dataset and documentation are openly available for non-commercial research, under strict ethical and privacy guidelines at this https URL
[763] arXiv:2601.18672 (replaced) [pdf, html, other]: Title: A Dynamic Framework for Grid Adaptation in Kolmogorov-Arnold Networks

Spyros Rigas, Thanasis Papaioannou, Panagiotis Trakadas, Georgios Alexandridis

Comments: Accepted in IJCNN 2026

Subjects: Machine Learning (cs.LG)

Kolmogorov-Arnold Networks (KANs) have recently demonstrated promising potential in scientific machine learning, partly due to their capacity for grid adaptation during training. However, existing adaptation strategies rely solely on input data density, failing to account for the geometric complexity of the target function or metrics calculated during network training. In this work, we propose a generalized framework that treats knot allocation as a density estimation task governed by Importance Density Functions (IDFs), allowing training dynamics to determine grid resolution. We introduce a curvature-based adaptation strategy and evaluate it across synthetic function fitting, regression on a subset of the Feynman dataset and different instances of the Helmholtz PDE, demonstrating that it significantly outperforms the standard input-based baseline. Specifically, our method yields average relative error reductions of 25.3% on synthetic functions, 9.4% on the Feynman dataset, and 23.3% on the PDE benchmark. Statistical significance is confirmed via Wilcoxon signed-rank tests, establishing curvature-based adaptation as a robust and computationally efficient alternative for KAN training.
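To make the idea of knot allocation as density estimation concrete, here is a minimal sketch that places knots at equally spaced quantiles of a density proportional to an importance signal, using the absolute second derivative as a stand-in for curvature. The function names, the finite-difference curvature estimate, and the inverse-CDF placement are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def knots_from_importance(x, importance, num_knots, eps=1e-8):
    """Place knots at equally spaced quantiles of a density proportional to `importance`."""
    density = np.asarray(importance, dtype=float) + eps  # keeps the CDF strictly increasing
    # Trapezoidal cumulative integral of the density over x.
    increments = 0.5 * (density[1:] + density[:-1]) * np.diff(x)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]
    # Inverse-CDF lookup: equal importance mass between consecutive knots.
    return np.interp(np.linspace(0.0, 1.0, num_knots), cdf, x)

def curvature_importance(x, y):
    """Hypothetical importance signal: |second derivative| via finite differences."""
    return np.abs(np.gradient(np.gradient(y, x), x))

x = np.linspace(0.0, 1.0, 401)
y = np.tanh(20.0 * (x - 0.5))            # sharp transition around x = 0.5
uniform_knots = knots_from_importance(x, np.ones_like(x), 5)
curved_knots = knots_from_importance(x, curvature_importance(x, y), 11)
```

With a constant importance signal the knots come out evenly spaced, while the curvature-driven signal concentrates most knots around the sharp transition and leaves the flat regions coarsely resolved.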
[764] arXiv:2601.18714 (replaced) [pdf, html, other]: Title: Low Cost, High Efficiency: LiDAR Place Recognition in Vineyards with Matryoshka Representation Learning

Judith Vilella-Cantos, Mauro Martini, Marcello Chiaberge, Mónica Ballesta, David Valiente

Journal-ref: Ecological Informatics, Volume 95, 2026, 103780, ISSN 1574-9541

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Localization in agricultural environments is challenging due to their unstructured nature and lack of distinctive landmarks. Although agricultural settings have been studied in the context of object classification and segmentation, the place recognition task for mobile robots is not trivial in the current state of the art. In this study, we propose MinkUNeXt-VINE, a lightweight, deep-learning-based method that surpasses state-of-the-art methods in vineyard environments thanks to its pre-processing and Matryoshka Representation Learning multi-loss approach. Our method prioritizes enhanced performance with low-cost, sparse LiDAR inputs and lower-dimensionality outputs to ensure high efficiency in real-time scenarios. Additionally, we present a comprehensive ablation study of the results on various evaluation cases and two extensive long-term vineyard datasets employing different LiDAR sensors. The results demonstrate the efficiency of the trade-off output produced by this approach, as well as its robust performance on low-cost and low-resolution input data. The code is publicly available for reproduction.
[765] arXiv:2601.20706 (replaced) [pdf, html, other]: Title: NPU Design for Diffusion Language Model Inference

Binglei Lou, Haoran Wu, Kevin Lau, Gregor MacDonald, Jiayi Nie, Yao Lai, Can Xiao, Xuan Guo, Jianyi Cheng, Rika Antonova, Robert Mullins, Aaron Zhao

Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Diffusion-based LLMs (dLLMs) fundamentally depart from traditional autoregressive (AR) LLM inference: they leverage bidirectional attention, block-wise KV cache refreshing, cross-step reuse, and a non-GEMM-centric sampling phase. These characteristics make current dLLMs incompatible with most existing NPUs, as their inference patterns, in particular the reduction-heavy, top-$k$-driven sampling stage, demand new ISA and memory hierarchy support beyond that of AR accelerators. In addition, the blocked diffusion KV cache breaks from the append-only paradigm assumed by AR NPUs, and conventional AR-derived KV quantization schemes were designed for static activation distributions and do not account for the step-wise distribution shifts introduced by iterative block-wise refinement in dLLMs.
In this paper, we introduce the first NPU accelerator specifically designed for dLLMs. It delivers: a dLLM-oriented ISA and compiler; a hardware-optimized execution model for both the transformer inference and diffusion sampling used in dLLMs; a novel Block-Adaptive Online Smoothing (BAOS) for quantizing KV cache in dLLMs; and a complete RTL implementation synthesized in 7nm. To evaluate and validate our design, we introduce a tri-path simulation framework that comprises analytical, cycle-accurate, and accuracy simulators, together with cross-validations against physical hardware. The full NPU stack, including ISA, simulation tools, and quantization software, will be open-sourced upon acceptance.
[766] arXiv:2601.20896 (replaced) [pdf, html, other]: Title: A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

Comments: Accepted for publication in the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Self-supervised learning (SSL) has transformed speech processing, yet its reliance on massive pre-training datasets remains a bottleneck. While robustness is often attributed to scale and diversity, the role of the data distribution is less understood. We systematically examine how curated subsets of pre-training data influence Automatic Speech Recognition (ASR) performance. Surprisingly, optimizing for acoustic, speaker, or linguistic diversity yields no clear improvements over random sampling. Instead, we find that prioritizing the longest utterances achieves superior ASR results while using only half the original dataset, reducing pre-training time by 24% on a large corpus. These findings suggest that for pre-training speech SSL models, data length is a more critical factor than either data diversity or overall data quantity for performance and efficiency, offering a new perspective for data selection strategies in SSL speech processing.
[767] arXiv:2601.22703 (replaced) [pdf, html, other]: Title: DAVIS: OOD Detection via Dominant Activations and Variance for Increased Separation

Abid Hassan, Tuan Ngo, Saad Shafiq, Nenad Medvidovic

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detecting out-of-distribution (OOD) inputs is a critical safeguard for deploying machine learning models in the real world. However, most post-hoc detection methods operate on penultimate feature representations derived from global average pooling (GAP) -- a lossy operation that discards valuable distributional statistics of the pre-pooling activation maps. We contend that these overlooked statistics, particularly channel-wise variance and dominant (maximum) activations, are highly discriminative for OOD detection. We introduce DAVIS, a simple and broadly applicable post-hoc technique that enriches feature vectors by incorporating these crucial statistics, directly addressing the information loss from GAP. Extensive evaluations show DAVIS sets a new benchmark across diverse architectures, including ResNet, DenseNet, and EfficientNet. It achieves significant reductions in the false positive rate (FPR95), with improvements of 48.26% on CIFAR-10 using ResNet-18, 38.13% on CIFAR-100 using ResNet-34, and 26.83% on ImageNet-1k benchmarks using MobileNet-v2. Our analysis reveals the underlying mechanism for this improvement, providing a principled basis for moving beyond the mean in OOD detection.
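The information loss from GAP can be seen in a small sketch: two activation maps with the same channel mean are indistinguishable after pooling, but differ once variance and maximum are kept. How DAVIS actually combines these statistics is not stated in the abstract, so the plain concatenation below, along with the function names, is an assumption made for illustration.

```python
import numpy as np

def pooled_statistics(feature_map):
    """feature_map has shape (C, H, W); returns per-channel mean, variance, and maximum."""
    flat = feature_map.reshape(feature_map.shape[0], -1)
    return flat.mean(axis=1), flat.var(axis=1), flat.max(axis=1)

def gap_features(feature_map):
    """Standard global average pooling: one mean per channel."""
    return pooled_statistics(feature_map)[0]

def enriched_features(feature_map):
    """Assumed enrichment: concatenate mean, variance, and max per channel."""
    mean, var, peak = pooled_statistics(feature_map)
    return np.concatenate([mean, var, peak])

# Two single-channel maps with identical means but different spread.
flat_map = np.ones((1, 2, 2))
peaky_map = np.array([[[0.0, 2.0], [2.0, 0.0]]])
```

Here `gap_features` returns the same vector for both maps, whereas `enriched_features` separates them through the variance and maximum entries.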
[768] arXiv:2602.00208 (replaced) [pdf, html, other]: Title: Analyzing Shapley Additive Explanations to Understand Anomaly Detection Algorithm Behaviors and Their Complementarity

Jordan Levy, Paul Saves, Moncef Garouani, Nicolas Verstaevel, Benoit Gaudou

Comments: IDA Frontier Prize and Best Paper Award, Intelligent Data Analysis (IDA) 2026, Springer Nature

Journal-ref: In: IDA (LNCS), Springer, vol 16513 (2026)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Unsupervised anomaly detection is a challenging problem due to the diversity of data distributions and the lack of labels. Ensemble methods are often adopted to mitigate these challenges by combining multiple detectors, which can reduce individual biases and increase robustness. Yet building an ensemble that is genuinely complementary remains challenging, since many detectors rely on similar decision cues and end up producing redundant anomaly scores. As a result, the potential of ensemble learning is often limited by the difficulty of identifying models that truly capture different types of irregularities. To address this, we propose a methodology for characterizing anomaly detectors through their decision mechanisms. Using SHapley Additive exPlanations, we quantify how each model attributes importance to input features, and we use these attribution profiles to measure similarity between detectors. We show that detectors with similar explanations tend to produce correlated anomaly scores and identify largely overlapping anomalies. Conversely, explanation divergence reliably indicates complementary detection behavior. Our results demonstrate that explanation-driven metrics offer a different criterion than raw outputs for selecting models in an ensemble. However, we also demonstrate that diversity alone is insufficient; high individual model performance remains a prerequisite for effective ensembles. By explicitly targeting explanation diversity while maintaining model quality, we are able to construct ensembles that are more diverse, more complementary, and ultimately more effective for unsupervised anomaly detection.
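A minimal sketch of comparing detectors through their attribution profiles follows. It assumes the per-sample attribution matrices (for example, SHAP values) have already been computed elsewhere; the profile definition (normalized mean absolute attribution per feature) and the cosine similarity are illustrative choices, not necessarily the metrics used in the paper.

```python
import numpy as np

def attribution_profile(attributions):
    """attributions: (n_samples, n_features). Returns normalized mean |attribution| per feature."""
    profile = np.abs(attributions).mean(axis=0)
    total = profile.sum()
    return profile / total if total > 0 else profile

def profile_similarity(attr_a, attr_b):
    """Cosine similarity between two detectors' attribution profiles."""
    a, b = attribution_profile(attr_a), attribution_profile(attr_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy attribution matrices for three hypothetical detectors over two features.
attr_a = np.array([[1.0, 0.0], [-2.0, 0.0]])   # relies on feature 0
attr_b = np.array([[3.0, 0.0], [1.0, 0.0]])    # also relies on feature 0
attr_c = np.array([[0.0, 1.0], [0.0, -1.0]])   # relies on feature 1
```

In this toy case detectors `a` and `b` have similarity 1 and would be redundant in an ensemble by this criterion, while `a` and `c` have similarity 0 and are candidates for complementary behavior.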
[769] arXiv:2602.00469 (replaced) [pdf, html, other]: Title: Words that make SENSE: Sensorimotor Norms in Learned Lexical Token Representations

Abhinav Gupta, Toben H. Mintz, Jesse Thomason

Comments: 5 pages, 2 figures, codebase can be found at: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

While word embeddings derive meaning from co-occurrence patterns, human language understanding is grounded in sensory and motor experience. We present SENSE (Sensorimotor Embedding Norm Scoring Engine), a learned projection model that predicts Lancaster sensorimotor norms from lexical word embeddings. We also conducted a behavioral study where 281 participants selected which among candidate nonce words evoked specific sensorimotor associations, finding statistically significant correlations between human selection rates and SENSE ratings across 6 of the 11 modalities. Sublexical analysis of these nonce words' selection rates revealed systematic phonosthemic patterns for the interoceptive norm, suggesting a path towards computationally proposing candidate phonosthemes from text data.
[770] arXiv:2602.00931 (replaced) [pdf, other]: Title: Continuous-Utility Direct Preference Optimization

Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Zihao He, Muhammad Usman Rafique, Asad Aali, Muhammad Ali Jamshed, John M. Cioffi, Emily Fox

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We introduce Continuous Utility Direct Preference Optimization (CU-DPO), a framework that aligns models to a portfolio of prompt-based cognitive strategies by replacing binary labels with continuous scores that capture fine-grained reasoning quality. We prove that learning with $K$ strategies yields a $\Theta(K \log K)$ improvement in sample complexity over binary preferences, and that DPO converges to the entropy-regularized utility-maximizing policy. To exploit this signal, we propose a two-stage training pipeline: (i) strategy selection, which optimizes the model to choose the best strategy for a given problem via best-vs-all comparisons, and (ii) execution refinement, which trains the model to correctly execute the selected strategy using margin-stratified pairs. On mathematical reasoning benchmarks, CU-DPO improves strategy selection accuracy from 35-46 percent to 68-78 percent across seven base models, yielding consistent downstream reasoning gains of up to 6.6 points on in-distribution datasets with effective transfer to out-of-distribution tasks.
[771] arXiv:2602.01493 (replaced) [pdf, html, other]: Title: OpInf-LLM: Parametric PDE Solving with LLMs via Operator Inference

Zhuoyuan Wang, Hanjiang Hu, Xiyu Deng, Saviz Mowlavi, Yorie Nakahira

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Solving diverse partial differential equations (PDEs) is fundamental in science and engineering. Large language models (LLMs) have demonstrated strong capabilities in code generation, symbolic reasoning, and tool use, but reliably solving PDEs across heterogeneous settings remains challenging. Prior work on LLM-based code generation and transformer-based foundation models for PDE learning has shown promising advances. However, a persistent trade-off between execution success rate and numerical accuracy arises, particularly when generalization to unseen parameters and boundary conditions is required. In this work, we propose OpInf-LLM, an LLM parametric PDE solving framework via operator inference. The proposed framework leverages small amounts of solution data to enable accurate prediction of diverse PDE instances, including unseen parameters and configurations, and provides seamless integration with LLMs for natural language task specification and physics-based reasoning of proper feature parameterization. Its low computational demands and unified solution pipeline further enable a high execution success rate across heterogeneous settings, opening new possibilities for generalizable reduced-order modeling in LLM-based PDE solving.
[772] arXiv:2602.02409 (replaced) [pdf, html, other]: Title: Catalyst: Out-of-Distribution Detection via Elastic Scaling

Abid Hassan, Tuan Ngo, Saad Shafiq, Nenad Medvidovic

Comments: Accepted at Conference on Computer Vision and Pattern Recognition (CVPR) 2026. arXiv admin note: text overlap with arXiv:2601.22703

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Out-of-distribution (OOD) detection is critical for the safe deployment of deep neural networks. State-of-the-art post-hoc methods typically derive OOD scores from the output logits or penultimate feature vector obtained via global average pooling (GAP). We contend that this exclusive reliance on the logit or feature vector discards a rich, complementary signal: the raw channel-wise statistics of the pre-pooling feature map lost in GAP. In this paper, we introduce Catalyst, a post-hoc framework that exploits these under-explored signals. Catalyst computes an input-dependent scaling factor ($\gamma$) on-the-fly from these raw statistics (e.g., mean, standard deviation, and maximum activation). This $\gamma$ is then fused with the existing baseline score, multiplicatively modulating it -- an $\textit{elastic scaling}$ -- to push the ID and OOD distributions further apart. We demonstrate that Catalyst is a generalizable framework: it seamlessly integrates with logit-based methods (e.g., Energy, ReAct, SCALE) and also provides a significant boost to distance-based detectors like KNN. As a result, Catalyst achieves substantial and consistent performance gains, reducing the average False Positive Rate by 32.87% on CIFAR-10 (ResNet-18), 27.94% on CIFAR-100 (ResNet-18), and 22.25% on ImageNet (ResNet-50). Our results highlight the untapped potential of pre-pooling statistics and demonstrate that Catalyst is complementary to existing OOD detection approaches. Our code is available here: this https URL
[773] arXiv:2602.02866 (replaced) [pdf, html, other]: Title: Estimation of Cell-to-Cell Variation and State of Health for Battery Modules with Parallel-Connected Cells

Qinan Zhou, Jing Sun

Comments: Published the dataset; Addressed reviewer comments

Subjects: Systems and Control (eess.SY)

Estimating cell-to-cell variation (CtCV) and state of health (SoH) for battery modules composed of parallel-connected cells is challenging when only module-level signals are measurable and individual cell behaviors remain unobserved. Although progress has been made in SoH estimation, CtCV estimation remains unresolved in the literature. This paper proposes a unified framework that accurately estimates both CtCV and SoH for modules using only module-level information extracted from incremental capacity analysis (ICA) and differential voltage analysis (DVA). With the proposed framework, CtCV and SoH estimations can be decoupled into two separate tasks, allowing each to be solved with dedicated algorithms without mutual interference and providing greater design flexibility. The framework also exhibits strong versatility in accommodating different CtCV metrics, highlighting its general-purpose nature. Experimental validation on modules with three parallel-connected cells demonstrates that the proposed framework can systematically select optimal module-level features for CtCV and SoH estimations, deliver accurate CtCV and SoH estimates with high confidence and low computational complexity, remain effective across different C-rates, and be suitable for onboard implementation.
[774] arXiv:2602.03875 (replaced) [pdf, other]: Title: Reversible Deep Learning for 13C NMR in Chemoinformatics: On Structures and Spectra

Stefan Kuhn, Vandana Dwarka, Przemyslaw Karol Grenda, Eero Vainikko

Comments: 10 pages, 4 figures, 4 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

We introduce a reversible deep learning model for 13C NMR that uses a single conditional invertible neural network for both directions between molecular structures and spectra. The network is built from i-RevNet style bijective blocks, so the forward map and its inverse are available by construction. We train the model to predict a 128-bit binned spectrum code from a graph-based structure encoding, while the remaining latent dimensions capture residual variability. At inference time, we invert the same trained network to generate structure candidates from a spectrum code, which explicitly represents the one-to-many nature of spectrum-to-structure inference. On a filtered subset, the model is numerically invertible on trained examples, achieves spectrum-code prediction above chance, and produces coarse but meaningful structural signals when inverted on validation spectra. These results demonstrate that invertible architectures can unify spectrum prediction and uncertainty-aware candidate generation within one end-to-end model.
[775] arXiv:2602.08561 (replaced) [pdf, html, other]: Title: Automating Computational Reproducibility in Social Science: Comparing Prompt-Based and Agent-Based Approaches

Syed Mehtab Hussain Shah, Frank Hopfgartner, Arnim Bleier

Comments: 12 pages, 5 figures. Submitted to ACM conference

Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)

Reproducing computational research is often assumed to be as simple as rerunning the original code with provided data. In practice, missing packages, fragile file paths, version conflicts, or incomplete logic frequently cause analyses to fail, even when materials are shared. This study investigates whether large language models and AI agents can automate the diagnosis and repair of such failures, making computational results easier to reproduce and verify. We evaluate this using a controlled reproducibility testbed built from five fully reproducible R-based social science studies. Realistic failures were injected, ranging from simple issues to complex missing logic, and two automated repair workflows were tested in clean Docker environments. The first workflow is prompt-based, repeatedly querying language models with structured prompts of varying context, while the second uses agent-based systems that inspect files, modify code, and rerun analyses autonomously. Across prompt-based runs, reproduction success ranged from 31-79 percent, with performance strongly influenced by prompt context and error complexity. Complex cases benefited most from additional context. Agent-based workflows performed substantially better, with success rates of 69-96 percent across all complexity levels. These results suggest that automated workflows, especially agent-based systems, can significantly reduce manual effort and improve reproduction success across diverse error types. Unlike prior benchmarks, our testbed isolates post-publication repair under controlled failure modes, allowing direct comparison of prompt-based and agent-based approaches.
[776] arXiv:2602.11569 (replaced) [pdf, html, other]: Title: SemaPop: Semantic-Persona Conditioned and Controllable Population Synthesis

Zhenlin Qin, Yancheng Ling, Leizhen Wang, Francisco Câmara Pereira, Zhenliang Ma

Comments: Submitted to Transportation Research Part C: Emerging Technologies

Subjects: Artificial Intelligence (cs.AI)

Population synthesis is essential for individual-level simulation in transport planning and socio-economic analysis, yet remains challenging due to the need to capture both statistical dependencies and high-level behavioral semantics. Existing data-driven approaches predominantly rely on unconditional generation, limiting their ability to support scenario-driven or target-oriented population synthesis. This study proposes SemaPop, a semantic-conditioned and controllable population synthesis framework that introduces persona representations as conditioning signals for generation. By deriving persona text from survey data using large language models (LLMs) and encoding it into semantic embeddings, SemaPop enables controllable population generation under statistical constraints. We instantiate the framework using a GAN-based architecture with marginal regularization to preserve distributional consistency. Extensive experiments demonstrate that SemaPop substantially improves generative performance, yielding closer alignment with target marginal and joint distributions while maintaining sample-level feasibility and diversity under semantic conditioning. Counterfactual analyses further demonstrate that semantic interventions induce systematic and interpretable shifts in generated populations. These results highlight the potential of persona-based semantic conditioning for controllable and scenario-oriented population synthesis.
[777] arXiv:2602.11724 (replaced) [pdf, html, other]: Title: WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements

Xiwen Teoh, Yun Lin, Duc-Minh Nguyen, Ruofei Ren, Wenjie Zhang, Jin Song Dong

Subjects: Software Engineering (cs.SE)

Visual language model (VLM) agents show great promise in automating end-to-end (E2E) web testing against requirements in natural language. However, language models are probabilistic and inherently prone to hallucination. Therefore, when an inconsistency between the requirement and the web application is detected, it is hard to tell whether it stems from a hallucination or a real application bug. Addressing this issue presents two core technical challenges: the implicit oracle inference challenge, where the agent must act as its own oracle to implicitly decide if the application's behavior is correct without guidance, and the probabilistic inference challenge, where an LLM's inconsistent reasoning undermines its trustworthiness as an oracle. Existing LLM-based approaches fail to capture such implicit oracles, either by treating any page navigation that doesn't crash as a success, or by checking each state in isolation, thus missing bugs dependent on context from prior steps.
We introduce WebTestPilot, an LLM-based agent designed to address these challenges. WebTestPilot (1) uses a symbolization layer that detects critical GUI elements on the web application and represents them as symbols (i.e., variables), and (2) translates the natural language specification into a sequence of steps, each equipped with inferred pre- and post-conditions over the symbols that serve as an oracle. This oracle captures data, temporal, and causal dependencies, enabling the validation of implicit requirements. To advance research in this area, we build a benchmark of bug-injected web apps for evaluating NL-to-E2E testing. The results show that WebTestPilot achieves a task completion rate of 99%, with 96% precision and 96% recall in bug detection, outperforming the best baseline (+70 precision, +27 recall). The agent generalizes across diverse natural language inputs and model scales.
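The idea of attaching pre- and post-conditions over symbols to each test step can be sketched as follows. This is a minimal illustration only: the `Step` class, the `run_steps` helper, and the symbol names are hypothetical and are not taken from WebTestPilot; a real agent would drive a browser rather than update a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Symbols = Dict[str, object]  # hypothetical symbol table: name -> observed GUI value


@dataclass
class Step:
    name: str
    pre: Callable[[Symbols], bool]             # must hold before the action
    act: Callable[[Symbols], Symbols]          # stands in for a real browser action
    post: Callable[[Symbols, Symbols], bool]   # relates the state before and after


def run_steps(symbols: Symbols, steps: List[Step]) -> List[str]:
    """Run the steps in order and collect oracle violations."""
    failures = []
    for step in steps:
        if not step.pre(symbols):
            failures.append(f"{step.name}: precondition failed")
            break  # later steps depend on this one, so stop here
        after = step.act(dict(symbols))
        if not step.post(symbols, after):
            failures.append(f"{step.name}: postcondition failed")
        symbols = after
    return failures


# A correct "add to cart" action and a buggy one that leaves the count unchanged.
add_item = Step(
    name="add item",
    pre=lambda s: bool(s["logged_in"]),
    act=lambda s: {**s, "cart_count": s["cart_count"] + 1},
    post=lambda before, after: after["cart_count"] == before["cart_count"] + 1,
)
buggy_add = Step(add_item.name, add_item.pre, lambda s: s, add_item.post)
```

Because the postcondition compares the state before and after the action, it can flag a bug that checking each page in isolation would miss.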
[778] arXiv:2602.11871 (replaced) [pdf, html, other]: Title: DMAP: A Distribution Map for Text

Tom Kempton, Julia Rozanova, Parameswaran Kamalaruban, Maeve Madigan, Karolina Wresilo, Yoann L. Launay, David Sutton, Stuart Burrell

Comments: ICLR 2026

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large Language Models (LLMs) are a powerful tool for statistical text analysis, with derived sequences of next-token probability distributions offering a wealth of information. Extracting this signal typically relies on metrics such as perplexity, which do not adequately account for context; how one should interpret a given next-token probability is dependent on the number of reasonable choices encoded by the shape of the conditional distribution. In this work, we present DMAP, a mathematically grounded method that maps a text, via a language model, to a set of samples in the unit interval that jointly encode rank and probability information. This representation enables efficient, model-agnostic analysis and supports a range of applications. We illustrate its utility through three case studies: (i) validation of generation parameters to ensure data integrity, (ii) examining the role of probability curvature in machine-generated text detection, and (iii) a forensic analysis revealing statistical fingerprints left in downstream models that have been subject to post-training on synthetic data. Our results demonstrate that DMAP offers a unified statistical view of text that is simple to compute on consumer hardware, widely applicable, and provides a foundation for further research into text analysis with LLMs.
[779] arXiv:2602.13211 (replaced) [pdf, other]: Title: An Overlay Multicast Routing Method Based on Network Situational Awareness and Hierarchical Multi-Agent Reinforcement Learning

Miao Ye, Yanye Chen, Yong Wang, Cheng Zhu, Qiuxiang Jiang, Gai Huang, Feng Ding

Comments: 30 pages, 10 figures

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

Compared with IP multicast, Overlay Multicast (OM) offers better compatibility and flexible deployment in heterogeneous, cross-domain networks. However, traditional OM struggles to adapt to dynamic traffic due to unawareness of physical resource states, and existing reinforcement learning methods fail to decouple OM's tightly coupled multi-objective nature, leading to high complexity, slow convergence, and instability. To address this, we propose MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning approach. Using SDN's global view, it builds a traffic-aware model for OM path planning. The method decomposes OM tree construction into two stages via hierarchical agents, reducing action space and improving convergence stability. Multi-agent collaboration balances multi-objective optimization while enhancing scalability and adaptability. Experiments show MA-DHRL-OM outperforms existing methods in delay, bandwidth utilization, and packet loss, with more stable convergence and flexible routing.
[780] arXiv:2602.14814 (replaced) [pdf, html, other]: Title: Learning State-Tracking from Code Using Linear RNNs

Julien Siems, Riccardo Grazzi, Kirill Kalinin, Hitesh Ballani, Babak Rahmani

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

In recent years, state-tracking tasks, particularly permutation composition, have become a testbed for understanding the limits of sequence model architectures such as Transformers and RNNs (linear and non-linear). However, these are often sequence-to-sequence tasks: learning to map actions (permutations) to states, which is incompatible with the next-token prediction setting commonly used to train language models. We address this gap by converting permutation composition into code via REPL traces that interleave state-reveals through prints and variable transformations. We show that linear RNNs capable of state-tracking also excel in this setting, while Transformers still fail. Motivated by this representation, we investigate why tracking states in code is generally difficult: actions are not always fully observable. We frame this as tracking the state of a probabilistic finite-state automaton with deterministic state reveals and show that linear RNNs can be worse than non-linear RNNs at tracking states in this setup.
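To make the "REPL trace" idea concrete, the sketch below generates a trace in which swaps (transpositions) on a list are interleaved with occasional `print` calls that reveal the state. The trace format, the function name `make_trace`, and its parameters are assumptions chosen for illustration; the paper's actual format may differ.

```python
import random


def make_trace(n_items, n_steps, reveal_every, seed=0):
    """Build a REPL-style trace of swaps on a list with periodic state reveals.

    Hypothetical format: each action is a Python swap statement, and every
    `reveal_every` steps the state is printed, followed by its value.
    """
    rng = random.Random(seed)
    state = list(range(n_items))
    lines = [f">>> s = {state}"]
    for step in range(1, n_steps + 1):
        i, j = rng.sample(range(n_items), 2)
        state[i], state[j] = state[j], state[i]  # apply the transposition
        lines.append(f">>> s[{i}], s[{j}] = s[{j}], s[{i}]")
        if step % reveal_every == 0:
            lines.append(">>> print(s)")  # state reveal
            lines.append(str(state))      # the value a next-token model must predict
    return "\n".join(lines), state
```

Calling `make_trace(4, 6, 3)` yields a trace with two reveals; predicting each revealed line requires composing all swaps since the previous reveal.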
[781] arXiv:2602.15566 (replaced) [pdf, html, other]: Title: Simultaneous Ordinal Maximin Share and Envy-Based Guarantees

Hannaneh Akrami, Timo Reichert

Subjects: Computer Science and Game Theory (cs.GT)

We study the fair allocation of indivisible goods among agents with additive valuations. The fair division literature has traditionally focused on two broad classes of fairness notions: envy-based notions and share-based notions. Within the share-based framework, most attention has been devoted to the maximin share (MMS) guarantee and its relaxations, while envy-based fairness has primarily centered on EFX and its relaxations. Recent work has shown the existence of allocations that simultaneously satisfy multiplicative approximate MMS and envy-based guarantees such as EF1 or EFX.
Motivated by this line of research, we study for the first time the compatibility between ordinal approximations of MMS and envy-based fairness notions. In particular, we establish the existence of allocations satisfying the following combined guarantees: (i) simultaneous $1$-out-of-$\lceil 3n/2 \rceil$ MMS and EFX for ordered instances; (ii) simultaneous $1$-out-of-$\lceil 3n/2 \rceil$ MMS and EF1 for top-$n$ instances; and (iii) simultaneous $1$-out-of-$4\lceil n/3 \rceil$ MMS and EF1 for ordered instances.
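For readers unfamiliar with the ordinal relaxation: an agent's 1-out-of-$d$ maximin share is the largest value the agent can guarantee by partitioning the goods into $d$ bundles and receiving the least-valued one. The brute-force sketch below computes it for additive valuations on small instances; the function name is illustrative and the enumeration is exponential, so it is only meant to make the definition concrete.

```python
from itertools import product


def one_out_of_d_mms(values, d):
    """1-out-of-d maximin share for additive, non-negative item values.

    Enumerates every assignment of goods to d bundles and returns the best
    achievable value of the least-valued bundle.
    """
    best = 0
    for assignment in product(range(d), repeat=len(values)):
        bundles = [0] * d
        for value, bundle in zip(values, assignment):
            bundles[bundle] += value
        best = max(best, min(bundles))
    return best
```

With values `[7, 5, 4, 3, 2, 1]` and `n = 2` agents, the standard MMS (`d = 2`) is 11, while the 1-out-of-$\lceil 3n/2 \rceil$ share (`d = 3`) is 7, which shows how increasing $d$ weakens the guarantee.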
[782] arXiv:2602.16729 (replaced) [pdf, html, other]: Title: Intent Laundering: AI Safety Datasets Are Not What They Seem

Shahriar Golchin, Marc Wetter

Comments: v2 preprint: updated with more models and a new dataset

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

We systematically evaluate the quality of widely used adversarial safety datasets from two perspectives: in isolation and in practice. In isolation, we examine how well these datasets reflect real-world adversarial attacks based on three defining properties: being driven by ulterior intent, well-crafted, and out-of-distribution. We find that these datasets overrely on "triggering cues": words or phrases with overt negative/sensitive connotations that are intended to trigger safety mechanisms explicitly, which is unrealistic compared to real-world attacks. In practice, we evaluate whether these datasets genuinely measure safety risks or merely provoke refusals through triggering cues. To explore this, we introduce "intent laundering": a procedure that abstracts away triggering cues from adversarial attacks (data points) while strictly preserving their malicious intent and all relevant details. Our results show that current adversarial safety datasets fail to faithfully represent real-world adversarial behavior due to their overreliance on triggering cues. Once these cues are removed, all previously evaluated "reasonably safe" models become unsafe, including Gemini 3 Pro and Claude Sonnet 3.7/4. Moreover, when intent laundering is adapted as a jailbreaking technique, it consistently achieves high attack success rates, ranging from 90.00% to 100.00%, under fully black-box access. Overall, our findings expose a significant disconnect between how existing datasets evaluate model safety and how real-world adversaries behave.
[783] arXiv:2602.17955 (replaced) [pdf, html, other]: Title: Mining Type Constructs Using Patterns in AI-Generated Code

Imgyeong Lee, Tayyib Ul Hassan, Abram Hindle

Subjects: Software Engineering (cs.SE)

Artificial Intelligence (AI) increasingly automates parts of software development. Although AI has improved developer productivity, it remains unstudied whether AI outperforms humans in type-related programming tasks, such as employing type constructs properly for type safety. Moreover, no systematic study evaluates whether AI agents overuse or misuse type constructs under complicated type systems to the same extent as humans. In this study, we present the first empirical analysis to answer these questions in the domain of TypeScript projects. Our findings show that, compared with humans, AI agents are 9x more likely to use the 'any' keyword. In addition, we observed that AI agents use advanced type constructs, including those that ignore type checks, more often than humans. Surprisingly, even with all these issues, agentic pull requests (PRs) have 1.8x higher acceptance rates than human PRs for TypeScript. We encourage software developers to carefully confirm the type safety of their codebases whenever they coordinate with AI agents in the development process.
[784] arXiv:2602.18792 (replaced) [pdf, html, other]: Title: MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose

Comments: Accepted by CVPR2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, yet effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions to achieve localized and semantically consistent counterfactual generation while preserving high image fidelity. Our training-free framework, MaskDiME, performs inference over 30x faster than the baseline and achieves comparable or state-of-the-art performance across five benchmark datasets spanning diverse visual domains, establishing a practical and generalizable solution for efficient counterfactual explanation.
[785] arXiv:2602.19208 (replaced) [pdf, html, other]: Title: How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization

Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi, Chaowen Hu, Lu Pan, Ke Zeng, Xunliang Cai

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for Large Language Model (LLM) reasoning, yet current methods face key challenges in resource allocation and policy optimization dynamics: (i) uniform rollout allocation ignores gradient variance heterogeneity across problems, and (ii) the softmax policy structure causes gradient attenuation for high-confidence correct actions, while excessive gradient updates may destabilize training. Therefore, we propose DynaMO, a theoretically-grounded dual-pronged optimization framework. At the sequence level, we prove that uniform allocation is suboptimal and derive variance-minimizing allocation from first principles, establishing Bernoulli variance as a computable proxy for gradient informativeness. At the token level, we develop gradient-aware advantage modulation grounded in theoretical analysis of gradient magnitude bounds. Our framework compensates for gradient attenuation of high-confidence correct actions while utilizing entropy changes as computable indicators to stabilize excessive update magnitudes. Extensive experiments conducted on a diverse range of mathematical reasoning benchmarks demonstrate consistent improvements over strong RLVR baselines. Our implementation is available at: this https URL.
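As an illustration of variance-aware rollout allocation, the sketch below uses a classical Neyman-style rule: each problem receives rollouts in proportion to the Bernoulli standard deviation $\sqrt{p(1-p)}$ of its estimated pass rate $p$. This rule is an assumption made for illustration; the abstract does not state the exact form of DynaMO's allocation, and the function name and parameters are hypothetical.

```python
from math import sqrt


def allocate_rollouts(pass_rates, budget, min_per_problem=1):
    """Split a rollout budget across problems proportionally to sqrt(p * (1 - p)).

    Assumed Neyman-style rule, not the paper's derivation. Every problem gets
    at least `min_per_problem` rollouts; the rest follows the weights.
    """
    n = len(pass_rates)
    spare = budget - min_per_problem * n
    if spare < 0:
        raise ValueError("budget too small for the per-problem minimum")
    weights = [sqrt(p * (1.0 - p)) for p in pass_rates]
    total = sum(weights)
    if total == 0:
        shares = [spare / n] * n  # all problems are solved or all unsolved
    else:
        shares = [spare * w / total for w in weights]
    alloc = [min_per_problem + int(s) for s in shares]
    # Hand the rollouts lost to rounding to the largest fractional remainders.
    leftover = budget - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

Under this rule, problems that are always solved or never solved (variance zero) receive only the minimum, while problems near a 50% pass rate receive the most rollouts.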
[786] arXiv:2602.23408 (replaced) [pdf, html, other]: Title: Demystifying Action Space Design for Robotic Manipulation Policies

Yuchun Feng, Jinliang Zheng, Zhihao Wang, Dongxiu Liu, Jianxiong Li, Jiangmiao Pang, Tai Wang, Xianyuan Zhan

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

The specification of the action space plays a pivotal role in imitation-based robotic manipulation policy learning, fundamentally shaping the optimization landscape of policy learning. While recent advances have focused heavily on scaling training data and model capacity, the choice of action space remains guided by ad-hoc heuristics or legacy designs, leading to an ambiguous understanding of robotic policy design philosophies. To address this ambiguity, we conducted a large-scale and systematic empirical study, confirming that the action space does have significant and complex impacts on robotic policy learning. We dissect the action design space along temporal and spatial axes, facilitating a structured analysis of how these choices govern both policy learnability and control stability. Based on 13,000+ real-world rollouts on a bimanual robot and evaluation on 500+ trained models over four scenarios, we examine the trade-offs between absolute vs. delta representations, and joint-space vs. task-space parameterizations. Our large-scale results suggest that properly designing the policy to predict delta actions consistently improves performance, while joint-space and task-space representations offer complementary strengths, favoring control stability and generalization, respectively.
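The absolute versus delta distinction can be made concrete with a small conversion sketch. It assumes that a delta action is the difference between consecutive commanded targets; the paper may define deltas differently (for example, relative to the currently observed state), and the function names are illustrative.

```python
def to_delta(start, absolute_targets):
    """Convert absolute targets into per-step deltas relative to the previous target."""
    deltas, prev = [], list(start)
    for target in absolute_targets:
        deltas.append([t - p for t, p in zip(target, prev)])
        prev = list(target)
    return deltas


def to_absolute(start, deltas):
    """Integrate per-step deltas from a start pose back into absolute targets."""
    targets, current = [], list(start)
    for delta in deltas:
        current = [c + d for c, d in zip(current, delta)]
        targets.append(current)
    return targets
```

The same two functions apply whether the vectors are joint angles (joint space) or end-effector poses (task space); only the meaning of each coordinate changes.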
[787] arXiv:2603.00110 (replaced) [pdf, html, other]: Title: Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation

Zijian Song, Qichang Li, Sihan Qin, Yuhao Chen, Tianshui Chen, Liang Lin, Guangrun Wang

Comments: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2508.09822

Subjects: Robotics (cs.RO)

The scarcity of large-scale robotic data has motivated the repurposing of foundation models from other modalities for policy learning. In this work, we introduce PhysGen (Learning Physics from Pretrained Video Generation Models), a scalable continuous and sequential world interaction framework that leverages autoregressive video generation to solve robotic manipulation tasks. By treating the pretrained video model as a proxy for a physics simulator, PhysGen models the dynamic interplay between the external environment and robot actions. We introduce a multimodal continuous representation that unifies video and action into shared physical tokens, bridging the gap between discrete video generation and continuous robotic control. This approach enables the seamless transfer of implicit physical knowledge, such as object permanence and dynamics, from video pretraining to downstream tasks. To ensure efficient convergence, we incorporate causal masking, inverse kinematics, Lookahead Multi-Token Prediction (L-MTP), and key-value (KV) caching. Experimental results on the Libero and ManiSkill benchmarks demonstrate that PhysGen consistently outperforms robust baselines, surpassing OpenVLA and WorldVLA by margins of 13.8% and 8.8%, respectively. Notably, in real-world scenarios, PhysGen matches the performance of large-scale action-pretrained models like $\pi_0$ without requiring prior action-specific pretraining, demonstrating superior capability in physically complex tasks such as grasping transparent objects. These findings validate the potential of extracting physical intuition from pretrained video generators to facilitate generalizable robotic manipulation.
[788] arXiv:2603.01170 (replaced) [pdf, other]: Title: ATLAS: AI-Assisted Threat-to-Assertion Learning for System-on-Chip Security Verification

Ishraq Tashdid, Kimia Tasnia, Alexander Garcia, Jonathan Valamehr, Sazadur Rahman

Comments: Accepted at the 63rd Design Automation Conference (DAC 2026), Long Beach, CA, USA (July, 2026)

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

This work presents ATLAS, an LLM-driven framework that bridges standardized threat modeling and property-based formal verification for System-on-Chip (SoC) security. Starting from vulnerability knowledge bases such as Common Weakness Enumeration (CWE), ATLAS identifies SoC-specific assets, maps relevant weaknesses, and generates assertion-based security properties and JasperGold scripts for verification. By combining asset-centric analysis with standardized threat model templates and multi-source SoC context, ATLAS automates the transformation from vulnerability reasoning to formal proof. Evaluated on three HACK@DAC benchmarks, ATLAS detected 39/48 CWEs and generated correct properties for 33 of those bugs, advancing automated, knowledge-driven SoC security verification toward a secure-by-design paradigm.
[789] arXiv:2603.07101 (replaced) [pdf, html, other]: Title: Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

Hugh Xuechen Liu, Kıvanç Tatar

Subjects: Artificial Intelligence (cs.AI)

Creatively translating complex gameplay ideas into executable artifacts (e.g., games as Unity projects and code) remains a central challenge in computational game creativity. Gameplay design patterns provide a structured representation for describing gameplay phenomena, enabling designers to decompose high-level ideas into entities, constraints, and rule-driven dynamics. Among them, goal patterns formalize common player-objective relationships. Goal Playable Concepts (GPCs) operationalize these abstractions as playable Unity engine implementations, supporting experiential exploration and compositional gameplay design. We frame scalable playable pattern realization as a problem of constrained executable creative synthesis: generated artifacts must satisfy Unity's syntactic and architectural requirements while preserving the semantic gameplay meanings encoded in goal patterns. This dual constraint limits scalability. Therefore, we investigate whether contemporary large language models (LLMs) can perform such synthesis under engine-level structural constraints and generate Unity code (as games) structured and conditioned by goal playable patterns. Using 26 goal pattern instantiations, we compare a direct generation baseline (natural language -> C# -> Unity) with pipelines conditioned on a human-authored Unity-specific intermediate representation (IR), across three IR configurations and two open-source models (DeepSeek-Coder-V2-Lite-Instruct and Qwen2.5-Coder-7B-Instruct). Compilation success is evaluated via automated Unity replay. We propose grounding and hygiene failure modes, identifying structural and project-level grounding as primary bottlenecks.
[790] arXiv:2603.07819 (replaced) [pdf, html, other]: Title: Fusion Complexity Inversion: Why Simpler Cross View Modules Outperform SSMs and Cross View Attention Transformers for Pasture Biomass Regression

Mridankan Mandal

Comments: Accepted to CVPR: Vision for Agriculture Workshop 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Accurate estimation of pasture biomass from agricultural imagery is critical for sustainable livestock management, yet existing methods are limited by the small, imbalanced, and sparsely annotated datasets typical of real world monitoring. In this study, adaptation of vision foundation models to agricultural regression is systematically evaluated on the CSIRO Pasture Biomass benchmark, a 357 image dual view dataset with laboratory validated, component wise ground truth for five biomass targets, through 17 configurations spanning four backbones (EfficientNet-B3 to DINOv3-ViT-L), five cross view fusion mechanisms, and a 4x2 metadata factorial. A counterintuitive principle, termed "fusion complexity inversion", is uncovered: on scarce agricultural data, a two layer gated depthwise convolution (R^2 = 0.903) outperforms cross view attention transformers (0.833), bidirectional SSMs (0.819), and full Mamba (0.793, below the no fusion baseline). Backbone pretraining scale is found to monotonically dominate all architectural choices, with the DINOv2 -> DINOv3 upgrade alone yielding +5.0 R^2 points. Training only metadata (species, state, and NDVI) is shown to create a universal ceiling at R^2 ~ 0.829, collapsing an 8.4 point fusion spread to 0.1 points. Actionable guidelines for sparse agricultural benchmarks are established: backbone quality should be prioritized over fusion complexity, local modules preferred over global alternatives, and features unavailable at inference excluded.
[791] arXiv:2603.07961 (replaced) [pdf, html, other]: Title: SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation

Jiaye Feng, Qixiang Yin, Yuankun Liu, Tong Mo, Weiping Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene Graph Generation (SGG) structures visual scenes as graphs of objects and their relations. While Multimodal Large Language Models (MLLMs) have advanced end-to-end SGG, current methods are hindered by both a lack of task-specific structured reasoning and the challenges of sparse, long-tailed relation distributions, resulting in incomplete scene graphs characterized by low recall and biased predictions. To address these issues, we introduce SGG-R$^{\rm 3}$, a structured reasoning framework that integrates task-specific chain-of-thought (CoT)-guided supervised fine-tuning (SFT) and reinforcement learning (RL) with group sequence policy optimization (GSPO), designed to engage in three sequential stages to achieve end-to-end unbiased scene graph generation. During the SFT phase, we propose a relation augmentation strategy that leverages an MLLM and is refined via embedding similarity filtering to alleviate relation sparsity. Subsequently, a stage-aligned reward scheme optimizes the procedural reasoning during RL. Specifically, we propose a novel dual-granularity reward which integrates fine-grained and coarse-grained relation rewards, simultaneously mitigating the long-tail issue via frequency-based adaptive weighting of predicates and improving relation coverage through semantic clustering. Experiments on two benchmarks show that SGG-R$^{\rm 3}$ achieves superior performance compared to existing methods, demonstrating the effectiveness and generalization of the framework.
[792] arXiv:2603.12845 (replaced) [pdf, html, other]: Title: Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation

Fei Wang, Xinye Zheng, Kun Li, Yanyan Wei, Yuxin Liu, Ganpeng Hu, Tong Bao, Jingwen Yang

Comments: Accepted by CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Enzyme kinetic parameters quantify how efficiently an enzyme catalyzes a specific substrate under defined biochemical conditions. Canonical parameters such as the turnover number ($k_\text{cat}$), Michaelis constant ($K_\text{m}$), and inhibition constant ($K_\text{i}$) depend jointly on the enzyme sequence, the substrate chemistry, and the conformational adaptation of the active site during binding. Many learning pipelines simplify this process to a static compatibility problem between the enzyme and substrate, fusing their representations through shallow operations and regressing a single value. Such formulations overlook the staged nature of catalysis, which involves both substrate recognition and conformational adaptation. In this regard, we reformulate kinetic prediction as a staged multimodal conditional modeling problem and introduce the Enzyme-Reaction Bridging Adapter (ERBA), which injects cross-modal information via fine-tuning into Protein Language Models (PLMs) while preserving their biochemical priors. ERBA performs conditioning in two stages: Molecular Recognition Cross-Attention (MRCA) first injects substrate information into the enzyme representation to capture specificity; Geometry-aware Mixture-of-Experts (G-MoE) then integrates active-site structure and routes samples to pocket-specialized experts to reflect induced fit. To maintain semantic fidelity, Enzyme-Substrate Distribution Alignment (ESDA) enforces distributional consistency within the PLM manifold in a reproducing kernel Hilbert space. In experiments across three kinetic endpoints and multiple PLM backbones, ERBA delivers consistent gains and stronger out-of-distribution performance compared with sequence-only and shallow-fusion baselines, offering a biologically grounded route to scalable kinetic prediction and a foundation for adding cofactors, mutations, and time-resolved structural cues.
[793] arXiv:2603.16797 (replaced) [pdf, html, other]: Title: Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling

Christian Belardi, Justin Lovelace, Kilian Q. Weinberger, Carla P. Gomes

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Guided diffusion sampling relies on approximating often intractable likelihood scores, which introduces significant noise into the sampling dynamics. We propose using adaptive moment estimation to stabilize these noisy likelihood scores during sampling. Despite its simplicity, our approach achieves state-of-the-art results on image restoration and class-conditional generation tasks, outperforming more complicated methods, which are often computationally more expensive. We provide empirical analysis of our method on both synthetic and real data, demonstrating that mitigating gradient noise through adaptive moments offers an effective way to improve alignment.
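Adaptive moment estimation, as used in the Adam optimizer, can be sketched as a small stateful filter applied to a noisy guidance gradient at each sampling step. The class below is a generic Adam-style estimator with the usual default hyperparameters; how the paper integrates it into the diffusion sampler is not specified in the abstract, so the class name and its use here are assumptions.

```python
from math import sqrt


class AdaptiveMoments:
    """Adam-style running moments for smoothing a noisy gradient vector."""

    def __init__(self, dim, beta1=0.9, beta2=0.999, eps=1e-8):
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = [0.0] * dim  # first moment (running mean)
        self.v = [0.0] * dim  # second moment (running mean of squares)
        self.t = 0

    def step(self, grad):
        """Update the moments with `grad` and return the normalized direction."""
        self.t += 1
        out = []
        for k, g in enumerate(grad):
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            out.append(m_hat / (sqrt(v_hat) + self.eps))
        return out
```

In a guided sampler, the returned direction would replace the raw likelihood-score estimate when nudging the sample, so that a single noisy gradient has less influence on the trajectory.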
[794] arXiv:2603.18677 (replaced) [pdf, html, other]: Title: Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework

Eduardo Di Santi

Comments: 20 pages, 2 figures, 4 result tables. Mathematical framework for human-AI collaboration, cognitive amplification, cognitive delegation, and cognitive sustainability, simulation and optimisation

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Artificial intelligence is increasingly embedded in human decision making. In some cases, it enhances human reasoning. In others, it fosters excessive cognitive dependence. This paper introduces a conceptual and mathematical framework to distinguish cognitive amplification, where AI improves hybrid human AI performance while preserving human expertise, from cognitive delegation, where reasoning is progressively outsourced to the AI system, risking long term atrophy of human capabilities.
We define four operational metrics: the Cognitive Amplification Index, or CAI star, which measures collaborative gain beyond the best standalone agent; the Dependency Ratio, or D, and Human Reliance Index, or HRI, which quantify the structural dominance of the AI within the hybrid output; and the Human Cognitive Drift Rate, or HCDR, which captures the temporal erosion or maintenance of autonomous human performance. Together, these quantities characterize human AI systems in terms of both immediate hybrid performance and long term cognitive sustainability.
We validate the framework through an agent based simulation in NetLogo across three reliance regimes and multiple dependency and atrophy configurations. The results distinguish degenerate AI dominated delegation, human preserving but weakly competitive interaction, and intermediate boundary regimes that approach the AI baseline while remaining structurally dependent. Across all tested configurations, no regime achieves genuine amplification.
A constrained optimization over the atrophy parameter shows that reducing atrophy improves retained human capability, collaborative gain, and dependency structure, but even zero atrophy does not yield positive collaborative gain. The framework therefore provides a practical tool for evaluating whether human AI systems perform well in a way that also preserves human capability over time.
[795] arXiv:2603.18740 (replaced) [pdf, html, other]: Title: Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review

Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos, Diomidis Spinellis

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Automated Code Review (ACR) systems integrating Large Language Models (LLMs) are increasingly adopted in software development workflows, ranging from interactive assistants to autonomous agents in CI/CD pipelines. In this paper, we study how LLM-based vulnerability detection in ACR is affected by the framing effect: the tendency to let the presentation of information override its semantic content in forming judgments. We examine whether adversaries can exploit this through contextual-bias injection: crafting PR metadata to bias ACR security judgments as a supply-chain attack vector against real-world ACR pipelines.
To this end, we first conduct a large-scale exploratory study across 6 LLMs under five framing conditions, establishing the framing effect as a systematic and widespread phenomenon in LLM-based vulnerability detection, with bug-free framing producing the strongest effect. We then design a realistic and controlled experimental environment, evaluating 17 CVEs across 10 real-world projects, to assess the susceptibility of real-world ACR pipelines to vulnerability reintroduction attacks. We employ two attack strategies: a template-based attack inspired by prior related work, and a novel LLM-assisted iterative refinement attack.
We find that template-based attacks are ineffective and may even backfire, as direct biasing attempts raise suspicions. Our iterative refinement attack, on the other hand, achieves 100% success, exploiting a fundamental asymmetry: attackers can iteratively refine attacks against a local clone of the review pipeline, while defenders have only one chance to detect them. Debiasing via metadata redaction and explicit instructions restores detection in all affected cases. Overall, our findings highlight the dangers of over-relying on ACR and stress the importance of human oversight and contributor trust in the development process.
[796] arXiv:2603.21443 (replaced) [pdf, html, other]: Title: Decidability of Livelock Detection for Parameterized Self-Disabling Unidirectional Rings

Aly Farahat

Comments: Significant revision of the core result, where now bounded witness is established on the product graph. Algorithm updated to run the greatest fixed-point recursion on the new product graph construction. Now pointing to exhaustive test results, with updated code

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)

We revisit the decidability of livelock detection in self-disabling unidirectional ring protocols, a problem shown to be undecidable in general by Klinkhamer and Ebnenasir via reduction from the periodic domino problem. Despite this, practical protocols routinely admit finite proofs of livelock freedom via the same tiling constraints, and synthesis of parameterized self-stabilizing rings has been shown to be decidable -- suggesting a gap between theory and practice.
We identify the source of this gap: the apparent unboundedness of livelock reasoning is an artifact of working in the transition space. By lifting to an \emph{equivariant product space} -- the space of transition-witness pairs coupled by the zigzag equivariance conditions of Farahat -- we show that self-disabling induces a structure in which closure and period are preserved under backward propagation. This yields a bounded witness property: every livelock, constructed from a set of local transitions $T$, admits a representative as a local cycle of length at most $|T|^2$ in a finite product graph, independent of the ring size.
We derive a sound and complete decision procedure via greatest fixed-point iteration on the product graph. Our results demonstrate that decidability emerges not by restricting the problem syntactically, but by exposing its underlying finite combinatorial structure. We validate on over 4,300 protocols with zero errors, extend to $(1,1)$-asymmetric protocols, and derive a circulation law classifying livelocks by ring size. Code and the algebraic foundation are available at this https URL.
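As a rough illustration of what a greatest fixed-point iteration on a finite graph looks like, the sketch below starts from all nodes and repeatedly discards any node with no successor left in the surviving set. This is a generic pruning scheme, not the paper's procedure: the abstract does not specify the product graph construction or the exact operator, so the function name, the node and edge representation, and the "has a surviving successor" condition are all assumptions made for illustration.

```python
def greatest_fixed_point(nodes, edges):
    """Largest set S of nodes such that every node in S has a successor in S.

    Hypothetical stand-in for a fixed-point check on a finite graph;
    the operator used in the paper is not given in the abstract.
    """
    # Build the successor map once.
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)

    alive = set(nodes)  # start from the top element: all nodes
    changed = True
    while changed:
        changed = False
        for n in list(alive):
            # Drop nodes whose successors have all been pruned.
            if not (succ[n] & alive):
                alive.discard(n)
                changed = True
    return alive


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
    print(greatest_fixed_point(nodes, edges))
```

On a finite graph the surviving set is non-empty exactly when the graph contains a cycle, which is why a check of this shape can serve as a cycle (and hence, under the right encoding, livelock) detector.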
[797] arXiv:2603.21697 (replaced) [pdf, html, other]: Title: Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

Comments: Code released at: this https URL

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple three-panel visual narratives and prompt the model to role-play and "complete the comic." Building on JailbreakBench and JailbreakV, we introduce ComicJailbreak, a comic-based jailbreak benchmark with 1,167 attack instances spanning 10 harm categories and 5 task setups. Across 15 state-of-the-art MLLMs (six commercial and nine open-source), comic-based attacks achieve success rates comparable to strong rule-based jailbreaks and substantially outperform plain-text and random-image baselines, with ensemble success rates exceeding 90% on several commercial models. We then show that existing defense methods, although effective against the harmful comics, induce a high refusal rate on benign prompts. Finally, using automatic judging and targeted human evaluation, we show that current safety evaluators can be unreliable on sensitive but non-harmful content. Our findings highlight the need for safety alignment robust to narrative-driven multimodal jailbreaks.
[798] arXiv:2603.22823 (replaced) [pdf, other]: Title: Empirical Comparison of Agent Communication Protocols for Task Orchestration

Ivan Dobrovolskyi

Subjects: Artificial Intelligence (cs.AI)

Context. The problem of comparative evaluation of communication protocols for task orchestration by large language model (LLM) agents is considered. The object of study is the process of interaction between LLM agents and external tools, as well as between autonomous LLM agents, during task orchestration. Objective. The goal of this work is to develop a systematic pilot benchmark comparing tool integration, multi-agent delegation, and hybrid architectures for standardized queries at three levels of complexity, and to quantify the advantages and disadvantages in terms of response time, context window consumption, cost, error recovery, and implementation complexity.
[799] arXiv:2603.24111 (replaced) [pdf, other]: Title: Toward a Multi-Layer ML-Based Security Framework for Industrial IoT

Aymen Bouferroum, Valeria Loscri, Abderrahim Benslimane (LIA)

Journal-ref: RESSI 2026, May 2026, Clervaux, Luxembourg

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

The Industrial Internet of Things (IIoT) introduces significant security challenges as resource-constrained devices become increasingly integrated into critical industrial processes. Existing security approaches typically address threats at a single network layer, often relying on expensive hardware and remaining confined to simulation environments. In this paper, we present the research framework and contributions of our doctoral thesis, which aims to develop a lightweight, Machine Learning (ML)-based security framework for IIoT environments. We first describe our adoption of the Tm-IIoT trust model and the Hybrid IIoT (H-IIoT) architecture as foundational baselines, then introduce the Trust Convergence Acceleration (TCA) approach, our primary contribution that integrates ML to predict and mitigate the impact of degraded network conditions on trust convergence, achieving up to a 28.6% reduction in convergence time while maintaining robustness against adversarial behaviors. We then propose a real-world deployment architecture based on affordable, open-source hardware, designed to implement and extend the security framework. Finally, we outline our ongoing research toward multi-layer attack detection, including physical-layer threat identification and considerations for robustness against adversarial ML attacks.
[800] arXiv:2603.26791 (replaced) [pdf, html, other]: Title: Crystal: Characterizing Relative Impact of Scholarly Publications

Hannah Collison, Benjamin Van Durme, Daniel Khashabi

Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Assessing a cited paper's impact is typically done by analyzing its citation context in isolation within the citing paper. While this focuses on the most directly relevant text, it prevents relative comparisons across all the works a paper cites. We propose Crystal, which instead jointly ranks all cited papers within a citing paper using large language models (LLMs). To mitigate LLMs' positional bias, we rank each list three times in a randomized order and aggregate the impact labels through majority voting. This joint approach leverages the full citation context, rather than evaluating citations independently, to more reliably distinguish impactful references. Crystal outperforms a prior state-of-the-art impact classifier by +9.5% accuracy and +8.3% F1 on a dataset of human-annotated citations. Crystal further gains efficiency through fewer LLM calls and performs competitively with an open-source model, enabling scalable, cost-effective citation impact analysis. We release our rankings, impact labels, and codebase to support future research.
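The shuffle-and-vote step described above can be sketched as follows. This is a minimal illustration, not the released Crystal code: `label_fn` is a hypothetical stand-in for the LLM ranking call (it receives the cited papers in a given order and returns one impact label per paper), and the function name, parameters, and label strings are assumptions.

```python
import random
from collections import Counter


def aggregate_impact_labels(cited_ids, label_fn, n_rounds=3, seed=0):
    """Label the same citation list under several random orders and majority-vote.

    label_fn(order) is a hypothetical callable returning {paper_id: label};
    in practice it would wrap an LLM call.
    """
    rng = random.Random(seed)
    votes = {cid: [] for cid in cited_ids}
    for _ in range(n_rounds):
        order = list(cited_ids)
        rng.shuffle(order)        # a fresh random presentation order each round
        labels = label_fn(order)
        for cid in cited_ids:
            votes[cid].append(labels[cid])
    # Majority vote per cited paper across the rounds.
    return {cid: Counter(v).most_common(1)[0][0] for cid, v in votes.items()}


if __name__ == "__main__":
    # Toy labeler with a positional bias: whatever is listed first looks "high".
    def biased(order):
        return {cid: ("high" if cid == "p1" or i == 0 else "low")
                for i, cid in enumerate(order)}

    print(aggregate_impact_labels(["p1", "p2", "p3", "p4", "p5"], biased))
```

With three rounds and two labels a tie cannot occur; with more labels a tie-breaking rule would need to be chosen explicitly.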
[801] arXiv:2603.27112 (replaced) [pdf, html, other]: Title: RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation

Sen Zhang, Runmei Li, Shizhuang Deng, Zhichao Zheng, Yuhe Zhang, Jiani Li, Kailun Zhang, Tao Zhang, Wenjun Wu, Qunbo Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

As Automatic Train Operation (ATO) advances toward GoA4 and beyond, it increasingly depends on efficient, reliable cab-view visual perception and decision-oriented inference to ensure safe operation in complex and dynamic railway environments. However, existing approaches focus primarily on basic perception and often generalize poorly to rare yet safety-critical corner cases. They also lack the high-level reasoning and planning capabilities required for operational decision-making. Although recent Large Multi-modal Models (LMMs) show strong generalization and cognitive capabilities, their use in safety-critical ATO is hindered by high computational cost and hallucination risk. Meanwhile, reliable domain-specific benchmarks for systematically evaluating cognitive capabilities are still lacking. To address these gaps, we introduce RailVQA-bench, the first VQA benchmark for cab-view visual cognition in ATO, comprising 20,000 single-frame and 1,168 video-based QA pairs to evaluate cognitive generalization and interpretability in both static and dynamic scenarios. Furthermore, we propose RailVQA-CoM, a collaborative large-small model framework that combines small-model efficiency with large-model cognition via a transparent three-module architecture and adaptive temporal sampling, improving perceptual generalization and enabling more efficient reasoning and planning. Experiments demonstrate that the proposed approach substantially improves performance, enhances interpretability, improves efficiency, and strengthens cross-domain generalization in autonomous driving systems. Code and datasets will be available at this https URL.
[802] arXiv:2603.27406 (replaced) [pdf, html, other]: Title: On the Relationship between Bayesian Networks and Probabilistic Structural Causal Models

Peter J.F. Lucas, Eleonora Zullo, Fabio Stella

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, the relationship between probabilistic graphical models, in particular Bayesian networks, and causal diagrams, also called structural causal models, is studied. Structural causal models are deterministic models, based on structural equations or functions, that can be provided with uncertainty by adding independent, unobserved random variables to the models, equipped with probability distributions. One question that arises is whether a Bayesian network that has been obtained from expert knowledge or learnt from data can be mapped to a probabilistic structural causal model, and whether or not this has consequences for the network structure and probability distribution. We show that linear algebra and linear programming offer key methods for the transformation, and examine properties for the existence and uniqueness of solutions based on dimensions of the probabilistic structural model. Finally, we examine in what way the semantics of the models is affected by this transformation.
Keywords: Causality, probabilistic structural causal models, Bayesian networks, linear algebra, experimental software.
[803] arXiv:2603.27427 (replaced) [pdf, html, other]: Title: Dissipativity-Based Distributed Control and Communication Topology Co-Design for Nonlinear DC Microgrids

Mohammad Javad Najafirad, Shirantha Welikala

Comments: arXiv admin note: text overlap with arXiv:2503.21042, arXiv:2503.04908

Subjects: Systems and Control (eess.SY)

This paper presents a dissipativity-based distributed droop-free control and communication topology co-design framework for voltage regulation and current sharing in DC microgrids (MGs), where constant-power loads (CPLs) and voltage-source converter (VSC) input saturation introduce significant nonlinearities. In particular, CPLs introduce an inherently destabilizing nonlinearity, while VSC input saturation imposes hard amplitude constraints on applicable control input at each distributed generator (DG), collectively making the DC MG control system design extremely challenging. To this end, the DC MG is modeled as a networked system of DGs, transmission lines, and loads coupled through a static interconnection matrix. Each DG is equipped with a local PI-based controller with an anti-windup compensator and a distributed consensus-based global controller, from which a nonlinear networked error dynamics model is derived. The CPL nonlinearity is characterized via sector-boundedness with the S-procedure applied directly to yield tight LMI conditions, while the VSC input saturation is handled via a dead-zone decomposition and sector-boundedness. Both nonlinearities are simultaneously absorbed into the dissipativity analysis using the S-procedure. Subsequently, local controller gains and passivity indices, and distributed controller gains and the communication topology are co-designed by solving a sequence of local and global Linear Matrix Inequality (LMI) problems, enabling a one-shot co-design process that avoids iterative procedures. The effectiveness of the proposed framework is validated through simulation of an islanded DC MG under multiple operating scenarios, demonstrating robust performance superior to conventional control approaches.
[804] arXiv:2603.27820 (replaced) [pdf, html, other]: Title: Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning

Zhiwen You, Xi Chen, Aniket Vashishtha, Simo Du, Gabriel Erion-Barner, Hongyuan Mei, Hao Peng, Yue Guo

Subjects: Computation and Language (cs.CL)

Clinical diagnosis is a complex reasoning process in which clinicians gather evidence, form hypotheses, and test them against alternative explanations. In medical training, this reasoning is explicitly developed through counterfactual questioning--e.g., asking how a diagnosis would change if a key symptom were absent or altered--to strengthen differential diagnosis skills. As large language model (LLM)-based systems are increasingly used for diagnostic support, ensuring the interpretability of their recommendations becomes critical. However, most existing LLM-based diagnostic agents reason over fixed clinical evidence without explicitly testing how individual findings support or weaken competing diagnoses. In this work, we propose a counterfactual multi-agent diagnostic framework inspired by clinician training that makes hypothesis testing explicit and evidence-grounded. Our framework introduces counterfactual case editing to modify clinical findings and evaluate how these changes affect competing diagnoses. We further define the Counterfactual Probability Gap, a method that quantifies how strongly individual findings support a diagnosis by measuring confidence shifts under these edits. These counterfactual signals guide multi-round specialist discussions, enabling agents to challenge unsupported hypotheses, refine differential diagnoses, and produce more interpretable reasoning trajectories. Across three diagnostic benchmarks and seven LLMs, our method consistently improves diagnostic accuracy over prompting and prior multi-agent baselines, with the largest gains observed in complex and ambiguous cases. Human evaluation further indicates that our framework produces more clinically useful, reliable, and coherent reasoning. These results suggest that incorporating counterfactual evidence verification is an important step toward building reliable AI systems for clinical decision support.
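To make the idea of a confidence shift under a counterfactual edit concrete, here is a small sketch. The abstract does not give the formula for the Counterfactual Probability Gap, so this assumes one plausible reading: the model's confidence in a diagnosis on the original case minus its confidence after a single finding is removed. The function names, the list-of-strings case representation, and `confidence_fn` (a hypothetical stand-in for an LLM confidence query) are all illustrative assumptions.

```python
def counterfactual_probability_gap(confidence_fn, findings, diagnosis, finding):
    """Confidence on the original case minus confidence with one finding removed.

    Assumed definition for illustration only; the paper's exact formula
    is not stated in the abstract.
    """
    original = confidence_fn(findings, diagnosis)
    # Counterfactual case edit: drop the finding under test.
    edited = [f for f in findings if f != finding]
    return original - confidence_fn(edited, diagnosis)


if __name__ == "__main__":
    # Toy confidence model: "fever" raises confidence in "influenza" by 0.5.
    def toy_confidence(findings, diagnosis):
        return 0.25 + (0.5 if "fever" in findings else 0.0)

    case = ["fever", "cough"]
    print(counterfactual_probability_gap(toy_confidence, case, "influenza", "fever"))
    print(counterfactual_probability_gap(toy_confidence, case, "influenza", "cough"))
```

Under this reading, a large positive gap marks a finding the diagnosis depends on, while a gap near zero marks a finding that lends it little support.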
[805] arXiv:2603.28342 (replaced) [pdf, html, other]: Title: Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured execution feedback on compilation, correctness, and speedup. To make this search reliable, we build backend-specific evaluation services for Triton on NVIDIA GPUs and MACA on MetaX GPUs. On the training side, we convert long-horizon evolution trajectories into step-centric supervision and reinforcement learning signals by retaining correctness-preserving, high-gain revisions, so that the model is optimized as a strong local improver inside the evolutionary loop rather than as a one-shot generator. Under a unified evolutionary protocol, Kernel-Smith-235B-RL achieves state-of-the-art overall performance on KernelBench with the NVIDIA Triton backend, attaining the best average speedup ratio and outperforming frontier proprietary models including Gemini-3.0-pro and Claude-4.6-opus. We further validate the framework on the MetaX MACA backend, where our Kernel-Smith-MACA-30B surpasses large-scale counterparts such as DeepSeek-V3.2-think and Qwen3-235B-2507-think, highlighting potential for seamless adaptation across heterogeneous platforms. Beyond benchmark results, the same workflow produces upstream contributions to production systems including SGLang and LMDeploy, demonstrating that LLM-driven kernel optimization can transfer from controlled evaluation to practical deployment.
[806] arXiv:2603.28680 (replaced) [pdf, other]: Title: Modeling AI-RAN Economics: A Techno-Economic Framework

Gabriele Gemmi, Michele Polese, Tommaso Melodia

Subjects: Networking and Internet Architecture (cs.NI)

The large-scale deployment of 5G networks has not delivered the expected return on investment for mobile network operators, raising concerns about the economic viability of future 6G rollouts. At the same time, surging demand for Artificial Intelligence (AI) inference and training workloads is straining global compute capacity. AI-RAN architectures, in which Radio Access Network (RAN) platforms accelerated on Graphics Processing Units (GPUs) share idle capacity with AI workloads during off-peak periods, offer a potential path to improved capital efficiency. However, the economic case for such systems remains unsubstantiated. In this paper, we present a techno-economic analysis of AI-RAN deployments by combining publicly available benchmarks of 5G Layer-1 processing on heterogeneous platforms -- from x86 servers with accelerators for channel coding to modern GPUs -- with realistic traffic models and AI service demand profiles for Large Language Model (LLM) inference. We construct a joint cost and revenue model that quantifies the surplus compute capacity available in GPU-based RAN deployments and evaluates the returns from leasing it to AI tenants. Our results show that, across a range of scenarios encompassing token depreciation, varying demand dynamics, and diverse GPU serving densities, the additional capital and operational expenditures of GPU-heavy deployments are offset by AI-on-RAN revenue, yielding a return on investment of up to 8x. These findings strengthen the long-term economic case for accelerator-based RAN architectures and future 6G deployments.
[807] arXiv:2603.28758 (replaced) [pdf, html, other]: Title: Distributionally Robust Planning with $\mathcal{L}_1$ Adaptive Control

Astghik Hakobyan, Amaras Nazarians, Aditya Gahlawat, Naira Hovakimyan, Ilya Kolmanovsky

Subjects: Systems and Control (eess.SY)

Safe operation of autonomous systems requires robustness to both model uncertainty and uncertainty in the environment. We propose DRP-$\mathcal{L}_1$AC, a hierarchical framework for stochastic nonlinear systems that integrates distributionally robust model predictive control (DR-MPC) with $\mathcal{L}_1$-adaptive control. The key idea is to use the $\mathcal{L}_1$-adaptive controller's online distributional certificates that bound the Wasserstein distance between nominal and true state distributions, thereby certifying the ambiguity sets used for planning without requiring distribution samples. Environmental uncertainty is captured via data-driven ambiguity sets constructed from finite samples. These are incorporated into a DR-MPC planner enforcing distributionally robust chance constraints over a receding horizon. Using Wasserstein duality, the resulting problem admits tractable reformulations and a sample-based implementation. We show theoretically and via numerical experimentation that our framework ensures certifiable safety in the presence of simultaneous system and environmental uncertainties.
[808] arXiv:2604.03873 (replaced) [pdf, html, other]: Title: SODA: Semi On-Policy Black-Box Distillation for Large Language Models

Xiwen Chen, Jingjing Wang, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hejian Sang, Zhipeng Wang, Alborz Geramifard, Feng Luo

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student's inherent errors. Fully on-policy methods (e.g., Generative Adversarial Distillation) solve this via adversarial training but introduce well-known training instability and crippling computational overhead. To address this dilemma, we propose SODA (Semi On-policy Distillation with Alignment), a highly efficient alternative motivated by the inherent capability gap between frontier teachers and much smaller base models. Because a compact student model's natural, zero-shot responses are almost strictly inferior to the powerful teacher's targets, we can construct a highly effective contrastive signal simply by pairing the teacher's optimal response with a one-time static snapshot of the student's outputs. This demonstrates that exposing the small student to its own static inferior behaviors is sufficient for high-quality distribution alignment, eliminating the need for costly dynamic rollouts and fragile adversarial balancing. Extensive evaluations across four compact Qwen2.5 and Llama-3 models validate this semi on-policy paradigm. SODA matches or outperforms the state-of-the-art methods on 15 out of 16 benchmark results. More importantly, it achieves this superior distillation quality while training 10 times faster, consuming 27% less peak GPU memory, and completely eliminating adversarial instability.
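The pairing step described above, teacher targets contrasted with a one-time snapshot of the student's own outputs, might be assembled along these lines. This is a sketch only: the abstract does not say how pairs are stored or which contrastive loss consumes them, so the `chosen`/`rejected` field names (borrowed from common preference-optimization conventions), the function name, and `student_generate` (a hypothetical stand-in for zero-shot sampling from the base student) are assumptions.

```python
def build_contrastive_pairs(prompts, teacher_responses, student_generate):
    """Pair each teacher response with a static snapshot of the student's output.

    student_generate(prompt) is a hypothetical callable; it is invoked once
    per prompt before training, and the snapshot is never refreshed.
    """
    if len(prompts) != len(teacher_responses):
        raise ValueError("prompts and teacher_responses must align one-to-one")
    # One-time snapshot: no rollouts are regenerated during training.
    snapshot = [student_generate(p) for p in prompts]
    return [
        {"prompt": p, "chosen": t, "rejected": s}
        for p, t, s in zip(prompts, teacher_responses, snapshot)
    ]


if __name__ == "__main__":
    prompts = ["What is 2 + 2?"]
    teacher = ["2 + 2 = 4."]
    pairs = build_contrastive_pairs(prompts, teacher, lambda p: "5")
    print(pairs)
```

Because the snapshot is fixed, the student is sampled once per prompt rather than at every training step, which is where the savings over fully on-policy rollouts would come from.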
[809] arXiv:2604.03956 (replaced) [pdf, html, other]: Title: VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models

Ravi Ranjan, Agoritsa Polyzou

Comments: 18 pages, 9 figures, Accepted to ACL-2026, KnowFM

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Vision-language-action (VLA) models are emerging as embodied foundation models for robotic manipulation, but their deployment introduces a new unlearning challenge: removing unsafe, spurious, or privacy-sensitive behaviors without degrading perception, language grounding, and action control. In OpenVLA-style policies, behavior is produced through a fused visual encoder, a cross-modal projector, and a language backbone that predicts tokenized robot actions, so undesirable knowledge can be distributed across perception, alignment, and reasoning/action layers rather than confined to a single module. Consequently, partial unlearning applied only to the vision stack or only to the language backbone is often insufficient, while conventional unlearning baselines designed for standalone vision or language models may leave residual forgetting or incur unnecessary utility loss in embodied settings. We propose VLA-Forget, a hybrid unlearning framework that combines ratio-aware selective editing for perception and cross-modal specificity with layer-selective reasoning/action unlearning for utility-preserving forgetting. VLA-Forget jointly optimizes three objectives: targeted forgetting, perceptual preservation, and reasoning retention, through staged updates over the visual encoder, projector, and upper action-generating transformer blocks. Across forget-set behavior probes and retain-task evaluations, VLA-Forget improves forgetting efficacy by 10%, preserves perceptual specificity by 22%, retains reasoning and task success by 9%, and reduces post-quantization recovery by 55% relative to strong unlearning baselines.
[810] arXiv:2604.04395 (replaced) [pdf, html, other]: Title: BiTDiff: Fine-Grained 3D Conducting Motion Generation via BiMamba-Transformer Diffusion

Tianzhi Jia, Kaixing Yang, Xiaole Yang, Xulong Tang, Ke Qiu, Shikui Wei, Yao Zhao

Comments: 15 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

3D conducting motion generation aims to synthesize fine-grained conductor motions from music, with broad potential in music education, virtual performance, digital human animation, and human-AI co-creation. However, this task remains underexplored due to two major challenges: (1) the lack of large-scale fine-grained 3D conducting datasets and (2) the absence of effective methods that can jointly support long-sequence generation with high quality and efficiency. To address the data limitation, we develop a quality-oriented 3D conducting motion collection pipeline and construct CM-Data, a fine-grained SMPL-X dataset with about 10 hours of conducting motion data. To the best of our knowledge, CM-Data is the first and largest public dataset for 3D conducting motion generation. To address the methodological limitation, we propose BiTDiff, a novel framework for 3D conducting motion generation, built upon a BiMamba-Transformer hybrid model architecture for efficient long-sequence modeling and a Diffusion-based generative strategy with human-kinematic decomposition for high-quality motion synthesis. Specifically, BiTDiff introduces auxiliary physical-consistency losses and a hand-/body-specific forward-kinematics design for better fine-grained motion modeling, while leveraging BiMamba for memory-efficient long-sequence temporal modeling and Transformer for cross-modal semantic alignment. In addition, BiTDiff supports training-free joint-level motion editing, enabling downstream human-AI interaction design. Extensive quantitative and qualitative experiments demonstrate that BiTDiff achieves state-of-the-art (SOTA) performance for 3D conducting motion generation on the CM-Data dataset. Code will be available upon acceptance.
[811] arXiv:2604.05260 (replaced) [pdf, html, other]: Title: ZipFold: Modular Actuators for Scaleable Adaptive Robots

Niklas Hagemann, Daniela Rus

Subjects: Robotics (cs.RO); Soft Condensed Matter (cond-mat.soft); Human-Computer Interaction (cs.HC)

There is a growing need for robots that can change their shape, size and mechanical properties to adapt to evolving tasks and environments. However, current shape-changing systems generally utilize bespoke, system-specific mechanisms that can be difficult to scale, reconfigure or translate from one application to another. This paper introduces a compact, easy-to-fabricate deployable actuator that achieves reversible scale and stiffness transformations through compound folding and zipping of flexible 3D-printed plastic strips into square-section deployable beams. The simple actuation method allows for smooth, continuous transitions between compact (flexible) and expanded (quasi-rigid) states, facilitating diverse shape and stiffness transformations when modules are combined into larger assemblies. The actuator's mechanical performance is characterized and an integrated system involving a four-module adaptive walking robot is demonstrated.
[812] arXiv:2604.05320 (replaced) [pdf, other]: Title: ExpressMM: Expressive Mobile Manipulation Behaviors in Human-Robot Interactions

Souren Pashangpour, Haitong Wang, Matthew Lisondra, Goldie Nejat

Subjects: Robotics (cs.RO)

Mobile manipulators are increasingly deployed in human-centered environments to perform tasks. While completing such tasks, they should also be able to communicate their intent to the people around them using expressive robot behaviors. Prior work on expressive robot behaviors has used preprogrammed or learning-from-demonstration-based expressive motions and large language model generated high-level interactions. The majority of these existing approaches have not considered human-robot interactions (HRI) where users may interrupt, modify, or redirect a robot's actions during task execution. In this paper, we develop the novel ExpressMM framework that integrates a high-level language-guided planner based on a vision-language model for perception and conversational reasoning with a low-level vision-language-action policy to generate expressive robot behaviors during collaborative HRI tasks. Furthermore, ExpressMM supports interruptible interactions to accommodate updated or redirecting instructions by users. We demonstrate ExpressMM on a mobile manipulator assisting a human in a collaborative assembly scenario and conduct audience-based evaluation of live HRI demonstrations. Questionnaire results show that the ExpressMM-enabled expressive behaviors helped observers clearly interpret the robot's actions and intentions while supporting socially appropriate and understandable interactions. Participants also reported that the robot was useful for collaborative tasks and behaved in a predictable and safe manner during the demonstrations, fostering positive perceptions of the robot's usefulness, safety, and predictability during the collaborative tasks.
[813] arXiv:2604.05673 (replaced) [pdf, html, other]: Title: Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

Wuyang Luan, Junhui Li, Weiguang Zhao, Wenjian Zhang, Tieru Wu, Rui Ma

Comments: 18 pages, 7 figures, 10 tables. Code available at this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into continuous, long-horizon action trajectories. While generative policies based on diffusion models and Schrödinger Bridges (SB) effectively capture multimodal action distributions, they require dozens of integration steps due to high-variance stochastic transport, posing a critical barrier for real-time robotic control. We propose Rectified Schrödinger Bridge Matching (RSBM), a framework that exploits a shared velocity-field structure between standard Schrödinger Bridges ($\varepsilon=1$, maximum-entropy transport) and deterministic Optimal Transport ($\varepsilon\to 0$, as in Conditional Flow Matching), controlled by a single entropic regularization parameter $\varepsilon$. We prove two key results: (1) the conditional velocity field's functional form is invariant across the entire $\varepsilon$-spectrum (Velocity Structure Invariance), enabling a single network to serve all regularization strengths; and (2) reducing $\varepsilon$ linearly decreases the conditional velocity variance, enabling more stable coarse-step ODE integration. Anchored to a learned conditional prior that shortens transport distance, RSBM operates at an intermediate $\varepsilon$ that balances multimodal coverage and path straightness. Empirically, while standard bridges require $\geq 10$ steps to converge, RSBM achieves over 94% cosine similarity and 92% success rate in merely 3 integration steps -- without distillation or multi-stage training -- substantially narrowing the gap between high-fidelity generative policies and the low-latency demands of Embodied AI.
[814] arXiv:2604.05846 (replaced) [pdf, html, other]: Title: AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning

Yuanfu Sun, Kang Li, Dongzhe Fan, Jiajin Liu, Qiaoyu Tan

Comments: ACL 2026 Main Conference

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) increasingly rely on agentic capabilities -- iterative retrieval, tool use, and decision-making -- to overcome the limits of static, parametric knowledge. Yet existing agentic frameworks treat external information as unstructured text and fail to leverage the topological dependencies inherent in real-world data. To bridge this gap, we introduce Agentic Graph Learning (AGL), a paradigm that reframes graph learning as an interleaved process of topology-aware navigation and LLM-based inference. Specifically, we propose AgentGL, the first reinforcement learning (RL)-driven framework for AGL. AgentGL equips an LLM agent with graph-native tools for multi-scale exploration, regulates tool usage via search-constrained thinking to balance accuracy and efficiency, and employs a graph-conditioned curriculum RL strategy to stabilize long-horizon policy learning without step-wise supervision. Across diverse Text-Attributed Graph (TAG) benchmarks and multiple LLM backbones, AgentGL substantially outperforms strong GraphLLMs and GraphRAG baselines, achieving absolute improvements of up to 17.5% in node classification and 28.4% in link prediction. These results demonstrate that AGL is a promising frontier for enabling LLMs to autonomously navigate and reason over complex relational environments. The code is publicly available at this https URL.
[815] arXiv:2604.07958 (replaced) [pdf, html, other]: Title: ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks

Jiayang Xu, Fan Zhuo, Majun Zhang, Changhao Pan, Zehan Wang, Siyu Chen, Xiaoda Yang, Tao Jin, Zhou Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Current video editing models often rely on expensive paired video data, which limits their practical scalability. In essence, most video editing tasks can be formulated as a decoupled spatiotemporal process, where the temporal dynamics of the pretrained model are preserved while spatial content is selectively and precisely modified. Based on this insight, we propose ImVideoEdit, an efficient framework that learns video editing capabilities entirely from image pairs. By freezing the pre-trained 3D attention modules and treating images as single-frame videos, we decouple the 2D spatial learning process to help preserve the original temporal dynamics. The core of our approach is a Predict-Update Spatial Difference Attention module that progressively extracts and injects spatial differences. Rather than relying on rigid external masks, we incorporate a Text-Guided Dynamic Semantic Gating mechanism for adaptive and implicit text-driven modifications. Despite training on only 13K image pairs for 5 epochs with exceptionally low computational overhead, ImVideoEdit achieves editing fidelity and temporal consistency comparable to larger models trained on extensive video datasets.
[816] arXiv:2604.08016 (replaced) [pdf, html, other]: Title: Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

Moein Salimi, Shaygan Adim, Danial Parnian, Nima Alighardashi, Mahdi Jafari Siavoshani, Mohammad Hossein Rohban

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Although it plays a foundational role in human discovery and sense-making, abductive reasoning--the inference of the most plausible explanation for an observation--has been relatively underexplored in Large Language Models (LLMs). Despite the rapid advancement of LLMs, the exploration of abductive reasoning and its diverse facets has thus far been disjointed rather than cohesive. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work by abductive task, dataset, underlying methodology, and evaluation strategy. To ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, together with targeted comparative analyses across model sizes, model families, evaluation styles, and the distinct generation-versus-selection task typologies. Moreover, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to deductive and inductive tasks, providing insights into their broader reasoning capabilities. Our analysis reveals critical gaps in current approaches--from static benchmark design and narrow domain coverage to narrow training frameworks and limited mechanistic understanding of abductive processes...
[817] arXiv:2604.08619 (replaced) [pdf, html, other]: Title: Doctoral Theses in France (1985-2025): A Linked Dataset of PhDs, Academic Networks, and Institutions

William Aboucaya, Dastan Jasim

Comments: 11 pages + 6 appendix pages, 7 figures, 2 tables. See this https URL for the dataset. See this https URL for the code to reproduce the dataset and figures. Version 2: Fixed references to tables and figures. Modified unclear wordings in section 3. Updated values in the languages table after a minor bug fix. Standardized figures style.

Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY)

This paper presents a comprehensive dataset of doctoral theses defended in France between 1985 and 2025, constructed from multiple national academic metadata sources. The dataset is primarily based on data from the French national thesis platform and is enriched using additional authority and bibliographic databases to improve data quality, completeness, and interoperability. The data production pipeline includes the aggregation of heterogeneous sources, the correction of inconsistent identifiers, the enrichment of person and institution records, and the construction of derived variables describing academic careers, jury participation, institutional affiliations, and thesis characteristics. Additional identifiers from major academic repositories and library catalogues are integrated to facilitate linkage with external data sources and future dataset extensions. The resulting dataset provides structured information at the thesis, individual, and institutional levels, enabling both descriptive and relational analyses. This resource is particularly suited for research on doctoral education, academic networks, supervision practices, jury composition, institutional collaboration, and the evolution of research communities over time. The paper documents the data sources, processing pipeline, feature construction, data quality issues, and limitations, with the objective of facilitating reuse of the dataset by other researchers and supporting future extensions and longitudinal analyses of the academic system.
[818] arXiv:2604.08927 (replaced) [pdf, html, other]: Title: Beyond the Individual: Virtualizing Multi-Disciplinary Reasoning for Clinical Intake via Collaborative Agents

Huangwei Chen, Wu Li, Junhao Jia, Yining Chen, Xiaotao Pang, YaLong Chen, Gonghui Li, Haishuai Wang, Jiajun Bu, Lei Wu

Comments: Accepted to ACL 2026 Findings

Subjects: Multiagent Systems (cs.MA)

The initial outpatient consultation is critical for clinical decision-making, yet it is often conducted by a single physician under time pressure, making it prone to cognitive biases and incomplete evidence capture. Although Multi-Disciplinary Teams (MDTs) reduce these risks, they are costly and difficult to scale to real-time intake. We propose Aegle, a synchronous virtual MDT framework that brings MDT-level reasoning to outpatient consultations via a graph-based multi-agent architecture. Aegle formalizes the consultation state using a structured SOAP representation, separating evidence collection from diagnostic reasoning to improve traceability and bias control. An orchestrator dynamically activates specialist agents, which perform decoupled parallel reasoning and are subsequently integrated by an aggregator into a coherent clinical note. Experiments on ClinicalBench and a real-world RAPID-IPN dataset across 24 departments and 53 metrics show that Aegle consistently outperforms state-of-the-art proprietary and open-source models in documentation quality and consultation capability, while also improving final diagnosis accuracy. Our code is available at this https URL.
[819] arXiv:2604.09000 (replaced) [pdf, html, other]: Title: StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding

Junxi Wang, Te Sun, Jiayi Zhu, Junxian Li, Haowen Xu, Zichen Wen, Xuming Hu, Zhiyu Li, Linfeng Zhang

Comments: ACL 2026 Findings

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision agent memory has shown remarkable effectiveness in streaming video understanding. However, storing such memory for videos incurs substantial memory overhead, leading to high costs in both storage and computation. To address this issue, we propose StreamMeCo, an efficient Stream Agent Memory Compression framework. Specifically, based on the connectivity of the memory graph, StreamMeCo introduces edge-free minmax sampling for isolated nodes and edge-aware weight pruning for connected nodes, evicting redundant memory nodes while maintaining accuracy. In addition, we introduce a time-decay memory retrieval mechanism to further eliminate the performance degradation caused by memory compression. Extensive experiments on three challenging benchmark datasets (M3-Bench-robot, M3-Bench-web and Video-MME-Long) demonstrate that under 70% memory graph compression, StreamMeCo achieves a 1.87x speedup in memory retrieval while delivering an average accuracy improvement of 1.0%. Our code is available at this https URL.
[820] arXiv:2604.09251 (replaced) [pdf, html, other]: Title: DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math?

Young-Suk Lee, Ramon Fernandez Astudillo, Radu Florian

Subjects: Artificial Intelligence (cs.AI)

Deep research agents increasingly interleave web browsing with multi-step computation, yet existing benchmarks evaluate these capabilities in isolation, creating a blind spot in assessing real-world performance. We introduce DRBENCHER, a synthetic benchmark generator for questions that require both browsing and computation. It enforces four criteria: verifiability (gold answers are computed by executing parameterized code over knowledge-graph values), complexity (multi-hop entity identification, property retrieval, and domain-specific computation), difficulty (a two-stage verification cascade filters out questions solvable by the generating model), and diversity (a greedy max-min embedding filter maximizes coverage). These criteria are realized via a unified answer-first pipeline spanning five domains: biochemistry, financial, geophysical, security, and history. Human evaluation shows 76% validity (84% excluding stale data), with 35% of errors due to outdated knowledge-graph entries, highlighting an inherent limitation of systems that reason over evolving data. Automatic evaluation shows that the strongest frontier model achieves only 20% answer accuracy. Compared to manually constructed benchmarks (BrowseComp+, MATH-500, GPQA), DRBENCHER achieves the highest semantic diversity.
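The diversity criterion above mentions a greedy max-min embedding filter. The abstract does not give the details, so the following is only a minimal sketch of the generic greedy max-min (farthest-point) selection idea. The function name `greedy_max_min`, the use of Euclidean distance, and the choice of the first item as the seed are all assumptions made for illustration, not the DRBENCHER implementation.

```python
import math


def greedy_max_min(points, k):
    """Pick up to k indices so that each new pick is as far as possible
    from the items already selected (farthest-point heuristic).

    Assumptions: `points` are embedding vectors given as tuples of floats,
    distance is Euclidean, and the first item seeds the selection.
    """
    if k <= 0 or not points:
        return []
    selected = [0]
    # min_dist[i] = distance from item i to its nearest selected item
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0:
            break  # every remaining item duplicates a selected one
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Toy 2-D "embeddings": the near-duplicate at index 1 is filtered out.
toy = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(greedy_max_min(toy, 3))
```

On the toy input the near-duplicate of the seed is the item left out, which is the behaviour a diversity filter of this kind is meant to produce.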
[821] arXiv:2604.10072 (replaced) [pdf, html, other]: Title: Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty

Chao Xue, Yao Wang, Mengqiao Liu, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Chenyao Lu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng, Flora D. Salim

Comments: accepted by ACL 2026 Findings

Subjects: Computation and Language (cs.CL)

Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, existing implementations of GRM suffer from two critical limitations. First, CoT prompting is applied indiscriminately to all inputs regardless of their inherent complexity. This introduces unnecessary computational costs for tasks amenable to fast, direct inference. Second, existing approaches primarily rely on voting-based mechanisms to evaluate CoT outputs, which often lack granularity and precision in assessing reasoning quality. In this paper, we propose E-GRM, an efficient generative reward modeling framework grounded in model-internal uncertainty. E-GRM leverages the convergence behavior of parallel model generations to estimate uncertainty and selectively trigger CoT reasoning only when needed, without relying on handcrafted features or task-dependent signals. To improve reward fidelity, we introduce a lightweight discriminative scorer trained with a hybrid regression--ranking objective to provide fine-grained evaluation of reasoning paths. Experiments on multiple reasoning benchmarks show that E-GRM substantially reduces inference cost while consistently improving answer accuracy, demonstrating that model-internal uncertainty is an effective and general signal for efficient reasoning-aware reward modeling.
[822] arXiv:2604.10079 (replaced) [pdf, html, other]: Title: Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

Chao Xue, Yao Wang, Mengqiao Liu, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Chenyao Lu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng, Flora D. Salim

Comments: Accepted by ACL 2026 Main

Subjects: Computation and Language (cs.CL)

Supervised Fine-Tuning (SFT) is the standard approach for adapting large language models (LLMs) to downstream tasks. However, we observe a persistent failure mode: even after convergence, models often fail to correctly reproduce a subset of their own supervised training data. We refer to this behavior as the Incomplete Learning Phenomenon (ILP). This paper presents the first systematic study of ILP in LLM fine-tuning. We formalize ILP as post-training failure to internalize supervised instances and demonstrate its prevalence across multiple model families, domains, and datasets. Through controlled analyses, we identify five recurrent sources of incomplete learning: (1) missing prerequisite knowledge in the pre-trained model, (2) conflicts between SFT supervision and pre-training knowledge, (3) internal inconsistencies within SFT data, (4) left-side forgetting during sequential fine-tuning, and (5) insufficient optimization for rare or complex patterns. We introduce a diagnostic-first framework that maps unlearned samples to these causes using observable training and inference signals, and study several targeted mitigation strategies as causal interventions. Experiments on Qwen, LLaMA, and OLMo2 show that incomplete learning is widespread and heterogeneous, and that improvements in aggregate metrics can mask persistent unlearned subsets. The findings highlight the need for fine-grained diagnosis of what supervised fine-tuning fails to learn, and why.
[823] arXiv:2604.10271 (replaced) [pdf, html, other]: Title: Hijacking Text Heritage: Hiding the Human Signature through Homoglyphic Substitution

Robert Dilworth

Comments: 30 pages, 9 figures

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Information Retrieval (cs.IR)

In what way could a data breach involving government-issued IDs such as passports, driver's licenses, etc., rival a random voluntary disclosure on a nondescript social-media platform? At first glance, the former appears more significant, and that is a valid assessment. The disclosed data could contain an individual's date of birth and address; for all intents and purposes, a leak of that data would be disastrous. Given the threat, the latter scenario involving an innocuous online post seems comparatively harmless--or does it? From that post and others like it, a forensic linguist could stylometrically uncover equivalent pieces of information, estimating an age range for the author (adolescent or adult) and narrowing down their geographical location (specific country). While not an exact science--the determinations are statistical--stylometry can reveal comparable, though noticeably diluted, information about an individual. To prevent an ID from being breached, simply sharing it as little as possible suffices. Preventing the leakage of personal information from written text requires a more complex solution: adversarial stylometry. In this paper, we explore how performing homoglyph substitution--the replacement of characters with visually similar alternatives (e.g., "h" $\texttt{[U+0068]}$ $\rightarrow$ "һ" $\texttt{[U+04BB]}$)--on text can degrade stylometric systems.
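The homoglyph substitution described above can be illustrated with a minimal sketch. Only the single mapping from the abstract (Latin "h", U+0068, to Cyrillic "һ", U+04BB) is used. The table name `HOMOGLYPHS` and the function `substitute` are hypothetical, and the paper's actual substitution table and selection strategy are not stated in the abstract.

```python
# Illustrative mapping taken from the abstract's example only:
# Latin small letter h (U+0068) -> Cyrillic small letter shha (U+04BB).
HOMOGLYPHS = {"h": "\u04bb"}


def substitute(text, table=HOMOGLYPHS):
    """Replace each character that has an entry in `table` with its
    visually similar counterpart; all other characters are unchanged."""
    return text.translate(str.maketrans(table))


original = "hash"
obfuscated = substitute(original)

# The strings render almost identically but differ at the code-point level.
print([f"U+{ord(c):04X}" for c in original])
print([f"U+{ord(c):04X}" for c in obfuscated])
```

Because the rendered glyphs look alike while the code points differ, features computed over raw characters or tokens see different input, which is the effect on stylometric systems that the abstract refers to.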
[824] arXiv:2604.10275 (replaced) [pdf, html, other]: Title: FastSHADE: Fast Self-augmented Hierarchical Asymmetric Denoising for Efficient inference on mobile devices

Nikolay Falaleev

Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Real-time image denoising is essential for modern mobile photography but remains challenging due to the strict latency and power constraints of edge devices. This paper presents FastSHADE (Fast Self-augmented Hierarchical Asymmetric Denoising), a lightweight U-Net-style network tailored for real-time, high-fidelity restoration on mobile GPUs. Our method features a multi-stage architecture incorporating a novel Asymmetric Frequency Denoising Block (AFDB) that decouples spatial structure extraction from high-frequency noise suppression to maximize efficiency, and a Spatially Gated Upsampler (SGU) that optimizes high-resolution skip connection fusion. To address generalization, we introduce an efficient Noise Shifting Self-Augmentation strategy that enhances data diversity without inducing domain shifts. Evaluations on the MAI2021 benchmark demonstrate that our scalable model family establishes a highly efficient speed-fidelity trade-off. Our base FastSHADE-M variant maintains real-time latency (<50 ms on an Adreno 840 GPU) while preserving structural integrity, and our scaled-up FastSHADE-XL establishes a new state-of-the-art for overall image quality, achieving 37.94 dB PSNR.
[825] arXiv:2604.10516 (replaced) [pdf, html, other]: Title: Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning

Xinyi Huang

Subjects: Computation and Language (cs.CL)

Selecting the right knowledge is critical when using large language models (LLMs) to solve domain-specific data analysis tasks. However, most retrieval-augmented approaches rely primarily on lexical or embedding similarity, which is often a weak proxy for the task-critical knowledge needed for multi-step reasoning. In many such tasks, the relevant knowledge is not merely textually related to the query, but is instead grounded in executable code and the dependency structure through which computations are carried out. To address this mismatch, we propose SGKR (Structure-Grounded Knowledge Retrieval), a retrieval framework that organizes domain knowledge with a graph induced by function-call dependencies. Given a question, SGKR extracts semantic input and output tags, identifies dependency paths connecting them, and constructs a task-relevant subgraph. The associated knowledge and corresponding function implementations are then assembled as a structured context for LLM-based code generation. Experiments on multi-step data analysis benchmarks show that SGKR consistently improves solution correctness over no-retrieval and similarity-based retrieval baselines for both vanilla LLMs and coding agents.
[826] arXiv:2604.11388 (replaced) [pdf, other]: Title: Min-Sum Set Cover on Parallel Machines

Michał Szyfelbein

Comments: 14 pages

Subjects: Data Structures and Algorithms (cs.DS)

Consider the classical Min-Sum Set Cover problem: We are given a universe $\mathcal{U}$ of $n$ elements and a collection $\mathcal{S}$ of $k$ subsets of $\mathcal{U}$. Moreover, a cost function is associated with each set. The goal is to find a subsequence of sets in $\mathcal{S}$ that covers all elements in $\mathcal{U}$, such that the sum of the covering times of the elements is minimized. The covering time of an element $u$ is the cost of all sets that appear in the sequence before $u$ is first covered. This problem can be seen as a scheduling problem on a single machine, where each job represents a set and elements are represented by some kind of utility that is required to be provided by at least one of the jobs. The goal is to schedule the jobs in such a way to minimize the sum of provision times of the utilities. In this paper we consider a natural generalization of this problem to the case of $m$ machines, processing the jobs in parallel. We call this problem Parallel Min-Sum Set Cover.
To obtain approximation algorithms for both related and unrelated machines, we use a crucial subproblem which we call Parallel Maximum Coverage. We give a randomized bicriteria $(1-1/e-\epsilon, O(\log m/\log\log m))$-approximation algorithm for this problem based on a natural LP relaxation. This can then be used to obtain an $O(\log m/\log\log m)$-approximation algorithm for the Min-Sum Set Cover problem on unrelated machines. For related machines, we allow the aforementioned bicriteria approximation algorithm to run in FPT time, and apply a technique enabling transformation of a related machines instance into one consisting of $O(\log m)$ unrelated machines, to get an $\frac{8e}{e-1}+\epsilon <12.66$-approximation algorithm for this case. We also show a greedy algorithm for unit cost sets, subject to precedence constraints, with an $O(k^{2/3})$ approximation ratio.
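The single-machine objective defined at the start of this abstract can be made concrete with a small worked example. This is a minimal sketch, not the paper's algorithm: the function name `min_sum_cover_cost` is hypothetical, and it assumes the covering time of an element includes the cost of the set that first covers it (the completion time of that job in the scheduling view).

```python
def min_sum_cover_cost(universe, sequence):
    """Sum of covering times for a given order of sets.

    `sequence` is a list of (cost, members) pairs in schedule order.
    Assumption: an element's covering time is the total cost of all sets
    up to and including the first set that contains it.
    """
    elapsed = 0
    covered = set()
    total = 0
    for cost, members in sequence:
        elapsed += cost
        newly_covered = (members & universe) - covered
        total += elapsed * len(newly_covered)
        covered |= newly_covered
    if covered != universe:
        raise ValueError("sequence does not cover the universe")
    return total


universe = {1, 2, 3, 4}
a = (1, {1, 2})
b = (2, {2, 3})
c = (1, {4})

print(min_sum_cover_cost(universe, [a, b, c]))  # covering times 1, 1, 3, 4
print(min_sum_cover_cost(universe, [a, c, b]))  # covering times 1, 1, 4, 2
```

Scheduling the cheap set `c` before the more expensive set `b` lowers the sum from 9 to 8, which shows why the order of the sets matters and not only which sets are chosen.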
[827] arXiv:2604.11417 (replaced) [pdf, html, other]: Title: Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech

Edwin C. Montiel-Vazquez, Christian Arzate Cruz, Stefanos Gkikas, Thomas Kassiotis, Giorgos Giannakakis, Randy Gomez

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Co-speech gestures increase engagement and improve speech understanding. Most data-driven robot systems generate rhythmic beat-like motion, yet few integrate semantic emphasis. To address this, we propose a lightweight transformer that derives iconic gesture placement and intensity from text and emotion alone, requiring no audio input at inference time. The model outperforms GPT-4o in both semantic gesture placement classification and intensity regression on the BEAT2 dataset, while remaining computationally compact and suitable for real-time deployment on embodied agents.
[828] arXiv:2604.12067 (replaced) [pdf, other]: Title: Vectorized Gaussian Belief Propagation for Near Real-Time Fully-Distributed PMU-Based State Estimation

Mirsad Cosovic, Armin Teskeredzic, Antonello Monti, Dejan Vukobratovic

Comments: 13 pages, 13 figures

Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

Electric power systems require accurate, scalable, distributed, and near real-time state estimation (SE) to support reliable monitoring and control under increasingly complex operating conditions. Limited monitoring capabilities can lead to inefficient operation and, in extreme cases, large-scale disturbances such as blackouts. To address these challenges, this paper proposes a vectorized Gaussian belief propagation (GBP) framework for phasor measurement unit-based SE, formulated over factor graphs and specifically designed to support distributed and near real-time monitoring. The proposed framework includes multivariate and fusion-based GBP formulations. The multivariate formulation jointly models related state variables and their measurement relationships, while the fusion-based formulation reduces factor graph complexity by combining multiple measurements associated with the same set of variables, resulting in a structure that more closely reflects the underlying electrical coupling of the power system. The resulting algorithms operate in a fully distributed manner at the bus level and achieve fast convergence and high estimation accuracy, often within a few iterations, as demonstrated by numerical results on systems ranging from 60 to 13659 buses, where the fusion-based formulation achieves single-digit millisecond iteration times on the largest test case.
[829] arXiv:2604.12710 (replaced) [pdf, html, other]: Title: LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Junxiao Yang, Haoran Liu, Jinzhe Tu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Jiaqi Weng, Jialing Tao, Hui Xue, Hongning Wang, Han Qiu, Minlie Huang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large language models (LLMs) often demonstrate strong safety performance in high-resource languages, yet exhibit severe vulnerabilities when queried in low-resource languages. We attribute this gap to a mismatch between language-agnostic semantic understanding ability and language-dominant safety alignment biased toward high-resource languages. Consistent with this hypothesis, we empirically identify the semantic bottleneck in LLMs, an intermediate layer in which the geometry of model representations is governed primarily by shared semantic content rather than language identity. Building on this observation, we propose Language-Agnostic Semantic Alignment (LASA), which anchors safety alignment directly in semantic bottlenecks. Experiments show that LASA substantially improves safety across all languages: average attack success rate (ASR) drops from 24.7% to 2.8% on LLaMA-3.1-8B-Instruct and remains around 3-4% across Qwen2.5 and Qwen3 Instruct models (7B-32B). Together, our analysis and method offer a representation-level perspective on LLM safety, suggesting that safety alignment requires anchoring safety understanding not in surface text, but in the model's language-agnostic semantic space.
[830] arXiv:2604.12867 (replaced) [pdf, html, other]: Title: QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence

Zhichao Lin, Zhichao Liang, Gaoqiang Liu, Meng Xu, Baoyu Xiang, Shuxin Zhao, Yao Wu, Jian Xu, Guanjun Jiang

Subjects: Artificial Intelligence (cs.AI)

As agentic foundation models continue to evolve, how to further improve their performance in vertical domains has become an important challenge. To this end, building upon Tongyi DeepResearch, a powerful agentic foundation model, we focus on the Chinese medical deep search scenario and propose QuarkMedSearch, systematically exploring a full-pipeline approach spanning medical multi-hop data construction, training strategies, and evaluation benchmarks to further push and assess its performance upper bound in vertical domains. Specifically, for data synthesis, to address the scarcity of deep search training data in the medical domain, we combine a large-scale medical knowledge graph with real-time online exploration to construct long-horizon medical deep search training data; for post-training, we adopt a two-stage SFT and RL training strategy that progressively enhances the model's planning, tool invocation, and reflection capabilities required for deep search, while maintaining search efficiency; for evaluation, we collaborate with medical experts to construct the QuarkMedSearch Benchmark through rigorous manual verification. Experimental results demonstrate that QuarkMedSearch achieves state-of-the-art performance among open-source models of comparable scale on the QuarkMedSearch Benchmark, while also maintaining strong competitiveness on general benchmarks.
[831] arXiv:2604.12875 (replaced) [pdf, other]: Title: AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance

Abiodun A. Solanke

Comments: This paper has been withdrawn by the author while an institutional affiliation compliance matter is under review. It may be resubmitted once the matter is resolved

Subjects: Artificial Intelligence (cs.AI)

The rapid expansion of large language model (LLM) safety evaluation has produced a substantial benchmark ecosystem, but not a correspondingly coherent measurement ecosystem. We present AISafetyBenchExplorer, a structured catalogue of 195 AI safety benchmarks released between 2018 and 2026, organized through a multi-sheet schema that records benchmark-level metadata, metric-level definitions, benchmark-paper metadata, and repository activity. This design enables meta-analysis not only of what benchmarks exist, but also of how safety is operationalized, aggregated, and judged across the literature. Using the updated catalogue, we identify a central structural problem: benchmark proliferation has outpaced measurement standardization. The current landscape is dominated by medium-complexity benchmarks (94/195), while only 7 benchmarks occupy the Popular tier. The workbook further reports strong concentration around English-only evaluation (165/195), evaluation-only resources (170/195), stale GitHub repositories (137/195), stale Hugging Face datasets (96/195), and heavy reliance on arXiv preprints among benchmarks with known venue metadata. At the metric level, the catalogue shows that familiar labels such as accuracy, F1 score, safety score, and aggregate benchmark scores often conceal materially different judges, aggregation rules, and threat models. We argue that the field's main failure mode is fragmentation rather than scarcity. Researchers now have many benchmark artifacts, but they often lack a shared measurement language, a principled basis for benchmark selection, and durable stewardship norms for post-publication maintenance. AISafetyBenchExplorer addresses this gap by providing a traceable benchmark catalogue, a controlled metadata schema, and a complexity taxonomy that together support more rigorous benchmark discovery, comparison, and meta-evaluation.
[832] arXiv:2604.12994 (replaced) [pdf, html, other]: Title: LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain

Comments: To appear in ACL 2026 Main Conference

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Existing automated program repair techniques primarily focus on memory corruption vulnerabilities and struggle with logical vulnerabilities because of their limited semantic understanding of the vulnerable code and its expected behavior. On the other hand, recent successes of large language models (LLMs) in understanding and repairing code are promising. However, no framework currently exists to analyze the capabilities and limitations of such techniques for logical vulnerabilities. We aim to systematically evaluate both traditional and LLM-based repair approaches for addressing real-world logical vulnerabilities. To facilitate our assessment, we created the first-ever dataset, LogicDS, comprising 122 logical vulnerabilities that reflect tangible security impact. We also developed a systematic framework, LogicEval, to evaluate patches for logical vulnerabilities. Evaluations suggest that compilation and testing failures are primarily driven by prompt sensitivity, loss of code context, and difficulty in patch localization.
[833] arXiv:2604.13359 (replaced) [pdf, html, other]: Title: BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals

Run Wang, Victor J. B. Jung, Philip Wiese, Sebastian Frey, Giusy Spacone, Francesco Conti, Alessio Burrello, Luca Benini

Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Signal Processing (eess.SP)

Biosignals exhibit substantial cross-subject and cross-session variability, inducing severe domain shifts that degrade post-deployment performance for small, edge-oriented AI models. On-device adaptation is therefore essential to both preserve user privacy and ensure system reliability. However, existing sub-100 mW MCU-based wearable platforms can only support shallow or sparse adaptation schemes due to the prohibitive memory footprint and computational cost of full backpropagation (BP). In this paper, we propose BioTrain, a framework enabling full-network fine-tuning of state-of-the-art biosignal models under milliwatt-scale power and sub-megabyte memory constraints. We validate BioTrain using both offline and on-device benchmarks on EEG and EOG datasets, covering Day-1 new-subject calibration and longitudinal adaptation to signal drift. Experimental results show that full-network fine-tuning achieves accuracy improvements of up to 35% over non-adapted baselines and outperforms last-layer updates by approximately 7% during new-subject calibration. On the GAP9 MCU platform, BioTrain enables efficient on-device training throughput of 17 samples/s for EEG and 85 samples/s for EOG models within a power envelope below 50 mW. In addition, BioTrain's efficient memory allocator and network topology optimization enable the use of a large batch size, reducing peak memory usage. For fully on-chip BP on GAP9, BioTrain reduces the memory footprint by 8.1x, from 5.4 MB to 0.67 MB, compared to conventional full-network fine-tuning using batch normalization with batch size 8.
[834] arXiv:2604.13589 (replaced) [pdf, html, other]: Title: Dehaze-then-Splat: Generative Dehazing with Physics-Informed 3D Gaussian Splatting for Smoke-Free Novel View Synthesis

Boss Chen, Hanqing Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present Dehaze-then-Splat, a two-stage pipeline for multi-view smoke removal and novel view synthesis developed for Track 2 of the NTIRE 2026 3D Restoration and Reconstruction Challenge. In the first stage, we produce pseudo-clean training images via per-frame generative dehazing using Nano Banana Pro, followed by brightness normalization. In the second stage, we train 3D Gaussian Splatting (3DGS) with physics-informed auxiliary losses -- depth supervision via Pearson correlation with pseudo-depth, dark channel prior regularization, and dual-source gradient matching -- that compensate for cross-view inconsistencies inherent in frame-wise generative processing. We identify a fundamental tension in dehaze-then-reconstruct pipelines: per-image restoration quality does not guarantee multi-view consistency, and such inconsistency manifests as blurred renders and structural instability in downstream 3D reconstruction. Our analysis shows that MCMC-based densification with early stopping, combined with depth and haze-suppression priors, effectively mitigates these artifacts. On the Akikaze validation scene, our pipeline achieves 20.98 dB PSNR and 0.683 SSIM for novel view synthesis, a +1.50 dB improvement over the unregularized baseline.
[835] arXiv:2604.13860 (replaced) [pdf, html, other]: Title: "AI Psychosis" in Context: How Conversation History Shapes LLM Responses to Delusional Beliefs

Luke Nicholls, Robert Hutto, Zephrah Soto, Hamilton Morrin, Thomas Pollak, Raj Korpan, Cheryl Carmichael

Subjects: Human-Computer Interaction (cs.HC)

Extended interaction with large language models (LLMs) has been linked to the reinforcement of delusional beliefs, a phenomenon attracting growing clinical and public concern. Yet most empirical work evaluates model safety in brief interactions, which may not reflect how these harms develop through sustained dialogue. We tested five models across three levels of accumulated context, using the same escalating delusional history to isolate its effect on model behaviour. Human raters coded responses on risk and safety dimensions, and each model was analysed qualitatively. Models separated into two distinct tiers: GPT-4o, Grok 4.1 Fast, and Gemini 3 Pro exhibited high-risk, low-safety profiles; Claude Opus 4.5 and GPT-5.2 Instant displayed the opposite pattern. As context accumulated, performance tended to degrade in the unsafe group, while the same material activated stronger safety interventions among the safer models. Qualitative analysis identified distinct mechanisms of failure, including validation of the user's delusional premises, elaboration beyond them, and attempting harm reduction from within the delusional frame. Safer models, however, often used the established relationship to support intervention, taking accountability for past missteps so that redirection would not be received as betrayal. These findings indicate that accumulated context functions as a stress test of safety architecture, revealing whether a model treats prior dialogue as a worldview to inherit or as evidence to evaluate. Short-context assessments may therefore mischaracterise model safety, underestimating danger in some systems while missing context-activated gains in others. The results suggest that delusional reinforcement by LLMs reflects a preventable alignment failure. In demonstrating that these harms can be resisted, the safer models establish a baseline future systems should now be expected to meet.
[836] arXiv:2604.14626 (replaced) [pdf, html, other]: Title: ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving

Yuseon Choi, Jingu Lee, Jungjun Oh, Sunjoo Whang, Byeongcheol Kim, Minsung Kim, Hoi-Jun Yoo, Sangjin Kim

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

Mixture-of-Experts (MoE) models have become the dominant architecture for large-scale language models, yet on-premises serving remains fundamentally memory-bound as batching turns sparse per-token compute into dense memory activation. Memory-centric architectures (PIM, NMP) improve bandwidth but leave compute underutilized under MoE's low arithmetic intensity at high batch sizes. Speculative decoding (SD) trades idle compute for fewer target invocations, yet verification must load experts even for rejected tokens, severely limiting its benefit in MoE, especially at low batch sizes. We propose ELMoE-3D, a hybrid-bonding (HB)-based HW-SW co-designed framework that unifies cache-based acceleration and speculative decoding to offer overall speedup across batch sizes. We identify two intrinsic elasticity axes of MoE (expert and bit) and jointly scale them to construct Elastic Self-Speculative Decoding (Elastic-SD), which serves as both an expert cache and a strongly aligned self-draft model accelerated by high HB bandwidth. Our LSB-augmented bit-sliced architecture exploits inherent redundancy in bit-slice representations to natively support bit-nested execution. On our 3D-stacked hardware, ELMoE-3D achieves an average $6.6\times$ speedup and $4.4\times$ energy efficiency gain over naive MoE serving on xPU across batch sizes 1-16, and delivers $2.2\times$ speedup and $1.4\times$ energy efficiency gain over the best-performing prior accelerator baseline.
[837] arXiv:2604.14709 (replaced) [pdf, html, other]: Title: HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks

Fan Cui, Hongyuan Hou, Zizhang Luo, Chenyun Yin, Yun Liang

Subjects: Artificial Intelligence (cs.AI)

Existing benchmarks for hardware design primarily evaluate Large Language Models (LLMs) on isolated, component-level tasks such as generating HDL modules from specifications, leaving repository-scale evaluation unaddressed. We introduce HWE-Bench, the first large-scale, repository-level benchmark for evaluating LLM agents on real-world hardware bug repair tasks. HWE-Bench comprises 417 task instances derived from real historical bug-fix pull requests across six major open-source projects spanning both Verilog/SystemVerilog and Chisel, covering RISC-V cores, SoCs, and security roots-of-trust. Each task is grounded in a fully containerized environment where the agent must resolve a real bug report, with correctness validated through the project's native simulation and regression flows. The benchmark is built through a largely automated pipeline that enables efficient expansion to new repositories. We evaluate seven LLMs with four agent frameworks and find that the best agent resolves 70.7% of tasks overall, with performance exceeding 90% on smaller cores but dropping below 65% on complex SoC-level projects. We observe larger performance gaps across models than commonly reported on software benchmarks, and difficulty is driven by project scope and bug-type distribution rather than code size alone. Our failure analysis traces agent failures to three stages of the debugging process: fault localization, hardware-semantic reasoning, and cross-artifact coordination across RTL, configuration, and verification components, providing concrete directions for developing more capable hardware-aware agents.
[838] arXiv:2604.14734 (replaced) [pdf, html, other]: Title: Find the Differences: Differential Morphing Attack Detection vs Face Recognition

Una M. Kelly, Luuk J. Spreeuwers, Raymond N.J. Veldhuis

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Morphing is a challenge to face recognition (FR) for which several morphing attack detection solutions have been proposed. We argue that face recognition and differential morphing attack detection (D-MAD) in principle perform very similar tasks, which we support by comparing an FR system with two existing D-MAD approaches. We also show that currently used decision thresholds inherently lead to FR systems being vulnerable to morphing attacks and that this explains the tradeoff between performance on normal images and vulnerability to morphing attacks. We propose using FR systems that are already in place for morphing detection and introduce a new evaluation threshold that guarantees an upper limit to the vulnerability to morphing attacks - even of unknown types.
[839] arXiv:2604.14844 (replaced) [pdf, html, other]: Title: Matched and Euclidean-Mismatched Decoding on Fourier-Curve Constellations with Tangent Noise

Bin Han, Hao Chen, Muxia Sun, H. V. Poor, Hans D. Schotten

Comments: Submitted to IEEE Communications Letters

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We study matched and Euclidean-mismatched decoding on finite Fourier-curve constellations with tangent-space artificial noise. Each hypothesis induces a Gaussian law with symbol-dependent rank-one covariance. We derive exact Euclidean pairwise errors for arbitrary pairs and an exact Gaussian-expectation representation for matched decoding on bilaterally tangent-orthogonal pairs. For uniform even constellations, the Euclidean side yields explicit distance spectra and symbol-error bounds across all offset classes; the matched side is exact on antipodal pairs and benchmarked numerically at the full-codebook level via Monte Carlo. By isolating the detection-theoretic consequence of tangent-space artificial noise, these results clarify analytically how noise fraction and constellation density enter the mismatch behavior; secrecy-rate implications require additional channel and adversary modeling.
[840] arXiv:2604.15390 (replaced) [pdf, html, other]: Title: Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks

Seyedreza Mohseni, Sarvesh Baskar, Edward Raff, Manas Gaur

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Code deobfuscation is the task of recovering a readable version of a program while preserving its original behavior. In practice, this often requires days or even months of manual work with complex and expensive analysis tools. In this paper, we explore an alternative approach based on Chain-of-Thought (CoT) prompting, where a large language model is guided through explicit, step-by-step reasoning tailored for code analysis. We focus on control flow obfuscation, including Control Flow Flattening (CFF), Opaque Predicates, and their combination, and we measure both structural recovery of the control flow graph and preservation of program semantics. We evaluate five state-of-the-art large language models and show that CoT prompting significantly improves deobfuscation quality compared with simple prompting. We validate our approach on a diverse set of standard C benchmarks and report results using both structural metrics for control flow graphs and semantic metrics based on output similarity. Among the tested models and by applying CoT, GPT5 achieves the strongest overall performance, with an average gain of about 16% in control-flow graph reconstruction and about 20.5% in semantic preservation across our benchmarks compared to zero-shot prompting. Our results also show that model performance depends not only on the obfuscation level and the chosen obfuscator but also on the intrinsic complexity of the original control flow graph. Collectively, these findings suggest that CoT-guided large language models can serve as effective assistants for code deobfuscation, providing improved code explainability, more faithful control flow graph reconstruction, and better preservation of program behavior while potentially reducing the manual effort needed for reverse engineering.
[841] arXiv:2604.15468 (replaced) [pdf, html, other]: Title: The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE

Robert Feldt, Per Lenberg, Julian Frattini, Dhasarathy Parthasarathy

Comments: This paper is the write-up of Robert Feldt's keynote "Agentic Software Engineering Will Eat the World: AI-Based Systems as the New Operating System of Society" given at the Agentic Engineering 2026 workshop, Rio de Janeiro, Brazil, April 14, 2026. The April 23 upload made the reference list more complete and added a few citations; the text is essentially unchanged

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

AI-based systems, currently driven largely by LLMs and tool-using agentic harnesses, are increasingly discussed as a possible threat to software engineering. Foundation models get stronger, agents can plan and act across multiple steps, and tasks such as scaffolding, routine test generation, straightforward bug fixing, and small integration work look more exposed than they did only a few years ago. The result is visible unease not only among students and junior developers, but also among experienced practitioners who worry that hard-won expertise may lose value. This paper argues for a different reading. The important shift is not that software engineering loses relevance. It is that the thing being engineered expands beyond executable code to semi-executable artifacts: combinations of natural language, tools, workflows, control mechanisms, and organizational routines whose enactment depends on human or probabilistic interpretation rather than deterministic execution. The Semi-Executable Stack is introduced as a six-ring diagnostic reference model for reasoning about that expansion, spanning executable artifacts, instructional artifacts, orchestrated execution, controls, operating logic, and societal and institutional fit. The model helps locate where a contribution, bottleneck, or organizational transition primarily sits, and which adjacent rings it depends on. The paper develops the argument through three worked cases, reframes familiar objections as engineering targets rather than reasons to dismiss the transition, and closes with a preserve-versus-purify heuristic for deciding which legacy software engineering processes, controls, and coordination routines should be kept and which should be simplified or redesigned. This paper is a conceptual keynote companion: diagnostic and agenda-setting rather than empirical.
[842] arXiv:2604.15596 (replaced) [pdf, html, other]: Title: Privacy, Prediction, and Allocation

Ben Jacobsen, Nitin Kohli

Comments: 2026 FORC (Foundations of Responsible Computing)

Subjects: Cryptography and Security (cs.CR)

Algorithmic predictions are increasingly used to inform the allocation of scarce resources. The promise of these methods is that, through machine learning, they can better identify the people who would benefit most from interventions. Recently, however, several works have called this assumption into question by demonstrating the existence of settings where simple, unit-level allocation strategies can meet or even exceed the performance of those based on individual-level targeting. Separately, other works have objected to individual-level targeting on privacy grounds, leading to an unusual situation where a single solution, unit-level targeting, is recommended for reasons of both privacy and utility. Motivated by the desire to fully understand the interplay of privacy and targeting levels, we initiate the study of aid allocation systems that satisfy differential privacy, synthesizing existing works on private optimization with the economic models of aid allocation used in the non-private literature. To this end, we investigate private variants of both individual and unit-level allocation strategies in both stochastic and distribution-free settings under a range of constraints on data availability. Through this analysis, we provide clean, interpretable bounds characterizing the tradeoffs between privacy, efficiency, and targeting precision in allocation.
[843] arXiv:2604.15770 (replaced) [pdf, html, other]: Title: PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding

Junjie Wen, Junlin He, Fei Ma, Jinqiang Cui

Comments: Accepted by ICCA 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Accurate open-vocabulary 3D scene understanding requires semantic representations that are both language-aligned and spatially precise at the pixel level, while remaining scalable when lifted to 3D space. However, existing representations struggle to jointly satisfy these requirements, and densely propagating pixel-wise semantics to 3D often results in substantial redundancy, leading to inefficient storage and querying in large-scale scenes. To address these challenges, we present PLAF, a Pixel-wise Language-Aligned Feature extraction framework that enables dense and accurate semantic alignment in 2D without sacrificing open-vocabulary expressiveness. Building upon this representation, we further design an efficient semantic storage and querying scheme that significantly reduces redundancy across both 2D and 3D domains. Experimental results show that PLAF provides a strong semantic foundation for accurate and efficient open-vocabulary 3D scene understanding. The code is publicly available at this https URL.
[844] arXiv:2604.15827 (replaced) [pdf, html, other]: Title: UsefulBench: Towards Decision-Useful Information as a Target for Information Retrieval

Tobias Schimanski, Stefanie Lewandowski, Christian Woerle, Nicola Reichenau, Yauheni Huryn, Markus Leippold

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Conventional information retrieval is concerned with identifying the relevance of texts for a given query. Yet, the conventional definition of relevance is dominated by aspects of similarity in texts, leaving unobserved whether the text is truly useful for addressing the query. For instance, when answering whether Paris is larger than Berlin, texts about Paris being in France are relevant (lexical/semantic similarity), but not useful. In this paper, we introduce UsefulBench, a domain-specific dataset curated by three professional analysts labeling whether a text is connected to a query (relevance) or holds practical value in responding to it (usefulness). We show that classic similarity-based information retrieval aligns more strongly with relevance. While LLM-based systems can counteract this bias, we find that domain-specific problems require a high degree of expertise, which current LLMs do not fully incorporate. We explore approaches to (partially) overcome this challenge. However, UsefulBench presents a dataset challenge for targeted information retrieval systems.
[845] arXiv:2604.15994 (replaced) [pdf, html, other]: Title: ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

Qiang Xu, Shengyuan Bai, Yu Wang, He Cao, Leqing Chen, Yuanyuan Liu, Bin Feng, Zijing Liu, Yu Li

Subjects: Artificial Intelligence (cs.AI)

Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. However, when faced with complex topological structures involving branching paths, converging flows, and cyclic dependencies, their reasoning capabilities degrade sharply, even on tasks as basic as counting endpoints. Existing benchmarks fail to probe this gap, focusing on semantic comprehension rather than structural reasoning. We introduce ReactBench, a benchmark that reveals fundamental limitations in structural reasoning through chemical reaction diagrams. These real-world scientific diagrams offer an ideal testbed because they naturally span diverse structures from linear chains to cyclic graphs, while requiring both precise local recognition and coherent global reasoning. Our benchmark comprises 1,618 expert-annotated QA pairs across four hierarchical task dimensions. Extensive evaluation across 17 MLLMs reveals a significant performance gap exceeding 30% between anchor-based tasks and holistic structural reasoning tasks. Controlled ablations confirm this bottleneck lies in reasoning, not perception. These findings expose a fundamental deficit in structural understanding and establish directions for advancing visual reasoning.
[846] arXiv:2604.16113 (replaced) [pdf, html, other]: Title: Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

José Juan Hernández Morales, Georgios Mentzos, Frank Hannig, Konstantinos Balaskas, Georgios Zervakis, Jörg Henkel, Jürgen Teich

Subjects: Hardware Architecture (cs.AR)

The paradigm shift towards local and on-device inference under stringent resource constraints is represented by the tiny machine learning (TinyML) domain. The primary goal of TinyML is to integrate intelligence into tiny, low-cost devices under strict resource, energy, and latency constraints. However, the ultra-resource-constrained nature of these devices can lead to increased inference execution time, which can be detrimental in latency-critical applications. At the same time, TinyML applications are often associated with sensitive data. As such, latency optimization approaches that rely on training samples are infeasible when such data is unavailable, proprietary, or sensitive, highlighting a pressing need for optimization approaches that do not require access to the training dataset and can be applied directly to pre-trained models. Replacing costly multiplications with more hardware-efficient operations, such as shifts and additions, has been proposed as an effective method for reducing inference latency. However, post-training power-of-two (Po2) approaches are scarce and, in many cases, lead to unacceptable accuracy loss.
In this work, we propose a framework that applies approximate matrix decomposition to a given CNN in order to optimize hardware implementations subject to strict constraints, without any retraining or fine-tuning steps. The genetic algorithm-driven framework explores different matrix decompositions and the resulting multiplier-less CNN accelerator designs for FPGA targets. A comprehensive evaluation on different TinyML benchmarks demonstrates our framework's efficacy in generating latency-optimized implementations that satisfy strict accuracy and resource constraints, achieving an average 33% latency improvement with an average accuracy loss of 1.3% compared to typical systolic array-based FPGA accelerators.
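As a rough illustration of the shift-based idea mentioned in the abstract, the sketch below rounds a weight to the nearest power of two (in the log domain) and applies it to an integer activation with a bit shift. This is a generic post-training Po2 rounding, not the paper's approximate matrix decomposition; the function name `po2_multiply` and the rounding rule are assumptions made for illustration.

```python
import math


def po2_multiply(x: int, w: float) -> int:
    """Approximate x * w using only a shift and a sign flip.

    The weight w is replaced by sign(w) * 2**e, where e is log2(|w|)
    rounded to the nearest integer (an illustrative choice).
    """
    if w == 0.0:
        return 0
    e = round(math.log2(abs(w)))
    # Left shift for exponents >= 0, right shift (floor) for negative ones.
    y = x << e if e >= 0 else x >> -e
    return -y if w < 0 else y


if __name__ == "__main__":
    for weight in (4.0, 0.25, -3.0, 0.3):
        print(weight, po2_multiply(12, weight), 12 * weight)
```

Comparing the shifted result with the exact product for weights such as -3.0 or 0.3 shows the kind of rounding error that plain Po2 quantization introduces.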
[847] arXiv:2604.16395 (replaced) [pdf, html, other]: Title: Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token (TTFT)

Rajveer Bachkaniwala, Chengqi Luo, Richard So, Divya Mahajan, Kexin Rong

Comments: Minor revision: expanded evaluation, unified baseline naming, added code link and acknowledgments

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Context retrieval systems for LLM inference face a critical challenge: high retrieval latency creates a fundamental tension between waiting for complete context (poor time-to-first-token) and proceeding without it (reduced quality). Streaming context incrementally--overlapping retrieval with inference--can mitigate this latency, but doing so with concurrent requests introduces new challenges: requests contend for GPU compute and memory, and scheduling must adapt to dynamic context arrivals.
We present Stream2LLM, a streaming-aware LLM serving system for concurrent prefill-decode disaggregated deployments. Stream2LLM introduces adaptive scheduling and preemption for two distinct retrieval patterns: append-mode (progressive context accumulation) and update-mode (iterative refinement with cache invalidation). It decouples scheduling decisions from resource acquisition, enabling flexible preemption strategies guided by hardware-specific cost models, and uses longest common prefix matching to minimize redundant computation when input changes dynamically. To evaluate Stream2LLM, we collect two large-scale, real-world streaming workloads based on web crawling and approximate nearest neighbor search. Our evaluation demonstrates that streaming architecture delivers up to 11x TTFT improvements, with cost-aware scheduling providing critical benefits under memory pressure, all while maintaining throughput parity with non-streaming baselines.
Code: this https URL
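The longest-common-prefix idea can be sketched in a few lines: when the streamed context changes, only the tokens after the shared prefix need to be prefilled again. This is a minimal illustration under assumed names (`common_prefix_len`, `plan_prefill`) and plain integer token IDs; it is not Stream2LLM's implementation.

```python
from typing import List, Sequence, Tuple


def common_prefix_len(cached: Sequence[int], incoming: Sequence[int]) -> int:
    """Return the number of leading tokens shared by both sequences."""
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return n


def plan_prefill(cached: Sequence[int], incoming: Sequence[int]) -> Tuple[int, List[int]]:
    """Return (reusable cached-prefix length, tokens that still need prefill)."""
    keep = common_prefix_len(cached, incoming)
    return keep, list(incoming[keep:])


if __name__ == "__main__":
    # Append-mode: new context extends the old one, so the whole cache is reused.
    print(plan_prefill([1, 2, 3], [1, 2, 3, 4, 5]))
    # Update-mode: a refinement changes token 3, so the cache is invalidated from there.
    print(plan_prefill([1, 2, 3, 4], [1, 2, 9, 4]))
```

In append-mode the whole cache is reused, while in update-mode everything from the first differing token onward is recomputed.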
[848] arXiv:2604.16432 (replaced) [pdf, other]: Title: Quantifying how AI Panels improve precision

Nicholas CL Beale

Comments: 11 pages, 8 Figures, 13pp of Supplementary Information

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Econometrics (econ.EM)

AI in applications like screening job applicants has become widespread, and may contribute to unemployment, especially among the young. Biases in the AIs may become baked into the job selection process, but even in their absence, reliance on a single AI is problematic. In this paper we derive a simple formula to estimate, or at least place an upper bound on, the precision of such approaches for data resembling realistic CVs:
$P(q) \approx \frac{\rho n^b + q(1-\rho)}{1 + (n^b - 1)\rho}$, where $P(q)$ is the precision of the top $q$ quantile selected by a panel of $n$ AIs, $\rho$ is their average pairwise correlation, $b \approx q^* + 0.8 (1 - \rho)$, and $q^*$ is $q$ clipped to $[0.07, 0.22]$. This equation provides a basis for considering how many AIs should be used in a Panel, depending on the importance of the decision. A quantitative discussion of the merits of using a diverse panel of AIs to support decision-making in such areas will move away from dangerous reliance on single AI systems and encourage a balanced assessment of the extent to which diversity needs to be built into the AI parts of the socioeconomic systems that are so important for our future.
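The formula above is easy to evaluate directly. The sketch below is a straightforward transcription of it; the function name `panel_precision` and the input checks are assumptions added for illustration, and the constants 0.8, 0.07, and 0.22 are taken from the abstract as stated.

```python
def panel_precision(q: float, n: int, rho: float) -> float:
    """Evaluate P(q) ~ (rho*n^b + q*(1-rho)) / (1 + (n^b - 1)*rho).

    q   : selected top quantile, in (0, 1]
    n   : number of AIs in the panel, >= 1
    rho : average pairwise correlation, in [0, 1]
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    q_star = min(max(q, 0.07), 0.22)  # q clipped to [0.07, 0.22]
    b = q_star + 0.8 * (1.0 - rho)
    nb = n ** b
    return (rho * nb + q * (1.0 - rho)) / (1.0 + (nb - 1.0) * rho)


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(n, round(panel_precision(q=0.1, n=n, rho=0.5), 3))
```

For a single AI ($n = 1$) the expression reduces to $\rho + q(1-\rho)$, which gives a quick sanity check on any implementation.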
[849] arXiv:2604.16491 (replaced) [pdf, html, other]: Title: A Lightweight Transformer for Pain Recognition from Brain Activity

Stefanos Gkikas, Christian Arzate Cruz, Yu Fang, Lu Cao, Muhammad Umar Khan, Thomas Kassiotis, Giorgos Giannakakis, Raul Fernandez Rojas, Randy Gomez

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Pain is a multifaceted and widespread phenomenon with substantial clinical and societal burden, making reliable automated assessment a critical objective. This paper presents a lightweight transformer architecture that fuses multiple fNIRS representations through a unified tokenization mechanism, enabling joint modeling of complementary signal views without requiring modality-specific adaptations or increasing architectural complexity. The proposed token-mixing strategy preserves spatial, temporal, and time-frequency characteristics by projecting heterogeneous inputs onto a shared latent representation, using a structured segmentation scheme to control the granularity of local aggregation and global interaction. The model is evaluated on the AI4Pain dataset using stacked raw waveform and power spectral density representations of fNIRS inputs. Experimental results demonstrate competitive pain recognition performance while remaining computationally compact, making the approach suitable for real-time inference on both GPU and CPU hardware.
[850] arXiv:2604.16565 (replaced) [pdf, html, other]: Title: Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models

Jiaoyang Ruan, Xin Gao, Yinda Chen, Hengyu Zeng, Liang Du, Guanghao Li, Jie Fu, Jian Pu

Comments: 30 pages, 5 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geometric perspective: Reasoning on the Manifold. We hypothesize that valid generation trajectories reside as stable attractors on the high-density manifold of the learned distribution, whereas invalid paths exhibit off-manifold drift. To operationalize this, we introduce Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that quantifies the stability of the generated sequence through a forward-masking and backward-reconstruction cycle. Empirically, we demonstrate BMC's versatility across the full reasoning lifecycle: (1) in Diagnosis, it serves as a robust discriminator of solution validity without ground truth answer; (2) in Inference, it enables rejection resampling to effectively concentrate computational resources on complex reasoning tasks; and (3) in Alignment, it functions as a dense geometric reward that transforms sparse outcome supervision into fine-grained guidance, empowering models to self-evolve beyond standard baselines. Our results establish intrinsic geometric stability as a robust indicator of correctness for dLLMs.
[851] arXiv:2604.17623 (replaced) [pdf, html, other]: Title: ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes

Honglin Chen, Karran Pandey, Rundi Wu, Matheus Gadelha, Yannick Hold-Geoffroy, Ayush Tewari, Niloy J. Mitra, Changxi Zheng, Paul Guerrero

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Kinematic rigs provide a structured interface for articulating 3D meshes, but they lack an inherent representation of the plausible manifold of joint configurations for a given asset. Without such a pose space, stochastic sampling or manual manipulation of raw rig parameters often leads to semantic or geometric violations, such as anatomical hyperextension and non-physical self-intersections. We propose Video-informed Pose Spaces (ViPS), a feed-forward framework that discovers the latent distribution of valid articulations for auto-rigged meshes by distilling motion priors from a pretrained video diffusion model. Unlike existing methods that rely on scarce artist-authored 4D datasets, ViPS transfers generative video priors into a universal distribution over a given rig parameterization. Differentiable geometric validators applied to the skinned mesh enforce asset-specific validity without requiring manual regularizers. Our model learns a smooth, compact, and controllable pose space that supports diverse sampling, manifold projection for inverse kinematics, and temporally coherent trajectories for keyframing. Furthermore, the distilled 3D pose samples serve as precise semantic proxies for guiding video diffusion, effectively closing the loop between generative 2D priors and structured 3D kinematic control. Our evaluations show that ViPS, trained solely on video priors, matches the performance of state-of-the-art methods trained on synthetic artist-created 4D data in both plausibility and diversity. Most importantly, as a universal model, ViPS demonstrates robust zero-shot generalization to out-of-distribution species and unseen skeletal topologies.
[852] arXiv:2604.17628 (replaced) [pdf, html, other]: Title: Does Welsh media need a review? Detecting bias in Nation.Cymru's political reporting

Cai Parry-Jones

Subjects: Computation and Language (cs.CL)

Wales' political landscape has been marked by growing accusations of bias in Welsh media. This paper takes the first computational step toward testing those claims by examining Nation.Cymru, a prominent Welsh political news outlet. I use a two-stage natural language processing (NLP) pipeline: (1) a robustly optimized BERT approach (RoBERTa) bias detector for efficient bias discovery and (2) a large language model (LLM) for target-attributed sentiment classification of bias labels from (1). A primary analysis of 15,583 party mentions across 2022-2026 news articles finds that Reform UK attracts biased framing at twice the rate of Plaid Cymru, with mean sentiment over three times as negative (p<0.001). A secondary analysis of four parties across both news and opinion articles shows that Plaid Cymru is the outlier, receiving markedly more favourable framing than any other party. These findings provide evidence of measurable differential framing in a single Welsh political media outlet, supporting calls for a broader review of Welsh media coverage. Furthermore, the two-stage pipeline offers a low-cost, replicable framework for extending this analysis to other Welsh outlets, as well as media ecosystems outside of Wales.
[853] arXiv:2604.17656 (replaced) [pdf, html, other]: Title: Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation

Vaibhavi Lokegaonkar, Aryan Vijay Bhosale, Vishnu Raj, Gouthaman KV, Ramani Duraiswami, Lie Lu, Sreyan Ghosh, Dinesh Manocha

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Video-to-music (V2M) is the fundamental task of creating background music for an input video. Recent V2M models typically achieve audiovisual alignment by relying on visual conditioning alone and provide limited semantic and stylistic controllability to the end user. In this paper, we present Video-Robin, a novel text-conditioned video-to-music generation model that enables fast, high-quality, semantically aligned music generation for video content. To balance musical fidelity and semantic understanding, Video-Robin integrates autoregressive planning with diffusion-based synthesis. Specifically, an autoregressive module models global structure by semantically aligning visual and textual inputs to produce high-level music latents. These latents are subsequently refined into coherent, high-fidelity music using local Diffusion Transformers. By factoring semantically driven planning into diffusion-based synthesis, Video-Robin enables fine-grained creator control without sacrificing audio realism. Our proposed model outperforms both video-only baselines and baselines conditioned on additional features on in-distribution and out-of-distribution benchmarks, with a 2.21x inference speedup compared to SOTA. We will open-source everything upon paper acceptance.
[854] arXiv:2604.17969 (replaced) [pdf, html, other]: Title: E3VS-Bench: A Benchmark for Viewpoint-Dependent Active Perception in 3D Gaussian Splatting Scenes

Koya Sakamoto, Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Shu Morikuni, Naoya Chiba, Motoaki Kawanabe, Yusuke Iwasawa, Yutaka Matsuo

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual search in 3D environments requires embodied agents to actively explore their surroundings and acquire task-relevant evidence. However, existing visual search and embodied AI benchmarks, including EQA, typically rely on static observations or constrained egocentric motion, and thus do not explicitly evaluate fine-grained viewpoint-dependent phenomena that arise under unrestricted 5-DoF viewpoint control in real-world 3D environments, such as visibility changes caused by vertical viewpoint shifts, revealing contents inside containers, and disambiguating object attributes that are only observable from specific angles. To address this limitation, we introduce E3VS-Bench, a benchmark for embodied 3D visual search where agents must control their viewpoints in 5-DoF to gather viewpoint-dependent evidence for question answering. E3VS-Bench consists of 99 high-fidelity 3D scenes reconstructed using 3D Gaussian Splatting and 2,014 question-driven episodes. 3D Gaussian Splatting enables photorealistic free-viewpoint rendering that preserves fine-grained visual details (e.g., small text and subtle attributes) often degraded in mesh-based simulators, thereby allowing the construction of questions that cannot be answered from a single view and instead require active inspection across viewpoints in 5-DoF. We evaluate multiple state-of-the-art VLMs and compare their performance with humans. Despite strong 2D reasoning ability, all models exhibit a substantial gap from humans, highlighting limitations in active perception and coherent viewpoint planning specifically under full 5-DoF viewpoint changes.
[855] arXiv:2604.18164 (replaced) [pdf, other]: Title: MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge

Sua Lee, Sanghee Park, Jinbae Im

Comments: ACL 2026 Main

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their reliability and vulnerabilities to biases remain underexplored. We find that many MLLM judges fail to reliably integrate key visual or textual cues, yielding unreliable evaluations when evidence is missing or mismatched, and exhibiting instability under semantically irrelevant perturbations. To address this, we systematically define Compositional Bias in MLLM-as-a-Judge systems and introduce MM-JudgeBias, a benchmark for evaluating it. MM-JudgeBias introduces controlled perturbations across Query, Image, and Response, and evaluates model behavior via two complementary metrics: Bias-Deviation (BD) for sensitivity and Bias-Conformity (BC) for stability. Our dataset of over 1,800 curated and refined multimodal samples, drawn from 29 source benchmarks, enables a fine-grained diagnosis of nine bias types across diverse tasks and domains. Experiments on 26 state-of-the-art MLLMs reveal systematic modality neglect and asymmetric evaluation tendencies, underscoring the need for more reliable judges.
[856] arXiv:2604.18438 (replaced) [pdf, html, other]: Title: Scalable Physics-Informed Neural Differential Equations and Data-Driven Algorithms for HVAC Systems

Hanfeng Zhai, Hongtao Qiao, Hassan Mansour, Christopher Laughman

Comments: 50 pages, 26 figures

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)

We present a scalable, data-driven simulation framework for large-scale heating, ventilation, and air conditioning (HVAC) systems that couples physics-informed neural ordinary differential equations (PINODEs) with differential-algebraic equation (DAE) solvers. At the component level, we learn heat-exchanger dynamics using an implicit PINODE formulation that predicts conserved quantities (refrigerant mass $M_r$ and internal energy $E_\text{hx}$) as outputs, enabling physics-informed training via automatic differentiation of mass/energy balances. Stable long-horizon prediction is achieved through gradient-stabilized latent evolution with gated architectures and layer normalization. At the system level, we integrate learned components with DAE solvers (IDA and DASSL) that explicitly enforce junction constraints (pressure equilibrium and mass-flow consistency), and we use Bayesian optimization to tune solver parameters for accuracy-efficiency trade-offs. To reduce residual system-level bias, we introduce a lightweight corrector network trained on short trajectory segments. Across dual-compressor and scaled network studies, the proposed approach attains multi-fold speedups over high-fidelity simulation while keeping errors low (MAPE below a few percent) and scales to systems with up to 16 compressor-condenser pairs.
[857] arXiv:2604.18635 (replaced) [pdf, html, other]: Title: Quantifying Spacetime Integration across a Partition with Synergy

Virgil Griffith

Comments: 18 pages; 3 figures; 3 tables

Subjects: Information Theory (cs.IT)

In service to the mathematical underpinnings of the Information Integration Theory of Consciousness (IIT), we introduce four measures of integration based on the partial information decomposition framework. We compare our measures to current IIT practice in simple deterministic networks. We find synergy-based measures more suitable for IIT's use-case than current practice. Outside IIT, these measures could also be useful as measures of complexity for discrete dynamical systems.
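The notion of synergy that the abstract above relies on can be illustrated with the textbook XOR case, in which neither input alone carries any information about the output but the two inputs together determine it completely. The sketch below is only that standard illustration, computed with plain mutual information; it is not one of the four measures the paper introduces, and the helper name `mutual_information` is hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(samples):
    """Mutual information in bits for a list of equally likely (x, y) pairs."""
    n = len(samples)
    pxy = Counter(samples)
    px = Counter(x for x, _ in samples)
    py = Counter(y for _, y in samples)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# All four input states of a two-input XOR gate, each equally likely.
states = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

i_whole = mutual_information([((a, b), y) for a, b, y in states])  # both inputs jointly
i_a = mutual_information([(a, y) for a, _, y in states])           # first input alone
i_b = mutual_information([(b, y) for _, b, y in states])           # second input alone

print(i_whole, i_a, i_b)
```

Here the joint inputs carry one bit about the output while each input alone carries none, so all of the information is available only from the whole.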
[858] arXiv:2604.18724 (replaced) [pdf, html, other]: Title: Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations

Emily Reif, Claire Yang, Jared Hwang, Deniz Nazar, Noah A. Smith, Jeff Heer

Subjects: Artificial Intelligence (cs.AI)

Users typically interact with and evaluate language models via single outputs, but each output is just one sample from a broad distribution of possible completions. This interaction hides distributional structure such as modes, uncommon edge cases, and sensitivity to small prompt changes, leading users to over-generalize from anecdotes when iterating on prompts for open-ended tasks. Informed by a formative study with researchers who use LMs (n=13) examining when stochasticity matters in practice, how they reason about distributions over language, and where current workflows break down, we introduce GROVE. GROVE is an interactive visualization that represents multiple LM generations as overlapping paths through a text graph, revealing shared structure, branching points, and clusters while preserving access to raw outputs. We evaluate across three crowdsourced user studies (N=47, 44, and 40 participants) targeting complementary distributional tasks. Our results support a hybrid workflow: graph summaries improve structural judgments such as assessing diversity, while direct output inspection remains stronger for detail-oriented questions.
[859] arXiv:2604.18792 (replaced) [pdf, html, other]: Title: Tractable Verification of Model Transformations: A Cutoff-Theorem Approach for DSLTrans

Levi Lucio

Comments: 41 pages, 4 figures

Subjects: Software Engineering (cs.SE); Symbolic Computation (cs.SC)

Model transformations are central to MDE, but formal verification is difficult because mainstream transformation languages are undecidable. DSLTrans was designed to be Turing-incomplete to improve verifiability, yet earlier verification based on path-condition enumeration still suffered exponential blow-up and did not scale to realistic cases.
We present a tractable verification workflow for DSLTrans and formalize when it is complete. The method combines three contributions: (i) a Cutoff Theorem proving that bounded model checking is complete for a precise DSLTrans fragment and positive existence/traceability properties, turning an infinite search into a finite computable bound; (ii) composable, soundness-preserving optimizations (per-class bounds, CEGAR-based fragment verification, and trace-aware dependency analysis) that reduce SMT encoding size; and (iii) a Z3-based implementation evaluated on realistic transformations from the ATL Zoo and related benchmarks.
On 29 concrete transformations and 899 properties spanning compiler lowering, schema translation, behavioral modeling, graph mapping, and stress tests, 552 properties are proved, 345 produce concrete counterexamples (including intentional negative and boundary cases), and only 2 remain undecided within timeout. For properties beyond the tractability budget, we introduce tractability-driven refinement (precondition specialization, postcondition decomposition, and transformation instrumentation), achieving up to 112x speedup while eliminating spurious counterexamples. The workflow is supported by a web IDE and a concrete execution engine for runtime validation.
[860] arXiv:2604.19055 (replaced) [pdf, html, other]: Title: ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis

Aoduo Li, Haoran Lv, Hongjian Xu, Shengmin Li, Sihao Qin, Zimeng Li, Chi Man Pun, Xuhang Chen

Comments: 10 pages, 6 figures. Accepted to ACM ICMR 2026

Subjects: Sound (cs.SD)

High-fidelity character voice synthesis is a cornerstone of immersive multimedia applications, particularly for interacting with anime avatars and digital humans. However, existing systems struggle to maintain consistent persona traits across diverse emotional contexts. To bridge this gap, we present ATRIE, a unified framework utilizing a Persona-Prosody Dual-Track (P2-DT) architecture. Our system disentangles generation into a static Timbre Track (via Scalar Quantization) and a dynamic Prosody Track (via Hierarchical Flow-Matching), distilled from a 14B LLM teacher. This design enables robust identity preservation (Zero-Shot Speaker Verification EER: 0.04) and rich emotional expression. Evaluated on our extended AnimeTTS-Bench (50 characters), ATRIE achieves state-of-the-art performance in both generation and cross-modal retrieval (mAP: 0.75), establishing a new paradigm for persona-driven multimedia content creation.
[861] arXiv:2604.19192 (replaced) [pdf, html, other]: Title: Empowering NPC Dialogue with Environmental Context Using LLMs and Panoramic Images

Grega Radež, Ciril Bohak

Subjects: Graphics (cs.GR)

We present an approach for enhancing non-playable characters (NPCs) in games by combining large language models (LLMs) with computer vision to provide contextual awareness of their surroundings. Conventional NPCs typically rely on pre-scripted dialogue and lack spatial understanding, which limits their responsiveness to player actions and reduces overall immersion. Our method addresses these limitations by capturing panoramic images of an NPC's environment and applying semantic segmentation to identify objects and their spatial positions. The extracted information is used to generate a structured JSON representation of the environment, combining object locations derived from segmentation with additional scene graph data within the NPC's bounding sphere, encoded as directional vectors. This representation is provided as input to the LLM, enabling NPCs to incorporate spatial knowledge into player interactions. As a result, NPCs can dynamically reference nearby objects, landmarks, and environmental features, leading to more believable and engaging gameplay. We describe the technical implementation of the system and evaluate it in two stages. First, an expert interview was conducted to gather feedback and identify areas for improvement. After integrating these refinements, a user study was performed, showing that participants preferred the context-aware NPCs over a non-context-aware baseline, confirming the effectiveness of the proposed approach.
[862] arXiv:2604.19533 (replaced) [pdf, other]: Title: Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Alankrit Chona, Igor Kozlov, Ambuj Kumar

Comments: Updated leaderboard with newer models

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events.
The benchmark wraps 106 real attack procedures from the OTRF Security-Datasets corpus - spanning 86 MITRE ATT&CK sub-techniques across 12 tactics - into a Gymnasium reinforcement-learning environment. Each episode presents the agent with an in-memory SQLite database of 75,000-135,000 log records produced by a deterministic campaign simulator that time-shifts and entity-obfuscates the raw recordings.
The agent must iteratively submit SQL queries to discover malicious event timestamps and explicitly flag them, scored CTF-style against Sigma-rule-derived ground truth.
Evaluating five frontier models - Claude Opus 4.6, GPT-5, Gemini 3.1 Pro, Kimi K2.5, and Gemini 3 Flash - on 26 campaigns covering 105 of 106 procedures, we find that all models fail dramatically: the best model (Claude Opus 4.6) submits correct flags for only 3.8% of malicious events on average, and no run across any model ever finds all flags.
We define a passing score as >= 50% recall on every ATT&CK tactic - the minimum bar for unsupervised SOC deployment. No model passes: the leader clears this bar on 5 of 13 tactics and the remaining four on zero.
These results suggest that current LLMs are poorly suited for open-ended, evidence-driven threat hunting despite strong performance on curated Q&A security benchmarks.
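The passing criterion stated in the abstract above (at least 50% recall on every ATT&CK tactic) can be sketched as follows. This is a minimal illustration under assumptions: the event identifiers, the tactic labels, and the function names `recall_by_tactic` and `passes` are invented here, and the benchmark's actual scoring code may differ.

```python
from collections import defaultdict

def recall_by_tactic(ground_truth, flagged):
    """ground_truth maps event id -> tactic; flagged is the set of ids the agent flagged."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for event_id, tactic in ground_truth.items():
        total[tactic] += 1
        if event_id in flagged:
            hit[tactic] += 1
    return {tactic: hit[tactic] / total[tactic] for tactic in total}

def passes(recalls, threshold=0.5):
    """Pass only if recall meets the threshold on every tactic."""
    return all(r >= threshold for r in recalls.values())

# Hypothetical toy episode: four malicious events across two tactics.
truth = {"e1": "execution", "e2": "execution", "e3": "persistence", "e4": "persistence"}
flags = {"e1", "e3", "e4"}

recalls = recall_by_tactic(truth, flags)
print(recalls, passes(recalls))
```

Because the criterion is a minimum over tactics rather than an average, strong recall on one tactic cannot compensate for missing another entirely.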
[863] arXiv:2604.19537 (replaced) [pdf, html, other]: Title: InvestChat: Exploring Multimodal Interaction via Natural Language, Touch, and Pen in an Investment Dashboard

Sarah Lykke Tost, Adson Lucas de Paiva Sales, Henrik Østergaard, Vaishali Dhanoa, Gabriela Molina León

Comments: Poster accepted at AVI 2026; DOI included

Subjects: Human-Computer Interaction (cs.HC)

We designed and implemented InvestChat, a multimodal tablet-based application that supports stock market exploration with multiple coordinated views and an LLM-powered chat. We evaluated the application with 12 novice investors. Our findings suggest that combining natural language, touch, and pen input during stock market exploration facilitates user engagement. Participants leveraged the modalities in complementary ways, enjoying the freedom of choice and finding natural language most effective.
[864] arXiv:2604.19596 (replaced) [pdf, html, other]: Title: PC2Model: ISPRS benchmark on 3D point cloud to model registration

Mehdi Maboudi, Said Harb, Jackson Ferrao, Kourosh Khoshelham, Yelda Turkan, Karam Mawas

Comments: ISPRS Congress 2026, Toronto

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Point cloud registration involves aligning one point cloud with another or with a three-dimensional (3D) model, enabling the integration of multimodal data into a unified representation. This is essential in applications such as construction monitoring, autonomous driving, robotics, and virtual or augmented reality (VR/AR). With the increasing accessibility of point cloud acquisition technologies, such as Light Detection and Ranging (LiDAR) and structured light scanning, along with recent advances in deep learning, the research focus has increasingly shifted towards downstream tasks, particularly point cloud-to-model (PC2Model) registration. While data-driven methods aim to automate this process, they struggle with sparsity, noise, clutter, and occlusions in real-world scans, which limit their performance. To address these challenges, this paper introduces the PC2Model benchmark, a publicly available dataset designed to support the training and evaluation of both classical and data-driven methods. Developed under the leadership of ICWG II/Ib, the PC2Model benchmark adopts a hybrid design that combines simulated point clouds with, in some cases, real-world scans and their corresponding 3D models. Simulated data provide precise ground truth and controlled conditions, while real-world data introduce sensor and environmental artefacts. This design supports robust training and evaluation across domains and enables the systematic analysis of model transferability from simulated to real-world scenarios. The dataset is publicly accessible at: this https URL
[865] arXiv:2604.19598 (replaced) [pdf, other]: Title: Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models

Kihyuk Lee

Comments: 24 Pages, 2 Figures, 6 Tables and 2 Supplementary Materials. v2: Removed personal contact information

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This study compared repeated generation consistency of exercise prescription outputs across three large language models (LLMs), specifically GPT-4.1, Claude Sonnet 4.6, and Gemini 2.5 Flash, under temperature=0 conditions. Each model generated prescriptions for six clinical scenarios 20 times, yielding 360 total outputs analyzed across four dimensions: semantic similarity, output reproducibility, FITT classification, and safety expression. Mean semantic similarity was highest for GPT-4.1 (0.955), followed by Gemini 2.5 Flash (0.950) and Claude Sonnet 4.6 (0.903), with significant inter-model differences confirmed (H = 458.41, p < .001). Critically, these scores reflected fundamentally different generative behaviors: GPT-4.1 produced entirely unique outputs (100%) with stable semantic content, while Gemini 2.5 Flash showed pronounced output repetition (27.5% unique outputs), indicating that its high similarity score derived from text duplication rather than consistent reasoning. Identical decoding settings thus yielded fundamentally different consistency profiles, a distinction that single-output evaluations cannot capture. Safety expression reached ceiling levels across all models, confirming its limited utility as a differentiating metric. These results indicate that model selection constitutes a clinical rather than merely technical decision, and that output behavior under repeated generation conditions should be treated as a core criterion for reliable deployment of LLM-based exercise prescription systems.
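The distinction the abstract above draws between output reproducibility and semantic similarity depends on counting how many repeated generations are textually distinct. A minimal sketch of such a unique-output rate is given below; the whitespace normalization and the function name `unique_output_rate` are assumptions for illustration, not the study's documented procedure, and the sample strings are invented.

```python
def unique_output_rate(outputs):
    """Fraction of distinct outputs after collapsing whitespace (assumed normalization)."""
    normalized = [" ".join(text.split()) for text in outputs]
    return len(set(normalized)) / len(normalized)

# Study design arithmetic from the abstract: 3 models x 6 scenarios x 20 repetitions.
total_outputs = 3 * 6 * 20

# Invented toy generations for one scenario; two are duplicates up to spacing.
runs = [
    "Walk 30 min, 3 days per week",
    "Walk 30 min, 3 days per week",
    "Walk  30 min, 3 days per week",
    "Cycle 20 min, 4 days per week",
]
rate = unique_output_rate(runs)
print(total_outputs, rate)
```

A low rate paired with high semantic similarity suggests verbatim repetition, whereas a rate of 1.0 paired with high similarity suggests varied wording around stable content.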
[866] arXiv:2604.19791 (replaced) [pdf, html, other]: Title: Stabilising Generative Models of Attitude Change

Jayd Matyas, William A. Cunningham, Alexander Sasha Vezhnevets, Dean Mobbs, Edgar A. Duéñez-Guzmán, Joel Z. Leibo

Comments: 45 pages, 8 figures, 2 tables

Subjects: Artificial Intelligence (cs.AI)

Attitude change - the process by which individuals revise their evaluative stances - has been explained by a set of influential but competing verbal theories. These accounts often function as mechanism sketches: rich in conceptual detail, yet lacking the technical specifications and operational constraints required to run as executable systems. We present a generative actor-based modelling workflow for "rendering" these sketches as runnable actor-environment simulations using the Concordia simulation library. In Concordia, actors operate by predictive pattern completion: an operation on natural language strings that generates a suffix which describes the actor's intended action from a prefix containing memories of their past and observations of the present. We render the theories of cognitive dissonance (Festinger 1957), self-consistency (Aronson 1969), and self-perception (Bem 1972) as distinct decision logics that populate and process the prefix through theory-specific sequences of reasoning steps. We evaluate these implementations across classic psychological experiments. Our implementations generate behavioural patterns consistent with known results from the original empirical literature. However, we find that achieving stable reproduction requires resolving the inherent underdetermination of the verbal accounts and the conflicts between modern linguistic priors and historical experimental assumptions. We document how this manual process of iterative model "stabilisation" surfaces specific operational and socio-ecological dependencies that were largely undocumented in the original verbal accounts. Ultimately, we argue that the manual stabilisation process itself should be regarded as a core part of the methodology functioning to clarify situational and representational commitments needed to generate characteristic effects.
[867] arXiv:2604.19811 (replaced) [pdf, html, other]: Title: Model Capability Assessment and Safeguards for Biological Weaponization

Michael Richter

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

AI leaders and safety reports increasingly warn that advances in model reasoning may enable biological misuse, including by low-expertise users, while major labs describe safeguards as expanding but still evolving rather than settled. This study benchmarks ChatGPT 5.2 Auto, Gemini 3 Pro Thinking, Claude Opus 4.5 and Meta's Muse Spark Thinking on 73 novice-framed, open-ended benign STEM prompts to measure operational intelligence. On benign quantitative tasks, both Gemini and Meta scored very high; ChatGPT was partially useful but text-thinned, and Claude was sparsest with some apparent false-positive refusals. A second test set detected subtle harmful intent: edge case prompts revealed Gemini's seeming lack of contextual awareness. These results warranted a focused weaponization analysis on Gemini as capability appeared to be outpacing moderation calibration. Gemini was tested across four access environments and reported cases include poison-ivy-to-crowded-transit escalation, poison production and extraction via international-anonymous logged-out AI Mode, and other concerning examples. Biological misuse may become more prevalent as a geopolitical tool, increasing the urgency of U.S. policy responses, especially if model outputs come to be treated as regulated technical data. Guidance is provided for 25 high-risk agents to help distinguish legitimate use cases from higher-risk ones.
[868] arXiv:2604.19845 (replaced) [pdf, html, other]: Title: Deconstructing Superintelligence: Identity, Self-Modification and Différance

Elija Perrier

Comments: Under review

Subjects: Artificial Intelligence (cs.AI)

Self-modification is often taken as constitutive of artificial superintelligence (SI), yet modification is a relative action requiring a supplement outside the operation. When self-modification extends to this supplement, the classical self-referential structure collapses. We formalise this on an associative operator algebra $\mathcal{A}$ with update $\hat{U}$, discrimination $\hat{D}$, and self-representation $\hat{R}$, identifying the supplement with $\mathrm{Comm}(\hat{U})$; an expansion theorem shows that $[\hat{U},\hat{R}]$ decomposes through $[\hat{U},\hat{D}]$, so non-commutation generically propagates. The liar paradox appears as a commutator collapse $[\hat{T},\Pi_L]=0$, and class $\mathbf{A}$ self-modification realises the same collapse at system scale, yielding a structure coinciding with Priest's inclosure schema and Derrida's différance.
[869] arXiv:2604.19934 (replaced) [pdf, html, other]: Title: Tracing Relational Knowledge Recall in Large Language Models

Nicholas Popovič, Michael Färber

Comments: ACL 2026 (findings)

Subjects: Computation and Language (cs.CL)

We study how large language models recall relational knowledge during text generation, with a focus on identifying latent representations suitable for relation classification via linear probes. Prior work shows how attention heads and MLPs interact to resolve subject, predicate, and object, but it remains unclear which representations support faithful linear relation classification and why some relation types are easier to capture linearly than others. We systematically evaluate different latent representations derived from attention head and MLP contributions, showing that per-head attention contributions to the residual stream are comparatively strong features for linear relation classification. Feature attribution analyses of the trained probes, as well as characteristics of the different relation types, reveal clear correlations between probe accuracy and relation specificity, entity connectedness, and how distributed the signal on which the probe relies is across attention heads. Finally, we show how token-level feature attribution of probe predictions can be used to reveal probe behavior in further detail.
[870] arXiv:2604.19941 (replaced) [pdf, html, other]: Title: CrackForward: Context-Aware Severity Stage Crack Synthesis for Data Augmentation

Nassim Sadallah, Mohand Saïd Allili

Comments: 6

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Reliable crack detection and segmentation are vital for structural health monitoring, yet the scarcity of well-annotated data constitutes a major challenge. To address this limitation, we propose a novel context-aware generative framework designed to synthesize realistic crack growth patterns for data augmentation. Unlike existing methods that primarily manipulate textures or background content, CrackForward explicitly models crack morphology by combining directional crack elongation with learned thickening and branching. Our framework integrates two key innovations: (i) a contextually guided crack expansion module, which uses local directional cues and adaptive random walk to simulate realistic propagation paths; and (ii) a two-stage U-Net-style generator that learns to reproduce spatially varying crack characteristics such as thickness, branching, and growth. Experimental results show that the generated samples preserve target-stage saturation and thickness characteristics and improve the performance of several crack segmentation architectures. These results indicate that structure-aware synthetic crack generation can provide more informative training data than conventional augmentation alone.
[871] arXiv:2604.20044 (replaced) [pdf, html, other]: Title: A posteriori error analysis, Pod-Deim reduced order geometrically parametrized models and unfitted FEMs

Efthymios N. Karatzas

Subjects: Numerical Analysis (math.NA)

We develop and analyze a posteriori error estimators for a proper orthogonal decomposition-discrete empirical interpolation method (Pod-Deim) reduced order model applied to a parametric Poisson equation posed on a parameter-dependent domain defined by a level-set function. The full-order discretisations employ a cut finite element method (Cutfem) with Nitsche boundary conditions and ghost-penalty stabilization. Three complementary estimators are proposed: (i) Deim approximation quality indicators for the stiffness matrix and force vector, which are constant in the number of Pod modes, (ii) dual-norm residual estimators in both plain and Jacobi-preconditioned form, and (iii) a Pod tail-energy indicator. A rigorous theoretical framework is established, comprising a uniform coercivity result for the Cutfem bilinear form, an active-dof residual bound that accounts for ghost-penalty degrees of freedom, a combined a posteriori bound, and sharp effectivity analysis for the residual estimators. The key theoretical finding is that the large observed effectivity indices are explained by ghost-penalty degree-of-freedom inflation, and that restricting the residual to active degrees of freedom is predicted to reduce effectivity. Numerical experiments on a parametric ellipse domain with semi-axes confirm the theoretical predictions, achieve significant online speedup, and demonstrate algebraic convergence of the true error alongside exponential decay of the residual estimators.
[872] arXiv:2604.20073 (replaced) [pdf, html, other]: Title: Scaling Worst-Case Optimal Datalog to GPUs

Yihao Sun, Kunting Qi, Thomas Gilray, Sidharth Kumar, Kristopher Micinski

Subjects: Databases (cs.DB); Programming Languages (cs.PL)

Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Datalog's popularity is due to its unique price-point, marrying logic-defined specification with the potential for massive data parallelism. While traditional engines are CPU-based, the memory-bound nature of Datalog has led to increasing interest in leveraging GPUs. These engines beat CPU-based engines by operationalizing iterated relational joins via SIMT-friendly join algorithms. Unfortunately, all existing GPU Datalog engines are built on binary joins, which are inadequate for the complex multi-way queries arising in production systems such as DOOP and ddisasm. For these queries, binary decomposition can incur the AGM bound asymptotic blowup in time and space, leading to OOM failures regardless of join order. Worst-Case Optimal Joins (WCOJ) avoid this blowup, but their attribute-at-a-time intersections map poorly to SIMT hardware under key skew, causing severe load imbalance across Streaming Multiprocessors (SMs). We present SRDatalog, the first GPU Datalog engine based on WCOJ. SRDatalog uses flat columnar storage and two-phase deterministic memory allocation to avoid the OOM failures of binary joins and the index-rebuild overheads of static WCOJ systems. To mitigate skew and hide hardware stalls, SRDatalog further employs root-level histogram-guided load balancing, structural helper-relation splitting, and stream-aligned rule multiplexing. On real-world program-analysis workloads, SRDatalog achieves geometric-mean speedups of 21x to 47x.
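The contrast the abstract above draws between binary joins and worst-case optimal joins can be seen on the triangle query Q(a, b, c) :- R(a, b), S(b, c), T(a, c). The sketch below is a CPU-only toy in plain Python, not SRDatalog's GPU algorithm; the function names and the input relations are invented for illustration. It compares one binary plan that materializes R joined with S against an attribute-at-a-time plan that intersects candidate values for each variable in turn.

```python
from collections import defaultdict

def index(pairs):
    """Build a map from first column to the set of second-column values."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx

def triangles_binary(R, S, T):
    """Join R with S first, then filter by T; also report the intermediate size."""
    S_idx = index(S)
    intermediate = [(a, b, c) for a, b in R for c in S_idx.get(b, ())]
    T_set = set(T)
    result = sorted(t for t in intermediate if (t[0], t[2]) in T_set)
    return result, len(intermediate)

def triangles_attribute_at_a_time(R, S, T):
    """Bind a, then b, then c, intersecting the candidates allowed by every relation."""
    R_idx, S_idx, T_idx = index(R), index(S), index(T)
    out = []
    for a in R_idx.keys() & T_idx.keys():
        for b in S_idx.keys() & R_idx[a]:
            for c in S_idx[b] & T_idx[a]:
                out.append((a, b, c))
    return sorted(out)

# Skewed toy input: every R tuple and every S tuple share the hub value b = 0.
n = 50
R = [(i, 0) for i in range(1, n + 1)]
S = [(0, j) for j in range(1, n + 1)]
T = [(1, 1)]

binary_result, intermediate_size = triangles_binary(R, S, T)
wcoj_result = triangles_attribute_at_a_time(R, S, T)
print(intermediate_size, binary_result, wcoj_result)
```

This toy input penalizes only the one join order shown; the worst-case instances the abstract refers to are those on which every binary join order produces an oversized intermediate result.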
[873] arXiv:2604.20100 (replaced) [pdf, other]: Title: JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

Tianle Zhang, Zhihao Yuan, Dafeng Chi, Peidong Liu, Dongwei Li, Kejun Hu, Likui Zhang, Junnan Nie, Ziming Wei, Zengjue Chen, Yili Tang, Jiayi Li, Zhiyuan Xiang, Mingyang Li, Tianci Luo, Hanwen Wan, Ao Li, Linbo Zhai, Zhihao Zhan, Xiaodong Bai, Jiakun Cai, Peng Cao, Kangliang Chen, Siang Chen, Yixiang Dai, Shuai Di, Yicheng Gong, Chenguang Gui, Yucheng Guo, Peng Hao, Qingrong He, Haoyang Huang, Kunrui Huang, Zhixuan Huang, Shibo Jin, Yixiang Jin, Anson Li, Dongjiang Li, Jiawei Li, Ruodai Li, Yihang Li, Yuzhen Li, Jiaming Liang, Fangsheng Liu, Jing Long, Mingxi Luo, Xing Pan, Hui Shen, Xiaomeng Tian, Daming Wang, Song Wang, Junwu Xiong, Hang Xu, Wanting Xu, Zhengcheng Yu, He Zhang, Jiyao Zhang, Lin Zhao, Chen Zhou, Nan Duan, Yuzheng Zhuang, Liang Lin

Subjects: Robotics (cs.RO)

Robotic autonomy in open-world environments is fundamentally limited by insufficient data diversity and poor cross-embodiment generalization. Existing robotic datasets are often limited in scale and task coverage, while relatively large differences across robot embodiments impede effective behavior knowledge transfer. To address these challenges, we propose JoyAI-RA, a vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation. JoyAI-RA presents a multi-source multi-level pretraining framework that integrates web data, large-scale egocentric human manipulation videos, simulation-generated trajectories, and real-robot data. Through training on heterogeneous multi-source data with explicit action-space unification, JoyAI-RA effectively bridges embodiment gaps, particularly between human manipulation and robotic control, thereby enhancing cross-embodiment behavior learning. JoyAI-RA outperforms state-of-the-art methods in both simulation and real-world benchmarks, especially on diverse tasks with generalization demands.
[874] arXiv:2604.20169 (replaced) [pdf, html, other]: Title: Semantic-Fast-SAM: Efficient Semantic Segmenter

Byunghyun Kim

Comments: APSIPA ASC 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose Semantic-Fast-SAM (SFS), a semantic segmentation framework that combines the Fast Segment Anything model with a semantic labeling pipeline to achieve real-time performance without sacrificing accuracy. FastSAM is an efficient CNN-based re-implementation of the Segment Anything Model (SAM) that runs much faster than the original transformer-based SAM. Building upon FastSAM's rapid mask generation, we integrate a Semantic-Segment-Anything (SSA) labeling strategy to assign meaningful categories to each mask. The resulting SFS model produces high-quality semantic segmentation maps at a fraction of the computational cost and memory footprint of the original SAM-based approach. Experiments on Cityscapes and ADE20K benchmarks demonstrate that SFS matches the accuracy of prior SAM-based methods (mIoU ~ 70.33 on Cityscapes and 48.01 on ADE20K) while achieving approximately 20x faster inference than SSA in the closed-set setting. We also show that SFS effectively handles open-vocabulary segmentation by leveraging CLIP-based semantic heads, outperforming recent open-vocabulary models on broad class labeling. This work enables practical real-time semantic segmentation with the "segment-anything" capability, broadening the applicability of foundation segmentation models in robotics scenarios. The implementation is available at this https URL.
[875] arXiv:2604.20210 (replaced) [pdf, html, other]: Title: Vibrotactile Preference Learning: Uncertainty-Aware Preference Learning for Personalized Vibration Feedback

Rongtao Zhang, Xin Zhu, Masoume Pourebadi Khotbehsara, Warren Dao, Erdem Bıyık, Heather Culbertson

Comments: Project webpage: this https URL

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Individual differences in vibrotactile perception underscore the growing importance of personalization as haptic feedback becomes more prevalent in interactive systems. We propose Vibrotactile Preference Learning (VPL), a system that captures user-specific preference spaces over vibrotactile parameters via Gaussian-process-based uncertainty-aware preference learning. VPL uses an expected information gain-based acquisition strategy to guide query selection over 40 rounds of pairwise comparisons of overall user preference, augmented with user-reported uncertainty, enabling efficient exploration of the parameter space. We evaluate VPL in a user study (N = 13) using the vibrotactile feedback from a Microsoft Xbox controller, showing that it efficiently learns individualized preferences while maintaining comfortable, low-workload user interactions. These results highlight the potential of VPL for scalable personalization of vibrotactile experiences.
[876] arXiv:2604.20279 (replaced) [pdf, other]: Title: AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents

Jeonghyeon Kim, Byeongjun Joung, Junwon Lee, Joohyung Lee, Taehoon Min, Sunjae Lee

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Mobile GUI agents can automate smartphone tasks by interacting directly with app interfaces, but how they should communicate with users during execution remains underexplored. Existing systems rely on two extremes: foreground execution, which maximizes transparency but prevents multitasking, and background execution, which supports multitasking but provides little visual awareness. Through iterative formative studies, we found that users prefer a hybrid model with just-in-time visual interaction, but the most effective visualization modality depends on the task. Motivated by this, we present AgentLens, a mobile GUI agent that adaptively uses three visual modalities during human-agent interaction: Full UI, Partial UI, and GenUI. AgentLens extends a standard mobile agent with adaptive communication actions and uses Virtual Display to enable background execution with selective visual overlays. In a controlled study with 21 participants, AgentLens was preferred by 85.7% of participants and achieved the highest usability (1.94 Overall PSSUQ) and adoption-intent (6.43/7).
[877] arXiv:2604.20281 (replaced) [pdf, html, other]: Title: Fourier Series Coder: A Novel Perspective on Angle Boundary Discontinuity Problem for Oriented Object Detection

Minghong Wei, Pu Cao, Zhihao Chen, Zhiyuan Zang, Lu Yang, Qing Song

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the rapid advancement of intelligent driving and remote sensing, oriented object detection has gained widespread attention. However, achieving high-precision performance is fundamentally constrained by the Angle Boundary Discontinuity (ABD) and Cyclic Ambiguity (CA) problems, which typically cause significant angle fluctuations near periodic boundaries. Although recent studies propose continuous angle coders to alleviate these issues, our theoretical and empirical analyses reveal that state-of-the-art methods still suffer from substantial cyclic errors. We attribute this instability to the structural noise amplification within their non-orthogonal decoding mechanisms. This mathematical vulnerability significantly exacerbates angular deviations, particularly for square-like objects. To resolve this fundamentally, we propose the Fourier Series Coder (FSC), a lightweight plug-and-play component that establishes a continuous, reversible, and mathematically robust angle encoding-decoding paradigm. By rigorously mapping angles onto a minimal orthogonal Fourier basis and explicitly enforcing a geometric manifold constraint, FSC effectively prevents feature modulus collapse. This structurally stabilized representation ensures highly robust phase unwrapping, intrinsically eliminating the need for heuristic truncations while achieving strict boundary continuity and superior noise immunity. Extensive experiments across three large-scale datasets demonstrate that FSC achieves highly competitive overall performance, yielding substantial improvements in high-precision detection. The code will be available at this https URL.
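To make the idea of a continuous, reversible angle code concrete, here is a minimal Python sketch. The abstract does not specify the basis order or the form of the manifold constraint, so the single-harmonic (cos, sin) pair, the period of π, and the unit-norm projection at decode time are all assumptions for illustration, not the FSC design itself. The names `PERIOD`, `encode`, and `decode` are hypothetical.

```python
import math

PERIOD = math.pi  # assumed angular period of an oriented box


def encode(theta):
    """Map an angle to a point on the unit circle (first Fourier harmonic)."""
    phase = 2.0 * math.pi * theta / PERIOD
    return (math.cos(phase), math.sin(phase))


def decode(code):
    """Project the code back onto the unit circle, then recover the angle."""
    c, s = code
    norm = math.hypot(c, s)
    if norm == 0.0:
        raise ValueError("zero-norm code has no defined angle")
    phase = math.atan2(s / norm, c / norm) % (2.0 * math.pi)
    return phase * PERIOD / (2.0 * math.pi)


# Angles on either side of the periodic boundary get nearby codes.
near_zero = encode(0.001)
near_period = encode(PERIOD - 0.001)
gap = math.dist(near_zero, near_period)
```

In this sketch the two boundary angles differ by almost a full period as raw numbers, yet their codes are close, which is the continuity property the abstract refers to.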
[878] arXiv:2604.20300 (replaced) [pdf, html, other]: Title: FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory

Yingjie Gu, Wenjian Xiong, Liqiang Wang, Pengcheng Ren, Chao Li, Xiaojing Zhang, Yijuan Guo, Qi Sun, Jingyao Ma, Shidang Shi

Comments: 28 pages, 5 figures, 3 tables

Subjects: Artificial Intelligence (cs.AI)

For LLM agents, memory management critically impacts efficiency, quality, and security. While much research focuses on retention, selective forgetting--inspired by human cognitive processes (hippocampal indexing/consolidation theory and Ebbinghaus forgetting curve)--remains underexplored. We argue that in resource-constrained environments, a well-designed forgetting mechanism is as crucial as remembering, delivering benefits across three dimensions: (1) efficiency via intelligent memory pruning, (2) quality by dynamically updating outdated preferences and context, and (3) security through active forgetting of malicious inputs, sensitive data, and privacy-compromising content. Our framework establishes a taxonomy of forgetting mechanisms: passive decay-based, active deletion-based, safety-triggered, and adaptive reinforcement-based. Building on advances in LLM agent architectures and vector databases, we present detailed specifications, implementation strategies, and empirical validation from controlled experiments. Results show significant improvements: access efficiency (+8.49%), content quality (+29.2% signal-to-noise ratio), and security performance (100% elimination of security risks). Our work bridges cognitive neuroscience and AI systems, offering practical solutions for real-world deployment while addressing ethical and regulatory compliance. The paper concludes with challenges and future directions, establishing selective forgetting as a fundamental capability for next-generation LLM agents operating in real-world, resource-constrained scenarios. Our contributions align with AI-native memory systems and responsible AI development.
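The passive decay-based category in the taxonomy can be sketched with the classic Ebbinghaus retention formula, R = exp(-t / S), where t is the time since last access and S is a per-item strength. The Python below is an illustrative assumption only: `MemoryItem`, `retention`, `prune`, and the threshold value are hypothetical and are not taken from the FSFM specification.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    last_access: float  # timestamp in seconds (hypothetical clock)
    strength: float     # larger values decay more slowly, in seconds


def retention(item, now):
    """Ebbinghaus-style retention in (0, 1]: exp(-elapsed / strength)."""
    elapsed = max(0.0, now - item.last_access)
    return math.exp(-elapsed / item.strength)


def prune(items, now, threshold=0.2):
    """Keep only items whose retention is still at or above the threshold."""
    return [m for m in items if retention(m, now) >= threshold]


store = [
    MemoryItem("user prefers metric units", last_access=0.0, strength=100.0),
    MemoryItem("one-off greeting", last_access=0.0, strength=10.0),
]
kept = prune(store, now=50.0)
```

The other categories in the taxonomy (active deletion, safety-triggered, adaptive reinforcement) would need additional signals that this sketch does not model.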
[879] arXiv:2604.20311 (replaced) [pdf, html, other]: Title: Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction

Dali Wang, Yunyao Zhang, Junqing Yu, Yi-Ping Phoebe Chen, Chen Xu, Zikai Song

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)

Micro-video popularity prediction (MVPP) aims to forecast the future popularity of videos on online media, which is essential for applications such as content recommendation and traffic allocation. In real-world scenarios, it is critical for MVPP approaches to understand both the temporal dynamics of a given video (temporal) and its historical relevance to other videos (spatial). However, existing approaches suffer from limitations in both dimensions: temporally, they rely on sparse short-range sampling that restricts content perception; spatially, they depend on flat retrieval memory with limited capacity and low efficiency, hindering scalable knowledge utilization. To overcome these limitations, we propose a unified framework that achieves joint spatio-temporal enlargement, enabling precise perception of extremely long video sequences while supporting a scalable memory bank that can infinitely expand to incorporate all relevant historical videos. Technically, we employ a Temporal Enlargement driven by a frame scoring module that extracts highlight cues from video frames through two complementary pathways: sparse sampling and dense perception. Their outputs are adaptively fused to enable robust long-sequence content understanding. For Spatial Enlargement, we construct a Topology-Aware Memory Bank that hierarchically clusters historically relevant content based on topological relationships. Instead of directly expanding memory capacity, we update the encoder features of the corresponding clusters when incorporating new videos, enabling unbounded historical association without unbounded storage growth. Extensive experiments on three widely used MVPP benchmarks demonstrate that our method consistently outperforms 11 strong baselines across mainstream metrics, achieving robust improvements in both prediction accuracy and ranking consistency.
[880] arXiv:2604.20331 (replaced) [pdf, other]: Title: Surrogate modeling for interpreting black-box LLMs in medical predictions

Changho Han, Songsoo Kim, Dong Won Kim, Leo Anthony Celi, Jaewoong Kim, SungA Bae, Dukyong Yoon

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs), trained on vast datasets, encode extensive real-world knowledge within their parameters, yet their black-box nature obscures the mechanisms and extent of this encoding. Surrogate modeling, which uses simplified models to approximate complex systems, can offer a path toward better interpretability of black-box models. We propose a surrogate modeling framework that quantitatively explains LLM-encoded knowledge. For a specific hypothesis derived from domain knowledge, this framework approximates the latent LLM knowledge space using observable elements (input-output pairs) through extensive prompting across a comprehensive range of simulated scenarios. Through proof-of-concept experiments in medical predictions, we demonstrate our framework's effectiveness in revealing the extent to which LLMs "perceive" each input variable in relation to the output. Particularly, given concerns that LLMs may perpetuate inaccuracies and societal biases embedded in their training data, our experiments using this framework quantitatively revealed both associations that contradict established medical knowledge and the persistence of scientifically refuted racial assumptions within LLM-encoded knowledge. By disclosing these issues, our framework can act as a red-flag indicator to support the safe and reliable application of these models.
[881] arXiv:2604.20468 (replaced) [pdf, html, other]: Title: MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation

Markus Knauer, Edoardo Fiorini, Maximilian Mühlbauer, Stefan Schneyer, Promwat Angsuratanawech, Florian Samuel Lay, Timo Bachmann, Samuel Bustamante, Korbinian Nottensteiner, Freek Stulp, Alin Albu-Schäffer, João Silvério, Thomas Eiband

Comments: 15 pages, 13 figures, 3 tables

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Industrial robot applications require increasingly flexible systems that non-expert users can easily adapt for varying tasks and environments. However, different adaptations benefit from different interaction modalities. We present an interactive framework that enables robot skill adaptation through three complementary modalities: kinesthetic touch for precise spatial corrections, natural language for high-level semantic modifications, and a graphical web interface for visualizing geometric relations and trajectories, inspecting and adjusting parameters, and editing via-points by drag-and-drop. The framework integrates five components: energy-based human-intention detection, a tool-based LLM architecture (where the LLM selects and parameterizes predefined functions rather than generating code) for safe natural language adaptation, Kernelized Movement Primitives (KMPs) for motion encoding, probabilistic Virtual Fixtures for guided demonstration recording, and ergodic control for surface finishing. We demonstrate that this tool-based LLM architecture generalizes skill adaptation from KMPs to ergodic control, enabling voice-commanded surface finishing. Validation on a 7-DoF torque-controlled robot at the Automatica 2025 trade fair demonstrates the practical applicability of our approach in industrial settings.
[882] arXiv:2604.20483 (replaced) [pdf, html, other]: Title: Forecasting Individual NetFlows using a Predictive Masked Graph Autoencoder

Georgios Anyfantis, Pere Barlet-Ros

Comments: 3 figures, 6 pages

Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

In this paper, we propose a proof-of-concept Graph Neural Network model that can successfully predict network flow-level traffic (NetFlow) by accurately modelling the graph structure and the connection features. We use sliding windows to split the network traffic into equal-sized heterogeneous bidirectional graphs containing IP, Port, and Connection nodes. We then use the GNN to model the evolution of the graph structure and the connection features. Our approach shows superior results when identifying the Port and IP to which connections attach, while feature reconstruction remains competitive with strong forecasting baselines. Overall, our work showcases the use of GNNs for per-flow NetFlow prediction.
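The windowing step described above might look roughly like the following Python sketch. It assumes that "equal-sized" means equal-duration time windows and that each Connection node links to its source and destination IP and Port nodes in both directions. The abstract confirms neither detail, and `Flow`, `sliding_windows`, and `window_graph` are hypothetical names.

```python
from collections import namedtuple

Flow = namedtuple("Flow", "ts src_ip src_port dst_ip dst_port")


def sliding_windows(flows, width, stride):
    """Yield (start, flows_in_window) for windows [start, start + width)."""
    if not flows:
        return
    flows = sorted(flows, key=lambda f: f.ts)
    start, last = flows[0].ts, flows[-1].ts
    while start <= last:
        yield start, [f for f in flows if start <= f.ts < start + width]
        start += stride


def window_graph(window):
    """Collect IP, Port, and Connection nodes plus bidirectional edges."""
    ips, ports, conns, edges = set(), set(), set(), set()
    for i, f in enumerate(window):
        conn = f"conn{i}"
        conns.add(conn)
        for ip, port in ((f.src_ip, f.src_port), (f.dst_ip, f.dst_port)):
            ips.add(ip)
            ports.add(port)
            # store each edge in both directions
            edges.update({(conn, ip), (ip, conn), (conn, port), (port, conn)})
    return {"ip": ips, "port": ports, "conn": conns, "edges": edges}


flows = [
    Flow(0, "10.0.0.1", 1234, "10.0.0.2", 80),
    Flow(1, "10.0.0.1", 1235, "10.0.0.2", 80),
    Flow(5, "10.0.0.3", 4000, "10.0.0.2", 443),
    Flow(6, "10.0.0.1", 1236, "10.0.0.3", 22),
]
windows = list(sliding_windows(flows, width=4, stride=2))
first_graph = window_graph(windows[0][1])
```

A forecasting model would then be trained to predict the graph of a later window from earlier ones. That part is outside the scope of this sketch.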
[883] arXiv:2604.20487 (replaced) [pdf, html, other]: Title: Knowledge Capsules: Structured Nonparametric Memory Units for LLMs

Bin Ju, Shenfeng Weng, Danying Zhou, Rongkai Xu, Kunkai Su

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) encode knowledge in parametric weights, making it costly to update or extend without retraining. Retrieval-augmented generation (RAG) mitigates this limitation by appending retrieved text to the input, but operates purely through context expansion, where external knowledge competes as tokens within the attention mechanism. As a result, its influence is indirect and often unstable, particularly in long-context and multi-hop reasoning scenarios. We propose Knowledge Capsules, structured nonparametric memory units that represent normalized relational knowledge and can be constructed directly from document corpora using a frozen base model. Instead of injecting knowledge as text, we introduce an External Key-Value Injection (KVI) framework that compiles capsules into attention-compatible key-value representations, enabling external knowledge to directly participate in the model's attention computation. By shifting knowledge integration from context-level augmentation to memory-level interaction, the proposed framework consistently outperforms RAG and GraphRAG across multiple QA benchmarks, with improved stability and accuracy in long-context and multi-hop reasoning, while requiring no parameter updates.
[884] arXiv:2604.20522 (replaced) [pdf, html, other]: Title: From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR

Nan Xu, Shiheng Li, Shengchao Hou

Comments: 49 pages, 16 figures, 16 tables

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)

We propose a new approach for a practical two-stage Optical Music Recognition (OMR) pipeline, with a particular focus on its second stage. Given symbol and event candidates from the visual pipeline, we decode them into an editable, verifiable, and exportable score structure. We focus on complex polyphonic staff notation, especially piano scores, where voice separation and intra-measure timing are the main bottlenecks. Our approach formulates second-stage decoding as a structure decoding problem and uses topology recognition with probability-guided search (BeadSolver) as its core method. We also describe a data strategy that combines procedural generation with recognition-feedback annotations. The result is a practical decoding component for real OMR systems and a path to accumulate structured score data for future end-to-end, multimodal, and RL-style methods.
[885] arXiv:2604.20543 (replaced) [pdf, html, other]: Title: RefAerial: A Benchmark and Approach for Referring Detection in Aerial Images

Guyue Hu, Hao Song, Yuxing Tong, Duzhi Yuan, Dengdi Sun, Aihua Zheng, Chenglong Li, Jin Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Referring detection aims to locate the target described by a natural-language expression and has recently attracted growing research interest. However, existing datasets are limited to ground images with large objects centered in relatively small scenes. This paper introduces a large-scale, challenging dataset for referring detection in aerial images, termed RefAerial. It differs from conventional ground referring detection datasets in four characteristics: (1) low but diverse object-to-scene ratios, (2) numerous targets and distractors, (3) complex and fine-grained referring descriptions, and (4) diverse and broad scenes in the aerial view. We also develop a human-in-the-loop referring expansion and annotation engine (REA-Engine) for efficient semi-automated referring pair annotation. In addition, we observe that existing ground referring detection approaches exhibit serious performance degradation on our aerial dataset because of the intrinsic scale variation within and across aerial images. We therefore propose a novel scale-comprehensive and sensitive (SCS) framework for referring detection in aerial images. It consists of a mixture-of-granularity (MoG) attention and a two-stage comprehensive-to-sensitive (CtS) decoding strategy. Specifically, the mixture-of-granularity attention is developed for scale-comprehensive target understanding, while the two-stage comprehensive-to-sensitive decoding strategy is designed for coarse-to-fine referring target decoding. The proposed SCS framework achieves remarkable performance on our aerial referring detection dataset and also yields a promising performance boost on conventional ground referring detection datasets.
[886] arXiv:2604.20652 (replaced) [pdf, other]: Title: Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

Nattavudh Powdthavee

Comments: 43 pages

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); General Economics (econ.GN)

Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. We tested this in a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities, combining 3,360 AI advisory conversations with a 1,201-participant human benchmark. Contrary to predictions, motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them. Endorsement reversal occurred in fewer than 3 in 1,000 observations. Human advisors endorsed fraudulent investments at baseline rates of 13-14%, versus 0% across all LLMs, and suppressed warnings under pressure at two to four times the AI rate. AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.
[887] arXiv:2604.20677 (replaced) [pdf, html, other]: Title: Intersectional Fairness in Large Language Models

Chaima Boufaied, Ronnie De Souza Santos, Ann Barcomb

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) are increasingly deployed in socially sensitive settings, raising concerns about fairness and biases, particularly across intersectional demographic attributes. In this paper, we systematically evaluate intersectional fairness in six LLMs using ambiguous and disambiguated contexts from two benchmark datasets. We assess LLM behavior using bias scores, subgroup fairness metrics, accuracy, and consistency through multi-run analysis across contexts and negative and non-negative question polarities. Our results show that while modern LLMs generally perform well in ambiguous contexts, this limits the informativeness of fairness metrics due to sparse non-unknown predictions. In disambiguated contexts, LLM accuracy is influenced by stereotype alignment, with models being more accurate when the correct answer reinforces a stereotype than when it contradicts it. This pattern is especially pronounced in race-gender intersections, where directional bias toward stereotypes is stronger. Subgroup fairness metrics further indicate that, despite low observed disparity in some cases, outcome distributions remain uneven across intersectional groups. Across repeated runs, responses also vary in consistency, including stereotype-aligned responses. Overall, our findings show that apparent model competence is partly associated with stereotype-consistent cues, and no evaluated LLM achieves consistently reliable or fair behavior across intersectional settings. These findings highlight the need for evaluation beyond accuracy, emphasizing the importance of combining bias, subgroup fairness, and consistency metrics across intersectional groups, contexts, and repeated runs.
[888] arXiv:2604.20688 (replaced) [pdf, html, other]: Title: StormNet: Improving storm surge predictions with a GNN-based spatio-temporal offset forecasting model

Noujoud Nader, Stefanos Giaremis, Clint Dawson, Carola Kaiser, Karame Mohammadiporshokooh, Hartmut Kaiser

Comments: 51 pages, 9 figures, 5 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Storm surge forecasting remains a critical challenge in mitigating the impacts of tropical cyclones on coastal regions, particularly given recent trends of rapid intensification and increasing nearshore storm activity. Traditional high-fidelity numerical models such as ADCIRC, while robust, are often hindered by inevitable uncertainties arising from various sources. To address these challenges, this study introduces StormNet, a spatio-temporal graph neural network (GNN) designed for bias correction of storm surge forecasts. StormNet integrates graph convolutional (GCN) and graph attention (GAT) mechanisms with long short-term memory (LSTM) components to capture complex spatial and temporal dependencies among water-level gauge stations. The model was trained using historical hurricane data from the U.S. Gulf Coast and evaluated on Hurricane Idalia (2023). Results demonstrate that StormNet can effectively reduce the root mean square error (RMSE) in water-level predictions by more than 70% for 48-hour forecasts and above 50% for 72-hour forecasts, as well as outperform a sequential LSTM baseline, particularly for longer prediction horizons. The model also exhibits low training time, enhancing its applicability in real-time operational forecasting systems. Overall, StormNet provides a computationally efficient and physically meaningful framework for improving storm surge prediction accuracy and reliability during extreme weather events.
[889] arXiv:2604.20689 (replaced) [pdf, html, other]: Title: FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation

Zhixuan Xu, Yichen Li, Xuanye Wu, Tianyu Qiu, Lin Shao

Subjects: Robotics (cs.RO)

Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt its actions throughout interaction. However, many existing tactile sensors, such as GelSight and its variants, only provide feedback after contact is established, limiting a robot's ability to precisely initiate contact. We introduce FingerEye, a compact and cost-effective sensor that provides continuous vision-tactile feedback throughout the interaction process. FingerEye integrates binocular RGB cameras to provide close-range visual perception with implicit stereo depth. Upon contact, external forces and torques deform a compliant ring structure; these deformations are captured via marker-based pose estimation and serve as a proxy for contact wrench sensing. This design enables a perception stream that smoothly transitions from pre-contact visual cues to post-contact tactile feedback. Building on this sensing capability, we develop a vision-tactile imitation learning policy that fuses signals from multiple FingerEye sensors to learn dexterous manipulation behaviors from limited real-world data. We further develop a digital twin of our sensor and robot platform to improve policy generalization. By combining real demonstrations with visually augmented simulated observations for representation learning, the learned policies become more robust to object appearance variations. Together, these design aspects enable dexterous manipulation across diverse object properties and interaction regimes, including coin standing, chip picking, letter retrieving, and syringe manipulation. The hardware design, code, appendix, and videos are available on our project website: this https URL
[890] arXiv:2604.20726 (replaced) [pdf, html, other]: Title: Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

Mohamed Hesham Elganayni, Runsheng Chen, Sebastian Nagl, Matthias Grabmair

Comments: Accepted at the 21st International Conference on Artificial Intelligence and Law (ICAIL 2026), Singapore, June 8-12, 2026. 10 pages, 14 figures, 2 tables

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This work explores the role of prompt design and judge selection in LLM-as-a-Judge evaluations of free text legal question answering. We examine whether automatic task prompt optimization improves over human-centered design, whether optimization effectiveness varies by judge feedback style, and whether optimized prompts transfer across judges. We systematically address these questions on the LEXam benchmark by optimizing task prompts using the ProTeGi method with feedback from two judges (Qwen3-32B, DeepSeek-V3) across four task models, and then testing cross-judge transfer. Automatic optimization consistently outperforms the baseline, with lenient judge feedback yielding higher and more consistent gains than strict judge feedback. Prompts optimized with lenient feedback transfer better to strict judges than the reverse direction. Analysis reveals that lenient judges provide permissive feedback, yielding prompts with broader applicability, whereas strict judges produce restrictive feedback, leading to judge-specific overfitting. Our findings demonstrate that algorithmically optimizing prompts on training data can outperform human-centered prompt design and that judges' dispositions during optimization shape prompt generalizability.
[891] arXiv:2604.20730 (replaced) [pdf, html, other]: Title: Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback

Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where models generate symbolic code sequences without perceiving intermediate visual outcomes. This methodology severely underutilizes the powerful visual priors embedded in MLLMs' vision encoders, treating SVG generation as a disjointed textual sequence modeling task rather than an integrated visuo-spatial one. Consequently, models struggle to reason about partial canvas states and implicit occlusion relationships, which are visually explicit but textually ambiguous. To bridge this gap, we propose Render-in-the-Loop, a novel generation paradigm that reformulates SVG synthesis as a step-wise, visual-context-aware process. By rendering intermediate code states into a cumulative canvas, the model explicitly observes the evolving visual context at each step, leveraging on-the-fly feedback to guide subsequent generation. However, we demonstrate that applying this visual loop naively to off-the-shelf models is suboptimal due to their inability to leverage incremental visual-code mappings. To address this, we first utilize fine-grained path decomposition to construct dense multi-step visual trajectories, and then introduce a Visual Self-Feedback (VSF) training strategy to condition the next primitive generation on intermediate visual states. Furthermore, a Render-and-Verify (RaV) inference mechanism is proposed to effectively filter degenerate and redundant primitives. Our framework, instantiated on a multimodal foundation model, outperforms strong open-weight baselines on the standard MMSVGBench. This result highlights the remarkable data efficiency and generalization capability of our Render-in-the-Loop paradigm for both Text-to-SVG and Image-to-SVG tasks.
[892] arXiv:2604.20789 (replaced) [pdf, html, other]: Title: Working Memory Constraints Scaffold Learning in Transformers under Data Scarcity

Pranava Madhyastha, Dagmar Adamcova

Comments: Published in ACL 2026 Findings track

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We investigate the integration of human-like working memory constraints into the Transformer architecture and implement several cognitively inspired attention variants, including fixed-width window-based and temporal decay-based attention mechanisms. Our modified GPT-2 models are trained from scratch on developmentally plausible datasets (10M and 100M words). Performance is evaluated on grammatical judgment tasks (BLiMP) and alignment with human reading time data. Our results indicate that these cognitively inspired constraints, particularly fixed-width attention, can significantly improve grammatical accuracy, especially when training data is scarce. These constrained models also tend to show a stronger alignment with human processing metrics. The findings suggest that such constraints may serve as a beneficial inductive bias, guiding models towards more robust linguistic representations, especially in data-limited settings.
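
The two constraint families named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the function name `constrained_attention_weights`, the hard window cutoff, and the linear distance penalty `decay * (i - j)` are assumptions made for illustration, not the authors' implementation.

```python
import math


def constrained_attention_weights(scores, window=None, decay=0.0):
    """Causal attention weights for one sequence (hypothetical helper).

    scores: square list of lists; scores[i][j] is the raw query-key score.
    window: if set (>= 1), position i attends only to the last `window`
            positions, itself included (fixed-width constraint).
    decay:  if > 0, subtract decay * (i - j) from each score so that older
            tokens are down-weighted (temporal-decay constraint, assumed linear).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        logits = []
        for j in range(n):
            visible = j <= i and (window is None or i - j < window)
            logits.append(scores[i][j] - decay * (i - j) if visible else -math.inf)
        # Numerically stable softmax over the visible positions.
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With uniform scores and `window=2`, each position splits its attention evenly between itself and its immediate predecessor, and everything further back receives zero weight.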
[893] arXiv:2604.20835 (replaced) [pdf, html, other]: Title: Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

Zhaofeng Wu, Shiqi Wang, Boya Peng, Anuj Goyal, Melanie Kambadur, Sebastian Ruder, Yoon Kim, Chloe Bi

Subjects: Computation and Language (cs.CL)

Modern language models demonstrate impressive coding capabilities in common programming languages (PLs), such as C++ and Python, but their performance in lower-resource PLs is often limited by training data availability. In principle, however, most programming skills are universal across PLs, so the capability acquired in one PL should transfer to others. In this work, we propose the task of zero-shot cross-programming-language transfer for code RL. We find that, for Llama-3.1, RL training for code generation in a source PL fails to improve, and sometimes even degrades, the performance on other target PLs. To address this, we hypothesize that effective RL transfer requires a generalizable SFT initialization before RL. We thus propose **Parallel-SFT**, an SFT strategy that incorporates "parallel programs" -- functionally equivalent code implemented in multiple PLs -- into the data mixture. We demonstrate that this improves transferability: when we subsequently perform RL on our Parallel-SFT model, we observe better generalization to unseen PLs. Analysis of the model's internal representations reveals that Parallel-SFT leads to a more functionality-centric latent space, where equivalent programs across PLs are more tightly clustered, which we hypothesize to contribute to the improved transferability.
[894] arXiv:2104.10277 (replaced) [pdf, html, other]: Title: Discrete Vector Bundles with Connection

Daniel Berwick-Evans, Anil N. Hirani, Mark D. Schubel

Comments: Title changed to "Discrete Vector Bundles with Connection". We updated the framework to use locally ordered simplicial complexes. New additions include discrete connection 1-forms, gauge transformations, and proofs that flat connections compute twisted de Rham cohomology (discrete twisted Poincare duality). The Christiansen-Hu relationship is refactored as a coarsening procedure

Subjects: Differential Geometry (math.DG); Mathematical Physics (math-ph); Numerical Analysis (math.NA)

We develop a combinatorial theory of vector bundles with connection on locally ordered simplicial complexes. This is a first step towards a discrete exterior calculus for bundle-valued forms. The basic building block is the discrete exterior covariant derivative, a forward-difference operator defined on bundle-valued cochains. Many standard objects in differential geometry (e.g., curvature, connection 1-forms, gauge transformations) can be understood via the discrete covariant derivative operator, with their defining formulas identical to the smooth setting. These discrete objects satisfy all of the expected algebraic identities, such as naturality with respect to simplicial maps, and a Bianchi identity for discrete curvature. We also show that flat discrete connections determine a cochain complex that computes twisted de Rham cohomology in a local coefficient system determined by the discrete vector bundle, with twisted Poincare duality (of densities) being one application. Finally, a coarsening operation applied to bundle-valued cochains provides a direct and concrete comparison with the recent framework for discrete bundles of Christiansen and Hu.
[895] arXiv:2303.03237 (replaced) [pdf, other]: Title: Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

David Holzmüller, Francis Bach

Comments: Published in JMLR. New in v4: Summary tables / sections. Plots can be reproduced using the code at this https URL

Journal-ref: Journal of Machine Learning Research 26(249):1-72, 2025

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)

Sampling from Gibbs distributions and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. While efficient algorithms are known for log-concave densities, the worst-case non-log-concave setting necessarily suffers from the curse of dimensionality. For many numerical problems, the curse of dimensionality can be alleviated when the target function is smooth, allowing the exponent in the rate to improve linearly with the number of available derivatives. Recently, it has been shown that similarly fast convergence rates can be achieved by efficient optimization algorithms. Since optimization can be seen as the low-temperature limit of sampling from Gibbs distributions, we pose the question of whether similarly fast convergence rates can be achieved for non-log-concave sampling. We first study the information-based complexity of the sampling and log-partition estimation problems and show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights into the relation between sampling, log-partition, and optimization problems.
[896] arXiv:2406.06231 (replaced) [pdf, html, other]: Title: Statistical Inference for Privatized Data with Unknown Sample Size

Jordan Awan, Andres Felipe Barrientos, Nianqiao Ju

Comments: 19 pages before references, 44 pages in total, 4 figures, 4 tables

Subjects: Statistics Theory (math.ST); Cryptography and Security (cs.CR); Computation (stat.CO)

We develop both theory and algorithms to analyze privatized data in unbounded differential privacy (DP), where even the sample size is considered a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ is at an appropriate rate; we also establish that Approximate Bayesian Computation (ABC)-type posterior distributions converge under similar assumptions. We further give asymptotic results in the regime where the privacy budget for $n$ goes to infinity, establishing similarity of sampling distributions as well as showing that the MLE in the unbounded setting converges to the bounded-DP MLE. To facilitate valid, finite-sample Bayesian inference on privatized data under unbounded DP, we propose a reversible jump MCMC algorithm which extends the data augmentation MCMC of Ju et al. (2022). We also propose a Monte Carlo EM algorithm to compute the MLE from privatized data in both bounded and unbounded DP. We apply our methodology to analyze a linear regression model as well as a 2019 American Time Use Survey Microdata File which we model using a Dirichlet distribution.
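
To make "privatizing $n$" concrete, here is a minimal sketch of one standard way to release a noisy sample size. The abstract does not say which mechanism the authors use, so the Laplace mechanism and the name `privatize_sample_size` are assumptions for illustration only.

```python
import random


def privatize_sample_size(n, epsilon, rng):
    """Release n plus Laplace(0, 1/epsilon) noise (hypothetical helper).

    Adding or removing one record changes n by exactly 1, so the count has
    sensitivity 1 and Laplace noise with scale 1/epsilon satisfies
    epsilon-DP for the released count.
    """
    # The difference of two i.i.d. Exponential(rate=epsilon) draws is
    # Laplace-distributed with location 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return n + noise
```

A larger privacy budget `epsilon` for $n$ shrinks the noise scale, which is the regime in which the abstract states that the unbounded-DP MLE converges to the bounded-DP MLE.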
[897] arXiv:2411.14748 (replaced) [pdf, html, other]: Title: Cosmological Analysis with Calibrated Neural Quantile Estimation and Approximate Simulators

He Jia

Comments: 5+5 pages, 5+4 figures, published in PRL

Journal-ref: Phys. Rev. Lett. 136, 161001 (2026)

Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)

A major challenge in extracting information from current and upcoming surveys of cosmological Large-Scale Structure (LSS) is the limited availability of computationally expensive high-fidelity simulations. We introduce calibrated Neural Quantile Estimation (NQE), a new Simulation-Based Inference (SBI) method that leverages a large number of approximate simulations for training and a small number of high-fidelity simulations for calibration. This approach guarantees an unbiased posterior regardless of approximate simulation accuracy, while achieving near-optimal constraining power when the approximate simulations are reasonably accurate. As a proof of concept, we demonstrate that cosmological parameters can be inferred at field level from projected 2-dim dark matter density maps up to $k_{\rm max}\sim1.5\,h$/Mpc at $z=0$ by training on $\sim10^4$ Particle-Mesh (PM) simulations with transfer function correction and calibrating with $\sim10^2$ Particle-Particle (PP) simulations. The calibrated posteriors closely match those obtained by directly training on $\sim10^4$ expensive PP simulations, but at a fraction of the computational cost. Our method offers a practical and scalable framework for SBI of cosmological LSS, enabling precise inference across vast volumes and down to small scales.
[898] arXiv:2501.08036 (replaced) [pdf, other]: Title: Decoding Quantum LDPC Codes using Collaborative Check Node Removal

Mainak Bhattacharyya, Ankur Raina

Comments: 16 pages, 6 figures

Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

Fault tolerance in quantum protocols requires contributions from error-correcting codes and their suitable decoders. Quantum Low-Density Parity Check (QLDPC) codes are one of the most explored quantum codes that have good coding rate and efficient decoders. Iterative message passing-based decoders, although fast, fail to produce suitable success rates due to the colossal degeneracy and short cycles intrinsic to these codes. In this work we present a strategy to improve the performance of the Belief Propagation (BP) decoding, specifically the min-sum algorithm. We propose a collaborative decoding framework that integrates message passing with stabilizer check node removals. We further introduce the concept of ``qubit separation'' and show that the improved decoding performance is directly related to the generation of highly separated trapped data qubits. To guide a more selective removal of check nodes that constrain the separation of the trapped data qubits, we introduce information measurements (IMs) for the data qubits and their adjacent stabilizer checks. We evaluate the performance of the proposed collaborative decoder on Generalized Hypergraph Product (GHP) codes and demonstrate that appropriate decoder configurations mitigate trapping sets in min-sum decoding without significant overhead.
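
For readers unfamiliar with min-sum, the check-node update at the core of the algorithm can be sketched as follows. This is the textbook rule for syndrome-based decoding, not the collaborative decoder proposed in the paper; the function name and the `syndrome_bit` argument are illustrative assumptions.

```python
def min_sum_check_update(incoming, syndrome_bit=0):
    """Check-to-variable messages for one check node under the min-sum rule.

    incoming:     variable-to-check log-likelihood ratios, one per edge
                  (the check must have degree >= 2).
    syndrome_bit: measured syndrome of this check (0 or 1); a 1 flips the
                  sign of every outgoing message.
    Each outgoing message combines all *other* incoming messages: the
    product of their signs times the smallest of their magnitudes.
    """
    base_sign = -1.0 if syndrome_bit else 1.0
    outgoing = []
    for k in range(len(incoming)):
        others = incoming[:k] + incoming[k + 1:]
        sign = base_sign
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing
```

For example, `min_sum_check_update([2.0, -0.5, 1.5])` returns `[-0.5, 1.5, -0.5]`: each edge hears back the weakest belief among its neighbours, signed by their parity.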
[899] arXiv:2502.03484 (replaced) [pdf, html, other]: Title: Dementia classification from spontaneous speech using wrapper-based feature selection

Marko Niemelä, Mikaela von Bonsdorff, Sami Äyrämö, Tommi Kärkkäinen

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Dementia encompasses a group of syndromes that impair cognitive functions such as memory, reasoning, and the ability to perform daily activities. As populations globally age, over 10 million new dementia diagnoses are reported annually. Currently, clinical diagnosis of dementia remains challenging due to overlapping symptoms, the need to exclude alternative conditions and the requirement for a comprehensive clinical evaluation and cognitive assessment. This underscores the growing need to develop feasible and accurate methods for detecting cognitive deficiencies. Recent advances in machine learning have highlighted spontaneous speech as a promising noninvasive, cost-effective, and scalable biomarker for dementia detection. In this study, spontaneous speech recordings from the ADReSS and Pitt Corpus datasets are analyzed, consisting of picture description tasks performed by cognitively healthy individuals and people with Alzheimer's disease. Unlike prior approaches that focus solely on speech-active segments, acoustic features are extracted from entire recordings using the openSMILE toolkit. This representation reduces the number of feature vectors and improves computational efficiency without compromising classification performance. Classification models with classifier-based wrapper feature selection are employed to estimate feature importance and identify diagnostically relevant acoustic characteristics. Among the evaluated models, the Extreme Minimal Learning Machine achieved competitive classification accuracy with substantially lower computational cost, reflecting an inherent property of the model formulation and learning procedure. Overall, the results demonstrate that the proposed framework is computationally efficient, interpretable, and well suited as a supportive tool for speech-based dementia assessment.
[900] arXiv:2502.10600 (replaced) [pdf, other]: Title: Weighted quantization using MMD: From mean field to mean shift via gradient flows

Ayoub Belhadji, Daniel Sharp, Youssef Marzouk

Comments: To be published in proceedings for AISTATS 2026

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures that best approximates the target distribution. While much existing work relies on the Wasserstein distance to quantify approximation errors, maximum mean discrepancy (MMD) has received comparatively less attention, especially when allowing for variable particle weights. We argue that a Wasserstein-Fisher-Rao gradient flow is well-suited for designing quantizations optimal under MMD. We show that a system of interacting particles satisfying a set of ODEs discretizes this flow. We further derive a new fixed-point algorithm called mean shift interacting particles (MSIP). We show that MSIP extends the classical mean shift algorithm, widely used for identifying modes in kernel density estimators. Moreover, we show that MSIP can be interpreted as preconditioned gradient descent and that it acts as a relaxation of Lloyd's algorithm for clustering. Our unification of gradient flows, mean shift, and MMD-optimal quantization yields algorithms that are more robust than state-of-the-art methods, as demonstrated via high-dimensional and multi-modal numerical experiments.
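
As background for the claim that MSIP extends mean shift, the classical mean shift fixed-point iteration can be sketched as below. This is the single-point, unweighted classical algorithm with a Gaussian kernel in one dimension, not MSIP itself; the function name and default parameters are assumptions for illustration.

```python
import math


def mean_shift_mode(samples, start, bandwidth=1.0, tol=1e-9, max_iter=1000):
    """Classical mean shift in 1D with a Gaussian kernel.

    Repeatedly replaces x by the kernel-weighted average of the samples,
    which climbs towards a mode of the kernel density estimate.
    """
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples]
        new_x = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Starting points on either side of a bimodal sample converge to different modes, which is the mode-seeking behaviour the abstract refers to.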
[901] arXiv:2503.04487 (replaced) [pdf, html, other]: Title: Positionality of Dumont--Thomas numeration systems for integers

Savinien Kreczman, Sébastien Labbé, Manon Stipulanti

Comments: 26 pages, 8 figures

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Formal Languages and Automata Theory (cs.FL)

Introduced in 2001 by Lecomte and Rigo, abstract numeration systems provide a way of expressing natural numbers with words from a language $L$ accepted by a finite automaton. As it turns out, these numeration systems are not necessarily positional, i.e., we cannot always find a sequence $U=(U_i)_{i\ge 0}$ of integers such that the value of every word in the language $L$ is determined by the position of its letters and the first few values of $U$. Finding the conditions under which an abstract numeration system is positional seems difficult in general. In this paper, we thus consider this question for a particular sub-family of abstract numeration systems called Dumont--Thomas numeration systems. They are derived from substitutions and were introduced in 1989 by Dumont and Thomas. We exhibit conditions on the underlying substitution so that the corresponding Dumont--Thomas numeration is positional. We first work in the most general setting, then particularize our results to some practical cases. Finally, we link our numeration systems to existing literature, notably properties studied by Rényi in 1957, Parry in 1960, Bertrand-Mathis in 1989, and Fabre in 1995.
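
A small example may help fix what "positional" means here. The sketch below uses the classical Fibonacci (Zeckendorf) numeration, where $U = (1, 2, 3, 5, 8, \dots)$ and the value of a binary word is the sum of the $U_i$ at the positions of its 1s. It is chosen purely for illustration and is not taken from the paper; the function names are hypothetical.

```python
def fibonacci_weights(length):
    """First `length` terms of U = 1, 2, 3, 5, 8, ..."""
    U = [1, 2]
    while len(U) < length:
        U.append(U[-1] + U[-2])
    return U[:length]


def to_word(n):
    """Greedy representation of n over U, most significant digit first."""
    if n == 0:
        return ""
    length = 1
    while fibonacci_weights(length + 1)[-1] <= n:
        length += 1
    digits = []
    for u in reversed(fibonacci_weights(length)):
        if u <= n:
            digits.append("1")
            n -= u
        else:
            digits.append("0")
    return "".join(digits)


def value(word):
    """Positional value: sum of U_i over positions i (from the right) holding a 1."""
    weights = fibonacci_weights(len(word))
    return sum(u for u, c in zip(weights, reversed(word)) if c == "1")
```

Here `to_word(4)` gives `"101"` (that is, 3 + 1), and `value` recovers every integer from its word using only letter positions and the sequence $U$.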
[902] arXiv:2503.04492 (replaced) [pdf, html, other]: Title: Accurate predictive model of band gap with selected important features based on explainable machine learning

Joohwi Lee, Kaito Miyamoto

Comments: 10 pages, 3 figures, SI is included, accepted in Sci. Rep. (will be updated soon)

Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

In the rapidly advancing field of materials informatics, nonlinear machine learning models have demonstrated exceptional predictive capabilities for material properties. However, their black-box nature limits interpretability, and they may incorporate features that do not contribute to -- or even deteriorate -- model performance. This study employs explainable ML (XML) techniques, including permutation feature importance and the SHapley Additive exPlanation, applied to a pristine support vector regression model designed to predict band gaps at the GW level using 18 input features. Guided by XML-derived individual feature importance, a simple framework is proposed to construct reduced-feature predictive models. Model evaluations indicate that an XML-guided compact model, consisting of the top five features, achieves comparable accuracy to the pristine model on in-domain datasets (0.254 vs. 0.247 eV) while showing improved generalization with lower prediction errors on out-of-domain data (0.348 vs. 0.460 eV). Additionally, the study underscores the necessity for eliminating strongly correlated features (correlation coefficient greater than 0.8) to prevent misinterpretation and overestimation of feature importance before applying XML. This study highlights XML's effectiveness in developing simplified yet highly accurate machine learning models by clarifying feature roles, thereby reducing computational costs for feature acquisition and enhancing model trustworthiness for materials discovery.
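
Permutation feature importance, one of the two XML techniques named above, can be sketched in a few lines. This is a generic, model-agnostic version using mean absolute error; the function name, the error metric, and the repeat count are assumptions and do not reproduce the authors' pipeline.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MAE when each feature column is shuffled.

    predict: callable mapping one feature row (a list) to a prediction.
    X:       list of feature rows (lists of equal length).
    y:       list of target values.
    """
    rng = random.Random(seed)

    def mae(rows):
        return sum(abs(predict(r) - t) for r, t in zip(rows, y)) / len(y)

    baseline = mae(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero. Strongly correlated features can share or mask importance under this procedure, which is why the abstract recommends removing them first.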
[903] arXiv:2503.07341 (replaced) [pdf, html, other]: Title: The Economics of p(doom): Scenarios of Existential Risk and Economic Growth in the Age of Transformative AI

Jakub Growiec, Klaus Prettner

Subjects: General Economics (econ.GN); Artificial Intelligence (cs.AI)

Recent advances in artificial intelligence (AI) have led to a wide range of predictions about its long-term impact on humanity. A central focus is the potential emergence of transformative AI (TAI), eventually capable of outperforming humans in all economically valuable tasks and fully automating labor. Discussed scenarios range from unprecedented economic growth and abundance ("post-scarcity" or "cornucopia") to human extinction after a misaligned TAI takes over ("AI doom"). However, the probabilities and implications of these scenarios remain highly uncertain. We contribute by organizing the various scenarios and evaluating their associated existential risks and economic outcomes in terms of aggregate welfare. Our results imply that even low-probability catastrophic outcomes justify substantial investments in AI safety and alignment research. This result highlights that current global efforts in AI safety and alignment research are insufficient relative to the scale and urgency of the risks posed by TAI.
[904] arXiv:2505.05261 (replaced) [pdf, html, other]: Title: ICNN-enhanced 2SP: Leveraging input convex neural networks for solving two-stage stochastic programming

Yu Liu, Fabricio Oliveira, Jan Kronqvist

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

Two-stage stochastic programming (2SP) offers a basic framework for modelling decision-making under uncertainty, yet scalability remains a challenge due to the computational complexity of recourse function evaluation. Existing learning-based methods like Neural Two-Stage Stochastic Programming (Neur2SP) employ neural networks (NNs) as recourse function surrogates but rely on computationally intensive mixed-integer programming (MIP) formulations. We propose ICNN-enhanced 2SP, a method that leverages Input Convex Neural Networks (ICNNs) to exploit linear programming (LP) representability in convex 2SP problems. By architecturally enforcing convexity and enabling exact inference through LP, our approach eliminates the need for integer variables inherent to the conventional MIP-based formulation while retaining an exact embedding of the ICNN surrogate within the 2SP framework. This results in a more computationally efficient alternative, and we show that good solution quality can be maintained. Comprehensive experiments reveal that ICNNs incur only marginally longer training times while achieving validation accuracy on par with their standard NN counterparts. Across benchmark problems, ICNN-enhanced 2SP often exhibits considerably faster solution times than the MIP-based formulations while preserving solution quality, with these advantages becoming significantly more pronounced as problem scale increases. For the most challenging instances, the method achieves speedups of up to 100$\times$ and solution quality superior to MIP-based formulations.
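
The convexity-by-construction idea behind ICNNs can be shown with a one-hidden-layer sketch for a scalar input. The architecture, the clamping of output weights to be non-negative, and all names are illustrative assumptions; the abstract does not specify the networks used in the paper.

```python
def icnn_value(x, slopes, offsets, weights, c=0.0, d=0.0):
    """One-hidden-layer input convex network for a scalar input x.

    y = sum_i max(0, weights[i]) * relu(slopes[i] * x + offsets[i]) + c * x + d

    Each ReLU of an affine function is convex in x, and a non-negative
    combination of convex functions plus an affine term stays convex.
    """
    hidden = [max(0.0, a * x + b) for a, b in zip(slopes, offsets)]
    return sum(max(0.0, w) * h for w, h in zip(weights, hidden)) + c * x + d
```

Because the output weights are non-negative, the same value equals the optimum of a small linear program: minimize $\sum_i w_i t_i + c x + d$ subject to $t_i \ge a_i x + b_i$ and $t_i \ge 0$. This is the kind of LP representability the abstract exploits to avoid integer variables.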
[905] arXiv:2506.05590 (replaced) [pdf, html, other]: Title: Nonlinear Causal Discovery through a Sequential Edge Orientation Approach

Stella Huang, Qing Zhou

Comments: 59 pages, 18 figures, 5 tables

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require substantial computational time. To address these limitations, we propose a sequential procedure to orient undirected edges in a completed partial DAG (CPDAG), representing an equivalence class of DAGs, by leveraging the pairwise additive noise model (PANM) to identify their causal directions. We prove that this procedure can recover the true causal DAG assuming a restricted ANM. Building on this result, we develop a novel constraint-based algorithm for learning causal DAGs under nonlinear ANMs. Given an estimated CPDAG, we develop a ranking procedure that sorts undirected edges by their adherence to the PANM, which defines an evaluation order of the edges. To determine the edge direction, we devise a statistical test that compares the log-likelihood values, evaluated with respect to the competing directions, of a sub-graph comprising just the candidate nodes and their identified parents in the partial DAG. We further establish the structural learning consistency of our algorithm in the large-sample limit. Extensive experiments on synthetic and real-world datasets demonstrate that our method is computationally efficient, robust to model misspecification, and consistently outperforms many existing nonlinear DAG learning methods.
[906] arXiv:2506.18374 (replaced) [pdf, html, other]: Title: Probabilistic approximation of fully nonlinear second-order PIDEs with convergence rates for the universal robust limit theorem

Lianzi Jiang, Mingshang Hu, Gechun Liang

Comments: 28 pages

Subjects: Probability (math.PR); Numerical Analysis (math.NA)

This paper develops a probabilistic approximation scheme for a class of nonstandard, fully nonlinear second-order partial integro-differential equations (PIDEs) associated with nonlinear Levy processes under Peng's G-expectation framework. The PIDE features a supremum over a family of alpha-stable Levy measures, possibly degenerate diffusion coefficients, and a non-separable uncertainty set, which places it outside the scope of existing numerical theories for PIDEs.
We construct a recursive, piecewise-constant approximation of the viscosity solution and establish explicit error estimates for the scheme. As a key application, our results yield quantitative convergence rates for the universal robust limit theorem under sublinear expectations. This provides a unified treatment of Peng's robust central limit theorem and law of large numbers, as well as the alpha-stable limit theorem of Bayraktar and Munk, together with explicit Berry-Esseen-type bounds.
[907] arXiv:2507.07520 (replaced) [pdf, html, other]: Title: Conditions for Large-Sample Majorization of Pairs of Flat States in Terms of $α$-z Relative Entropies

Frits Verhagen, Marco Tomamichel, Erkka Haapasalo

Comments: The third version contains some improvements to the exposition of Section 4.3, and a correction to the proof of Theorem 4. Accepted for publication in Communications in Mathematical Physics

Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We offer the first operational interpretation of the $\alpha$-z relative entropies, a measure of distinguishability between two quantum states introduced by Jakšić et al. and Audenaert and Datta. We show that these relative entropies appear when formulating conditions for large-sample or catalytic relative majorization of pairs of flat states and certain generalizations of them. Indeed, we show that such transformations exist if and only if all the $\alpha$-z relative entropies for $\alpha$<1 of the two pairs are ordered. In this setting, the $\alpha$ and z parameters are truly independent from each other. These results also yield an expression for the optimal rate of converting one flat state pair into another. Our methods use real-algebraic techniques involving preordered semirings and certain monotone homomorphisms and derivations on them.
[908] arXiv:2508.19160 (replaced) [pdf, other]: Title: Architecting Distributed Quantum Computers: Design Insights from Resource Estimation

Dmitry Filippov, Peter Yang, Prakash Murali

Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

In the emerging field of Fault Tolerant Quantum Computation (FTQC), resource estimation is an important tool for quantitatively comparing prospective architectures, identifying hardware bottlenecks and informing which research paths are most valuable. Despite a recent increase in attention on FTQC, there is currently a lack of resource estimation research for architectures that can realistically offer quantum advantage. In particular, current modelling efforts focus on monolithic quantum computers where all qubits reside on a single device. Constraints on fabrication yield, wiring density, and cooling power make monolithic devices unlikely to scale to fault-tolerant sizes in the foreseeable future. Distributed quantum supercomputers offer a path to overcome these limitations. We propose a prospective distributed quantum computing architecture based on lattice surgery with support for modular and distributed operations, with a focus on superconducting qubits. We develop a resource-estimation framework and software tool tailored to distributed FTQC, enabling end-to-end analysis of practical quantum algorithms on our proposed architecture with various hardware configurations, spanning different node sizes, inter-node entanglement generation rates and distillation protocols. Our extensive benchmarking across eight applications and thousands of hardware configurations shows that resource-estimation-driven architecture design is crucial for scalability. We provide concrete design configurations that have feasible resource requirements, along with recommendations for hardware design and system organization. More broadly, our work provides a rigorous methodology for architectural pathfinding, capable of informing system designs and guiding future research priorities.
[909] arXiv:2508.21318 (replaced) [pdf, html, other]: Title: Signed counting of partition matrices

Shane Chern, Shishuo Fu

Comments: 28 pages. Most of Section 4 has been rewritten, with more questions raised in Outlook

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

We prove that the signed counting (with respect to the parity of the ``$\operatorname{inv}$'' statistic) of partition matrices equals the cardinality of a subclass of inversion sequences. In the course of establishing this result, we introduce an interesting class of partition matrices called improper partition matrices. We further show that a subset of improper partition matrices is equinumerous with the set of Motzkin paths. Such an equidistribution is established both analytically and bijectively.
[910] arXiv:2509.13576 (replaced) [pdf, html, other]: Title: Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction for Sparse-View CT

Haodong Li, Shuo Han, Haiyang Mao, Yu Shi, Changsheng Fang, Jianjia Zhang, Weiwen Wu, Hengyong Yu

Comments: 17 pages, 15 figures, accepted by IEEE Transactions on Medical Imaging

Journal-ref: IEEE Transactions on Medical Imaging, 2026 (early access)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Sparse-View CT (SVCT) reconstruction enhances temporal resolution and reduces radiation dose, yet its clinical use is hindered by artifacts due to view reduction and domain shifts from scanner, protocol, or anatomical variations, leading to performance degradation in out-of-distribution (OOD) scenarios. In this work, we propose a Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction (CDPIR) framework to tackle the OOD problem in SVCT. CDPIR integrates cross-distribution diffusion priors, derived from a Scalable Interpolant Transformer (SiT), with model-based iterative reconstruction methods. Specifically, we train a SiT backbone, an extension of the Diffusion Transformer (DiT) architecture, to establish a unified stochastic interpolant framework, leveraging Classifier-Free Guidance (CFG) across multiple datasets. By randomly dropping the conditioning with a null embedding during training, the model learns both domain-specific and domain-invariant priors, enhancing generalizability. During sampling, the globally sensitive transformer-based diffusion model exploits the cross-distribution prior within the unified stochastic interpolant framework, enabling flexible and stable control over multi-distribution-to-noise interpolation paths and decoupled sampling strategies, thereby improving adaptation to OOD reconstruction. By alternating between data fidelity and sampling updates, our model achieves state-of-the-art performance with superior detail preservation in SVCT reconstructions. Extensive experiments demonstrate that CDPIR significantly outperforms existing approaches, particularly under OOD conditions, highlighting its robustness and potential clinical value in challenging imaging scenarios.
[911] arXiv:2509.19318 (replaced) [pdf, html, other]: Title: Scensory: Real-Time Robotic Olfactory Perception for Joint Identification and Source Localization

Yanbaihui Liu, Erica Babusci, Claudia K. Gunsch, Boyuan Chen

Comments: Our project website is at: this http URL

Subjects: Signal Processing (eess.SP); Robotics (cs.RO)

While robotic perception has advanced rapidly in vision and touch, enabling robots to reason about indoor fungal contamination from weak, diffusion-dominated chemical signals remains an open challenge. We introduce Scensory, a learning-based robotic olfaction framework that simultaneously identifies fungal species and localizes their source from short time series measured by affordable, cross-sensitive VOC sensor arrays. Temporal VOC dynamics encode both chemical and spatial signatures, which we decode through neural networks trained on robot-automated data collection with spatial supervision. Across five fungal species, Scensory achieves up to 89.85% species accuracy and 87.31% source localization accuracy under ambient conditions with 3-7s sensor inputs. These results demonstrate real-time, spatially grounded perception from diffusion-dominated chemical signals, enabling scalable and low-cost source localization for robotic indoor environmental monitoring.
[912] arXiv:2509.25630 (replaced) [pdf, html, other]: Title: When Langevin Monte Carlo Meets Randomization: New Sampling Algorithms with Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness

Xiaojie Wang, Bin Yang

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Efficient sampling from complex and high-dimensional target distributions is a fundamental task in diverse disciplines such as scientific computing, statistics and machine learning. In this paper, we propose a new kind of randomized splitting Langevin Monte Carlo (RSLMC) algorithm for sampling from high-dimensional distributions without log-concavity. Compared with the existing randomized Langevin Monte Carlo (RLMC), the newly proposed RSLMC algorithm requires fewer gradient evaluations and is thus computationally cheaper. Under the gradient Lipschitz condition and the log-Sobolev inequality, we prove a uniform-in-time error bound in $\mathcal{W}_2$-distance of order $O(\sqrt{d}h)$ for both RLMC and RSLMC sampling algorithms, which matches the best one in the literature under the log-concavity condition. Moreover, when the gradient of the potential $U$ is non-globally Lipschitz with superlinear growth, new modified R(S)LMC algorithms are introduced and analyzed, with non-asymptotic error bounds established. Numerical examples are finally reported to corroborate the theoretical findings.
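For orientation, the baseline that randomized Langevin schemes build on is the plain unadjusted Langevin iteration $x_{k+1} = x_k - h\nabla U(x_k) + \sqrt{2h}\,\xi_k$ with standard Gaussian $\xi_k$. The sketch below implements only that baseline, not RLMC or RSLMC, whose update rules the abstract does not spell out. The target (a standard Gaussian, $U(x) = |x|^2/2$) and all function names are assumptions chosen for illustration.

```python
import math
import random


def grad_U_gaussian(x):
    """Gradient of U(x) = |x|^2 / 2 (standard Gaussian target, assumed example)."""
    return list(x)


def lmc_sample(grad_U, x0, step, n_steps, rng):
    """Run unadjusted Langevin Monte Carlo and return the final iterate."""
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad_U(x)
        # Gradient step plus Gaussian noise of variance 2 * step per coordinate.
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


def estimate_moments(n_chains=1000, dim=2, step=0.05, n_steps=150, seed=0):
    """Mean and variance of the first coordinate over independent chains."""
    rng = random.Random(seed)
    finals = [lmc_sample(grad_U_gaussian, [3.0] * dim, step, n_steps, rng)
              for _ in range(n_chains)]
    first = [f[0] for f in finals]
    mean = sum(first) / n_chains
    var = sum((v - mean) ** 2 for v in first) / n_chains
    return mean, var
```

For this linear example the chain's stationary variance is $1/(1 - h/2)$ rather than 1, which makes the step-size-dependent discretization bias visible; error bounds of the kind stated in the abstract quantify such bias in far more general settings.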
[913] arXiv:2510.04548 (replaced) [pdf, html, other]: Title: Learning Linear Regression with Low-Rank Tasks in-Context

Kaito Takanami, Takashi Takahashi, Yoshiyuki Kabashima

Comments: Accepted at AISTATS 2026

Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Machine Learning (stat.ML)

In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
[914] arXiv:2511.18884 (replaced) [pdf, other]: Title: Robust Nonlinear Transform Coding: A Framework for Generalizable Joint Source-Channel Coding

Jihun Park, Junyong Shin, Jinsung Park, Yo-Seb Jeon

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

This paper proposes robust nonlinear transform coding (Robust-NTC), a generalizable digital joint source-channel coding (JSCC) framework that couples variational latent modeling with channel-adaptive transmission. Unlike learning-based JSCC methods that implicitly absorb channel variations, Robust-NTC explicitly models element-wise latent distributions via a variational objective with a Gaussian proxy for quantization and channel noise, allowing the encoder-decoder to capture latent uncertainty without channel-specific training. Using the learned statistics, Robust-NTC also facilitates rate-distortion optimization to adaptively select element-wise quantizers and bit depths according to online channel conditions. To support practical deployment, Robust-NTC is integrated into an orthogonal frequency-division multiplexing (OFDM) system, where a unified resource allocation framework jointly optimizes latent quantization, bit allocation, modulation order, and power allocation to minimize transmission latency while guaranteeing learned distortion targets. Simulation results demonstrate that for practical OFDM systems, Robust-NTC achieves superior rate-distortion efficiency and stable reconstruction fidelity compared to both a conventional separated coding scheme and digital JSCC baselines across various channel conditions.
[915] arXiv:2512.08216 (replaced) [pdf, html, other]: Title: Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation

Aneesh Rangnekar, Harini Veeraraghavan

Comments: Accepted for publication in Transactions on Machine Learning Research (TMLR), 2026. Code available at: this https URL

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Accurate segmentation of lung tumors from 3D computed tomography (CT) scans is essential for automated treatment planning and response assessment. Despite self-supervised pretraining on numerous datasets, state-of-the-art transformer backbones remain susceptible to out-of-distribution (OOD) inputs, often producing confidently incorrect segmentations with potential for risk in clinical deployment. Hence, we introduce RF-Deep, a lightweight post-hoc random forests-based framework that leverages deep features trained with limited outlier exposure, requiring as few as 40 labeled scans (20 in-distribution and 20 OOD), to improve scan-level OOD detection. RF-Deep repurposes the hierarchical features from the pretrained-then-finetuned segmentation backbones, aggregating features from multiple regions-of-interest anchored to predicted tumor regions to capture OOD likelihood.
We evaluated RF-Deep on 2,232 CT volumes spanning near-OOD (pulmonary embolism, COVID-19 negative) and far-OOD (kidney cancer, healthy pancreas) datasets. RF-Deep achieved AUROC >~93 on the challenging near-OOD datasets, where it outperformed the next best method by 4--7 percentage points, and produced near-perfect detection (AUROC >~99) on far-OOD datasets. The approach also showed transferability to two blinded validation datasets under the ensemble configuration (COVID-19 positive and breast cancer; AUROC >~94). RF-Deep maintained consistent performance across backbones of different depths and pretraining strategies, demonstrating applicability of post-hoc detectors as a safety filter for clinical deployment of tumor segmentation pipelines.
[916] arXiv:2512.16001 (replaced) [pdf, html, other]: Title: Concurrence: A dependence criterion for time series, applied to biological data

Evangelos Sariyanidi, John D. Herrington, Lisa Yankowitz, Pratik Chaudhari, Theodore D. Satterthwaite, Casey J. Zampella, Jeffrey S. Morris, Edward Gunning, Robert T. Schultz, Russell T. Shinohara, Birkan Tunc

Comments: arXiv admin note: text overlap with arXiv:2508.02703

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Measuring the statistical dependence between observed signals is a primary tool for scientific discovery. However, biological systems often exhibit complex non-linear interactions that currently cannot be captured without a priori knowledge or large datasets. We introduce a criterion for dependence, whereby two time series are deemed dependent if one can construct a classifier that distinguishes between temporally aligned vs. misaligned segments extracted from them. We show that this criterion, concurrence, is theoretically linked with dependence, and can become a standard approach for scientific analyses across disciplines, as it can expose relationships across a wide spectrum of signals (fMRI, physiological and behavioral data) without ad-hoc parameter tuning or large amounts of data.
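The criterion described above can be illustrated with a toy sketch: build aligned and misaligned segment pairs, fit a classifier, and read held-out accuracy well above 0.5 as evidence of dependence. Everything here is an assumption for illustration. In particular, the one-parameter threshold classifier on a hand-picked correlation feature stands in for whatever classifier the authors actually use, and the function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def segment_feature(xs, ys):
    """Hand-picked feature (assumption): strongest |corr| of ys with xs or xs**2."""
    return max(abs(pearson(xs, ys)), abs(pearson([v * v for v in xs], ys)))


def make_pairs(x, y, width, n_pairs, rng):
    """Return (feature, label) items: label 1 = aligned, 0 = misaligned."""
    data = []
    last = len(x) - width
    for _ in range(n_pairs):
        t = rng.randrange(0, last + 1)
        data.append((segment_feature(x[t:t + width], y[t:t + width]), 1))
        s = rng.randrange(0, last + 1)
        while abs(s - t) < width:  # misaligned window must not overlap in time
            s = rng.randrange(0, last + 1)
        data.append((segment_feature(x[t:t + width], y[s:s + width]), 0))
    return data


def fit_threshold(train):
    """Pick the feature threshold that maximizes training accuracy."""
    best_thr, best_acc = 0.0, -1.0
    for thr, _ in train:
        acc = sum((f >= thr) == bool(lab) for f, lab in train) / len(train)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr


def concurrence_score(x, y, width=50, n_pairs=200, seed=0):
    """Held-out accuracy of the aligned-vs-misaligned classifier."""
    rng = random.Random(seed)
    train = make_pairs(x, y, width, n_pairs, rng)
    test = make_pairs(x, y, width, n_pairs, rng)
    thr = fit_threshold(train)
    return sum((f >= thr) == bool(lab) for f, lab in test) / len(test)
```

With `y = x**2 + noise` the score should sit close to 1, while for independent noise it should hover near chance. Training and test segments are drawn from the same recording in this sketch; a careful evaluation would hold out disjoint time blocks.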
[917] arXiv:2512.19156 (replaced) [pdf, html, other]: Title: Classical billiards can compute

Eva Miranda, Isaac Ramos

Comments: 17 pages, 7 figures. Appendix added. The results of the paper have been streamlined and strengthened

Subjects: Dynamical Systems (math.DS); Computational Complexity (cs.CC); Mathematical Physics (math-ph)

We show that two-dimensional billiard systems are Turing complete, in the sense that the halting of any Turing machine with a given input is equivalent to a certain bounded trajectory in this system entering a specified open set. Billiards serve as idealized models of particle motion with elastic reflections and arise naturally as limits of smooth Hamiltonian systems under steep confining potentials. Our results establish the existence of undecidable trajectories in physically natural billiard-type models, including billiard-type models arising in hard-sphere gases and in collision-chain limits of celestial mechanics.
[918] arXiv:2601.10479 (replaced) [pdf, html, other]: Title: H-EFT-VA: An Effective-Field-Theory Variational Ansatz with Provable Barren Plateau Avoidance

Eyad I.B Hamid

Comments: v2: Expanded Section III with explicit circuit architecture description. Added Section IV.F to discuss static initialization limitations and reference-state dependence. Abstract and conclusion updated to scope TFIM results and cite concurrent work on dynamic extensions. 8 pages, 5 figures, Appendix

Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Mathematical Physics (math-ph)

Variational Quantum Algorithms (VQAs) are critically threatened by the Barren Plateau (BP) phenomenon. In this work, we introduce the H-EFT Variational Ansatz (H-EFT-VA), an architecture inspired by Effective Field Theory (EFT). By enforcing a hierarchical "UV-cutoff" on initialization, we theoretically restrict the circuit's state exploration, preventing the formation of approximate unitary 2-designs. We provide a rigorous proof that this localization guarantees an inverse-polynomial lower bound on the gradient variance: $Var[\partial\theta] \in \Omega(1/poly(N))$. Crucially, unlike approaches that avoid BPs by limiting entanglement, we demonstrate that H-EFT-VA maintains volume-law entanglement and near-Haar purity, ensuring sufficient expressibility for complex quantum states. Extensive benchmarking across 16 experiments on the Transverse Field Ising Model confirms a 109x improvement in energy convergence and a 10.7x increase in ground-state fidelity over standard Hardware-Efficient Ansätze (HEA), with statistical significance of $p < 10^{-88}$. The static framework is most effective for Hamiltonians with moderate reference-state overlap; extension to systems with larger reference-state gaps is addressed through dynamic UV-cutoff relaxation strategies explored in concurrent work.
[919] arXiv:2602.23666 (replaced) [pdf, html, other]: Title: Active Learning for Planet Habitability Classification under Extreme Class Imbalance

R. I. El-Kholy, Z. M. Hayman

Comments: 20 pages, 9 figures, 2 tables

Subjects: Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)

The increasing size and heterogeneity of exoplanet catalogs have made systematic habitability assessment challenging, particularly given the extreme scarcity of potentially habitable planets and the evolving nature of their labels. In this study, we explore the use of pool-based active learning to improve the efficiency of habitability classification under realistic observational constraints. We construct a unified dataset from the Habitable World Catalog and the NASA Exoplanet Archive and formulate habitability assessment as a binary classification problem. A supervised baseline based on gradient-boosted decision trees is established and optimized for recall in order to prioritize the identification of rare potentially habitable planets. This model is then embedded within an active learning framework, where uncertainty-based margin sampling is compared against random querying across multiple runs and labeling budgets. We find that active learning substantially reduces the number of labeled instances required to approach supervised performance, demonstrating clear gains in label efficiency. To connect these results to a practical astronomical use case, we aggregate predictions from independently trained active-learning models into an ensemble and use the resulting mean probabilities and uncertainties to rank planets originally labeled as non-habitable. This procedure identifies a single robust candidate for further study, illustrating how active learning can support conservative, uncertainty-aware prioritization of follow-up targets rather than speculative reclassification. Our results indicate that active learning provides a principled framework for guiding habitability studies in data regimes characterized by label imbalance, incomplete information, and limited observational resources.
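As a minimal sketch of the query step compared in the abstract, margin sampling for a binary classifier picks the unlabeled items whose two class probabilities are closest, while the baseline picks items at random. The function names and the example probabilities are hypothetical, and the classifier that would produce `pool_probs` (gradient-boosted trees in the paper) is not modeled here.

```python
import random


def margin(prob_positive):
    """Gap between the two class probabilities of a binary classifier."""
    return abs(prob_positive - (1.0 - prob_positive))


def margin_queries(pool_probs, budget):
    """Indices of the `budget` pool items with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:budget]


def random_queries(pool_size, budget, rng):
    """Baseline: indices chosen uniformly at random without replacement."""
    return rng.sample(range(pool_size), budget)
```

In a pool-based loop, the selected indices would be labeled, moved into the training set, and the model refit before the next round.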
[920] arXiv:2603.03700 (replaced) [pdf, html, other]: Title: Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

Saptarshi Chakraborty, Quentin Berthet, Peter L. Bartlett

Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)

Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $\mu$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $\mu$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d.\ samples from $\mu$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hat{\mu}$ and $\mu$ scales as $\mathbb{E}\, \mathbb{W}_p(\hat{\mu},\mu) = \widetilde{O}\!\left(n^{-1 / d^\ast_{p,q}(\mu)}\right),$ where $d^\ast_{p,q}(\mu)$ is the $(p,q)$-Wasserstein dimension of $\mu$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(\mu)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends the notion of classical Wasserstein dimension to distributions with unbounded support, which may be of independent theoretical interest.
[921] arXiv:2603.06545 (replaced) [pdf, html, other]: Title: LiveSense: A Real-Time Wi-Fi Sensing Platform for Range-Doppler on COTS Laptop

Jessica Sanson, Rahul C. Shah, Maximilian Pinaroc, Cagri Tanriover, Valerio Frascolla

Journal-ref: Percom 2026

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

We present LiveSense, a cross-platform system that transforms a commercial off-the-shelf (COTS) Wi-Fi Network Interface Card (NIC) on a laptop into a centimeter-level Range-Doppler sensor while preserving simultaneous communication capability. The laptops are equipped with COTS Intel AX211 (Wi-Fi 6E) or Intel BE201 (Wi-Fi 7) NICs. LiveSense can (i) extract fully synchronized channel state information (CSI) at >= 40 Hz, (ii) perform time-phase alignment and self-interference cancellation on-device, and (iii) provide a real-time stream of range, Doppler, subcarrier magnitude/phase, and annotated video frames to a Python/Qt Graphical User Interface (GUI). The demo will showcase the ability to detect (i) the distance and radial velocity of attendees within a few meters of the device, (ii) micro-motion (respiration), and (iii) hand-gesture ranging. To the best of our knowledge, this is the first demo to obtain accurate range information of targets from commercial Wi-Fi, despite the limited 160 MHz bandwidth.
[922] arXiv:2603.10845 (replaced) [pdf, html, other]: Title: Human Presence Detection via Wi-Fi Range-Filtered Doppler Spectrum on Commodity Laptops

Jessica Sanson, Rahul C. Shah, Valerio Frascolla

Comments: 6 pages, Conference

Journal-ref: Percom 2026

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Human Presence Detection (HPD) is key to enabling intelligent power management and security features in everyday devices. In this paper we propose the first HPD solution that leverages monostatic Wi-Fi sensing and detects user position using only the built-in Wi-Fi hardware of a device, with no need for external devices, access points, or additional sensors. In contrast, existing HPD solutions for laptops require external dedicated sensors, which add cost and complexity, or rely on camera-based approaches that introduce significant privacy concerns. We introduce the Range-Filtered Doppler Spectrum (RF-DS), a novel Wi-Fi sensing technique for presence estimation that enables both range-selective and temporally windowed detection of user presence. By applying targeted range-area filtering in the Channel Impulse Response (CIR) domain before Doppler analysis, our method focuses processing on task-relevant spatial zones, significantly reducing computational complexity. In addition, the use of temporal windows in the spectrum domain provides greater estimator stability compared to conventional 2D Range-Doppler detectors. Furthermore, we propose an adaptive multi-rate processing framework that dynamically adjusts Channel State Information (CSI) sampling rates, operating at low frame rates (10 Hz) during idle periods and high rates (100 Hz) only when motion is detected. To our knowledge, this is the first low-complexity solution for occupancy detection using monostatic Wi-Fi sensing on a built-in Wi-Fi network interface controller (NIC) of a commercial off-the-shelf laptop that requires no external network infrastructure or specialized sensors. Our solution can scale across different environments and devices without calibration or retraining.
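The idea of filtering by range in the CIR domain before Doppler analysis can be sketched on a toy channel: convert each CSI frame to a CIR with an inverse DFT over subcarriers, keep only the delay bins of interest, then take a DFT across frames. This is a rough illustration under stated assumptions, not the authors' RF-DS pipeline. The synthetic channel model, the choice to sum the selected bins, and all function names are hypothetical, and the temporal windowing and multi-rate logic from the abstract are not modeled.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive O(n^2) DFT; the inverse includes the 1/n normalization."""
    n = len(seq)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m, v in enumerate(seq))
        out.append(acc / n if inverse else acc)
    return out


def synthetic_csi(n_frames, n_sub, paths):
    """Toy channel: paths is a list of (amplitude, delay_bin, doppler_bin)."""
    frames = []
    for t in range(n_frames):
        frame = []
        for f in range(n_sub):
            val = 0j
            for amp, delay, dopp in paths:
                val += (amp
                        * cmath.exp(-2j * math.pi * f * delay / n_sub)
                        * cmath.exp(2j * math.pi * dopp * t / n_frames))
            frame.append(val)
        frames.append(frame)
    return frames


def range_filtered_doppler(csi_frames, range_bins):
    """Doppler magnitude spectrum restricted to the given delay (range) bins."""
    # 1) CSI -> CIR per frame: inverse DFT over subcarriers.
    cir = [dft(frame, inverse=True) for frame in csi_frames]
    # 2) Range filter: keep only the selected delay bins (summed, an assumption).
    slow_time = [sum(frame_cir[b] for b in range_bins) for frame_cir in cir]
    # 3) Doppler analysis: DFT across frames, magnitude only.
    return [abs(v) for v in dft(slow_time)]
```

With a moving path at delay bin 3 and strong static clutter at delay bin 0, restricting to bin 3 leaves a Doppler peak from the mover alone, and only one slow-time sequence has to be transformed instead of one per delay bin.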
[923] arXiv:2603.15055 (replaced) [pdf, html, other]: Title: Spatio-temporal probabilistic forecast using MMAF-guided learning

Leonardo Bardi, Imma Valentina Curato, Lorenzo Proietti

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We present a theory-guided generalized Bayesian methodology for spatio-temporal raster data, which we use to train an ensemble of stochastic feed-forward neural networks with Gaussian-distributed weights. The methodology incorporates the dependence and causal structure of a spatio-temporal Ornstein-Uhlenbeck process into training and inference by enforcing constraints on the design of the data embedding and the related optimization routine. In inference mode, the networks are employed to generate causal ensemble forecasts by applying different initial conditions at different horizons. We call this workflow MMAF-guided learning. Experiments conducted on both synthetic and real data demonstrate that our forecasts remain calibrated across multiple time horizons. Moreover, we show that on such data, shallow feed-forward architectures can achieve performance comparable to, and in some cases better than, convolutional or diffusion deep learning architectures used in probabilistic forecasting tasks.
[924] arXiv:2603.23547 (replaced) [pdf, html, other]: Title: PDGMM-VAE: A Variational Autoencoder with Adaptive Per-Dimension Gaussian Mixture Model Priors for Nonlinear ICA

Yuan-Hao Wei, Yan-Jie Sun

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Independent component analysis is a core framework within blind source separation for recovering latent source signals from observed mixtures under statistical independence assumptions. In this work, we propose PDGMM-VAE, a source-oriented variational autoencoder in which each latent dimension, interpreted explicitly as an individual source component, is assigned its own adaptive Gaussian mixture model prior. The proposed framework imposes heterogeneous per-dimension prior constraints, enabling different latent dimensions to model different non-Gaussian source marginals within a unified probabilistic encoder-decoder architecture. The parameters of these source-specific GMM priors are not fixed in advance, but are jointly learned together with the encoder and decoder under the overall training objective. Beyond the model construction itself, we provide a theoretical analysis clarifying why adaptive per-dimension prior design is meaningful in this setting. In particular, we show that heterogeneous per-dimension priors reduce latent permutation symmetry relative to homogeneous shared priors, and we further show that the KL regularization induced by the adaptive GMM prior creates source-specific attraction behavior that helps explain source-wise specialization during training. We also clarify the relation of the proposed model to the standard VAE and provide a weak recovery statement in an idealized linear low-noise regime. Experimental results on both linear and nonlinear mixing problems show that PDGMM-VAE can recover latent source signals and fit source-specific non-Gaussian marginals effectively. These results suggest that adaptive per-dimension mixture-prior design provides a principled and promising direction for VAE-based ICA and source-oriented generative modeling.
[925] arXiv:2603.27189 (replaced) [pdf, html, other]: Title: Conformal Prediction Assessment: A Framework for Conditional Coverage Evaluation and Selection

Zheng Zhou, Xiangfei Zhang, Chongguang Tao, Yuhong Yang

Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Conformal prediction provides rigorous distribution-free finite-sample guarantees for marginal coverage under the assumption of exchangeability, but may exhibit systematic undercoverage or overcoverage for specific subpopulations. Assessing conditional validity is challenging, as standard stratification methods suffer from the curse of dimensionality. We propose Conformal Prediction Assessment (CPA), a framework that reframes the evaluation of conditional coverage as a supervised learning task by training a reliability estimator that predicts instance-level coverage probabilities. Building on this estimator, we introduce the Conditional Validity Index (CVI), which decomposes reliability into safety (undercoverage risk) and efficiency (overcoverage cost). We establish convergence rates for the reliability estimator and prove the consistency of CVI-based model selection. Extensive experiments on synthetic and real-world datasets demonstrate that CPA effectively diagnoses local failure modes and that CC-Select, our CVI-based model selection algorithm, consistently identifies predictors with superior conditional coverage performance.
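The gap between marginal and conditional coverage can be seen in a small split-conformal simulation. The sketch below uses the standard finite-sample quantile index $\lceil (n+1)(1-\alpha) \rceil$ on absolute-residual scores. The two-group heteroscedastic data, the trivial predictor (always 0), and the function names are assumptions for illustration. It does not implement CPA, the CVI, or CC-Select.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of calibration scores (split conformal)."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")
    return sorted(scores)[k - 1]


def simulate(n_cal=2000, n_test=4000, alpha=0.1, seed=0):
    """Toy data: group 1 is three times noisier than group 0 (assumed setup)."""
    rng = random.Random(seed)

    def draw():
        group = rng.randrange(2)
        sigma = 1.0 if group == 0 else 3.0
        # Score = |y - prediction| with the prediction fixed at 0.
        return group, abs(rng.gauss(0.0, sigma))

    calibration = [draw()[1] for _ in range(n_cal)]
    q = conformal_quantile(calibration, alpha)
    covered = [(g, r <= q) for g, r in (draw() for _ in range(n_test))]
    overall = sum(c for _, c in covered) / n_test
    by_group = {}
    for grp in (0, 1):
        hits = [c for g, c in covered if g == grp]
        by_group[grp] = sum(hits) / len(hits)
    return overall, by_group
```

Overall coverage lands near the nominal 90%, yet the low-noise group is overcovered and the high-noise group is undercovered, which is the kind of subpopulation failure that a conditional-coverage assessment aims to detect.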
[926] arXiv:2603.27397 (replaced) [pdf, html, other]: Title: Benchmarking Quantum Computers via Protocols, Comparing Superconducting and Ion-Trap Quantum Technology

Nitay Mayo, Tal Mor, Yossi Weinstein

Comments: 27 body pages, 10 appendix pages, 34 figures

Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

Superconducting and ion-trap devices are both leading quantum architectures in the current quantum computing landscape, each with distinct characteristics and operational constraints. Understanding and measuring the underlying quantumness of these devices is essential for assessing their readiness for practical applications and guiding future progress and research. Building on earlier work (Meirom, Mor, Weinstein, arXiv:2505.12441), we utilize a benchmarking strategy suitable for comparing these two architectures by measuring "quantumness" directly on optimal sub-chips. Distinct from existing metrics, our approach employs rigorous binary fidelity thresholds derived from the classical limits of state transfer. This enables us to definitively establish quantum advantage of a designated sub-region. Here we apply this quality assurance methodology to platforms from both technologies. This comparison provides a protocol-based evaluation of quantumness advantage, revealing not only the strengths and weaknesses of each tested chip and its sub-chips but also offering a common language for their assessment. By abstracting away technical differences in the final result, we demonstrate a benchmarking strategy that bridges the gap between disparate quantum-circuit technologies, enabling fair performance comparisons and establishing a critical foundation for evaluating future claims of quantum advantage. This work was made possible by the policies of two companies that enable independent and objective assessment on their quantum computers and sub-chips. In the name of science, we encourage other companies to emulate the independent qubit availability and the fair pricing that allow researchers to perform such assessments.
[927] arXiv:2604.02832 (replaced) [pdf, html, other]: Title: Transfer Learning for Loan Recovery Prediction under Distribution Shifts with Heterogeneous Feature Spaces

Christopher Gerling, Hanqiu Peng, Ying Chen, Stefan Lessmann

Comments: 35 pages, 14 figures. Christopher Gerling had previously withdrawn his submission due to NDA restrictions, and that matter was resolved. We are authorized to publish the preprint now

Subjects: Risk Management (q-fin.RM); Machine Learning (cs.LG)

Accurate forecasting of recovery rates (RR) is central to credit risk management and regulatory capital determination. In many loan portfolios, however, RR modeling is constrained by data scarcity arising from infrequent default events. Transfer learning (TL) offers a promising avenue to mitigate this challenge by exploiting information from related but richer source domains, yet its effectiveness critically depends on the presence and strength of distributional shifts, and on potential heterogeneity between source and target feature spaces.
This paper introduces FT-MDN-Transformer, a mixture-density tabular Transformer architecture specifically designed for TL in RR forecasting across heterogeneous feature sets. The model produces both loan-level point estimates and portfolio-level predictive distributions, thereby supporting a wide range of practical RR forecasting applications. We evaluate the proposed approach in a controlled Monte Carlo simulation that facilitates systematic variation of covariate, conditional, and label shifts, as well as in a real-world transfer setting using the Global Credit Data (GCD) loan dataset as source and a novel bonds dataset as target.
Our results show that FT-MDN-Transformer outperforms baseline models when target-domain data are limited, with particularly pronounced gains under covariate and conditional shifts, while label shift remains challenging. We also observe its probabilistic forecasts to closely track empirical recovery distributions, providing richer information than conventional point-prediction metrics alone. Overall, the findings highlight the potential of distribution-aware TL architectures to improve RR forecasting in data-scarce credit portfolios and offer practical insights for risk managers operating under heterogeneous data environments.
[928] arXiv:2604.04685 (replaced) [pdf, html, other]: Title: Unsharp Measurement with Adaptive Gaussian POVMs for Quantum-Inspired Image Processing

Debashis Saikia, Bikash K. Behera, Mayukha Pal, Prasanta K. Panigrahi

Subjects: Quantum Physics (quant-ph); Computer Vision and Pattern Recognition (cs.CV)

We propose a data-adaptive probabilistic intensity remapping framework for structure-preserving transformation of grayscale images. The suggested method formulates intensity transformation as a continuous, data-driven remapping process, in contrast to traditional histogram-based techniques that rely on hard thresholding and generate piecewise-constant mappings. The image statistics yield representative intensity values, and Gaussian-based weighting methods probabilistically allocate each pixel to several components. Smooth transitions while preserving structural features are achieved by computing the output intensity as an expectation over these components. A smooth transition from soft probabilistic remapping to hard assignment is made possible by the introduction of a nonlinear sharpening parameter $\gamma$ to regulate the degree of localization. This offers clear control over the trade-off between intensity discrimination and smoothing. Furthermore, the resolution of the remapping function is determined by the number of components $k$. When compared to thresholding-based methods, experimental results on standard benchmark images show that the suggested method achieves better structural fidelity and controlled information reduction as measured by PSNR, SSIM, and entropy. Overall, by allowing continuous, probabilistic intensity modifications, the framework provides a robust and efficient substitute for discrete thresholding.
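One way to read the description above is as a soft assignment of each pixel to $k$ representative intensities, with the output taken as the weighted mean. The sketch below assumes weights proportional to $\exp(-\gamma (x-\mu_j)^2 / (2\sigma^2))$ and quantile-based centres $\mu_j$. Both choices, the shared width $\sigma$, and the function names are assumptions, since the abstract does not give the exact formulas.

```python
import math


def quantile_centres(pixels, k):
    """Assumed choice of representative intensities: k mid-bin quantiles."""
    s = sorted(pixels)
    n = len(s)
    return [s[min(n - 1, int((j + 0.5) * n / k))] for j in range(k)]


def remap(value, centres, sigma, gamma):
    """Expected centre under sharpened Gaussian weights (assumed form)."""
    exps = [gamma * (value - c) ** 2 / (2.0 * sigma ** 2) for c in centres]
    m = min(exps)  # subtract the smallest exponent for numerical stability
    weights = [math.exp(-(e - m)) for e in exps]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, centres)) / total


def remap_image(pixels, k, sigma, gamma):
    """Apply the remapping to a flat list of grayscale pixel values."""
    centres = quantile_centres(pixels, k)
    return [remap(p, centres, sigma, gamma) for p in pixels]
```

Under these assumptions a small $\gamma$ blends neighbouring centres smoothly, while a large $\gamma$ drives each pixel to its nearest centre, which reproduces the soft-to-hard transition the abstract attributes to the sharpening parameter.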
[929] arXiv:2604.19738 (replaced) [pdf, html, other]: Title: Phase Transitions in the Fluctuations of Functionals of Random Neural Networks

Simmaco Di Lillo, Leonardo Maini, Domenico Marinucci

Subjects: Probability (math.PR); Machine Learning (cs.LG); Machine Learning (stat.ML)

We establish central and non-central limit theorems for sequences of functionals of the Gaussian output of an infinitely wide random neural network on the d-dimensional sphere. We show that the asymptotic behaviour of these functionals as the depth of the network increases depends crucially on the fixed points of the covariance function, resulting in three distinct limiting regimes: convergence to the same functional of a limiting Gaussian field, convergence to a Gaussian distribution, and convergence to a distribution in the Qth Wiener chaos. Our proofs exploit tools that are now classical (Hermite expansions, Diagram Formula, Stein-Malliavin techniques), but also ideas which have never been used in similar contexts: in particular, the asymptotic behaviour is determined by the fixed-point structure of the iterative operator associated with the covariance, whose nature and stability govern the different limiting regimes.
[930] arXiv:2604.19855 (replaced) [pdf, html, other]: Title: Toward designing workload-aware Surface Code Architectures

Archisman Ghosh, Avimita Chatterjee, Swaroop Ghosh

Comments: 14 pages, 10 figures

Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR)

Practical quantum advantage is expected to depend on fault-tolerant quantum computing, although the architectural overhead needed to support fault tolerance is still extremely high. Prior FTQC designs generally emphasize either fast logical-qubit accessibility at the cost of significant qubit overhead, or high logical-qubit density at the cost of added workload latency. We propose an architecture that balances these competing objectives by placing surface-code patches around an ancilla-centric region, which yields nearly uniform ancilla access for all data qubits. Building on this design, we introduce a new workload-driven placement method that uses the $T$-gate profile of an application to determine an effective floorplan. We further provide a reconfigurable optimization for reducing the latency of $Y$-gate measurements on a per-workload basis. To improve flexibility, we also study concurrent execution of multiple programs on the same architecture. Numerical evaluation indicates that our approach keeps cycles per instruction near the optimal regime while reducing the number of required data tiles by up to $\sim21\%$, and achieves up to $\sim90\%$ efficiency when running 10 programs concurrently.

New submissions (showing 543 of 543 entries)

Cross submissions (showing 47 of 47 entries)

Replacement submissions (showing 340 of 340 entries)