Programme
Christian Barnard Hall, Coral Beach Hotel & Resort, Paphos
Panelists: Rafael C. Cardoso · Amit Chopra · Louise Dennis · Julio Cesar dos Reis · Michael Winikoff
Day 1 — Monday, May 25
| 08:45–09:00 | Welcome & Opening |
| 09:00–10:15 | Engineering Methodologies and Frameworks I |
| 09:00–09:20 | Reconciling Simulation and Distributed Deployment in a MAS Development Framework (Regular, Demo Session). In a world where agents are attracting renewed attention as a model for distributing artificial intelligence applications, the ability to test and simulate systems of agents becomes more important. However, agent-based simulation platforms and network-distributed multi-agent system deployment frameworks traditionally differ strongly in their approach to agent interaction and execution model, although each could benefit from featuring the capabilities of the other. Following in the tracks of other works that have attempted to bridge the gap for specific cases, this paper makes a first investigation into the requirements that a framework would need to satisfy in order to support both large-scale simulations and real-life deployments with the same agent code. We present a minimal set of principles upon which features can be developed, relating to interaction support, execution control, and environment modeling. We developed a set of features that allow compatibility between an ABM application and an existing MAS framework and a proof-of-concept implementation of a classic ABM application, showing how ABM features can be built into an agent deployment framework so that the same application can run in simulation and in real-life-deployment mode, validating the claim that the same framework could be used for both scenarios. |
| 09:20–09:40 | Evaluating the Benefits of Orpheus for Iterative and Incremental Development in MAS (Regular, Demo Session). Traditional agent-oriented programming languages hinder iterative and incremental development in multiagent systems (MAS) because message-centric interaction protocols intertwine coordination logic with internal agent behaviors, causing changes in interaction requirements to cascade across system components. Building on the Orpheus programming model, which uses declarative information protocols in the Blindingly Simple Protocol Language (BSPL) to decouple coordination from agent reasoning, this work analyzes how such abstractions support iterative and incremental development. By localizing change and mitigating the propagation of interaction updates, Orpheus enables more flexible, robust, and maintainable MAS implementations. Tool support that automatically generates protocol-aware interfaces further facilitates evolving communication requirements without pervasive code revisions. The evaluation demonstrates significant advantages for development agility. |
| 09:40–10:00 | Engineering Norm-Aware BDI Agents (Regular). Norms are an important abstraction for multi-agent systems (MAS) because they specify flexible constraints. As constraints, they help bring order to agent behaviour. However, being flexible, they can be violated, and thus do not over-constrain autonomous entities. In order to build MAS with cognitive agents that use norms we need to be able to engineer these agents to be norm-aware, i.e., able to take norms into account in their reasoning. Such agents need the ability to decide to: (1) knowingly violate a norm, whilst understanding the consequences; (2) conform with a norm, which may require adapting their plans, e.g. adding extra steps; and (3) change their choice of plan for a given goal, due to consequences of a norm. Existing work on norm-aware agents does not cover all these cases. Additionally, to support adoption, an approach to engineering norm-aware agents should be usable with existing agent-oriented programming languages. This paper defines an approach, including a step-by-step process, to engineering norm-aware BDI (Belief-Desire-Intention) agents that handle these three cases and that can be used with off-the-shelf BDI agent languages. |
| 10:00–10:15 | Towards a Model-Driven Continuous Assurance Framework for Autonomous Systems (Regular). Autonomous systems must sustain justified confidence in their correctness and safety throughout their operational lifecycle. Traditional assurance methods separate development-time assurance from runtime assurance, yielding fragmented arguments that cannot adapt to runtime changes or system updates. Towards addressing this, we propose a unified Continuous Assurance Framework integrating design-time, runtime, and evolution-time assurance within a traceable, model-driven workflow, and instantiate its design-time phase using two formal verification methods: RoboChart for functional correctness and PRISM for probabilistic risk analysis. We also propose a model-driven transformation pipeline, implemented as an Eclipse plugin, that automatically regenerates structured assurance arguments whenever formal specifications or their verification results change, ensuring traceability. We demonstrate our approach on a nuclear inspection robot scenario, and discuss alignment with regulator-endorsed best practices. |
| 10:15–11:00 | Coffee Break |
| 11:00–11:50 | Engineering Methodologies and Frameworks II |
| 11:00–11:15 | A Graphical Interface for Visualising and Debugging MASPY Agents (Regular, Demo Session). MASPY is a Python multi-agent system framework grounded in the BDI model, designed to lower the barrier to entry for developers building and experimenting with multi-agent systems. Although MASPY provides rich internal information about agent reasoning and system execution, its visualisation support is limited to a command-line interface, hindering the inspection and analysis of complex behaviours. To address this, we present MASPY-GUI, a graphical interface for visualising and debugging MASPY systems. MASPY-GUI receives execution data directly from the framework and presents it through four interactive views: a dashboard summarising system state and intentions, an agents view detailing beliefs, goals, and intention histories, an environment view visualising perceptions and their evolution over time, and a messages view showing inter-agent communication and supporting message exchange diagrams. Empirical results show that the interface scales well with increasing numbers of agents and intentions, and to a lesser extent with message volume. A user study also reports a high System Usability Scale score and positive qualitative feedback. |
| 11:15–11:30 | OptiMA: A Transaction-Based Framework with Schedule Optimization for Very Complex Multi-Agent Systems (Regular). In recent years, multi-agent system (MAS) research has increasingly focused on complex systems composed of collaborating agents based on large language models (LLMs) to accomplish sophisticated tasks. These systems rely on dynamic and heterogeneous sets of agents, concurrent access to multiple external tools, and cascading operations that span multiple agents. Existing work on the coordination of MAS mostly adopts semantic approaches for model design such as agent architectures, interaction protocols, and hierarchical structures. However, we argue that greater attention to the execution layer is required to coordinate modern MAS models. To address this gap, we propose OptiMA, a framework that encapsulates atomic agent actions as transactions and uses lock-based concurrency control to provide a high level of consistency and isolation. It gives system designers explicit control over the execution process by incorporating transaction templates into the design process. To reduce potential performance drawbacks of this approach, OptiMA employs a transaction scheduling mechanism. In this paper, we also present theoretical results on the transaction scheduling problem (TxnSP) and introduce a metaheuristic approach for schedule optimization, which is used within the OptiMA framework. Experimental results show that transaction scheduling can improve performance by up to 45% for the tested model configurations, indicating that high system consistency can be achieved while minimizing performance cost. |
| 11:30–11:50 | MAMS-Deploy: A Hypermedia-Driven Multi-Agent System for Autonomous Microservice Deployment (Regular, Demo Session). This paper presents a novel Multi-Agent MicroServices (MAMS) framework, centered on a formal deployment ontology. This ontology models the environment as a dynamic knowledge graph, separating deployment intent from runtime reality. We demonstrate how Belief-Desire-Intention (BDI) agents autonomously reason over this graph to assemble, deploy, and manage containerized systems. Agents explore the environment via a hypermedia-driven API following Hypermedia as the Engine of Application State (HATEOAS) principles, react to system-wide events using WebSub notifications, and coordinate tasks through direct messaging. This approach enables agents to translate high-level goals into low-level Docker API commands, providing a practical blueprint for bridging the gap between declarative agent reasoning and the imperative nature of modern cloud infrastructure. |
| 11:50–12:30 | Interaction & Coordination I |
| 11:50–12:05 | Integrating Agentic Web Standards in Multi-Agent Systems Using Hypermedia (Regular). The development of language agents is leading to the creation of new standards for multi-agent systems. Many such standards, including the Model Context Protocol (MCP), the Agent2Agent (A2A) Protocol, and the Universal Tool Calling Protocol (UTCP), enable agents to interact with Web-based systems, resulting in the creation of an Agentic Web. Hypermedia is a core feature of the Web that enables users to discover information at run time, and has been used to allow agents to interact with Web services and physical devices in open Web environments through hypermedia standards, such as the Web of Things (WoT) Thing Description. However, MCP and A2A do not natively define hypermedia controls, which introduces coupling between agents and their environments, limiting their independent evolution in open environments. Building on prior work about hypermedia multi-agent systems, we develop a uniform hypermedia interface applicable to different Agentic Web standards. This interface enables agents to discover and interact with entities of different types. This interface relies on signifiers, which are hypermedia descriptions of affordances. We provide automatic generation of signifiers for MCP tools, A2A agents, UTCP tools, and WoT Things. We propose a natural language-based selection method for such signifiers so that agents perceive only contextually relevant signifiers. We expose each signifier perceived by an agent as an MCP tool that performs the action described by the signifier. Finally, we demonstrate and evaluate these concepts in a robotic lab environment. |
| 12:05–12:15 | From Task Allocation to Risk Clearing: A Unifying Interface for Mixed Human-Agent Societies (Short). As humans, robots, and software agents increasingly share safety-critical environments, coordination must move from static task allocation to managing uncertain commitments. Existing frameworks fall short: they either assume rigid, static teams or learn opaque joint policies that are hard to adapt and difficult to integrate with human decision-makers. To overcome these limitations, we propose Risk-Aware Option Clearing (ROC), a unifying coordination mechanism in which agents expose options (temporally extended skills) paired with risk summaries that predict outcome distributions. A central clearinghouse then assigns tasks by optimizing risk-adjusted mission utility under deadlines and safety constraints. ROC is a family of mechanisms, ranging from deployments where the clearinghouse learns outcome models from data to ones that consume full distributional predictions from agents. By treating risk-aware options as the basic coordination unit, ROC sketches a scalable, transparent infrastructure for integrating heterogeneous agents into future mixed human-agent societies and outlines a research agenda for such risk-aware clearing layers. |
| 12:15–12:30 | Towards Collaborative BDI Agents for Human-AI Teamwork (Regular, Demo Session). This paper presents the development of Belief-Desire-Intention (BDI) agents for human-AI teamwork in the Overcooked AI environment, and investigates their effectiveness and transparency as collaborative partners. The proposed agent architecture explicitly models beliefs, goals, and plans, enabling dynamic adaptation of strategies in response to human actions and evolving environmental conditions. Results from a controlled user study demonstrate that the BDI agent achieves superior collaboration and greater transparency compared with state-of-the-art Deep Reinforcement Learning agents. Participants perceived the agent's behaviour as more predictable and its intentions as more readily interpretable, thereby supporting enhanced mutual understanding in collaborative tasks. |
| 12:30–14:00 | Lunch Break |
| 14:00–15:15 | Interaction & Coordination II |
| 14:00–14:20 | Strabo: Declarative Specification and Implementation of Agentic Interaction Protocols (Regular, Demo Session). The last few years have witnessed major advances in the modeling and implementation of multiagent systems based on declarative interaction protocols. Our contribution, Strabo, establishes the relevance of these advances to ongoing industry efforts in Agentic AI. Specifically, we consider UCP, the Universal Commerce Protocol, a recent Google-led effort to standardize e-commerce interactions for AI agents. Our exercise is in two parts. One, we model the part of UCP dealing with checkouts as a declarative Langshaw protocol and implement agents using Peach, a programming model for Langshaw. This part of the exercise brings out the advantages of formal, declarative specifications. Two, we show that Peach agents can interoperate with UCP agents implemented by Google, thereby establishing the fidelity of our approach with respect to UCP. Such interoperation enables the incremental introduction of declarative protocols and agents into a conventional setting, indicating a pathway by which EMAS ideas could influence practice without demanding a wholesale update. |
| 14:20–14:40 | Castor and Pollux: A First Demonstration of a BSPL Protocol Discovery Tool (Regular, Demo Session). Specifying interaction protocols for Multi-Agent Systems remains a challenging task, especially when adopting information-centric approaches such as BSPL, which require designers to anticipate message structures, parameter roles, and acceptable interleavings. To address this challenge, we propose a novel framework for discovering BSPL protocols rather than specifying them directly. Our approach is based on the intuition that a protocol can be derived from the combination of domain information and distribution control. We introduce Castor, a declarative language that enables designers to model elementary pieces of information, their potential sources and recipients, and their aggregation into business-meaningful information chunks. On top of Castor, we present Pollux, an automated synthesis component that transforms Castor models into BSPL protocols. Pollux formulates protocol generation as a planning problem, where candidate messages are treated as actions and BSPL semantic constraints are enforced during the search. Ambiguities arising from alternative information sources are resolved through configurable heuristics, allowing the exploration of multiple safe and consistent protocol variants. Together, Castor and Pollux support incremental, tool-assisted discovery of BSPL interaction protocols. |
| 14:40–15:00 | SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing (Regular). We present SPEAR, a multi-agent coordination framework for smart contract auditing that applies established MAS patterns in a realistic security analysis workflow. SPEAR models auditing as a coordinated mission carried out by specialized agents: a Planning Agent prioritizes contracts using risk-aware heuristics, an Execution Agent allocates tasks via the Contract Net protocol, and a Repair Agent autonomously recovers from brittle generated artifacts using a programmatic-first repair policy. Agents maintain local beliefs updated through AGM-compliant revision, coordinate via negotiation and auction protocols, and revise plans as new information becomes available. An empirical study compares the multi-agent design with centralized and pipeline-based alternatives under controlled failure scenarios, focusing on coordination, recovery behavior, and resource use. |
| 15:00–15:15 | Reasoning with Untruthful Announcements (Regular). Untruthful announcements are a significant part of multi-agent communication. Providing a formal account of such announcements is important for representing and reasoning about the effects of actions and for epistemic planning in multi-agent domains. This paper tackles the problem of dealing with lying and misleading announcements by defining update models for them. It also shows that these update models yield intuitive results when applied to a pointed Kripke structure for reasoning about the beliefs of agents. |
| 15:15–15:25 | Demo Session I: Lightning Talks |
| 15:30–16:15 | Demo Session I + Afternoon Coffee — live demonstrations (same papers as above) |
| 16:15–17:35 | Governance, Trust & Explainability |
| 16:15–16:30 | Metanormative Theory for RL-Based Moral Agents (Regular). The overlapping disciplines of machine ethics and AI alignment are concerned with designing artificial agents that are aligned with human values and act in ethically acceptable ways. A recent trend is to use reinforcement learning (RL) in the design of such agents while abstracting away from work in moral philosophy. This paper explores the following question: What does it mean for an RL agent to act morally, or to act in ways that are ethically acceptable? We address this question by pursuing two (related) goals. The first is to draw out some ideas from the recent philosophical work in metanormative theory that can guide our thinking about artificial moral agency. The second goal is to examine the architectures of RL agents through the lens of these ideas. This should allow us to identify the RL-based approaches that hold the greatest promise in the context of machine ethics and AI alignment. |
| 16:30–16:45 | The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety (Regular, Demo Session). Multi-agent systems provide mature methodologies for role decomposition, coordination, and normative governance, capabilities that remain essential as increasingly powerful autonomous decision components are embedded within agent-based systems. While learned and generative models substantially expand system capability, their safety behavior is often entangled with training, making it opaque, difficult to audit, and costly to update after deployment. This paper formalizes the Alignment Flywheel as a governance-centric hybrid MAS architecture that decouples decision generation from safety governance. A Proposer, representing any autonomous decision component, generates candidate trajectories, while a Safety Oracle returns raw safety signals through a stable interface. An enforcement layer applies explicit risk policy at runtime, and a governance MAS supervises the Oracle through auditing, uncertainty-driven verification, and versioned refinement. The central engineering principle is patch locality: many newly observed safety failures can be mitigated by updating the governed oracle artifact and its release pipeline rather than retracting or retraining the underlying decision component. The architecture is implementation-agnostic with respect to both the Proposer and the Safety Oracle, and specifies the roles, artifacts, protocols, and release semantics needed for runtime gating, audit intake, signed patching, and staged rollout across distributed deployments. The result is a hybrid MAS engineering framework for integrating highly capable but fallible autonomous systems under explicit, version-controlled, and auditable oversight. |
| 16:45–16:55 | Brokering as a Decision Loop: Trust-Aware Multi-Agent Architectures for Data Marketplaces (Short). Earth-observation data marketplaces increasingly depend on repeated service commitments made under uncertainty about provider behavior and delivery risk. Existing brokering workflows treat each request separately. Evidence from prior transactions is retained locally within bilateral relationships and is not used to inform subsequent decisions. This paper makes the case that brokering should be treated as a data management problem. Allocation, pricing, and trust assessment can be expressed as queries over accumulated marketplace knowledge rather than as ad hoc negotiations. We outline SPECTRAM, a multi-agent brokering architecture in which a logical orchestrator coordinates specialized agents over a shared knowledge graph. Agents do not invoke one another, and all interactions are mediated by the orchestrator's state and the shared graph. Service commitments rely on trust and service performance from prior transactions, while transaction outcomes are recorded and used for subsequent decisions. We emphasize agent roles, read/write separation, and orchestration as key design choices for trust-aware brokering. This short paper outlines a research direction for next-generation earth-observation marketplaces and identifies open challenges for MAS and data marketplaces. The proposed framing applies to service marketplaces where transaction history informs future commitments. |
| 16:55–17:15 | Ex-Plan: Explaining BDI Agent Behaviour Through Contrastive Plan Analysis (Regular, Demo Session). Belief-Desire-Intention (BDI) agents can make decisions that are difficult to interpret from observed behaviour alone. We present Ex-Plan, a trace-based methodology for generating contrastive explanations of AgentSpeak agent behaviour, answering queries of the form "Why did the agent do X rather than Y?". Given an explanandum event and a foil literal, Ex-Plan reconstructs the relevant decision context through AgentSpeak plan structure (triggers and context guards) together with a linear execution trace, and returns a minimal set of trace events that witness where the foil became infeasible. We evaluate the approach on three AgentSpeak scenarios using structured execution logs, demonstrating how discriminating plan conditions can be grounded to specific divergence points in the trace. |
| 17:15–17:35 | AgentSpeaX: Explain, Actually (Regular). Explainability is a key requirement for autonomous agents operating in interactive and human-facing environments. In Belief-Desire-Intention (BDI) agent architectures, the explicit representation of beliefs, goals, and plans suggests that explanations can be grounded in an agent's actual execution. In this paper, we present AgentSpeaX, a conceptual framework for execution-level explainability in AgentSpeak(L) agents, along with a working prototype. AgentSpeaX defines a formal explanation model based on instantiated plan executions, ensuring that explanations are faithful to what the agent actually did rather than to abstract plan schemas. The framework supports contrastive explanations and explanation exchange through dedicated communicative acts. We instantiate the formalism for the Jason language and provide a Prolog-based explainer as a reference realization that generates explanations based on Jason agent execution. |
Day 2 — Tuesday, May 26
| 08:45–09:40 | Neural & Symbolic Agents |
| 08:45–08:55 | LLM-Symbolic Systems: Why and How (Short). There is considerable interest in using Large Language Models (LLMs) to control agents. However, the known weaknesses of LLMs raise concerns about whether their performance would be adequate. This paper provides empirical evidence that LLMs do not provide adequate performance, thus motivating the incorporation of symbolic components in an agentic system (the "why" part of the title). It also provides guidance on how to develop such systems. |
| 08:55–09:15 | VEsNA Goes Very Fast: Event-Based BDI Control in Formula One Racing (Regular, Demo Session). Symbolic BDI agents are often perceived as ill-suited for high-speed, continuous, and adversarial domains due to the computational demands of deliberation and the challenges of interfacing with low-level control. This paper presents an engineering study showing that unmodified Jason/AgentSpeak(L) agents, embodied through VEsNA in a Godot-based Formula One simulation, can achieve competitive multi-car racing behaviour at speed using an event-driven symbolic perception-action loop. Our focus is not end-to-end low-level vehicle control: the environment handles continuous vehicle dynamics, low-level actuation, and last-resort collision avoidance, while agents deliberate over compact add/remove percepts (e.g., curves and proximity-based interaction) to select tactical behaviours such as braking/acceleration, racing-line switching, overtaking, and defensive manoeuvres. We describe the architecture, the symbolic plan library, and an empirical evaluation based on telemetry logs, providing evidence that symbolic BDI tactical control can remain responsive in high-frequency continuous domains without modifications to the Jason/AgentSpeak(L) reasoning cycle. We further show that propensities can be integrated by annotating plans with temperaments (e.g., aggressive/calm) to bias plan selection and induce heterogeneous driving styles without modifying the BDI reasoning cycle. |
| 09:15–09:30 | An Extended BDI Case Study Toward Campus Mail Delivery (Regular, Demo Session). This paper presents a complex robotic case study in using BDI to program a mobile robot to navigate a tunnel environment. The objective is to travel through the tunnel network of Carleton University and deliver mail between buildings. The proposed solution uses an iRobot CREATE robot programmed in Jason, supported by LiDAR sensing and a network of Bluetooth beacons for localization. Building on prior work that limited the robot to line-following behaviour in a simplified environment, this approach extends to full navigation of the university tunnels. Additionally, a realistic simulation environment is presented to support testing and iterative development. This work demonstrates the feasibility of applying Jason to a real-world robotic task and provides insights into its performance and practical challenges. |
| 09:30–09:40 | Towards Operationalizing Accountability for Self-Improving Multi-Agent Systems (Short, Demo Session). Accountability mechanisms can help agents improve their behavior by learning from substandard outcomes. When a substandard situation arises, an accountee requests accountors to render accounts, evaluates them to derive remedies, and provides feedback that accountors use to update their procedural knowledge. We present ongoing work on designing such mechanisms for self-improving multi-agent systems. Our current focus is on agents equipped with skills created by developers, where the objective is to enable agents to improve these skills over time. Accountors render accounts in natural language, and we use LLMs to evaluate accounts, provide feedback, and update skills based on this feedback. We present a JaCaMo-based implementation for a home heating scenario where an agent wastes energy heating a room with a tilted window. Preliminary experiments with Claude Opus 4.5 show promising results: the agent learns preventive behaviors (checking and closing the window before heating) in 90% of cases and corrective behaviors (closing the window during heating) in 90% of cases. We extend our experiments to include a human accountor who intentionally left the window open for a bird to fly out. In this scenario, the agent learns to respect the human's intention (stopping or delaying the heating without closing the window) in 80% of prescriptive and 40% of corrective cases. The agent also learns to resume heating once the human closes the window in 40% of cases. These results show that using accountability for self-improvement transcends debugging, enabling collaborative behaviors that respect human intentions. |
| 09:40–10:10 | Neuro-Symbolic & Hybrid Agents and MAS I |
| 09:40–09:55 | The Secret Life of Traces: A MAS Engineering Perspective (Regular). In this paper we explore the secret life of systems' execution traces: we propose a unified theoretical architecture based on our experience with traces in the runtime verification, explainability and learning domains. A proof of concept of how to exploit traces for the three problems above is presented, rooted in our recent work on VEsNA, an integration of the Jason interpreter for the AgentSpeak(L) language, the Godot game engine, and natural language interfaces. |
| 09:55–10:10 | A Dual-System Neuro-Symbolic Framework with Accident Prediction for Autonomous Driving (Regular). Machine learning models demonstrate strong perception and pattern recognition capabilities in complex traffic scenarios, while symbolic decision-making mechanisms retain significant advantages in interpretability and rule constraints. This paper proposes MLAPM-MAS, a dual-system neuro-symbolic accident prediction framework based on the ML-MAS architecture, designed to enable collaboration between machine learning accident prediction models and BDI reasoning. System 1 models dynamic vehicle interactions using a Temporal Graph Attention Network constructed from interpretable interaction features, enabling continuous estimation of collision risk in complex traffic scenarios. The resulting accident prediction assessments are mapped onto symbolic beliefs and decision constraints, which are then supplied to System 2, a BDI agent, to guide subsequent planning and execution. Experimental results on the CARLA simulation benchmark show that incorporating accident-prediction constraints improves safety-related metrics and reduces certain traffic violations. |
| 10:15–11:00 | Coffee Break |
| 11:00–12:25 | Neuro-Symbolic & Hybrid Agents and MAS II |
| 11:00–11:10 | Neuro-Symbolic Pump Scheduling for Safe and Cost-Efficient Water Distribution Networks (Short). The Pump Scheduling Problem is a highly challenging real-world control task in Water Distribution Networks (WDNs) that aims to minimise operational costs while meeting safety requirements (e.g., minimum and maximum allowable tank levels). The latest Deep Reinforcement Learning (DRL) techniques are effective for cost optimisation but can still violate safety constraints at deployment despite explicit safety considerations during training. Furthermore, evolving safety requirements (e.g., due to seasonal considerations) make retraining for minor safety specification changes disproportionately expensive. To address these challenges, we present a neuro-symbolic framework that pairs a pre-trained DRL agent with a symbolic Belief-Desire-Intention (BDI) agent for WDN safety supervision. Our implementation and preliminary empirical results demonstrate improved safety compliance over a DRL-only baseline while maintaining comparable cost performance. |
| 11:10–11:30 | A Hybrid Neuro-Symbolic BDI Multi-Agent Architecture for LLM-Based Unit Test Generation (Regular). Automating unit test generation with Large Language Models (LLMs) has shown great potential, yet standard generative approaches often struggle with systematic path exploration and the effective integration of deterministic execution feedback. This paper introduces a hybrid neuro-symbolic multi-agent architecture, developed within the JaCaMo framework, that reframes unit test generation as a goal-directed, agentic search process in which agents follow a Belief-Desire-Intention (BDI) deliberation cycle. By combining the creative reasoning of LLMs with the formal precision of symbolic analysis, the architecture utilizes a society of role-specialized agents and artifacts to transform coverage gaps into symbolic "Logic Hints". This creates an iterative neuro-symbolic feedback loop capable of resolving complex branch conditions and navigating deep logic paths on which one-shot prompting methods typically stagnate. Evaluation across seven benchmarks shows that our architecture achieves a 100% success rate in reaching the targeted coverage threshold, consistently outperforming all baselines. Statistical tests (Friedman, Wilcoxon) confirm superior reliability and search efficiency with large effect sizes (A12 > 0.8), indicating that structured agentic autonomy effectively bridges the gap between LLM reasoning and formal software testing requirements. |
| 11:30–11:50 | AmI HMAS: Hybrid Agents with Individual and Collective Experience-Aware Code-Based Planning for Smart Environments Regular Demo SessionAd-hoc, goal-driven interactions in smart environments (homes, offices, hotels) have been a long-standing objective in Ambient Intelligence (AmI). Advances in Large Language Models (LLMs) for reasoning and use of hypermedia environments for multi-agent systems are bringing this objective closer to achievement. We describe the functionality and implementation of AmI HMAS, a framework for agent-based, goal-driven, LLM-supported interactions with smart environments. AmI HMAS maps existing HomeAssistant deployments into semantically represented, navigable Hypermedia Environments, enabling discovery of real-world smart devices. The framework combines classic agency with LLM reasoning to perform environment exploration, request interpretation, community-based exchange of experience, and action planning. AmI HMAS leverages an engine that enables storage and reuse of past interaction experiences during reasoning, distinguishing between environment state requests, explicit commands, and implicit or ambiguous requests. The planning approach is designed to produce BehaviorTree code-based procedural plans that enable plan life cycle management and reuse. Plan components can be exchanged in a community of agents that manage different smart environments, leveraging the power of the community to improve how requests are solved. We evaluate the system quantitatively across two distinct setups (simulated homes in the HomeBench benchmark and cross-environment transfer in a smart research lab simulation), measuring planning success rates, signifier fast-path hit rates, LLM call reduction, and planning latency across different request types (explicit, ambiguous, single or multi-command, achievable or impossible) and experience reuse settings. |
| 11:50–12:05 | Ahoy: LLMs Enacting Multiagent Interaction Protocols Regular Demo SessionAn interaction protocol formalizes how the agents in a multiagent system interact, which facilitates implementing agents. Existing approaches yield agent implementations specific to the selected protocols. How can we engineer intelligent agents that can enact protocols but are programming-free? Our contribution, Ahoy, addresses this question by creating LLM agents that dynamically select and enact declarative protocols to achieve user goals. We demonstrate that an Ahoy agent can correctly and intelligently enact multiple protocols—concurrently if appropriate to the user goal—without specialized training. Ahoy's significance lies in that it brings together declarative protocols and LLMs, both approaches that promise improved knowledge engineering for agents. |
| 12:05–12:25 | Exploiting the MAOP Approach for Multi-Level Explainability of Multi-Agent Systems RegularExplainability is increasingly becoming an essential non-functional requirement for helping stakeholders understand complex systems. In multi-agent systems (MAS), we have previously introduced a multi-level explainability framework to explain the behavior of individual agents. In that framework, explainability is investigated from a software engineering perspective and supports stakeholders playing different roles in the software development life cycle (i.e., developer, designer, and end-user). In this paper, we extend that view by moving from an individual agent to a multi-agent system perspective. In particular, we enhance the multi-level explainability framework for MAS by exploiting the benefits of the Multi-Agent Oriented Programming (MAOP) approach. In this view, additional first-class abstractions concerning organization, environment, and interactions introduce a clear separation of concerns in engineering the system and explaining the behavior of MAS. |
| 12:30–14:00 | Lunch Break |
| 14:00–15:20 | Cognitive Architectures for Language Agents I |
| 14:00–14:10 | The Role of Cognitive Architectures for Generative AI Agents: An Exploration Based on AutoGen and CoALA Student Demo SessionAdvances in generative AI have led to the emergence of several practical frameworks for developing agents based on generative models. On the one hand, these frameworks are effective enabling technologies, allowing for flexibly exploiting Large Language Models (LLMs). On the other hand, they typically do not provide any specific high-level architectural blueprint for designing agents, such as those found in research contexts. To this end, in this paper we present a prototype framework called Cognitive AutoGen (CoAG) that enriches AutoGen, a well-known and widely adopted practical framework for developing LLM-based autonomous agents, with a cognitive layer inspired by the CoALA (Cognitive Architectures for Language Agents) cognitive architecture proposal. Besides the framework, we describe the assessment framework that we used to compare regular AutoGen agents against the ones implemented with CoAG, along with a first case study and results. |
| 14:10–14:30 | S-ORA: Situated Reasoning and Asynchronous Tool Use for Language Agents Regular Demo SessionThe ability to use tools is an essential feature of language agents. In the current tool use paradigm, however, tools are local or remote procedures invoked synchronously with respect to the agent's reasoning process—blocking further progress until a result is returned. This becomes problematic for long-running operations or concurrent tool use. Furthermore, language agents typically operate tools based on minimal descriptions that lack procedural guidance and safety constraints. In this paper, we introduce a fundamentally different tool use paradigm inspired by the Agents & Artifacts (A&A) metamodel: we model tools as domain objects with their own lifecycle and state, and whose usage interface is inherently asynchronous. To support this paradigm, we introduce tool manuals—operational knowledge that complements existing tool descriptions with detailed functional specifications, procedural knowledge, and safety constraints. We also present S-ORA (Situate-Observe-Reason-Act), an architecture that operationalizes the CoALA framework and extends the ReAct cycle with two phases: Situate (learning from manuals and focusing attention on relevant tools) and Observe (perceiving environmental changes asynchronously). We demonstrate our approach through two scenarios: managing a simulated nuclear reactor with critical safety constraints and enabling tool-mediated coordination among agents. Results show that S-ORA agents equipped with tool manuals can successfully manage concurrent long-running operations, follow usage protocols and safety constraints, and coordinate through shared tools—capabilities not achievable with the current tool use paradigm. |
| 14:30–14:50 | Bridging Reflective and Semantic Memory for Lifelong Learning in LLM-Based Agents RegularLarge Language Model (LLM) agents increasingly rely on external memory to support long-term reasoning, adaptation, and generalization. While prior work has explored reflective memory for self-evaluation and semantic memory for storing abstract knowledge, the interaction between them remains underexplored. This article introduces a unified memory architecture in which reflection and semantic memory co-evolve to support lifelong learning. Reflection is used to evaluate and refine semantic memory through selective strengthening, stabilization, refinement, and forgetting, while semantic memory enriches the reflection process by grounding self-critique in accumulated knowledge. This bidirectional coupling enables more effective use of past experience and improves downstream decision-making. Experiments on the ARC Challenge dataset with the Phi-2 model show consistent improvements over baselines using isolated or loosely coupled memory mechanisms. Overall, our proposed approach advances the design of adaptive LLM-based agents capable of robust, long-horizon reasoning through tightly integrated, heterogeneous memory systems. |
| 14:50–15:05 | QUEST: A RAG-Based Planning Memory to Augment Task Solving of LLM-Based Cognitive Agents RegularAgents based on Large Language Models (LLMs) are increasingly used as autonomous entities across various domains, but they still face persistent planning challenges, such as hallucinations and the generation of infeasible plans when relying solely on LLMs' pre-trained data for domain-specific tasks. This study proposes QUEST, a novel framework that leverages Retrieval-Augmented Generation (RAG) to integrate memory-augmented techniques into agent planning. QUEST operates in two phases: an offline construction phase that indexes knowledge bases using LLM-generated questions and summarized descriptions, and an online phase that employs a structured two-step retrieval process via agentic tools. We evaluate QUEST in a fictional text-based proof-of-concept scenario through an ablation study. Using an LLM-as-a-Judge paradigm with a dual-metric evaluation, Rule-Level Compliance and Holistic Safety, we assess the generated plans against baseline and standard RAG setups. The results suggest that QUEST can improve aspects of plan generation in this setting, showing higher levels of constraint adherence and relatively stable evaluator behavior compared to the baselines. These findings provide preliminary evidence that integrating structured knowledge representations may support more reliable agent planning, highlighting directions for further investigation in more complex and diverse environments. |
| 15:05–15:20 | A Hybrid Role-Based Reference Architecture for LLM-Enhanced Multi-Agent Systems RegularLarge language models (LLMs) are transforming how we build multi-agent systems (MAS); yet, many LLM-centric frameworks still lack the engineering rigour that agent-oriented software engineering (AOSE) provides, resulting in systems that are powerful but difficult to maintain and scale. In our previous work, we critically examined the "role" concept across definition, specification, and implementation, and proposed a preliminary hybrid role-based architecture where roles are treated as first-class run-time entities that support four different action implementation types. However, that earlier work remained at a conceptual level: it identified the need for typed actions and runtime roles but did not provide a formal meta-model specifying how these constructs relate to one another, nor did it offer a concrete realization or validation. Building on that foundation, this paper closes this gap by defining a role meta-model for LLM-enhanced agents that specifies the core role constructs, their interfaces, constraints, and interaction relationships, with clear variation points for design-time and run-time implementation. We realize this meta-model as a framework-agnostic Java annotation set: any Java-based agent framework can adopt the annotations to expose roles, actions, and interaction points declaratively in code and validate them at run-time. We demonstrate the applicability of our approach by implementing a hotel reservation scenario in the SCOP framework, where each agent type is realized through dedicated role specifications and role implementations combining hybrid action types. Finally, we discuss practical design considerations—deliberation-execution separation, action-type boundary decisions, and observability and debugging—offering guidance toward production-grade LLM-enhanced MAS. |
| 15:20–15:30 | Demo Session II — Lightning Talks |
| 15:30–16:15 | Demo Session II + Afternoon Coffee — live demonstrations (same papers as above) |
| 16:15–16:35 | Cognitive Architectures for Language Agents II |
| 16:15–16:35 | ARARA: A LLM-Based Multi-Agent Development Framework for Conversational Recommender Systems RegularLarge Language Models (LLMs) have accelerated the development of Conversational Recommender Systems (CRSs), enabling flexible language understanding and tool-augmented reasoning. However, most LLM-based CRS implementations embed architectural decisions—such as role specialization, memory access, and coordination logic—implicitly within prompts or tightly coupled pipelines. This conflation of reasoning structure with model capacity limits transparency, hinders systematic analysis, and makes architectural extensions ad hoc. We introduce ARARA, a modular, provider-agnostic framework that models CRS as a multi-agent orchestration problem. ARARA defines explicit abstractions for agents, users, modules, orchestration strategies, memory, and reusable skills, establishing governed execution semantics in which conversational behavior emerges from structured role specialization and controlled coordination. To evaluate structural effects, we instantiate ARARA on RecBench+ conversational recommendation benchmarks in the Movie and Book domains, comparing single-LLM baselines with matched multi-agent configurations under identical memory settings. Results show that architectural decomposition and governed routing systematically improve robustness, accuracy, and user satisfaction as reasoning complexity increases, particularly under implicit inference and deceptive inputs. Our findings demonstrate that CRS effectiveness is not solely a function of model capacity, but critically shaped by architectural design. By elevating coordination, memory, and specialization to first-class architectural primitives, ARARA reframes CRS development as an inspectable and extensible engineering discipline grounded in structured multi-agent orchestration. |
| 16:45–17:45 | Panel Discussion |