Search — integration
Issues
25 matches

- github:newton-physics/newton · 5/13/2026 · integration
Newton's `add_joint_free()` allows parent bodies other than the world, but MuJoCo requires the parent to be the world. The maintainers propose emitting a warning on this discrepancy to avoid confusing behavior differences between solvers.
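A minimal sketch of the kind of compatibility warning discussed, assuming a hypothetical solver-side check; the `WORLD` sentinel and function name are illustrative, not Newton's actual API:

```python
import warnings

WORLD = -1  # hypothetical sentinel for the world body (illustrative)

def check_free_joint_parent(parent_body: int, backend: str) -> None:
    # Newton accepts any parent for a free joint; MuJoCo only accepts
    # the world. Warn instead of silently diverging between backends.
    if backend == "mujoco" and parent_body != WORLD:
        warnings.warn(
            f"free joint parented to body {parent_body}: MuJoCo requires "
            "the world as parent, so simulation behavior will differ",
            UserWarning,
        )
```

The check would run when a Newton model is handed to the MuJoCo solver path, so users see the divergence up front rather than debugging it later.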
Tags: newton-physics, mujoco, api-compat, joints, migration, warnings

- nvidia-forum:simulation · 5/13/2026 · integration
A user asks how to run Isaac Sim from a Python script with the ROS2 bridge enabled. This implies friction in documentation or APIs for programmatic launch and configuration of the ROS2 integration.
Tags: isaac-sim, python, ros2-bridge, automation, headless, integration

- nvidia-forum:robotics-edge-computing · 5/13/2026 · hardware-integration
A Jetson AGX Orin report indicates ISP1 fails to power on at boot. This prevents expected camera/ISP functionality from being available after startup.
Tags: jetson, agx-orin, isp, boot, camera

- nvidia-forum:robotics-edge-computing · 5/13/2026 · hardware-integration
A user reports two cameras on Jetson Orin Nano are not working. No further diagnostic information is included.
Tags: jetson, orin-nano, cameras, bringup, sensors

- nvidia-forum:robotics-edge-computing · 5/13/2026 · hardware-integration
A Jetson AGX Orin Developer Kit is reported completely dead with no power response. No additional context is provided.
Tags: jetson, agx-orin, power, boot, devkit

- nvidia-forum:robotics-edge-computing · 5/13/2026 · hardware-integration
A USB microphone reportedly cannot record properly on the Jetson Thor platform. No additional details are provided in the post.
Tags: jetson, thor, usb-audio, microphone, multimodal

- nvidia-forum:robotics-edge-computing · 5/13/2026 · hardware-integration
A user reports erratic PCIe signaling between an SBC and an NVMe device. The post contains no additional diagnostic content.
Tags: pcie, nvme, signal-integrity, jetson, carrier-board

- nvidia-forum:isaac-ros · 5/13/2026 · hardware-integration
A user asks about Nova Orin initialization for the Nova Carter Robot in the Isaac ROS forum. The post contains no additional details.
Tags: hardware

- nvidia-forum:isaac · 5/13/2026 · hardware-integration
A user asks about Nova Orin initialization for the Nova Carter Robot in the Isaac forum. The post contains no additional details.
Tags: hardware

- nvidia-forum:robotics-edge-computing · 5/13/2026 · hardware-integration
When using a GMSL camera, image acquisition may fail midway. No additional details are provided.
Tags: gmsl, camera, image-capture, jetson, stability

- Jetson Orin NX Super 16GB not powering on after reverse polarity - D65/Q25 suspected (P3768-A04) [Blocker] · nvidia-forum:robotics-edge-computing · 5/13/2026 · hardware-integration
Jetson Orin NX Super 16GB reportedly does not power on after reverse polarity, with suspected component damage (D65/Q25). This blocks device operation.
Tags: jetson, orin-nx, power, hardware-damage, bringup

- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A user reports a CAN0 data bitrate issue. No additional details are provided in the post.
Tags: canbus, can0, bitrate, jetson, robotics-io

- Robot-Fluid Coupled Simulation [Friction] · nvidia-forum:simulation · 5/12/2026 · other
A user asks about robot-fluid coupled simulation. No further detail is provided.
- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A user reports an EEPROM error on AGX T5000. No additional context is provided.
Tags: eeprom, agx, provisioning, jetson, hardware

- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A new sensor capture image reportedly fails with Jetson Linux 36.3.0. This indicates a capture pipeline problem on that release.
Tags: jetson-linux, 36.3.0, camera, image-capture, regression

- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A user reports a Jetson Orin Nano booting issue. No further details are included.
Tags: jetson, orin-nano, boot, stability, bringup

- github:NVIDIA/warp · 5/12/2026 · hardware-integration
Warp plans work to support GPU kernel launches with hardware-coherent CPU memory, pinned CPU arrays, and peer GPU arrays when directly addressable. The issue emphasizes preserving clear diagnostics for invalid cases.
Tags: hardware, docs, warp

- Orin Nano Super Dev Kit - MSS SDRAM init failure (err 0x48480112) - module previously functional [Blocker] · nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
Orin Nano Super Dev Kit shows an MSS SDRAM init failure (err 0x48480112) though the module was previously functional. This prevents the system from booting normally.
Tags: jetson, orin-nano, sdram, boot, init-failure

- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A user requests DRAM supplier consistency (Hynix) for Orin NX 16GB. This indicates concerns about BOM variability impacting deployments.
Tags: jetson, orin-nx, dram, supply-chain, fleet

- Carrier-Board PCIe [Friction] · nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A user asks about carrier-board PCIe. The post has no additional details.
Tags: carrier-board, pcie, jetson, hardware-design, integration

- MGBE to ethernet RJ45 [Friction] · nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A user asks about connecting MGBE to an ethernet RJ45. No further information is included.
Tags: mgbe, ethernet, rj45, jetson, carrier-board

- github:newton-physics/newton · 5/12/2026 · other
Newton's scheduled nightly workflow failed specifically in the Warp nightly tests suite while other suites passed. The issue references the failing GitHub Actions runs and logs.
Tags: newton, warp

- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A user asks how to display DDR memory information such as the manufacturer. This is a diagnostics/visibility request.
Tags: jetson, ddr, memory, diagnostics, manufacturing

- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A USB 3.0 device is recognized as a USB 2.0 device. This reduces bandwidth and can break high-throughput peripherals.
Tags: jetson, usb3, usb2, link-speed, peripherals

- nvidia-forum:robotics-edge-computing · 5/12/2026 · hardware-integration
A Jetson Orin is reported as not coming out of recovery mode. This blocks normal boot and device provisioning.
Tags: jetson, orin, recovery-mode, flashing, boot
Papers
22 matches

- SOCC-ICP: Semantics-Assisted Odometry based on Occupancy Grids and ICP · arXiv:2605.15074 · 5/14/2026 · Johannes Scherer, Sebastian Hirt, Henri Meeß
Reliable pose estimation in previously unseen environments is a fundamental capability of autonomous systems. Existing LiDAR odometry methods typically employ point-, surfel-, or NDT-based map representations, which are distinct from the semantic occupancy grids commonly used for downstream tasks such as motion planning. We introduce SOCC-ICP, a semantics-assisted odometry framework that jointly performs Semantic OCCupancy grid mapping and LiDAR scan alignment. Each map voxel encodes geometric and semantic statistics, enabling adaptive point-to-point or point-to-plane ICP based on local planarity. Further, the occupancy grid naturally filters dynamic objects through raycasting-based free-space updates. Across diverse evaluation scenarios, SOCC-ICP achieves performance competitive with state-of-the-art LiDAR odometry and remains robust in geometrically degenerate environments, even in the absence of semantic cues. When semantic labels are available, integrating them into map construction, downsampling, and correspondence weighting yields further accuracy gains. By unifying odometry and semantic occupancy grid mapping within a single representation, SOCC-ICP eliminates redundant map structures and directly provides a map suitable for downstream robotic applications.
Tags: sensors, perception, integration

- Chrono-Gymnasium: An Open-Source, Gymnasium-Compatible Distributed Simulation Framework · arXiv:2605.14911 · 5/14/2026 · Bocheng Zou, Harry Zhang, Khailanii Slaton, Jingquan Wang …
High-fidelity physics simulation is essential for closing the sim-to-real gap in robotics and complex mechanical systems. However, the computational overhead of high-fidelity engines often limits their use in data-intensive tasks like Reinforcement Learning (RL) and global optimization. We introduce Chrono-Gymnasium, a distributed computing framework that scales the high-fidelity multi-body dynamics of Project Chrono across large-scale computing clusters. Built upon the Ray framework, Chrono-Gymnasium provides a standardized Gymnasium interface, enabling seamless integration with modern machine learning libraries while providing built-in synchronization and messaging primitives for distributed execution. We demonstrate the framework's capabilities through two distinct case studies: (1) the training of an RL agent for autonomous robotic navigation in complex terrains, and (2) the Bayesian Optimization of a planetary lander's design parameters to ensure landing stability. Our results show that Chrono-Gymnasium reduces wall-clock time for high-fidelity simulations without sacrificing physical accuracy, offering a scalable path for the design and control of complex robotic systems.
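The Gymnasium interface the framework standardizes on is a `reset`/`step` contract where `step` returns a 5-tuple `(observation, reward, terminated, truncated, info)`. A toy stand-in illustrating that contract, with made-up dynamics in place of Project Chrono physics and no dependency on the `gymnasium` package:

```python
import random

class ToyChronoEnv:
    """Toy environment following the Gymnasium 5-tuple step contract.
    The 1-D dynamics here are illustrative, not Project Chrono physics."""

    def __init__(self, goal=10.0, max_steps=50):
        self.goal, self.max_steps = goal, max_steps

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.pos, self.t = 0.0, 0
        return self.pos, {}  # (observation, info)

    def step(self, action):
        # Advance the toy dynamics by the commanded displacement plus noise.
        self.pos += action + self.rng.gauss(0.0, 0.01)
        self.t += 1
        terminated = self.pos >= self.goal      # task success
        truncated = self.t >= self.max_steps    # time limit reached
        reward = 1.0 if terminated else -0.01
        return self.pos, reward, terminated, truncated, {}
```

An RL training loop (or a Ray worker in the distributed setup) would then interact with the real environment exactly as with this toy one: `obs, info = env.reset(seed=...)`, then repeated `env.step(action)` calls until `terminated` or `truncated`.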
Tags: rl, env-api, integration

- Learning Direct Control Policies with Flow Matching for Autonomous Driving · arXiv:2605.14832 · 5/14/2026 · Marcello Ceresini, Federico Pirazzoli, Andrea Bertogalli, Lorenzo Cipelli …
We present a flow-matching planner for autonomous driving that directly outputs actionable control trajectories defined by acceleration and curvature profiles. The model is conditioned on a bird's-eye-view (BEV) raster of the surrounding scene and generates control sequences in a small number of Ordinary Differential Equations (ODE) integration steps, enabling low-latency inference suitable for real-time closed-loop re-planning. We train exclusively on urban scenarios (real urban city streets, intersections and roundabouts of the city of Parma, Italy) collected from a 2D traffic simulator with reactive agents, and evaluate in closed-loop on both in-distribution and markedly out-of-distribution environments, including multi-lane highways and unseen urban scenarios. Our results show that the model generalizes reliably to these unseen conditions, maintaining stable closed-loop control and successfully completing scenarios that differ substantially from the training distribution. We attribute this to the BEV representation, which provides a geometry-centric view of the scene that is inherently less sensitive to distributional shifts, and to the flow-matching formulation, which learns a smooth vector field that degrades gracefully under distribution shift. We provide video demonstrations of closed-loop behavior at https://marcelloceresini.github.io/DirectControlFlowMatching.
Tags: integration

- Exploring Bottlenecks in VLM-LLM Navigation: How 3D Scene Understanding Capability Impacts Zero-Shot VLN · arXiv:2605.14801 · 5/14/2026 · Ziyi Xia, Chaoran Xiong, Litao Wei, Xinhao Hu …
Zero-shot vision-and-language navigation (VLN) has gained significant attention due to its minimal data collection costs and inherent generalization. This paradigm is typically driven by the integration of pre-trained Vision-Language Models (VLMs) and Large Language Models (LLMs), where VLMs construct 3D scene graphs while LLMs handle high-level reasoning and decision-making. However, a critical bottleneck exists in this system: current 3D perception models prioritize pixel-level accuracy, directly conflicting with the strict computational limits and real-time efficiency demanded by embodied navigation. To address this gap, this paper quantifies the actual impact of 3D scene understanding capability on VLN performance. Based on typical VLM-LLM frameworks, we propose statistical success rate (SR) upper bounds for two core subsystems: 1) the slow LLM planner, which relies on topological mapping semantics, and 2) the fast reactive navigator, which utilizes spatial coordinates and bounding boxes to execute LLM decisions. Evaluations using state-of-the-art 3D scene understanding models validate our proposed bounds and reveal a perception saturation phenomenon, indicating that improvements in perception accuracy beyond a certain threshold yield diminishing returns in navigation success. Our findings suggest that 3D scene understanding for VLN should pivot away from strict pixel-level precision, prioritizing instead navigation-relevant core vocabularies and accurate bounding box proportions.
Tags: perception, integration

- SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization · arXiv:2605.14704 · 5/14/2026 · Posheng Chen, Powen Cheng, Gueter Josmy Faure, Hung-Ting Su …
In real-world scenes, target objects may reside in regions that are not visible. While humans can often infer the locations of occluded objects from context and commonsense knowledge, this capability remains a major challenge for vision-language models (VLMs). To address this gap, we introduce SceneFunRI, a benchmark for Reasoning the Invisible. Based on the SceneFun3D dataset, SceneFunRI formulates the task as a 2D spatial reasoning problem via a semi-automatic pipeline and comprises 855 instances. It requires models to infer the locations of invisible functional objects from task instructions and commonsense reasoning. The strongest baseline model (Gemini 3 Flash) only achieves a CAcc@75 of 15.20, an mIoU of 0.74, and a Dist of 28.65. We group our prompting analysis into three categories: Strong Instruction Prompting, Reasoning-based Prompting, and Spatial Process of Elimination (SPoE). These findings indicate that invisible-region reasoning remains an unstable capability in current VLMs, motivating future work on models that more tightly integrate task intent, commonsense priors, spatial grounding, and uncertainty-aware search.
Tags: integration

- SR-Platform: An Agentic Pipeline for Natural Language-Driven Robot Simulation Environment Synthesis · arXiv:2605.14700 · 5/14/2026 · Ben Wei Lim, Minh Duc Le, Thang Truong, Thanh Nguyen Canh
Generating robot simulation environments remains a major bottleneck in simulation-based robot learning. Constructing a training-ready MuJoCo scene typically requires expertise in 3D asset modeling, MJCF specification, spatial layout, collision avoidance, and robot-model integration. We present SR-Platform, a production-deployed agentic system that converts free-form natural language descriptions into executable, physically valid MuJoCo environments. SR-Platform decomposes scene synthesis into four stages: an LLM-based orchestrator that converts user intent into a structured scene plan; an asset forge that retrieves cached assets or generates new 3D geometry through LLM-to-CadQuery synthesis; a layout architect that assigns object poses and verifies industrial constraints; and a bridge layer that assembles the final MJCF scene and merges the selected robot model. The system is deployed as a nine-service Docker stack with WebSocket progress streaming, MinIO-backed mesh storage, Qdrant-based semantic asset retrieval, Redis job state, and InfluxDB telemetry. Using 30 days of production telemetry covering 611 successful LLM calls, SR-Platform generates five-object scenes with a median end-to-end latency of approximately 50 s, while cache-accelerated scenes complete in approximately 30-40 s. The asset forge shows an 11.3% first-attempt retry rate with automatic recovery, and cached asset retrieval removes per-object LLM calls for previously generated object types. These results show that agentic scene synthesis can reduce the manual effort required to create diverse robot training environments, enabling users to produce executable MuJoCo scenes from plain English prompts in under one minute.
Tags: crash, usd, deployment, integration, mujoco

- DiffPhD: A Unified Differentiable Solver for Projective Heterogeneous Materials in Elastodynamics with Contact-Rich GPU-Acceleration · arXiv:2605.14526 · 5/14/2026 · Shih-Yu Lai, Sung-Han Tien, Jui-I Huang, Yen-Chen Tseng …
Differentiable simulation of soft bodies is a foundation for system identification, trajectory optimization, and Real2Sim transfer. Yet, existing methods such as the differentiable Projective Dynamics (DiffPD) struggle when faced with heterogeneous materials with extreme stiffness contrasts, hyperelasticity under large deformations, and contact-rich interactions, which are common scenarios in the real world. We present DiffPhD, a unified GPU-accelerated differentiable Projective Dynamics framework for heterogeneous materials that tackles these intertwined challenges simultaneously. Our key insight is a careful integration of: (i) stiffness-aware projective weights to embed heterogeneity into the global system; (ii) trust-region eigenvalue filtering lifted to the backward pass for stable hyperelastic gradients and a type-II Anderson Acceleration scheme with dual-gate convergence to stabilize forward iteration under large stiffness contrasts; and (iii) a unified GPU pipeline that reuses a single sparse factor across forward, backward, and contact computations, with stiffness-amplified Rayleigh damping folded into the same factor for heterogeneity-aware dissipation at zero recurring cost. DiffPhD achieves strict gradient accuracy while delivering up to an order-of-magnitude speedup over prior differentiable solvers on heterogeneous, hyperelastic, contact-rich benchmarks. Crucially, this speedup does not come at the cost of stability: DiffPhD remains convergent on stiffness contrasts up to 100x where prior PD solvers degrade. This unlocks end-to-end gradient-based optimization on regimes previously bottlenecked by either solver fragility or per-iteration cost -- shell--joint composite creatures, soft characters wielding stiff weapons, and soft-gripper robotic manipulation -- all handled within a single forward--backward pass.
Tags: rendering, manipulation, integration

- Energy-Efficient Quadruped Locomotion with Compliant Feet · arXiv:2605.14411 · 5/14/2026 · Pramod Pal, Shishir Kolathaya, Ashitava Ghosal
Quadruped robots are often designed with rigid feet to simplify control and maintain stable contact during locomotion. While this approach is straightforward, it limits the ability of the legs to absorb impact forces and reuse stored elastic energy, leading to higher energy expenditure during locomotion. To explore whether compliant feet can provide an advantage, we integrate foot compliance into a reinforcement learning (RL) locomotion controller and study its effect on walking efficiency. In simulation, we train eight policies corresponding to eight different spring stiffness values and then cross-evaluate their performance by measuring mechanical energy consumed per meter traveled. In experiments done on a developed quadruped, the energy consumption for the intermediate stiffness spring is lower by ~ 17% when compared to a very stiff or a very flexible spring incorporated in the feet, with similar trends appearing in the simulation results. These results indicate that selecting an appropriate foot compliance can improve locomotion efficiency without destabilizing the robot during motion.
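The efficiency metric in the abstract, mechanical energy consumed per meter traveled, can be computed from logged joint torques and velocities; whether negative (braking) work counts toward the cost varies by paper, so taking absolute power is an assumption here:

```python
def mechanical_energy_per_meter(torques, velocities, dt, distance):
    """Integrate summed joint mechanical power |tau * omega| over a
    trajectory and normalize by distance traveled (lower is better).
    torques/velocities: per-timestep lists of per-joint values.
    Counting braking work as cost is an assumption, not the paper's
    stated convention."""
    energy = 0.0
    for tau_t, omega_t in zip(torques, velocities):
        power = sum(abs(tau * om) for tau, om in zip(tau_t, omega_t))
        energy += power * dt
    return energy / distance
```

Cross-evaluating the eight stiffness-specific policies then reduces to comparing this scalar across rollouts at matched speeds.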
Tags: rl, locomotion, integration

- Integration of an Agent Model into an Open Simulation Architecture for Scenario-Based Testing of Automated Vehicles · arXiv:2605.13539 · 5/13/2026 · Christian Geller, Daniel Becker, Jobst Beckmann, Lutz Eckstein
Simulative and scenario-based testing are crucial methods in the safety assurance for automated driving systems. To ensure that simulation results are reliable, the real world must be modeled with sufficient fidelity, including not only the static environment but also the surrounding traffic of a vehicle under test. Thus, the availability of traffic agent models is of common interest to model naturalistic and parameterizable behavior, similar to human drivers. The interchangeability of agent models across different simulation environments represents a major challenge and necessitates harmonization and standardization. To address this challenge, we present a standardized and modular simulation integration architecture that enables the tool-independent integration of traffic agent models. The architecture builds upon the Open Simulation Interface (OSI) as a structured message format and the Functional Mock-up Interface (FMI) for dynamic model exchange. Rather than introducing yet another model or simulation tool, we provide a reusable reference implementation that translates these standards into a practical integration blueprint, including clear interfaces, data mappings, and execution semantics. The generic nature of the architecture is demonstrated by integrating an exemplary agent model into three widely used simulation environments: OpenPASS, CARLA, and CarMaker. As part of the evaluation, we show that the model yields consistent behavior in all simulation platforms, thereby validating the interoperability, modularity, and standard compliance of the proposed architecture. The reference implementation lowers integration barriers, serves as a foundation for future research, and is made publicly available at github.com/ika-rwth-aachen/agent-model-integration
Tags: integration

- Asymptotically Optimal Ergodic Coverage on Generalized Motion Fields · arXiv:2605.13442 · 5/13/2026 · Christian Hughes, Yilang Liu, Yanis Lahrach, Julia Engdahl …
Autonomous robotic exploration in remote and extreme environments allows scientists to model complex transport phenomena and collective behaviors described by continuously deforming flow fields. Although these environments are naturally modeled as time-varying domains, most adaptive exploration methods assume static environments and fail to provide adequate coverage or satisfy any formal guarantees. This is especially the case in oceanography where autonomous underwater systems (UxS) have highly restrictive compute and payload requirements that necessitate path planning methods that yield robust data collection strategies in open-loop and underactuated settings. In this work, to address the aforementioned issues, we propose to formulate adaptive search as an ergodic coverage problem and investigate certifying coverage in the ergodic sense over evolving domains with flow-induced dynamics. We expand upon recent work demonstrating maximum mean discrepancy (MMD) as a functional ergodic metric, and derive a flow-adaptive formulation that explicitly accounts for domain evolution within the coverage objective. We show that this approach preserves ergodic coverage guarantees in ambient flows and enables effective exploration in under-actuated, and even open-loop planning settings by integrating environment dynamics. Experiments validate that our method generalizes to diverse spatiotemporal processes including ocean exploration, and tracking human and cattle movement. Physical experiments on aerial and legged robotic platforms validate our ability to obtain ergodic coverage in non-convex, flow-restricted environments while respecting robot dynamics.
Tags: integration

- Galilean State Estimation for Inertial Navigation Systems with Unknown Time Delay · arXiv:2605.13266 · 5/13/2026 · Giulio Delama, Martin Scheiber, Yixiao Ge, Tarek Hamel …
Many Inertial Navigation Systems (INS) use Global Navigation Satellite System (GNSS) position as the primary measurement to drive filter performance and bound error growth. However, commercial-grade GNSS receivers introduce unknown measurement delays ranging from 50 ms to 300 ms depending on sensor quality and operating mode. Such time delays can significantly degrade INS performance unless they are explicitly compensated for. Existing algorithms commonly estimate this delay offline, run the filter concurrently with GNSS measurements using buffered Inertial Measurement Unit (IMU) data, and predict the current state by forward-integrating buffered inertial measurements via IMU preintegration. The state-of-the-art online method is an Extended Kalman Filter (EKF) that explicitly models the time delay as a state parameter, which defines the preintegration duration. This paper introduces a novel geometric framework for modeling time-delayed INS, in which Galilean symmetry is leveraged to provide a joint representation of space and time for consistent state estimation. An Equivariant Filter (EqF) is derived for the coupled estimation of navigation states and time delay. Validation is performed on two fixed-wing Uncrewed Aerial Vehicles (UAV) with GNSS time lags of 90 ms and 120 ms. The test flights last two to three minutes. Simulations further investigate delays up to 500 ms and provide a statistical comparison against the state-of-the-art EKF. Results show that the EqF preserves accuracy and consistency, while the EKF lacks consistency and its performance degrades significantly with increasing measurement delays.
Tags: sensors, integration

- SECOND-Grasp: Semantic Contact-guided Dexterous Grasping · arXiv:2605.13117 · 5/13/2026 · Han Yi Shin, Heeju Ko, Jaewon Mun, Qixing Huang …
Achieving reliable robotic manipulation, such as dexterous grasping, requires a synergy between physically stable interactions and semantic task guidance, yet these objectives are often treated as separate, disjoint goals. In this paper, we investigate how to integrate dexterous grasping techniques, i.e., physically stable grasps for object lifting and language-guided grasp generation, to achieve both physical stability and semantic understanding. To this end, we propose SECOND-Grasp (SEmantic CONtact-guided Dexterous Grasping), a unified framework that enables robotic hands to dynamically adjust grasping strategies based on semantic reasoning while ensuring physical feasibility. We begin by obtaining coarse contact proposals through vision-language reasoning to infer where contacts should occur based on object properties, followed by segmentation to localize these regions across views. To further ensure consistency across multiple viewpoints, we introduce Semantic-Geometric Consistency Refinement (SGCR), which refines initial contact predictions by enforcing semantic consistency across views and removing geometrically invalid regions, yielding reliable 3D contact maps. Then, we derive a feasible hand pose for each contact map via inverse kinematics, generating a supervision signal for policy learning. Our approach, trained on DexGraspNet, consistently outperforms baselines in lifting success rate on both seen and unseen categories, achieving 98.2% and 97.7%, respectively, while also improving intent-aware grasping by 12.8% and 26.2%. We further show promising results on additional datasets and robotic hands, including Shadow Hand and Allegro Hand.
Tags: rl, manipulation, perception, integration

- TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning · arXiv:2605.12236 · 5/12/2026 · Matthew M. Hong, Jesse Zhang, Anusha Nagabandi, Abhishek Gupta
Fine-tuning pre-trained robot policies with reinforcement learning (RL) often inherits the bottlenecks introduced by pre-training with behavioral cloning (BC), which produces narrow action distributions that lack the coverage necessary for downstream exploration. We present a unified framework that enables the exploration necessary to enable efficient robot policy finetuning by bridging BC pre-training and RL fine-tuning. Our pre-training method, Context-Smoothed Pre-training (CSP), injects forward-diffusion noise into policy inputs, creating a continuum between precise imitation and broad action coverage. We then fine-tune pre-trained policies via Timestep-Modulated Reinforcement Learning (TMRL), which trains the agent to dynamically adjust this conditioning during fine-tuning by modulating the diffusion timestep, granting explicit control over exploration. Integrating seamlessly with arbitrary policy inputs, e.g., states, 3D point clouds, or image-based VLA policies, we show that TMRL improves RL fine-tuning sample efficiency. Notably, TMRL enables successful real-world fine-tuning on complex manipulation tasks in under one hour. Videos and code available at https://weirdlabuw.github.io/tmrl/.
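The forward-diffusion input noising behind CSP follows the standard DDPM closed form x_t = sqrt(abar_t)·x + sqrt(1 − abar_t)·eps; the linear beta schedule and its constants below are common DDPM defaults used as illustrative assumptions, not the paper's exact settings:

```python
import math
import random

def diffuse_input(x, t, T=1000, beta_min=1e-4, beta_max=0.02, rng=None):
    """Noise a policy input with the DDPM forward process at timestep t.
    t=0 returns the input unchanged; larger t interpolates toward pure
    Gaussian noise, widening the coverage the policy is trained against.
    Schedule constants are illustrative, not the paper's values."""
    rng = rng or random.Random(0)
    alpha_bar = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (T - 1)
        alpha_bar *= 1.0 - beta
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * xi + noise * rng.gauss(0.0, 1.0) for xi in x]
```

During TMRL fine-tuning, the agent would modulate `t` itself, trading off precise imitation (small `t`) against broad exploration (large `t`).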
Tags: rl, manipulation, integration, vla

- World Action Models: The Next Frontier in Embodied AI · arXiv:2605.12090 · 5/12/2026 · Siyin Wang, Junhao Shi, Zhaoyang Fu, Xinzhe He …
Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by integrating world models, predictive models of environment dynamics, into the action generation pipeline. We term this emerging paradigm World Action Models (WAMs): embodied foundation models that unify predictive state modeling with action generation, targeting a joint distribution over future states and actions rather than actions alone. However, the literature remains fragmented across architectures, learning objectives, and application scenarios, lacking a unified conceptual framework. We formally define WAMs and disambiguate them from related concepts, and trace the foundations and early integration of VLA and world model research that gave rise to this paradigm. We organize existing methods into a structured taxonomy of Cascaded and Joint WAMs, with further subdivision by generation modality, conditioning mechanism, and action decoding strategy. We systematically analyze the data ecosystem fueling WAMs development, spanning robot teleoperation, portable human demonstrations, simulation, and internet-scale egocentric video, and synthesize emerging evaluation protocols organized around visual fidelity, physical commonsense, and action plausibility. Overall, this survey provides the first systematic account of the WAMs landscape, clarifies key architectural paradigms and their trade-offs, and identifies open challenges and future opportunities for this rapidly evolving field.
Tags: synthetic-data, rl, integration, vla, world-model

- RoboBlockly Studio: Conversational Block Programming with Embodied Robot Feedback for Computational Thinking · arXiv:2605.12059 · 5/12/2026 · Leyi Li, Chenyu Du, Jiafei Sun, Erick Purwanto …
Computational thinking (CT) is increasingly promoted as a core literacy, yet learners and teachers face challenges in connecting abstract program logic to meaningful outcomes. We design and evaluate RoboBlockly Studio, an integrated interactive system that combines block-based programming, a conversational AI teaching agent, and embodied robot execution. RoboBlockly Studio creates a tight iterative loop of authoring, running, observing, and revising. Informed by interviews with five programming teachers, the system was designed to support four goals: (1) preserving learner agency in computational thinking, (2) making program behavior transparent and interpretable, (3) grounding programming in embodied, classroom-aligned tasks, and (4) scaffolding reflection through pedagogically grounded AI dialogue. We deployed RoboBlockly Studio with 32 high school students, observing how robot and AI feedback influenced students' interactions with code, reflections on problem-solving strategies, and understanding of CT concepts. We discuss design insights and implications for creating interactive, embodied learning environments that integrate AI and robotics to support CT learning in computing education.
Tags: integration

- Rainbow Deep Q-Learning with Kinematics-Aware Design for Cooperative Delta and 3-RRS Parallel Robot Insertion · arXiv:2605.11697 · 5/12/2026 · Hassen Nigatu, Gaokun Shi, Jituo Li, Wang Jin …
This paper presents a kinematics-aware deep reinforcement learning framework based on Rainbow Deep Q-Networks (DQN) for cooperative peg-in-hole manipulation by a Delta parallel robot and a 3-RRS (Revolute--Revolute--Spherical) parallel manipulator. A key contribution is the integration of a geometric design-optimization stage that precedes learning: the 3-RRS geometry is tuned to maximize the singularity-free workspace and improve conditioning, which in turn enlarges the safe region in which the reinforcement learning policy can explore. Together the two manipulators expose a 6~degree-of-freedom (DoF) controllable subspace (three Delta translations, two 3-RRS rotations, and one 3-RRS vertical translation); the peg-in-hole task is invariant to rotation about the peg axis, so the task-relevant manifold is five dimensional. The cooperative insertion problem is cast as a Markov Decision Process with a 12-dimensional state vector and a discrete action set containing $6 \times 2 = 12$ incremental commands (one positive and one negative per controlled DoF). A shaped reward combines dense proximity guidance, penalties for kinematic and workspace violations, and sparse bonuses for successful insertions. The Rainbow DQN -- integrating double Q-learning, dueling architecture, prioritized replay, multi-step returns, noisy linear layers for exploration, and a distributional value head -- is trained with a two-stage curriculum. The co-designed framework is validated in a high-fidelity kinematic simulator, where it achieves stable policy convergence, reliable insertions, and reduced constraint violations compared against a vanilla DQN agent and a classical sampling-based planner.
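The 12-command discrete action set (one positive and one negative increment per controlled DoF) can be enumerated directly; the DoF names and step size below are placeholders, not the paper's values:

```python
# One +step and one -step command per controlled DoF: three Delta
# translations, two 3-RRS rotations, one 3-RRS vertical translation.
# Names and the increment magnitude are illustrative placeholders.
DOFS = ["delta_x", "delta_y", "delta_z", "rrs_roll", "rrs_pitch", "rrs_z"]
STEP = 1e-3

def build_action_set(dofs=DOFS, step=STEP):
    actions = []
    for i, name in enumerate(dofs):
        for sign in (+1.0, -1.0):
            delta = [0.0] * len(dofs)   # zero increment on all DoFs...
            delta[i] = sign * step      # ...except the one being commanded
            actions.append((f"{name}{'+' if sign > 0 else '-'}", delta))
    return actions
```

The DQN's discrete output head then indexes into this list, so each Q-value corresponds to one signed increment on exactly one DoF.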
rl, manipulation, integration
- PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models (arXiv:2605.10925, 5/11/2026), by Xinyu Guo, Bin Xie, Wei Chai, Xianchi Deng …
Large-scale pretraining has made Vision-Language-Action (VLA) models promising foundations for generalist robot manipulation, yet adapting them to downstream tasks remains necessary. However, the common practice of full fine-tuning treats pretraining as initialization and can shift broad priors toward narrow training-distribution patterns. We propose PriorVLA, a novel framework that preserves pretrained priors and learns to leverage them for effective adaptation. PriorVLA keeps a frozen Prior Expert as a read-only prior source and trains an Adaptation Expert for downstream specialization. Expert Queries capture scene priors from the pretrained VLM and motor priors from the Prior Expert, integrating both into the Adaptation Expert to guide adaptation. In total, PriorVLA updates only 25% of the parameters that full fine-tuning would update. Across RoboTwin 2.0, LIBERO, and real-world tasks, PriorVLA achieves stronger overall performance than full fine-tuning and state-of-the-art VLA baselines, with the largest gains under out-of-distribution (OOD) and few-shot settings. PriorVLA improves over pi0.5 by 11 points on RoboTwin 2.0-Hard and achieves 99.1% average success on LIBERO. Across eight real-world tasks and two embodiments, PriorVLA reaches 81% in-distribution (ID) and 57% OOD success with standard data. With only 10 demonstrations per task, PriorVLA reaches 48% ID and 32% OOD success, surpassing pi0.5 by 24 and 22 points, respectively.
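The frozen-expert/trainable-expert split described above can be illustrated with a minimal sketch, assuming simple linear experts and an additive query-based fusion; the actual PriorVLA architecture and its Expert Query mechanism are considerably more elaborate:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # feature dimension (illustrative)

# Frozen Prior Expert: weights are read-only and never updated.
W_prior = rng.normal(size=(D, D))
W_prior.setflags(write=False)

# Trainable parameters: the Adaptation Expert and a hypothetical
# query projection that reads priors into the adaptation path.
W_adapt = rng.normal(size=(D, D)) * 0.01
W_query = rng.normal(size=(D, D)) * 0.01

def forward(x):
    """Fuse read-only prior features into the Adaptation Expert's output."""
    prior_feat = x @ W_prior       # priors read from the frozen expert
    query = prior_feat @ W_query   # hypothetical query projection
    return x @ W_adapt + query     # adaptation output guided by priors
```

In this toy version only `W_adapt` and `W_query` would receive gradients, mirroring the idea of updating a small fraction of the parameters while the pretrained prior stays intact.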
manipulation, integration, vla
- xApp Empowered Resource Management for Non-Terrestrial Users in 5G O-RAN Networks (arXiv:2605.10704, 5/11/2026), by Mohammed M. H. Qazzaz, Syed Ali Zaidi, Aubida A. Al-Hameed, Abdelaziz Salama …
This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O-RAN) Near Real-Time Radio Intelligent Controller (Near-RT RIC) environments, employing Double Deep Q-Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover decisions for UAVs operating along predetermined flight trajectories. Unlike reactive approaches that respond to signal degradation, the proposed framework anticipates network conditions and minimises both outage probability and handover frequency through predictive optimisation. The system leverages centralised weight averaging to consolidate knowledge from multiple flight scenarios into a global model capable of generalising to previously unseen operational environments without extensive retraining. A comprehensive evaluation demonstrates that the proposed framework achieves a favourable trade-off between handover frequency and connectivity reliability, reducing handover events by up to 54.6% compared to greedy approaches while maintaining outage probability at practically negligible levels. The results validate the effectiveness of intelligent learning-based approaches for UAV mobility management in next-generation O-RAN architectures, thereby contributing to seamless integration of aerial user equipment into cellular networks.
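The centralised weight-averaging step mentioned above (consolidating per-scenario DDQN weights into a single global model) can be sketched as a simple parameter mean over per-scenario networks; the layer name and shapes below are placeholders:

```python
import numpy as np

def average_weights(scenario_models):
    """Average parameter dicts from several per-scenario agents into one
    global model, in the style of federated weight consolidation."""
    keys = scenario_models[0].keys()
    return {k: np.mean([m[k] for m in scenario_models], axis=0)
            for k in keys}

# Three hypothetical agents trained on different flight trajectories,
# each with a single weight matrix named "q_head" (name is made up).
models = [{"q_head": np.full((2, 2), v)} for v in (1.0, 2.0, 3.0)]
global_model = average_weights(models)
```

Averaging alone does not guarantee generalisation; in the paper it is paired with transfer learning so the global model adapts to unseen trajectories without retraining from scratch.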
rl, integration
- DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving (arXiv:2605.10564, 5/11/2026), by Lingjun Zhang, Changjie Wu, Linzhe Shi, Jiangyang Li …
End-to-end autonomous driving systems are increasingly integrating Vision-Language Model (VLM) architectures, incorporating text reasoning or visual reasoning to enhance the robustness and accuracy of driving decisions. However, the reasoning mechanisms employed in most methods are direct adaptations from general domains, lacking in-depth exploration tailored to autonomous driving scenarios, particularly within visual reasoning modules. In this paper, we propose a driving world model that performs parallel prediction of latent semantic features for consecutive future frames in the bird's-eye-view (BEV) space, thereby enabling long-horizon modeling of future world states. We also introduce an efficient and adaptive text reasoning mechanism that utilizes additional social knowledge and reasoning capabilities to further improve driving performance in challenging long-tail scenarios. We present a novel, efficient, and effective approach that achieves state-of-the-art (SOTA) results on the closed-loop Bench2drive benchmark. Code is available at: https://github.com/hotdogcheesewhite/DeepSight.
synthetic-data, integration, world-model
- HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions (arXiv:2605.10201, 5/11/2026), by Zhenhao Shen, Zeming Yang, Yue Chen, Yuran Wang …
Generalizable manipulation involving cross-type object interactions is a critical yet challenging capability in robotics. To reliably accomplish such tasks, robots must address two fundamental challenges: "where to manipulate" (contact point localization) and "how to manipulate" (subsequent interaction trajectory planning). Existing foundation-model-based approaches often adopt end-to-end learning that obscures the distinction between these stages, exacerbating error accumulation in long-horizon tasks. Furthermore, they typically rely on a single uniform model, which fails to capture the diverse, category-specific features required for heterogeneous objects. To overcome these limitations, we propose HeteroGenManip, a task-conditioned, two-stage framework designed to decouple the initial grasp from complex interaction execution. First, a Foundation-Correspondence-Guided Grasp module leverages structural priors to align the initial contact state, thereby significantly reducing the pose uncertainty of grasping. Subsequently, a Multi-Foundation-Model Diffusion Policy (MFMDP) routes objects to category-specialized foundation models, integrating fine-grained geometric information with highly-variable part features via a dual-stream cross-attention mechanism. Experimental evaluations demonstrate that HeteroGenManip achieves robust intra-category shape and pose generalization. The framework achieves an average 31% performance improvement in simulation tasks with a broad type setting, alongside a 36.7% gain across four real-world tasks with different interaction types.
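The dual-stream cross-attention fusion named in the abstract can be illustrated with plain scaled dot-product attention, where one stream carries geometric tokens and the other part-feature tokens, and each stream queries the other. All shapes are illustrative and the fusion-by-concatenation is an assumption, not the paper's exact design:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Standard scaled dot-product attention: one stream's queries
    attend over the other stream's keys/values."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(1)
geom = rng.normal(size=(5, 16))   # fine-grained geometric tokens
parts = rng.normal(size=(7, 16))  # highly-variable part-feature tokens

# Dual streams: each stream attends over the other, then fuse.
fused = np.concatenate([cross_attention(geom, parts, parts),
                        cross_attention(parts, geom, geom)], axis=0)
```

The symmetric attend-both-ways pattern is what lets geometry condition part features and vice versa before the diffusion policy consumes the fused tokens.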
rl, manipulation, integration
- Monocular Biomechanical Tracking of Fingers with Inverse Kinematics to Foundation Models (arXiv:2605.09258, 5/9/2026), by R. James Cotton, Pouyan Firouzabadi, Wendy Murray
Accurate hand and finger tracking from video has significant clinical applications for monitoring activities of daily living and measuring range of motion, yet monocular video approaches for obtaining hand biomechanics remain under-developed. We present a method that combines the SAM 3D Body foundation model with inverse kinematics optimization in a full-body biomechanical model to extract anatomically-constrained finger joint angles from single-view video. We port SAM 3D Body from PyTorch to JAX for integration with MuJoCo-MJX, enabling GPU-accelerated optimization, and develop a novel mapping between the Momentum Human Rig (MHR) outputs and biomechanical model markers. Validation against 8-camera multiview reconstruction on 4,590 frames from 7 participants performing a variety of hand poses and object manipulation tasks shows finger joint angle errors of approximately 10 degrees and hand position errors of approximately 6 mm, after Procrustes alignment. Results were consistent across camera viewpoints and robust to different methods for producing reference values from multiview video. This work extends monocular biomechanical analysis to detailed finger tracking, expanding access to quantitative characterization of hand movement from readily available video.
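The inverse-kinematics stage described above (fitting joint angles so that model markers match observed keypoints) can be sketched with damped least squares on a toy two-link planar finger; the phalanx lengths, starting pose, and damping constant are made-up illustrative values, and the paper's actual solver runs on a full-body MuJoCo-MJX model:

```python
import numpy as np

L1, L2 = 0.04, 0.03  # assumed phalanx lengths in metres

def fingertip(q):
    """Forward kinematics of a 2-link planar finger."""
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0] + q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0] + q[1])])

def jacobian(q):
    """Analytic Jacobian of the fingertip position."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

def solve_ik(target, q0=(0.3, 0.3), damping=1e-4, iters=500):
    """Damped least squares: q <- q + (J^T J + lambda*I)^-1 J^T e."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        e = target - fingertip(q)
        J = jacobian(q)
        q = q + np.linalg.solve(J.T @ J + damping*np.eye(2), J.T @ e)
    return q
```

The damping term keeps the update well-posed near singular configurations (e.g. a fully extended finger), at the cost of slightly smaller steps; the fixed point is still the exact solution because the update vanishes when the residual does.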
manipulation, sensors, integration, mujoco, foundation-model
- Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views (arXiv:2605.07550, 5/8/2026), by Grzegorz Wilczynski, Mikołaj Zielinski, Bartosz Świrta, Dominik Belter …
3D vision systems are fundamentally constrained by their reliance on visual overlap: reconstruction methods require it for geometric alignment, while generative models use it to enforce multi-view consistency. This limitation is particularly acute in real-world scenarios such as distributed swarm robotics or crowd-sourced data collection, where capturing overlapping perspectives, both in terms of spatial and appearance overlap, is often impossible. We introduce Generative Reconstruction from Disjoint Views as a new paradigm, establish a comprehensive dataset, and propose specialized evaluation metrics for zero-overlap scenarios. Our benchmarking demonstrates that existing state-of-the-art methods fail catastrophically on this task, producing disconnected geometries or semantically incoherent reconstructions. To address these limitations, we propose GLADOS, a general, modular framework that operates through three stages: (1) Generative Bridging, where foundation models synthesize intermediate perspectives to connect disjoint inputs; (2) Robust Coarse 3D Reconstruction, which establishes a coarse geometric scaffold via global alignment that absorbs local contradictions from the generative process; and (3) Iterative Context Expansion and Consistency Optimization to fill missing regions and unify the reconstruction. As an architecture-agnostic framework, GLADOS enables seamless integration of future advances in generation, reconstruction, and inpainting. The source code is available at: https://github.com/gwilczynski95/GLADOS.
multi-agent, integration