Search — other
Issues
25 matches
- nvidia-forum:isaac · 5/15/2026 · docs-onboarding
The same install failure has been reposted in another NVIDIA forum category: `ros-jazzy-isaac-ros-foundationpose` cannot be installed in the Isaac ROS environment. This blocks use of FoundationPose.
isaac, isaac-ros, ros2-jazzy, foundationpose, installation, packaging
- Isaac Newton on Laputa · Friction · hn:isaac lab · 5/14/2026 · other
A Hacker News item titled “Isaac Newton on Laputa” appears under an Isaac Lab feed but provides no actionable product feedback in the content captured.
isaac-lab, community, signal-noise, hn
- PEAK · Friction · github:NVIDIA/warp · 5/14/2026 · other
A GitHub issue titled “PEAK” in NVIDIA/warp contains unrelated, non-technical text and no actionable request or bug details. This creates triage noise.
warp, github-issues, triage, signal-noise
- github:newton-physics/newton · 5/13/2026 · integration
Newton's `add_joint_free()` allows parent bodies other than the world, but MuJoCo requires the parent to be the world. The request is to surface the discrepancy with a warning so that solver behavior differences don't silently confuse users.
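A minimal sketch of such a warning pass, assuming a builder that exposes per-joint `joint_type` and `joint_parent` arrays with `-1` denoting the world; these names are illustrative, not the actual Newton API:

```python
import warnings

JOINT_FREE = 0  # hypothetical enum value for a free joint

def warn_non_world_free_joints(builder):
    """Warn about free joints whose parent is not the world body (-1)."""
    for j, (jtype, parent) in enumerate(zip(builder.joint_type, builder.joint_parent)):
        if jtype == JOINT_FREE and parent != -1:
            warnings.warn(
                f"Free joint {j} is parented to body {parent}; MuJoCo treats "
                "free joints as world-parented, so solver behavior may differ.",
                stacklevel=2,
            )
```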
newton-physics, mujoco, api-compat, joints, migration, warnings
- github:newton-physics/newton · 5/13/2026 · training-infra
Newton wants reusable automation to run a release-candidate validation matrix (OS, Python, CUDA/driver, etc.) for sign-off. Today this is tracked and run manually; they want a consistent launch and reporting workflow for RC branches/tags/commits.
newton-physics, release-engineering, rc-testing, test-matrix, cuda, drivers, automation
- github:newton-physics/newton · 5/13/2026 · rendering
The cloth_franka example simulates in centimeters but only partially converts data back to meters for visualization. Debug overlays like COM markers and joint/contact arrows appear meters away due to a cm→m mismatch.
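The failure mode in a minimal sketch, with illustrative names and values rather than the actual cloth_franka code; the fix is to route every world-space quantity, debug overlays included, through the same unit conversion:

```python
CM_TO_M = 0.01  # the example simulates in centimeters; the renderer expects meters

def to_render_units(positions_cm):
    """Convert world-space positions from simulation units (cm) to meters."""
    return [tuple(c * CM_TO_M for c in p) for p in positions_cm]

cloth_vertices_cm = [(150.0, 0.0, 80.0), (151.0, 0.0, 80.0)]
com_cm = (150.5, 0.0, 80.0)

cloth_vertices_m = to_render_units(cloth_vertices_cm)  # mesh: converted
com_marker_m = to_render_units([com_cm])[0]  # overlay: must also be converted,
# otherwise the COM marker renders 100x too far from the cloth
```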
rendering, newton
- github:newton-physics/newton · 5/13/2026 · crashes-stability
A dexterous hand imported via URDF fails to grasp and lift a bottle; the object slides and remains unliftable. The same bottle can be lifted using a Franka example, suggesting contact/friction or grasp modeling differences for the hand.
crash, usd, rendering, manipulation, isaac-lab, newton
- github:newton-physics/newton · 5/13/2026 · crashes-stability
A dexterous hand imported via URDF cannot grasp a bottle reliably; the bottle slides and cannot be lifted. The reporter notes the Franka example can lift the same object, implying a hand-specific contact/friction issue.
crash, usd, rendering, hardware, manipulation, isaac-lab, newton, warp
- Robot-Fluid Coupled Simulation · Friction · nvidia-forum:simulation · 5/12/2026 · other
A user asks about robot-fluid coupled simulation. No further detail is provided.
- github:newton-physics/newton · 5/12/2026 · other
Newton's macOS runner is getting progressively slower, likely tied to `cache_kernel` behavior during CPU compilation. As a temporary mitigation, the macOS job has been moved to post-merge, with the intent to revert once the underlying slowdown is fixed.
- github:isaac-sim/IsaacLab · 5/12/2026 · other
Velocity-only write paths on Isaac Lab Articulation do not invalidate cached derived body state buffers. Downstream code may read stale body velocity/state after writing velocities to sim.
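A hedged sketch of the invalidation pattern; the class, method, and buffer names are illustrative stand-ins, not the actual Isaac Lab Articulation internals:

```python
class ArticulationData:
    """Toy stand-in for a data container with a lazily computed body-state buffer."""

    def __init__(self):
        self._body_state_cache = None

    @property
    def body_state(self):
        if self._body_state_cache is None:
            self._body_state_cache = self._read_from_sim()
        return self._body_state_cache

    def _read_from_sim(self):
        return {"lin_vel": (0.0, 0.0, 0.0)}  # placeholder for the real sim read

    def write_velocity_to_sim(self, lin_vel):
        # ... push the velocity to the physics backend here ...
        # Fix: a velocity-only write must also drop derived caches; otherwise
        # the next body_state read returns the stale pre-write value.
        self._body_state_cache = None
```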
isaac-sim
- nvidia-forum:simulation · 5/12/2026 · other
`MultiMeshRayCaster` outputs (`pos_w`, `ray_hits_w`) reportedly fail to update after dynamic teleportation in `_apply_action`. This suggests stale raycast data after state changes.
- github:newton-physics/newton · 5/12/2026 · other
Adding a D6 joint with 1 angular DOF and 1 linear DOF to SolverMuJoCo can produce a repeated joint name error because the name-amending logic doesn't change names in this case. This causes model build failure due to name collisions.
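One way to make the name-amending pass robust, sketched with illustrative names (not Newton's actual export code):

```python
def make_unique_names(names):
    """Suffix repeated names so every joint exported to MuJoCo is unique."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0)
        seen[name] = count + 1
        unique.append(name if count == 0 else f"{name}_{count}")
    return unique

# A D6 joint with one angular and one linear DOF can emit two sub-joints that
# share a base name; without suffixing, the MuJoCo model build fails.
print(make_unique_names(["d6_joint", "d6_joint"]))  # ['d6_joint', 'd6_joint_1']
```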
newton
- github:newton-physics/newton · 5/12/2026 · other
Newton's scheduled nightly workflow failed specifically in the Warp nightly test suite while the other suites passed. The issue references the failing GitHub Actions runs and logs.
newton, warp
- nvidia-forum:simulation · 5/12/2026 · other
Calling `ag.get_character` with a character skeleton root path returns `None`. This blocks workflows that rely on obtaining a `character_graph` handle.
- github:isaac-sim/IsaacLab · 5/11/2026 · rendering
In a CloudXR + OpenXR setup, frames stream correctly but inbound messages and hand-tracking poses are silently dropped between client and Isaac Sim’s OpenXR plugin. This blocks teleop commands and hand tracking for interactive workflows.
rendering, hardware, deployment, integration, isaac-sim, isaac-lab
- github:isaac-sim/IsaacLab · 5/11/2026 · crashes-stability
In Isaac Lab v3.0.0-beta, `lift_cube_sm.py` ignores the `--viz` Kit option and no Kit/Isaac Sim window opens even though the process keeps running. A one-line change to `AppLauncher` initialization appears to fix it locally.
crash, rendering, hardware, docs, isaac-sim, isaac-lab
- nvidia-forum:simulation · 5/11/2026 · other
Character animation playback in Isaac Sim does not behave as expected; the report points to a problem in the playback or sequencing pipeline but gives no further detail.
isaac-sim
- nvidia-forum:simulation · 5/11/2026 · other
Animations are not applied correctly to a character. This suggests a rig/retarget/apply pipeline problem in Isaac Sim.
- github:newton-physics/newton · 5/10/2026 · other
Mouse dragging in cable examples introduces overly large forces, possibly from large torques applied to segments. This makes interactive manipulation unreliable.
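A minimal sketch of one mitigation, clamping the picker's force and the torque it induces; the gains, caps, and function names here are assumptions, not the viewer's actual code:

```python
import numpy as np

MAX_FORCE = 50.0   # N, cap on the drag force applied to the grabbed segment
MAX_TORQUE = 5.0   # N*m, cap on the torque induced about the segment's COM

def drag_wrench(grab_point, target, com, stiffness=200.0):
    """Spring force pulling the grab point toward the mouse target, clamped."""
    force = stiffness * (np.asarray(target) - np.asarray(grab_point))
    norm = np.linalg.norm(force)
    if norm > MAX_FORCE:
        force *= MAX_FORCE / norm
    torque = np.cross(np.asarray(grab_point) - np.asarray(com), force)
    tnorm = np.linalg.norm(torque)
    if tnorm > MAX_TORQUE:
        torque *= MAX_TORQUE / tnorm
    return force, torque
```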
- github:newton-physics/newton · 5/10/2026 · other
Request to assign a uniform color to cable segments by default because the current visualization is overly colorful. This is a usability/visual quality improvement.
- github:newton-physics/newton · 5/10/2026 · other
Request to add a benchmark to detect CPU performance regressions, motivated by a prior issue. This would improve detection of slowdowns in the CPU code path.
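A bare-bones sketch of what such a regression check could look like; the workload, baseline, and threshold below are placeholders, not the proposed benchmark:

```python
import time

def bench(fn, repeats=5):
    """Best-of-N wall time in seconds for fn()."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

def workload():
    sum(i * i for i in range(200_000))  # stand-in for a CPU-path sim step

BASELINE_S = 0.05  # recorded once on the reference runner (placeholder value)
TOLERANCE = 1.5    # flag runs more than 50% slower than the baseline

elapsed = bench(workload)
if elapsed > BASELINE_S * TOLERANCE:
    print(f"possible CPU regression: {elapsed:.3f}s vs baseline {BASELINE_S:.3f}s")
```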
- github:NVIDIA/warp · 5/10/2026 · docs-onboarding
Warp docs expose dtype-specific HashGrid query types alongside the intended `wp.HashGridQuery`, which users shouldn't need to handle. The request is to consolidate documentation/stubs around the single public type while keeping internal codegen behavior.
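For context, the public idiom looks roughly like this (a sketch based on Warp's documented hash-grid examples; users construct the query with `wp.hash_grid_query()` and iterate it with `wp.hash_grid_query_next()`, never naming dtype-specific variants):

```python
import numpy as np
import warp as wp

@wp.kernel
def count_neighbors(
    grid: wp.uint64,
    points: wp.array(dtype=wp.vec3),
    radius: float,
    counts: wp.array(dtype=wp.int32),
):
    tid = wp.tid()
    p = points[tid]
    n = int(0)
    # The query object is the single public type at issue; it is only ever
    # created by wp.hash_grid_query() and consumed by wp.hash_grid_query_next().
    query = wp.hash_grid_query(grid, p, radius)
    index = int(0)
    while wp.hash_grid_query_next(query, index):
        if wp.length(points[index] - p) <= radius:
            n += 1
    counts[tid] = n

wp.init()
pts = wp.array(np.random.rand(1024, 3), dtype=wp.vec3)
counts = wp.zeros(1024, dtype=wp.int32)
grid = wp.HashGrid(dim_x=64, dim_y=64, dim_z=64)
grid.build(points=pts, radius=0.1)
wp.launch(count_neighbors, dim=1024, inputs=[grid.id, pts, 0.1, counts])
```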
docs, feature-request, warp
- nvidia-forum:simulation · 5/9/2026 · other
isaac-sim
- github:isaac-sim/IsaacLab · 5/9/2026 · training-infra
rl, rendering, hardware, docs, integration, isaac-sim, isaac-lab
Papers
13 matches
- Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model · 2605.14950 · 5/14/2026 · Tao Lin, Yuxin Du, Jiting Liu, Nuobei Zhu …
Vision-Language-Action models have emerged as a promising paradigm for robotic manipulation by unifying perception, language grounding, and action generation. However, they often struggle in scenarios requiring precise spatial understanding, as current VLA models primarily rely on 2D visual representations that lack depth information and detailed spatial relationships. While recent approaches incorporate explicit 3D inputs such as depth maps or point clouds to address this issue, they often increase system complexity, require additional sensors, and remain vulnerable to sensing noise and reconstruction errors. Another line of work explores implicit 3D-aware spatial modeling directly from RGB observations without extra sensors, but it often relies on large geometry foundation models, resulting in higher training and deployment costs. To address these challenges, we propose Evo-Depth, a lightweight depth-enhanced VLA framework that enhances spatially grounded manipulation without relying on additional sensing hardware or compromising deployment efficiency. Evo-Depth employs a lightweight Implicit Depth Encoding Module to extract compact depth features from multi-view RGB images. These features are incorporated into vision-language representations through a Spatial Enhancement Module via depth-aware modulation, enabling efficient spatial-semantic enhancement. A Progressive Alignment Training strategy is further introduced to align the resulting depth-enhanced representations with downstream action learning. With only 0.9B parameters, Evo-Depth achieves superior performance across four simulation benchmarks. In real-world experiments, Evo-Depth attains the highest average success rate while also exhibiting the smallest model size, lowest GPU memory usage, and highest inference frequency among compared methods.
deployment, manipulation, perception, vla
- Let Robots Feel Your Touch: Visuo-Tactile Cortical Alignment for Embodied Mirror Resonance · 2605.14571 · 5/14/2026 · Tianfang Zhu, Ning An, Rui Wang, Jiasi Gao …
Observing touch on another's body can elicit corresponding tactile sensations in the observer, a phenomenon termed mirror touch that supports empathy and social perception. This visuo-tactile resonance is thought to rely on structural correspondence between visual and somatosensory cortices, yet robotic systems lack computational frameworks that instantiate this principle. Here we demonstrate that cortical correspondence can be operationalized to endow robots with mirror touch. We introduce Mirror Touch Net, which imposes semantic, distributional and geometric alignment between visual and tactile representations through multi-level constraints, enabling prediction of millimetre-scale tactile signals across 1,140 taxels on a robotic hand from RGB images. Manifold analysis reveals that these constraints reshape visual representations into geometry consistent with the tactile manifold, reducing the complexity of cross-modal mapping. Extending this alignment framework to cross-domain observations of human hands enables tactile prediction and reflexive responses to observed human touch. Our results link a neural principle of visuo-tactile resonance to robotic perception, providing an explainable route towards anticipatory touch and empathic human-robot interaction. Code is available at https://github.com/fun0515/Mirror-Touch-Net.
sensors, perception
- MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving · 2605.14201 · 5/13/2026 · Rajeev Yasarla, Deepti Hegde, Hsin-Pai Cheng, Shizhong Han …
Vision-language-action (VLA) models are effective as end-to-end motion planners, but can be brittle when evaluated in closed-loop settings due to being trained under a traditional imitation learning framework. Existing closed-loop supervision approaches lack scalability and fail to completely model a reactive environment. We propose MAPLE, a novel framework for reactive, multi-agent rollout of a dynamic driving scenario in the latent space of the VLA model. The ego vehicle and nearby traffic agents are independently controlled over multi-step horizons, while being reactive to other agents in the scene, enabling closed-loop training. MAPLE consists of two training stages: (1) supervised fine-tuning on the latent rollouts based on ground-truth trajectories, followed by (2) reinforcement learning with global and agent-specific rewards that encourage safety, progress, and interaction realism. We further propose diversity rewards that encourage the model to generate planning behaviors that may not be present in logged driving data. Notably, our closed-loop training framework is scalable and does not require external simulators, which can be computationally expensive to run and have limited visual fidelity to the real world. MAPLE achieves state-of-the-art driving performance on Bench2Drive and demonstrates scalable, closed-loop multi-agent play for robust E2E autonomous driving systems.
rl, multi-agent, vla
- SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection · 2605.14110 · 5/13/2026 · Sandro Papais, Lezhou Feng, Charles Cossette, Lingting Ge
Vision Transformers (ViTs) enable strong multi-view 3D detection but are limited by high inference latency from dense token and query processing across multiple views and large 3D regions. Existing sparsity methods, designed mainly for 2D vision, prune or merge image tokens but do not extend to full-model sparsity or address 3D object queries. We introduce SToRe3D, a relevance-aligned sparsity framework that jointly selects 2D image tokens and 3D object queries while storing filtered features for reactivation. Mutual 2D-3D relevance heads allocate compute to driving-critical content and preserve other embeddings. Evaluated on nuScenes and our new nuScenes-Relevance benchmark, SToRe3D achieves up to 3x faster inference with marginal accuracy loss, establishing real-time large-scale ViT-based 3D detection while maintaining accuracy on planning-critical agents.
perception
- Manipulation Planning for Construction Activities with Repetitive Tasks · 2605.13754 · 5/13/2026 · Wangyi Liu, Dasharadhan Mahalingam, Fanru Gao, Ci-Jyun Liang …
In this paper, we study the problem of manipulation skill acquisition for performing construction activities consisting of repetitive tasks (e.g., building a wall or installing ceiling tiles). Our approach involves setting up a simulated construction activity in a Virtual Reality (VR) environment, where the user can provide demonstrations of the object manipulation skills needed to perform the construction activity. We then exploit the screw geometry of motion to approximate the demonstrated motion as a sequence of constant screw motions. For performing the construction activity, we generate the sequence of manipulation task instances and then compute the joint space motion plan corresponding to each instance using Screw Linear Interpolation (ScLERP) and Resolved Motion Rate Control (RMRC). We evaluate our framework by executing two representative construction tasks: constructing brick walls and installing multiple ceiling tiles. Each task is performed using only a single demonstration, a pick-and-place action for the bricks, and a single ceiling tile installation. Our experiments with a 7-DoF robot in both simulation and hardware demonstrate that the approach generalizes robustly to arbitrarily long construction activities that involve repetitive motions and demand precision, even when provided with just one demonstration. For instance, we can construct walls of arbitrary layout and length by leveraging a single demonstration of placing one brick on top of another.
manipulation
- Integration of an Agent Model into an Open Simulation Architecture for Scenario-Based Testing of Automated Vehicles · 2605.13539 · 5/13/2026 · Christian Geller, Daniel Becker, Jobst Beckmann, Lutz Eckstein
Simulative and scenario-based testing are crucial methods in the safety assurance for automated driving systems. To ensure that simulation results are reliable, the real world must be modeled with sufficient fidelity, including not only the static environment but also the surrounding traffic of a vehicle under test. Thus, the availability of traffic agent models is of common interest to model naturalistic and parameterizable behavior, similar to human drivers. The interchangeability of agent models across different simulation environments represents a major challenge and necessitates harmonization and standardization. To address this challenge, we present a standardized and modular simulation integration architecture that enables the tool-independent integration of traffic agent models. The architecture builds upon the Open Simulation Interface (OSI) as a structured message format and the Functional Mock-up Interface (FMI) for dynamic model exchange. Rather than introducing yet another model or simulation tool, we provide a reusable reference implementation that translates these standards into a practical integration blueprint, including clear interfaces, data mappings, and execution semantics. The generic nature of the architecture is demonstrated by integrating an exemplary agent model into three widely used simulation environments: OpenPASS, CARLA, and CarMaker. As part of the evaluation, we show that the model yields consistent behavior in all simulation platforms, thereby validating the interoperability, modularity, and standard compliance of the proposed architecture. The reference implementation lowers integration barriers, serves as a foundation for future research, and is made publicly available at github.com/ika-rwth-aachen/agent-model-integration
integration
- BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning · 2605.13382 · 5/13/2026 · Ruiheng Wang, Shuanghao Bai, Haoran Zhang, Badong Chen …
While autoregressive (AR) Vision-Language-Action (VLA) models have demonstrated formidable reasoning capabilities in robotic tasks, their sequential decoding process often incurs high inference latency and may amplify error accumulation during long-horizon execution. Discrete Diffusion Language Models (dLLMs) provide a promising alternative through parallel token refinement, but their practical deployment in robotics remains limited by repeated denoising function evaluations (NFEs) and the difficulty of directly applying standard KV caching to bidirectional iterative decoding. To bridge these paradigms, we propose BlockVLA, a framework that adapts pretrained AR backbones into an efficient discrete diffusion policy through a block diffusion paradigm. BlockVLA maintains autoregressive dependencies at the block level while enabling parallel denoising within each block, thereby combining global causal coherence with local parallel generation. This design enables prefix KV-cache reuse across completed blocks, reduces the effective cost of iterative denoising, and provides a smoother transition from AR pretraining to diffusion-based policy fine-tuning. We conduct extensive evaluations on the LIBERO and SimplerEnv benchmarks. Experimental results demonstrate that our BlockVLA achieves a 3.3$\times$ inference acceleration over standard discrete diffusion baselines. Furthermore, our model exhibits superior training efficiency, with success rates converging substantially faster than baselines, a gain that is particularly pronounced in complex, long-horizon tasks, where BlockVLA achieves significant performance gains in the early stages of training. This work establishes Block Diffusion as a robust bridge between large-scale pretrained AR models and efficient, high-frequency real-time robotic control.
rl, deployment, vla
- Calibration-Free Gas Source Localization with Mobile Robots: Source Term Estimation Based on Concentration Measurement Ranking · 2605.13208 · 5/13/2026 · Wanting Jin, Agatha Duranceau, İzzet Kağan Erünsal, Alcherio Martinoli
Efficient Gas Source Localization (GSL) in real-world settings is crucial, especially in emergency scenarios. Mobile robots equipped with low-cost, in-situ gas sensors offer a safer alternative to human inspection in hazardous environments. Probabilistic algorithms enhance GSL efficiency with scattered gas measurements by comparing gas concentration measurements gathered by robots to physical dispersion models. However, accurately deriving gas concentrations from data acquired with low-cost sensors is challenging due to the nonlinear sensor response, environmental dependencies (e.g., humidity, temperature, and other gas influences), and robot motion. Mitigating these disturbance factors requires frequent sensor calibration in controlled environments, which is often impractical for real-world deployments. To overcome these issues, we propose a novel feature extraction algorithm that leverages the relative ranking of gas measurements within the dynamically accumulated dataset. By comparing the rank differences between gathered and modeled values, we estimate the probabilistic distribution of source locations across the entire environment. We validate our approach in high-fidelity simulations and physical experiments, demonstrating consistent localization accuracy with uncalibrated gas sensors. Compared to existing methods, our technique eliminates the need for gas sensor calibration, making it well-suited for real-world applications.
- A Proprioceptive-Only Benchmark for Quadruped State Estimation: ATE, RPE, and Runtime Trade-offs Between Filters and Smoothers · 2605.11674 · 5/12/2026 · Ylenia Nisticò, João Carlos Virgolino Soares, Joan Solà, Claudio Semini
We compare three state-of-the-art proprioceptive state estimators for quadruped robots: MUSE [1], the Invariant Extended Kalman Filter (IEKF) [2], and the Invariant Smoother (IS) [3], on the CYN-1 sequence of the GrandTour Dataset [4]. Our goal is to give practitioners clear guidance on accuracy and computation time: we report long-term accuracy (Absolute Trajectory Error, ATE), short-term accuracy (translational and rotational Relative Pose Error, RPE), and per-update computation time on a fixed hardware/software stack. On this dataset, RPEs are broadly similar across methods, while IEKF and IS achieve a lower ATE than MUSE. Runtime results highlight the accuracy-latency trade-offs across the three approaches. In the discussion, we outline the evaluation choices used to ensure a fair comparison and analyze factors that influence short-horizon metrics. Overall, this study provides a concise snapshot of accuracy and cost, helping readers choose an estimator that fits their application constraints, with all evaluation code and documentation released open-source at https://github.com/iit-DLSLab/state_estimation_benchmark for full reproducibility.
locomotion, docs
- Nautilus: From One Prompt to Plug-and-Play Robot Learning · 2605.11665 · 5/12/2026 · Yufeng Jin, Jianfei Guo, Xiaogang Jia, Yu Deng …
Robot learning research is fragmented across policy families, benchmark suites, and real robots; each implementation is entangled with the others in a complex combination matrix, making it an engineering nightmare to port any single element. General-purpose coding agents may occasionally bridge specific setups, but cannot close this gap at scale because they lack the procedural priors and validation practices that characterize robotics research workflows. We propose NAUTILUS, an open-source harness that turns a single user prompt -- for example, "Evaluate policy A with benchmark B" -- into ready-to-use reproduction, evaluation, fine-tuning, and deployment workflows. NAUTILUS provides: plug-and-play agent skill sets with distilled priors from robotics research; typed contracts among policies, simulators/benchmarks, and real-world robots; unified interfaces and execution environments; and a trustworthy agentic coding workflow with explicit, automated validation, and testing at each milestone. NAUTILUS can not only automatically generate the required adapters and containers for existing implementations, but also wrap and onboard new or user-provided policies, simulators/benchmarks, and robots, all connected via a uniform interface. This expands cross-validation coverage without hand-written glue code. Like a nautilus shell that grows by adding chambers, NAUTILUS scales by extending its execution in chambered units, making it a research harness for scalability rather than a hand-curated framework, and aiming to reduce the engineering burden of cross-family reproduction and evaluation in the ever-growing robot learning ecosystem.
rl, deployment
- MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems · 2605.10904 · 5/11/2026 · Marco Coscoy, Zewei Zhou, Seth Z. Zhao, Henry Wei …
Vehicle-to-Everything (V2X) communication has emerged as a promising paradigm for autonomous driving, enabling connected agents to share complementary perception information and negotiate with each other to benefit the final planning. Existing V2X benchmarks, however, fall short in two ways: (i) open-loop evaluations fail to capture the inherently closed-loop nature of driving, leading to evaluation gaps, and (ii) current closed-loop evaluations lack behavioral and interactive diversity to reflect real-world driving. Thus, the extent of the benefits of multi-agent systems for closed-loop driving is still unclear. In this paper, we introduce MDrive, a closed-loop cooperative driving benchmark comprising 225 scenarios grounded in both NHTSA pre-crash typologies and real-world V2X datasets. Our benchmark results demonstrate that multi-agent systems are generally better than single-agent counterparts. However, current multi-agent systems still face two important challenges: (i) perception sharing enhances perception, but doesn't always translate to better planning; (ii) negotiation improves planning performance but harms it in complex and dense traffic scenarios. MDrive further provides an open-source toolbox for scenario generation, Real2Sim conversion, and human-in-the-loop simulation. Together, MDrive establishes a reproducible foundation for evaluating and improving the generalization and robustness of cooperative driving systems.
crash, perception, multi-agent
- Is Your Driving World Model an All-Around Player? · 2605.10858 · 5/11/2026 · Lingdong Kong, Ao Liang, Tianyi Yan, Hongsi Liu …
Today's driving world models can generate remarkably realistic dash-cam videos, yet no single model excels universally. Some generate photorealistic textures but violate basic physics; others maintain geometric consistency but fail when subjected to closed-loop planning. This disconnect exposes a critical gap: the field evaluates how real generated worlds appear, but rarely whether they behave realistically. We introduce WorldLens, a unified benchmark that measures world-model fidelity across the full spectrum, from pixel quality and 4D geometry to closed-loop driving and human perceptual alignment, through five complementary aspects and 24 standardized dimensions. Our evaluation of six representative models reveals that no existing approach dominates across all axes: texture-rich models violate geometry, geometry-aware models lack behavioral fidelity, and even the strongest performers achieve only 2-3 out of 10 on human realism ratings. To bridge algorithmic metrics with human perception, we further contribute WorldLens-26K, a 26,808-entry human-annotated preference dataset pairing numerical scores with textual rationales, and WorldLens-Agent, a vision-language evaluator distilled from these judgments that enables scalable, explainable auto-assessment. Together, the benchmark, dataset, and agent form a unified ecosystem for assessing generated worlds not merely by visual appeal, but by physical and behavioral fidelity.
synthetic-data, perception, world-model
- Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving · 2605.10034 · 5/11/2026 · Aron Distelzweig, Faris Janjoš, Andreas Look, Anna Rothenhäusler …
Recent Autonomous Driving (AD) works such as GigaFlow and PufferDrive have unlocked Reinforcement Learning (RL) at scale as a training strategy for driving policies. Yet such policies remain disconnected from established benchmarks, leaving the performance of large-scale RL for driving on standardized evaluations unknown. We present BehaviorBench -- a comprehensive test suite that closes this gap along three axes: Evaluation, Complexity, and Behavior Diversity. In terms of Evaluation, we provide an interface connecting PufferDrive to nuPlan, which, for the first time, enables policies trained via RL at scale to be evaluated on an established planning benchmark for autonomous driving. Complementarily, we offer an evaluation framework that allows planners to be benchmarked directly inside the PufferDrive simulation, at a fraction of the time. Regarding Complexity, we observe that today's standardized benchmarks are so simple that near-perfect scores are achievable by straight lane following with collision checking. We extract a meaningful, interaction-rich split from the Waymo Open Motion Dataset (WOMD) on which strong performance is impossible without multi-agent reasoning. Lastly, we address Behavior Diversity. Existing benchmarks commonly evaluate planners against a single rule-based traffic model, the Intelligent Driver Model (IDM). We provide a diverse suite of interactive traffic agents to stress-test policies under heterogeneous behaviors, beyond just using IDM. Overall, our benchmarking analysis uncovers the following insight: despite learning interactive behaviors in an emergent manner, policies trained via pure self-play under standard reward functions overfit to their training opponents and fail to generalize to other traffic agent behaviors. Building on this observation, we propose a hybrid planner that combines a PPO policy with a rule-based planner.
crash, rl, hardware, multi-agent