This model is licensed under a combination of CC BY-NC 4.0 and the NVIDIA Open Model License.

Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models

Paper: arxiv.org
Blog post & Audio samples: kyutai.org

Overview

This model is a full-duplex spoken dialogue model post-trained with reinforcement learning (RL) to improve interactivity. Starting from Moshi (Défossez et al., 2024) or PersonaPlex (Roy et al., 2026), post-training targets the four canonical axes of full-duplex interactivity: pause handling, turn-taking, backchanneling, and user interruption. Training uses GRPO with axis-specific rewards, together with an LLM judge reward that preserves response quality.
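As a rough illustration of the GRPO step mentioned above, the sketch below computes group-relative advantages: several responses are sampled for the same input, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its group. How the axis-specific reward and the LLM judge reward are actually combined is not stated here, so `combined_reward` and its `judge_weight` parameter are hypothetical, as is the use of the population standard deviation.

```python
from statistics import mean, pstdev
from typing import List


def combined_reward(axis_reward: float, judge_reward: float,
                    judge_weight: float = 0.5) -> float:
    """Hypothetical convex mix of a rule-based axis reward and a judge reward.

    The weighting scheme is an assumption made for illustration only.
    """
    return (1.0 - judge_weight) * axis_reward + judge_weight * judge_reward


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group (GRPO-style).

    eps avoids division by zero when every sample in the group
    received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses to the same input: (axis reward, judge reward).
    samples = [(1.0, 0.8), (0.0, 0.9), (1.0, 0.2), (0.0, 0.1)]
    rewards = [combined_reward(a, j) for a, j in samples]
    print(group_relative_advantages(rewards))
```

Responses that score above their group average get a positive advantage and are reinforced; those below get a negative one. No separate value network is needed, which is the main design appeal of GRPO.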

Compared to the base models, the post-trained models reduce cases where the model inappropriately barges in on the user, substantially improve turn-taking response latency, and promote well-timed backchanneling, as evaluated on both Full-Duplex-Bench v1 (static, using pre-recorded audio input) and Full-Duplex-Bench v2 (dynamic, using real-time multi-turn dialogue).

Training Data

We construct RL training data from Seamless Interaction (Agrawal et al., 2025), a 4,000-hour two-party human conversation corpus in which each speaker is recorded on a separate channel. For each of the four interactivity axes, we use voice activity detection to automatically extract up to 2,000 relevant segments from this corpus.
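To make the extraction step more concrete, here is a minimal sketch of how per-channel voice activity decisions might be turned into candidate turn-taking segments. It is not the authors' pipeline: the frame-level speech flags are assumed to come from some external VAD, and the function names, the `max_gap_sec` threshold, and the pairing rule are all hypothetical. Only the cap of 2,000 segments per axis comes from the description above.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def frames_to_intervals(is_speech: List[bool], frame_sec: float) -> List[Interval]:
    """Collapse frame-level VAD flags for one channel into speech intervals."""
    intervals: List[Interval] = []
    start = None
    for i, active in enumerate(is_speech):
        if active and start is None:
            start = i  # speech onset
        elif not active and start is not None:
            intervals.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:  # speech runs to the end of the recording
        intervals.append((start * frame_sec, len(is_speech) * frame_sec))
    return intervals


def turn_taking_candidates(speaker_a: List[Interval], speaker_b: List[Interval],
                           max_gap_sec: float = 1.0,
                           limit: int = 2000) -> List[Tuple[float, float]]:
    """Find (end of A's speech, start of B's speech) pairs with a short gap.

    max_gap_sec is an illustrative threshold, not a value from the paper.
    """
    candidates: List[Tuple[float, float]] = []
    for _, a_end in speaker_a:
        for b_start, _ in speaker_b:
            if 0.0 <= b_start - a_end <= max_gap_sec:
                candidates.append((a_end, b_start))
                break  # keep only the first response to this segment
        if len(candidates) >= limit:
            break
    return candidates
```

Because each speaker is recorded on a separate channel, the two interval lists can be computed independently and then compared, which is what makes this kind of rule-based extraction feasible without diarization.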

Models

We release two RL-trained models, one for each base model.

🤗 kyutai/moshika-rl-seamless: based on kyutai/moshika-pytorch-bf16
🤗 kyutai/personaplex-rl-seamless: based on nvidia/personaplex-7b-v1

Usage

This model uses the PersonaPlex architecture and is compatible with the official PersonaPlex inference code. PersonaPlex extends Moshi with support for dialogue control via a text prompt and voice cloning via an audio prompt.

git clone https://github.com/NVIDIA/personaplex
cd personaplex
pip install moshi/.
python -m moshi.server --hf-repo kyutai/personaplex-rl-seamless

Before the conversation begins, you provide a text prompt and a voice prompt. See the official PersonaPlex README for further details on installation and usage.

In our experiments, we used one of the official text prompts, "You enjoy having a good conversation.", throughout, except for the User Interruption task of Full-Duplex-Bench v1, where we used "You are a wise and friendly teacher. Answer questions or provide advice in a clear and engaging way." as suggested in the official repository. For the voice prompt, we used a 3-second clip, SC0001.wav. Note that the web UI only lets you select from the 18 fixed voice prompts provided by NVIDIA.
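For anyone scripting a reproduction of this setup, the prompt choice can be captured in a small lookup. The sketch below only encodes the prompts and the voice clip named above. The benchmark and task identifiers, and the function name, are hypothetical labels and not part of the PersonaPlex code or of Full-Duplex-Bench.

```python
# Text prompts quoted from the experimental setup described above.
DEFAULT_TEXT_PROMPT = "You enjoy having a good conversation."
TEACHER_TEXT_PROMPT = (
    "You are a wise and friendly teacher. "
    "Answer questions or provide advice in a clear and engaging way."
)

# Voice prompt clip named above; its location on disk is not specified.
VOICE_PROMPT_FILE = "SC0001.wav"


def text_prompt_for(benchmark: str, task: str) -> str:
    """Return the text prompt used for a given evaluation.

    The identifiers "full-duplex-bench-v1" and "user_interruption" are
    hypothetical labels chosen for this sketch.
    """
    if benchmark == "full-duplex-bench-v1" and task == "user_interruption":
        return TEACHER_TEXT_PROMPT
    return DEFAULT_TEXT_PROMPT
```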

Bias, Risks, and Limitations

This model is intended for research use only and is not recommended for providing advice or performing any professional duty. It should not be used to impersonate other people or for any malicious purpose.

The rule-based reward design for each axis requires manual engineering and becomes increasingly difficult to scale as the number of axes grows. We have also observed that the conversational style of the training data can affect the model's safety behavior, making the incorporation of safety-aware rewards or constraints into the RL process an important direction for future work.

License

This model's weights have the following lineage, each layer carrying its own license:

| Model | Weights | License |
| --- | --- | --- |
| Moshi | kyutai/moshiko-pytorch-bf16 | CC BY 4.0 |
| PersonaPlex | nvidia/personaplex-7b-v1 (delta from Moshi) | NVIDIA Open Model License |
| This model | kyutai/personaplex-rl-seamless (delta from PersonaPlex) | CC BY-NC 4.0 |

For practical purposes, we distribute the final merged weights directly. Users must comply with both the CC BY-NC 4.0 license and the NVIDIA Open Model License. Please refer to each license for the full terms and conditions.

Citation

@article{ohashi2026multifaceted,
  title={Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models},
  author={Ohashi, Atsumoto and Zeghidour, Neil and D{\'e}fossez, Alexandre and Kharitonov, Eugene},
  journal={arXiv preprint arXiv:2606.11167},
  year={2026}
}