
feat(soniox): add real-time translation support and rewrite SpeechStream #5111

Open
MSameerAbbas wants to merge 1 commit into livekit:main from MSameerAbbas:feat/soniox-full-feature-support


@MSameerAbbas (Contributor) commented Mar 15, 2026

Summary

Adds real-time translation support to the Soniox STT plugin (#4943), along with a cleanup of SpeechStream to align with the patterns used by other plugins like Deepgram.

New features

  • Real-time translation (one-way and two-way) via the TranslationConfig dataclass. Translations surface as alternatives[1] on SpeechEvent; this is backward-compatible because existing consumers only read alternatives[0].
  • max_endpoint_delay_ms parameter (500-3000ms) for tuning endpoint detection latency.
  • models.py with Literal type aliases (SonioxRTModels, SonioxLanguages) for IDE autocomplete -- follows the same pattern as the Google STT plugin.
  • Flush sentinel handling: _FlushSentinel is now mapped to Soniox's documented end-of-stream signal for clean session shutdown. Previously it was silently dropped.
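The new configuration surface can be pictured with a minimal sketch. It assumes the Soniox session config uses `translation` (with `type`, `target_language`, `language_a`/`language_b`) and `max_endpoint_delay_ms` keys; `build_config`, `to_dict`, and the three-language `SonioxLanguages` subset are hypothetical, and the PR's actual TranslationConfig fields may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal, Optional

# Illustrative subset only; the PR's models.py defines the real aliases.
SonioxLanguages = Literal["en", "ur", "es"]
TranslationType = Literal["one_way", "two_way"]


@dataclass(frozen=True)
class TranslationConfig:
    """Hypothetical shape; field names are assumptions, not the PR's API."""

    type: TranslationType
    target_language: Optional[SonioxLanguages] = None  # used by one_way
    language_a: Optional[SonioxLanguages] = None  # used by two_way
    language_b: Optional[SonioxLanguages] = None  # used by two_way

    def __post_init__(self) -> None:
        # Fail fast on incomplete configs instead of waiting for a server error.
        if self.type == "one_way" and not self.target_language:
            raise ValueError("one_way translation requires target_language")
        if self.type == "two_way" and not (self.language_a and self.language_b):
            raise ValueError("two_way translation requires language_a and language_b")

    def to_dict(self) -> Dict[str, Any]:
        if self.type == "one_way":
            return {"type": "one_way", "target_language": self.target_language}
        return {"type": "two_way", "language_a": self.language_a, "language_b": self.language_b}


def build_config(
    model: str,
    translation: Optional[TranslationConfig] = None,
    max_endpoint_delay_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a hypothetical session config; unset options are omitted."""
    config: Dict[str, Any] = {"model": model}
    if max_endpoint_delay_ms is not None:
        # Range taken from the PR description (500-3000 ms).
        if not 500 <= max_endpoint_delay_ms <= 3000:
            raise ValueError("max_endpoint_delay_ms must be between 500 and 3000")
        config["max_endpoint_delay_ms"] = max_endpoint_delay_ms
    if translation is not None:
        config["translation"] = translation.to_dict()
    return config
```

Omitting unset keys keeps the no-translation request identical to what the plugin sent before, which is the backward-compatibility property the PR relies on.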

SpeechStream cleanup

While adding translation, I noticed a few things in SpeechStream that could be simplified to match how other plugins (Deepgram, Google) structure their streaming:

  • Simplified connection lifecycle: Consolidated into a single _run() that connects, runs tasks, and cleans up. The base class _main_task() already handles retry logic, so the plugin doesn't need its own retry loop.
  • Reduced task count (4 -> 3): The intermediate audio_queue between _prepare_audio_task and _send_audio_task was removed; a single _send_task now reads _input_ch directly.
  • ws as parameter: Subtasks receive the WebSocket as a parameter rather than reading self._ws, similar to how the Deepgram plugin passes connection state.
  • Error propagation: Server errors now raise APIConnectionError (5xx) or APIStatusError (4xx) so the base class can decide whether to retry. Unexpected WebSocket closure raises instead of silently returning.
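The lifecycle above can be sketched with plain asyncio. This is a minimal illustration, not the plugin's code: `FakeWS`, `send_task`, `recv_task`, `run`, and `error_from_server` are hypothetical; the two exception classes are local stand-ins for the livekit.agents ones; and the `error_code`/`error_message`/`finished` message fields and the `END_OF_STREAM` value are assumptions about the Soniox protocol. Only the send and receive subtasks are shown.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional, Union


class APIConnectionError(Exception):
    """Local stand-in for livekit.agents.APIConnectionError (retryable)."""


class APIStatusError(Exception):
    """Local stand-in for livekit.agents.APIStatusError (4xx)."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class FlushSentinel:
    """Hypothetical stand-in for the base class's _FlushSentinel."""


END_OF_STREAM = ""  # assumption: placeholder for Soniox's end-of-stream frame


def error_from_server(msg: dict) -> Exception:
    # Assumption: Soniox error messages carry error_code / error_message.
    code = int(msg.get("error_code", 500))
    text = str(msg.get("error_message", "unknown error"))
    if 400 <= code < 500:
        return APIStatusError(text, status_code=code)  # client error
    return APIConnectionError(f"{code}: {text}")  # server error: may be retried


@dataclass
class FakeWS:
    """In-memory WebSocket stand-in so the sketch runs without a network."""

    incoming: List[str] = field(default_factory=list)
    sent: List[Union[bytes, str]] = field(default_factory=list)

    async def send(self, data: Union[bytes, str]) -> None:
        self.sent.append(data)

    async def recv(self) -> Optional[str]:
        # None models the server closing the socket.
        return self.incoming.pop(0) if self.incoming else None


async def send_task(ws: FakeWS, input_ch: "asyncio.Queue[Any]") -> None:
    # ws is passed in, and frames are read straight from the input channel.
    while True:
        item = await input_ch.get()
        if isinstance(item, FlushSentinel):
            await ws.send(END_OF_STREAM)
            return
        await ws.send(item)


async def recv_task(ws: FakeWS, events: List[dict]) -> None:
    while True:
        raw = await ws.recv()
        if raw is None:
            # Raise instead of returning so the caller sees the failure.
            raise APIConnectionError("WebSocket closed unexpectedly")
        msg = json.loads(raw)
        if "error_code" in msg:
            raise error_from_server(msg)
        if msg.get("finished"):
            return
        events.append(msg)


async def run(ws: FakeWS, input_ch: "asyncio.Queue[Any]", events: List[dict]) -> None:
    # One connection lifecycle; retries are left to the caller (the base class).
    tasks = [
        asyncio.create_task(send_task(ws, input_ch)),
        asyncio.create_task(recv_task(ws, events)),
    ]
    try:
        await asyncio.gather(*tasks)
    finally:
        # Cancel whatever is still running and wait for it to finish.
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Because `run` lets the first exception escape, the caller can decide to retry on APIConnectionError and give up on APIStatusError, without a retry loop inside the stream.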

Translation design

When translation is enabled, Soniox returns tokens with a translation_status field. The plugin routes tokens into dict-keyed accumulators:

  • translation_status is "none", "original", or absent -> final["original"]
  • translation_status == "translation" -> final["translation"]

At endpoint: alternatives[0] = original text, alternatives[1] = translation (if present). When translation is off, all tokens route to "original" and the event has a single alternative -- identical to the previous behavior. One code path handles both cases.
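A minimal sketch of this routing, assuming Soniox tokens are dicts with `text`, `is_final`, and an optional `translation_status`. `Accumulator`, `route_key`, `accumulate`, and `alternatives` are hypothetical names, with `Accumulator` standing in for the plugin's `_TokenAccumulator`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Accumulator:
    """Hypothetical stand-in for the plugin's _TokenAccumulator."""

    parts: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.parts.append(text)

    @property
    def text(self) -> str:
        return "".join(self.parts)


def route_key(token: dict) -> str:
    # "translation" gets its own bucket; "none", "original", or a missing
    # field all fall through to the original transcript.
    return "translation" if token.get("translation_status") == "translation" else "original"


def accumulate(tokens: list[dict]) -> dict[str, Accumulator]:
    final = {"original": Accumulator(), "translation": Accumulator()}
    for tok in tokens:
        if tok.get("is_final"):  # only finalized tokens are committed
            final[route_key(tok)].add(tok["text"])
    return final


def alternatives(final: dict[str, Accumulator]) -> list[str]:
    # alternatives[0] is always the original; [1] only if a translation exists.
    alts = [final["original"].text]
    if final["translation"].text:
        alts.append(final["translation"].text)
    return alts
```

With translation off, every token lands in "original", so `alternatives()` returns a single-element list and no separate code path is needed.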

What was NOT changed

  • _TokenAccumulator class -- already clean, kept as-is.
  • STT class -- kept as-is.
  • All STTOptions defaults preserved (model, sample_rate, num_channels, etc.).
  • Context dataclasses (ContextObject, ContextGeneralItem, ContextTranslationTerm) -- unchanged.

Files changed

  • livekit-plugins/livekit-plugins-soniox/livekit/plugins/soniox/stt.py -- SpeechStream rewrite, added TranslationConfig + STTOptions fields
  • livekit-plugins/livekit-plugins-soniox/livekit/plugins/soniox/__init__.py -- export TranslationConfig, SonioxLanguages, SonioxRTModels
  • livekit-plugins/livekit-plugins-soniox/livekit/plugins/soniox/models.py -- new file with Literal type aliases

Test plan

  • Two-way translation (en/ur) -- verified both directions produce correct alternatives[0] and alternatives[1]
  • One-way translation (to ur) -- verified single target language translation
  • No translation (backward compat) -- verified single alternative, identical to previous behavior
  • max_endpoint_delay_ms -- verified that the API accepts the parameter
  • Ruff format and lint -- all checks passed
  • mypy strict -- 0 new errors (1 pre-existing across all STT plugins)
  • Unit test suite (294 passed, 2 skipped, 9 errors from missing LiveKit server -- pre-existing)

Refs: #4943

Rewrite the Soniox STT plugin to support all WebSocket API features and
fix structural issues in the streaming implementation.

New features:
- Real-time translation (one-way and two-way) via TranslationConfig
- Configurable max_endpoint_delay_ms (500-3000ms)
- Typed Literal autocomplete for models, languages, and translation type
- Flush sentinel mapped to FINALIZE_MSG for clean session shutdown

Structural fixes:
- Remove dead reconnect machinery (_reconnect_event was never set)
- Eliminate unnecessary intermediate audio queue (2 tasks -> 1)
- Pass ws as parameter to subtasks instead of mutable self._ws
- Single connection lifecycle in _run(); base class handles retries
- Proper error semantics (5xx -> APIConnectionError, 4xx -> APIStatusError)
- Raise on unexpected WS closure instead of silent hang
- Handle _FlushSentinel (was silently dropped)
- Remove unreachable except clause

Translation design:
- alternatives[0] = original text (always present)
- alternatives[1] = translated text (when translation is enabled)
- Fully backward-compatible: all consumers read alternatives[0]
- Dict-keyed accumulators with no special cases

Refs: livekit#4943
@devin-ai-integration bot (Contributor) left a comment


Devin Review found 1 potential issue.

6 additional findings are listed in Devin Review.

@MSameerAbbas (Contributor, Author)

Hey @tinalenguyen, I saw this was assigned to you - hope it's helpful! Would love your review.

