> For the complete documentation index, see [llms.txt](https://docs.atlas.design/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.atlas.design/atlas-ai-studio-overview/node-index/audio-nodes.md).

# Audio Nodes

Audio Nodes are how Atlas generates and processes audio content: dialogue, music, sound effects, and voice transformations. Most game audio pipelines start with reference recordings, voice actors, or licensed libraries. Atlas audio nodes let teams generate placeholder and production-tier audio directly from text prompts, then transform and route it through standard game audio formats (MP3, WAV).

## When to use audio nodes

* **Voiceover prototyping.** Generate placeholder dialogue for NPCs, cutscenes, tutorials, and quest briefings during pre-production, before committing to professional voice actors.
* **Multi-character conversation generation.** Use Text to Dialogue or Audio Speech Workflow to produce full multi-speaker conversations with distinct voices per character.
* **Dynamic music systems.** Generate background music, combat themes, ambient soundscapes, and menu music from text descriptions. Useful for procedural audio systems where pre-rendered tracks don't fit every scenario.
* **Sound effects from text.** Generate one-shot effects, UI sounds, and environmental audio without sound library licensing.
* **Voice variety from a single recording.** Apply Voice Changer to a single source voice and produce multiple character variants for NPC barks, system announcements, or background chatter.

### Audio Speech Workflow

A pre-built workflow template for generating character dialogue audio with voice assignment and optional voice transformation. Combines text-to-speech synthesis with voice tagging and modification to produce character-specific speech output in MP3 format.

<figure><img src="/files/tM0lUdIema2SRJwiGPQo" alt="" width="563"><figcaption></figcaption></figure>

* **Add Audio Tags** node parses dialogue lines in `A:` / `B:` format and assigns distinct voices to each speaker
* **Text-to-speech** nodes convert tagged dialogue into synthesized audio using selectable backend models
* **Voice Changer** node applies transformations (pitch, timbre, effects) while maintaining speech timing and emotional tone
* Output format is MP3, suitable for direct integration into game audio pipelines
* Dialogue rhythm and pacing are preserved through voice modification stages
* Supports multiple speakers in a single workflow for conversation sequences

**Useful for:** NPC dialogue generation, cutscene voiceover prototyping, dynamic conversation systems, placeholder voice acting during development.

### Audio Nodes Overview

A collection of nodes for generating, transforming, and processing audio content within AI workflows. Enables synthesis of speech, music, and sound effects from text or other inputs, with support for voice manipulation, format conversion, and multi-speaker dialogue systems.

<figure><img src="/files/O7YBXp3uGtBT7elMZKlG" alt="" width="563"><figcaption></figcaption></figure>

* **Text-to-speech nodes** convert written dialogue and narration into spoken audio using selectable backend models
* **Voice modification nodes** apply pitch, timbre, and effect transformations to generated or imported audio
* **Audio tagging nodes** parse speaker-labeled text formats and route distinct voices to different characters
* **Format conversion nodes** output to common game audio formats (MP3, WAV) for engine integration
* Workflow templates combine multiple audio nodes for common scenarios like dialogue generation and voice acting
* Supports both single-voice synthesis and multi-speaker conversation sequences
* Audio timing and emotional delivery are preserved through processing chains

**Useful for:** Voiceover prototyping, placeholder dialogue during development, NPC barks and ambient speech, dynamic audio content generation, cutscene audio mockups.

### Text to Speech (ElevenLabs)

Converts written text into synthesized speech audio using ElevenLabs voice models. Generates spoken dialogue, narration, or voiceover content from text input with selectable voice characteristics and output format options.

* Accepts plain text input for conversion to speech audio
* Voice selection determines speaker identity, tone, and delivery style
* Output format options include MP3 and WAV for game engine compatibility
* Supports emotional delivery and natural speech patterns based on text content
* Model selection affects voice quality, language support, and synthesis characteristics
* Preserves punctuation-driven pacing and emphasis in spoken output

**Useful for:** Character dialogue prototyping, NPC voice generation, cutscene narration, placeholder voiceover during development, dynamic speech content for procedural dialogue systems.

### Simple Text to Speech

Converts text input into synthesized speech audio with selectable backend models. Provides a streamlined interface for generating voiceover and dialogue without multi-speaker tagging or voice transformation stages.

* **Text input** accepts plain dialogue, narration, or script lines without speaker labels
* **Backend selection** offers multiple text-to-speech models with varying voice quality and synthesis characteristics
* Outputs audio in standard formats compatible with game engine audio systems
* Single-voice synthesis — does not parse or route multiple speakers (use Audio Speech Workflow for multi-character dialogue)
* Suitable for rapid prototyping of voiceover content or generating placeholder audio
* Does not include built-in voice modification — chain with Voice Changer node for pitch or timbre adjustments

**Useful for:** Quick voiceover prototyping, NPC barks and system announcements, tutorial narration, placeholder dialogue during pre-production, ambient voice content.

### Add Audio Tags (ElevenLabs)

Parses dialogue text in speaker-labeled format and inserts voice assignment tags for downstream text-to-speech processing. Recognizes `A:` / `B:` / `C:` prefixes and maps each speaker to a distinct voice identifier, enabling multi-character conversations within a single audio workflow.

<figure><img src="/files/XcQd64Jk6qu3VQqJ5VAU" alt="" width="563"><figcaption></figcaption></figure>

* Accepts plain text input with speaker labels (`A: Hello`, `B: Hi there`) and outputs tagged text ready for synthesis
* Each unique speaker prefix is automatically assigned a different voice identifier
* Supports unlimited speakers within a single dialogue block; speaker assignments persist throughout the text
* Tagged output connects directly to text-to-speech nodes that recognize voice metadata
* Preserves line breaks and dialogue structure while adding voice routing information
* No manual voice configuration required; speaker-to-voice mapping is handled automatically based on label order

**Useful for:** Multi-character cutscene dialogue, NPC conversation sequences, branching dialogue prototyping, automated voice casting for scripted exchanges.
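
The label parsing described above can be approximated in a few lines of Python. This is an illustrative sketch, not the node's actual implementation: the `VOICE_POOL` identifiers, the `tag_dialogue` function, and the single-uppercase-letter label pattern are assumptions made for the example.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical voice identifiers; real ones come from the backend's voice library.
VOICE_POOL: Tuple[str, ...] = ("voice_1", "voice_2", "voice_3")

# Matches lines such as "A: Hello" (one uppercase letter, a colon, then the line text).
LINE_RE = re.compile(r"^\s*([A-Z]):\s*(.+?)\s*$")


def tag_dialogue(
    script: str, voices: Sequence[str] = VOICE_POOL
) -> List[Tuple[str, str, str]]:
    """Return (speaker, voice, text) triples, assigning voices by order of first appearance."""
    assignments: Dict[str, str] = {}
    tagged: List[Tuple[str, str, str]] = []
    for raw in script.splitlines():
        match = LINE_RE.match(raw)
        if match is None:
            continue  # this sketch skips blank or unlabeled lines
        speaker, text = match.groups()
        if speaker not in assignments:
            # Wrap around if the script has more speakers than the pool has voices.
            assignments[speaker] = voices[len(assignments) % len(voices)]
        tagged.append((speaker, assignments[speaker], text))
    return tagged


if __name__ == "__main__":
    for row in tag_dialogue("A: Hello\nB: Hi there\nA: How are you?"):
        print(row)
```

Because the mapping follows label order, swapping which character speaks first also swaps the voices in this sketch, which is worth keeping in mind when editing scripts.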

### Text to Dialogue (ElevenLabs)

Generates multi-speaker conversation audio from text, with automatic voice assignment and character-based voice selection. Parses dialogue scripts and synthesizes speech for each speaker using distinct voices, producing conversation-ready audio output.

<figure><img src="/files/1vW5es5hq96pBNi2zaho" alt="" width="563"><figcaption></figcaption></figure>

* **Voice assignment** maps each speaker in the script to a selectable voice from the backend's voice library
* **Character category browser** organizes available voices by archetype (hero, villain, merchant, etc.) for quick casting
* Accepts dialogue text with speaker labels and outputs synchronized multi-speaker audio
* Maintains conversation timing and turn-taking between speakers automatically
* Outputs audio in formats suitable for direct integration into game dialogue systems
* Supports rapid iteration on character voice selection without re-entering dialogue text

**Useful for:** NPC conversation prototyping, cutscene dialogue generation, placeholder voiceover during development, character voice casting exploration, dynamic dialogue system testing.

### Voice Changer (ElevenLabs)

Applies real-time voice transformations to audio input, modifying pitch, timbre, and tonal characteristics while preserving speech timing and intelligibility. Processes synthesized or recorded audio to create character-specific vocal variations.

<figure><img src="/files/d5sltsftSQd2S8kdjv2n" alt="" width="563"><figcaption></figcaption></figure>

* Accepts audio input from text-to-speech nodes or external audio sources
* Transforms vocal characteristics including pitch shift, formant adjustment, and timbre modification
* Background noise removal option cleans audio artifacts during transformation
* Maintains original speech rhythm, pacing, and emotional delivery through processing
* Outputs modified audio in standard formats compatible with game audio pipelines
* Works downstream of voice tagging and text-to-speech nodes in multi-stage workflows
* Speech timing alignment ensures lip-sync compatibility is preserved

**Useful for:** Creating vocal variety for NPCs from a single voice source, transforming placeholder dialogue, generating distinct character voices in conversation sequences, prototyping voiceover with limited recording assets.

### Voice Selection

A utility node for browsing, auditioning, and selecting text-to-speech voices from available backend voice libraries. Provides categorized voice lists with metadata (language, gender, style tags) and in-editor audio preview to streamline voice assignment for character dialogue and narration.

<figure><img src="/files/IO5SUObGha6Zi3foATB1" alt="" width="563"><figcaption></figcaption></figure>

* Organizes available voices by category (Narration, Character, Emotional, etc.) and language for quick filtering
* Displays voice metadata including gender, accent, age range, and descriptive style tags
* Built-in audio preview plays sample recordings of each voice directly in the editor without generating new audio
* Selected voice identifier outputs as a string parameter for connection to downstream text-to-speech nodes
* Supports multilingual voice libraries; language filters adapt to the backend's available voice catalog
* Voice availability and categorization depend on the connected backend model

**Useful for:** Character voice casting, dialogue prototyping, matching voice tone to narrative context, rapid iteration on NPC speech styles, placeholder voiceover selection.

### Composition Plan (ElevenLabs)

A node for structuring and organizing multi-segment audio compositions with planned speaker assignments, timing, and content direction. Generates a structured plan that can be consumed by downstream audio generation nodes to produce coherent multi-part audio sequences.

* Accepts text descriptions of audio segments, speaker roles, and sequence structure as input
* Outputs a structured composition plan defining segment order, speaker assignments, and content guidelines
* Enables pre-planning of complex audio sequences before synthesis, separating creative direction from generation
* Plan format is compatible with ElevenLabs audio generation workflows for execution
* Supports multi-speaker scenarios where different voices are assigned to specific segments or roles
* Allows iteration on composition structure without regenerating audio until the plan is finalized

**Useful for:** Planning multi-character dialogue sequences, structuring narrative voiceover with multiple segments, organizing cutscene audio with speaker transitions, prototyping complex conversation flows before full synthesis.

### Create Music (ElevenLabs)

Generates music tracks from text prompts or structured composition plans. Accepts natural language descriptions of musical style, mood, instrumentation, and structure, producing audio output suitable for in-game music, prototyping, and placeholder soundtracks.

<figure><img src="/files/PX2c50nOcTlZGtL0ebLM" alt="" width="563"><figcaption></figcaption></figure>

* Accepts text prompts describing genre, tempo, instrumentation, and emotional tone
* Supports structured **Composition Plan** JSON input for more precise control over musical arrangement and timing
* **Strict duration** toggle enforces exact output length when enabled; relaxed mode allows natural musical phrase endings
* Output length for the same composition plan can differ depending on the strict duration setting (e.g., 2:30 vs 4:00)
* Produces audio files ready for integration into game engines
* Works standalone with text prompts or in workflows consuming composition plan data from upstream nodes

**Useful for:** Background music generation, combat themes, ambient soundscapes, menu music, rapid prototyping of audio mood boards, placeholder soundtrack creation during development.
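
To make the idea of a structured plan concrete, here is a minimal sketch of building a plan as JSON and checking its total length before generation. The field names (`global_styles`, `sections`, `name`, `duration_ms`) and both helper functions are hypothetical and chosen for illustration only; they are not the node's confirmed schema.

```python
import json
from typing import Any, Dict, List


def build_plan(global_styles: List[str], sections: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble a plan dictionary (hypothetical field names, not a confirmed schema)."""
    return {"global_styles": list(global_styles), "sections": list(sections)}


def total_duration_ms(plan: Dict[str, Any]) -> int:
    """Sum the planned section lengths in milliseconds."""
    return sum(section["duration_ms"] for section in plan["sections"])


def format_mmss(duration_ms: int) -> str:
    """Render milliseconds as m:ss, dropping any fractional second."""
    seconds = duration_ms // 1000
    return f"{seconds // 60}:{seconds % 60:02d}"


if __name__ == "__main__":
    plan = build_plan(
        ["orchestral", "tense"],
        [
            {"name": "intro", "duration_ms": 30_000},
            {"name": "combat loop", "duration_ms": 90_000},
            {"name": "outro", "duration_ms": 30_000},
        ],
    )
    print(json.dumps(plan, indent=2))
    print("planned length:", format_mmss(total_duration_ms(plan)))
```

Summing the planned durations up front gives a reference length to compare against the generated track when the strict duration toggle is off.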

### Simple Music Generation

Generates original music compositions from text descriptions using selectable backend models. Produces audio output suitable for background music, ambient soundscapes, and musical prototyping in game environments.

<figure><img src="/files/2XA6onxqflwoY1XYZx8s" alt="" width="563"><figcaption></figcaption></figure>

* **Text prompt input** describes musical style, mood, instrumentation, and tempo for the desired composition
* **Backend selection dropdown** offers multiple music generation engines with varying style capabilities and output characteristics
* Outputs audio in standard formats compatible with game engine import pipelines
* Generation parameters vary by selected backend; some models support extended duration or specific genre specialization
* Produces royalty-free music assets that can be integrated directly into game builds
* Single-node workflow for rapid iteration on musical concepts without external audio software

**Useful for:** Placeholder background music during development, ambient soundscapes for exploration areas, dynamic music prototyping, mood-based audio testing, quick musical mockups for cutscenes.

### Simple Modify Music

Transforms existing music audio through four modification modes: generating vocal lyrics over instrumental tracks, inpainting specific time segments, remixing with style changes, or creating cover versions with different instrumentation or vocals.

<figure><img src="/files/oIv3roBkdMHzuquJyehK" alt="" width="563"><figcaption></figcaption></figure>

* **Lyrics Only mode** adds sung vocals to instrumental music using provided text prompts
* **Inpainting mode** regenerates a specified time range within the track while preserving surrounding audio continuity
* **Remix mode** restructures the existing composition with tempo, arrangement, or stylistic variations
* **Cover mode** recreates the track with alternative genre, instrumentation, or vocal interpretation
* Accepts audio input in common formats alongside text prompts describing the desired modification
* Preserves musical structure and timing where appropriate to the selected mode
* Output suitable for adaptive music systems, combat variations, and dynamic soundtrack prototyping

**Useful for:** Creating alternate versions of combat music, generating vocal variants of existing themes, prototyping adaptive soundtrack transitions, producing genre variations of menu music.

### Sound Effects (ElevenLabs)

Generates sound effects from text descriptions using ElevenLabs audio synthesis. Takes a written prompt describing the desired sound and outputs an audio file suitable for game integration.

<figure><img src="/files/3tbMUVcs4XDpyV2PwmGS" alt="" width="563"><figcaption></figcaption></figure>

* **Text prompt input** describes the sound effect characteristics (type, duration, mood, context)
* **Duration parameter** controls the length of the generated sound effect in seconds
* Outputs audio in standard game-compatible formats for immediate engine integration
* Produces one-shot effects, ambient loops, UI sounds, and environmental audio from natural language descriptions
* Quality and acoustic properties vary based on prompt specificity and descriptive detail
* No audio source material required; generates entirely from text prompts

**Useful for:** Rapid prototyping of placeholder sound effects, generating UI audio feedback, creating one-off environmental sounds, iterating on audio design concepts during pre-production.

## Common pitfalls

* **Using Simple Text to Speech for multi-character dialogue.** Simple Text to Speech does not parse speaker labels. For conversations with multiple voices, use Audio Speech Workflow or Text to Dialogue.
* **Skipping voice selection.** Default voices often don't match the character archetype you want. Use Voice Selection to audition and pick before generating large amounts of audio.
* **Modifying audio that's already going to Lipsync.** Voice Changer alters pitch and timbre, and Lipsync timing is based on the speech rhythm of the audio it receives. If you need both, apply Voice Changer first, then run Lipsync on the transformed audio so the animation matches the final voice.
* **Not specifying duration on Sound Effects.** Without an explicit duration parameter, generated effects may be too short or too long for game integration. Always set duration to match the in-game usage.
* **Generating final-quality audio without a budget.** Music and dialogue generation consume more credits than utility operations. For pre-production prototyping, use Simple Text to Speech and Simple Music Generation; reserve full Audio Speech Workflow runs and ElevenLabs nodes for hero moments.

## Related nodes

* [Input Nodes](/atlas-ai-studio-overview/node-index/input-nodes.md) — Input Audio supplies external audio for processing, Voice Changer, or analysis. Input Text supplies dialogue scripts.
* [Utility Nodes](/atlas-ai-studio-overview/node-index/utility-nodes.md) — Text Generation (LLM), Combine Text, and Structured Output can generate or structure dialogue scripts that then feed audio nodes.
* [Video Nodes](/atlas-ai-studio-overview/node-index/video-nodes.md) — Lipsync pairs audio generated here with a character image to produce dialogue video.
* [API Nodes](/atlas-ai-studio-overview/node-index/api-nodes.md) — for exposing audio-generation workflows as callable endpoints for in-engine procedural audio.

## Frequently asked questions

**What audio formats does the platform produce?**

MP3 and WAV are standard outputs. Both formats are compatible with Unity, Unreal Engine, Godot, and most other game engines without conversion.
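
Before importing a generated WAV file into an engine, it can help to confirm its channel count, sample rate, and duration. The sketch below uses Python's standard `wave` module; `describe_wav` and `make_silence` are hypothetical helper names, and the silent clip stands in for real generated audio.

```python
import io
import wave
from typing import Dict, Union


def describe_wav(data: bytes) -> Dict[str, Union[int, float]]:
    """Read basic properties from uncompressed PCM WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def make_silence(seconds: float, rate: int = 44100) -> bytes:
    """Create a mono 16-bit silent clip, used here as stand-in audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silence(0.5)))
```

The `wave` module only reads uncompressed PCM WAV data; MP3 output would need a separate decoder.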

**Can I use generated audio commercially in shipped games?**

Audio rights depend on the underlying generation backend's terms (e.g., ElevenLabs has its own commercial licensing). Verify the backend's terms before shipping generated audio in commercial products. For risk-sensitive use cases, treat generated audio as prototype-grade and replace with licensed or studio-recorded audio before release.

**How do I generate dialogue with different voices for different characters?**

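Use Audio Speech Workflow or Text to Dialogue with speaker-labeled dialogue (`A: Hello` / `B: Hi there`). In Audio Speech Workflow, the Add Audio Tags node parses these labels and routes each speaker to a distinct voice automatically. Text to Dialogue handles voice assignment itself and lets you pick a voice per speaker.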

**Can the music generation nodes produce loopable tracks?**

Create Music and Simple Music Generation can be prompted for loop-friendly compositions, though precise loop-point control is limited. For seamless looping, edit the output in a DAW or use shorter tracks designed to loop.
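
One common way to make a track loop outside a DAW is to crossfade its tail into its head. The sketch below shows that technique on a plain list of samples; `make_loopable` is a hypothetical helper, not a feature of any Atlas node, and a linear fade is assumed for simplicity.

```python
from typing import List


def make_loopable(samples: List[float], fade_len: int) -> List[float]:
    """Crossfade the last fade_len samples into the first fade_len samples.

    The result starts right after the original head and ends with a blend that
    fades from the original tail into the original head, so jumping from the
    last sample back to the first is continuous. Output is fade_len shorter.
    """
    if fade_len <= 0 or fade_len * 2 > len(samples):
        raise ValueError("fade_len must be positive and at most half the track length")
    head = samples[:fade_len]
    tail = samples[-fade_len:]
    body = samples[fade_len:len(samples) - fade_len]
    blend = []
    for i in range(fade_len):
        t = (i + 1) / (fade_len + 1)  # rises toward 1, never divides by zero
        blend.append(tail[i] * (1.0 - t) + head[i] * t)
    return body + blend


if __name__ == "__main__":
    track = [float(n) for n in range(10)]
    print(make_loopable(track, fade_len=2))
```

A linear fade can dip in perceived loudness on uncorrelated material; an equal-power curve is a common alternative when that matters.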

**What's the difference between Simple Text to Speech and Text to Speech (ElevenLabs)?**

Simple Text to Speech is a generic wrapper with multiple backend options for fast prototyping. Text to Speech (ElevenLabs) is a specialized node for the ElevenLabs voice library with finer control over voice selection and emotional delivery.

**Can I integrate audio nodes into a real-time game pipeline?**

Yes, via [API Nodes](/atlas-ai-studio-overview/node-index/api-nodes.md). Export your audio workflow as an API and call it from your game backend or engine. Note that generation latency means real-time on-demand audio works best for non-time-critical content (post-action barks, ambient generation) rather than instant response audio.
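
Because of that latency, a game backend will often cache generated clips so that repeated lines are only generated once. The sketch below assumes a hypothetical `generate(text, voice)` callable that wraps whatever call your exported workflow exposes; the `AudioCache` class and its key format are illustrative, not part of Atlas.

```python
import hashlib
from typing import Callable, Dict


class AudioCache:
    """In-memory cache of generated clips, keyed by voice and text."""

    def __init__(self, generate: Callable[[str, str], bytes]) -> None:
        # `generate` is a stand-in for a call to an exported audio workflow.
        self._generate = generate
        self._store: Dict[str, bytes] = {}
        self.misses = 0

    @staticmethod
    def key(text: str, voice: str) -> str:
        # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
        return hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, text: str, voice: str) -> bytes:
        cache_key = self.key(text, voice)
        if cache_key not in self._store:
            self.misses += 1  # only cache misses pay the generation latency
            self._store[cache_key] = self._generate(text, voice)
        return self._store[cache_key]


if __name__ == "__main__":
    def fake_generate(text: str, voice: str) -> bytes:
        return f"{voice}:{text}".encode("utf-8")  # placeholder for real audio bytes

    cache = AudioCache(fake_generate)
    cache.get("Stay alert.", "guard")
    cache.get("Stay alert.", "guard")
    print("generation calls:", cache.misses)
```

In a shipped pipeline, the same keying scheme could back a disk or CDN cache so clips survive restarts.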

