Audio Nodes
Audio Nodes generate speech, music, and sound effects from text and other inputs. They power character dialogue, NPC voices, dynamic music, environmental sound design, and full multi-speaker conversations.
Audio Nodes are how Atlas generates and processes audio content: dialogue, music, sound effects, and voice transformations. Most game audio pipelines start with reference recordings, voice actors, or licensed libraries. Atlas audio nodes let teams generate placeholder and production-tier audio directly from text prompts, then transform it and export it in standard game audio formats (MP3, WAV).
When to use audio nodes
Voiceover prototyping. Generate placeholder dialogue for NPCs, cutscenes, tutorials, and quest briefings during pre-production, before committing to professional voice actors.
Multi-character conversation generation. Use Text to Dialogue or Audio Speech Workflow to produce full multi-speaker conversations with distinct voices per character.
Dynamic music systems. Generate background music, combat themes, ambient soundscapes, and menu music from text descriptions. Useful for procedural audio systems where pre-rendered tracks don't fit every scenario.
Sound effects from text. Generate one-shot effects, UI sounds, and environmental audio without sound library licensing.
Voice variety from a single recording. Apply Voice Changer to a single source voice and produce multiple character variants for NPC barks, system announcements, or background chatter.
Audio Speech Workflow
A pre-built workflow template for generating character dialogue audio with voice assignment and optional voice transformation. Combines text-to-speech synthesis with voice tagging and modification to produce character-specific speech output in MP3 format.

Add Audio Tags node parses dialogue lines in A:/B: format and assigns distinct voices to each speaker
Text-to-speech nodes convert tagged dialogue into synthesized audio using selectable backend models
Voice Changer node applies transformations (pitch, timbre, effects) while maintaining speech timing and emotional tone
Output format is MP3, suitable for direct integration into game audio pipelines
Dialogue rhythm and pacing are preserved through voice modification stages
Supports multiple speakers in a single workflow for conversation sequences
Useful for: NPC dialogue generation, cutscene voiceover prototyping, dynamic conversation systems, placeholder voice acting during development.
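The stage order of this workflow can be sketched as plain function composition. This is a minimal illustration, not Atlas's internal implementation: `tag`, `synthesize`, and `change_voice` are hypothetical stand-ins for the Add Audio Tags, text-to-speech, and Voice Changer nodes, and audio is represented as raw bytes.

```python
from typing import Callable, Optional

# Hypothetical stage signatures standing in for the workflow's nodes.
Tagger = Callable[[str], list[tuple[str, str]]]   # script -> [(voice_id, line)]
Synth = Callable[[str, str], bytes]               # (voice_id, line) -> audio
Changer = Callable[[bytes], bytes]                # audio -> transformed audio


def run_speech_workflow(script: str, tag: Tagger, synthesize: Synth,
                        change_voice: Optional[Changer] = None) -> list[bytes]:
    """Run tag -> synthesize -> optional voice change, producing one clip per line."""
    clips = []
    for voice_id, line in tag(script):
        audio = synthesize(voice_id, line)
        if change_voice is not None:  # the voice transformation stage is optional
            audio = change_voice(audio)
        clips.append(audio)
    return clips


# Usage with fake stages (no real synthesis happens here).
fake_tag = lambda s: [("voice_1", "Hello"), ("voice_2", "Hi there")]
fake_synth = lambda v, t: f"{v}:{t}".encode()
print(run_speech_workflow("A: Hello\nB: Hi there", fake_tag, fake_synth, bytes.upper))
# prints [b'VOICE_1:HELLO', b'VOICE_2:HI THERE']
```

Keeping each stage a separate callable mirrors the node graph: swapping the text-to-speech backend or dropping the Voice Changer stage changes one argument, not the script.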
Audio Nodes Overview
A collection of nodes for generating, transforming, and processing audio content within AI workflows. Enables synthesis of speech, music, and sound effects from text or other inputs, with support for voice manipulation, format conversion, and multi-speaker dialogue systems.

Text-to-speech nodes convert written dialogue and narration into spoken audio using selectable backend models
Voice modification nodes apply pitch, timbre, and effect transformations to generated or imported audio
Audio tagging nodes parse speaker-labeled text formats and route distinct voices to different characters
Format conversion nodes output to common game audio formats (MP3, WAV) for engine integration
Workflow templates combine multiple audio nodes for common scenarios like dialogue generation and voice acting
Supports both single-voice synthesis and multi-speaker conversation sequences
Audio timing and emotional delivery are preserved through processing chains
Useful for: Voiceover prototyping, placeholder dialogue during development, NPC barks and ambient speech, dynamic audio content generation, cutscene audio mockups.
Text to Speech (ElevenLabs)
Converts written text into synthesized speech audio using ElevenLabs voice models. Generates spoken dialogue, narration, or voiceover content from text input with selectable voice characteristics and output format options.
Accepts plain text input for conversion to speech audio
Voice selection determines speaker identity, tone, and delivery style
Output format options include MP3 and WAV for game engine compatibility
Supports emotional delivery and natural speech patterns based on text content
Model selection affects voice quality, language support, and synthesis characteristics
Preserves punctuation-driven pacing and emphasis in spoken output
Useful for: Character dialogue prototyping, NPC voice generation, cutscene narration, placeholder voiceover during development, dynamic speech content for procedural dialogue systems.
Simple Text to Speech
Converts text input into synthesized speech audio with selectable backend models. Provides a streamlined interface for generating voiceover and dialogue without multi-speaker tagging or voice transformation stages.
Text input accepts plain dialogue, narration, or script lines without speaker labels
Backend selection offers multiple text-to-speech models with varying voice quality and synthesis characteristics
Outputs audio in standard formats compatible with game engine audio systems
Single-voice synthesis — does not parse or route multiple speakers (use Audio Speech Workflow for multi-character dialogue)
Suitable for rapid prototyping of voiceover content or generating placeholder audio
Does not include built-in voice modification — chain with Voice Changer node for pitch or timbre adjustments
Useful for: Quick voiceover prototyping, NPC barks and system announcements, tutorial narration, placeholder dialogue during pre-production, ambient voice content.
Add Audio Tags (ElevenLabs)
Parses dialogue text in speaker-labeled format and inserts voice assignment tags for downstream text-to-speech processing. Recognizes A: / B: / C: prefixes and maps each speaker to a distinct voice identifier, enabling multi-character conversations within a single audio workflow.

Accepts plain text input with speaker labels (A: Hello, B: Hi there) and outputs tagged text ready for synthesis
Each unique speaker prefix is automatically assigned a different voice identifier
Supports unlimited speakers within a single dialogue block; speaker assignments persist throughout the text
Tagged output connects directly to text-to-speech nodes that recognize voice metadata
Preserves line breaks and dialogue structure while adding voice routing information
No manual voice configuration required; speaker-to-voice mapping is handled automatically based on label order
Useful for: Multi-character cutscene dialogue, NPC conversation sequences, branching dialogue prototyping, automated voice casting for scripted exchanges.
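A minimal sketch of label-order voice assignment, assuming single-letter labels such as `A:` and `B:`. The `voice_N` identifiers and the `(voice_id, text)` output shape are hypothetical; the node's real tagged-text format is not documented here.

```python
import re

# Matches a single-letter speaker label such as "A:" at the start of a line.
LABEL = re.compile(r"^\s*([A-Z]):\s*(.*)$")


def tag_dialogue(script: str) -> list[tuple[str, str]]:
    """Split a labeled script into (voice_id, text) turns.

    Voice IDs follow each speaker's first appearance: the first new label gets
    "voice_1", the next "voice_2", and so on. Unlabeled lines continue the
    previous turn; unlabeled lines before any label are ignored.
    """
    voices: dict[str, str] = {}
    turns: list[tuple[str, str]] = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        match = LABEL.match(raw)
        if match:
            speaker, text = match.groups()
            if speaker not in voices:
                voices[speaker] = f"voice_{len(voices) + 1}"
            turns.append((voices[speaker], text.strip()))
        elif turns:
            voice_id, text = turns[-1]
            turns[-1] = (voice_id, text + "\n" + raw.strip())
    return turns
```

For example, `tag_dialogue("A: Hello\nB: Hi there\nA: How are you?")` maps both `A` lines to `voice_1` and the `B` line to `voice_2`.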
Text to Dialogue (ElevenLabs)
Generates multi-speaker conversation audio from text, with automatic voice assignment and character-based voice selection. Parses dialogue scripts and synthesizes speech for each speaker using distinct voices, producing conversation-ready audio output.

Voice assignment maps each speaker in the script to a selectable voice from the backend's voice library
Character category browser organizes available voices by archetype (hero, villain, merchant, etc.) for quick casting
Accepts dialogue text with speaker labels and outputs synchronized multi-speaker audio
Maintains conversation timing and turn-taking between speakers automatically
Outputs audio in formats suitable for direct integration into game dialogue systems
Supports rapid iteration on character voice selection without re-entering dialogue text
Useful for: NPC conversation prototyping, cutscene dialogue generation, placeholder voiceover during development, character voice casting exploration, dynamic dialogue system testing.
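Keeping the casting decisions in a separate table from the script is what makes voice iteration cheap: recasting a character edits one entry, not the dialogue. The sketch below assumes a list of text/voice pairs as the request shape; the voice IDs are placeholders, and the exact request format should be checked against the backend's documentation.

```python
def build_dialogue_inputs(turns: list[tuple[str, str]],
                          casting: dict[str, str]) -> list[dict[str, str]]:
    """Pair each (character, line) turn with the voice cast for that character."""
    missing = {character for character, _ in turns if character not in casting}
    if missing:
        raise KeyError(f"No voice cast for: {sorted(missing)}")
    return [{"text": line, "voice_id": casting[character]} for character, line in turns]


script = [("Merchant", "Fresh apples!"), ("Hero", "How much?")]
# Placeholder IDs; a real ID would come from the backend's voice library.
casting = {"Merchant": "VOICE_MERCHANT", "Hero": "VOICE_HERO"}
inputs = build_dialogue_inputs(script, casting)
```

Failing fast on an uncast character avoids spending a generation run on a conversation that would fall back to a default voice.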
Voice Changer (ElevenLabs)
Applies real-time voice transformations to audio input, modifying pitch, timbre, and tonal characteristics while preserving speech timing and intelligibility. Processes synthesized or recorded audio to create character-specific vocal variations.

Accepts audio input from text-to-speech nodes or external audio sources
Transforms vocal characteristics including pitch shift, formant adjustment, and timbre modification
Background noise removal option cleans audio artifacts during transformation
Maintains original speech rhythm, pacing, and emotional delivery through processing
Outputs modified audio in standard formats compatible with game audio pipelines
Works downstream of voice tagging and text-to-speech nodes in multi-stage workflows
Speech timing alignment ensures lip-sync compatibility is preserved
Useful for: Creating vocal variety for NPCs from a single voice source, transforming placeholder dialogue, generating distinct character voices in conversation sequences, prototype voiceover with limited recording assets.
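Because lip-sync depends on timing being preserved, one cheap sanity check before handing transformed audio to Lipsync is to compare clip lengths. This sketch uses Python's standard `wave` module, so it only reads WAV files; the 50 ms tolerance is an arbitrary assumption, and matching total length does not prove per-word alignment.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return a WAV file's duration from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def timing_preserved(original: str, transformed: str, tolerance_s: float = 0.05) -> bool:
    """True if the transformed clip's length is within tolerance of the original's."""
    drift = abs(wav_duration_seconds(original) - wav_duration_seconds(transformed))
    return drift <= tolerance_s
```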
Voice Selection
A utility node for browsing, auditioning, and selecting text-to-speech voices from available backend voice libraries. Provides categorized voice lists with metadata (language, gender, style tags) and in-editor audio preview to streamline voice assignment for character dialogue and narration.

Organizes available voices by category (Narration, Character, Emotional, etc.) and language for quick filtering
Displays voice metadata including gender, accent, age range, and descriptive style tags
Built-in audio preview plays sample recordings of each voice directly in the editor without generating new audio
Selected voice identifier outputs as a string parameter for connection to downstream text-to-speech nodes
Supports multilingual voice libraries; language filters adapt to the backend's available voice catalog
Voice availability and categorization depend on the connected backend model
Useful for: Character voice casting, dialogue prototyping, matching voice tone to narrative context, rapid iteration on NPC speech styles, placeholder voiceover selection.
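The filtering this node offers can be sketched over a voice catalog held as plain records. The record keys (`id`, `language`, `gender`, `tags`) are a hypothetical shape based on the metadata listed above, not the node's actual output format.

```python
from typing import Iterable, Optional


def filter_voices(catalog: list[dict], language: Optional[str] = None,
                  gender: Optional[str] = None, tags: Iterable[str] = ()) -> list[str]:
    """Return the IDs of catalog voices that match every filter given."""
    wanted = set(tags)
    matches = []
    for voice in catalog:
        if language is not None and voice.get("language") != language:
            continue
        if gender is not None and voice.get("gender") != gender:
            continue
        if not wanted.issubset(voice.get("tags", ())):
            continue  # voice lacks at least one requested style tag
        matches.append(voice["id"])
    return matches


catalog = [
    {"id": "v1", "language": "en", "gender": "female", "tags": ["calm", "narration"]},
    {"id": "v2", "language": "en", "gender": "male", "tags": ["gruff"]},
    {"id": "v3", "language": "de", "gender": "female", "tags": ["calm"]},
]
narrators = filter_voices(catalog, language="en", tags=["narration"])
```

Filtering before auditioning keeps the preview list short, which matters most for large multilingual catalogs.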
Composition Plan (ElevenLabs)
A node for structuring and organizing multi-segment audio compositions with planned speaker assignments, timing, and content direction. Generates a structured plan that can be consumed by downstream audio generation nodes to produce coherent multi-part audio sequences.
Accepts text descriptions of audio segments, speaker roles, and sequence structure as input
Outputs a structured composition plan defining segment order, speaker assignments, and content guidelines
Enables pre-planning of complex audio sequences before synthesis, separating creative direction from generation
Plan format is compatible with ElevenLabs audio generation workflows for execution
Supports multi-speaker scenarios where different voices are assigned to specific segments or roles
Allows iteration on composition structure without regenerating audio until the plan is finalized
Useful for: Planning multi-character dialogue sequences, structuring narrative voiceover with multiple segments, organizing cutscene audio with speaker transitions, prototyping complex conversation flows before full synthesis.
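Separating the plan from synthesis means the plan is just data that can be edited and reviewed cheaply. The sketch below models one as a list of segments; every field name (`name`, `speaker`, `direction`, `duration_ms`) is hypothetical and is not the node's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Segment:
    name: str
    speaker: str       # role or voice assigned to this segment
    direction: str     # content guideline for the generator
    duration_ms: int


@dataclass
class CompositionPlan:
    segments: list[Segment] = field(default_factory=list)

    def speakers(self) -> list[str]:
        """Distinct speakers in order of first appearance."""
        seen: list[str] = []
        for segment in self.segments:
            if segment.speaker not in seen:
                seen.append(segment.speaker)
        return seen

    def to_json(self) -> str:
        """Serialize the plan (nested dataclasses included) to JSON."""
        return json.dumps(asdict(self), indent=2)


plan = CompositionPlan([
    Segment("intro", "Narrator", "Set the scene at the city gate", 8000),
    Segment("challenge", "Guard", "Suspicious, clipped delivery", 12000),
    Segment("reply", "Narrator", "Calm resolution", 6000),
])
```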
Create Music (ElevenLabs)
Generates music tracks from text prompts or structured composition plans. Accepts natural language descriptions of musical style, mood, instrumentation, and structure, producing audio output suitable for in-game music, prototyping, and placeholder soundtracks.

Accepts text prompts describing genre, tempo, instrumentation, and emotional tone
Supports structured Composition Plan JSON input for more precise control over musical arrangement and timing
Strict duration toggle enforces exact output length when enabled; relaxed mode allows natural musical phrase endings
Output varies in length based on strict duration setting (e.g., 2:30 vs 4:00 for the same composition plan)
Produces audio files ready for integration into game engines
Works standalone with text prompts or in workflows consuming composition plan data from upstream nodes
Useful for: Background music generation, combat themes, ambient soundscapes, menu music, rapid prototyping of audio mood boards, placeholder soundtrack creation during development.
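With strict duration enabled, the rendered length should match what the plan specifies, while relaxed mode may differ. A small helper makes the expected length explicit before spending a generation run; it assumes, hypothetically, that per-section durations are available in milliseconds.

```python
def format_duration(total_ms: int) -> str:
    """Format milliseconds as m:ss, e.g. 150000 -> '2:30'."""
    minutes, seconds = divmod(round(total_ms / 1000), 60)
    return f"{minutes}:{seconds:02d}"


def planned_length(section_durations_ms: list[int]) -> str:
    """Length a strict-duration render should match: the sum of planned sections."""
    return format_duration(sum(section_durations_ms))


print(planned_length([30000, 60000, 60000]))  # prints 2:30
```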
Simple Music Generation
Generates original music compositions from text descriptions using selectable backend models. Produces audio output suitable for background music, ambient soundscapes, and musical prototyping in game environments.

Text prompt input describes musical style, mood, instrumentation, and tempo for the desired composition
Backend selection dropdown offers multiple music generation engines with varying style capabilities and output characteristics
Outputs audio in standard formats compatible with game engine import pipelines
Generation parameters vary by selected backend; some models support extended duration or specific genre specialization
Produces music assets that can be integrated directly into game builds; commercial rights depend on the selected backend's licensing terms, so verify them before shipping
Single-node workflow for rapid iteration on musical concepts without external audio software
Useful for: Placeholder background music during development, ambient soundscapes for exploration areas, dynamic music prototyping, mood-based audio testing, quick musical mockups for cutscenes.
Simple Modify Music
Transforms existing music audio through four modification modes: generating vocal lyrics over instrumental tracks, inpainting specific time segments, remixing with style changes, or creating cover versions with different instrumentation or vocals.

Lyrics Only mode adds sung vocals to instrumental music using provided text prompts
Inpainting mode regenerates a specified time range within the track while preserving surrounding audio continuity
Remix mode restructures the existing composition with tempo, arrangement, or stylistic variations
Cover mode recreates the track with alternative genre, instrumentation, or vocal interpretation
Accepts audio input in common formats alongside text prompts describing the desired modification
Preserves musical structure and timing where appropriate to the selected mode
Output suitable for adaptive music systems, combat variations, and dynamic soundtrack prototyping
Useful for: Creating alternate versions of combat music, generating vocal variants of existing themes, prototyping adaptive soundtrack transitions, producing genre variations of menu music.
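Inpainting regenerates only a chosen time range, so it helps to validate that range and know where it falls in sample terms when lining up the result in a DAW or engine. This sketch assumes the range is given in seconds and defaults to a 44.1 kHz sample rate; both are assumptions, not documented node behavior.

```python
def inpaint_window(start_s: float, end_s: float, track_s: float,
                   sample_rate: int = 44100) -> tuple[int, int]:
    """Validate an inpainting time range and convert it to sample indices."""
    if not 0 <= start_s < end_s <= track_s:
        raise ValueError(f"Invalid window {start_s}-{end_s}s for a {track_s}s track")
    return round(start_s * sample_rate), round(end_s * sample_rate)


first, last = inpaint_window(10.0, 12.5, 60.0)  # a 2.5 s window in a 60 s track
```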
Sound Effects (ElevenLabs)
Generates sound effects from text descriptions using ElevenLabs audio synthesis. Takes a written prompt describing the desired sound and outputs an audio file suitable for game integration.

Text prompt input describes the sound effect characteristics (type, duration, mood, context)
Duration parameter controls the length of the generated sound effect in seconds
Outputs audio in standard game-compatible formats for immediate engine integration
Produces one-shot effects, ambient loops, UI sounds, and environmental audio from natural language descriptions
Quality and acoustic properties vary based on prompt specificity and descriptive detail
No audio source material required—generates entirely from text prompts
Useful for: Rapid prototyping of placeholder sound effects, generating UI audio feedback, creating one-off environmental sounds, iterating on audio design concepts during pre-production.
Common pitfalls
Using Simple Text to Speech for multi-character dialogue. Simple Text to Speech does not parse speaker labels. For conversations with multiple voices, use Audio Speech Workflow or Text to Dialogue.
Skipping voice selection. Default voices often don't match the character archetype you want. Use Voice Selection to audition and pick before generating large amounts of audio.
Changing audio after it has gone to Lipsync. Voice Changer alters pitch and timbre and produces a new audio file; a lip-sync video generated from the original audio is not guaranteed to match the transformed version. If you need both, apply Voice Changer first, then run Lipsync on the transformed audio.
Not specifying duration on Sound Effects. Without an explicit duration parameter, generated effects may be too short or too long for game integration. Always set duration to match the in-game usage.
Generating final-quality audio without a budget. Music and dialogue generation consume more credits than utility operations. For pre-production prototyping use Simple Text to Speech and Simple Music Generation; reserve full Audio Speech Workflow runs and ElevenLabs nodes for hero moments.
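When a sound effect is tied to an animation, its duration can be derived from the animation rather than guessed. A minimal sketch: the optional `tail_s` for reverb or ring-out is an assumption, and backends may clamp durations to a supported range, so check the limits of the model you use.

```python
def effect_duration_seconds(frames: int, fps: float, tail_s: float = 0.0) -> float:
    """Duration covering an animation of `frames` at `fps`, plus an optional tail."""
    if frames <= 0 or fps <= 0:
        raise ValueError("frames and fps must be positive")
    return round(frames / fps + tail_s, 3)


# A 45-frame sword swing at 30 fps with a short ring-out.
swing = effect_duration_seconds(45, 30, tail_s=0.25)
```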
Related nodes
Input Nodes — Input Audio supplies external audio for processing, Voice Changer, or analysis. Input Text supplies dialogue scripts.
Utility Nodes — Text Generation (LLM), Combine Text, and Structured Output can generate or structure dialogue scripts that then feed audio nodes.
Video Nodes — Lipsync pairs audio generated here with a character image to produce dialogue video.
API Nodes — for exposing audio-generation workflows as callable endpoints for in-engine procedural audio.
Frequently asked questions
What audio formats does the platform produce?
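MP3 and WAV are the standard outputs. Both are widely supported by engines such as Unity, Unreal Engine, and Godot, but supported import formats vary by engine and version (compressed formats such as MP3 in particular), so check your engine's audio import documentation. WAV is the safest choice when in doubt.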
Can I use generated audio commercially in shipped games?
Audio rights depend on the underlying generation backend's terms (e.g., ElevenLabs has its own commercial licensing). Verify the backend's terms before shipping generated audio in commercial products. For risk-sensitive use cases, treat generated audio as prototype-grade and replace with licensed or studio-recorded audio before release.
How do I generate dialogue with different voices for different characters?
Use Audio Speech Workflow or Text to Dialogue with speaker-labeled dialogue (A: Hello / B: Hi there). The Add Audio Tags node parses these labels and routes each speaker to a distinct voice automatically.
Can the music generation nodes produce loopable tracks?
Create Music and Simple Music Generation can be prompted for loop-friendly compositions, though precise loop-point control is limited. For seamless looping, edit the output in a DAW or use shorter tracks designed to loop.
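The usual DAW fix for a loop seam is a crossfade of the track's end into its start. This sketch shows that technique on raw float samples with a linear fade; real audio would come from a decoder, and equal-power fades are often preferred for dense material.

```python
def make_loop(samples: list[float], fade: int) -> list[float]:
    """Crossfade the tail into the head so the result loops without a click.

    The last `fade` samples are blended into the first `fade` samples with a
    linear fade, then dropped, so the result is `fade` samples shorter.
    """
    n = len(samples)
    if not 0 < fade <= n // 2:
        raise ValueError("fade must be positive and at most half the clip length")
    body = samples[: n - fade]   # slice copy; the input list is not modified
    tail = samples[n - fade:]
    for i in range(fade):
        t = i / fade  # 0 = all tail, approaching 1 = all head
        body[i] = tail[i] * (1 - t) + body[i] * t
    return body
```

Because the looped clip starts on the original tail sample that followed its last sample, the wrap point is continuous.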
What's the difference between Simple Text to Speech and Text to Speech (ElevenLabs)?
Simple Text to Speech is a generic wrapper with multiple backend options for fast prototyping. Text to Speech (ElevenLabs) is a specialized node for the ElevenLabs voice library with finer control over voice selection and emotional delivery.
Can I integrate audio nodes into a real-time game pipeline?
Yes, via API Nodes. Export your audio workflow as an API and call it from your game backend or engine. Note that generation latency means real-time on-demand audio works best for non-time-critical content (post-action barks, ambient generation) rather than instant response audio.
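Given generation latency, a common mitigation is to generate each line once and reuse it, prefetching likely lines during a loading screen. In this sketch `generate` is a hypothetical wrapper around your exported workflow endpoint; no specific Atlas API call is assumed.

```python
from typing import Callable


class BarkCache:
    """Memoize generated clips so each (voice, line) pair is generated once."""

    def __init__(self, generate: Callable[[str, str], bytes]):
        self._generate = generate  # hypothetical call to your workflow API
        self._clips: dict[tuple[str, str], bytes] = {}

    def get(self, voice_id: str, line: str) -> bytes:
        key = (voice_id, line)
        if key not in self._clips:
            self._clips[key] = self._generate(voice_id, line)
        return self._clips[key]

    def prefetch(self, pairs: list[tuple[str, str]]) -> None:
        """Generate ahead of time, e.g. during a loading screen."""
        for voice_id, line in pairs:
            self.get(voice_id, line)
```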