Everything you need to know about creating AI-generated images, videos, and avatar content.
The studio has five main tabs across the top. Each one opens a different creative tool:
| Tab | What It Does | Cost |
|---|---|---|
| Image | Generate images from text prompts using Flux, SDXL, and other AI models | 5 credits |
| Video | Generate short videos from text or images using Kling, Runway, and other AI models | 10 credits |
| Avatar | Create talking head videos with AI avatars or your own photos | 15 credits |
| Smart Video | Full automated video pipeline — scripting, visuals, narration, music | 25 credits |
| Voice | Text-to-speech with 2,000+ voices from HeyGen, ElevenLabs, and Azure | 3 credits |
Type a description of the image you want. Use the Inspire button for random creative prompts, or Enhance to improve your prompt with AI. Choose your model, style, and aspect ratio in the settings panel below the prompt.
Describe the video scene you want, or start from an image. Select your video model (Kling, Runway, etc.) and aspect ratio. Videos typically take 1–3 minutes to render.
You can also use the Create Video button on any image card in your feed to animate an existing image.
The Avatar tab has two sub-tabs, one for AI avatars and one for talking photos:
Choose from 1,200+ professional AI avatars. Use the search bar and gender filter to find the right one. Hover over any avatar to see a large preview with video animation.
Choose from 5,800+ stock talking photos, or upload your own image. Your photo will be animated to speak your script with realistic lip-sync.
The background section has three modes, accessible via tabs: a solid color, an image or video URL, and a stock photo search.
Pick from 8 preset colors: Dark, Green Screen, White, Navy, Purple, Midnight, Cream, and Red. Use the color picker circle for any custom color. Green Screen (#00b140) is useful if you plan to composite the avatar onto other footage later.
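If you key out the Green Screen preset in an editor that asks for RGB values rather than a hex code, the conversion can be done by hand or with a few lines of code. The sketch below is illustrative only: `hex_to_rgb` is a hypothetical helper, not part of the studio.

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert a 6-digit hex color such as "#00b140" to an (R, G, B) tuple."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {hex_color!r}")
    # Each pair of hex digits is one 0-255 channel value.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# The Green Screen preset from the guide.
print(hex_to_rgb("#00b140"))  # (0, 177, 64)
```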
Switch to the Image tab and paste a direct URL to any image. The image will be placed behind your avatar. Supports .jpg, .png, and other standard image formats.
You can also paste a video URL (.mp4, .webm, .mov) — the system will automatically detect it and loop the video behind your avatar.
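The guide does not document how the studio tells a video URL from an image URL. One plausible approach, shown here purely as an assumption, is to check the file extension in the URL path. The function name `background_kind` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Video extensions listed in the guide; anything else is treated as an image here.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov"}


def background_kind(url: str) -> str:
    """Return "video" if the URL path ends in a known video extension, else "image"."""
    path = urlparse(url).path  # ignores any query string or fragment
    suffix = PurePosixPath(path).suffix.lower()
    return "video" if suffix in VIDEO_EXTENSIONS else "image"
```

Under this assumption, a link that hides the extension (for example a redirect or a share page) would be treated as an image, so a direct file URL is the safer choice.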
Switch to the Search tab to find professional backgrounds from Pexels. Use the quick-tag buttons (Office, Nature, Studio, City, Abstract, Gradient) or type your own search. Click any photo to select it as your background.
| Style | Description |
|---|---|
| Full Body | Default — shows the avatar from the waist up with natural body language |
| Close-Up | Zoomed in on the face and shoulders — great for personal, direct-to-camera content |
| Circle | Avatar appears in a circular frame — ideal for overlaying on presentations or web content |
For portrait (9:16) format, the avatar is automatically scaled up by 25% to fill the vertical frame, so it doesn't appear small and floating.
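As a quick illustration of the arithmetic, a 25% upscale means multiplying the avatar's size by 1.25. The helper below is a hypothetical sketch, assuming the base scale is 1.0 for the other formats; the studio's internal values are not documented.

```python
PORTRAIT_SCALE = 1.25  # the 25% upscale the guide describes for 9:16


def avatar_scale(aspect_ratio: str, base_scale: float = 1.0) -> float:
    """Return the avatar scale factor for a given aspect ratio string."""
    if aspect_ratio == "9:16":
        return base_scale * PORTRAIT_SCALE
    return base_scale


print(avatar_scale("9:16"))  # 1.25
print(avatar_scale("16:9"))  # 1.0
```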
The voice picker shows only HeyGen-compatible voices. Use the search bar, language dropdown, and gender filter to find the right voice. Click the play button on any voice card to preview it.
Add emotional tone to the AI voice. Not all voices support all emotions — if unsupported, the voice will fall back to its natural tone.
| Emotion | Best For |
|---|---|
| Natural | Default neutral delivery — works for everything |
| Excited | Product launches, announcements, high-energy content |
| Friendly | Onboarding, tutorials, customer-facing content |
| Serious | News, reports, corporate announcements |
| Soothing | Meditation, wellness, ASMR-style content |
| Broadcaster | News anchor style — polished and authoritative |
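The fallback rule for unsupported emotions can be pictured as a simple lookup. This is a sketch under assumptions: `resolve_emotion` and the per-voice set of supported emotions are hypothetical, since the guide does not say how support is recorded.

```python
DEFAULT_EMOTION = "Natural"


def resolve_emotion(requested: str, supported: set[str]) -> str:
    """Return the requested emotion if the voice supports it, else the natural tone."""
    return requested if requested in supported else DEFAULT_EMOTION


# A hypothetical voice that only supports two of the six emotions.
voice_emotions = {"Natural", "Friendly"}
print(resolve_emotion("Friendly", voice_emotions))  # Friendly
print(resolve_emotion("Excited", voice_emotions))   # Natural
```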
Add a title or subtitle that is burned directly into the video. Type your text in the overlay field, then configure:
| Setting | Options |
|---|---|
| Position | Top, Center, or Bottom of the frame |
| Size | Small (20pt), Medium (28pt), Large (40pt), XL (56pt) |
| Bold | Toggle bold weight on/off |
Check the Captions box to automatically burn subtitles into the video. HeyGen generates captions from your script and synchronizes them with the audio. Captions are rendered server-side, so no additional processing is needed.
Check Remove Photo BG to strip the original background from the talking photo or avatar. This is useful when combining with a custom background — it prevents the avatar's original background from showing through.
Check Test Mode to generate a free, low-resolution preview. This does NOT consume credits. Use it to test avatar + voice combinations, check background placement, or preview text overlays before committing to a full render.
The Smart Video tab is a fully automated pipeline that takes a topic, URL, or raw text and produces a complete video with multiple scenes, AI-generated visuals, voiceover narration, and background music.
Choose a directive style (Explainer, Viral, Story, Product, Listicle, News), set the number of scenes, write your prompt or paste a URL, and hit generate. The alien spinner pipeline will show real-time progress through each stage.
The Voice tab gives you access to 2,000+ voices from HeyGen, ElevenLabs, and Azure. Type your text, select a voice, and generate an audio file. Voices can be filtered by language, gender, and provider.
| Format | Resolution | Best For |
|---|---|---|
| Landscape (16:9) | 1280 × 720 | YouTube, presentations, websites |
| Portrait (9:16) | 720 × 1280 | TikTok, Instagram Reels, YouTube Shorts |
| Square (1:1) | 1080 × 1080 | Instagram posts, LinkedIn, social ads |
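To confirm that each resolution in the table matches its stated aspect ratio, divide width and height by their greatest common divisor. The snippet below is a small check written for this guide; the `FORMATS` dictionary simply restates the table.

```python
from math import gcd

# Output resolutions from the table above, as (width, height) in pixels.
FORMATS = {
    "Landscape": (1280, 720),
    "Portrait": (720, 1280),
    "Square": (1080, 1080),
}


def aspect_ratio(width: int, height: int) -> str:
    """Reduce width:height to lowest terms, e.g. 1280x720 -> "16:9"."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


for name, (w, h) in FORMATS.items():
    print(name, aspect_ratio(w, h))
# Landscape 16:9
# Portrait 9:16
# Square 1:1
```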
| Action | Cost |
|---|---|
| Image generation | 5 credits |
| Video generation | 10 credits |
| Avatar video | 15 credits |
| Smart video pipeline | 25 credits |
| Text-to-speech | 3 credits |
| Test mode avatar | Free (0 credits) |
You are charged only for generations that succeed. If a job fails, its credits are returned to your balance.
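To budget a batch of work, add up the per-action costs and skip any job that failed. The sketch below restates the cost table; the action keys and the `total_cost` function are hypothetical names chosen for illustration.

```python
# Credit costs from the table above.
CREDIT_COSTS = {
    "image": 5,
    "video": 10,
    "avatar": 15,
    "smart_video": 25,
    "voice": 3,
    "avatar_test": 0,  # Test Mode renders are free
}


def total_cost(jobs: list[tuple[str, bool]]) -> int:
    """Sum credits for (action, succeeded) pairs; failed jobs cost nothing."""
    return sum(CREDIT_COSTS[action] for action, succeeded in jobs if succeeded)


jobs = [
    ("image", True),
    ("avatar_test", True),
    ("avatar", True),
    ("video", False),  # failed, so its 10 credits are not charged
]
print(total_cost(jobs))  # 20
```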
Combine features for maximum impact: Use a Pexels "office" background + Close-Up avatar style + Friendly emotion + Captions for a professional talking-head video that looks like it was produced in a studio.
Test before you commit: Always use Test Mode first to verify your avatar, voice, and background look good together. Test Mode is free and renders in seconds.
Green screen for compositing: Use the Green Screen color preset if you plan to key out the background in a video editor later. Combine with Remove Photo BG (matting) for a clean key.
Portrait for social media: Choose 9:16 Portrait format for TikTok, Reels, and Shorts. The avatar auto-scales to fill the vertical frame.
Hover to preview: Hover over any avatar or talking photo thumbnail to see a large preview. Avatars show a video animation preview from their Bunny CDN cache.