I see people on social media complaining all the time: “This AI-generated stuff is totally unusable.” Today, I’m sharing my personal workflow so you can stop playing the “prompt lottery.”
Most people have a huge misconception about AI video. They think it’s a “vending machine”: Insert Coin (Prompt), Get Product (Finished Movie). But actually, current AI video tools are more like “extremely powerful but highly uncontrollable footage generators.”
If you want to make a short film that’s actually watchable and has narrative logic, you need a strict workflow, not just dumb luck. After months of high-intensity testing, I’ve summarized an SOP that takes you from idea to final cut.
Note: This is the SOP I personally use; it doesn’t represent everyone or every industry.
Creativity & Storyboarding: Treat the LLM as Your Producer
A lot of people open the video generator and start typing prompts immediately. Big mistake. Before generating a single video, you have to solve the logic problems first.
I’m not an expert on film montage or narrative pacing, but I still want to make creative films and shorts. This is where LLMs come in: I use a cutting-edge model like GPT-5 or Gemini 3 to act as my “Screenwriter” and “Storyboard Artist.”
How to do it?
1. Tell it the characteristics of the platform you want to publish on.
2. Chat with the LLM about the genre and story outline.
3. Generate storyboard details.
Don’t just ask it to write a story; make it write a Shot List. Keep chatting with the LLM until you are satisfied. The value here is getting a fundamental “construction blueprint”: the more detailed the description the LLM gives you, the easier your job will be later.
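To make this concrete, here’s a minimal Python sketch of the kind of shot list I push the LLM toward. The prompt wording, field names, and the example shot are all my own inventions for illustration, not a required format:

```python
# A hypothetical shot-list prompt and structure. Field names are my own
# convention, not any tool's required schema.

SHOT_LIST_PROMPT = """\
You are my screenwriter and storyboard artist.
Platform: YouTube Shorts (vertical 9:16, under 60 seconds, hook in 3s).
Genre: neo-noir mood piece.

Write a shot list. For EVERY shot give:
- shot number and duration (seconds)
- location and time of day
- characters in frame and what they do
- camera: framing (wide/medium/close-up) and movement (pan/tilt/zoom/tracking)
- lighting and mood keywords
"""

# The level of detail I keep pushing the LLM toward, as structured data:
example_shot = {
    "shot": 1,
    "duration_s": 4,
    "location": "rain-soaked alley, night",
    "characters": "man in trench coat, motionless",
    "camera": "medium shot, slow zoom in",
    "lighting": "neon spill from the left, wet reflections",
}
```

The point isn’t the code itself; it’s that a structured, field-by-field shot list gives you something concrete to iterate on with the LLM.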
Visual Tone: Storyboard Images Are Your Anchors
This step is exactly what separates the “rookies” from the “pros.” Rookies go straight to Text to Video AI. Pros start with Text to Image, and then convert that picture into a video.
Current video models are still weak at maintaining character and scene consistency. If you just use text to generate video, your characters and scenes will morph wildly between shots.
How to do it?
1. After getting your storyboard info, ask the LLM to generate image prompts for you.
2. Use different models to test the look—like Nano Banana, GPT Image, Flux, Seedream, Midjourney, etc.—and pick the style you like best.
3. Feed that preferred style back to the LLM and ask it to optimize all the other prompts based on that visual information.
4. Generate images for every single shot and character.
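A rough sketch of step 3 in code form: keep one “style anchor” string and fold it into every shot’s image prompt so the look stays consistent across shots. All the strings here are made-up examples, not prompts from any specific model:

```python
# Minimal sketch: reuse one "style anchor" across every image prompt
# so the image model keeps a consistent look. All strings are illustrative.

STYLE_ANCHOR = (
    "neo-noir, teal-and-orange neon palette, 35mm film grain, "
    "shallow depth of field, cinematic lighting"
)

shots = {
    1: "man in trench coat standing in a rain-soaked alley at night",
    2: "close-up of his eyes, rain dripping from the brim of a hat",
    3: "wide shot of the empty street, a single flickering sign",
}

def image_prompt(shot_id: int) -> str:
    """Combine the per-shot content with the shared style anchor."""
    return f"{shots[shot_id]}, {STYLE_ANCHOR}"

for sid in shots:
    print(f"shot {sid}: {image_prompt(sid)}")
```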
Supplement: If you need to blend images, try generating different variations and then using Nano Banana Pro for fusion. You’ll be pleasantly surprised!
This is like the “makeup test shot” and “conceptual design drawing” before shooting a film. Only when the drawing is right can the video possibly be right.
Cinematography: Action!
Since the visual content is already locked in by your reference image (this phase uses Image to Video AI), your text prompts should focus purely on movement.
Wrong Prompt: “A man standing in the rain, wearing a trench coat, neon background…”
Right Prompt: “Slow zoom in, raindrops splashing on shoulder, man’s eyes shift slightly to the left, hair gently blowing in the wind.”
Little Tricks:
Micro-movements are more realistic: Don’t force AI to do complex parkour or fight scenes (they fall apart easily). Ask for blinking, breathing, shifting light and shadow, or swirling smoke. These “atmospheric movements” have a super high success rate.
Camera movement is key: Be specific—pan, tilt, zoom, or tracking shot. Good camera work can hide content flaws.
How to do it?
- Let the LLM help you generate the video prompts (motion descriptions).
- Use different models to attempt the video generation. My top recommendations are Google Veo 3.1 or 3, Sora 2, and Kling v2.6.
- Based on the results, chat with the LLM to optimize details and regenerate shots.
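When you have a dozen shots, it helps to keep the motion prompts organized and strictly separate from the visuals. Here’s a minimal sketch that writes one job file per shot; the shot data and filenames are invented, and the actual generation call is omitted because every tool’s interface differs:

```python
import json
from pathlib import Path

# Motion-only prompts keyed by shot number. The reference image carries
# the look; these strings describe ONLY camera moves and micro-movements.
# All content here is illustrative.
motion_prompts = {
    1: "slow zoom in, raindrops splashing on shoulders, eyes shift slightly left",
    2: "static close-up, blinking, breath fogging in the cold air",
    3: "slow lateral tracking shot, neon sign flickering, smoke drifting",
}

out = Path("shots")
out.mkdir(exist_ok=True)

# Save one JSON job per shot: reference image + motion prompt.
# Feed these to whichever generator (Veo, Sora, Kling) you're testing;
# the actual API call differs per tool, so it's omitted here.
for sid, motion in motion_prompts.items():
    job = {"reference_image": f"shot_{sid:02d}.png", "motion_prompt": motion}
    (out / f"shot_{sid:02d}.json").write_text(json.dumps(job, indent=2))
```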
Supplement: Why I recommend these specific models:
- They are the most advanced models right now.
- Their prompt comprehension feels significantly stronger than others.
- The generation quality is noticeably better.
- These models generate built-in audio, saving you foley work. (If your characters need to talk and you plan to dub them yourself, just turn off audio generation.)
Insert video: https://www.youtube.com/watch?v=nHuSzKZv1FQ
Insert video: https://www.youtube.com/watch?v=-uS0hEF9CSg
Insert video: https://www.youtube.com/watch?v=P3OIRADU8Ug
Video Editing: Injecting Soul
Once you have a pile of AI-generated clips, what you have is just “footage.” Your job is to stitch them together and give them a “soul.”
Supplement:
1. Find music that fits your theme or use AI to generate the score.
2. If you need characters to speak more in certain scenes, use a Lip-Sync tool.
3. Drag everything into your editor (like CapCut), merge the clips, and add filters, transitions, and color grading to give it that cinematic blockbuster feel.
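If you’d rather script the rough assembly before opening CapCut, here’s a minimal sketch using ffmpeg’s concat demuxer. It assumes ffmpeg is installed and on your PATH, the clip filenames are hypothetical, and all clips share the same codec, resolution, and frame rate (usually true if they came from the same model and settings):

```python
import subprocess
from pathlib import Path

# Assumes ffmpeg is on PATH and every clip shares the same codec,
# resolution, and frame rate. Filenames below are hypothetical.
clips = sorted(Path("shots").glob("shot_*.mp4"))

# ffmpeg's concat demuxer wants a text file listing the inputs.
concat_list = Path("concat.txt")
concat_list.write_text("".join(f"file '{c.as_posix()}'\n" for c in clips))

# -c copy stitches without re-encoding, so it's fast and lossless.
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", str(concat_list), "-c", "copy", "rough_cut.mp4"],
    check=True,
)
```

A rough cut like this is just for checking pacing; the filters, transitions, and color grading still happen in your editor.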
Insert video: Emerald Siege: The Order of the Deep
The Final Cut: Humans Are Still the Core
This workflow sounds way more annoying than “one-click generation,” right?
Exactly! Current AI video tools are essentially efficiency tools, not replacement tools. They save us the cost of building sets, renting cameras, and hiring actors, but they cannot save you the cost of aesthetic judgment and narrative logic.
If you want to make shorts or movies, don’t wait for the “perfect model” to appear. The current toolchain, from the brain of the LLM to the final render of the AI Video Generator, is already powerful enough for you to tell a visually stunning story right from your bedroom.