Deadline For Alternatives Productions

> [!abstract] Summary
> Deadline is the render farm manager used in PROTOCOL. Artists submit jobs via **Prism**, and render wranglers monitor the farm. Always test locally first (FML + 1/4 resolution), always use **network paths**, never exceed **priority 50**, and check the Monitor regularly. Logs are your most important debugging tool.

---

# What is Deadline?

Deadline is a **render farm management and job scheduling system**: the central nervous system of the studio's computing infrastructure. Instead of rendering on a single workstation (which could take days), Deadline distributes work across multiple machines, so frames render in parallel and the sequence finishes far sooner.

## Is Deadline Running on Your Machine?

- Press **Ctrl + Shift + Esc** to open Task Manager and check for a process called **"Deadline Worker"** running at high CPU while nothing else is open.
    - ![[image-129.png]]
- You can also look for the Deadline icon in the system tray. If it is visible, Deadline is active.
    - ![[image-130.png]]

---

# Core Concepts

| Concept    | Definition                      | Key Point                         |
| ---------- | ------------------------------- | --------------------------------- |
| **Job**    | Complete submission to Deadline | 1 job = 1 entire submission       |
| **Task**   | Chunk of frames within a job    | 1 task = N frames (configurable)  |
| **Frame**  | Single image in sequence        | Smallest unit, 1 frame = 1 image  |
| **Worker** | Render machine on farm          | Executes tasks, renders frames    |
| **Pool**   | Priority queue for jobs         | Controls scheduling order         |
| **Group**  | Worker categorization           | Controls which workers can render |
| **Pulse**  | Farm maintenance service        | Keeps farm healthy and optimized  |

### Jobs, Tasks, and Frames in Depth

A **Job** is created each time you submit work from Houdini. Its frame range is split into **Tasks** (chunks), and each task is handed to an individual worker.

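
The way a job's frame range turns into tasks can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Deadline's actual scheduler; `split_into_tasks` and its parameters are hypothetical names made up for this note.

```python
def split_into_tasks(first_frame: int, last_frame: int, frames_per_task: int) -> list[tuple[int, int]]:
    """Split an inclusive frame range into (start, end) chunks.

    Hypothetical helper: it mimics how a frame range is chunked into
    tasks, but it is not part of Deadline or Prism.
    """
    if frames_per_task < 1:
        raise ValueError("frames_per_task must be at least 1")
    if last_frame < first_frame:
        raise ValueError("last_frame must not be before first_frame")

    tasks = []
    start = first_frame
    while start <= last_frame:
        # The last chunk may be shorter if the range doesn't divide evenly.
        end = min(start + frames_per_task - 1, last_frame)
        tasks.append((start, end))
        start = end + 1
    return tasks


if __name__ == "__main__":
    for chunk_size in (1, 5, 10, 100):
        tasks = split_into_tasks(1, 100, chunk_size)
        print(f"{chunk_size} frames per task -> {len(tasks)} tasks")
```

For a 100-frame job, a chunk size of 5 yields 20 tasks that can run on up to 20 workers at once, while a chunk size of 100 yields a single task on a single worker.
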
**Frames Per Task** controls chunk size:

| Frames Per Task | Tasks for 100 Frames | Parallelization |
| --------------- | -------------------- | --------------- |
| 1               | 100 tasks            | Maximum         |
| 5               | 20 tasks             | Balanced        |
| 10              | 10 tasks             | Lower overhead  |
| 100             | 1 task               | None            |

> [!tip] Choosing Frames Per Task
> - **Fast renders** → 5 frames per task
> - **Heavy/memory-intensive renders** → 1–2 frames per task
>
> If a task is cancelled, **all frames in that task must re-render**, even completed ones. Higher chunk sizes also risk running out of memory since the scene stays loaded between frames.

---

# Submitting a Render

## With Prism (Standard Workflow)

1. Replace the USD Render ROP with a **PRISM LOP RENDER** node, and give it a meaningful name, since the name is used to create the identifier.
2. Check **Submit Job**.
3. In Save to Disk, check **USD files from disk generated from node input**.
    - Check **Write Stage to disk before render (-submission)**
    - Uncheck **Flush data after each frame**
    - ![[image-134.png]]
4. Click **Execute**. ![[image-135.png]]
5. Configure submission settings:

| Setting              | Guidance                                                                                          |
| -------------------- | ------------------------------------------------------------------------------------------------- |
| **Priority**         | See the priority table below; never exceed 50                                                     |
| **Frames/Task**      | 1–5 for heavy renders, 5–10 for light ones                                                        |
| **Task Timeout**     | ~100 min per frame as a baseline; for complex shots, set it slightly above your FML render time   |
| **Pool**             | Target your assigned pool                                                                         |
| **Machine Limit**    | Divide your group's allocated machines by the number of active jobs                               |
| **Submit Suspended** | **Uncheck**: submit active unless your render wrangler says otherwise                             |

6. Double-check in the **Deadline Monitor** that your render has been spooled.

---

# Pre-Render Checklist

> [!danger] Never skip this before submitting to the farm

1. **Test locally first**: never submit a scene that hasn't rendered on your machine.
2. **Do an FML render** (First, Middle, Last frame at full resolution). This also tells you how much RAM your shot needs.
    - ==If you need more than 32 GB of RAM, don't send it to a 32 GB machine: it will fail or be very slow.==
3. **Do a 1/4 resolution render** of the full frame range.
4. Only then submit the **final full frame range**.
5. **Use network paths only**: assets on your local machine won't exist on render nodes.
    - Always use: `\\server\projects\...` (mapped paths should already be configured)
6. **Don't monopolize the farm**: keep priority at 50 or below and set a machine limit.

---

# Priority Reference

| Priority | Use Case                 | Example                              |
| -------- | ------------------------ | ------------------------------------ |
| 10       | Debugging                | Testing a broken render              |
| 20       | Low priority experiments | Look-dev tests, previs, turntable    |
| 40       | Nuke render              | Compositing jobs                     |
| **50**   | **Normal work**          | **Regular render passes**            |

> [!warning] Priority cap
> ==Never submit above priority 50.== Anything higher locks the farm for everyone else.

---

# The Deadline Monitor

## Interface Panels

1. **Jobs Panel**: all submitted jobs, their status and progress
2. **Tasks Panel**: frame breakdown for the selected job
3. **Workers Panel**: status of all render nodes
    - ![[image-131.png]]
4. **Reporting**: logs, error reports, render times
    - ![[image-132.png]]

## Job Color Indicators

| Color           | Meaning                   |
| --------------- | ------------------------- |
| Green           | Rendering, progressing    |
| Yellow          | Waiting in queue          |
| Orange / Brown  | Job has errors            |
| Red             | Job failed                |
| Blue            | Job completed             |

## Stopping or Pausing a Job

To stop a machine from rendering, disable its worker:

1. Open the Deadline Monitor.
    ![|1200x29](https://lh7-qw.googleusercontent.com/docsz/AD_4nXc10ul6RpUHoiFt2xO1PYfGwVzgSBFvMNqEvrDT6C-kESUKeGYNoy4u8KuY6-YwRFQBfc3wEUunNj9EBkPaFwJSsdbHvr4os5a4rdUFciERcDaK5L62HXgm7ScniiX4eTXZ1QVIWA?key=9su5uaLo5Sy19Hi9QMwVRw)
2. Search for your machine name (or another artist's).
    ![](https://lh7-qw.googleusercontent.com/docsz/AD_4nXczlu-OwcygdtYLBYQfGFXR2B73MuQP7JA32xOGbkzHzWEcxP61_-X4nhuwaacFeuEMiU7SLVMCwnTURzbG80nMUGAPv2r2CZLhV6V8APbgcBWAKMoeJt3fFhrkMGpbYq40hWk0?key=9su5uaLo5Sy19Hi9QMwVRw)
3. Right-click the machine → **Disable Worker** → **Kill Worker** and **Kill Worker If Necessary**. The machine is now in disabled mode.
    ![](https://lh7-qw.googleusercontent.com/docsz/AD_4nXfzVl3tWeN1x69n73j99PcLCcXPlRy2brFyG6kHzQLNwu3MGpJZJ3M0a8TI2plMCrSfko6gpNz4vFGVFdgMPOOxLtfY8i4NFtBV8p82iXfsQMGILx_Dgfbi2DoV8L1aGK5FvmX7ow?key=9su5uaLo5Sy19Hi9QMwVRw)

You can also **right-click a job → Pause Job** to suspend it temporarily.

---

# Reading Logs

> [!tip] Logs are your most important debugging tool. Always check them first when something goes wrong.

## Task Log

Click a job → double-click a task → select the task to view its **LOG**. Logs contain render progression, input/output paths, and all errors.

![[image-136.png]]

To diagnose errors, copy-paste the log into an AI assistant (Claude, ChatGPT, Gemini, etc.) for a quick explanation.

## Worker Log

Double-click a worker to see its logs. They are useful for spotting machine-level issues, repeated failures, or hardware problems.

![[image-137.png]]

---

# Worker Management

## Worker Status Reference

| Status        | Meaning                      | Action                                                        |
| ------------- | ---------------------------- | ------------------------------------------------------------- |
| **Offline**   | Not currently available      | Pulse-start or reboot the machine manually                    |
| **Stalled**   | Hasn't updated in 5+ minutes | System may auto-restart it; otherwise Pulse-start or reboot   |
| **Disabled**  | Manually disabled            | Re-enable it: right-click → **Enable Worker**                 |
| **Idle**      | Online, waiting for tasks    | No action needed                                              |
| **Rendering** | Processing tasks             | Let it work                                                   |

At a scheduled time, all machines are put on Deadline; verify this in the Monitor.

If a machine isn't responding:

### Enable a Disabled Worker

Right-click on worker → **Enable Worker**

![[Screenshot 2025-12-09 134515.png]]

### Restart a Worker Remotely

Right-click on worker → **Remote Control** → **Worker Command** → **Restart Worker**

![[Screenshot 2025-12-09 134609.png]]

---

# Pool & Group Management

## Pools — Job Scheduling

**Pools** are priority queues that control which jobs render first. Workers monitor assigned pools and prefer jobs from pools listed earlier in their assignment order.

**When submitting:**

- **Primary Pool** = your project's assigned pool
- **Secondary Pool** = RAM tier (64 GB, 32 GB, or 16 GB); match this to your shot's RAM requirements

### Managing Pools

In the Monitor (requires Super User): **Tools → Manage Pools**

1. Click **New** → enter pool name
2. Select pool → select workers → click **Add**
3. Use **Promote/Demote** to adjust priority order

> [!note] Pool Configuration
> **WIP**: pool structure for Alternatives Productions to be defined.

## Groups — Worker Categorization

**Groups** categorize workers by hardware/software capability. Unlike pools, group order does **not** affect scheduling.

| Group          | Purpose            | Workers                         |
| -------------- | ------------------ | ------------------------------- |
| gpu_available  | Has GPU cards      | RTX-equipped machines           |
| houdini_farm   | Houdini installed  | All render nodes with Houdini   |
| high_memory    | 64 GB+ RAM         | Specialized simulation machines |
| fast_ssd       | NVMe storage       | Fast I/O machines               |
| interactive    | For real-time work | Studio playback machines        |

---

# Common Mistakes to Avoid

> [!bug] Don't do these

- **Forgetting dependencies**: submit upstream jobs first. Let Deadline handle execution order. ==Never manually re-render dependent passes, and never delete the cleanup job that removes USD cache files.==
- **Infinite retries on broken jobs**: disable auto-retry for debugging. Set limited retries (2–3) for production. Fix and resubmit manually.
- **Not monitoring render progress**: check the Monitor regularly. ==Don't submit and forget.== Address errors immediately by contacting your render wrangler.
- **Local file paths**: assets on your local machine don't exist on render nodes. Always use network paths.
- **Monopolizing the farm**: if you see unused machines in another group's pool, communicate in the deadline channel before taking them.

---

# For Render Wranglers Only

> [!note] This section is for render wranglers managing the farm.

**Responsibilities:**

- Launch renders on Deadline (when artists submit suspended at your request, you activate their jobs)
- Manage and resolve Deadline errors
- Communicate farm usage and how workers are distributed across groups
- Educate artists about render mistakes and optimization

**Key Deadline Components:**

| Component             | Role                                                                          |
| --------------------- | ----------------------------------------------------------------------------- |
| **Deadline Client**   | Software on artist workstations and render nodes that talks to the Repository |
| **Deadline Launcher** | Background service running on each machine                                    |
| **Deadline Worker**   | Application on a render node (physical or virtual) that executes render tasks |
| **Deadline Monitor**  | UI for tracking all farm activity                                             |
| **Pulse**             | Maintenance service that auto-restarts stalled workers                        |

**Farm communication: escalate to the deadline channel when:**

- No one is using the farm and you want to claim extra machines beyond your quota
- You see a group taking too many machines
- You encounter unknown errors
- You see too many stalled workers and need to coordinate reboots
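
When educating artists, it can help to show the submission rules from this guide (priority cap of 50, network paths only, machine limit = allocated machines divided by active jobs) as a small pre-flight check. The sketch below is only an illustration. Every name in it is hypothetical, it does not call Deadline or Prism, and it assumes that a "network path" means a UNC path starting with two backslashes.

```python
MAX_PRIORITY = 50  # priority cap stated in this guide


def is_network_path(path: str) -> bool:
    r"""Return True for UNC-style paths such as \\server\share\file.

    Assumption: only UNC paths count as network paths here; mapped
    drive letters are not resolved by this sketch.
    """
    return path.startswith("\\\\")


def machine_limit(allocated_machines: int, active_jobs: int) -> int:
    """Allocated machines divided by active jobs, never below 1."""
    return max(1, allocated_machines // max(1, active_jobs))


def preflight(priority: int, asset_paths: list[str]) -> list[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if priority > MAX_PRIORITY:
        problems.append(f"priority {priority} exceeds the cap of {MAX_PRIORITY}")
    for path in asset_paths:
        if not is_network_path(path):
            problems.append(f"local path will not exist on render nodes: {path}")
    return problems


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the checks.
    issues = preflight(60, [r"\\server\projects\shot010\scene.usd", r"C:\Users\me\tex.exr"])
    for issue in issues:
        print("FIX:", issue)
    print("machine limit:", machine_limit(allocated_machines=12, active_jobs=3))
```

A check like this does not replace the FML and 1/4 resolution tests; it only catches the two mistakes that are cheap to detect before a job reaches the farm.
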