> [!abstract] Summary
> **Deadline Log Analyser**: a Python tool by Thomas Spony that inspects Deadline render jobs and flags risky ones (timeouts, RAM issues, oversized USD scenes, un-mipmapped textures, wrong server output). Read-only: it diagnoses, never modifies. Use cases: **daily triage** and **single-job check**. Output: HTML / CSV / Markdown reports grouped by Group → User → Job with severity badges.

> [!info] Author & contact
> **Thomas Spony.** Read-only tool: it looks at render reports and the Deadline job database; it never changes a job, a render, or anything on the servers.

---

# Build the .exe (one-time)

```bash
# Install build tools (pip, not npm)
pip install usd-core pyinstaller

# Build the .exe
pyinstaller --onefile --windowed --name DeadlineCleaner --collect-all pxr deadline_cleaner.py
```

> [!video] Explanation video
> [Open in Google Drive](https://drive.google.com/file/d/1CZJn9puQDk0ywYm9UgEDb7l4ql433b6e/view?usp=sharing)

> [!example] Source .py
> [Open in Google Drive](https://drive.google.com/file/d/1yST-Ezu1ctN5EEA77ogf-7y-7tNXSoMe/view?usp=sharing)

---

# What it flags

- [x] MIPMAP flagging.
- [x] AOV excess.
- [x] Memory pressure.
- [ ] USD dependencies.
- [ ] Non-indexed attribute `shop_materialpath` string.
- [ ] Invalid mesh.
- [ ] Bad primvar sample size.
- [ ] Displacement shader doesn't modify P.
- [ ] Singular matrix detected.

*(Find online answers for the unchecked items.)*

## Display improvements

- [ ] Organise errors per category.

---

# 1. What it's for

Student renders sometimes bring the farm, and the `DATA_PFE` server with it, to its knees. Deadline Cleaner reads the logs Deadline has already produced and flags the jobs most likely responsible, so you can fix them before they crash the farm again.

Two typical uses:

- **Daily triage**: *"show me everything that failed on the farm this week."*
- **Single-job check**: *"why did **my** job fail, or why was it so slow?"*

# 2. Before you start

- **Windows** with the **Deadline Client** installed (you already have it if you submit renders). The tool auto-discovers `deadlinecommand.exe`.
- For **USD inspection** (texture + scene dependency analysis), the packaged `.exe` already bundles the USD libraries.

# 3. Launching the tool

- **Double-click `DeadlineCleaner.exe`**: the window opens immediately.
- If using the Python source: open a terminal in its folder and run `python deadline_cleaner.py`.

A **green** "*Deadline Client found*" line at the top means the tool can talk to the farm. **Red** means the Deadline Client is missing; ask a pipeline TD.

# 4. The window, control by control

| Control | What it does |
|---|---|
| **Look back** | How far back to search: `24h`, `7d`, `30d`, `all`. Ignored if Job IDs are filled. |
| **Max jobs to scan** | Slider: `5, 10, 20, 50, 100, …, All`. Newest first. Start small (20–50). |
| **Job IDs** | Optional. Paste 24-character job IDs (one per line) to analyse exactly those jobs, skipping the time search. See §7. |
| **Groups** | Tick/untick the student groups (`HYT, JRN, VRT, CNB, PMF, COR, GTE, OTHER`). |
| **Deadline job status** | Which statuses to pull: `Failed`, `Active`, `Completed`, `Suspended`, `Pending`. Defaults to **Failed + Active**. |
| **Inspect USD scenes** | Optional, slower. Opens each USD scene to count dependencies and check textures. |
| **Output folder** | Default `report`. |
| **Analyze** | Runs the analysis. Progress appears in the box below; the report opens in your browser when done. |

# 5. Running your first analysis

1. Leave **Look back** at `7d`.
2. Set **Max jobs** to `20` (quick).
3. Leave **Groups** all ticked and **status** at Failed + Active.
4. Click **Analyze**.
5. After a few seconds the HTML report opens in your browser.

# 6. Reading the report

The report is organised **Group → User → Job**.

### Severity

| Badge | Meaning |
|---|---|
| 🔴 **critical** | Almost certainly hurting the farm; fix first. |
| 🟠 **high** | A real problem; fix soon. |
| 🟡 **medium** | Worth correcting; not an emergency. |
| 🔵 **low** | Minor / informational. |
| ⚪ **info** | Nothing wrong detected. |

Groups and users with **critical** or **high** items expand automatically; quieter ones stay collapsed. The chips at the top jump to a group.

### A job card

Each card shows the job name, frame, elapsed time, peak RAM, AOV count, texture summary, USD dependency size, output path, and a list of **flags**.

Two states are shown on purpose:

- **task: …**: how one frame ended (success / timeout / failed / …).
- **job: …**: the overall Deadline job status.

A job marked **Failed** can still have many frames that rendered fine; only the bad frame failed.

# 7. Checking one specific job

Spotted a red job in **Deadline Monitor**?

1. Right-click → **Copy Job ID**.
2. Paste into the tool's **Job IDs** box (multiple supported, one per line).
3. Click **Analyze**.

This is fast: the tool fetches only those jobs and ignores the Look back and status filters.

# 8. What the flags mean

| Flag | Severity | What it means |
|---|---|---|
| `timeout` | critical | Deadline killed the task because it ran too long. |
| `stuck_progress` | critical | Less than 5% rendered after 30 min; the render is not converging. |
| `output_to_data_pfe` | critical | Render is writing EXRs to `V:` (DATA_PFE). Must use `W:` (DATA_FRM). This directly loads the server that crashes. |
| `ram_saturation` | critical | Peak RAM came within 1 GB of the worker's total; the machine was about to swap or crash. |
| `long_frame` | high | A single frame took over 1 h 30, which is the ceiling. |
| `memory_grower` | high | Karma's memory kept climbing during the render (leak / runaway scene). |
| `usd_load_failed` | high | Husk could not open the USD scene: broken or missing reference. |
| `non_zero_exit` | high | Renderer exited with an error and no more specific cause was found. |
| `mipmap_missing` | high / medium | Textures not mipmapped (`UNTILED`). Convert to `.rat` (Karma) / `.tex` (RenderMan). |
| `aov_excess` | medium | More than 20 AOVs, usually more than the shot needs. |
| `wont_fit_32gb` | medium | Peak RAM ≥ 31 GB; can't run on a 32 GB machine. |
| `env_config_error` | medium | Worker had no `HUSK_PATH`, so the render never started. |
| `usd_dep_heavy` | medium | 200+ USD layers; very heavy scene graph. |
| `wont_fit_16gb` | low | Peak RAM ≥ 15 GB; can't run on a 16 GB machine. |
| `mipmap_offspec` | low | Tiled, but not in the renderer-native `.rat` / `.tex` format. |
| `render_warnings` | low | Non-fatal warnings (missing primvars, light/displacement notes, etc.). |

### Quick fixes

- **`output_to_data_pfe`** → repoint render output to `W:/`.
- **`mipmap_missing` / `mipmap_offspec`** → convert textures to `.rat` (Karma) or `.tex` (RenderMan).
- **`aov_excess`** → delete render vars the shot doesn't use.
- **`long_frame` / `timeout` / `memory_grower`** → lighten the scene: lower samples, reduce volume detail, check for runaway geometry/instances.
- **`wont_fit_16gb` / `wont_fit_32gb`** → reduce memory or pin to bigger workers.

# 9. The output files

| File | Use |
|---|---|
| `report.html` | The main report. |
| `report.csv` | Same data, sortable spreadsheet. |
| `per_student.md` | Plain-text rollup grouped by group → student. |

# 10. Command line (optional)

```bash
# Failed + Active jobs from the last 7 days
python deadline_cleaner.py --deadline --since 7d -o report

# Only failed jobs, last 24h, with USD inspection
python deadline_cleaner.py --deadline --job-status Failed --since 24h --inspect-usd -o report

# Specific jobs by ID
python deadline_cleaner.py --deadline --job-ids 6a00cd5b756fadd627ead018,69f9fceda9224a4ba9985021 -o report
```

Run with no arguments to open the GUI.

# 11. Troubleshooting

| Symptom | Fix |
|---|---|
| Red *"deadlinecommand not found"* line | Deadline Client missing; use a workstation that has it. |
| *"Nothing to analyze"* | Widen Look back, tick more job statuses, or check that the Job IDs are valid 24-hex strings. |
| USD row says `no_pxr` | USD libraries not available; use the packaged `.exe` (it bundles them). |
| Textures show *"no data"* for a RenderMan job | RenderMan doesn't log texture stats by default. Tick **Inspect USD scenes**. |
| Lots of `unreachable` textures | Texture paths point to drives not mapped on this machine. The job itself is fine; the tool just can't measure those files from here. |

# 12. What it can't tell you

- It reads **reports of renders that already ran**; it can't predict an unrun job.
- It diagnoses; it does **not** fix jobs or scenes. Use it to produce a report, then correct the scene in Houdini / Prism yourself.
- Nuke comp jobs and Prism `cleanup` jobs are intentionally skipped, since they aren't the renders that crash the farm.

---

# 🔗 Related

- [[../DEADLINE MOC|DEADLINE MOC]].
- [[Deadline Error List|Deadline Error List]]: manual reference.
- [[Deadline For Alternatives Productions|Deadline for Alternatives Productions]].
- [[Reconfigure Deadline|Reconfigure Deadline]].
- [[../../TOOLS/Notes/Deadline Renderman Denoiser - Loris Eck|Deadline RenderMan Denoiser]].
- [[../../PROTOCOL/OPTIMISATION/OPTIMISATION MOC|OPTIMISATION MOC]]: root causes the analyser detects.
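
The numeric thresholds in the §8 flag table can be read as plain comparisons. The block below is a minimal sketch of that reading, not the tool's actual code: the `JobStats` container, its field names, and the `threshold_flags` function are hypothetical, and only the numbers and flag names come from the table. It also assumes "within 1 GB" is inclusive and that `wont_fit_16gb` is still reported alongside `wont_fit_32gb`, which the table does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds taken from the §8 flag table; all names below are illustrative.
RAM_HEADROOM_GB = 1.0           # ram_saturation: within 1 GB of worker total
LONG_FRAME_SECONDS = 90 * 60    # long_frame: over 1 h 30
AOV_LIMIT = 20                  # aov_excess: more than 20 AOVs
USD_LAYER_LIMIT = 200           # usd_dep_heavy: 200+ USD layers


@dataclass
class JobStats:
    """Hypothetical per-job measurements; not the tool's real data model."""
    peak_ram_gb: float
    worker_ram_gb: float
    frame_seconds: float
    aov_count: int
    usd_layers: int


def threshold_flags(stats: JobStats) -> list[tuple[str, str]]:
    """Return (flag, severity) pairs for the purely numeric rules in §8."""
    flags: list[tuple[str, str]] = []
    # Assumption: "within 1 GB" is treated as inclusive.
    if stats.worker_ram_gb - stats.peak_ram_gb <= RAM_HEADROOM_GB:
        flags.append(("ram_saturation", "critical"))
    if stats.frame_seconds > LONG_FRAME_SECONDS:
        flags.append(("long_frame", "high"))
    if stats.aov_count > AOV_LIMIT:
        flags.append(("aov_excess", "medium"))
    if stats.peak_ram_gb >= 31:
        flags.append(("wont_fit_32gb", "medium"))
    if stats.usd_layers >= USD_LAYER_LIMIT:
        flags.append(("usd_dep_heavy", "medium"))
    # Assumption: the 16 GB flag is kept even when the 32 GB flag fires.
    if stats.peak_ram_gb >= 15:
        flags.append(("wont_fit_16gb", "low"))
    return flags


if __name__ == "__main__":
    example = JobStats(peak_ram_gb=31.5, worker_ram_gb=32.0,
                       frame_seconds=2 * 3600, aov_count=12, usd_layers=40)
    for name, severity in threshold_flags(example):
        print(f"{severity:8} {name}")
```

Flags such as `timeout`, `memory_grower`, or `mipmap_missing` depend on log contents rather than a single number, so they are left out of this sketch.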