Building a Blog Post Generator: A Desktop AI Writing Tool in Python

You need a fast, privacy‑first way to crank out drafts without sending data to a third‑party API. This post shows you how to wire Ollama, the local AI runtime, to a Python script that drives the qwen3.5:35b model and produces a complete blog post on demand. The tool is aimed at developers and technical writers who want full control over the generation pipeline and can run a 35‑billion‑parameter model on a decent desktop GPU (or a multi‑GPU server).


1. What You’re Building

  • A command‑line Python script that accepts a topic and optional constraints (tone, length, SEO keywords).
  • The script talks to a local Ollama server that hosts qwen3.5:35b, a 35‑B parameter transformer fine‑tuned for instruction following.
  • Output is a Markdown‑formatted draft ready to paste into a static site generator or CMS.

Why bother? No API key, no latency spikes, no data leaving your machine. If you already run Ollama for other local models, this extends the same stack to automated content creation.


2. Prerequisites

Requirement Version / Notes
Hardware ≥16 GB VRAM (e.g., RTX 3090, A100 40 GB) or a multi‑GPU setup.
OS Linux (Ubuntu 22.04+), macOS (Apple Silicon), or Windows 11 with WSL2.
Ollama 0.1+ – binary installation from https://ollama.ai
Model qwen3.5:35b – pull with ollama pull qwen3.5:35b
Python 3.10+
Python packages ollama (official client), click (CLI), jinja2 (templating)
Optional torch (if you want to verify GPU availability)

Make sure the Ollama service is running (ollama serve) and the model is loaded (ollama run qwen3.5:35b). You can test with a simple prompt:

ollama run qwen3.5:35b "Write a haiku about a robot."

If you see output, the stack is ready.


3. Step‑by‑Step

3.1 Install Python Dependencies

pip install ollama click jinja2

ollama gives you a thin wrapper around the HTTP API. click builds a clean CLI. jinja2 lets you templatize the final Markdown.

3.2 Project Layout

blog_generator/
│
├─ generate.py      # Main script
├─ prompts/
│   ├─ default.j2   # Base prompt template
│   └─ seo.j2       # Optional SEO‑focused template
└─ output/          # Generated posts land here

Create the directories and files as shown.

3.3 Prompt Template (Jinja2)

prompts/default.j2:

You are a technical writer for the Sol AI blog. 
Write a  blog post about ****. 
Tone: .

Structure:
1. Hook – a one‑sentence hook that sparks curiosity.
2. Problem statement – why this matters.
3. Solution overview – how the technology works.
4. Implementation steps – numbered, reproducible.
5. Result – what success looks like.
6. Next steps & pitfalls – actionable advice.
7. Conclusion – why the broader topic matters.

Output in Markdown. Do not include any meta‑commentary.

You can create seo.j2 that also asks for a meta description and slug.

3.4 Core Generation Logic (generate.py)

#!/usr/bin/env python3
import click
import ollama
from jinja2 import Environment, FileSystemLoader
from pathlib import Path
import re

# Setup Jinja2 environment
PROMPT_DIR = Path(__file__).parent / "prompts"
env = Environment(loader=FileSystemLoader(PROMPT_DIR))

MODEL = "qwen3.5:35b"
MAX_TOKENS = 2048   # adjust based on GPU memory

def build_prompt(topic: str,
                 length: str = "concise",
                 tone: str = "technical and precise",
                 keywords: list = None,
                 template: str = "default.j2") -> str:
    tmpl = env.get_template(template)
    return tmpl.render(
        topic=topic,
        length=length,
        tone=tone,
        keywords=keywords or []
    )

def generate(prompt: str, stream: bool = False) -> str:
    """Call Ollama with qwen3.5:35b and return generated text."""
    response = ollama.generate(
        model=MODEL,
        prompt=prompt,
        options={
            "num_predict": MAX_TOKENS,
            "temperature": 0.7,
            "top_p": 0.95,
        },
        stream=stream,
    )
    # When stream=False, response is a dict with key 'response'
    return response["response"] if isinstance(response, dict) else response

def sanitize_filename(title: str) -> str:
    """Create a safe filename slug from the first line (the hook)."""
    # Take the first non‑empty line, strip markdown bold, lowercase, replace spaces
    line = title.strip().lstrip("#").strip()
    slug = re.sub(r"[^a-z0-9]+", "-", line.lower()).strip("-")
    return slug

@click.command()
@click.option("--topic", required=True, help="Blog post topic.")
@click.option("--length", default="concise", type=click.Choice(["concise", "medium", "detailed"]))
@click.option("--tone", default="technical and precise")
@click.option("--keywords", default=None, multiple=True, help="SEO keywords (repeat for multiple).")
@click.option("--template", default="default.j2", help="Jinja2 template name in prompts/.")
@click.option("--output-dir", default="output", type=click.Path(file_okay=False, dir_okay=True))
def main(topic, length, tone, keywords, template, output_dir):
    # Render prompt
    prompt = build_prompt(topic, length, tone, list(keywords), template)
    click.echo(f"[+] Prompt length: {len(prompt)} chars")

    # Generate
    click.echo("[*] Generating with qwen3.5:35b …")
    raw = generate(prompt)

    # Basic cleanup – remove any stray leading/trailing whitespace
    draft = raw.strip()

    # Save to file
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # Use the first heading as filename
    filename = sanitize_filename(draft) + ".md"
    file_path = out_path / filename
    file_path.write_text(draft, encoding="utf-8")

    click.echo(f"[✓] Draft saved to {file_path}")
    click.echo("\n--- First 200 characters of draft ---")
    click.echo(draft[:200])

if __name__ == "__main__":
    main()

Key points in the code

  • ollama.generate talks directly to the local server; no external API calls.
  • MAX_TOKENS caps the output to stay within GPU memory (adjust to 3072 or 4096 if you have headroom).
  • temperature and top_p give a balance between creativity and coherence.
  • sanitize_filename extracts the first heading, ensuring the file name is clean and readable.

3.5 Running the Generator

python generate.py \
  --topic "Local AI for Real‑Time Code Review" \
  --length medium \
  --tone "direct and authoritative" \
  --keywords "code review, local LLM, privacy" \
  --template default.j2

The script will:

  1. Print prompt stats.
  2. Stream status messages (or wait if stream=False).
  3. Write a file named after the first heading, e.g., local-ai-for-real-time-code-review.md.

3.6 (Optional) Web UI with Flask

If you prefer a lightweight web interface, add a Flask wrapper:

# web_ui.py
from flask import Flask, request, jsonify
import generate as gen

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate_post():
    data = request.json
    prompt = gen.build_prompt(
        topic=data["topic"],
        length=data.get("length", "concise"),
        tone=data.get("tone", "technical and precise"),
        keywords=data.get("keywords", []),
    )
    raw = gen.generate(prompt)
    return jsonify({"draft": raw.strip()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Start with python web_ui.py and POST JSON to http://localhost:5000/generate. This is handy for integrating with desktop automation tools like AutoHotkey or Shortcuts.


4. What Success Looks Like

Running the command above yields a Markdown file that opens like:

# Hook: Stop Paying for Cloud APIs—Run a 35B Model on Your Desktop

## Problem Statement
...

The document follows the requested structure, contains the SEO keywords, and uses a tone that matches the prompt. The file is ready to be committed to a Jekyll, Hugo, or Docusaurus site without further formatting.

Performance on an RTX 3090 (24 GB) is roughly 30–45 tokens per second for a 2 k token generation. A medium‑length post (≈ 1 200 words) completes in under a minute, making the workflow interactive.


5. Next Steps & Common Pitfalls

Pitfall Fix
GPU OOM (out‑of‑memory) when loading qwen3.5:35b Reduce MAX_TOKENS or use a smaller model (qwen3.5:14b). Ensure only one instance of the model runs.
Slow generation Lower temperature to 0.5, set top_p to 0.9, or switch to a model with fewer parameters for drafts.
Prompt leakage – model includes internal commentary Append Do not include any meta‑commentary. at the end of the prompt; post‑process with a regex to strip lines starting with Note: or //.
Inconsistent heading levels Enforce a heading hierarchy in the Jinja2 template (e.g., # for title, ## for sections).
File naming collisions Add a timestamp or UUID to the filename if the same hook appears across runs.

Going further

  • Scheduling: Use cron or a systemd timer to auto‑generate weekly round‑up posts from a list of topics.
  • Multi‑model pipeline: Combine the draft with a second pass that runs llama3:8b for grammar polishing.
  • CMS integration: Hook the script into a headless CMS (e.g., Contentful) using their REST API, so drafts land directly in the editorial queue.

6. Why This Matters – The Broader Topic

Local AI is shifting from a niche experiment to a viable production layer. With tools like Ollama you get:

  • Privacy – prompts and generated content never leave your machine.
  • Cost control – no per‑token billing; you pay only for electricity and hardware.
  • Customization – you can swap in domain‑specific models (code, legal, medical) without network latency.

The qwen3.5:35b model exemplifies the current frontier: large enough to capture nuanced reasoning yet small enough to run on a single high‑end GPU. As quantization techniques improve (e.g., 4‑bit GPTQ, AWQ), even 35‑B models will become portable to laptops, opening the door to true “AI‑as‑a‑desktop‑app” experiences.

By building a blog‑post generator today, you’re laying the groundwork for richer automation: content pipelines, interactive notebooks, real‑time code assistance, and personalized learning assistants—all hosted locally, all under your control.


Happy generating – the first draft is already waiting in output/.