How we built Saramsha: speech-to-text and LLMs in clinical workflows \ Kalpas

Discharge documentation is one of the most overlooked bottlenecks in hospital operations. A doctor finishes a 12-hour shift, and before they can go home, they need to write a structured discharge summary for every patient they discharged that day. That’s 30–40 minutes per patient, done while tired, under pressure, with the next admission already waiting.

When we started building Saramsha, the problem statement was clear. The technical path was not.

The core loop

The product workflow is deceptively simple: doctor speaks, AI generates a structured summary, doctor reviews and approves. But each step in that loop has real complexity.

Voice input isn’t just recording audio. It’s handling variable speaking pace, medical terminology, background noise in busy wards, and doctors who don’t follow a script. We had to build a transcription pipeline that could handle all of this without requiring doctors to change how they speak.

Structured generation is harder than open-ended generation. A discharge summary has mandatory fields. Each field has clinical significance. “Diagnosis” isn’t just a label — it drives insurance claims, readmission tracking, and downstream care. Getting an LLM to reliably populate structured fields from unstructured voice input required significant prompt engineering and validation layers.
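To make "validation layers" a little more concrete, here is a minimal sketch of one such check, not our production code. It assumes the LLM is asked to return a JSON object and that a summary has four mandatory fields. The field names, `REQUIRED_FIELDS`, `ValidationResult`, and `validate_summary` are all hypothetical and chosen for illustration only.

```python
import json
from dataclasses import dataclass, field

# Hypothetical mandatory fields; the real summary schema is not described in this post.
REQUIRED_FIELDS = ("diagnosis", "treatment_given", "medications", "follow_up")


@dataclass
class ValidationResult:
    summary: dict
    missing: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A summary passes only when no mandatory field is missing or empty.
        return not self.missing


def validate_summary(raw_llm_output: str) -> ValidationResult:
    """Parse the model's output and list mandatory fields that are absent or empty."""
    try:
        data = json.loads(raw_llm_output)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        # Valid JSON that is not an object (e.g. a bare list) is treated as empty.
        data = {}

    # None, "", whitespace-only strings, and empty lists all count as missing.
    missing = [
        name for name in REQUIRED_FIELDS
        if not str(data.get(name) or "").strip()
    ]
    return ValidationResult(summary=data, missing=missing)
```

A check like this does not judge clinical correctness. It only guarantees that an incomplete summary is caught before it reaches the reviewer, so the missing fields can be regenerated or flagged.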

Review UX turned out to be as important as the AI. Doctors won’t use a tool they don’t trust. The review interface needed to show exactly what the AI generated, highlight anything that needs attention, and make editing fast. We spent more time on this than on the generation pipeline.

What we learned

The biggest surprise was how much the role-based workflow mattered. Nurses often have more context than doctors realize — they’re at the bedside, they document observations, they know the patient’s daily progression. Building a workflow where nurses can draft and doctors approve (rather than doctors doing everything from scratch) unlocked real efficiency gains.
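One way to picture the nurse-drafts, doctor-approves flow is as a small state machine with role checks on each transition. The sketch below is illustrative only. The statuses, the action names, and the `apply_action` helper are assumptions made for this example, not a description of Saramsha's internals.

```python
from enum import Enum


class Role(Enum):
    NURSE = "nurse"
    DOCTOR = "doctor"


class Status(Enum):
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"


# (current status, action) -> (roles allowed to perform it, resulting status)
# Hypothetical rules: anyone can submit a draft, only doctors can sign off.
TRANSITIONS = {
    (Status.DRAFT, "submit"): ({Role.NURSE, Role.DOCTOR}, Status.PENDING_REVIEW),
    (Status.PENDING_REVIEW, "approve"): ({Role.DOCTOR}, Status.APPROVED),
    (Status.PENDING_REVIEW, "send_back"): ({Role.DOCTOR}, Status.DRAFT),
}


def apply_action(status: Status, action: str, role: Role) -> Status:
    """Return the next status, or raise if the action or the role is not allowed."""
    try:
        allowed_roles, next_status = TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a summary in state {status.value}") from None
    if role not in allowed_roles:
        raise PermissionError(f"{role.value} may not {action!r} a summary")
    return next_status
```

Under these assumed rules, `apply_action(Status.DRAFT, "submit", Role.NURSE)` moves a nurse's draft into the doctor's review queue, while a nurse attempting `"approve"` raises `PermissionError`. Keeping the rules in one table makes it easy to see who is allowed to do what.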

The second surprise was how much document uploads mattered. Doctors and nurses already hold lab reports, prescriptions, and handwritten notes that are critical context for a discharge summary. Adding document ingestion alongside voice input significantly improved summary quality.

What’s next

We’re working on EHR integration so Saramsha can pull patient history directly rather than requiring manual uploads. Multi-language transcription is close behind — a significant portion of clinical narration in Indian hospitals happens in regional languages mixed with English.

If you’re building in healthcare and want to talk about what we’ve learned, reach out.