Context‑Augmented Machine Translation for CAT Tools
Echo bridges Machine Translation (MT) and Computer‑Assisted Translation (CAT) using smaller, context‑enriched LLMs. It leverages translation memories, lexicons, and recent edits to improve quality while keeping data private.
✨ Highlights
🧠 Context‑Enriched LLMs
Smaller models + rich context (TMs, termbases, recency cache) can rival larger LLMs without context.
🔁 Self‑Updating Memory
Translator corrections flow back into the system automatically; no manual TM uploads required.
🔌 CAT Tool Integration
Trados TranslationProvider plugin retrieves context and suggestions directly in the translator’s workflow.
🛡️ Privacy‑First
Local‑first and on‑prem friendly. Keep sensitive data in‑house while improving quality.
🛠️ How it works
1) Ingest
Upload TMX/SDLTM and lexicons. Content is embedded into Qdrant for retrieval.
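The ingest step can be sketched in Python. This is a minimal, stdlib-only illustration of the first half (extracting aligned segment pairs from a TMX file); the subsequent embedding and Qdrant upsert are noted in a comment rather than implemented, and `parse_tmx` is a hypothetical helper name, not Echo's actual API.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the `xml:` prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def parse_tmx(tmx_text, src_lang, tgt_lang):
    """Extract (source, target) segment pairs from a TMX document."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):          # one <tu> = one translation unit
        segs = {}
        for tuv in tu.findall("tuv"):   # one <tuv> per language variant
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
    # Next step (not shown): embed each source segment and upsert the
    # vector plus (source, target) payload into a Qdrant collection.
```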
2) Translate
REST API + Trados plugin call LLMs (OpenAI, Anthropic, etc.) with relevant context.
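The "relevant context" passed to the LLM can be pictured as a prompt assembled from retrieved TM matches and lexicon entries. The sketch below is an assumption about the general shape of such a prompt, not Echo's actual template; `build_prompt` and its parameters are illustrative names.

```python
def build_prompt(source, tm_matches, lexicon):
    """Assemble a context-augmented translation prompt.

    tm_matches: list of (source, target) pairs retrieved from the TM.
    lexicon:    dict mapping source terms to mandated translations.
    """
    lines = ["Translate the source segment. "
             "Use the reference material for consistency.", ""]
    if tm_matches:
        lines.append("Translation memory matches:")
        lines += [f"- {s} => {t}" for s, t in tm_matches]
    if lexicon:
        lines.append("Lexicon:")
        lines += [f"- {term}: {tr}" for term, tr in lexicon.items()]
    lines += ["", f"Source: {source}", "Translation:"]
    return "\n".join(lines)
```

The resulting string would be sent as the user message of a chat-completion request to whichever provider (OpenAI, Anthropic, or a local model) the deployment is configured for.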
3) Improve
Edits are captured and cached (FIFO per session) to increase consistency and quality.
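A per-session FIFO edit cache like the one described can be sketched with `collections.deque`, which evicts the oldest entry once `maxlen` is reached. The class and method names below are hypothetical, chosen for illustration.

```python
from collections import deque

class SessionEditCache:
    """Keeps the N most recent translator edits per session (FIFO eviction)."""

    def __init__(self, max_edits=50):
        self.max_edits = max_edits
        self._sessions = {}  # session_id -> deque of (source, target) edits

    def record_edit(self, session_id, source, corrected_target):
        # deque(maxlen=...) silently drops the oldest edit when full.
        cache = self._sessions.setdefault(
            session_id, deque(maxlen=self.max_edits))
        cache.append((source, corrected_target))

    def recent_edits(self, session_id):
        """Return edits oldest-first, for inclusion in the next prompt."""
        return list(self._sessions.get(session_id, ()))
```

Feeding `recent_edits()` back into the prompt context is what lets corrections made early in a session influence later suggestions.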
🗺️ Roadmap
Prototype
- ✅ Stage 1: End‑to‑end demo (API + plugin) and basic metrics
- ✅ Stage 2: Automatic feedback to memory
- ✅ Stage 3: Projects & auth basics
- 🚧 Stage 4: Session cache (needs online backend)
- 🎯 Stage 5: HyDE + new models + evaluations
Research
Goal: show that smaller, context‑enriched models match or exceed larger LLMs on standard MT metrics.
📊 Metrics Testbed
Our research testbed benchmarks context‑enriched models against larger baselines on standard MT metrics.
🎯 Who is this for?
👩‍💻 Translators
Faster, more consistent output within Trados; no new tools to learn.
🏢 Agencies
Quality and throughput improvements with existing TMs and workflows.
🏛️ Enterprise & Gov
On‑prem options for sensitive content; clear path to compliance.
🎓 Academia
A reproducible testbed and publishable results via shared tasks/benchmarks.