Full Stack · EdTech · 2025

Worksheet Translation Platform

US Professor
The Problem

Existing translation tools strip all formatting from documents, leaving ESL students with inferior, text-only versions of worksheets.

Project Overview

Hero: "Translate Worksheets, Preserve Learning" landing page

This multi-tenant SaaS platform lets teachers upload a PDF worksheet and receive a fully translated, layout-preserved version within minutes, with no reformatting required. Tools like Google Translate return plain text: every table, alignment, fill-in-the-blank, and image is lost, and rebuilding a worksheet by hand can take hours. This platform preserves the original formatting across five supported languages: Spanish, French, Japanese, Simplified Chinese, and Italian.


Each school district gets its own branded subdomain with a custom logo, colors, and role-based access for Teachers, Org Admins, and a Super Admin.
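A minimal sketch of how a request's subdomain could be mapped to a district and how the three roles might gate actions. The root domain example-platform.com, the role and action names, and the permission table are all hypothetical; the source does not describe how tenant resolution or role checks are actually implemented.

```typescript
// Hypothetical role names mirroring the three roles described above.
type Role = "TEACHER" | "ORG_ADMIN" | "SUPER_ADMIN";

// Hypothetical root domain; each district is served at <slug>.<ROOT_DOMAIN>.
const ROOT_DOMAIN = "example-platform.com";

// Extract the district slug from a Host header. Returns null for the bare
// root domain, nested subdomains, or unrelated hosts.
function tenantSlugFromHost(host: string): string | null {
  const hostname = host.split(":")[0].toLowerCase(); // drop any port
  if (!hostname.endsWith("." + ROOT_DOMAIN)) return null;
  const sub = hostname.slice(0, -(ROOT_DOMAIN.length + 1));
  // Accept only a single-label subdomain such as "springfield".
  return /^[a-z0-9-]+$/.test(sub) ? sub : null;
}

type Action = "upload_worksheet" | "manage_branding" | "manage_districts";

// Hypothetical permission table: which roles may perform each action.
const PERMISSIONS: Record<Action, Role[]> = {
  upload_worksheet: ["TEACHER", "ORG_ADMIN", "SUPER_ADMIN"],
  manage_branding: ["ORG_ADMIN", "SUPER_ADMIN"],
  manage_districts: ["SUPER_ADMIN"],
};

function can(role: Role, action: Action): boolean {
  return PERMISSIONS[action].includes(role);
}
```

In a request handler, the slug would be looked up to find the district (and its logo and colors) before any role check runs; a request whose slug does not resolve would be rejected outright.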

The Challenge

The core problem was that existing translation tools strip all formatting from documents. ESL students end up with inferior, text-only versions of worksheets because teachers can't afford to spend hours reconstructing layouts after translation. The platform needed to handle asynchronous document processing reliably, including retries on failure, while keeping the UI fully non-blocking. It also had to enforce strict data isolation between school districts in a multi-tenant setup.
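To make "retries on failure" concrete, here is a small sketch of an exponential backoff schedule, where the wait doubles after each failed attempt. The attempt count and base delay are hypothetical; the source does not give the real values. BullMQ exposes retries through the attempts and backoff job options, so in practice the queue would apply a schedule like this itself; the function below only illustrates the timing and is not BullMQ's internal code.

```typescript
// Hypothetical retry settings; the actual values are not stated in the source.
interface RetryPolicy {
  attempts: number;    // total tries, including the first one
  baseDelayMs: number; // wait before the first retry
}

// Delay before each retry, doubling every time: base, 2*base, 4*base, ...
// A job with N attempts gets N - 1 retries, so N - 1 delays.
function retryDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < policy.attempts; retry++) {
    delays.push(policy.baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}
```

For example, retryDelays({ attempts: 4, baseDelayMs: 5000 }) yields waits of 5, 10, and 20 seconds, which gives a slow external service such as Airflow time to recover without hammering it.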

My Role and Contributions

I designed and built the entire frontend and backend of the platform. A colleague handled the document processing pipeline (Apache Airflow). My work covered the multi-tenant architecture with strict data isolation, an async translation worker with checkpoint resumption, AI-powered translation quality scoring, teacher and admin dashboards, and the email notification system.
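One common way to enforce tenant isolation in the data layer is a repository bound to a single organization, so every query is filtered by that organization's id and handlers cannot forget the filter. The sketch below assumes that approach; the record shape and class name are hypothetical, and an in-memory array stands in for PostgreSQL. With Prisma, the equivalent would be adding organizationId to the where clause of every query.

```typescript
// Hypothetical record shape; the real Prisma schema is not shown in the source.
interface WorksheetDoc {
  id: string;
  organizationId: string;
  title: string;
}

// A repository bound to one tenant at construction time. Every read is
// filtered by, and every write is stamped with, that tenant's id.
class TenantScopedDocs {
  private readonly organizationId: string;
  private readonly store: WorksheetDoc[]; // stand-in for the database

  constructor(organizationId: string, store: WorksheetDoc[]) {
    this.organizationId = organizationId;
    this.store = store;
  }

  findAll(): WorksheetDoc[] {
    return this.store.filter((d) => d.organizationId === this.organizationId);
  }

  findById(id: string): WorksheetDoc | undefined {
    // Matching on both id and tenant means an id guessed from another
    // district returns nothing instead of leaking that district's record.
    return this.store.find(
      (d) => d.id === id && d.organizationId === this.organizationId,
    );
  }

  create(doc: Omit<WorksheetDoc, "organizationId">): WorksheetDoc {
    const record: WorksheetDoc = { ...doc, organizationId: this.organizationId };
    this.store.push(record);
    return record;
  }
}
```

The organization id would come from the resolved subdomain or the authenticated session, never from the request body, so a client cannot choose which tenant it reads from.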

How It Works

The flow is entirely asynchronous. A teacher uploads a PDF, which gets stored in S3. The backend queues a translation job via BullMQ. A worker picks it up and triggers Apache Airflow for document reconstruction. Once done, the translated PDF goes through a Gemini AI quality check, and the teacher receives an email with a download link. The UI never blocks during this process.
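The flow above can be modeled as a small state machine that both the worker and the dashboard read. The status names and the rule that a failed job may be re-queued are assumptions for illustration; the source describes the stages but not how they are stored. A dashboard could poll this status (for example with TanStack Query's refetchInterval option), which is one way to keep the UI non-blocking, though the source does not say how the real UI tracks progress.

```typescript
// Hypothetical job statuses for the flow described above.
type JobStatus =
  | "UPLOADED"      // PDF stored in S3
  | "QUEUED"        // translation job added to the BullMQ queue
  | "PROCESSING"    // worker has handed the document to Airflow
  | "QUALITY_CHECK" // translated PDF is being scored by Gemini
  | "COMPLETED"     // teacher emailed a download link
  | "FAILED";

// Allowed transitions: each stage may move forward or fail, and a failed
// job may be re-queued for another attempt.
const NEXT: Record<JobStatus, JobStatus[]> = {
  UPLOADED: ["QUEUED", "FAILED"],
  QUEUED: ["PROCESSING", "FAILED"],
  PROCESSING: ["QUALITY_CHECK", "FAILED"],
  QUALITY_CHECK: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: ["QUEUED"],
};

// Validate a status change, throwing on anything not in the table so a bug
// cannot, say, mark a job COMPLETED before its quality check.
function transition(from: JobStatus, to: JobStatus): JobStatus {
  if (!NEXT[from].includes(to)) {
    throw new Error(`Illegal status change ${from} -> ${to}`);
  }
  return to;
}
```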

Teacher Dashboard: Simplified document upload and status tracking

Checkpoint-Based Worker

Document processing can take several minutes and can fail midway. The worker saves progress after each stage (text extracted, translation triggered, quality check complete) and, on retry, resumes from the last checkpoint instead of starting over, so completed stages are not repeated and their API calls are not wasted.
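The checkpoint idea can be sketched as a loop that skips every stage up to the last saved one. The stage names follow the text; the CheckpointStore interface and handler signature are hypothetical, since the source does not say where checkpoints are persisted (job data and a database row are both plausible).

```typescript
// Stages named in the text, in execution order.
const STAGES = ["TEXT_EXTRACTED", "TRANSLATION_TRIGGERED", "QUALITY_CHECKED"] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical persistence interface for checkpoints.
interface CheckpointStore {
  load(jobId: string): Stage | null; // last completed stage, if any
  save(jobId: string, stage: Stage): void;
}

type StageHandler = (jobId: string) => Promise<void>;

// Run every stage after the last saved checkpoint, saving progress as each
// one completes. On a retry, completed stages are skipped entirely.
async function runWithCheckpoints(
  jobId: string,
  store: CheckpointStore,
  handlers: Record<Stage, StageHandler>,
): Promise<void> {
  const last = store.load(jobId);
  const start = last === null ? 0 : STAGES.indexOf(last) + 1;
  for (let i = start; i < STAGES.length; i++) {
    const stage = STAGES[i];
    await handlers[stage](jobId); // may throw; the queue then retries the job
    store.save(jobId, stage);
  }
}
```

Because the checkpoint is written only after a stage succeeds, a crash between finishing a stage and saving it reruns that one stage, so each handler should be safe to run twice.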

Gemini Vision Quality Scoring

Instead of comparing raw text strings, the system sends both the original and translated PDFs directly to Gemini 2.0 Flash and asks it to compare them visually. The model checks placeholder preservation, number accuracy, and structural fidelity, and any translation that scores below a quality threshold is automatically failed.
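The sketch below covers only the gating step after the model replies. It assumes the prompt asks Gemini for a JSON object with one 0-100 score per criterion; that response format, the field names, and the threshold of 80 are all hypothetical, since the source names the three criteria but not the scale or cutoff.

```typescript
// The three criteria named in the text; the field names are hypothetical.
const CRITERIA = ["placeholderPreservation", "numberAccuracy", "structuralFidelity"] as const;
type Criterion = (typeof CRITERIA)[number];
type QualityScores = Record<Criterion, number>; // each 0-100 (hypothetical scale)

const THRESHOLD = 80; // hypothetical pass mark

// Parse the model's JSON reply. Missing or out-of-range scores throw, so a
// malformed response fails the job instead of silently passing it.
function parseScores(raw: string): QualityScores {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const scores = {} as QualityScores;
  for (const key of CRITERIA) {
    const v = data[key];
    if (typeof v !== "number" || v < 0 || v > 100) {
      throw new Error(`Invalid or missing score: ${key}`);
    }
    scores[key] = v;
  }
  return scores;
}

// Fail when any single criterion is below the threshold, so a high structure
// score cannot mask lost placeholders or altered numbers.
function gate(scores: QualityScores): { passed: boolean; failing: Criterion[] } {
  const failing = CRITERIA.filter((key) => scores[key] < THRESHOLD);
  return { passed: failing.length === 0, failing };
}
```

Checking each criterion separately, rather than averaging, is a deliberate choice in this sketch: a worksheet with every blank removed is unusable even if its layout is perfect.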

Polyglot Readability Analysis

After translation, a Python script (invoked from Node.js) calculates the reading level of both the source and the translated documents, so teachers can confirm the material is still grade-appropriate for their students.
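A minimal sketch of the Node.js side of that call, assuming the Python script prints a JSON object with the two reading levels on stdout. The output field names and the script's arguments are hypothetical; the source does not describe the interface or which readability metric is used. It uses Node's child_process.execFile, which passes arguments without a shell, in its promisified form so the event loop stays free while Python runs.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical result shape printed by the readability script.
interface ReadabilityResult {
  sourceGrade: number;
  translatedGrade: number;
}

// Run an external script and parse the JSON it prints on stdout.
// The timeout kills a hung script instead of stalling the worker.
async function runReadability(
  command: string,
  args: string[],
  timeoutMs = 30_000,
): Promise<ReadabilityResult> {
  const { stdout } = await execFileAsync(command, args, { timeout: timeoutMs });
  const parsed = JSON.parse(stdout) as Partial<ReadabilityResult>;
  if (typeof parsed.sourceGrade !== "number" || typeof parsed.translatedGrade !== "number") {
    throw new Error("Readability script returned unexpected output");
  }
  return { sourceGrade: parsed.sourceGrade, translatedGrade: parsed.translatedGrade };
}
```

A call might look like runReadability("python3", ["readability.py", sourcePath, translatedPath]), where the script name and argument order are placeholders.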

Technologies Used

  • Frontend: Next.js, TypeScript, TanStack Query
  • Backend: Node.js, Express, Prisma, PostgreSQL
  • Queuing: BullMQ + Redis
  • Storage: AWS S3
  • AI: Gemini 2.0 Flash (quality scoring & visual comparison)
  • Document Processing: Apache Airflow (external service)
  • Email: Resend

Outcomes

  • Fully asynchronous pipeline: the UI never blocks, and jobs process in parallel
  • Translations auto-rejected if formatting quality drops below threshold
  • Each district's data fully isolated at the database and storage level
  • Swagger docs and BullBoard queue monitor available for operator visibility
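For storage, one way to isolate districts is to place every object under a per-organization key prefix and check that prefix before issuing any download link. The key layout and helper names below are hypothetical; the source does not say whether storage isolation relies on key prefixes, IAM policies, or separate buckets.

```typescript
// Hypothetical key layout: each district's objects live under its own prefix.
function objectKey(orgId: string, documentId: string, kind: "source" | "translated"): string {
  for (const part of [orgId, documentId]) {
    // Reject slashes, dots, and other characters so a crafted id cannot
    // escape into another tenant's prefix.
    if (!/^[A-Za-z0-9_-]+$/.test(part)) throw new Error(`Unsafe key segment: ${part}`);
  }
  return `orgs/${orgId}/documents/${documentId}/${kind}.pdf`;
}

// Guard used before generating a download link. The trailing slash matters:
// without it, "org-a" would also match keys under "org-ab".
function assertKeyBelongsTo(orgId: string, key: string): void {
  if (!key.startsWith(`orgs/${orgId}/`)) {
    throw new Error("Object key is outside the caller's organization");
  }
}
```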
Results: Complex worksheet translated to Chinese with formatting 100% intact