Enterprise · Full-Stack
Wolters Kluwer — Journal Automation Platform
A full-stack web app that turns 5 scattered data sources into 45 branded PowerPoint presentations — automatically. What took a marketing team weeks now takes under 10 minutes.
45 journals · 5 sources → 1 click · ~95% time saved · 35/35 audit pass
The Problem
Wolters Kluwer's legal publishing division produces 45 academic law journals. Every year, the marketing team builds a branded partnership overview presentation for each journal — subscriber trends, usage analytics, geographic reach, citation rankings, page counts.
The data lives in 5 separate systems: Tableau (subscribers, geography, segments), SIQ (platform usage, top articles), HeinOnline (visit statistics), Scopus/Clarivate/Google Scholar (rankings), and internal spreadsheets (page counts per issue).
The old process: manually copy-paste data from each source into a 16-slide PowerPoint template, recreate charts, cross-reference journal names across systems (which don't agree on naming), and repeat 45 times. This took multiple weeks every year and was error-prone — the same journal might appear as "b-Arbitra | Belgian Review of Arbitration" in Tableau and "b-Arbitra: Belgian Review of Arbitration" in SIQ.
The Solution
A full-stack web application that automates the entire pipeline from raw data to finished, auditable presentations.
Upload Dashboard (mockup): drag files or click to upload. Six drop zones: Tableau (12 files), SIQ (8 files), HeinOnline (3 files), Rankings (4 files), Page Counts (2 files), Marketing (6 files).
Frontend
React 19 + TypeScript + Tailwind + shadcn/ui
The upload dashboard uses a 6-zone drag-and-drop interface with auto-categorization — drop any file and the system identifies which data source it belongs to based on filename patterns and content structure.
A real-time data completeness matrix tracks coverage across all 7 data categories for each of the 45 journals. During generation, live WebSocket updates stream progress per journal so you can see exactly where the pipeline is.
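A completeness matrix of this kind can be modeled as a journal-by-category grid of booleans. The sketch below is a minimal illustration, not the project's code: the function names and the example category labels are hypothetical, and the real categories are not listed in this write-up.

```python
from typing import Dict, Iterable, Set, Tuple


def completeness_matrix(
    journals: Iterable[str],
    categories: Iterable[str],
    available: Set[Tuple[str, str]],
) -> Dict[str, Dict[str, bool]]:
    """Build a journal -> category -> has_data grid.

    `available` holds (journal, category) pairs for which data was found.
    """
    cats = list(categories)
    return {j: {c: (j, c) in available for c in cats} for j in journals}


def coverage(matrix: Dict[str, Dict[str, bool]]) -> Dict[str, float]:
    """Fraction of categories with data, per journal."""
    return {j: sum(row.values()) / len(row) for j, row in matrix.items()}
```

A frontend could render each cell as a filled or empty marker and recompute `coverage` whenever a new upload is processed.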
The built-in file browser supports individual and bulk download of generated presentations. An audit UI renders chart-level validation results — every data point in every chart is traced back to its source row.
Backend
FastAPI + Python
The data pipeline fuzzy-matches journal names across all 5 sources using `SequenceMatcher` from Python's standard `difflib` module, combined with explicit alias tables. This handles the naming inconsistencies automatically: known journals need no manual mapping.
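The alias-then-fuzzy lookup can be sketched as follows. This is an illustration under assumptions: the alias entry, the `resolve_name` function, and the 0.85 threshold are hypothetical choices, not values taken from the project.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional

# Hypothetical alias table: lowercased variant -> canonical name.
ALIASES: Dict[str, str] = {
    "b-arbitra: belgian review of arbitration": "b-Arbitra | Belgian Review of Arbitration",
}


def resolve_name(
    raw: str,
    canonical_names: Iterable[str],
    aliases: Dict[str, str] = ALIASES,
    threshold: float = 0.85,  # assumed cut-off; tune against real data
) -> Optional[str]:
    """Return the canonical journal name for `raw`, or None if nothing is close."""
    key = raw.strip().lower()
    # 1. An explicit alias wins outright.
    if key in aliases:
        return aliases[key]
    # 2. Otherwise take the best fuzzy match above the threshold.
    best, best_score = None, 0.0
    for name in canonical_names:
        score = SequenceMatcher(None, key, name.lower()).ratio()
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None
```

Checking aliases first keeps known troublemakers deterministic, while the threshold stops the fuzzy step from silently pairing unrelated titles.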
Multi-year data aggregation pulls 3 years of subscriber and usage trends to build time-series charts. Template-based PPTX generation works via direct XML manipulation of PowerPoint's OOXML format, preserving full editability — the marketing team can still open and tweak any generated file in PowerPoint.
Conditional slide inclusion handles missing data gracefully: if a journal has no HeinOnline stats, that slide is deleted entirely rather than showing empty charts. A reverse-engineering audit engine extracts chart data from the generated PPTX XML and validates it against the source data.
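The audit idea can be illustrated by reading the cached category labels and values out of a chart part and comparing them with the source figures. This is a sketch, not the project's audit engine: `extract_series` and `audit_series` are hypothetical names, and it assumes the chart stores its data in the usual `c:cat` and `c:val` caches.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

C = "{http://schemas.openxmlformats.org/drawingml/2006/chart}"


def _cached_values(parent: Optional[ET.Element]) -> List[str]:
    """Collect the text of every c:v element under a c:cat or c:val node."""
    if parent is None:
        return []
    return [v.text or "" for v in parent.iter(C + "v")]


def extract_series(chart_xml: bytes) -> List[Dict[str, float]]:
    """Return one {category label: value} mapping per c:ser in the chart."""
    root = ET.fromstring(chart_xml)
    series = []
    for ser in root.iter(C + "ser"):
        labels = _cached_values(ser.find(C + "cat"))
        values = [float(v) for v in _cached_values(ser.find(C + "val"))]
        series.append(dict(zip(labels, values)))
    return series


def audit_series(
    series: Dict[str, float], source: Dict[str, float], tol: float = 1e-6
) -> List[Tuple[str, Optional[float], float]]:
    """List (label, chart value, source value) for every mismatch."""
    mismatches = []
    for label, expected in source.items():
        actual = series.get(label)
        if actual is None or abs(actual - expected) > tol:
            mismatches.append((label, actual, expected))
    return mismatches
```

An empty list from `audit_series` means every source point was found in the chart with a matching value.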
Integrations
A custom Tableau REST API client handles automated data pulls using Personal Access Token authentication, extracting subscriber demographics, geographic distribution, and segment breakdowns directly from published Tableau views.
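For illustration, a Personal Access Token sign-in request to the Tableau REST API can be assembled like this. The server URL, API version, and token values are placeholders, and `build_pat_signin` is a hypothetical helper; the function only builds the request and sends nothing.

```python
import json
import urllib.request


def build_pat_signin(
    server: str,
    api_version: str,
    token_name: str,
    token_secret: str,
    site_content_url: str = "",
) -> urllib.request.Request:
    """Build (but do not send) a Tableau REST API sign-in request using a PAT."""
    body = {
        "credentials": {
            "personalAccessTokenName": token_name,
            "personalAccessTokenSecret": token_secret,
            "site": {"contentUrl": site_content_url},
        }
    }
    return urllib.request.Request(
        f"{server}/api/{api_version}/auth/signin",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

A successful sign-in response carries a credentials token, which later requests pass in the `X-Tableau-Auth` header.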
A rate-limited Google Scholar metrics scraper extracts h5-index and h5-median values for journal ranking slides. Bulk file upload with keyword-based auto-categorization routes uploaded files to the correct processing pipeline without manual sorting.
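Keyword-based routing can be as simple as a lookup over the lowercased filename. The sketch below uses the six upload zones named on this page; the keyword lists and the `categorize` function are assumptions for illustration, not the project's actual rules.

```python
from typing import Dict, Optional, Tuple

# Hypothetical keyword rules, checked in order; the first hit wins.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "Tableau": ("tableau", "subscriber"),
    "SIQ": ("siq",),
    "HeinOnline": ("hein",),
    "Rankings": ("scopus", "clarivate", "scholar", "ranking"),
    "Page Counts": ("page",),
    "Marketing": ("marketing",),
}


def categorize(filename: str) -> Optional[str]:
    """Return the upload zone for a filename, or None when no keyword matches."""
    lowered = filename.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return None
```

Files that return `None` would fall back to a content-structure check or a manual choice in the UI.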
Pipeline Flow
Data Upload → Auto-Processing → PPTX Generation → Audit & Validate → Download
Key Technical Challenges
1. PPTX XML Corruption
PowerPoint's OOXML format is unforgiving. I found and fixed three separate corruption vectors: external data references to SharePoint in template charts, Python's XML writer dropping the standalone="yes" declaration, and dangling slide relationship entries after slide deletion. Each caused silent corruption that surfaced only when the file was opened in PowerPoint.
<!-- Template chart with external SharePoint reference -->
<c:externalData r:id="rId1">
<c:autoUpdate val="0"/>
</c:externalData>
<!-- Fix: strip all externalData elements before writing -->
# chart_xml.findall(".//c:externalData", ns)
# → Remove each element from its parent
Template charts contained SharePoint references that corrupted the output when the linked workbook wasn't available.
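Two of these fixes can be sketched with the standard library's ElementTree. This is a minimal illustration with hypothetical function names: it strips `c:externalData` elements, then serializes with an explicit declaration, since ElementTree's own writer does not emit `standalone="yes"`. Any matching entry in the chart's .rels part may also need removing, which is not shown.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Keep the conventional prefixes on output instead of ns0, ns1, ...
ET.register_namespace("c", C_NS)
ET.register_namespace("r", R_NS)


def strip_external_data(root: ET.Element) -> int:
    """Remove every c:externalData element; return how many were removed."""
    tag = f"{{{C_NS}}}externalData"
    removed = 0
    # ElementTree has no parent pointers, so walk parents and check children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
                removed += 1
    return removed


def serialize_standalone(root: ET.Element) -> bytes:
    """Serialize with the standalone="yes" declaration written by hand."""
    body = ET.tostring(root, encoding="unicode")  # no declaration in this mode
    declaration = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    return (declaration + body).encode("utf-8")
```

Writing the declaration by hand keeps the fix in one place, so every part written back into the PPTX archive gets the same header.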
2. Journal Name Normalization
The same journal appears under different names, abbreviations, and Unicode variants across 5 data systems. Built a multi-strategy resolver: explicit alias mapping → fuzzy SequenceMatcher → acronym lookup → keyword detection. Handles 45 journals across all sources with zero manual intervention.
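The normalization and acronym steps of such a resolver might look like the sketch below. The function names, the separator handling, and the stop-word list are assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical stop words skipped when building an acronym.
STOP_WORDS = {"of", "the", "and", "for", "in"}


def normalize(name: str) -> str:
    """Fold Unicode variants, separators, spacing, and case into one form."""
    text = unicodedata.normalize("NFKC", name)
    text = text.replace("\u2013", "-").replace("\u2014", "-")  # en and em dashes
    text = re.sub(r"[|:]", " ", text)  # treat "|" and ":" as plain separators
    return re.sub(r"\s+", " ", text).strip().casefold()


def acronym(name: str) -> str:
    """Build an acronym from the significant words of a title."""
    words = re.findall(r"[A-Za-z]+", name)
    return "".join(w[0] for w in words if w.lower() not in STOP_WORDS).upper()
```

With this normalization, the Tableau and SIQ spellings of b-Arbitra quoted earlier reduce to the same string before any fuzzy comparison is needed.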
3. 3D Bar Chart Labels
Discovered that PowerPoint does not accept the dLblPos element inside OOXML's bar3DChart type at any level. Any attempt to set label positioning corrupts the file. Solved by working within PowerPoint's default placement and using only the showVal toggle.
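A defensive pass that drops label-position settings from 3D bar charts, while leaving other chart types alone, could look like this. It is a sketch with a hypothetical function name, assuming the chart XML is already parsed with ElementTree.

```python
import xml.etree.ElementTree as ET

C = "{http://schemas.openxmlformats.org/drawingml/2006/chart}"


def strip_3d_label_positions(root: ET.Element) -> int:
    """Remove c:dLblPos anywhere inside c:bar3DChart; return the count removed."""
    removed = 0
    for chart in list(root.iter(C + "bar3DChart")):
        # Walk every node in the 3D chart and drop dLblPos children.
        for parent in list(chart.iter()):
            for child in list(parent):
                if child.tag == C + "dLblPos":
                    parent.remove(child)
                    removed += 1
    return removed
```

Elements such as `c:showVal` are untouched, so value labels still appear in PowerPoint's default position.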
4. Conditional Slide Management
Each of the 16 template slides maps to a specific data requirement. If a journal has no HeinOnline data, that slide is deleted. Marketing slides from a separate PPTX are injected after fuzzy matching, and orphaned media files are removed to prevent file bloat.
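Deleting a slide cleanly means removing both its entry in the presentation's slide list and the relationship that points at it. The sketch below shows only that pairing, with a hypothetical `drop_slide` helper; removing the slide part itself, its own .rels file, and its content-type override is left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def drop_slide(pres_root: ET.Element, rels_root: ET.Element, target: str) -> Optional[str]:
    """Remove the relationship for `target` and the matching p:sldId entry.

    Returns the relationship id that was removed, or None if it was not found.
    """
    rel = next(
        (r for r in rels_root.findall(f"{{{PKG_NS}}}Relationship") if r.get("Target") == target),
        None,
    )
    if rel is None:
        return None
    rid = rel.get("Id")
    rels_root.remove(rel)
    # Drop the slide list entry so no reference to the deleted rId remains.
    sld_list = pres_root.find(f"{{{P_NS}}}sldIdLst")
    if sld_list is not None:
        for sld in list(sld_list):
            if sld.get(f"{{{R_NS}}}id") == rid:
                sld_list.remove(sld)
    return rid
```

Handling both parts in one function makes it harder to leave a dangling entry on either side.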
Validation
35/35
charts validated against source data
The Result
- ✓ 45 journal presentations generated from a single button click
- ✓ Sub-10-minute full pipeline (was multiple weeks)
- ✓ 35/35 audit pass rate — every chart validated against source data
- ✓ Fully editable output — marketing team can still tweak in PowerPoint
- ✓ Reproducible — regenerate any time data updates
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 19, TypeScript, Vite, Tailwind CSS, shadcn/ui, Recharts |
| Backend | FastAPI, Python 3.13, pandas, openpyxl |
| Real-time | WebSocket (native FastAPI) |
| PPTX Engine | Direct XML manipulation (ElementTree) |
| Integrations | Tableau REST API, Google Scholar scraper |
| Validation | Custom audit engine (PPTX XML ↔ source data) |