Maven Meals Data Integration Platform
A production-grade serverless architecture connecting multiple SaaS systems into a unified, real-time business intelligence solution.
The Challenge
Critical recipe data locked in disconnected systems with no automated path to reporting
Maven Meals' recipe data originates in a separate ordering system, and staff manually re-enter it into Galley Solutions, their catering management platform. Galley holds the detailed recipe specifications, ingredients, and packaging requirements - but there was no automated way to get that data out for business reporting.
Manual Data Entry
Recipe data hand-entered into Galley from external ordering system - no direct integration
Locked-In Data
Galley holds 2,900+ recipes with 160+ fields each - but no API access initially available
Order Flow
Commerce platform streams real-time orders through DynamoDB - each order needs to be correlated with recipe data
Reporting Gap
Business needed unified insights across orders, recipes, and packaging - no path existed
The Solution Evolution
From scrappy problem-solving to elegant architecture
Manual PDF Exports
With no API access to Galley, the only way to extract recipe data was through printed PDF packaging guides. Staff would export PDFs from Galley and save them to Google Drive - a tedious, error-prone process that couldn't scale.
Automated PDF Parsing Pipeline
We built an automated PDF parsing solution - Lambda functions that monitor Google Drive, download new PDFs, parse structured content, detect changes via content hashing, and load data into PostgreSQL. Not elegant, but it worked.
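The change-detection step can be sketched as follows. This is a minimal illustration, not the production code: the SHA-256 choice, the function names, and the in-memory `seen` dictionary (standing in for a hash column in PostgreSQL) are all assumptions.

```python
import hashlib


def content_hash(pdf_bytes: bytes) -> str:
    """Return a hex digest that identifies the exact bytes of a PDF."""
    # SHA-256 is an assumption; the source only says "content hashing".
    return hashlib.sha256(pdf_bytes).hexdigest()


def needs_processing(file_id: str, pdf_bytes: bytes, seen: dict) -> bool:
    """Return True only when the file is new or its content has changed.

    `seen` maps a Drive file ID to the last hash we parsed (hypothetical
    stand-in for a lookup table in the database).
    """
    digest = content_hash(pdf_bytes)
    if seen.get(file_id) == digest:
        return False  # identical content, skip the expensive parse
    seen[file_id] = digest
    return True
```

Hashing the bytes rather than trusting file names or modification times means a re-exported but unchanged PDF is skipped, while an edited PDF with the same name is picked up.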
GraphQL API Integration
Working with Galley's team, we developed a proper API integration using their GraphQL endpoint. This unlocked access to all 160+ recipe fields, real-time sync, and even a custom deletedAt filter they added at our request for reliable deletion tracking.
PDF Parsing
- ~20 fields extracted
- Manual PDF generation
- Format-dependent parsing
- Deletion detection by absence
GraphQL API
- 160+ fields available
- Automated hourly sync
- Delta detection via timestamps
- Actual deletion timestamps
The Lesson
Sometimes the "right" solution isn't available yet. Building a working solution with available tools - even if imperfect - demonstrates value and often opens doors to better approaches. The PDF parsing pipeline proved the ROI that justified investing in proper API integration.
The Solution
Event-driven serverless architecture with real-time data pipelines
Primary Pipeline: Real-Time Order Processing
Featured Integration: GraphQL Recipe Synchronization
Output Pipeline: Business Intelligence Reports
Event Triggers
Key Integrations
Real-Time Order Processing
Orders from the commerce platform flow through DynamoDB Streams for real-time capture. A Lambda function processes each change event, normalizes data across 110+ attributes, and writes to PostgreSQL for unified querying.
- Event-driven architecture with no polling overhead
- Customer lifecycle tracking (new vs. returning)
- Geographic distribution and delivery route analysis
- Financial analytics with pricing and discount breakdowns
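A stream-processing handler along these lines could look like the sketch below. The event envelope (`Records`, `eventName`, `dynamodb.NewImage`) and the typed attribute format are standard for DynamoDB Streams; the attribute names (`orderId`, `email`, `total`) and the three-column output are hypothetical, since the real pipeline normalizes 110+ attributes and writes to PostgreSQL.

```python
from decimal import Decimal


def from_dynamo(value):
    """Convert one DynamoDB-typed attribute ({"S": ...}, {"N": ...}) to Python."""
    (type_tag, raw), = value.items()
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers arrive as strings
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: from_dynamo(item) for key, item in raw.items()}
    if type_tag == "L":
        return [from_dynamo(item) for item in raw]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def normalize_order(image):
    """Flatten a stream image into a row shape (field names are assumed)."""
    item = {key: from_dynamo(value) for key, value in image.items()}
    email = (item.get("email") or "").strip().lower()
    return {
        "order_id": item["orderId"],
        "customer_email": email or None,
        "total": item.get("total"),
    }


def handler(event, context=None):
    """Lambda entry point: normalize inserts and updates, ignore removals."""
    rows = []
    for record in event.get("Records", []):
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        rows.append(normalize_order(record["dynamodb"]["NewImage"]))
    return rows  # a real handler would write these rows to PostgreSQL
```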
Galley Recipe Data Sync
A two-Lambda architecture fetches recipe data from Galley's GraphQL API and synchronizes to PostgreSQL. This design maintains database isolation while enabling external API access without expensive NAT Gateway costs.
- GraphQL integration with delta detection via timestamps
- Custom deletedAt filter developed with Galley team
- S3-based decoupling for reliability and audit trail
- $32/month savings by eliminating NAT Gateway
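One way to picture the two-Lambda split is sketched below, under stated assumptions: the first function runs outside the VPC and can reach Galley's API, the second runs inside the VPC and can reach the database, and S3 is the only thing they share. `InMemoryBucket`, the key layout, and the function names are hypothetical stand-ins so the sketch runs without AWS access.

```python
import json
from datetime import datetime, timezone


class InMemoryBucket:
    """Stand-in for an S3 bucket (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body

    def get(self, key):
        return self.objects[key]


def fetcher(bucket, fetch_recipes, now):
    """Lambda 1 (assumed outside the VPC): call the API and stage the result."""
    key = f"galley/recipes/{now:%Y-%m-%dT%H-%M-%SZ}.json"
    bucket.put(key, json.dumps(fetch_recipes()))
    return key  # timestamped keys double as an audit trail of each sync


def loader(bucket, key, upsert):
    """Lambda 2 (assumed inside the VPC): read the staged file and load it."""
    recipes = json.loads(bucket.get(key))
    for recipe in recipes:
        upsert(recipe)
    return len(recipes)
```

Because neither function needs both internet access and database access, the database subnet never needs an outbound route, and a failed load can be retried from the staged file without calling the API again.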
Google Workspace Integration
PDF packaging guides from Google Drive are mirrored to S3, parsed by Lambda, and loaded into PostgreSQL. Business users generate formatted Google Sheets reports on demand via a web interface with automatic change detection.
- Service account authentication with Google APIs
- Hash-based content deduplication for PDFs
- Conditional formatting highlights week-over-week changes
- Rate limit handling for Sheets API quotas
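Rate limit handling of this kind is commonly done with exponential backoff and jitter. The sketch below shows that pattern only; `RateLimitError` is a hypothetical stand-in for whatever exception the API client raises on an HTTP 429 response, and the retry counts and delays are illustrative defaults rather than the values used in production.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a quota-exceeded (HTTP 429) response."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying on rate limits with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.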
Custom Browser Extension
A Tampermonkey userscript adds functionality to the Galley web interface, allowing users to create recipes directly from ingredient pages with a single click. The extension intercepts SPA navigation and calls a secure API Gateway endpoint.
- SPA-compatible navigation detection via history.pushState
- CORS-restricted API Gateway endpoint
- Duplicate detection and auto-tagging
- GraphQL mutations through Lambda proxy
Infrastructure & Cost
Optimized serverless architecture at $15/month
Optimization Strategies
Technical Stack
Languages & Frameworks
AWS Services
External APIs
Results
Operational Impact
- Manual PDF processing eliminated
- Hourly updates vs. daily manual refresh
- Automatic recipe change detection
- Zero manual data entry errors
Technical Achievements
Complete audit trail with timestamp tracking for every change
Security & Data Integrity
Built with defense-in-depth principles to protect sensitive business data at every step
Network Isolation
PostgreSQL database runs in a private VPC subnet with no public internet access. Only authorized Lambda functions within the VPC can connect.
- Private subnet deployment
- Security groups restrict ingress to Lambda only
- No NAT Gateway, so the database subnet has no outbound route to the internet
IAM Least Privilege
Each Lambda function has a dedicated IAM role with only the specific permissions required for its task. No shared credentials or over-permissioned roles.
- Per-function IAM roles
- Resource-level permissions (specific ARNs)
- No wildcard (*) actions
Secrets Management
All sensitive credentials are stored in AWS Secrets Manager with automatic rotation. No hardcoded passwords, API keys, or connection strings in code.
- AWS Secrets Manager for all credentials
- Database passwords auto-rotated
- API keys encrypted at rest
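Reading a credential at runtime might look like the sketch below. `get_secret_value` with a `SecretId` argument is the boto3 Secrets Manager call; the JSON key names (`host`, `port`, `username`, `password`) follow the layout Secrets Manager uses for RDS secrets, but whether this project's secrets use that layout is an assumption, as is the function name.

```python
import json


def load_db_credentials(client, secret_id):
    """Fetch and parse a database secret.

    `client` is expected to be boto3.client("secretsmanager"); it is passed
    in so the function can be tested with a stub.
    """
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {
        "host": secret["host"],
        "port": secret.get("port", 5432),  # PostgreSQL default if absent
        "user": secret["username"],
        "password": secret["password"],
    }
```

Fetching the secret when a connection is opened, rather than baking it into configuration, is what lets a rotated password take effect without redeploying code.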
Encryption Everywhere
Data encrypted both in transit and at rest. TLS for all API connections, AWS-managed encryption keys for storage, and SSL-only database connections.
- TLS 1.2+ for all external APIs
- RDS encryption with AWS KMS
- S3 server-side encryption (SSE-S3)
Data Validation
Schema validation at every integration point prevents malformed or malicious data from entering the system. Type checking and sanitization before database writes.
- Input validation on all Lambda handlers
- Parameterized SQL queries (no injection)
- Schema validation for GraphQL responses
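A minimal sketch of validating one record before it reaches the database is shown below. The required fields (`id`, `name`, `updatedAt`) and the function name are hypothetical; the real schema covers far more of Galley's 160+ fields.

```python
# Assumed minimal schema: field name -> required Python type.
REQUIRED_FIELDS = {"id": str, "name": str, "updatedAt": str}


def validate_recipe(payload):
    """Return a cleaned copy of the record, or raise ValueError if malformed."""
    if not isinstance(payload, dict):
        raise ValueError("recipe must be an object")
    clean = {}
    for field, expected_type in REQUIRED_FIELDS.items():
        value = payload.get(field)
        if not isinstance(value, expected_type):
            raise ValueError(f"{field!r} must be {expected_type.__name__}")
        clean[field] = value.strip()
    if not clean["id"]:
        raise ValueError("'id' must not be empty")
    return clean  # unknown fields are dropped rather than passed through
```

Returning a new dictionary with only the known fields means anything unexpected in an API response never reaches a SQL statement.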
Audit & Monitoring
Comprehensive logging via CloudWatch captures every data transformation. CloudTrail tracks all AWS API activity for security auditing and compliance.
- CloudWatch Logs for all Lambda executions
- CloudTrail for AWS API audit trail
- Error alerting via CloudWatch Alarms
Data Integrity Guarantee
Every record processed through this pipeline maintains a complete audit trail. Idempotent operations ensure data consistency even during retries or failures. The system reliably processes thousands of records daily with zero data loss since deployment.
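Idempotent writes of this kind are typically implemented as upserts. The sketch below uses Python's built-in sqlite3 as a stand-in so it runs anywhere; PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` form with its driver's placeholder style. The table and column names are hypothetical.

```python
import sqlite3

# Parameterized upsert: replaying the same record leaves one identical row,
# and a stale record never overwrites a newer one.
UPSERT_SQL = """
INSERT INTO recipes (id, name, updated_at) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    name = excluded.name,
    updated_at = excluded.updated_at
WHERE excluded.updated_at >= recipes.updated_at
"""


def create_demo_table(conn):
    """Create the hypothetical recipes table used by this sketch."""
    conn.execute(
        "CREATE TABLE recipes (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)"
    )


def upsert_recipe(conn, recipe):
    """Insert or update one recipe; safe to call repeatedly with the same data."""
    conn.execute(UPSERT_SQL, (recipe["id"], recipe["name"], recipe["updated_at"]))
    conn.commit()
```

The timestamps here are ISO-8601 strings in UTC, which compare correctly as text; with a native timestamp column the same guard applies unchanged.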
Key Technical Challenges
VPC Isolation Pattern
Designed a two-Lambda architecture that keeps the database isolated while still reaching external APIs, without the cost of a NAT Gateway ($32/month savings).
GraphQL Delta Detection
Implemented timestamp-based change detection to minimize API calls and processing time, syncing only modified records.
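The timestamp comparison can be sketched as follows. `deletedAt` is the field named in this write-up; `updatedAt`, the record shape, and the function names are assumptions about what the API returns.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify_changes(records, last_sync):
    """Split records into IDs to upsert and IDs to delete since `last_sync`.

    `last_sync` must be a timezone-aware datetime.
    """
    upserts, deletions = [], []
    for record in records:
        deleted_at = record.get("deletedAt")
        if deleted_at:
            if parse_ts(deleted_at) > last_sync:
                deletions.append(record["id"])
        elif parse_ts(record["updatedAt"]) > last_sync:
            upserts.append(record["id"])
    return upserts, deletions
```

Records that have not changed since the last sync fall through both branches, so an hourly run touches only the rows that actually moved.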
Sheets API Optimization
Consolidated formatting calls to stay within rate limits while maintaining professional output quality for business reports.
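Consolidation here means sending many formatting changes in one request instead of one request per cell range. The sketch below builds a single body for the Sheets API `spreadsheets.batchUpdate` method using `repeatCell` requests; the highlight color, row-based layout, and function names are hypothetical.

```python
def highlight_request(sheet_id, row_index, color):
    """Build one repeatCell request that sets a row's background color."""
    return {
        "repeatCell": {
            "range": {
                "sheetId": sheet_id,
                "startRowIndex": row_index,  # zero-based, inclusive
                "endRowIndex": row_index + 1,  # exclusive
            },
            "cell": {"userEnteredFormat": {"backgroundColor": color}},
            "fields": "userEnteredFormat.backgroundColor",
        }
    }


def build_batch(sheet_id, changed_rows):
    """Combine every highlight into one batchUpdate body (one API call)."""
    yellow = {"red": 1.0, "green": 0.95, "blue": 0.6}  # assumed highlight color
    return {
        "requests": [highlight_request(sheet_id, row, yellow) for row in changed_rows]
    }
```

With the google-api-python-client library, a body like this would be passed as `body=` to `service.spreadsheets().batchUpdate(spreadsheetId=...)`, so a report with dozens of changed rows costs one write request against the quota.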
Deletion Tracking
Collaborated with Galley's team to add a deletedAt filter to their GraphQL API, enabling reliable deletion detection with actual timestamps.
SPA Navigation Handling
Browser extension intercepts history.pushState for single-page application compatibility in the Galley interface.
Have a Similar Challenge?
Let's discuss how a custom integration solution can streamline your business operations.