Case Study

Maven Meals Data Integration Platform

A production-grade serverless architecture connecting multiple SaaS systems into a unified, real-time business intelligence solution.

AWS Lambda | GraphQL | PostgreSQL | Event-Driven

The Challenge

Critical recipe data locked in disconnected systems with no automated path to reporting

Maven Meals' recipe data originates in a separate ordering system, and staff manually re-enter it into Galley Solutions, their catering management platform. Galley holds the detailed recipe specifications, ingredients, and packaging requirements - but getting that data back out for business reporting was the problem.

Manual Data Entry

Recipe data hand-entered into Galley from external ordering system - no direct integration

Locked-In Data

Galley holds 2,900+ recipes with 160+ fields each - but no API access was initially available

Order Flow

Commerce platform streams real-time orders through DynamoDB, and each order needs to be correlated with recipe data

Reporting Gap

Business needed unified insights across orders, recipes, and packaging - no path existed

The Solution Evolution

From scrappy problem-solving to elegant architecture

1
Initial State

Manual PDF Exports

With no API access to Galley, the only way to extract recipe data was through printed PDF packaging guides. Staff would export PDFs from Galley and save them to Google Drive - a tedious, error-prone process that couldn't scale.

Pain Point: Data locked in PDFs, manual comparison for changes
2
Creative Solution

Automated PDF Parsing Pipeline

We built an automated PDF parsing solution - Lambda functions that monitor Google Drive, download new PDFs, parse structured content, detect changes via content hashing, and load data into PostgreSQL. Not elegant, but it worked.

Google Drive → S3 Mirror → PDF Parser (Lambda) → PostgreSQL
Win: Automated extraction, change detection, version tracking
Limitation: Dependent on PDF format, limited field access
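
The change-detection step can be illustrated with a minimal sketch. It assumes the pipeline keeps the last known hash per file; the `known_hashes` mapping, the file IDs, and the function names are hypothetical, and the source does not say whether the raw PDF bytes or the parsed content were hashed.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def classify_file(file_id: str, data: bytes, known_hashes: dict) -> str:
    """Compare a downloaded file against the hash stored on the previous run.

    known_hashes maps a (hypothetical) file ID to its last recorded hash and
    is updated in place. Returns "new", "changed", or "unchanged".
    """
    digest = content_hash(data)
    previous = known_hashes.get(file_id)
    known_hashes[file_id] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "changed"
```

Only files classified as "new" or "changed" would need to be parsed and loaded. Hashing the extracted text instead of the raw bytes avoids false positives when a re-exported PDF differs only in embedded metadata.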
3
Current Architecture

GraphQL API Integration

Working with Galley's team, we developed a proper API integration using their GraphQL endpoint. This unlocked access to all 160+ recipe fields, real-time sync, and even a custom deletedAt filter they added at our request for reliable deletion tracking.

PDF Parsing

  • ~20 fields extracted
  • Manual PDF generation
  • Format-dependent parsing
  • Deletion detection by absence

GraphQL API

  • 160+ fields available
  • Automated hourly sync
  • Delta detection via timestamps
  • Actual deletion timestamps
The PDF solution still runs in parallel. Proving its value is what led to API access.

The Lesson

Sometimes the "right" solution isn't available yet. Building a working solution with available tools - even if imperfect - demonstrates value and often opens doors to better approaches. The PDF parsing pipeline proved the ROI that justified investing in proper API integration.

The Solution

Event-driven serverless architecture with real-time data pipelines

Primary Pipeline: Real-Time Order Processing

DynamoDB (Order Events) --Stream--> Lambda (Stream Processor) --Normalize--> PostgreSQL (Unified Store)

  • 134,000+ orders processed
  • 110+ attributes normalized
  • Real-time via streams

Output Pipeline: Business Intelligence Reports

PostgreSQL (Unified Data) --Query--> Lambda (Report Generator) --Format--> Google Sheets (Business Reports)

  • On-demand generation
  • Conditional formatting
  • Week-over-week change highlights

Event Triggers

  • DynamoDB Streams: Real-time order changes
  • EventBridge Schedule: Hourly recipe sync
  • S3 Events: File-based triggers
  • API Gateway: On-demand reports

Key Integrations

01

Real-Time Order Processing

DynamoDB | Lambda | PostgreSQL

Orders from the commerce platform flow through DynamoDB Streams for real-time capture. A Lambda function processes each change event, normalizes data across 110+ attributes, and writes to PostgreSQL for unified querying.

134,000+ Orders Processed
110+ Attributes Extracted
<30s Processing Time
  • Event-driven architecture with no polling overhead
  • Customer lifecycle tracking (new vs. returning)
  • Geographic distribution and delivery route analysis
  • Financial analytics with pricing and discount breakdowns
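
As a rough illustration of the stream processor, the sketch below converts DynamoDB-typed attribute values from a stream event into plain Python values. It assumes the stream is configured to include the new item image; the handler returns the rows instead of writing them to PostgreSQL so the example stays self-contained, and binary and set types are deliberately left unhandled.

```python
from decimal import Decimal
from typing import Any, Optional


def from_dynamodb(value: dict) -> Any:
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers arrive as strings
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_dynamodb(v) for k, v in raw.items()}
    if type_tag == "L":
        return [from_dynamodb(v) for v in raw]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def normalize_record(record: dict) -> Optional[dict]:
    """Flatten one stream record into a row-like dict, or None for deletes."""
    if record["eventName"] == "REMOVE":
        return None
    image = record["dynamodb"]["NewImage"]
    return {name: from_dynamodb(value) for name, value in image.items()}


def handler(event: dict, context: Any) -> list:
    """Lambda-style entry point: normalize every INSERT/MODIFY in the batch."""
    rows = []
    for record in event["Records"]:
        row = normalize_record(record)
        if row is not None:
            rows.append(row)
    return rows
```

In a deployed function the returned rows would be mapped onto table columns and written to the database. Attribute names are whatever the commerce platform stores and are not specified here.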
02

Galley Recipe Data Sync

GraphQL | S3 | PostgreSQL

A two-Lambda architecture fetches recipe data from Galley's GraphQL API and synchronizes to PostgreSQL. This design maintains database isolation while enabling external API access without expensive NAT Gateway costs.

Lambda 1 (No VPC, Internet Access) --JSON via S3--> S3 Bucket (Staging, Event Trigger) --S3 Event--> Lambda 2 (VPC-Isolated, Database Access)
2,900+ Recipes Synced
160+ Fields per Recipe
Hourly Sync Frequency
  • GraphQL integration with delta detection via timestamps
  • Custom deletedAt filter developed with Galley team
  • S3-based decoupling for reliability and audit trail
  • $32/month savings by eliminating NAT Gateway
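
The hand-off between the two functions can be sketched from the receiving side. This is a sketch under assumptions: the staged files are taken to be JSON arrays of recipe objects, and `fetch` is a hypothetical injected callable standing in for an S3 object read, so the example runs without AWS access.

```python
import json
from typing import Callable
from urllib.parse import unquote_plus


def staged_objects(event: dict) -> list:
    """Extract (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def load_staged_recipes(event: dict, fetch: Callable[[str, str], bytes]) -> list:
    """Read every staged JSON file named in the event and collect its recipes."""
    recipes = []
    for bucket, key in staged_objects(event):
        recipes.extend(json.loads(fetch(bucket, key)))
    return recipes
```

A VPC-attached function with no NAT Gateway typically reaches S3 through a gateway VPC endpoint. The source does not say how this deployment is wired, so treat that as an assumption.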
03

Google Workspace Integration

Drive | S3 | RDS | Sheets

PDF packaging guides from Google Drive are mirrored to S3, parsed by Lambda, and loaded into PostgreSQL. Business users generate formatted Google Sheets reports on demand via a web interface with automatic change detection.

  • Service account authentication with Google APIs
  • Hash-based content deduplication for PDFs
  • Conditional formatting highlights week-over-week changes
  • Rate limit handling for Sheets API quotas
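
One common way to handle API quotas is retrying with exponential backoff and jitter. The sketch below shows that general technique, not necessarily the approach used here; `RateLimited` is a hypothetical stand-in for whatever quota error the API client raises, and the retry limits are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a quota error (such as HTTP 429)."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            # Delay doubles each retry; jitter spreads out concurrent callers.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.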
04

Custom Browser Extension

Tampermonkey Userscript | API Gateway | Galley API

A Tampermonkey userscript adds functionality to the Galley web interface, allowing users to create recipes directly from ingredient pages with a single click. The extension intercepts SPA navigation and calls a secure API Gateway endpoint.

  • SPA-compatible navigation detection via history.pushState
  • CORS-restricted API Gateway endpoint
  • Duplicate detection and auto-tagging
  • GraphQL mutations through Lambda proxy

Infrastructure & Cost

Optimized serverless architecture at $15/month

Resource                Monthly Cost
Lambda Functions        ~$1.50
RDS PostgreSQL          ~$10.00
EC2 Bastion             ~$3.00
S3 + CloudFront         <$0.10
CloudWatch Monitoring   ~$0.25
Total                   ~$15/month

Optimization Strategies

  • ARM64 Graviton: 20% better price/performance
  • GP3 Storage: 20% savings over GP2
  • Business-Hours Scheduling: No overnight processing
  • VPC Isolation Pattern: $32/month NAT Gateway savings

Technical Stack

Languages & Frameworks

  • Python 3.12
  • GraphQL
  • PostgreSQL

AWS Services

  • Lambda
  • RDS
  • DynamoDB
  • S3
  • EventBridge
  • Secrets Manager
  • API Gateway
  • CloudWatch

External APIs

Results

Operational Impact

  • Manual PDF processing eliminated
  • Hourly updates vs. daily manual refresh
  • Automatic recipe change detection
  • Zero manual data entry errors

Technical Achievements

99.9% Uptime
<30s Processing
60%+ Cost Reduction

Complete audit trail with timestamp tracking for every change

Security & Data Integrity

Built with defense-in-depth principles to protect sensitive business data at every step

Network Isolation

PostgreSQL database runs in a private VPC subnet with no public internet access. Only authorized Lambda functions within the VPC can connect.

  • Private subnet deployment
  • Security groups restrict ingress to Lambda only
  • No NAT Gateway, so the database subnet has no outbound route to the internet

IAM Least Privilege

Each Lambda function has a dedicated IAM role with only the specific permissions required for its task. No shared credentials or over-permissioned roles.

  • Per-function IAM roles
  • Resource-level permissions (specific ARNs)
  • No wildcard (*) actions

Secrets Management

All sensitive credentials are stored in AWS Secrets Manager with automatic rotation. No hardcoded passwords, API keys, or connection strings in code.

  • AWS Secrets Manager for all credentials
  • Database passwords auto-rotated
  • API keys encrypted at rest

Encryption Everywhere

Data encrypted both in transit and at rest. TLS for all API connections, AWS-managed encryption keys for storage, and SSL-only database connections.

  • TLS 1.2+ for all external APIs
  • RDS encryption with AWS KMS
  • S3 server-side encryption (SSE-S3)

Data Validation

Schema validation at every integration point prevents malformed or malicious data from entering the system. Type checking and sanitization before database writes.

  • Input validation on all Lambda handlers
  • Parameterized SQL queries (no injection)
  • Schema validation for GraphQL responses

Audit & Monitoring

Comprehensive logging via CloudWatch captures every data transformation. CloudTrail tracks all AWS API activity for security auditing and compliance.

  • CloudWatch Logs for all Lambda executions
  • CloudTrail for AWS API audit trail
  • Error alerting via CloudWatch Alarms

Data Integrity Guarantee

Every record processed through this pipeline maintains a complete audit trail. Idempotent operations ensure data consistency even during retries or failures. The system reliably processes thousands of records daily with zero data loss since deployment.
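
Idempotent, parameterized writes can be sketched with an upsert. The example below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs anywhere; the table and column names are hypothetical. PostgreSQL accepts the same ON CONFLICT ... DO UPDATE form, though a PostgreSQL driver such as psycopg uses %s placeholders instead of ?.

```python
import sqlite3

# Hypothetical minimal schema; the real recipe table has 160+ fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recipes (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    updated_at TEXT NOT NULL
)
"""

UPSERT = """
INSERT INTO recipes (id, name, updated_at)
VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
    name = excluded.name,
    updated_at = excluded.updated_at
"""


def upsert_recipes(conn: sqlite3.Connection, recipes: list) -> None:
    """Insert or update each recipe; replaying the same batch changes nothing."""
    rows = [(r["id"], r["name"], r["updated_at"]) for r in recipes]
    with conn:  # commits on success, rolls back on error
        # Values are bound as parameters, never interpolated into the SQL text.
        conn.executemany(UPSERT, rows)
```

Because the write is keyed on the primary key, a retried Lambda invocation that replays the same batch leaves the table in the same state as a single successful run.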

Key Technical Challenges

1

VPC Isolation Pattern

Designed a two-Lambda architecture that maintains database security while accessing external APIs without an expensive NAT Gateway ($32/month savings).

2

GraphQL Delta Detection

Implemented timestamp-based change detection to minimize API calls and processing time, syncing only modified records.
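
The watermark logic behind timestamp-based delta detection can be sketched as follows. The `updatedAt` field name and ISO 8601 format are assumptions (the source names only `deletedAt`), and in practice the filter would more likely be passed to the API as a query argument than applied client-side as it is here.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_changed(records: list, last_sync: datetime) -> tuple:
    """Return records modified after last_sync, plus the new watermark.

    The watermark advances to the newest timestamp seen, or stays put
    when nothing has changed.
    """
    changed = [r for r in records if parse_ts(r["updatedAt"]) > last_sync]
    watermark = max((parse_ts(r["updatedAt"]) for r in changed), default=last_sync)
    return changed, watermark
```

Persisting the returned watermark after each successful run lets the next hourly sync pick up where the previous one stopped.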

3

Sheets API Optimization

Consolidated formatting calls to stay within rate limits while maintaining professional output quality for business reports.
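
Consolidating formatting calls generally means building one request body that carries every formatting change, instead of issuing one API call per cell. The sketch below builds such a body in the shape the Sheets API `spreadsheets.batchUpdate` method accepts; the highlight color and helper names are illustrative assumptions, and how the actual reports group their formatting is not described in the source.

```python
# Illustrative highlight color (RGB components in the 0..1 range).
YELLOW = (1.0, 0.95, 0.6)


def highlight_request(sheet_id: int, row: int, col: int, rgb: tuple) -> dict:
    """Build one repeatCell request that sets a single cell's background."""
    red, green, blue = rgb
    return {
        "repeatCell": {
            # Grid ranges are zero-based and end-exclusive.
            "range": {
                "sheetId": sheet_id,
                "startRowIndex": row,
                "endRowIndex": row + 1,
                "startColumnIndex": col,
                "endColumnIndex": col + 1,
            },
            "cell": {
                "userEnteredFormat": {
                    "backgroundColor": {"red": red, "green": green, "blue": blue}
                }
            },
            # Field mask: touch only the background color.
            "fields": "userEnteredFormat.backgroundColor",
        }
    }


def build_batch_body(sheet_id: int, changed_cells: list) -> dict:
    """Combine all highlights into a single batchUpdate request body."""
    return {"requests": [highlight_request(sheet_id, r, c, YELLOW) for r, c in changed_cells]}
```

Sending this body in one `batchUpdate` call counts as a single write request against the quota, however many cells it formats.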

4

Deletion Tracking

Collaborated with Galley's team to add a deletedAt filter to their GraphQL API, enabling reliable deletion detection with actual timestamps.

5

SPA Navigation Handling

Browser extension intercepts history.pushState for single-page application compatibility in the Galley interface.
