Data Engineer · Analytics Pipelines · Sports Data · BI & Visualization
Remote · Corpus Christi, TX
Data Engineer with 5+ years designing and operating analytics pipelines on GCP and AWS, with a focus on sports data infrastructure, real-time event processing, and cloud-based ETL systems.
Built production pipelines handling live MLB game feeds at Sportradar/Synergy Sports. At Vikua, delivered GCP analytical models that cut time-to-insight by 45%, maintained 99.7% pipeline uptime, and reduced cloud compute costs by 18% across 6 client environments.
Currently completing an MIT MicroMasters in Statistical Modeling & Computation. Fluent in English and Spanish.
💡 My focus: turning technical execution into measurable business impact.
Languages: Python, SQL, Ruby
Cloud Platforms: GCP (BigQuery, Cloud Composer, Cloud Storage), AWS (S3, Redshift), Azure SQL
ETL & Orchestration: Airflow, Tray.io, Zapier, REST APIs, Pandas, Terraform
BI & Visualization: Power BI, Plotly, Streamlit, Zoho Analytics
Sports Data: Sportradar Platform, Statcast, pybaseball, Pitch-by-pitch Tracking
Other: DuckDB, Parquet, GitHub Actions, Pytest
A multi-season MLB analytics platform built on a Bronze/Silver/Gold medallion architecture. Ingests real data from FanGraphs and Baseball Savant, transforms it through DuckDB, and serves it via an interactive Streamlit dashboard.
Live Demo →
🔗 github.com/ivanrivasgr/baseball-data-warehouse
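The Bronze/Silver/Gold flow can be sketched in a few lines. This is a minimal illustration with toy records, not the repo's actual FanGraphs/Baseball Savant schema or its DuckDB SQL:

```python
# Minimal Bronze/Silver/Gold sketch. Field names and values are illustrative.
from collections import defaultdict

# Bronze: raw ingested records, kept exactly as received (strings, possible blanks).
bronze = [
    {"batter": "A. Judge", "event": "single", "exit_velo": "102.4"},
    {"batter": "A. Judge", "event": "strikeout", "exit_velo": ""},
    {"batter": "J. Soto", "event": "home_run", "exit_velo": "108.1"},
]

# Silver: validated and typed; rows missing the required measurement are dropped.
silver = [
    {**r, "exit_velo": float(r["exit_velo"])}
    for r in bronze
    if r["exit_velo"]
]

# Gold: analytics-ready aggregate, e.g. average exit velocity per batter,
# which is what the Streamlit dashboard would read.
by_batter = defaultdict(list)
for r in silver:
    by_batter[r["batter"]].append(r["exit_velo"])
avg_exit_velo = {b: sum(v) / len(v) for b, v in by_batter.items()}
```

In the actual project the same layering is expressed as DuckDB tables rather than Python dicts; the point is that each layer only ever reads from the one below it.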
Production-style sports data platform built on a full Bronze/Silver/Gold architecture: raw GPS tracking ingestion, validation, Parquet transformation, and a player analytics layer. Includes CI/CD via GitHub Actions, an Apache Airflow DAG, and Terraform provisioning for a 3-layer AWS S3 data lake.
Live Demo →
🔗 github.com/ivanrivasgr/soccer-data-platform-demo
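The validation step between raw ingestion and the Parquet layer can be sketched as a simple filter over GPS rows. Field names (`player_id`, `lat`, `lon`, `ts`) are assumptions for illustration, not the repo's actual schema:

```python
# Sketch of a Silver-layer validation pass for raw GPS tracking rows.
# Keeps only rows with parseable, in-range coordinates and a timestamp.
def validate_gps(rows):
    clean = []
    for r in rows:
        try:
            lat, lon = float(r["lat"]), float(r["lon"])
        except (KeyError, TypeError, ValueError):
            continue  # unparseable coordinates -> reject the row
        if -90 <= lat <= 90 and -180 <= lon <= 180 and r.get("ts"):
            clean.append({"player_id": r["player_id"], "lat": lat,
                          "lon": lon, "ts": r["ts"]})
    return clean
```

In the platform itself this kind of check runs inside the Airflow DAG before the clean rows are written out as Parquet.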
Fully automated, PII-safe data pipeline on GCP that integrates multiple heterogeneous sources into a unified Master User Model with Bronze/Silver/Gold layering in BigQuery. Sensitive fields are protected with SHA-256 hashing and boolean masking; orchestration runs on Cloud Composer (Airflow).
🔗 github.com/ivanrivasgr/gcp_data_architecture_project
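The PII-handling idea can be sketched in plain Python: hash identifiers with SHA-256, and replace other sensitive fields with boolean presence flags. The field names here are hypothetical; the real project applies the same transforms in BigQuery SQL:

```python
import hashlib

# Illustrative PII masking: hash the fields listed in hash_fields, and
# collapse the fields in bool_fields to a has_<field> presence flag.
def mask_pii(record, hash_fields=("email",), bool_fields=("phone",)):
    out = dict(record)
    for f in hash_fields:
        if out.get(f):
            out[f] = hashlib.sha256(out[f].encode("utf-8")).hexdigest()
    for f in bool_fields:
        out[f"has_{f}"] = bool(out.pop(f, None))
    return out
```

Downstream models can then join on the hashed identifier and reason about field availability without ever storing the raw PII.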
Automated pipeline in Ruby that detects CSV files uploaded to Dropbox, cleans and transforms the data, and routes each file to its destination folder. Includes file-mapping logic, CSV validation, and scheduled execution.
🔗 github.com/ivanrivasgr/ruby_dropbox_file_automation-
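The core routing idea is a filename-prefix lookup. The repo itself is written in Ruby; this is the same logic sketched in Python, with a hypothetical prefix-to-folder mapping:

```python
# Hypothetical filename-prefix routing table, not the repo's actual mapping.
ROUTING = {"orders": "finance/", "roster": "sports/"}

def route(filename):
    """Pick a destination folder from the filename prefix; unknown prefixes
    fall back to a review folder rather than being dropped silently."""
    prefix = filename.split("_")[0].lower()
    return ROUTING.get(prefix, "needs_review/") + filename
```

The fallback folder matters in practice: files that do not match any mapping stay visible for manual triage instead of failing the run.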
- Sports data infrastructure & real-time event pipelines
- Cloud data architecture & orchestration (Airflow, Terraform)
- BI automation & dashboard design
- Statistical modeling & predictive analytics
📧 ivanfgruber@gmail.com
🌐 linkedin.com/in/ifrg
"Architecture is not about storing data — it's about how data flows to create value."