SANKET MATROJA

New York, NY 10038 · (201) 856-1595 · sanketmatroja07@gmail.com

linkedin.com/in/sanketmatroja07 · github.com/sanketmatroja07 · sanketmatroja.com

SUMMARY

AI Engineer with full-stack expertise building production AI systems: fraud detection platforms, algorithmic recommendation engines, NLP sentiment pipelines, and LLM-powered tools. 14 shipped projects across 30+ technologies. I architect resilient AI systems with deterministic fallbacks, composable rule engines, and zero-downtime pipelines. MS Computer Science (GPA 3.77) with professional experience building data pipelines processing 50,000+ records monthly.

TECHNICAL SKILLS

AI / ML: OpenAI GPT-4, Anthropic Claude, Hugging Face Transformers, TextBlob, Sentence Transformers, Cosine Similarity, MMR Diversification, Sentiment Analysis, NLP Pipelines
Frontend: React 19, Next.js 14/16, TypeScript, Tailwind CSS, Framer Motion, React Native (Expo), Zustand, Recharts, Leaflet
Backend: FastAPI, Python, Node.js, Express.js, Celery, REST API Design, WebSocket, Background Workers
Databases: PostgreSQL, MongoDB, Redis, Supabase, Firebase/Firestore, SQLite, SQL (CTEs, window functions, query optimization)
Infrastructure: Docker, AWS (S3, Lambda, EC2, RDS, Kinesis), Vercel, CI/CD, Apache Kafka
Practices: System Design, Microservices, Monorepo Architecture, ETL Pipelines, RBAC, Schema Migration, Caching (LRU), Rate Limiting

PROFESSIONAL EXPERIENCE

Data & AI Engineer

Apr 2024 – Jul 2024

LEAP (Contract)

Kuala Lumpur, Malaysia
  • Designed automated ETL pipelines (Python, Pandas) extracting data from 4 e-commerce platform APIs, transforming 50,000+ records monthly into a centralized PostgreSQL data warehouse
  • Built data validation framework achieving 99% accuracy with automated anomaly detection, schema compliance checks, and business rule enforcement before production loading
  • Optimized PostgreSQL queries with indexing strategies and query plan analysis, improving analytics workload performance by 40% for BI tool consumption
  • Automated daily data collection with error handling and retry logic, reducing manual processing by 8 hours/week and achieving 99.5% pipeline uptime
  • Managed AWS S3 data infrastructure with lifecycle policies for cost optimization across raw, processed, and analytics data tiers

AI & FULL-STACK PROJECTS

RiskPulse — AI Fraud Detection Platform

Next.js, FastAPI, PostgreSQL, Redis, Docker, OpenAI/Anthropic

  • Architected full-stack SaaS fraud detection platform with composable rule engine supporting 5 detection strategies (threshold, velocity, blacklist, pattern, composite) configurable by business analysts without engineering involvement
  • Built LLM-powered investigation narrative generation with graceful fallback to rule-based heuristics — zero downtime when AI APIs are unavailable
  • Implemented RBAC (4 roles), entity graph exploration, real-time dashboards, background Redis workers for async detection pipeline, and Stripe billing integration

Coffee Sommelier — Zero-LLM Recommendation Engine

Next.js, FastAPI, PostgreSQL, Python (scikit-learn)

  • Built deterministic recommendation engine using weighted cosine similarity + MMR diversification — sub-50ms latency, $0 LLM cost, serving 4 frontends through a single API
  • Implemented haversine geo-filtering and admin-configurable scoring weights for A/B testing across consumer, admin, widget, and B2B interfaces

Mind Mirror — NLP Journaling Platform with Zero-Downtime AI

React, FastAPI, MongoDB, Hugging Face, AWS Lambda

  • Engineered 3-tier sentiment pipeline: LRU cache (256 entries, 30-min TTL) → Hugging Face RoBERTa API (85% accuracy) → TextBlob fallback (70% accuracy) — zero user-facing failures
  • Achieved ~40% cache hit rate reducing API calls; dual deployment via Uvicorn + AWS Lambda (Mangum) with MongoDB connection pool optimization for cold starts

SignalVault — GPT-4 Market Intelligence Platform

Next.js, FastAPI, Supabase, Redis, Celery, OpenAI GPT-4

  • Built multi-source data aggregation pipeline (Reddit, Twitter, YouTube, Google Trends, HN, Product Hunt) with GPT-4 signal analysis and Supabase real-time storage
  • Implemented scheduled data collection with Celery + APScheduler for continuous market intelligence monitoring

ResumeTailor — AI Resume Optimization Tool

Next.js, FastAPI, Anthropic Claude, LibreOffice

  • Built end-to-end pipeline: DOCX parse → block extraction → Anthropic Claude tailoring → DOCX/PDF export via headless LibreOffice for job-specific resume optimization

EDUCATION

Pace University – Seidenberg School of Computer Science

Aug 2024 – May 2026

Master of Science, Computer Science — GPA: 3.77

Database Systems, Data Engineering, Cloud Computing, Distributed Systems, Machine Learning, Algorithms

ITM Vocational University

Jan 2021 – May 2024

Bachelor of Technology, Computer Science — GPA: 3.87

ADDITIONAL

Work Authorization: F-1 Student Visa, CPT/OPT Eligible

Availability: Immediate for internships; Full-time after May 2026 graduation

Languages: English (Fluent), Hindi (Native)