Software Engineer · AI Systems

Murad Al-Balushi

I build the control, evaluation, and safety layers that make LLM systems reliable enough for production.

Replaced subjective model evaluation with execution-based validation
Enforced LLM spend limits in agent workflows
Constrained model outputs to computed, verifiable data, curbing hallucination
Built compliance-ready fintech infrastructure — zero security incidents

Professional Experience

360Remit

Software Developer

Muscat, Oman

Jan 2025 – Mar 2026

  • Owned end-to-end vulnerability assessment and penetration testing (VAPT) for a regulated fintech platform as the risk authority between vendors and engineering; validated findings, cut false positives by 40%+, closed 100% of critical issues pre-launch, with zero security incidents at go-live.
  • Engineered vendor synchronization pipeline for 500k+ records (delta detection, conflict resolution, bidirectional sync), cutting manual processing from 3–5 days to under 5 minutes with 100% DB integrity.
  • Delivered MTO, eKYC, and AML integrations and designed phased infra (DR, capacity, data residency), enabling platform launch within regulatory deadlines while unblocking user-facing onboarding flows.
  • Built SQL/Python humanization pipeline converting 500k+ vendor records to presentation-ready data, eliminating manual cleaning and cutting prep time 95%+.
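The delta-detection step of a sync pipeline like the one above can be sketched roughly as follows; the record shapes, function names, and hashing choice here are illustrative, not the production implementation:

```python
import hashlib
import json

def record_fingerprint(record: dict) -> str:
    """Stable content hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()

def detect_deltas(local: dict[str, dict], remote: dict[str, dict]):
    """Compare two snapshots keyed by record ID and classify changes,
    so only the delta (not all 500k records) needs processing."""
    created = [rid for rid in remote if rid not in local]
    deleted = [rid for rid in local if rid not in remote]
    updated = [
        rid for rid in remote
        if rid in local
        and record_fingerprint(remote[rid]) != record_fingerprint(local[rid])
    ]
    return created, updated, deleted
```

Fingerprinting each side lets the pipeline skip unchanged records entirely, which is where most of the 3–5-day manual effort would otherwise go.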

Highlight Projects

A selection of projects I'm particularly proud of

Code Arbiter Architecture: Coding task → LLM provider (OpenAI / Anthropic / LM Studio) → generated code → isolated Docker sandbox (no network, memory capped) → pytest execution → failure classification → HTML comparison report

AI Code Generation Evaluation Engine (Code Arbiter)

Execution-based benchmarking — run the code, classify the failure

Replaced subjective LLM code review with deterministic execution-based validation. Runs generated code in isolated Docker sandboxes, classifies failures across syntax, runtime, logic, and temporal reasoning, and benchmarks multiple models under identical conditions.
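A minimal sketch of the sandbox-and-classify loop, assuming the `docker` CLI is on PATH; the image name, resource limits, and classification rules are illustrative, not Code Arbiter's actual code:

```python
import subprocess
import tempfile
from pathlib import Path

def classify(returncode: int, stderr: str) -> str:
    """Map an execution result onto a failure taxonomy; logic and
    temporal-reasoning failures are caught later by pytest assertions."""
    if returncode == 0:
        return "pass"
    if "SyntaxError" in stderr:
        return "syntax_error"
    return "runtime_error"

def run_in_sandbox(code: str, timeout: int = 30) -> dict:
    """Execute untrusted generated code in a locked-down container."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(code)
        try:
            proc = subprocess.run(
                ["docker", "run", "--rm",
                 "--network", "none",       # no network access
                 "--memory", "256m",        # hard memory cap
                 "-v", f"{tmp}:/work:ro",   # code mounted read-only
                 "python:3.12-slim", "python", "/work/solution.py"],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout"}
    return {"status": classify(proc.returncode, proc.stderr),
            "stdout": proc.stdout, "stderr": proc.stderr}
```

Because every model's output runs under the same container limits, the resulting pass/fail comparison is deterministic rather than a matter of reviewer judgment.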

Python · Docker · OpenAI API · Anthropic API · LM Studio
[Screenshot: CostPlan – LLM Cost Enforcement Proxy]

CostPlan – LLM Cost Enforcement Proxy

Open-source circuit breaker for autonomous agent API spend

Built an open-source transparent proxy that enforces per-call and per-session budget limits on LLM API calls, with cache-aware pricing and zero-latency SSE streaming — preventing unbounded spend in autonomous agent workflows.
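The enforcement idea can be sketched as a small cache-aware budget guard; the class and parameter names are hypothetical, and real per-token prices would come from the provider's price sheet:

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past a configured limit."""

class BudgetGuard:
    """Circuit breaker tracking cumulative LLM spend for one session."""

    def __init__(self, per_call_limit: float, session_limit: float):
        self.per_call_limit = per_call_limit
        self.session_limit = session_limit
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               in_price: float, out_price: float,
               cached_tokens: int = 0, cache_price: float = 0.0) -> float:
        """Price one call (cache reads billed at the cheaper rate) and
        trip the breaker before any limit is crossed."""
        cost = ((input_tokens - cached_tokens) * in_price
                + cached_tokens * cache_price
                + output_tokens * out_price)
        if cost > self.per_call_limit:
            raise BudgetExceeded(f"call cost ${cost:.4f} over per-call limit")
        if self.spent + cost > self.session_limit:
            raise BudgetExceeded(f"session limit ${self.session_limit} would be exceeded")
        self.spent += cost
        return cost
```

In the proxy setting, a guard like this sits in front of the upstream API call, so an agent loop that runs away simply starts receiving errors instead of accruing spend.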

Python · Anthropic API · SSE Streaming · asyncio · HTTP Proxy
Autonomous Support Agent Architecture: Help Scout polling → Intent classification → Deterministic escalation or response generation → Stripe MCP (read-only) → RAG for product knowledge → Help Scout posting

Production AI Support Agent (Guardrail-First)

Risk-aware LLM-powered support agent reducing customer support load

Deployed a guardrail-first AI support agent handling live customer tickets with Stripe-backed context and deterministic escalation logic, designed to fail safely under uncertainty in a production SaaS environment.
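A deterministic escalation gate of this kind might look roughly like the sketch below; the intent labels and confidence threshold are placeholders, not the production values:

```python
from dataclasses import dataclass

# Intents that always go to a human, regardless of model confidence.
ESCALATE_INTENTS = {"refund_request", "chargeback", "legal", "account_deletion"}
CONFIDENCE_FLOOR = 0.8  # below this, fail safe and hand off

@dataclass
class Decision:
    action: str   # "respond" or "escalate"
    reason: str

def route(intent: str, confidence: float) -> Decision:
    """Deterministic gate: the LLM only drafts replies for low-risk,
    high-confidence intents; anything uncertain or risky escalates."""
    if intent in ESCALATE_INTENTS:
        return Decision("escalate", f"high-risk intent: {intent}")
    if confidence < CONFIDENCE_FLOOR:
        return Decision("escalate", f"low classifier confidence ({confidence:.2f})")
    return Decision("respond", "safe intent with high confidence")
```

Keeping this gate in plain code rather than in the prompt is what makes the agent's failure mode predictable: uncertainty routes to a human, never to a generated answer.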

Python · LLM Systems · Help Scout API · Stripe MCP · RAG
[Screenshot: FinAI Portfolio Analysis System interface]

FinAI – AI Portfolio Analysis & Decision Support System

Compute-first financial analysis engine with constrained LLM interpretation

Built a compute-first financial analysis engine combining deterministic portfolio metrics with constrained LLM interpretation to deliver grounded, non-speculative decision support.
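The compute-first split can be illustrated in a few lines: all figures come from deterministic code, and the model only ever sees, and is constrained to, those precomputed metrics. Function names and prompt wording here are illustrative:

```python
import statistics

def portfolio_metrics(daily_returns: list[float]) -> dict:
    """Every number comes from deterministic computation, never the model."""
    mean = statistics.mean(daily_returns)
    vol = statistics.stdev(daily_returns)
    # Annualized Sharpe ratio, assuming a zero risk-free rate.
    sharpe = (mean / vol) * (252 ** 0.5) if vol else 0.0
    return {"mean_daily_return": mean, "volatility": vol,
            "sharpe": round(sharpe, 2)}

def interpretation_prompt(metrics: dict) -> str:
    """The model may explain the figures but is explicitly barred from
    producing numbers or forecasts of its own."""
    return (
        "Interpret these precomputed portfolio metrics. Do not invent "
        "numbers, do not predict future prices, and flag any claim you "
        f"cannot ground in the data provided: {metrics}"
    )
```

Because the LLM never computes anything, its output stays interpretive: the worst hallucination it can produce is a bad explanation of a correct number, not a wrong number.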

Python · Financial Analysis · LLM Systems · Portfolio Analytics

Technical Skills

Technologies I use to build, ship, and evaluate production systems

Languages

Python · TypeScript · JavaScript · SQL

AI & LLM

OpenAI · Anthropic · Gemini · RAG · Guardrails · Whisper · Intent Classification

Backend

FastAPI · Node.js · Express · Next.js · React

Infrastructure

Docker · AWS · GCP · CI/CD · Nginx · Vercel

Databases

PostgreSQL · MySQL · MongoDB · Redis · SQLite · FAISS

Tools & Integrations

Git · Pytest · Stripe MCP · Help Scout API · Discord API