"
We are building a future where user and agent interfaces can adapt, evolve, and personalize themselves. Each improvement here increases optimization space, experimentation velocity, and thus, lift.
Reading that felt like reading my own job description, just pointed at a different surface. For the last four years at Cogniac, I've been shipping agent workflows and LLM pipelines into live casinos, manufacturing floors, railways, and clinical labs — places where a non-deterministic system has to actually work, day after day, under real latency constraints and on real data. Swap "cards on a table" for "interfaces on a page" and the core skill is the same: generate structured output from messy input, evaluate it, monitor it, and keep improving it in production.
Every line of your role description maps onto work I've already shipped. "Build AI agents that generate interfaces, client-side features, and more at scale" and "consistently turn manual workflows into scalable and reliable agents" — I architected a multi-agent, human-in-the-loop annotation pipeline that offloads routine labeling from annotators, delivering an 80% reduction in annotation time and a 30% label-quality improvement on the two workflows we tested. "Create validation systems that catch bugs, off-brand outputs, and poor user experiences before launch" and "create online and offline evaluation systems for generated outputs" — that same pipeline worked because the evaluation loop was the product. "Build monitoring systems that detect issues and regressions after rollout" — I shipped a vision-powered, AI-automated player rating system at the edge across 100+ live Baccarat tables in Asia, optimized inference to run 3+ tables per Cogniac Edgeflow device, and doubled revenue from this client; regressions show up in revenue in real time. "Ship reliable non-deterministic systems" — I architected platform-wide post-training quantization (FP32 / FP16 → INT8), cutting inference latency in half with no meaningful accuracy loss. And the LLM-powered post-game analytics dashboard I built for pit bosses shows the same pattern at work: GenAI embedded inside a product rather than bolted on beside it.
The other half of the role — "talk to technical and non-technical teams to capture qualitative feedback and convert it to metrics and improvements" — is where the best work I've done has always lived. At UCSF I built biomarker models inside Dr. Amorim's neurology lab. At IntelinAir I shipped density-estimation models on AWS alongside agronomists and published the work in Frontiers in Plant Science. At Columbia I was a TA for Applied Deep Learning. Three provisional patents, a Best Paper Award, and a peer-reviewed publication came out of exactly the kind of environment you describe: small, talent-dense, with a bias toward shipping.
Generate, test, and monitor interfaces at scale. That isn't a list I aspire to — it's how I already work. I'd love to bring it to your team, in person, in SF.