Tech Lead, Machine Learning Engineer - Global E-Commerce (Conversational AI)
ByteDance · Singapore
About Us
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join ByteDance
Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect, and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life, a mission we work towards every day.

As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.

Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

About the team
We are building the next generation of conversational AI for Global E-commerce: a unified Agent system that learns from every interaction, runs in 30+ languages, and is deployed across one of the largest e-commerce surfaces on the internet.
Our 2026 north star is a self-evolving Agent: post-training, harness, tools, memory, and evaluation form one closed loop, and every served conversation becomes training, evaluation, and retrieval signal for the next iteration.

The business surface (buyer, seller, dispute & appeals, operations) is the substrate. Our work is foundational LLM + Agent engineering: post-training, agent harness, tool design, memory, evaluation, inference, multilinguality. We are hiring people who want to push the SOTA of these systems in production, at scale, with hundreds of millions of users in the loop.

What we work on
- LLM post-training & alignment: large-scale SFT, DPO/IPO/KTO, online RL (RLHF / RLAIF / RLVR), reward modeling, preference data curation, long-context training, distillation, QAT. We train and adapt frontier-class open-weights models (≥7B → ≥70B) and our own continually-pretrained checkpoints on internal infra (FSDP / DeepSpeed / Megatron-style stacks).
- Agent foundations: harness design (context engineering, sub-agents, durable execution, parallel tool use), tool design (ACI principles, namespaced surfaces, poka-yoke, instrumented traces), memory (episodic + semantic + skill-shaped), MCP and Skill-style extensibility. We treat tools and prompts as APIs and iterate against production traces.
- Auto-eval and observability: LLM-as-judge with calibrated human agreement, real-traffic replay, failure-mode taxonomies, regression + safety + cost + latency harnesses. We have cut root-cause analysis of a single case from ~13 engineer-days to ~3 minutes, fully automated.
- Self-evolving systems: every served conversation becomes a candidate for training data, eval set membership, retrieval index, and skill induction, with privacy and quality gates. The flywheel is the product.
- Inference & serving: vLLM / TensorRT-LLM, MoE, speculative decoding, KV-cache reuse and prompt caching, multi-tenant low-latency serving. Cost per resolved conversation is a first-class metric.
- Multilinguality & locale grounding: 30+ languages, low-resource adaptation, faithful translation, locale-aware reasoning, cross-cultural tone.
- Reasoning & long-context modeling: chain-of-thought / planning post-training, reasoning-trace supervision, long-context training and serving, retrieval-augmented reasoning, self-consistency and verifier models.

Responsibilities
- Set technical direction. Own a multi-quarter roadmap across one or more of: post-training, agent harness, evaluation, self-evolving data flywheel, serving. Translate north-star metrics into a sequence of 2-3 high-ROI bets per quarter and ship them.
- Compound the team. Hire and develop 1-3 strong ICs. Design their work surfaces for growth, not just dispatch. Raise the median technical bar through design review, code review, and 1:1 framing.
- Stay in the loop with the model. Tech Lead is not a manager role. You still write the load-bearing PRs, propose the core abstractions, and write the design docs that decide the team's ceiling for the next 2-3 quarters.
- Drive cross-team alignment. Partner with foundation-model, infra, product, and adjacent algorithm teams; own sign-off on cross-cutting technical decisions.
- Observability and rollback. Build the per-turn tracing, tool-call analytics, and failure-mode taxonomies that let the team diagnose any regression within hours, not days.

Minimum qualifications
- BS / MS / PhD in CS, AI, Mathematics, or a related quantitative field.
- Hands-on experience in ML / NLP / applied DL. Top PhDs with a strong publication record may qualify at 4+ years.
- Strong Python and at least one of C++ / Go / Rust for production-path code.
- Hands-on post-training or fine-tuning of frontier-class LLMs (≥7B, multi-node). Not API-only.
- Has led at least one production LLM / Agent system from zero to one.

Preferred Qualifications
1. LLM post-training: multi-node SFT / DPO / online RL on ≥7B models; reward modeling; preference data construction; RLAIF / RLVR; distillation; QAT; long-context training; continual pre-training.
2. Agent engineering (Anthropic-style): production agent harness, context engineering, sub-agents, durable execution, MCP / Skill-style extensibility, parallel tool use, computer use.
3. Reasoning & planning: chain-of-thought / reasoning-trace training, planner/critic decomposition, self-consistency, verifier models, multi-step reasoning evaluation.
4. LLM / Agent evaluation: LLM-as-judge with human-agreement calibration, tau-bench / SWE-bench / GAIA / BFCL-style harnesses, regression + safety + cost + latency-aware evaluation.
5. Inference & serving systems: vLLM / TensorRT-LLM, MoE, speculative decoding, KV-cache and prompt caching, low-latency multi-tenant serving.
6. Multilingual & cross-cultural reasoning: multilingual SFT/DPO, low-resource adaptation, faithful MT, locale-aware reasoning.