SRE/Site Reliability Engineer - Leading E-Commerce/Internet Firm, Fast-Paced with Work-Life Balance, Cutting-Edge Technical Environment, AI Adoption

Dadaconsultants · Singapore

Sector
AI
Function
Product & Engineering
Level
Intern
Employment type
Full Time
Posted
2026-06-25
Source
mycareersfuture

Our client is a leading e-commerce brand dealing in premium goods. They are rapidly expanding into Singapore and are seeking an experienced Site Reliability Engineer (SRE) to join their team. The new joiner will be responsible for ensuring the stability, availability, and efficiency of the core business systems. This is a high-impact role at the intersection of engineering excellence and operational resilience, giving you direct ownership of service governance, incident management, and high-availability architecture. If you thrive in fast-paced, high-stakes environments and are passionate about building systems that scale, this is an excellent opportunity to make a meaningful contribution.

Key Responsibilities
Own the end-to-end stability of business-critical applications, including deployment, configuration management, status monitoring, and capacity management
Lead the investigation and resolution of major production incidents, conduct thorough post-incident analysis, and drive follow-up optimisation initiatives
Develop and execute fault drill exercises, maintain emergency response plans, and produce clear SOP documentation for operational procedures
Drive high-availability improvements across applications, encompassing rate limiting, graceful degradation, fault tolerance, disaster recovery, and multi-active deployment strategies
Establish and maintain SLO evaluation frameworks, quantify incident impact on service level objectives, and track improvement initiatives to completion
Analyse and optimise service governance across critical paths, including performance bottleneck identification, issue localisation, and high-availability architecture upgrades
Develop and enforce operations and maintenance (O&M) standards, and translate these into tooling and platform-based solutions to improve O&M efficiency and operational safety
Support colleagues with day-to-day IT queries and resolve network issues within the Singapore office environment, including matters relating to cloud service provider connectivity

Requirements
Minimum 5 years of operations and maintenance experience in an internet or technology company environment, with strong expertise in system fault diagnosis and production incident resolution
Proficiency in at least one scripting or programming language such as Python, Shell, Go, or Java; hands-on project development experience is a bonus
Familiarity with common web middleware and message queue technologies, as well as monitoring tools
Working knowledge of open-source caching solutions including Memcache, Redis, and Twemproxy, along with JVM memory and GC mechanisms, and the ability to diagnose Java process anomalies
Experience with business capacity management, microservices architecture operations, and full-lifecycle stability governance from system design through to production go-live is a bonus
Familiarity with the SRE operational framework and experience maintaining high-concurrency, high-availability distributed systems is a bonus

If you are passionate about technology and meet the above requirements, please don't hesitate to apply. Please note that only shortlisted candidates will be contacted. We appreciate your understanding. Data provided is for recruitment purposes only.

About Us
Dada Consultants was established in 2017 with a commitment to providing the best recruitment services in Singapore. We are a dynamic head-hunting team dedicated to sourcing highly competent professionals in the IT industry. We provide enterprises with customised talent solutions and help professionals advance their careers.

Dada Consultants Pte Ltd
Website: www.dadaconsultants.com
EA License No.: 18S9037
Business Registration Number: 201735941W
