Data Engineer (Python, Data Systems & AI Enablement)

Key Connect Recruitment · Singapore

Sector: AI
Function: Product & Engineering
Level: Mid-Level
Employment type: Contract
Posted: 2026-06-29
Source: mycareersfuture

Job Title: Data Engineer (Python, Data Systems & AI Enablement)

Role Overview
This role calls for a Python-focused Data Engineer with strong hands-on coding skills in data-intensive systems. The role focuses on building scalable data pipelines, processing large datasets, and enabling AI/Generative AI applications through well-structured data infrastructure.

Key Responsibilities
- Build and maintain scalable data pipelines using Python
- Write production-grade Python code specifically for data processing, transformation, and ETL workflows
- Perform data cleaning, preprocessing, and feature preparation for analytics and AI use cases
- Use data analysis and manipulation tools to handle large datasets efficiently
- Develop reusable Python modules for data ingestion and pipeline automation
- Perform exploratory data analysis (EDA) to understand data patterns and quality issues
- Optimize data workflows for performance, scalability, and reliability
- Support data requirements for AI/ML and Generative AI systems
- Build data services and APIs to support downstream AI applications
- Ensure data quality, consistency, and observability across pipelines

Required Python & Data Libraries (Hands-on Experience Mandatory)
Candidates must have strong practical experience with:
- pandas: data manipulation, transformation, and analysis
- NumPy: numerical operations and array-based processing
- Matplotlib: data visualization and reporting
- scikit-learn: basic ML workflows and model evaluation
- PyTorch: deep learning and AI model experimentation

AI / Generative AI Enablement
- Prepare and structure datasets for ML and LLM-based systems
- Support integration of AI models into data pipelines and applications
- Enable workflows for Generative AI use cases (RAG systems, agent workflows)
- Work with multiple AI model providers and model families: OpenAI, Anthropic, LLaMA (Meta), and Mistral
- Exposure to AI orchestration frameworks such as LangChain, AutoGen, and CrewAI

Core Requirements
- Strong hands-on Python coding expertise focused on data systems (critical requirement)
- Ability to write clean, efficient, production-grade Python code
- Strong understanding of data structures, ETL pipelines, and data workflows
- Experience working with large-scale structured and unstructured data
- Strong SQL skills for data extraction and manipulation
- Understanding of data modeling and analytics workflows
- Ability to support end-to-end data-to-AI pipelines

Preferred / Good to Have
- Experience with big data or distributed processing systems
- Understanding of vector databases and embedding-based retrieval systems
- Experience building APIs or services for data/AI systems
- Familiarity with cloud platforms (AWS, Azure, GCP)
- Exposure to production monitoring and data observability tools

What Success Looks Like
- High-quality Python code powering scalable data pipelines
- Reliable, clean, and well-structured datasets for AI systems
- Efficient ETL workflows with minimal manual intervention
- Seamless support for ML and GenAI applications in production

Skills: AI, Distributed Processing, Large Datasets, Generative AI, Application Development and Deployment, Python Scripting, Data Pipeline, Efficient Workflow Analysis