GenAI & Big Data Engineer
Bayan Baru, MY
About our group:
The Global Wafer Systems (GWS) group is responsible for the design, development, and support of essential business solutions across factory locations in Asia, Europe, and the United States. The team specializes in the advancement and maintenance of factory control systems, utilizing Artificial Intelligence in fields including image processing, automated data monitoring, recommendation, and automated decision-making.
About the role - you will:
● Support Generative AI (GenAI) initiatives, including preparing data pipelines for LLM/RAG workflows, enabling feature engineering for GenAI use cases, and partnering with AI teams to ensure high-quality data readiness
● Architect and develop big data solutions for smart factory operations, covering the storage, processing, and retrieval of operational data, parametric data, sensor data, and images
● Perform Extract, Transform, Load (ETL) operations from source systems to Hadoop
● Maintain and develop Oracle PL/SQL procedures and packages
● Conduct data wrangling and troubleshooting to resolve data-related issues
● Collaborate closely with data scientists, application developers, and cross-functional global teams to design robust data architectures
About you:
● Strong willingness to support GenAI adoption, model integration, and data readiness
● Flexible and adaptive in learning new technologies and programming languages
● Passionate about big data, statistics, data lifecycle management, data testing, data governance, and analytics
● Comfortable working with diverse teams across Asia, Europe, and the US
● Willing to operate in a global environment with occasional early morning or evening meetings
● Tertiary education in Computer Science, Software Engineering, or Data Science
Your experience includes:
● Excellent working knowledge of SQL, including writing, debugging, and optimizing distributed SQL queries
● Demonstrated experience with industry-leading ETL practices
● Proficiency in at least one programming language (Python)
● Outstanding skills in data modeling, ETL development, and data warehousing
● Experience applying Oracle PL/SQL
● Hands-on experience preparing data for LLM/RAG/GenAI systems, such as vector-friendly transformations, metadata extraction, and structured/unstructured data processing
You May Also Have:
● Exposure to GenAI frameworks (e.g., Llama, RAG pipelines, embeddings, inference optimization)
● Familiarity with AWS technologies and distributed SQL engines such as Trino
● Familiarity with batch processing and orchestration tools such as Airflow and NiFi
● Experience building streaming ingestion frameworks using Spark Streaming or Kafka
Location:
Our Penang office is located in Suntech at Cybercity. The office is easily accessible from two bus stops, and many employees take public transportation to work. Ample free on-site parking is also available. Enjoy our on-site gym, test your ping-pong skills, or take on your colleagues in a badminton match after work. You can grab breakfast, lunch, and coffee at our on-site cafe. Prefer to eat off-site? The public food court across the street offers many delicious options.
Location: Penang, Malaysia (Suntech)
Travel: None