> init ml_engineer.exe

Abdul Raheem
AI Engineer · ML Engineer · Agentic AI

I architect and build production-ready applications powered by Large Language Models. My focus is on moving beyond off-the-shelf models to create robust, secure, scalable, and context-aware AI systems — deployed across cloud and on-premises environments.

5+ Years Experience
LLMs Production Systems
RAG Enterprise Pipelines
6x Certified
// core expertise
What I Engineer
🔍

Production RAG Systems

Engineering enterprise-grade RAG beyond simple vector lookups. Intelligent chunking, hybrid search (sparse + dense vectors), multi-stage retrieval with query transformation and cross-encoder re-ranking — resulting in verifiable, low-latency systems trusted in production.

LangChain · LlamaIndex · Pinecone · Hybrid Search · Re-ranking
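To illustrate the hybrid search step mentioned above, here is a minimal sketch of one common way to merge sparse and dense result lists: reciprocal rank fusion (RRF). RRF is an assumption chosen for illustration, not necessarily the fusion method used in these systems, and the function name, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a sparse (keyword) and a dense (embedding) retriever.
sparse_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

In a multi-stage setup, a fused list like this would typically be truncated and passed to a cross-encoder re-ranker as the final stage.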
🧬

LLM Fine-Tuning & Optimization

Fine-tuning open-source models (Llama 3, Qwen3, Mistral) on custom datasets. Leveraging the HuggingFace ecosystem (Transformers, TRL, PEFT) with parameter-efficient methods such as LoRA/QLoRA to create smaller, domain-specialized models.

Llama 3 · Mistral · LoRA · QLoRA · HuggingFace · PEFT
🤖

Agentic AI Systems

Engineering autonomous AI agents capable of reasoning, planning, and executing multi-step workflows. Building stateful, multi-actor systems with LangGraph for automated synthetic data generation, complex analysis, and MCP-powered automation.

LangGraph · MCP · Multi-Agent · LangSmith · Tool Use
☁️

Cloud & MLOps

Deploying AI solutions across AWS and Azure — from containerized microservices to fully managed ML platforms. End-to-end MLOps with Docker, Kubernetes, CI/CD pipelines, and real-time model monitoring.

Azure · AWS · Docker · Kubernetes · Databricks · LLMOps
🔐

Synthetic Data & Privacy

Building synthetic data generation pipelines with 94% accuracy and zero distribution loss. PII anonymization, NeMo Guardrails, and enterprise-grade data security for compliance-ready AI systems.

Synthetic Data · PII Anonymization · NeMo · Guardrails
🧠

Deep Learning & Research

Designing neural architectures for CV, NLP, and tabular data. Predictive analytics, real-time inference pipelines, and publishing research at top venues including ICML.

PyTorch · TensorFlow · Scikit-learn · VLLM · CNNs
// work history
Experience

Machine Learning Engineer (LLM)

Betterdata
Feb 2025 – Present
  • Built an intelligent document processing agent that secured Google funding and a feature at Google Demo Day Singapore; extracts data with 99% accuracy using layout-aware algorithms
  • Engineered an agentic app that anonymizes invoices by replacing PII with synthetic data, retaining exact visual layout, fonts, and pixel coordinates
  • Built an OpenAI-powered agentic pipeline connecting multiple data sources to generate high-fidelity synthetic datasets; implemented LangSmith for end-to-end tracing and monitoring
  • Fine-tuned Llama-3.2 on a 200M-record cybersecurity dataset for DHS (USA), achieving 94% synthesis accuracy with zero distribution loss
  • Built secure RAG pipelines with Pinecone vector DBs, sliding-window chunking, and metadata-enhanced filters to reduce LLM hallucinations
  • Authored a research paper at ICML on instruction-tuned tabular foundation models
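
The sliding-window chunking mentioned in the RAG bullet above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production implementation: the function name, the window and overlap sizes, and the whitespace tokenization are all hypothetical.

```python
def sliding_window_chunks(tokens, window_size=200, overlap=50):
    """Split a token list into fixed-size windows that overlap.

    Consecutive chunks share `overlap` tokens, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    """
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("require window_size > 0 and 0 <= overlap < window_size")
    step = window_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window_size])
        # Stop once the current window already reaches the end of the input.
        if start + window_size >= len(tokens):
            break
    return chunks


# Hypothetical usage with naive whitespace tokenization.
text = "one two three four five six seven eight nine ten"
chunks = sliding_window_chunks(text.split(), window_size=4, overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

In a retrieval pipeline, each chunk would usually be stored alongside metadata (source document, position) so that filters can narrow the search before the LLM sees any context.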

Machine Learning Engineer

DevHack Org
Dec 2024 – Feb 2025
  • Built an AI agent using LangChain SQLDB and LlamaIndex to convert natural language into SQL with dynamic chunking and a Plotly-powered Graphing Agent
  • Built an MCP-powered automation agent using GPT-4o and FastAPI for smart screen interaction from natural language prompts
  • Developed a LangGraph multi-agent system to detect workflow failures, apply fixes, and recover pipelines using agentic principles
  • Built a VLLM-powered prescription extraction app to identify medicine names, dosages, and instructions from handwritten doctor prescriptions
  • Designed and deployed a lead-generation bot using context engineering and RAG workflows

Data Scientist

Royal Cyber Inc
Jun 2023 – Nov 2024
  • Built ProClaim — a claims processing solution with Databricks Lakehouse and Agentic RAG, achieving 40% faster analysis and processing 100K records in 12 minutes at 99% accuracy
  • Deployed a multi-modal AI chatbot using Azure OpenAI and Whisper for automated ticket categorization and response generation
  • Fine-tuned LLMs (Llama 2, GPT-3.5) to build a Personalized Financial Planner, increasing customer satisfaction by 25%
  • Built a scalable Financial RAG pipeline processing 1,000+ page documents with advanced chunking and Pinecone embeddings
  • Built RCProfilePro — an ATS-friendly resume converter deployed on Azure Container Apps with Microsoft SSO, boosting screening efficiency by 50%

Machine Learning Engineer

ApexTech Health
Aug 2021 – May 2023
  • Optimized predictive models for early sepsis detection using live vital data from smartwatches, reducing severe repercussions by 70%
  • Built robust data pipelines for 10K patient records with efficient S3 storage
  • Engineered real-time vital sign pipelines, reducing complications by 30%

Open Source Contributor

GitHub
Dec 2020 – Feb 2021
Lightning AI — Metrics

Identified and resolved issue #518 by implementing a comprehensive quality control process for checking and updating non-working links in all .rst files, leading to a 95% reduction in broken links across the website.

Apache — APISIX

Implemented a switch from radixtree_uri to radixtree_host_uri in the default HTTP router with test cases, resolving route-priority confusion for users, reducing latency by 15%, and improving overall performance.

// featured projects
Projects

DevHub

Python · Flask · Spring Boot · AzureML

ML-based web app offering career mentorship, personalized roadmaps, virtual internships, and an LLM chatbot. Drove the project to the Imagine Cup National Finals, won the Microsoft AI Hackathon, and secured Microsoft funding.

→ View on GitHub

Enterprise Document Analyzer

Python · FastAPI · AWS · Amazon Bedrock

Scalable document analysis platform using a RAG pipeline with Amazon Bedrock, Lambda, and S3. Implemented role-based access guardrails using NeMo for enterprise-grade data security.

Emodect

Keras · CNN · Flask

Live camera emotion detection for students — triggers alerts and emails to teachers when detecting signs of distress. Improved student mental health outcomes by 30%.

→ View on GitHub
// blogs & content
Writing & Talks

I write about LLMs, cloud AI, and ML engineering on DEV.to and Medium.

🔥 MOST POPULAR 72 ❤️ · 5 min read

Deploy Your LLM on AWS EC2

llm · rag · ec2 · aws

Step-by-step guide to deploying your own Large Language Model on AWS EC2 — from instance selection to serving inference at scale. Published on AWS Community Builders.

→ Read on DEV.to
AWS BUILDERS 19 ❤️ · 4 min read

Deploy Your Static Web App on AWS S3 in 10 Minutes

aws · s3 · serverless · static web apps

A practical walkthrough on hosting static web applications on S3 — from bucket setup to deployment in under 10 minutes. No servers needed.

→ Read on DEV.to
GOOGLE AI CHALLENGE 16 ❤️ · 3 min read

Extractly — Turn PDFs into Data

ai · gemini · devchallenge · google

Submission for the Google AI Studio Multimodal Challenge — an intelligent tool that converts complex PDF documents into structured, usable data using Gemini.

→ Read on DEV.to
AWS BUILDERS 5 ❤️ · 6 min read

Deequ: Your Data's BFF

deequ · aws · data · analytics

Deep dive into Amazon Deequ for data quality — ensuring reliable ML training data through automated validation, constraint suggestions, and anomaly detection.

→ Read on DEV.to
DEV.TO 9 min read

Mastering Amazon ECS: Key Building Blocks

aws · ecs · containers · cloud

Comprehensive guide to Amazon Elastic Container Service — understanding tasks, services, clusters, and how to orchestrate containerized applications at scale.

→ Read on DEV.to
DEV.TO 7 min read

Machine Learning 101: A Guide for Beginners

machine learning · beginner · guide

A comprehensive beginner's guide covering ML fundamentals — from supervised and unsupervised learning to practical implementation steps for your first model.

→ Read on DEV.to

Medium Blog

Technical articles on Git workflows, cloud development, and AI engineering, including guides on GitHub Codespaces and developer productivity.

→ Read on Medium

YouTube Channel

Video tutorials and talks on machine learning, cloud computing, and software engineering. Subscribe for deep-dive technical content.

→ Watch on YouTube
// credentials
Certifications
Azure AI Engineer Associate (AI-102)
Microsoft · Jun 2024
Azure Data Scientist Associate
Microsoft · Jul 2024
Generative AI Certified Professional
Oracle · Jul 2024
GitHub Foundations
GitHub · Jul 2024
Docker Essentials
IBM · Jun 2024
Linux System Administration Essentials
Linux Foundation · Jul 2024
Designing Azure AI Solutions
Microsoft · Jul 2023
Python for Data Science
IBM · Aug 2022
AI Programming with Python Nanodegree
Udacity · 92% Score
// education
Education

Bachelor of Science in Computer Science

COMSATS University — Sahiwal Campus
GPA: 3.56 / 4.0

Aspire Leaders Program

Harvard Business School
Academic and Professional Development Program
// community & leadership
Community Impact

Gold MLSA — EMEA

Microsoft · 2020 – 2024

Achieved the highest rank, Gold. Mentored 10,000+ professionals and students. Built a community of 10K students and organized 2 hackathons, an inter-city coding competition, and 50+ tech events. Won the Microsoft Mentor award.

GDSC Lead

Google · 2022 – 2023

Led a diverse 30-member team. Secured $40K in annual resources from DataCamp and 100+ LinkedIn vouchers from Microsoft through strategic collaboration with global communities.

GitHub Campus Expert

GitHub · 2022 – Present

Supported thousands of students through mentorship and resources. Empowered women and underrepresented minorities via tech initiatives. Delivered talks at 10+ international conferences.

AWS Community Builder

AWS · 2023 – Present

Sharing knowledge about cloud-native ML infrastructure as part of a program that provides access to AWS resources, mentorship, and networking opportunities.

CNCF Chapter Organizer

Cloud Native Sahiwal · 2023 – Present

Founded Cloud Native Sahiwal. Organizing regular meetups on containerization, microservices, Kubernetes, and cloud-native technologies.

10+ International Talks

Global Conferences & Events

Delivered talks at international conferences representing GitHub, Microsoft, and Google. Advocating for tech innovation and community building across borders.

// reach out
Get In Touch
Email
arfarooqix@gmail.com
LinkedIn
@xfarooqi
GitHub
@XFarooqi
X / Twitter
@X_Farooqi
Phone
+92 332 3030358
Linktree
@abdulraheem01