Berkay Çelik

Site Reliability Engineer - DevOps Engineer - Cloud Architect

LinkedIn | GitHub

About

Site Reliability Engineer with over 4 years of experience optimizing cloud infrastructure and implementing DevOps best practices. Track record of achieving 99.99% uptime for complex enterprise systems, reducing infrastructure costs by 30%, and cutting deployment time by 60%. Skilled in Azure, AWS, Kubernetes, Terraform, and MLOps, with a focus on scalable and resilient cloud solutions.

Work Experience

Senior DevOps Engineer

AMADEUS

Apr 2024 - Present

Istanbul, Istanbul, TR

Managed SRE operations for a high-volume travel platform, leading cloud migration efforts and architecting MLOps solutions.

  • Managed SRE operations for a travel platform processing over 1.5B annual transactions, achieving 99.99% uptime through Prometheus/Grafana monitoring and automated incident response.
  • Led an Azure cloud migration for 200+ microservices on OpenShift, reducing infrastructure costs by 30% and deployment time by 60% using Terraform IaC and CI/CD pipelines.
  • Implemented a comprehensive observability framework with 50+ custom Prometheus metrics and Grafana dashboards, resulting in a 40% decrease in Mean Time To Resolution (MTTR).
  • Architected an MLOps platform using Kubeflow and MLflow for AI-powered recommendations, shortening model release cycles from weeks to 2 days.
  • Collaborated with 25+ engineers across 4 time zones to establish SRE best practices, achieving 95% error budget compliance.
  • Automated infrastructure provisioning using Azure ARM templates and Ansible, reducing manual tasks by 70%.

Cloud Engineer

ICRON

Sep 2022 - Apr 2024

Istanbul, Istanbul, TR

Orchestrated cloud deployments for a supply chain platform, automating infrastructure and modernizing CI/CD pipelines.

  • Orchestrated cloud deployments for a supply chain platform serving 50+ enterprise clients, managing Azure infrastructure and reducing its cost by 25%.
  • Automated infrastructure provisioning using Terraform and Ansible, cutting deployment time from 4 hours to 30 minutes.
  • Designed and implemented an RBAC framework with 20+ custom roles, reducing access provisioning time by 75% and supporting SOC 2 compliance.
  • Implemented a comprehensive monitoring stack with Prometheus/Grafana, improving system availability from 97% to 99.5% and reducing MTTR by 40%.
  • Modernized CI/CD pipelines using Azure DevOps, implementing blue-green deployments to enable zero-downtime releases.

Site Reliability Engineer

IBM

Jun 2022 - Sep 2022

Istanbul, Istanbul, TR

Optimized performance for enterprise applications and established SRE practices.

  • Optimized performance for 20+ enterprise applications using Instana APM, reducing P1 incident resolution time from 2 hours to 45 minutes.
  • Deployed and managed Kubernetes clusters on Red Hat OpenShift with auto-scaling and self-healing capabilities, increasing system uptime to 99.5%.
  • Established and enforced SRE practices, including golden signals monitoring and error budgets, which led to a 40% reduction in Mean Time To Resolution (MTTR).

DevOps Engineer

AYSTEK Smart Software

Jan 2021 - May 2022

Istanbul, Istanbul, TR

Migrated applications to AWS, developed Terraform modules, and optimized serverless architecture.

  • Migrated over 15 applications from on-premises to AWS (EC2, RDS, S3), cutting monthly costs by 30% ($8K per month).
  • Developed comprehensive Terraform modules for AWS infrastructure, automating the deployment of over 50 resources and accelerating development cycles.
  • Implemented CI/CD pipelines using Jenkins and AWS CodeDeploy, achieving a 95% deployment success rate for critical applications.
  • Optimized serverless architecture built on AWS Lambda and API Gateway, reducing application response times from 800 ms to 200 ms.

Teaching Assistant

Ozyegin University

Jan 2021 - May 2022

Istanbul, Istanbul, TR

Delivered lab sessions, developed an automated grading system, and mentored teaching assistants.

  • Delivered over 20 lab sessions focused on optimization techniques to 150+ students, achieving a 92% student satisfaction rating.
  • Developed an automated grading system using Python and VBA, reducing grading time by 60%.
  • Mentored and trained 15+ teaching assistants through structured programs, enhancing their instructional capabilities and team performance.

Education

Artificial Intelligence

Ozyegin University

Jan 2024

Istanbul, Istanbul, TR

Courses

  • Machine Learning
  • Deep Learning
  • MLOps

Computer Science

Ozyegin University

3.2/4.0

Jan 2017 - Dec 2022

Istanbul, Istanbul, TR

Certificates

Microsoft Azure Administrator (AZ-104)

Microsoft

Jan 2023

Projects

Cryptocurrency Price Prediction with Sentiment Analysis

Engineered deep learning models (Informer, Autoformer) to predict BTC/ETH prices using Twitter sentiment analysis.

Blockchain E-Commerce Platform

Built a decentralized e-commerce platform using React, Node.js, and Solidity smart contracts.

Skills

Cloud Platforms

  • Azure (AZ-104 Certified)
  • AWS (EC2, RDS, Lambda, S3, SageMaker)
  • Google Cloud Platform

Container/Orchestration

  • Kubernetes
  • Docker
  • OpenShift
  • Helm
  • ArgoCD
  • Kubeflow

Infrastructure as Code

  • Terraform
  • Ansible
  • Azure ARM Templates
  • CloudFormation
  • Pulumi

CI/CD Tools

  • Azure DevOps
  • Jenkins
  • GitHub Actions
  • GitLab CI
  • Tekton

Monitoring/Observability

  • Prometheus
  • Grafana
  • ELK Stack
  • Datadog
  • New Relic
  • Instana APM

Programming Languages

  • Python
  • Go
  • Bash
  • PowerShell
  • JavaScript
  • SQL

MLOps/AI Tools

  • MLflow
  • Kubeflow
  • TensorFlow
  • PyTorch
  • A/B Testing

Methodologies

  • Agile
  • Scrum
  • GitOps
  • SRE Practices
  • DevSecOps