New York, NY

Senior Site Reliability Engineer

Do you have a serious passion for ensuring complex systems never skip a beat? Have you ever learned a new technology or design pattern, or worked in the cloud? Do you take a special joy in improving developer productivity?

Join Zocdoc as a Senior Site Reliability Engineer! You’ll be challenged to maintain, improve, and create new features in an ever-changing environment while ensuring uptime and ludicrous-speed response times for our patients and doctors. You'll work in a 100% cloud-based microservice environment in AWS. You should love challenging the status quo and strive to make everything you touch easier, faster, and more robust.

Minimum Qualifications: 

  • B.S. degree in Computer Science, Computer Engineering, or equivalent engineering experience.
  • 5+ years of experience supporting production web systems.
  • 5+ years of coding experience.
  • 2+ years of production cloud experience (AWS, Azure, GCP).
  • Experienced in running large scale distributed web services.
  • Experienced with running and managing containerized workloads.
  • Deep understanding of, and experience troubleshooting, protocols such as TCP/IP, HTTP/HTTPS, TLS, DNS, and NTP.
  • Experienced in creating tests, using test frameworks, and working with CI/CD processes in general.
  • Experienced with configuration management systems (Puppet, Chef, Salt, Ansible, etc.) and CI systems (Jenkins, TeamCity).

Preferred Qualifications:

  • Experience with Kubernetes and running containers on Kubernetes.
  • Advanced load balancing and routing experience using tools like Kong, Istio, or Envoy.

Responsibilities:

  • Maintain infrastructure, systems, and services in production AWS cloud environments.
  • Diagnose code- and infrastructure-related issues in production.
  • Work with Development and Product teams to enhance automation pipelines and improve service reliability.
  • Write, review, and contribute to various code bases.
  • Analyze and tune the performance of systems, code, and networking for scaling and optimal operation.
  • Participate in on-call rotation.