Evernote

Brings your life's work together in one digital workspace for storing and sharing.

Staff Site Reliability Engineer
Austin, TX, US
Bothell, WA, US
Redwood City, CA, US
San Diego, CA, US
Job Description / Skills Required

About Us

Our SRE team is responsible for the overall performance and reliability of Evernote’s service and products. This includes software and infrastructure used by over 200 million passionate and engaged users around the world, with billions of notes and files. SRE creates and manages the resilient and scalable compute, network, storage, and database systems that serve as the foundation of Evernote. As a Staff SRE, you will contribute to the ongoing mission of delivering an exceptional service to our users.

What you will do

  • You will research and analyze new technology to solve problems at all layers of our stack
  • You will partner closely with engineering teams to maintain and scale the platforms that run our software
  • You will own the development of technical standards for new services that ensure success in production environments
  • You will publish internal design documentation and procedures that provide detailed specifications for the engineering audience
  • You will develop software and maintain automation systems to reduce toil and to run our infrastructure at scale
  • You will design and implement secure solutions with our Security team to protect our users’ data
  • You will champion our SLOs and continuously improve them
  • You will act as a subject-matter expert for critical infrastructure and provide mentorship for the department in those areas
  • You will participate in an on-call rotation to help maintain the availability of our service so that users always have access to their data

What we are looking for

  • You take initiative and lead by example to motivate your peers
  • You focus on quality to build resilient, scalable, and maintainable systems
  • You make decisions based on data and exercise judgement to balance risks and rewards
  • You partner with your teammates and thrive in a collaborative environment to tackle challenging technical problems
  • You share enthusiastically with your colleagues and provide strong mentorship
  • You write excellent documentation that informs and influences your audience

What you have done

  • You have 6 or more years of experience running a large-scale, online web service in a cloud environment
  • You know Linux systems like the back of your hand and have mastered the fundamental TCP/IP networking protocols
  • You are an expert in Kubernetes and cloud-native infrastructure
  • You have worked with product teams to launch and run distributed microservices and have experience with service mesh platforms such as Istio
  • You have integrated and used third-party metrics and monitoring platforms such as Datadog and PagerDuty
  • You have successfully deployed configuration management and automation systems
  • You are an expert at debugging complex systems and leading incident response
  • You have developed extensible and maintainable software and tools that make an SRE’s job easier
  • You hold a Google Cloud Architect certification or have equivalent experience

Skills that are particularly meaningful to us

  • Google Cloud Platform: VPC networking, firewalls, load balancing, GCE, GKE, GCS, Pub/Sub, Spanner, App Engine, BigQuery, Bigtable
  • AWS: EC2, S3, ELB, VPC networking
  • Monitoring: PagerDuty, Datadog, Splunk, Nagios
  • Tools: Ansible, Puppet, Helm, Jenkins, Cloud Deployment Manager, Terraform
  • Infrastructure: Kubernetes, HAProxy, Envoy, Elasticsearch, Consul, Istio, Vault
  • Programming: Python, Java, Node.js, Go, shell