Offers customer experience management software.

Site Reliability Engineer
Buenos Aires, AR
Job Description / Skills Required

Medallia is the global leader in Customer Experience Management. Our goal is to create a customer-centric world where companies see you as a person, and not just their next sale. We do this by creating a bridge between companies and their clients, giving them access to your eyes, ears, and hearts, so they can design and deliver exceptional experiences, every single day.

Medallia Engineering is a global, no-nonsense flat organization where the best ideas are implemented, no matter where they come from. We’ve got a culture focused on smarts, kindness, continual learning, and feedback… and our people love it. Come find out why!

Production Engineering at Medallia requires a desire to dig deep into a wide range of technologies, and a relentless drive to make the customer experience better through investments in automation and infrastructure improvements. Business is booming, and the infrastructure that supports the Medallia platform needs to continuously scale to meet the demands of our explosive growth.

Production Engineering at Medallia brings together the infrastructure and applications that power a highly reliable, agile, and efficient global SaaS platform. We are building a next-generation global data center operating system that spans on-premises and cloud-based infrastructure, leveraging some of the most exciting new open source technologies. We work closely with product and platform engineering to make the world's best customer experiences even better. Production Engineering owns the reliability of key components of the applications and infrastructure stack at Medallia, and ensures that they continue to scale with our rapidly growing business.

As a Production Engineer, you may:

Deploy and update applications within our systems foundation (compute, storage, network, etc.).
Build monitoring automation to prove and maintain a world-class end-user experience.
Debug and solve complex problems that may span the full service stack.
Proactively monitor and manage the availability of infrastructure and applications.
Optimize performance of components across the full service.
Be a part of an engineering team on-call rotation for escalations.

Qualifications:

2+ years of demonstrated experience managing and maintaining large-scale SaaS platforms.
Strong understanding of the Linux operating system.
Ability to code or script automation in at least one language (Java, Go, Python, Ruby, Perl, Bash, etc.) on Linux-based platforms.
Experience with physical, virtual, and cloud environments.
Deep experience in at least one infrastructure component (operating systems, compute, storage, networking, data center, distributed systems, big data, cloud, etc.) and solid understanding of the rest, and how they impact services.
Experience building, configuring, and maintaining operational monitoring and reporting tools.
Solid understanding of infrastructure and application performance metrics, including capacity planning.
Familiarity with cluster management tools such as Mesos, Docker Swarm, Kubernetes, Marathon, Aurora.
Familiarity with distributed storage and filesystems such as Ceph, HDFS, GFS, IPFS.
Familiarity with relational databases, particularly PostgreSQL.
Proven ability to work independently and strong problem-solving skills.
Strong communication skills.
BS/MS in Computer Science, Engineering, or related field preferred.