TuneIn lets people listen to music, sports, and news from around the world.
Our mission is to deliver the world’s best listening experiences. Every day we make good on that promise for millions of listeners through our flagship mobile and web applications along with more than 200 connected devices and services.
Meet TuneIn Engineering
We value being a top-notch engineering organization, and we hold our code and our people to the same high standards. We make time for quality, we are agile and pragmatic, we keep it simple, we are data driven, and we love getting better. Check out our principles here: https://github.com/tunein/engineering/blob/master/Principles.md
We regularly invest time in your future and support your growth in a number of ways: clear job responsibilities and expectations for your career path; freedom to move between teams; Mission Teams, which let you contribute more broadly; and our quarterly Discovery Days. On Discovery Days ("hackathon"-like), you spend time building innovative features, products, or approaches to problems; addressing nagging issues that take time away from adding value to TuneIn; or simply learning a new technology.
About this position
TuneIn is building a Site Reliability Engineering (SRE) team to ensure we delight users 24x7, maintain 99.9% availability, and drive our mean time to resolve issues (MTTR) to zero. We’re looking for bright, motivated, accountable, empathetic, and mature folks who are passionate about availability, observability, mitigating risks and failure modes, and the overall performance of our platform.
On a given day, your job might be selecting, configuring, or building the right tools to detect risks or failures and to automatically alert and escalate where appropriate. You might be building or configuring monitors, alerts, and escalation procedures. You might be working with engineering teams to design for failure and build in appropriate fault tolerance and observability techniques (e.g., logging). You might be developing ways to test for failure, capacity, and performance using our version of the Simian Army.
As a Site Reliability Engineer, you:
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability.
Practice sustainable incident response and postmortems.
Participate in building advanced tooling for monitoring, administration, and operations.
Troubleshoot issues across the entire stack - hardware, software, application and network.
Work with InfraOps, DevOps, and Engineering teams to ensure the stability and reliability of the platform.
Work with monitoring tools and triage network, server, and database issues.
Help set up, evolve, maintain, and administer tools and automation for key components.
Identify issues affecting service uptime and performance.
Participate in an on call rotation and be available for escalations.
Monitor system uptime and availability, ensuring functional and performance SLAs are met.
Establish end-to-end monitoring and alerting on all critical aspects to uphold SLAs and receive proactive notifications of possible issues across all systems.
Create playbooks to run when responding to alerts or incidents.
Develop automation to streamline standard operating procedures and prevent problem recurrence.
What you bring to the table (qualifications)
You have demonstrated creativity and results in identifying, scoping, and building tools to meet availability, MTTR, and performance goals at scale. You have a strong understanding of operations monitoring pipelines and web architecture scalability challenges.
Systematic problem-solving approach
Strong, empathetic communication skills
Incredibly pronounced sense of ownership, accountability, and initiative
Experience with techniques and tools to detect and escalate failures (today we use Datadog, SumoLogic, PagerDuty, Nodeping, and a few more things).
Depth in distributed systems and scalability patterns for web apps and services
Experience with techniques to reduce risks, failures, and downtime, e.g., clustering & failover.
Hands-on experience with infrastructure technologies at scale, across cloud (AWS) and data centers. Today we use Linux, Windows, various key/value and SQL data stores, VMWare, Docker, and AWS Lambda.
Practical knowledge of shell scripting and scripting languages.
Experience with principles of software development and design, and experience with common programming languages
BS degree in a technical field or equivalent experience
Bonus Points
Experience with basic statistical analysis of streaming data sets
Experience with anomaly detection techniques and algorithms
Experience with Application Monitoring (APM) tools/vendors
Experience with Chaos Monkey-type software, load testing techniques, and other ways to break things on purpose
TuneIn is headquartered in San Francisco, in the heart of the SOMA district, across from AT&T Park. We also have a vibrant, growing office in Venice, CA with a full recording studio, where artists and personalities publish new content to the community every week. We’re well-funded by the most prestigious names in venture capital, including Sequoia Capital, General Catalyst Partners, Google Ventures, and Institutional Venture Partners.