Senior Software Engineer, Data Infrastructure
San Francisco, CA, US
Job Description / Skills Required
We are a San Francisco-based team building self-driving semi trucks. We have raised $117MM in total and are backed by Tiger Global Management ($70MM Series C) and Sequoia Capital ($30MM Series B). We move freight daily between LA and Phoenix using our purpose-built transfer hubs. This is an incredibly exciting time for autonomous driving, and our team is looking to grow.
Data is core to everything we do at Embark. Our fleet of self-driving trucks generates petabytes of valuable data from the road, and this data powers the engineering processes that enable us to deliver on our mission. From training machine learning algorithms, to generating and executing simulations that measure the performance of our virtual driver across a wide variety of scenarios, to extracting insights from road data to drive our engineering decisions, data is fundamental to all of our work. The data engineering team builds the pipelines, architectures, APIs, and processes that power all of this.
As a data engineer, you’ll help make all of this possible.
Some of your responsibilities will include:
- Maintain the on-vehicle code responsible for data collection, monitoring, and real-time communication with our backend systems (supporting use cases such as low-latency streaming of sensor data, including camera and LiDAR, over LTE), and build the backend systems that power all of this
- Evaluate, architect, and deploy new database systems and data pipelines to enable our engineering team to gain richer insights into our data
- Build scalable data pipelines which operate over petabytes of autonomous vehicle data to extract useful features, enable advanced queries, and feed machine learning models and simulation environments
- Build, deploy, and maintain the infrastructure that ingests terabytes of data uploaded from our vehicles each day, including the software, the hardware, and the operational processes that power this system
Your experience might include:
- 3+ years architecting and maintaining systems that process and store large amounts of data
- Significant experience with Python, C++, Go, or similar
- 2+ years of experience managing distributed data processing frameworks such as Spark and Hadoop, with significant, direct experience writing jobs in such frameworks
- 3+ years experience working with AWS and/or GCP
- Experience with classical relational and NoSQL databases
- Experience working with distributed message brokers like Kafka, Kinesis, or RabbitMQ
- Experience managing large Kubernetes clusters powering microservice-oriented architectures
When you apply, address the application to Jacqueline and let her know why you want to join our team.
A few company highlights: