Job opening: Lead Site Reliability Engineer

5 openings
Specialization: DevOps/Sysadmin
Level: senior
Experience: 5 years
English level: Upper-Intermediate
City: Minsk
Schedule: Full-time
Team size: 3-5
Company size: 2000
java
apache kafka
google cloud platform
python
cassandra
elasticsearch
zookeeper

About us:

At Fitbit, our mission is to help people lead healthier, more active lives by empowering them with data, inspiration and guidance to reach their goals.

We started our journey in 2007 as a team of two with one big idea. Today, that idea has become a movement. Our culture combines the spirit of a startup with the perks of being a public company. As part of our team, you'll have the opportunity to grow your career, contribute your ideas to life-changing products and services, and, above all, have fun doing it.

Think you’ve found your fit? See what we’re looking for below and apply today.

The Team:

Site Reliability Engineers ensure that Fitbit's site and backend services are available and healthy, and that customers have a positive experience. We champion best practices to measure, manage, and enhance site reliability. We encourage others to treat change, operational flexibility, and observability as first-class concerns and to make informed tradeoffs between functional and operational goals.

Our goal is to transform the system such that services have service-level objectives (SLOs) and the appropriate amount of monitoring/alerting so that teams can balance shipping fast with maintaining the stability and reliability of their features and services.

The Role:

As Fitbit continues to build microservices and begins its migration to Google Cloud, our system is growing ever more complex. Our use of technologies such as Kafka, Cassandra, Elasticsearch, ZooKeeper, and Finagle/Finatra has been increasing significantly and in some cases has led to issues as we encounter their rough edges or the limits of the company's knowledge of those applications and frameworks. Additionally, our legacy codebases that use Spring and Hibernate often require developers with deep expertise in those frameworks to help with refactoring or to handle incidents.

To remain effective, the SRE team wants to expand its skill set to cover those technologies from a white-box perspective, so that we can respond to incidents more effectively and better advise the teams using them.

That's where you come in. We're looking for someone who is an expert in the Java programming language and ecosystem (its tools, libraries, frameworks, etc.) to help the SRE team and other Fitbit software engineers "level up" when it comes to building, scaling, and operating Java-based applications. In addition, you'll work with others to identify and fix latent performance, observability, scalability, and reliability issues in our system.

Technical Requirements:

  • 5+ years of experience as a software engineer, site reliability engineer, or similar role
  • You're extremely comfortable with the Java language and ecosystem. You know the ins and outs of the language, are comfortable with the standard tools for measuring and tuning performance, know how to debug and troubleshoot Java-based applications, understand the Java memory model, and have experience with concurrency and multi-threaded code
  • Experience building, deploying, and operating high-traffic, scalable web applications and services
  • You are familiar with or have an interest in diving into the internals of applications and frameworks like Kafka, Cassandra, ZooKeeper, Elasticsearch, Spring Framework, Hibernate, and Finagle

Non-technical Requirements:

  • Able to work effectively as part of a small (and possibly distributed) team
  • You can communicate effectively with peers and are able to tailor your communication to your audience
  • Ability to take a high-level (and often vague) problem, design the solution, and work independently to deliver the project

Responsibilities:

  • Independently design and implement moderately complex components or systems
  • Create and own technical design documents
  • Act as an expert for the tools, systems, and/or applications you work on
  • Coach and work with other teams to build scalable and reliable software
  • Mentor new team members
  • Act as a positive example for other software engineers to follow
  • Contribute to the goals of a globally-distributed team and be willing to take an active role in helping the team deliver results
  • Contribute process improvements that boost productivity and quality
  • Participate in the team’s production on-call rotation

Nice-to-haves:

  • Experience being part of an on-call rotation and responding to production incidents
  • Some familiarity with Python and its ecosystem

Perks & Benefits:

  • Competitive salary and stock options (RSUs)
  • International team of world-class engineers
  • Cutting-edge tech stack
  • Premium hardware of your choice: Apple or Lenovo
  • Comprehensive medical insurance, including medicines and dental care
  • Fitbit employee discount on devices and accessories
  • Reimbursement of sports expenses
  • Flexible work hours
  • 26 days of vacation, paid sick leave, extra days off
  • Free English classes with native speakers
  • Business trips to US & Europe
  • Fully-stocked kitchen
  • Office located in the city center
  • Work on life-changing products