Site reliability engineer

Job description

A few words about us
Peltarion provides an operational AI platform for producing real-world AI applications at scale and at speed. Our goal is to make machine learning widely available: be it rapid prototyping for data scientists, using our wizard to build an image classifier for a hobby app project, or a sentiment analysis system for a business case. It is the first platform to provide fast, efficient and scalable production of commercially viable AI applications without extensive prior knowledge about machine learning.
We have barely scratched the surface of what is possible, and AI will change the world fundamentally. At Peltarion we have been helping doctors fight cancer, carmakers optimize battery power, curators identify moods in music, farmers keep their crops secure, and more. The platform plays a key role in this, and the opportunities to do more expand every day. Today, AI and deep learning are certainly not for everyone. We want to change that by enabling a wide audience, even without prior AI experience, to solve new classes of problems that were previously hard or impossible to solve.


About the role
As a Site Reliability Engineer in our Infrastructure team, you'll provide support on application design and development. You'll help us manage and maintain all of Peltarion's infrastructure and make sure that we have all the tools, automation and monitoring in place to make our platform reliable and scalable. We work closely with developers and architects, and in the last few years we have moved from running our services natively on VMs to a hybrid container/Kubernetes setup, with the goal of moving as many services as possible to cloud Kubernetes solutions.
To succeed, you will need to be an experienced problem solver with a generalist mindset and a genuine interest in cloud environments, so that you can design and build stellar infrastructure spanning multiple clouds. You enjoy debugging complex problems and finding solutions to them. You will work in a team of like-minded people where innovation is at the core. You will also share on-call responsibility with the team.
Our ever-evolving tech stack currently consists of (to mention a few) Ubuntu, Terraform, SaltStack, GCP, GKE, Cloud Build, Azure, AKS, Docker, Prometheus, Grafana, Graylog, PostgreSQL, Nginx, HAProxy, Python and Java.
We need you to have/be:
A good understanding of Linux
Programming skills in Bash, Python and/or Go
Experience with configuration management and CI/CD
Knowledge of logging, monitoring & alerting
Experience with Kubernetes and container orchestration
An understanding of cloud and traditional networking
Experience administering cloud environments such as GCP and Azure
A desire to automate manual and repetitive tasks
Able to work both independently and as part of a team
Fluent in English

And if you also have any of the following, even better!
Experience working in a remote team
Experience maintaining machine learning and/or GPU systems
Experience building tools and services
Experience with security principles
Skills in Java

If you have a peculiar hobby, a kind heart, and a dedicated mind, and are looking for a new challenge, there's a good chance we'll be a great match. Please reach out if this piques your interest.

Summary

  • Workplace: Peltarion AB, Stockholm
  • 1 position
  • Permanent
  • Full-time
  • Fixed monthly, weekly or hourly salary
  • Published: 29 June 2021
  • Apply by: 11 July 2021

Postal address

Holländargatan 17 B
Stockholm, 11160
