Principal Site Reliability Engineer - Architecture Job at Zeektek, Portland, OR

  • Zeektek
  • Portland, OR

Job Description

We have a Sr. Lead SRE Architect role for a candidate who is not only hands-on but will also be involved in strategy, architecture, innovation, and leadership. This person will heavily influence technical direction and should be willing to challenge, see the blind spots, and bring in new ideas.

The initial projects, over the next year, will be to provide platform stability and increase performance for the Java codebase, MySQL, and APIs.

The SRE needs to have a solid background as an SRE in Java environments, APIs, and MySQL.

Qualifications:

  • Bachelor’s degree in Computer Science or equivalent years of experience
  • Datadog, Java, Python, AWS, EC2, CloudFormation, RDS, VPC, Lambda, S3, ECS, Docker, IAM, MySQL, Neo4j, REST API
  • Minimum 5 years' experience working with Java
  • Expert-level proficiency with 5+ years of experience in AWS components like EC2, CloudFormation, RDS/Aurora, IAM Roles, etc.
  • Expert-level proficiency with 5+ years of experience in operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
  • 4+ years of systems engineering/administration and/or support experience with web applications, especially J2EE technologies and technologies like Docker, Tomcat, Nginx
  • Understanding of network fundamentals (TCP/IP, VPN, DNS, SMTP, etc.)
  • Experience with programmatic manipulation of cloud infrastructure such as AWS
  • Scripting and automation skills using common scripting languages like Python and Bash
  • Experience with network and web application monitoring tools; Datadog is preferred
  • Experience with DBMS (e.g. MySQL, MS SQL, Postgres, RDS), as well as graph databases (Neo4j, ArangoDB)
  • Experience with REST

The company supports innovation success in multidisciplinary engineering organizations. Numerous firsts for humanity in fields such as fuel cells, electrification, space, software-defined vehicles, surgical robotics, and more all rely on requirements management software to minimize the risk of defects, rework, cost overruns, and recalls. This allows engineering organizations to intelligently manage the development process by leveraging their tools to measurably improve outcomes.

We are looking for an inventive Principal Site Reliability Engineer. The Site Reliability Engineering team is a highly skilled group of engineers who manage and maintain the software production cloud environment, ensuring that customers have a fantastic SaaS experience.

We need this candidate to bring a deep understanding of modern cloud infrastructure, programming expertise, operational experience, and a desire to change the status quo. We're looking for an engineer who can analyze and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency. Success will come from crossing team and functional boundaries to advocate for reliability methodologies. You will work with a variety of platform and product teams to both build reliability into our platform and drive adoption of those practices in our products.

Responsibilities:

  • Architect, build, and maintain highly available, fault-tolerant systems using AWS/other services
  • Use Terraform to define infrastructure as code, enabling scalable, repeatable, and secure deployments
  • Continuously review and recommend the design, maintenance, development and implementation, including deployment and support, of our SaaS production platform solution using Docker and other modern web technologies
  • Set up and enforce guardrails for databases, infrastructure, and applications, ensuring consistency and adherence to best practices
  • Support operationally critical environments using monitoring tools, scripts, and logging
  • Document designs and implementations
  • Design and manage secure networking solutions, including AWS VPCs and firewalls
  • Partner with SRE and Engineering teams to embed reliability and security best practices into the application life-cycle
  • Collaborate with fellow Engineers, Product Managers, and Quality Assurance Engineers to develop and deliver services that meet or exceed enterprise customer reliability and quality expectations
  • Participate effectively in pair/mob programming and code reviews, both giving and receiving feedback
