Site Reliability Engineer UK

Open Position

Senior Site Reliability Engineer

Over the last decade, Real-Time Bidding (RTB) has created massive efficiency for the digital advertising marketplace by opening up large volumes of inventory, driving down prices and creating the opportunity for smaller sites and apps to thrive. Today it has become a $300B market that continues to grow at an accelerated pace, with a supply chain that remains complex, fragmented and partly opaque, and that could be even more efficient. In a context where decisions are increasingly made by technology, traditional processes do not apply, and there is a need to bring a technological solution to what is essentially a technological problem.

Fiducia is a UK company headquartered in London, with a US subsidiary, that has been developing the platform of TAG TrustNet, a global cross-industry initiative involving major trade associations in the US and the UK and taking the industry to the next level. As an always-on "industry transparency utility", TAG TrustNet provides the tools to certify the supply chain and allow everyone to be accountable, make responsible decisions and improve efficiency.

TAG TrustNet enables this by automating the reconciliation of data across the supply chain and recording it in an immutable ledger as the Shared Truth: a unified record for every single ad running across the supply chain, made available in near real-time. The recorded data can then be visualised in the platform's Supply Chain Monitor or exported over an open API.

Why work for Fiducia?

Competitive salary and stock options
Experienced and supportive team members
Fast-track career development with a forward-thinking company
Development of advanced high-impact technology

Role Overview

We are looking for an experienced and talented individual, reporting to the CTO, to join our technology team as Senior Site Reliability Engineer. Your responsibility is to ensure the reliability, scalability and security of our cloud environment and platform software components. Our platform is used across the digital advertising supply chain to harmonise, match and record billions of ad impression data points across multiple data feeds. We use AWS cloud services, with Java as our primary programming language.

The Senior Site Reliability Engineer needs to combine technical leadership with hands-on operational expertise in managing large-scale distributed systems in compliance with data security and service availability requirements. To qualify for the role, you need to be a team player with a solid background in the fundamentals of computer science, distributed computing, security and high-availability cloud systems.

Your proven ability to define and implement technical concepts effectively and to solve complex problems as a team contributor will be a critical part of the consideration process.

Responsibilities

Ensure compliance with availability, security and performance requirements of our AWS infrastructure and Linux environments in line with company goals.
Configure and manage Fiducia’s AWS infrastructure to ensure uptime and security, while controlling costs.
Identify platform weaknesses and anomalies, review configurations, software and hardware choices, architecture trade-offs, and come up with recommendations and action plans.
Perform capacity planning for Fiducia infrastructure and application components.
Organise the incident management plan for the production environment and provide operational support. Build tooling for timely identification and escalation of incidents. Build automation to remediate service failures in short timeframes.
Create strategies for permanent fixes to production incidents.
Maintain high standards of quality and performance, including mentorship, documentation, performance and reliability testing, fault-tolerance standards, security and stress-testing.
Advance our technology stack with innovative ideas and creative solutions.
Draft extensive platform guides for operations and project teams to streamline operational processes and ensure performance and business continuity.
Timely problem solving.

Qualifications

Minimum 3 years of hands-on experience in managing large-scale distributed system architectures and cloud environments.
Thorough understanding of the AWS stack (ECS, Config, Security Hub, CloudTrail, CloudFormation, S3), Linux, networking (TCP/IP stack, load balancing), SQL databases (Amazon Aurora), containerisation (Docker) and scripting (Shell).
Understanding and working knowledge of Unix operating systems, networking, reliability and scaling techniques.
Proven experience in measuring, monitoring and fine-tuning performance in cloud environments.
Technical leadership experience in defining goals, visions, solutions and action plans, and in managing their implementation within the defined timeframe.
Strong analytical, problem-solving and decision-making skills.
A degree in Computer Science (preferred) or a related engineering field. MS/PhD is preferred.
Must be hard-working, team-oriented, creative, friendly and cooperative, and an extraordinary problem solver.
Great written and verbal communication skills. Ability to create easy-to-understand, high-quality project documentation.