Core Technology Solutions is currently searching for an experienced Site Reliability Engineer.
This highly strategic role works collaboratively with software engineers to deploy and manage systems in the Amazon Web Services (AWS) Cloud. Responsibilities include:
- Automating and streamlining operations and processes.
- Building, setting up, and maintaining tools for deployment, monitoring, and infrastructure provisioning on the AWS Cloud.
- Building the whole stack, from load balancers to databases, and moving and launching sites with every application release.
What we are seeking in YOU:
Are you an amazing developer, an exceptional Linux/Unix systems administrator, and obsessed with automation? Have your automated processes increased the efficiency and reliability of large-scale systems with numerous moving parts? Do you remain calm under pressure and truly enjoy solving problems that have never been solved before?
If you can answer yes to all three of these questions, then we want to talk to you!
What You’ll Do:
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Administer all systems related to R&D projects, including user creation, system provisioning, troubleshooting, and monitoring.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Automate routine manual tasks.
- Own release management, applying a solid understanding of Continuous Integration and Continuous Delivery (CI/CD).
- Troubleshoot site-down issues, respond to emergency outages, and coordinate responses with engineering teams across multiple locations.
- Practice sustainable incident response and blameless postmortems.
- Develop automated tools to deploy code.
- Document system design and procedures.
- Participate in on-call rotation as needed.
- Mentor and coach less experienced SREs.
What we’re looking for:
- Bachelor’s degree in Computer Science or a related field, with 5-8 years of experience in cloud infrastructure (AWS preferred)
- Extensive administration knowledge of Linux, Unix, SSH, cron, and access control
- Experience with managed hosting and colocation
- Experience supporting high traffic, high volume web applications and websites
- Willingness to work flexible/odd hours based on needs, including on-call rotation
- Ability to use a wide variety of open source technologies and cloud services (AWS)
- Expert-level experience with AWS API integration
- Experience with distributed tracing (OpenTracing, OpenCensus, Zipkin, etc.)
- Experience with Kubernetes and related technologies (etcd, Istio, Envoy)
- Familiarity with notification platforms like PagerDuty or OpsGenie
- Strong analytical thinking and troubleshooting skills to resolve infrastructure and/or application issues
- Solid scripting ability in Bash and PowerShell
- Strong working knowledge and experience with Go
- Deep understanding of TCP/IP, DNS, TLS, firewalls and networking concepts
- Experience with infrastructure automation and configuration management (Terraform and/or CloudFormation)
- Knowledge of best practices and IT operations in an always-up, always-available service environment
- Understanding of monitoring tools and statistics (New Relic, Sumo Logic, Datadog, Stackdriver, or CloudWatch)
- Solid understanding of Docker and containers
Core Technology Solutions is an Equal Opportunity Employer and offers a variety of employment opportunities and benefits. Please check out our website for additional opportunities.