Site Reliability Engineer
As a Site Reliability Engineer, your responsibilities will include, but are not limited to:
Ensures the efficient planning, provisioning, installation/configuration, maintenance, and/or operations of the hardware and software infrastructure required to build, validate, and release a wide variety of hardware and software products and projects.
Works closely with development and quality teams to derive infrastructure design requirements; builds, tests, and automates tools appropriate to the project; and/or implements and maintains those systems within the constraints imposed by Intel enterprise infrastructure (IT) and other governing bodies.
Owns the end-to-end delivery pipeline, including source code management, versioning/tagging strategy, component build and packaging, test automation tooling, release staging, acceptance and/or indicators, required security and IP scans, any third-party conformance tools, artifact storage and distribution, and disaster recovery planning.
Identifies opportunities and implements solutions for increased automation, reliability, and/or velocity within the pipeline through implementation of robust infrastructure telemetry, KPIs, and indicators, and by monitoring and applying industry best practices.
In addition, the ideal candidate should exhibit the following skills:
Excellent problem-solving skills to troubleshoot complex issues.
Passion for delivering a high-quality end-user experience.
Minimum qualifications are required to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates.
Bachelor's degree or Master's degree in Computer Science or any other related field.
3+ years of experience working with Linux.
4+ years of Linux system administration experience (CentOS/SUSE preferred).
Automation using Ansible, Puppet, Chef, etc.
Monitoring and visualization using Prometheus, ELK stack, Grafana, New Relic, etc.
Knowledge of advanced Linux concepts (such as namespaces, cgroups, systemd, journald, and firewalld).
Solid knowledge of bare-metal clustering and deployment strategies.
Solid knowledge of storage (NAS, Object) and networking.
Hands-on experience with Shell, Python, Go, Ruby, etc.
Experience in designing and operating highly scalable, highly available solutions.
Experience with containers and orchestration systems (e.g., Docker and Kubernetes) is a plus.