SunTrust Veterans Jobs


Delta Air Lines, Inc. Senior Site Reliability Engineer in Atlanta, Georgia

Senior Site Reliability Engineer

United States, Georgia, Atlanta

Information Technology


Ref #: 9347

How you'll help us Keep Climbing (overview & key responsibilities)

The SRE works with developers to improve the reliability and resiliency of Delta IT applications so that they meet business requirements, by implementing SRE tools, processes, and best practices. SRE is what happens when you ask a software engineer to design an operations function. The SRE helps to design, develop, test, debug, and automate tasks for applications. They troubleshoot incidents to address failure patterns, automate remediation through runbooks, and document application optimization.

  • Be willing to learn whatever technologies, tools, or patterns are necessary to solve a problem. Be inquisitive and ask plenty of questions to learn from everyone.

  • Work collaboratively with business stakeholders, developers, managers, and leaders to create solutions for real-world problems. Build domain knowledge and understand the user and business problems you’re solving.

  • Use various tools and techniques to ensure the resiliency, availability, reliability, and performance of applications.

  • Conduct blameless postmortems and train other teams on SRE best practices and benefits.

  • Gather and maintain SOPs, SATs, one-pagers, SLOs, SLIs, SLAs, how-tos, and other useful SRE documentation on the SRE wiki page.

  • Actively implement new features in, and maintain, the existing web applications created by the SRE team.

  • Gather metrics from various sources and display them in meaningful and insightful ways on single-page applications.

  • Foster an environment of self-service by collaborating with members of other teams during the build phase of the SRE apps they will consume.

  • Adapt quickly to changing business needs and the tools ecosystem at Delta.

  • Create systems that are maintainable, scalable, extensible, and well architected.

  • Take ownership of problems that come to your attention. Communicate your work, decisions, and ideas effectively, and have good conversations with colleagues.

  • Look for ways to make the work environment better for everyone. Share knowledge generously and mentor new members. Innovate where solutions don’t exist.

What you need to succeed (minimum qualifications)

  • Requires a Bachelor's degree in Computer Science, Engineering, or Information Systems or any equivalent combination of experience, education, and/or training in the computer systems engineering field.

  • 7 or more years of experience as an application developer or Site Reliability Engineer.

  • 2 or more years of team lead experience.

  • Site Reliability Engineering: Knowledge of the theories and methodologies of reliability engineering; ability to design, develop, and support various tools, services, and applications to maintain a reliable site environment.

  • Performance Measurement and Tuning: Knowledge of system performance, testing, and programming; ability to monitor, measure, and optimize system performance and network communication, including right-sizing of application pods and probes.

  • Experience in Runbook Automation to automate manual tasks and improve efficiencies in processes

  • Experience in building single-page applications and dashboards for monitoring and reporting

  • Expertise with monitoring solutions: ELK, Sumo Logic, Prometheus, Dynatrace, Grafana

  • Knowledge of continuous integration/delivery ecosystem: GitLab, Maven/Gradle, Jenkins, Docker, Nexus, Selenium

  • Unix skills are required. Should have experience with Unix shell scripting.

  • Experience in cyber security with knowledge of DevSecOps pipeline tools for SAST/DAST.

  • Experience with modern container orchestration systems: Kubernetes, Swarm

  • Expertise with infrastructure configuration and automation processes and tools: Ansible

  • Experience with PagerDuty, ServiceNow

  • Experience leading support bridge calls to resolve production system issues.

  • Experience in supporting application teams and troubleshooting in all environments, from DEV to Production

  • Experience with REST APIs including design, development and build tools supporting APIs

  • Embraces diverse people, thinking and styles.

  • Consistently makes safety and security, of self and others, the priority.

  • Where permitted by applicable law, must have received or be willing to receive the COVID-19 vaccine by date of hire to be considered for U.S.-based job, if not currently employed by Delta Air Lines, Inc.

What will give you a competitive edge (preferred qualifications)

  • Knowledge or related experience in the Travel, Tour, or Hospitality industries preferred

  • Experience working in an Agile environment is a plus

Delta Air Lines, Inc. is an Equal Employment Opportunity / Affirmative Action employer and provides reasonable accommodation in its application process for qualified individuals with disabilities and disabled veterans. If you are a qualified individual, you may request a reasonable accommodation if you are unable or limited in your ability to access job openings through this site, apply for jobs through Delta’s online system, or at any point in the selection process.