In the quest to preserve our living environment, there is a need to turn troves of data into information and insights. The Environmental Policy Innovation Center (EPIC) recognizes this connection and orients its mission-based work toward data-informed strategies and policies, promoting modern software development approaches in support of the environment. For this non-profit, data is an asset that is best utilized when it is placed within the proper infrastructure to be managed and maintained. Therefore, EPIC has embarked on an ambitious project to modernize how its data processes and software deployments are managed, building proficiency and speed in its internal operations. Working with EPIC’s development team, The Commons built a modern deployment pipeline through which EPIC can easily grow its software development portfolio. In this article, we discuss the need for a modernized Development Operations (DevOps) infrastructure and the strategy that our team used to support EPIC’s goals.
Improving data management and software development strategies
Data provides the foundation through which we can understand our natural world, make informed strategic decisions on how to protect and improve it, and track progress on our suite of implementation strategies. Getting from the broadly scoped philosophy of “we need data!” to the progressive “what does the data tell us?” has been an evolution benefiting from advancements in the software technology industry, specifically data infrastructure resources. The role of advanced data infrastructure cannot be overstated. When done well, data infrastructure provides a centralized and secure repository for storing and managing data within an organization, which in turn gives the organization pathways to turn data into priority opportunities.
EPIC's mission is to build policies that deliver spectacular improvement in the speed of environmental progress. Data fuels their strategies to improve and transform policies and engage public and private interests in accelerated, innovative conservation. EPIC’s portfolio of software projects leans on R-Shiny applications that are used in policy analysis and data dissemination to target audiences.
Need for a robust data platform
If environmental data management made it into the news cycle, the headline would read “Unsung Hero of Environmental Preservation”, because at the heart of addressing all environmental issues lies the critical need for reliable, scalable, and efficient handling of code and data. Modernizing software deployment and data handling allows for rapid innovation to build better solutions to solve our nation’s biggest environmental problems.
“At The Commons, we are always looking for projects that allow us to help build the digital infrastructure at the heart of the environmental movement, so we were excited to work on this data infrastructure project with the EPIC development team,” explained John Dawes, The Commons’ Executive Director. “EPIC sees the value in building a scalable and repeatable digital infrastructure and we are eager to support the application library that they continue to build based on the process we deployed for them.”
The EPIC DATA_INFRA project created an Amazon Web Services-based application to package existing and new EPIC-hosted applications and then integrated it with GitHub. The overhaul of the infrastructure approach provided the EPIC team and their collaborators with a state-of-the-art, greatly improved data infrastructure and management process. The resulting system reduces the burden on software developers to manage their code and opens up more opportunities to focus on feature development and deployment. Furthermore, now that the infrastructure has been designed and deployed, software developers and data scientists can re-use the system autonomously, without requiring continuous intervention by DevOps experts.
The digital infrastructure system that we built for EPIC has two key components: Amazon Web Services (AWS) and GitHub. AWS offers a suite of services that support digital infrastructure efforts. We chose AWS for this project because it organizes all of the complex components within one system, which in turn supports the high availability and scalability of the system that we delivered for EPIC. We also integrated the AWS system with GitHub, which facilitates the automated building of future application services. The infrastructure supports the entire lifecycle of container building and deployment, not only for EPIC’s current portfolio of software development projects but also for future applications built by the EPIC team.
Purpose and Impact of Containerization
Before this project, EPIC managed its data applications without a standardized process or system. By migrating existing EPIC data applications to a container-based paradigm with build and deployment automation, they have been able to improve developer productivity. The development, focused on AWS and GitHub, automates operations integral to EPIC's development efforts. Infrastructure building and instantiation have also been updated, aiming to rid developers and data scientists of concerns and complexities of cloud infrastructure. The benefits of this process are multifold.
Container revolution and introduction of Docker. Docker and containerized deployments have greatly changed and improved the way we develop, deploy, and manage applications. These technologies have become essential tools for modern software engineers, offering several key benefits that have transformed the software development landscape.
Simplified Infrastructure. One of the primary advantages of Docker and containerization is the simplification of infrastructure management. In traditional development environments, setting up the necessary dependencies and configuring the server for an application can be a time-consuming and error-prone process. Docker eliminates many of these challenges by encapsulating an application and its dependencies within a container. This containerization approach ensures that an application runs consistently across different environments, from a developer's workstation to the development and production servers. Developers no longer need to worry about complex server configurations, making it easier to maintain and replicate application environments. This also improves project transferability. If a data scientist or application developer leaves, new employees taking the reins can get up and running with fewer dependency headaches.
Iteration Speed. Containerization also significantly improves iteration speed. With Docker, developers can quickly spin up containers with their application and test changes in a controlled environment either on their local machines or in dedicated remote development environments. This rapid feedback loop allows for faster development cycles, enabling developers to identify and fix issues more efficiently. This speed is crucial for agile development processes, where frequent updates and feature enhancements are the goal.
Streamlined Deployment. Docker simplifies the deployment process, making it more efficient and reliable. Containerized applications can be easily packaged with all their dependencies and configurations, ensuring that what worked in development will work in production. This consistency minimizes the "it works on my machine" problem, which often plagues traditional deployment methods.
Empowering Developers. One of the most significant shifts brought about by Docker and containerization is the empowerment of developers. In the past, developers often had to rely on system administrators and operations teams to set up and manage the infrastructure. With containers, developers can define their application's environment as code, specifying dependencies and configurations in a Dockerfile. This shift towards "Infrastructure as Code" enables developers to be more self-sufficient, reducing their dependency on ops and system administrators.
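To make "environment as code" concrete, here is a minimal Python sketch that renders a Dockerfile for an R-Shiny application from a list of package names. It is an illustration only, not EPIC's actual configuration: the rocker/shiny base image, the app directory, the port, and the package names are all assumptions chosen for the example.

```python
def render_dockerfile(r_packages, app_dir="app", port=3838):
    """Return Dockerfile text for a hypothetical R-Shiny application.

    Assumptions (not taken from the article): the rocker/shiny base
    image, an application living in `app_dir`, and Shiny on `port`.
    """
    # Quote each package name for the R install.packages() call.
    pkgs = ", ".join(f"'{name}'" for name in r_packages)
    lines = [
        "FROM rocker/shiny:latest",
        f'RUN R -e "install.packages(c({pkgs}))"',
        f"COPY {app_dir}/ /srv/shiny-server/{app_dir}/",
        f"EXPOSE {port}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical package list for illustration.
dockerfile = render_dockerfile(["dplyr", "leaflet"])
print(dockerfile)
```

Because the dependencies are listed in one version-controlled file, a new team member can rebuild the same environment instead of reconstructing it by hand.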
In conclusion, introducing containerization and Docker to the EPIC organization will kickstart a new age of creativity and developer efficiency that will result in more rapid innovations and the implementation of new ideas and application feature sets.
We used three software systems to build EPIC’s digital infrastructure. Each was chosen for its individual and collective strengths, as explained below.
GitHub Repositories - The Core of Collaboration
Central to this project are the GitHub repositories connected to the AWS-hosted containers. We established a mapping between Git repositories, the core workspaces in which developers build and test their code and experiments, and the cloud infrastructure used for service deployments. Developers work within repositories and push to them, while the system we built manages and automates all the other operations needed to build deployment artifacts out of the repository code. Those artifacts are what is ultimately deployed in a publicly accessible form. Establishing this relationship between repository and infrastructure is the main innovation of this project. While the initial configuration may deter some existing software development projects in the environmental space from making the transition, the benefits of adopting GitOps justify the investment.
GitOps - Streamlining Development and Deployment
GitOps, a paradigm that leverages Git as the single source of truth for declarative infrastructure and applications, changes the way software is built, deployed, and maintained. By utilizing GitHub Actions and their event triggers, GitOps brings efficiency and precision to the development pipeline. These automated triggers ensure that any change in the repository directly initiates the corresponding build and deployment processes, thereby reducing manual intervention and potential human errors. This automation not only streamlines the workflow but also accelerates the development cycle, allowing teams to focus more on innovation rather than the intricacies of deployment processes. It aligns well with agile practices, supporting continuous integration and continuous delivery (CI/CD) approaches, which are crucial for fast-paced and adaptive software development.
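The trigger idea can be sketched in a few lines of Python. This is a conceptual model, not the workflow configuration used in the project: the `PushEvent` type, the step names, and the rule that only the `main` branch deploys are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PushEvent:
    """A hypothetical, simplified view of a repository push."""
    repo: str
    branch: str
    commit: str


def pipeline_steps(event, deploy_branch="main"):
    """Map a push to the automated steps it would trigger."""
    # Tag the image with the short commit hash so each build is traceable.
    image = f"{event.repo}:{event.commit[:7]}"
    steps = [f"build {image}", f"test {image}"]
    # Assumption: only pushes to the deployment branch are released.
    if event.branch == deploy_branch:
        steps += [f"push {image}", f"deploy {image}"]
    return steps


feature = pipeline_steps(PushEvent("policy-app", "feature/map", "a1b2c3d4e5"))
release = pipeline_steps(PushEvent("policy-app", "main", "a1b2c3d4e5"))
print(feature)
print(release)
```

The point of the sketch is that the developer's only action is the push; everything after it is derived from the event.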
Enhancing Reliability and Compliance
GitOps offers significant benefits in terms of reliability and compliance. With every change being tracked and version-controlled in Git, it provides a comprehensive audit trail of what was changed, when, and by whom. This level of traceability is invaluable, especially in environments where compliance is crucial.
GitOps facilitates easier rollbacks to previous states, enhancing system stability and reliability. In the event of a failure or an issue, reverting to a stable state is as simple as reverting a Git commit, drastically reducing downtime and the impact of errors. This inherent reliability makes GitOps an ideal choice for managing complex deployments and infrastructure.
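A toy Python model shows why a rollback is cheap when the desired state lives in version control. The `DesiredStateRepo` class and the image tags are hypothetical and stand in for a real Git history; the sketch only mirrors the behavior of reverting a commit.

```python
class DesiredStateRepo:
    """Toy model of a Git history holding the desired deployment state."""

    def __init__(self, initial):
        self.history = [dict(initial)]

    @property
    def head(self):
        return self.history[-1]

    def commit(self, **changes):
        # Each commit records the full desired state after the change.
        self.history.append({**self.head, **changes})

    def revert_last(self):
        # Like `git revert`, add a new commit that restores the prior
        # state instead of erasing history, so the audit trail survives.
        if len(self.history) < 2:
            raise ValueError("nothing to revert")
        self.history.append(dict(self.history[-2]))


repo = DesiredStateRepo({"image": "policy-app:1.0", "tasks": 2})
repo.commit(image="policy-app:1.1")  # a release that turns out to be faulty
repo.revert_last()                   # roll back by reverting the commit
print(repo.head)
```

After the revert, the deployed state matches the last known good version, and the history still shows both the faulty release and its reversal.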
Developer Experience: Empowering the Change-Makers
Thanks to GitOps, developers can focus on their core software and functionality development activities, with the platform handling the complexities of infrastructure and deployment. The process is streamlined from service creation to deployment, allowing for quick adaptation to environmental data challenges.
Terraform: Infrastructure management through Infrastructure as Code
At the foundation of EPIC DATA_INFRA is Terraform, an infrastructure-as-code tool and definition language. In the journey towards a highly efficient development cycle, the evolution of infrastructure and system development plays a key role. A cornerstone of this evolution is the adoption of Infrastructure as Code (IaC), a modern approach that revolutionizes how we manage and provision IT resources. By treating infrastructure as software code, IaC allows for more dynamic, flexible, and efficient management of resources.
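The core idea behind declarative infrastructure can be sketched in Python: describe what should exist, compare it with what does exist, and derive the changes. This is a conceptual illustration and not Terraform's configuration language or its actual planning algorithm; the resource names and attributes are invented for the example.

```python
def plan(desired, actual):
    """List the changes needed to make `actual` match `desired`."""
    changes = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(("create", name))
        elif name not in desired:
            changes.append(("delete", name))
        elif desired[name] != actual[name]:
            changes.append(("update", name))
    return changes


# Hypothetical resources for illustration.
desired = {"ecs_service": {"tasks": 2}, "log_group": {"retention_days": 30}}
actual = {"ecs_service": {"tasks": 1}, "old_bucket": {}}
changes = plan(desired, actual)
print(changes)
```

Because the desired state is plain data in a repository, the proposed changes can be reviewed before anything is applied.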
Versioning: The core of reliable infrastructure
A significant aspect of IaC is versioning, primarily facilitated through tools like Git. In traditional infrastructure management, changes often lack transparency and traceability. However, with IaC, every modification is versioned, similar to how software code is managed. This versioning capability, underpinned by Git and the principles of GitOps, ensures that every change in the infrastructure is tracked, reviewed, and reversible.
Accelerating Infrastructure Development and Iteration
IaC significantly speeds up the iteration cycle of infrastructure development. In a field where time is of the essence, the ability to rapidly deploy and modify infrastructure is invaluable. Developers and data scientists can quickly test new ideas, deploy updates, and scale systems without the traditional delays of manual infrastructure management.
IaC greatly aids in the auditing of infrastructure changes. With every change documented and versioned, it becomes easier to audit and ensure compliance with environmental data handling standards and regulations. This not only streamlines the review process but also instills confidence in the developers and operators that the infrastructure managing sensitive data adheres to the highest standards of accountability and transparency.
Amazon Web Services (AWS): Cloud services grounded in a solid infrastructure offering
The EPIC DATA_INFRA project is built on AWS cloud services and infrastructure components. By building on AWS, we take advantage of advanced cloud resources and capabilities that help future-proof EPIC’s data and software development efforts and allow for scaling that is both granular and efficient.
AWS is renowned for its scalability, and this attribute is especially beneficial for software services. By tightly integrating with Amazon EC2 (Elastic Compute Cloud) and Amazon ECS (Elastic Container Service) the system can easily scale EPIC’s infrastructure as needed. With EC2, developers can quickly spin up additional virtual machines to handle increased workloads, while ECS allows them to efficiently orchestrate and manage containers for their applications. This scalability ensures that their software services can handle varying levels of traffic and demand, providing a seamless experience for users without the need for extensive manual intervention.
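As a rough illustration of how container scaling decisions can be made, the Python sketch below computes a task count that moves average CPU load toward a target. It is a simplified proportional rule under stated assumptions, not the policy configured for EPIC or the exact behavior of any AWS scaling feature; the target, limits, and function name are hypothetical.

```python
import math


def desired_task_count(current_tasks, cpu_percent, target_percent=60.0,
                       min_tasks=1, max_tasks=10):
    """Return how many container tasks to run for the observed load."""
    # Scale proportionally so that average CPU moves toward the target.
    raw = math.ceil(current_tasks * cpu_percent / target_percent)
    # Clamp to the configured limits to bound cost and keep a floor.
    return max(min_tasks, min(max_tasks, raw))


print(desired_task_count(2, 90.0))   # load above target: scale out
print(desired_task_count(4, 15.0))   # load below target: scale in
print(desired_task_count(8, 120.0))  # capped at max_tasks
```

The limits matter as much as the formula: the upper bound caps spending, and the lower bound keeps the service available during quiet periods.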
Extensive AWS APIs for Monitoring
AWS offers a comprehensive set of APIs for monitoring and managing deployed services. Services like Amazon CloudWatch provide real-time monitoring of resources, application performance, and operational health. By integrating with AWS, developers can use these APIs to gain insight into the performance of their software services, set up automated alarms and triggers for proactive issue resolution, and make data-driven decisions to optimize applications further.
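For a sense of what an automated alarm looks like, the Python sketch below assembles the parameters for a CloudWatch alarm on the CPU utilization of an ECS service. The cluster name, service name, and threshold are hypothetical, and the sketch only builds the request; sending it assumes the boto3 library and AWS credentials, which are not part of this example.

```python
def cpu_alarm_params(cluster, service, threshold=80.0):
    """Build keyword arguments for a CloudWatch alarm on ECS CPU usage."""
    return {
        "AlarmName": f"{service}-high-cpu",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,               # seconds per datapoint
        "EvaluationPeriods": 3,     # consecutive datapoints that must breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical cluster and service names for illustration.
params = cpu_alarm_params("epic-cluster", "policy-app")
print(params["AlarmName"])

# With boto3 installed and credentials configured, the alarm could be
# created with (assumption, not executed here):
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the alarm definition in code means it can be versioned and reviewed alongside the rest of the infrastructure.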
Building on the solid base of AWS EC2 and ECS platforms opens up a world of possibilities for future improvements. AWS is continually innovating and introducing new services and features to enhance its cloud platform. By integrating EPIC's software services deeply with AWS, the organization positions its applications to take advantage of these advancements seamlessly.
Conclusion: Embracing the Future of Infrastructure Management
This ambitious project builds a solid base for all future software and technology efforts within the EPIC organization. By introducing these new paradigms for managing infrastructure and development pipelines (containerization, infrastructure as code, GitOps, and deep AWS integration), we aim to kickstart a new age in the development efforts of EPIC's software development and operations teams.