Introduction
At the end of 2023, GDI began a partnership with Azure Alliance, with Infoxchange acting as facilitator for the project. It turned out to be a project that was technically and personally very rewarding for the team. In the blog post below, the GDI team that worked on the project delves into some of the work they did and the impact it has had for Azure Alliance.
Who is Azure Alliance?
Founded in 2019, Azure Alliance is dedicated to promoting the environmental sustainability of our oceans and waterways through innovative technology. Their flagship initiative is the Azure Fighter, a fully electric marine debris cleaning boat equipped with both remote-control and autonomous driving modes. The vessel is used to clean up plastic-contaminated water; it not only operates with low carbon emissions but is also adept at navigating and cleaning confined spaces like fishing ports.
Azure Alliance collaborates with governments, industries, and non-governmental organisations on their marine debris removal programs. Their efforts have garnered significant recognition, including accolades at events like the APEC Workshop on Regional Marine Debris Management and the Smart Ocean hackathon.
Project Overview
Azure Alliance has been highly effective in manually collecting data from their cleaning expeditions and downloading weather condition data from openly available government sources. However, as a non-profit organisation, they have faced challenges in finding the necessary resources — time, expertise, and funds — to fully leverage this data. Recognising the potential wealth of knowledge hidden within the data they have collected over the years, they sought our expertise to streamline their data analytics process. Our solution aimed to automate the ingestion of open data and integrate it with the performance data they collect during their ocean clean-ups, providing an intuitive dashboard populated with real-time data, meaningful plots, and key metrics.
Technical Breakdown
In the sections below we outline the technical solution we built for Azure Alliance, including some of the decisions we had to make along the way.
The Requirements
The technical requirements outlined below are a mix of the initial specifications and evolving needs shaped through fortnightly meetings between GDI, Azure Alliance, and Infoxchange.
It should be possible to correlate data from Azure Alliance's cleaning sessions with environmental factors such as wind and tide information, and these correlations should be clearly visualised.
The visualisations built off the database should offer flexibility for analysts to investigate different correlations.
The platform should persist the cleaning and environmental data to enable current and future analytics use cases.
The platform should have low cost (development cost + maintenance cost + hosting cost).
Azure Alliance staff should be able to securely upload proprietary data (i.e. cleaning sessions) as CSV files.
Azure Alliance staff should be able to download database tables as CSV files via a UI.
Openly available data sources such as tide and wind data should be ingested into the platform automatically, without any manual intervention from Azure Alliance.
Platform Overview
The final deliverable was a Git repository containing a suite of Docker images, along with comprehensive documentation, providing a ready-to-deploy analytics solution for Azure Alliance.
Timeliness, feasibility, and extensibility were the three characteristics identified as most important for the solution. It uses a service-based architecture composed of three user-facing services, a reverse proxy and authentication gateway, a shared database, and two scheduled API ingest services.
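As a rough illustration of that composition, the services could be wired together with a docker-compose file along these lines. The service names, images, and build paths here are hypothetical and not the actual repository contents:

```yaml
# Hypothetical sketch of the service layout (names are illustrative only)
services:
  nginx:            # reverse proxy + authentication gateway
    image: nginx:stable
    ports: ["443:443"]
    depends_on: [upload, download, dashboard]
  upload:           # Flask app for CSV uploads
    build: ./upload
  download:         # Flask app for CSV downloads
    build: ./download
  dashboard:        # Plotly Dash visualisations
    build: ./dashboard
  db:               # shared PostgreSQL database
    image: postgres:16
    volumes: ["pgdata:/var/lib/postgresql/data"]
  wind-ingest:      # cron-scheduled API ingest
    build: ./ingest
  tide-ingest:      # cron-scheduled API ingest
    build: ./ingest
volumes:
  pgdata:
```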
The Technologies
The platform's data is managed with PostgreSQL, with SQLAlchemy serving as the ORM for database interactions in Python. NGINX acts as both the authentication gateway and the reverse proxy that routes users to the upload, download, and dashboard user-facing services.
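As an indicative sketch of what the ORM layer can look like, a cleaning-session table might be declared as follows. The table name, columns, and connection URL are our own invented assumptions, not the actual schema:

```python
from sqlalchemy import Column, Date, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class CleaningSession(Base):
    """One row per cleaning-session record (columns are illustrative)."""
    __tablename__ = "cleaning_session"

    id = Column(Integer, primary_key=True)
    session_date = Column(Date, nullable=False)
    ship = Column(String)             # vessel the volunteers operated from
    volunteers = Column(Integer)
    rubbish_type = Column(String)
    rubbish_weight_kg = Column(Float)

# Hypothetical connection URL; "db" is the database service's hostname.
engine = create_engine("postgresql+psycopg2://user:password@db:5432/azure")
Base.metadata.create_all(engine)  # create the table if it does not exist
```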
The user-facing services use Waitress as their WSGI server. Flask powers the upload and download services, while Plotly Dash is used to visualise the data in the database.
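A minimal sketch of what the upload service can look like, assuming a hypothetical /upload route and the illustrative table and connection URL from the sketch above:

```python
import pandas as pd
from flask import Flask, request
from sqlalchemy import create_engine
from waitress import serve

app = Flask(__name__)
# Hypothetical connection URL; "db" is the database service's hostname.
engine = create_engine("postgresql+psycopg2://user:password@db:5432/azure")

@app.route("/upload", methods=["POST"])
def upload_csv():
    """Accept an uploaded CSV file and append its rows to a table."""
    file = request.files["file"]
    df = pd.read_csv(file)  # the upload is file-like, so pandas can read it
    df.to_sql("cleaning_session", engine, if_exists="append", index=False)
    return {"rows_loaded": len(df)}, 201

if __name__ == "__main__":
    # Waitress serves the Flask (WSGI) app
    serve(app, host="0.0.0.0", port=8080)
```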
Behind the scenes, wind and tide data from internet-hosted endpoints are automatically ingested into the database on a daily schedule managed with the cron utility. This data is processed to an hourly granularity using a combination of jq and Python.
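The actual pipeline combined jq and Python; the sketch below is a Python-only approximation with a hypothetical endpoint, field names, and crontab entry, showing how raw observations can be resampled to an hourly granularity:

```python
# Illustrative crontab entry (daily at 02:00):
#   0 2 * * * /usr/local/bin/python /app/ingest_wind.py
import pandas as pd
import requests
from sqlalchemy import create_engine

# Hypothetical connection URL and endpoint, for illustration only.
engine = create_engine("postgresql+psycopg2://user:password@db:5432/azure")
WIND_URL = "https://example.gov/api/wind-observations"

def ingest_wind() -> None:
    """Fetch raw wind observations and store them at hourly granularity."""
    records = requests.get(WIND_URL, timeout=30).json()
    df = pd.DataFrame(records)  # assumed fields: observed_at, wind_speed, gust_speed
    df["observed_at"] = pd.to_datetime(df["observed_at"])
    hourly = (
        df.set_index("observed_at")[["wind_speed", "gust_speed"]]
        .resample("1h")
        .mean()
        .reset_index()
    )
    hourly.to_sql("wind_hourly", engine, if_exists="append", index=False)

if __name__ == "__main__":
    ingest_wind()
```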
Data Cleaning
The datasets included:
Rubbish type, including quantity and weight
Wind data, such as wind direction, wind speed, and gust speed
Number of volunteers on a given day, along with the ship they operated from and the start/end times of their cleaning activities
Tide data, location, and project name
Micro weather station data, such as temperature, barometric pressure, and dew point
Some of the challenges faced during data cleaning:
One challenge was data quality. For instance, the “Weather” field did not use consistent labels across time periods (such as “brief rain” vs “light rain”), so these values had to be aligned.
Another was reshaping: data pivots were required to use the data more flexibly within Plotly. For example, the pollution data needed to be unpivoted so that a “Rubbish Type” field was available for aggregations across rubbish types (total pieces of rubbish collected, total rubbish weight, etc.). A sketch of both steps follows this list.
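A minimal sketch of both cleaning steps in pandas, using invented column names and labels purely for illustration:

```python
import pandas as pd

# Hypothetical raw frame; column names and values are illustrative only.
raw = pd.DataFrame({
    "date": ["2023-05-01", "2023-05-02"],
    "weather": ["brief rain", "light rain"],
    "plastic_bottles": [120, 95],
    "fishing_nets": [3, 7],
})

# 1) Align inconsistent labels onto a single vocabulary.
weather_map = {"brief rain": "light rain"}  # extend as new variants appear
raw["weather"] = raw["weather"].replace(weather_map)

# 2) Unpivot the per-type counts into a long "rubbish_type" column so that
#    Plotly can aggregate across types (total pieces, total weight, etc.).
long = raw.melt(
    id_vars=["date", "weather"],
    value_vars=["plastic_bottles", "fishing_nets"],
    var_name="rubbish_type",
    value_name="quantity",
)
print(long)
```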
The Dashboard
All of the above culminates in a dashboard, available to the Azure Alliance team, from which they can gather information about the impact they are having.
Challenges during the project
This project held some unique challenges compared to other projects we work on, but it also shone a spotlight on how effectively the GDI team and our partners can work together. We outline a couple of these challenges below.
1) Working across geographies and languages
The project team was spread across Taipei, Sydney, Melbourne, and New York, which created challenges in finding suitable meeting times for everyone. This occasionally led to compromises, with the full team not always being present for every meeting.
In addition, there were language barriers between GDI, Infoxchange, and Azure Alliance. Fortunately, we had the support of a talented translator, Cindy, who did a fantastic job of conveying technical nuances across languages, ensuring effective communication across the project team.
2) Prioritisation and evolving requirements
During the build phase, the focus was initially on establishing a Minimum Viable Product (MVP). As the project progressed, each unit of work not only added value but also created new opportunities for future enhancements.
The primary challenge became managing this ever-expanding to-do list as the project deadline approached. This required a careful balancing act between investing time in features that directly impact the user experience today, and making behind-the-scenes improvements for the software's long-term stability and evolvability.
For example, schema evolution using Alembic was de-prioritised in favour of more urgent, user-facing features. One such feature was a user-friendly interface for downloading the raw tables and table views, which allowed non-technical users to access data without needing to use the terminal. This decision was driven by the immediate need to address a pain point for users, rather than investing time in a feature that might not be valued (i.e. if the rate of schema changes is low, then the value of a schema evolution feature is also low). It was a tough tradeoff, made in order to reach a positive outcome. An indicative sketch of such a download route follows below.
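The route, table names, and connection URL below are assumptions for illustration, not the actual implementation:

```python
import io

import pandas as pd
from flask import Flask, Response
from sqlalchemy import create_engine

app = Flask(__name__)
# Hypothetical connection URL; "db" is the database service's hostname.
engine = create_engine("postgresql+psycopg2://user:password@db:5432/azure")

# Whitelist of tables/views users may download (names are illustrative).
ALLOWED_TABLES = {"cleaning_session", "wind_hourly", "tide_hourly"}

@app.route("/download/<table>")
def download_table(table: str):
    """Stream a whitelisted database table back to the browser as CSV."""
    if table not in ALLOWED_TABLES:
        return {"error": "unknown table"}, 404
    df = pd.read_sql_table(table, engine)
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return Response(
        buf.getvalue(),
        mimetype="text/csv",
        headers={"Content-Disposition": f"attachment; filename={table}.csv"},
    )
```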
Outcome for Azure Alliance
Through our project, Azure Alliance now has enhanced visibility into the impact of their efforts on marine ecosystems. By having clearer insight into trends between marine debris accumulation and weather conditions, they can more efficiently deploy their limited resources to maximise their environmental impact. Furthermore, the dashboard empowers them to educate the wider community about the importance of marine conservation and the detrimental effects of pollution on our oceans.
Our work has enabled Azure Alliance to easily access and visualise crucial information that was previously out of reach. By reducing the time and effort needed to collect and interact with their data, their team can now focus more time on their core mission. This has helped them elevate their efforts in promoting environmental sustainability and marine conservation, ultimately enhancing their positive impact on marine health and the wider community.
GDI Commitment to Volunteers
Our partnership with Azure Alliance was particularly important for another reason. As a data organisation, we strive to lead by example in how we use data to improve the way we operate, both internally and externally with our volunteers and the charities we work with.
When volunteers join the organisation, we have them fill out a survey to understand which impact areas they are interested in working on. We also track the impact areas of the charities we work with. Impact areas that over-index with our volunteers are climate action and sustainable cities and communities. However, at the start of 2024, when we compared this with the charities we had been working with, we noticed we were perhaps not providing enough opportunities for our volunteers to work in these areas.
This is why we set a goal to work with at least three charities involved in these impact areas! So far in 2024 we are on track to accomplish this, with three projects already started in these areas.
By Nina Kayshap, Scott Simmons and Krishna Nadoor
About GDI:
The Good Data Institute (established 2019) is a registered not-for-profit organisation (ABN: 6664087941) that aims to give not-for-profits access to data analytics (D&A) support & tools. Our mission is to be the bridge between the not-for-profit world and the world of data analytics practitioners wishing to do social good. Using D&A, we identify, share, and help implement the most effective means for growing NFP people, organisations, and their impact.