Vikram Koka stumbled upon Apache Airflow in late 2019. He was working in the Internet of Things industry and searching for a way to orchestrate sensor data using software. Airflow seemed like an ideal fit, but Koka noticed the open-source project's stagnant state. Thus began a journey to breathe a second life into this dying software.
Airflow was the brainchild of Airbnb. The company created the system to automate and manage its data-related workflows, such as cleaning and organizing datasets in its data warehouse and calculating metrics around host and guest engagement. In 2015, Airbnb released the software as open source. Then, four years later, Airflow became a top-level project at the Apache Software Foundation, a leading developer and steward of open-source software.
What was once a thriving project had stalled, however, with flat downloads and a lack of version updates. Leadership was divided, with some maintainers focusing on other endeavors.
Yet Koka believed in the software's potential. Unlike static configuration files, Airflow follows the principle of "configuration as code." Workflows are represented as directed acyclic graphs (DAGs) of tasks: graphs with directed edges and no loops. Developers write these tasks in the Python programming language, allowing them to import libraries and other dependencies that help them better define each task. Like a musical conductor, Airflow orchestrates the symphony of tasks and manages the scheduling, execution, and monitoring of workflows.
This flexibility is what caught Koka's eye. "I fell in love with the idea of code-first pipelines, pipelines that could actually be deployed as code," he says. "The whole notion of programmatic workflows really appealed to me."
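The "configuration as code" idea above can be sketched in a few lines of plain Python. This is a simplified stand-in to illustrate the concept, not Airflow's actual API; the task names and graph are invented for the example.

```python
# Illustrative sketch: a pipeline as a directed acyclic graph (DAG) of
# Python tasks, run in dependency order by a tiny "orchestrator".
# Plain Python stand-in, NOT Airflow's real API; task names are invented.
from graphlib import TopologicalSorter

# Each task is just a function, so it can use any Python library.
def extract():
    return "raw sensor data"

def clean():
    return "cleaned data"

def aggregate():
    return "daily metrics"

tasks = {"extract": extract, "clean": clean, "aggregate": aggregate}

# The DAG maps each task to the set of tasks it depends on (edges, no loops).
dag = {"clean": {"extract"}, "aggregate": {"clean"}}

# Run tasks in topological order, as a scheduler would.
order = list(TopologicalSorter(dag).static_order())
results = {name: tasks[name]() for name in order}
print(order)  # ['extract', 'clean', 'aggregate']
```

Because the pipeline is ordinary code rather than a static config file, it can be versioned, tested, and generated programmatically, which is the appeal Koka describes.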
Koka set to work righting the Airflow ship. As an open-source contributor with decades of experience in the data and software-engineering space, he connected with people in the community to fix reliability bugs and craft other improvements. It took a year, but Airflow 2.0 was released in December 2020.
Airflow's Growth and Community Expansion
The release served as a crucial turning point for the project. Downloads from its GitHub repository increased, and more enterprises adopted the software. Encouraged by this growth, the team envisioned the next generation of Airflow: a modular architecture, a more modern user interface, and a "run anywhere, anytime" capability, enabling it to operate on premises, in the cloud, or on edge devices, and to handle event-driven and ad hoc scenarios in addition to scheduled tasks. The team delivered on this vision with the launch of Airflow 3.0 last April.
"It was amazing that we managed to 'rebuild the airplane while flying it' when we worked on Airflow 3, even if we had some temporary issues and glitches," says Jarek Potiuk, one of the main contributors to Airflow and now a member of its project-management committee. "We had to refactor and move a lot of pieces of the software while keeping Airflow 2 running and providing some bug fixes for it."
Compared with Airflow's second version, which Koka says had just a few hundred to a thousand downloads per month on GitHub, "now we're averaging somewhere between 35 to 40 million downloads a month," he says. The project's community has also soared, with more than 3,000 developers of all skill levels from around the world contributing to Airflow.
Jens Scheffler is an active part of that community. A technical architect of virtual testing automation at Bosch, he and his team were among the early adopters of Airflow, using the software to orchestrate tests for the company's automated driving systems.
Scheffler was impressed by the openness and responsiveness of Airflow members to his requests for guidance and support, so he considered "giving back something to the community, a contribution of code." He submitted a few patches at first, then implemented an idea for a feature that would benefit not only his team but other Airflow users as well. Scheffler also discovered other departments within Bosch using Airflow, so they've formed a small in-house community "so we can exchange knowledge and talk."
Koka, who is also a member of Airflow's project-management committee and chief strategy officer at the data-operations platform Astronomer, notes that managing a huge community of contributors is difficult, but nurturing that network is as essential as improving the software. The Airflow team has established a system that lets developers contribute gradually, starting with documentation and then progressing to small issues and bug fixes before tackling larger features. The team also makes it a point to respond swiftly and provide constructive feedback.
"For many of us in the community, [Airflow] is an adopted child. None of us were the original creators, but we want more people feeling they've also adopted it," says Koka. "We're in different organizations, in different countries, speak different languages, but we're still able to come together toward a certain mission. I love being able to do that."
The Airflow team is already planning future features. These include tools to write tasks in programming languages other than Python, human-in-the-loop capabilities to review and approve tasks at certain checkpoints, and support for artificial intelligence (AI) and machine learning workflows. According to Airflow's 2024 survey, the software has a growing number of use cases in machine learning operations (MLOps) and generative AI.
"We're at a pivotal moment where AI and ML workloads are the most important things in the IT industry, and there's a great need to make all these workloads, from training to inference and agentic processing, robust, reliable, scalable, and in general give them a rock-solid foundation they can run on," Potiuk says. "I see Airflow as such a foundation."