Install a Private Package as a requirement in Apache Airflow job

Note

Apache Airflow job is powered by Apache Airflow.

A Python package lets you organize related Python modules into a single directory hierarchy. A package is typically represented as a directory that contains a special file called __init__.py. Inside a package directory, you can have multiple Python module files (.py files) that define functions, classes, and variables. With an Apache Airflow job, you can develop your own private packages to add custom Apache Airflow operators, hooks, sensors, plugins, and more.
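For example, the private package built in this tutorial could use a layout like the following sketch. The airflow_operator name matches the import used in the DAG file later in this tutorial; the exact layout is illustrative.

    airflow_operator/
        __init__.py          # marks the directory as a Python package
        sample_operator.py   # module that defines the custom operator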

In this tutorial, you'll build a simple custom operator as a Python package, add it as a requirement in your Apache Airflow job, and import your private package as a module in your DAG file.

Develop a custom operator and test it with an Apache Airflow DAG

  1. Create a file called sample_operator.py with the operator shown below, and turn it into a private package (a minimal packaging sketch appears after this list). For guidance, see Creating a package in Python.

    from airflow.models.baseoperator import BaseOperator
    
    
    class SampleOperator(BaseOperator):
        """A minimal custom operator that returns a greeting message."""

        def __init__(self, name: str, **kwargs) -> None:
            super().__init__(**kwargs)
            self.name = name

        def execute(self, context):
            # Runs when the task executes; the return value is pushed to XCom by default.
            message = f"Hello {self.name}"
            return message
    
    
  2. Next, create an Apache Airflow DAG file called sample_dag.py to test the operator you made in the first step.

    from datetime import datetime
    from airflow import DAG
    
    # Import from private package
    from airflow_operator.sample_operator import SampleOperator


    with DAG(
        "test-custom-package",
        tags=["example"],
        description="A simple tutorial DAG",
        schedule_interval=None,
        start_date=datetime(2021, 1, 1),
    ) as dag:
        task = SampleOperator(task_id="sample-task", name="foo_bar")
    
        task
    
  3. Set up a GitHub repository that contains your sample_dag.py file in the Dags folder, along with your private package file. The package can be a .zip, .whl, or .tar.gz archive and can be placed in either the Dags or Plugins folder, whichever fits your setup. Connect the Git repository to your Apache Airflow job, or use the ready-made example at Install-Private-Package.
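The following setup.py is a minimal packaging sketch, assuming the airflow_operator layout shown earlier; the distribution name and version are illustrative, so adjust the metadata to your own package.

    # setup.py - minimal packaging sketch; name and version are illustrative.
    from setuptools import setup, find_packages

    setup(
        name="airflow-operator",    # assumed distribution name
        version="0.0.1",
        packages=find_packages(),   # picks up the airflow_operator package
    )

With this file in place, running pip install build followed by python -m build --wheel produces a .whl file under dist/ that you can commit to your repository.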

Add your package as a requirement

Add the package under Airflow requirements using the format /opt/airflow/git/<repoName>/<pathToPrivatePackage>.

For example, if your private package is located at /dags/test/private.whl in your GitHub repository, add the requirement /opt/airflow/git/<repoName>/dags/test/private.whl to your Apache Airflow environment.
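To confirm the requirement was installed, check that the DAG loads without import errors in the Airflow UI, or import the package directly. The snippet below is a minimal verification sketch that assumes the import name airflow_operator used in this tutorial.

    # Minimal verification sketch; "airflow_operator" is the import name assumed in this tutorial.
    import importlib

    module = importlib.import_module("airflow_operator.sample_operator")
    print(module.SampleOperator)  # prints the class if the package is installed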

Screenshot showing private package added as requirement.

Related content

Quickstart: Create an Apache Airflow Job