# XComs in Airflow
XComs are not designed for large data. The default size limit is about 1 MB (configurable, but resist the urge). Use them for IDs, file paths, dates, and small JSON payloads – not DataFrames or images.

## The Two Ways to Use XComs

### 1. Implicit XComs via `return`

Any Python function decorated with `@task` (the TaskFlow API) automatically pushes its return value as an XCom.
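To make the mechanics concrete, here is a toy simulation – deliberately *not* real Airflow code – of what the TaskFlow layer does behind the scenes: it stores each task's return value keyed by task id (the implicit push) and injects stored values into downstream arguments (the implicit pull). All names here are illustrative.

```python
# Toy simulation of implicit XCom wiring (not Airflow code):
# the "scheduler" stores each return value and feeds it downstream.
xcom_store: dict[str, object] = {}

def run_task(task_id: str, fn, *upstream_ids: str):
    args = [xcom_store[u] for u in upstream_ids]  # implicit xcom_pull
    xcom_store[task_id] = fn(*args)               # implicit xcom_push
    return xcom_store[task_id]

def extract() -> int:
    return 42

def double(x: int) -> int:
    return x * 2

run_task("extract", extract)
run_task("double", double, "extract")  # receives extract's value
```

In real Airflow the store is the metadata database, which is exactly why the size warnings above matter.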
Here, each mapped task gets its own XCom value, and `aggregate` receives a list of all results.

❌ **Passing large data**

```python
# BAD – will bloat the metadata DB
@task
def bad_task():
    return large_dataframe.to_dict()  # can be MB/GB
```

✅ Better: store the data in S3/GCS and pass the path as an XCom.

❌ **Pulling from a task that hasn't run**

```python
@task
def step_one():
    return 1

@task
def step_two(x):
    # If step_one failed or was skipped, this raises an error
    return x + 1
```

Now go build DAGs that actually share information – cleanly and reliably.
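The "store the data, pass the path" pattern can be sketched without any Airflow machinery. A local temp file stands in for S3/GCS here, and the function names are illustrative – only the small path string would go into XCom:

```python
import json
import tempfile
from pathlib import Path

def produce_report(rows: list[dict]) -> str:
    """Write the (potentially large) payload to durable storage
    and return only its path – the path is what goes into XCom."""
    path = Path(tempfile.mkdtemp()) / "report.json"
    path.write_text(json.dumps(rows))
    return str(path)  # small string, safe to push as an XCom

def consume_report(path: str) -> int:
    # The downstream task re-reads the payload from storage
    rows = json.loads(Path(path).read_text())
    return len(rows)
```

The XCom payload stays a few dozen bytes no matter how large the report grows.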
```python
push = PythonOperator(task_id='push_task', python_callable=push_function)
pull = PythonOperator(task_id='pull_task', python_callable=pull_function)
```
```python
@task
def consume_two(data):
    return f"Got {data['source']}"

@task
def fetch_urls() -> list[str]:
    return ["http://a.com", "http://b.com"]

@task
def download(url: str) -> str:
    # download content
    return f"content_of_{url}"
```
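In a real DAG, `fetch_urls` and `download` would be wired with dynamic task mapping – `download.expand(url=fetch_urls())` – so one mapped task runs per URL. The fan-out/fan-in shape can be simulated with plain, undecorated functions (a sketch, not Airflow code; `aggregate` is an assumed downstream task):

```python
def fetch_urls() -> list[str]:
    return ["http://a.com", "http://b.com"]

def download(url: str) -> str:
    return f"content_of_{url}"

def aggregate(contents: list[str]) -> int:
    # Receives the list of all mapped XCom values
    return len(contents)

# Plain-Python equivalent of download.expand(url=fetch_urls()):
# one downstream call per element, each producing its own XCom.
mapped = [download(u) for u in fetch_urls()]
```

Each element of `mapped` corresponds to one mapped task instance's XCom, and `aggregate(mapped)` mirrors the fan-in step.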
```python
from airflow.operators.python import PythonOperator

def push_function(**context):
    context['ti'].xcom_push(key='user_id', value=123)

def pull_function(**context):
    # Pull the value pushed by push_task under the same key
    return context['ti'].xcom_pull(task_ids='push_task', key='user_id')
```
No `xcom_push` or `xcom_pull` calls needed – the TaskFlow wiring handles it. With traditional operators, you must push and pull manually.
