Assets

Assets are the core building blocks of Dagster. An asset is a data object that represents a file, table, model, or any persistent artifact produced by your data pipeline. Assets describe what data should exist and how to compute it, rather than prescribing a specific execution order.

Why Assets Matter

Assets provide a declarative approach to data pipelines:
  • Data-centric thinking: Focus on the data artifacts you need, not the tasks that produce them
  • Automatic lineage: Dagster tracks dependencies between assets automatically
  • Observability: See when assets were last updated, their freshness, and materialization history
  • Selective execution: Materialize individual assets or subsets based on your needs
  • Cross-pipeline dependencies: Assets can depend on outputs from different jobs and pipelines

Basic Asset Definition

Define an asset using the @asset decorator. The function name becomes the asset name:
import json
import os
from dagster import asset

@asset
def my_asset():
    os.makedirs("data", exist_ok=True)
    with open("data/my_asset.json", "w") as f:
        json.dump([1, 2, 3], f)
In this example:
  • The asset name is my_asset
  • When materialized, it writes data to a JSON file
  • Dagster tracks when this asset was last materialized

Asset Dependencies

Assets can depend on other assets. Dagster infers dependencies from function parameters:
from dagster import asset

@asset
def upstream_asset():
    return [1, 2, 3]

@asset
def downstream_asset(upstream_asset):
    # The parameter name matches the upstream asset
    return upstream_asset + [4]
Dagster automatically determines that downstream_asset depends on upstream_asset by matching the parameter name to the upstream asset name.

Non-Argument Dependencies

For dependencies that don’t pass data (e.g., checking that a table exists before querying), use the deps parameter:
from dagster import asset

@asset
def upstream_table():
    # Creates a database table
    create_table("my_table")

@asset(deps=["upstream_table"])
def downstream_query():
    # Depends on upstream_table but doesn't load its output
    return query_table("my_table")

Asset Configuration

The @asset decorator accepts many parameters to customize behavior:
from dagster import asset

@asset(
    name="custom_name",          # Override the function name
    key_prefix=["analytics"],    # Namespace: analytics/custom_name
    group_name="data_ingestion", # UI grouping
)
def my_asset():
    return compute_data()

Multi-Assets

Sometimes a single computation produces multiple assets. Use @multi_asset for this:
from dagster import multi_asset, AssetOut, Output

@multi_asset(
    outs={
        "users": AssetOut(),
        "orders": AssetOut(),
    }
)
def extract_from_api():
    data = fetch_api_data()
    
    users_df = process_users(data)
    orders_df = process_orders(data)
    
    yield Output(users_df, output_name="users")
    yield Output(orders_df, output_name="orders")

Asset Materialization

Materializing an asset means computing its value and persisting it. You can materialize assets:
  • In the UI: Click the “Materialize” button
  • Via CLI: dagster asset materialize --select <asset_selection>
  • Programmatically: Using schedules, sensors, or automation conditions
  • In tests: Call the asset function directly or use materialize()
from dagster import materialize

# Materialize a single asset
result = materialize([my_asset])

# Materialize assets with dependencies
result = materialize([upstream_asset, downstream_asset])

Asset Checks

Asset checks validate the quality of your assets:
from dagster import asset, asset_check, AssetCheckResult

@asset
def orders_data():
    return load_orders()

@asset_check(asset=orders_data)
def orders_not_empty(orders_data):
    num_rows = len(orders_data)
    return AssetCheckResult(
        passed=num_rows > 0,
        metadata={"num_rows": num_rows},
    )

Source Assets

Source assets represent external data that Dagster doesn’t manage:
from dagster import SourceAsset, asset

# Define a source asset
raw_users = SourceAsset(key="raw_users")

@asset(deps=[raw_users])
def clean_users():
    # Depends on raw_users but doesn't materialize it
    return load_and_clean("raw_users")

Best Practices

Each asset should represent a single logical data artifact. Break large computations into multiple assets that can be materialized independently.
Asset names should clearly describe the data they represent. Use key_prefix to organize assets into namespaces like ["raw", "staging", "analytics"].
Document your assets with descriptions and metadata. This helps team members understand what each asset contains and how it’s used.
Think about your data as a graph of dependencies. The Dagster UI visualizes this graph, making it easy to understand data lineage.

API Reference