What You’ll Build
You’ll create a simple data pipeline that:
- Loads sample data from a CSV file
- Processes the data using Pandas
- Saves the transformed output
Prerequisites
Make sure you have Dagster installed. If not, see the Installation guide.
Step 1: Create Your First Asset
A Dagster asset represents a data artifact in your pipeline, such as a table or file, together with the function that produces it. Let’s create a simple one.
Create a Python file
Create a new file called hello_dagster.py:
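The @dg.asset decorator turns a Python function into a data asset. An asset that lists other assets in its deps argument depends on them: Dagster draws those edges in the asset graph and runs the upstream assets first when they are materialized together.
Step 2: Create a Real Data Pipeline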
Now let’s build something more practical with actual data processing.
Create your asset definition
Create defs/assets.py with a data processing asset:
This asset reads a CSV, adds a computed column, and saves the result. Dagster tracks when this runs and what it produces.
Step 3: Start the Dagster UI
Launch the development server
From your project directory, run dagster dev. The terminal logs the services as they start, including the address where the web UI is served.
The dagster dev command starts both the Dagster daemon and web server in development mode.
Open the UI
Open your browser to http://localhost:3000. You’ll see the Dagster web interface with your asset graph.
Step 4: Materialize Your Asset
Click Materialize
In the asset view, click the Materialize button in the top-right corner.
“Materializing” an asset means executing its function to produce its output and persisting the result as the asset’s latest version.
Watch the run
You’ll be taken to the run page where you can:
- See real-time logs
- Monitor progress
- View any errors or warnings
Advanced Example: HackerNews Pipeline
Dagster’s official examples include a more complex pipeline that fetches real data from Hacker News.
This example shows:
- Asset dependencies - topstories depends on topstory_ids
- Metadata - Attach rich information like record counts and data previews
- Logging - Track progress during execution
- Grouping - Organize related assets with group_name
Understanding the Asset Graph
Dagster automatically builds a dependency graph from your assets. The graph view shows:
- Upstream dependencies - Assets that must run first
- Downstream dependencies - Assets that depend on this one
- Compute kind badges - Visual indicators of the kind of computation an asset performs (for example, Python or pandas)
- Last materialization - When it was last run successfully
Development Workflow
Next Steps
Now that you have a working Dagster setup:
Add Schedules
Run your pipeline automatically on a schedule
Add Metadata
Track data quality metrics and previews
Use Resources
Connect to databases and external services
Deploy to Production
Take your pipeline to production
Common Patterns
Asset with multiple dependencies
Asset with configuration
Asset with IO Manager
Testing assets
Troubleshooting
Asset not showing in UI
- Ensure your asset is included in the Definitions object
- Click Reload definitions in the UI
- Check for Python syntax errors in your code
Import errors
- Verify all dependencies are installed: pip install pandas dagster
- Check that your Python path includes the project directory
Materialization fails
- Check the Logs tab in the run view for detailed error messages
- Verify file paths and permissions
- Ensure external APIs or databases are accessible
Learn More: Explore the official Dagster tutorial for a comprehensive walkthrough of building production pipelines.
