Database Migrations
This guide covers using Alembic for database migrations, including common commands, workflows, and best practices.
About Alembic
What is Alembic?
Alembic is a database migration tool for SQLAlchemy. It provides:
- Version Control for Database Schema: Track all database schema changes over time
- Automated Migration Generation: Detect changes between SQLAlchemy models and generate migrations
- Reproducible Deployments: Apply the same schema changes across development, staging, and production
- Rollback Capability: Safely downgrade to previous schema versions when needed
Why Alembic in This Project?
Alembic is used in the Financial Data Extractor project to:
- Manage Schema Evolution: As the project evolves, database schema changes are version-controlled and traceable
- Enable Team Collaboration: All developers can apply the same migrations to their local databases
- Support Data Migrations: Not just schema changes, but also data transformations (e.g., seeding initial companies)
- Ensure Consistency: The same migration process works across all environments (development, CI/CD, production)
How Alembic Works in This Project
Configuration:
- Alembic Config:
backend/alembic.ini- Contains database connection settings and migration directory paths - Migration Environment:
backend/alembic/env.py- Handles database connection and migration execution - Migrations Directory:
backend/alembic/versions/- Contains all migration files
Workflow:
- Model Changes: When you modify SQLAlchemy models in
app/db/models/, Alembic can detect the changes - Generate Migration: Run
make migrate-create NAME="description"to auto-generate a migration file - Review Migration: Always review the generated migration file before applying
- Apply Migration: Run
make migrateto apply migrations to your database - Version Tracking: Alembic tracks applied migrations in the
alembic_versiontable
Integration with SQLAlchemy:
- Alembic reads from SQLAlchemy models defined in
app/db/models/viaBase.metadatainapp/db/base.py - The
alembic/env.pyimports all models to enable auto-detection of schema changes - Uses the same database connection configuration as the FastAPI application
- Supports both declarative SQLAlchemy models and raw SQL operations via
op.execute()
Connection Configuration:
Alembic reads database connection settings from (in priority order):
DATABASE_URLenvironment variable (highest priority)- Settings from
config.py(reads from.envfile using Pydantic Settings) - Default values in
alembic.ini(fallback)
This allows flexible configuration across different environments while maintaining consistency.
Quick Start
# Apply all migrations
cd backend
make migrate
# Check migration status
make migrate-history
# Verify database setup
make db-info
make db-list-companies
Backend Database Makefile Commands
All database migration operations are managed through the Makefile in the backend/ directory:
cd backend
# Migration Commands
make migrate # Apply all migrations (alias for migrate-up)
make migrate-up # Apply all migrations
make migrate-down # Rollback last migration
make migrate-create NAME="description" # Create new migration
make migrate-history # Show migration history
# Database Utilities
make db-reset # Reset database (WARNING: destructive)
make db-list-companies # List all companies (uses db_manager.py)
make db-info # Show database info (uses db_manager.py)
Common Migration Commands
Apply All Migrations
# Using Makefile (recommended)
cd backend
make migrate
# Or directly with alembic
alembic upgrade head
Check Current Migration Status
# Using Makefile
cd backend
make migrate-history
# Or directly with alembic
alembic current
View Migration History
# Using Makefile
cd backend
make migrate-history
# Or with verbose output
alembic history --verbose
Create a New Migration (Auto-generate)
Alembic will detect changes in your models and generate migration:
# Using Makefile (from backend/)
cd backend
make migrate-create NAME="description of changes"
# Or directly with alembic
alembic revision --autogenerate -m "description of changes"
Note: Always review auto-generated migrations before applying!
Create a Blank Migration
For data migrations or complex schema changes:
# From backend/
cd backend
alembic revision -m "description of changes"
Edit the generated file in backend/alembic/versions/ directory, then apply with make migrate.
Upgrade to Specific Version
# Upgrade to a specific revision
cd backend
alembic upgrade <revision_id>
# Upgrade by relative number
alembic upgrade +1 # Upgrade one version
alembic upgrade +2 # Upgrade two versions
Downgrade Database
# Downgrade to previous version (using Makefile)
cd backend
make migrate-down
# Or directly with alembic
alembic downgrade -1
# Downgrade to specific version
alembic downgrade <revision_id>
# Downgrade to base (empty database)
alembic downgrade base
View SQL Without Executing
Preview what SQL would be executed:
# See what SQL would be run
cd backend
alembic upgrade head --sql
Migration Files
All migration files are stored in backend/alembic/versions/ with the format:
<revision_id>_<description>.py
Migration File Structure
Each migration file contains:
- Revision ID: Unique identifier (e.g.,
001,002) - Description: Human-readable description of the migration
- Upgrade Function: SQLAlchemy operations to apply the migration
- Downgrade Function: SQLAlchemy operations to rollback the migration
- Dependencies: Links to previous migrations in the chain
Example Migration Structure:
"""Add user email column
Revision ID: 003
Revises: 002
Create Date: 2025-02-01 10:00:00.000000
"""
from alembic import op
import sqlalchemy as sa
revision = '003'
down_revision = '002'
branch_labels = None
depends_on = None
def upgrade():
op.add_column('companies', sa.Column('email', sa.String(), nullable=True))
def downgrade():
op.drop_column('companies', 'email')
Current Migrations
-
001_create_initial_schema.py - Creates all core tables:
companies- Company information (name, primary_ticker, tickers JSONB, IR URL)documents- PDF documents (annual reports)extractions- Raw LLM extractions (JSONB)compiled_statements- Compiled 10-year views (JSONB)
-
002_seed_initial_companies.py - Seeds initial companies:
- AstraZeneca PLC - Tickers: AZN (LSE, NASDAQ)
- SAP SE - Tickers: SAP (XETRA, NYSE)
- Siemens AG - Tickers: SIE (XETRA)
- ASML Holding N.V. - Tickers: ASML (Euronext Amsterdam, NASDAQ)
- Unilever PLC - Tickers: ULVR (LSE), UNA (Euronext Amsterdam), UL (NYSE)
- Allianz SE - Tickers: ALV (XETRA)
All companies include their Investor Relations URLs for annual report access.
Verify Database Setup
After running migrations, verify everything is set up correctly:
# Option 1: Show database info (migration status + company count)
cd backend
make db-info
# Option 2: List all seeded companies with details
make db-list-companies
# Option 3: Check migration status directly
alembic current
# Option 4: Connect via psql (using Infrastructure Makefile)
cd ../infrastructure
make psql
# Then run: SELECT COUNT(*) FROM companies;
Expected Output: You should see 6 companies (AstraZeneca, SAP, Siemens, ASML, Unilever, Allianz) after running migration 002_seed_initial_companies.py.
Troubleshooting
Migration Out of Sync
If your database schema doesn’t match Alembic’s version:
# Check current version
cd backend
alembic current
# View history
alembic history
# Stamp database at current state (without running migrations)
# Use with caution - only if database matches expected schema
alembic stamp head
Reset Database
⚠️ WARNING: This will destroy all data!
# Drop all tables and re-run migrations (using Makefile)
cd backend
make db-reset
# Or manually:
alembic downgrade base
alembic upgrade head
Connection Issues
# Option 1: Test database connection and show info (from backend/)
cd backend
make db-info
# Option 2: Connect via psql using Infrastructure Makefile (from infrastructure/)
cd infrastructure
make psql
# Option 3: View service URLs and connection strings
make url
# Option 4: Test Python connection
cd ../backend
python -c "from app.db.base import engine; print(engine.connect())"
Generate Migration but Don’t Auto-detect
# Create empty migration template
cd backend
alembic revision -m "manual changes"
# Edit the generated file in alembic/versions/ directory
# Then apply:
alembic upgrade head
Best Practices
- Always review auto-generated migrations before applying
- Test migrations on development database first
- Never edit applied migrations - create new ones instead
- Use descriptive migration messages (e.g.,
make migrate-create NAME="add_user_email_index") - Keep migrations small and focused - one logical change per migration
- Add indexes for foreign keys and frequently queried columns
- Use transactions - Alembic wraps each migration in a transaction automatically
CI/CD Integration
GitHub Actions Example
- name: Run database migrations
run: |
cd backend
alembic upgrade head
env:
DATABASE_URL: $
Docker Container Startup
When running the backend as a Docker service, migrations should run automatically:
backend:
command: >
sh -c "alembic upgrade head &&
uvicorn app.main:app --host 0.0.0.0 --port 8000"
This ensures migrations run before the app starts.
Manual Migration in Docker Environment
If you need to run migrations manually:
# Option 1: Using Makefiles (recommended)
# Start database services
cd infrastructure
make up-dev
# Apply migrations
cd ../backend
make migrate
# Option 2: Check database is ready first
cd infrastructure
make health
make url # View connection details
# Option 3: From inside backend container (if running as Docker service)
docker exec -it <backend-container-name> alembic upgrade head
Related Documentation
- Infrastructure Development Setup - Setting up Docker services
- Database Schema - Understanding table structures
- Database Queries - Useful SQL query examples