---
name: database-schema
description: Database schema design patterns for PostgreSQL and MySQL including normalization, indexing strategies, migrations, relationships, and query optimization.
---

# Database Schema Design

Database schema design patterns for PostgreSQL and MySQL including normalization, indexing strategies, migrations, relationships, and query optimization.

## Schema Design Principles

### Normalization Levels

**1NF (First Normal Form)**
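- Store one atomic value per column (no comma-separated lists or nested structures)
- Eliminate repeating groups (e.g. `phone1`, `phone2`, `phone3`) by moving them to their own table
- Identify each row with a primary key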

**2NF (Second Normal Form)**
- Meet 1NF requirements
- Remove partial dependencies
- All non-key columns depend on the entire primary key

**3NF (Third Normal Form)**
- Meet 2NF requirements
- Remove transitive dependencies
- Non-key columns don't depend on other non-key columns
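As a rough illustration of the three forms, the sketch below starts from one flat table and splits it up. All table and column names here (`orders_flat`, `customers`, `products`, `orders`, `order_items`) are hypothetical and separate from the schemas used elsewhere in this document; the syntax is PostgreSQL.

```sql
-- Unnormalized: repeating product columns, customer data duplicated per order
CREATE TABLE orders_flat (
    order_id INTEGER PRIMARY KEY,
    customer_email VARCHAR(255),
    customer_city VARCHAR(100),
    product1_name VARCHAR(100),
    product1_price NUMERIC(10, 2),
    product2_name VARCHAR(100),
    product2_price NUMERIC(10, 2)
);

-- 3NF: customer attributes live with the customer, not with each order
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    city VARCHAR(100)
);

-- 2NF: product attributes depend on the product alone, not on (order, product)
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price NUMERIC(10, 2) NOT NULL
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);

-- 1NF: one row per order line instead of product1_*, product2_* columns
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
```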

### When to Denormalize

- Read-heavy workloads
- Reporting/analytics queries
- Caching frequently accessed data
- Reducing complex joins
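
One way to denormalize for reporting without changing the base tables is a materialized view. A minimal PostgreSQL sketch, assuming the `users` and `posts` tables from this document; the view name `user_post_stats` and its index are hypothetical.

```sql
-- Precompute a join + aggregate that reports would otherwise run repeatedly
CREATE MATERIALIZED VIEW user_post_stats AS
SELECT
    u.id AS user_id,
    u.username,
    COUNT(p.id) AS post_count,
    MAX(p.created_at) AS last_post_at
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.username;

-- A unique index is required before REFRESH ... CONCURRENTLY can be used
CREATE UNIQUE INDEX idx_user_post_stats_user_id ON user_post_stats(user_id);

-- Run on a schedule; CONCURRENTLY keeps the view readable during the refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY user_post_stats;
```

The trade-off is staleness: the view only reflects data as of its last refresh.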

## Table Design

### Naming Conventions

```sql
-- Tables: plural, snake_case
CREATE TABLE users (...);
CREATE TABLE order_items (...);

-- Columns: snake_case
user_id, created_at, email_address

-- Primary keys: id or table_singular_id
id, user_id

-- Foreign keys: referenced_table_singular_id
user_id, order_id

-- Boolean: is_, has_, can_ prefix
is_active, has_subscription, can_edit

-- Timestamps: _at suffix
created_at, updated_at, deleted_at
```

### Standard Table Structure

```sql
CREATE TABLE users (
    -- Primary key (gen_random_uuid() is built in from PostgreSQL 13; earlier versions need the pgcrypto extension)
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- Core fields
    email VARCHAR(255) NOT NULL UNIQUE,
    username VARCHAR(50) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,

    -- Profile
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    avatar_url TEXT,

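    -- Status (NOT NULL so the defaults and the CHECK below cannot be bypassed with NULL)
    is_active BOOLEAN NOT NULL DEFAULT true,
    is_verified BOOLEAN NOT NULL DEFAULT false,
    role VARCHAR(20) NOT NULL DEFAULT 'user',

    -- Timestamps
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- DEFAULT only applies on INSERT
    deleted_at TIMESTAMPTZ,  -- Soft delete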

    -- Constraints
    CONSTRAINT valid_role CHECK (role IN ('user', 'admin', 'moderator'))
);

-- Indexes
-- (email and username are already indexed by their UNIQUE constraints)
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_active ON users(is_active) WHERE is_active = true;
```
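
The `DEFAULT NOW()` on `updated_at` only fires on insert, so PostgreSQL needs a trigger to keep the column current. A minimal sketch, assuming PostgreSQL 11+ for `EXECUTE FUNCTION`; the function and trigger names are hypothetical.

```sql
-- Hypothetical helper: stamps updated_at on every row update
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to each table that has an updated_at column
CREATE TRIGGER users_set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```

In MySQL the same effect is available declaratively with `ON UPDATE CURRENT_TIMESTAMP` on the column definition.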

## Relationships

### One-to-Many

```sql
-- One user has many posts
CREATE TABLE posts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_posts_user_id ON posts(user_id);
```

### Many-to-Many

```sql
-- Users can have many roles, roles can have many users
CREATE TABLE roles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(50) NOT NULL UNIQUE,
    permissions JSONB DEFAULT '[]'
);

CREATE TABLE user_roles (
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    role_id UUID REFERENCES roles(id) ON DELETE CASCADE,
    granted_at TIMESTAMPTZ DEFAULT NOW(),
    granted_by UUID REFERENCES users(id),
    PRIMARY KEY (user_id, role_id)
);
```
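
A short usage sketch for the junction table above. The email and role name are placeholder values.

```sql
-- Grant a role; the composite primary key turns a repeated grant into a no-op
INSERT INTO user_roles (user_id, role_id)
SELECT u.id, r.id
FROM users u
CROSS JOIN roles r
WHERE u.email = 'test@example.com'
  AND r.name = 'editor'
ON CONFLICT (user_id, role_id) DO NOTHING;

-- List role names per user
SELECT u.username, r.name AS role_name, ur.granted_at
FROM users u
JOIN user_roles ur ON ur.user_id = u.id
JOIN roles r ON r.id = ur.role_id
ORDER BY u.username, r.name;
```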

### Self-Referencing

```sql
-- Categories with parent-child hierarchy
CREATE TABLE categories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    parent_id UUID REFERENCES categories(id) ON DELETE SET NULL,
    name VARCHAR(100) NOT NULL,
    slug VARCHAR(100) NOT NULL UNIQUE,
    depth INTEGER DEFAULT 0,
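    path TEXT  -- Materialized path of ids from root to this row, e.g. '1/5/12' (short ids shown for readability)
);

CREATE INDEX idx_categories_parent ON categories(parent_id);
-- text_pattern_ops lets the index serve LIKE 'prefix%' under non-C collations
CREATE INDEX idx_categories_path ON categories(path text_pattern_ops);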
```
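
Two ways to read a subtree from this table, sketched under the assumption that `path` holds the ids from the root down to and including the row itself. The slug and path values are placeholders.

```sql
-- Walk the tree downward from one category using parent_id
WITH RECURSIVE subtree AS (
    SELECT id, parent_id, name, 0 AS relative_depth
    FROM categories
    WHERE slug = 'electronics'

    UNION ALL

    SELECT c.id, c.parent_id, c.name, s.relative_depth + 1
    FROM categories c
    JOIN subtree s ON c.parent_id = s.id
)
SELECT id, name, relative_depth
FROM subtree
ORDER BY relative_depth, name;

-- Same question answered with the materialized path (no recursion):
-- every descendant of the category whose path is '1/5'
SELECT id, name
FROM categories
WHERE path LIKE '1/5/%';
```

The recursive query always reflects `parent_id`, while the path version is faster but depends on `path` being kept in sync when rows move.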

### Polymorphic Relationships

```sql
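-- Comments on multiple entity types
-- Note: commentable_id cannot carry a foreign key, so integrity must be enforced by the application or by triggers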
CREATE TABLE comments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    commentable_type VARCHAR(50) NOT NULL,  -- 'post', 'article', 'video'
    commentable_id UUID NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_comments_commentable ON comments(commentable_type, commentable_id);
```
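
A sketch of how this table is typically queried and guarded. The UUID literal is a placeholder, and the constraint name is hypothetical.

```sql
-- All comments on one post; served by idx_comments_commentable
SELECT c.id, c.content, c.created_at
FROM comments c
WHERE c.commentable_type = 'post'
  AND c.commentable_id = '00000000-0000-0000-0000-000000000001'  -- placeholder id
ORDER BY c.created_at;

-- Restrict the type column to known values
ALTER TABLE comments
    ADD CONSTRAINT valid_commentable_type
    CHECK (commentable_type IN ('post', 'article', 'video'));

-- Find orphaned comments whose post no longer exists
SELECT c.id
FROM comments c
LEFT JOIN posts p ON p.id = c.commentable_id
WHERE c.commentable_type = 'post'
  AND p.id IS NULL;
```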

## Indexing Strategies

### Types of Indexes

```sql
-- B-tree (default) - equality and range queries
CREATE INDEX idx_users_email ON users(email);

-- Hash - equality only (PostgreSQL)
CREATE INDEX idx_users_email_hash ON users USING hash(email);

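-- GIN - arrays, JSONB, full-text
-- (assumes posts.tags is an array column and users.metadata is JSONB)
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
CREATE INDEX idx_users_metadata ON users USING gin(metadata);

-- GiST - geometric and range types, full-text
-- (assumes a locations table with a geometric coordinates column)
CREATE INDEX idx_locations_coords ON locations USING gist(coordinates);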

-- Partial index - filtered subset
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;

-- Composite index - multiple columns
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

-- Covering index (PostgreSQL 11+): INCLUDE columns are stored in the index but not searchable
CREATE INDEX idx_users_covering ON users(email) INCLUDE (username, avatar_url);
```

### Index Guidelines

1. Index columns used in WHERE, JOIN, ORDER BY
2. Consider composite indexes for common query patterns
3. In composite indexes, put equality-filtered columns first, then range or sort columns; selectivity only breaks ties among equality columns
4. Use partial indexes for filtered queries
5. Avoid over-indexing (slows writes)
6. Monitor index usage with `pg_stat_user_indexes`
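
A minimal query for guideline 6, listing indexes that have not been scanned since statistics were last reset. Treat the result as a starting point: indexes backing primary keys or unique constraints can show zero scans and still be required.

```sql
-- Unused indexes, largest first
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```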

## Migrations

### Migration File Structure

```sql
-- migrations/20240101120000_create_users.sql

-- Up
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) NOT NULL UNIQUE,
    name VARCHAR(100) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- The UNIQUE constraint on email already creates an index, so no separate index is needed

-- Down
DROP TABLE IF EXISTS users;
```

### Safe Migration Patterns

```sql
-- Adding column (safe)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

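-- Adding NOT NULL column
-- PostgreSQL 11+: a constant default is applied to existing rows without rewriting the table
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active';
-- Older versions or computed values: add as nullable, backfill, then constrain
-- ALTER TABLE users ADD COLUMN status VARCHAR(20);
-- UPDATE users SET status = 'active' WHERE status IS NULL;  -- run in batches on large tables
-- ALTER TABLE users ALTER COLUMN status SET NOT NULL;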

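-- Renaming column (fast, but breaks any code still using the old name; coordinate with the deploy)
ALTER TABLE users RENAME COLUMN name TO full_name;

-- Adding index concurrently (does not block writes; cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_users_phone ON users(phone);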

-- Dropping column (add deprecation period)
-- Step 1: Stop writing to column
-- Step 2: Deploy code that doesn't read column
-- Step 3: Drop column
ALTER TABLE users DROP COLUMN old_column;
```
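
Constraints can be added in the same low-lock spirit. A sketch for PostgreSQL, assuming the `posts` table above; the constraint name and the 5 second timeout are illustrative choices, not recommendations from the original text.

```sql
-- Fail fast instead of queueing behind long-running transactions
SET lock_timeout = '5s';

-- Step 1: add the constraint without checking existing rows (brief lock only)
ALTER TABLE posts
    ADD CONSTRAINT posts_title_not_blank
    CHECK (length(trim(title)) > 0) NOT VALID;

-- Step 2: check existing rows; this scans the table but still allows reads and writes
ALTER TABLE posts VALIDATE CONSTRAINT posts_title_not_blank;
```

New and updated rows are checked from step 1 onward, so step 2 can run later during a quiet period.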

## Common Patterns

### Audit Trail

```sql
CREATE TABLE audit_logs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    table_name VARCHAR(100) NOT NULL,
    record_id UUID NOT NULL,
    action VARCHAR(10) NOT NULL,  -- INSERT, UPDATE, DELETE
    old_data JSONB,
    new_data JSONB,
    changed_by UUID REFERENCES users(id),
    changed_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_audit_table_record ON audit_logs(table_name, record_id);
CREATE INDEX idx_audit_changed_at ON audit_logs(changed_at);
```
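
The table above can be filled by a row-level trigger. A minimal sketch, assuming every audited table has a UUID `id` column and that the application stores the acting user in a session setting named `app.current_user_id` (a hypothetical name); the function and trigger names are also hypothetical.

```sql
CREATE OR REPLACE FUNCTION audit_row_change()
RETURNS TRIGGER AS $$
DECLARE
    -- NULL when the application has not set the (hypothetical) session variable
    actor UUID := NULLIF(current_setting('app.current_user_id', true), '')::uuid;
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_logs (table_name, record_id, action, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, TG_OP, to_jsonb(NEW), actor);
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_logs (table_name, record_id, action, old_data, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, TG_OP, to_jsonb(OLD), to_jsonb(NEW), actor);
    ELSE  -- DELETE
        INSERT INTO audit_logs (table_name, record_id, action, old_data, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, TG_OP, to_jsonb(OLD), actor);
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_audit
AFTER INSERT OR UPDATE OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION audit_row_change();
```

The application would set the actor per transaction, for example with `SET LOCAL app.current_user_id = '<uuid>'`.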

### Soft Delete

```sql
-- Add deleted_at column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;

-- Create view for active records
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;

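-- Soft delete function: turns DELETE into an UPDATE of deleted_at
CREATE OR REPLACE FUNCTION soft_delete()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE users SET deleted_at = NOW() WHERE id = OLD.id;
    RETURN NULL;  -- returning NULL from a BEFORE DELETE trigger skips the actual delete
END;
$$ LANGUAGE plpgsql;

-- Attach the function so DELETE FROM users soft-deletes instead
CREATE TRIGGER users_soft_delete
BEFORE DELETE ON users
FOR EACH ROW EXECUTE FUNCTION soft_delete();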
```
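
Soft deletes interact with unique constraints: a soft-deleted row still blocks reuse of its email. One possible approach is a partial unique index. This sketch assumes `users_email_key` is the name PostgreSQL generated for the inline `UNIQUE` constraint on `users.email`; check the actual name before dropping it. The index name and UUID literal are placeholders.

```sql
-- Replace the table-wide constraint (name is an assumption, see above)
ALTER TABLE users DROP CONSTRAINT users_email_key;

-- Enforce uniqueness only among rows that are not soft-deleted
CREATE UNIQUE INDEX idx_users_email_active
    ON users(email)
    WHERE deleted_at IS NULL;

-- Restore a soft-deleted row; fails with a unique violation
-- if another active row has taken the same email in the meantime
UPDATE users
SET deleted_at = NULL
WHERE id = '00000000-0000-0000-0000-000000000001';  -- placeholder id
```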

### Full-Text Search

```sql
-- Add search vector column
ALTER TABLE posts ADD COLUMN search_vector tsvector;

-- Create GIN index
CREATE INDEX idx_posts_search ON posts USING gin(search_vector);

-- Update trigger
CREATE FUNCTION posts_search_update() RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', COALESCE(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', COALESCE(NEW.content, '')), 'B');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_trigger
BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION posts_search_update();

-- Search query
SELECT * FROM posts
WHERE search_vector @@ plainto_tsquery('english', 'search terms')
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'search terms')) DESC;
```
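
Two follow-ups to the setup above, sketched for PostgreSQL. Rows that existed before the trigger was created have an empty `search_vector` until they are touched, and PostgreSQL 12+ offers a stored generated column as an alternative to the trigger. The column name `search_vector_gen` is hypothetical, chosen so it does not clash with the column above.

```sql
-- Backfill existing rows: a no-op assignment is enough to fire the trigger
UPDATE posts SET title = title WHERE search_vector IS NULL;

-- Alternative (PostgreSQL 12+): let the database maintain the vector itself
-- Note: adding a stored generated column rewrites the table
ALTER TABLE posts ADD COLUMN search_vector_gen tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
        setweight(to_tsvector('english', COALESCE(content, '')), 'B')
    ) STORED;

CREATE INDEX idx_posts_search_gen ON posts USING gin(search_vector_gen);
```

With the generated column, the trigger and its function are no longer needed.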

## Query Optimization

```sql
-- Use EXPLAIN ANALYZE (it actually executes the query, so wrap writes in a transaction and roll back)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- Check for sequential scans on large tables
-- Add indexes for frequently filtered columns

-- Use covering indexes to avoid table lookups (enables index-only scans)
CREATE INDEX idx_users_email_name ON users(email) INCLUDE (username);

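-- Partition large tables (declarative partitioning, PostgreSQL 10+)
-- A primary key on a partitioned table needs PostgreSQL 11+ and must include the partition key
CREATE TABLE events (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    created_at TIMESTAMPTZ NOT NULL,
    data JSONB,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);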

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```
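
A short sketch of how the partitioned table above might be operated, assuming PostgreSQL 11+ for the default partition. The extra partition names follow the pattern already used and are otherwise arbitrary.

```sql
-- Catch rows that fall outside every defined range
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- Create the next month's partition ahead of time
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Filtering on the partition key lets the planner skip other partitions;
-- the plan should touch only events_2024_01
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM events
WHERE created_at >= '2024-01-10'
  AND created_at < '2024-01-20';
```

Queries that do not filter on `created_at` scan every partition, so partitioning pays off mainly when the partition key appears in most queries.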

## Tips

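- Use UUIDs or auto-incrementing IDs as surrogate primary keys (junction tables can use a composite key instead)
- Add created_at and updated_at to all tables, and keep updated_at current with a trigger (PostgreSQL) or `ON UPDATE CURRENT_TIMESTAMP` (MySQL)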
- Use appropriate data types (don't use VARCHAR for everything)
- Define foreign key constraints for referential integrity
- Consider soft deletes for important data
- Plan for schema migrations from the start
- Monitor slow queries and add indexes as needed
- Use connection pooling in production
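
For the slow-query tip, one common starting point in PostgreSQL is the `pg_stat_statements` extension. A sketch, assuming the extension is listed in `shared_preload_libraries` (which requires a server restart) and PostgreSQL 13+ column names.

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top statements by total execution time
-- (PostgreSQL 12 and older name these columns total_time / mean_time)
SELECT
    calls,
    round(total_exec_time::numeric, 1) AS total_ms,
    round(mean_exec_time::numeric, 1) AS mean_ms,
    left(query, 80) AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Statements near the top of this list are the usual candidates for `EXPLAIN ANALYZE` and new indexes.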
