Day 6: Designing the Contacts Data Model & Implementing CRUD APIs

Lesson 6 60 min

Designing the Contacts Data Model & Implementing CRUD APIs

Welcome back, architects and engineers! In our last session (Day 5), we laid the groundwork by setting up PostgreSQL. Today, we're diving into the heart of any CRM: the Contact. This isn't just about storing names and emails; it's about building a robust, extensible foundation for millions of customer interactions. We'll design a data model that anticipates growth and implement the core Create, Read, Update, and Delete (CRUD) operations via a clean API.

The Foundation: Why Data Modeling is More Than Just Tables

Think about a contact in a real-world CRM like Salesforce or HubSpot. It's not just a person; it's a living entity that evolves. It gets associated with companies, deals, activities, and a plethora of custom fields. Our data model needs to reflect this dynamism from day one.

Core Concepts and Architecture:

1. Layered Architecture (1.API -> 2.Service -> 3.Repository -> 4.DB): We'll adopt a classic layered approach.

API Layer (FastAPI): Handles HTTP requests, input validation (using Pydantic), and serialization. It's the system's public face.
Service Layer: Contains the business logic. This is where you'd implement rules like "a contact must have a unique email" or "update contact status based on deal stage."
Repository Layer (SQLAlchemy ORM): Abstracts database interactions. It knows how to talk to PostgreSQL, allowing the Service layer to focus on what to do. This separation is crucial for testability and allows us to swap databases later if needed without touching business logic.
Database (PostgreSQL): Our persistent storage.

2. Contact Data Model:

Flowchart

id (UUID): The primary key. Using UUIDs instead of auto-incrementing integers is a subtle but powerful choice for distributed systems. It avoids collisions when merging data from different sources or during sharding, crucial for ultra-high-scale systems.
first_name, last_name (VARCHAR): Standard name fields.
email (VARCHAR, UNIQUE): Essential for communication and often a unique identifier. Enforcing uniqueness at the database level prevents data integrity issues.
phone_number (VARCHAR): Store as string to accommodate international formats and special characters.
company (VARCHAR): The associated company.
title (VARCHAR): Job title.
status (ENUM/VARCHAR): E.g., 'Lead', 'Opportunity', 'Customer', 'Churned'. This drives automation and segmentation.
created_at, updated_at (TIMESTAMP with TIME ZONE): Absolutely critical for auditing, debugging, and understanding data evolution. Every production system relies heavily on these.
is_active (BOOLEAN): For "soft deletes." Instead of physically removing data (which often breaks historical reporting and audit trails), we mark it as inactive. This is a common pattern in CRMs to preserve data integrity and comply with retention policies.

The "Why" Behind the Design Choices:

UUIDs for id: Imagine two microservices independently generating contact IDs. If they both use auto-incrementing integers, you'll get collisions when trying to consolidate. UUIDs eliminate this headache. In a system handling 100M RPS, you absolutely need this global uniqueness without coordination.
Soft Deletes (is_active): Hard deleting a contact is like burning a bridge. What if that contact was part of a historical sales report? What if a customer wants to reactivate their account? Soft deletes preserve this history, allowing for data recovery and robust analytics. This is a non-negotiable for serious CRMs.
created_at/updated_at: These aren't just for show. They're your forensic tools. When a bug appears, or a data anomaly is reported, these timestamps tell you when things happened, allowing you to trace changes and pinpoint issues. They are also vital for data synchronization and replication strategies in distributed environments.
Separation of Concerns (API, Service, Repository): This isn't academic. In big tech, teams own specific layers. An API team might manage the external contract, a business logic team owns the service, and a data team manages the repository. This structure enables parallel development, independent scaling, and clearer accountability. For 100M RPS, you'll have dozens of services, each with its own repository, all communicating via well-defined APIs.

Control Flow and Data Flow (CRUD Operations):

State Machine

Create Contact:

Client sends POST /contacts with contact data (JSON).
FastAPI receives, Pydantic validates the input.
Service layer receives validated data, adds business logic (e.g., generate UUID, set created_at).
Repository layer takes the processed data, constructs an SQL insert statement via SQLAlchemy, and executes it against PostgreSQL.
PostgreSQL stores the new contact.
Response (e.g., 201 Created) with the new contact's ID.

Read Contact:

Client sends GET /contacts/{id} or GET /contacts?email=....
FastAPI routes the request.
Service layer might add authorization checks or data transformation.
Repository layer constructs an SQL select query.
PostgreSQL returns the data.
Response (e.g., 200 OK) with contact data.

Project Implementation:

We'll use FastAPI for our API, Pydantic for data validation, and SQLAlchemy for ORM with PostgreSQL.

python

# database.py
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# This should eventually come from environment variables
DATABASE_URL = "postgresql://user:password@localhost:5432/crm_db"

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

python

# models.py
import uuid
from sqlalchemy import Column, String, DateTime, Boolean
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.sql import func
from .database import Base

class Contact(Base):
    __tablename__ = "contacts"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    first_name = Column(String, index=True)
    last_name = Column(String, index=True)
    email = Column(String, unique=True, index=True)
    phone_number = Column(String, nullable=True)
    company = Column(String, nullable=True)
    title = Column(String, nullable=True)
    status = Column(String, default="Lead") # e.g., Lead, Opportunity, Customer, Churned
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), onupdate=func.now())
    is_active = Column(Boolean, default=True)

python

# schemas.py (Pydantic models for request/response validation)
from pydantic import BaseModel, EmailStr
from typing import Optional
from datetime import datetime
import uuid

class ContactBase(BaseModel):
    first_name: str
    last_name: str
    email: EmailStr
    phone_number: Optional[str] = None
    company: Optional[str] = None
    title: Optional[str] = None
    status: Optional[str] = "Lead" # Default status

class ContactCreate(ContactBase):
    pass # Inherits all fields, can add more if needed for creation only

class ContactUpdate(ContactBase):
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    email: Optional[EmailStr] = None # Email can be updated but still unique
    status: Optional[str] = None
    is_active: Optional[bool] = None

class ContactInDB(ContactBase):
    id: uuid.UUID
    created_at: datetime
    updated_at: datetime
    is_active: bool

    class Config:
        orm_mode = True # Tells Pydantic to read data even if it's not a dict, but an ORM model

python

# crud.py (Repository Layer)
from sqlalchemy.orm import Session
from . import models, schemas

def create_contact(db: Session, contact: schemas.ContactCreate):
    db_contact = models.Contact(**contact.dict())
    db.add(db_contact)
    db.commit()
    db.refresh(db_contact)
    return db_contact

def get_contact(db: Session, contact_id: uuid.UUID):
    return db.query(models.Contact).filter(models.Contact.id == contact_id, models.Contact.is_active == True).first()

def get_contact_by_email(db: Session, email: str):
    return db.query(models.Contact).filter(models.Contact.email == email, models.Contact.is_active == True).first()

def get_contacts(db: Session, skip: int = 0, limit: int = 100):
    return db.query(models.Contact).filter(models.Contact.is_active == True).offset(skip).limit(limit).all()

def update_contact(db: Session, contact_id: uuid.UUID, contact: schemas.ContactUpdate):
    db_contact = db.query(models.Contact).filter(models.Contact.id == contact_id).first()
    if db_contact:
        update_data = contact.dict(exclude_unset=True)
        for key, value in update_data.items():
            setattr(db_contact, key, value)
        db.add(db_contact)
        db.commit()
        db.refresh(db_contact)
    return db_contact

def delete_contact(db: Session, contact_id: uuid.UUID):
    db_contact = db.query(models.Contact).filter(models.Contact.id == contact_id).first()
    if db_contact:
        db_contact.is_active = False # Soft delete
        db.add(db_contact)
        db.commit()
        db.refresh(db_contact)
    return db_contact

python

# main.py (API Layer)
from fastapi import FastAPI, Depends, HTTPException, status
from sqlalchemy.orm import Session
from typing import List
import uuid

from . import crud, models, schemas
from .database import engine, get_db

models.Base.metadata.create_all(bind=engine) # Create tables if they don't exist

app = FastAPI(
    title="CRM Contacts API",
    description="API for managing customer contacts.",
    version="0.1.0",
)

@app.post("/contacts/", response_model=schemas.ContactInDB, status_code=status.HTTP_201_CREATED)
def create_contact_endpoint(contact: schemas.ContactCreate, db: Session = Depends(get_db)):
    db_contact = crud.get_contact_by_email(db, email=contact.email)
    if db_contact:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Email already registered")
    return crud.create_contact(db=db, contact=contact)

@app.get("/contacts/", response_model=List[schemas.ContactInDB])
def read_contacts_endpoint(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    contacts = crud.get_contacts(db, skip=skip, limit=limit)
    return contacts

@app.get("/contacts/{contact_id}", response_model=schemas.ContactInDB)
def read_contact_endpoint(contact_id: uuid.UUID, db: Session = Depends(get_db)):
    db_contact = crud.get_contact(db, contact_id=contact_id)
    if db_contact is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Contact not found or inactive")
    return db_contact

@app.put("/contacts/{contact_id}", response_model=schemas.ContactInDB)
def update_contact_endpoint(contact_id: uuid.UUID, contact: schemas.ContactUpdate, db: Session = Depends(get_db)):
    db_contact = crud.update_contact(db, contact_id=contact_id, contact=contact)
    if db_contact is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Contact not found")
    return db_contact

@app.delete("/contacts/{contact_id}", response_model=schemas.ContactInDB)
def delete_contact_endpoint(contact_id: uuid.UUID, db: Session = Depends(get_db)):
    db_contact = crud.delete_contact(db, contact_id=contact_id) # Performs soft delete
    if db_contact is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Contact not found")
    return db_contact

Assignment: Enriching the Contact Model

Your task is to extend our contact data model and API.

Add tags Field: Many CRMs allow tagging contacts (e.g., 'VIP', 'Marketing Lead', 'Newsletter Subscriber'). Add a tags column to the Contact model (consider a JSONB type in PostgreSQL for flexibility, or a simple ARRAY[VARCHAR] if you prefer).
Implement Search by Tag: Extend the GET /contacts/ endpoint to allow filtering contacts by one or more tags.
Add notes Field: A notes (TEXT) field for free-form text about the contact.
Update schemas.py and crud.py: Ensure your Pydantic schemas and CRUD operations can handle these new fields for creation and updates.

Solution Hints:

tags Field:

In models.py, import JSONB from sqlalchemy.dialects.postgresql.
Add tags = Column(JSONB, nullable=True) to the Contact model.
In schemas.py, add tags: Optional[List[str]] = None to ContactBase, ContactCreate, and ContactUpdate.
In crud.py, modify get_contacts to accept an optional tag: Optional[str] parameter. If tag is provided, add a filter(models.Contact.tags.contains([tag])) clause (for JSONB) or filter(models.Contact.tags.any(tag)) (for ARRAY).

notes Field:

In models.py, add notes = Column(String, nullable=True) to the Contact model.
In schemas.py, add notes: Optional[str] = None to ContactBase, ContactCreate, and ContactUpdate.
No changes needed for crud.py's create_contact or update_contact if you're using contact.dict() and setattr as implemented, as it will automatically handle new fields present in the schema. Just ensure the fields are in the Pydantic schemas.

This hands-on exercise reinforces the principles of data modeling, API design, and layered architecture. You're not just writing code; you're building a system designed to scale and endure. Good luck, and happy coding!

Component Architecture

Learning Objectives

✓ Design a scalable CRM contact data model with real-world considerations
✓ Understand and apply layered architecture (API, Service, Repository, DB)
✓ Implement CRUD APIs using FastAPI, Pydantic, and SQLAlchemy
✓ Apply data integrity and lifecycle patterns (UUIDs, soft deletes, timestamps)
✓ Extend the model with searchable and flexible fields for future growth

💬 Discuss this topic