Entity-Relationship Diagrams: A Practical Guide to Database Design

Learn how to create ER diagrams for database modeling. Covers entities, attributes, relationships, cardinality, Chen vs Crow's Foot notation, and real-world schema examples.

10 分钟阅读

A database schema written without a diagram is a guess. You might get the tables right the first time, but you will almost certainly miss a relationship, misplace a foreign key, or build a structure that resists the queries your application actually needs. Entity-relationship diagrams — ER diagrams — are the standard tool for designing and communicating database structure before writing a single line of SQL.

ER diagrams are useful not just for initial design but throughout a system's lifecycle: onboarding engineers who need to understand an existing schema, planning migrations, identifying normalization problems, and documenting the data model for non-technical stakeholders.

What is an ER diagram?

An ER diagram (entity-relationship diagram) is a visual representation of the entities in a database and the relationships between them. It shows what data the system stores, what attributes each entity has, and how entities relate to one another.

The concept was introduced by Peter Chen in 1976 and has been extended and adapted by database tools and practitioners ever since. Two notation styles dominate modern practice: Chen notation (conceptual design) and Crow's Foot notation (implementation-focused, used by most modern tools).

ER diagrams operate at different levels of abstraction:

Level Purpose Audience
Conceptual High-level entities and relationships Business stakeholders, PMs
Logical Attributes, keys, cardinality added Database architects
Physical Data types, indexes, foreign keys, SQL-ready Developers, DBAs

Key concepts

Entities

An entity is a thing about which the database stores information. It maps to a table in the implemented database.

  • Strong entity: Exists independently. Has its own primary key. Example: Customer, Product, Order
  • Weak entity: Depends on another entity for its existence. Example: OrderLineItem depends on Order

Entity naming conventions:

  • Use singular nouns (Order, not Orders)
  • PascalCase is common for entity names
  • Name what the row represents, not the collection

Attributes

Attributes are properties of an entity. They map to columns in the implemented table.

Attribute Type Description Example
Simple Single, atomic value firstName, price
Composite Made up of sub-attributes address = street + city + zip
Derived Computed from other attributes age derived from birthDate
Multi-valued Can have multiple values phoneNumbers (a person can have several)
Key attribute Uniquely identifies an entity instance customerId, email

In physical ER diagrams, specify data types alongside attribute names:

Customer
├── customerId  INT PRIMARY KEY
├── email       VARCHAR(255) UNIQUE NOT NULL
├── firstName   VARCHAR(100) NOT NULL
├── lastName    VARCHAR(100) NOT NULL
├── createdAt   TIMESTAMP DEFAULT NOW()
└── status      ENUM('active', 'inactive', 'banned')

Relationships

A relationship defines how two entities are associated. It maps to a foreign key constraint (or a junction table for many-to-many relationships) in the implemented schema.

Relationships have:

  • Name: describes the association (places, contains, belongs to)
  • Cardinality: how many instances of each entity can participate
  • Participation: whether participation is mandatory (total) or optional (partial)

Cardinality

Cardinality is the most important concept in ER modeling. It defines the numerical relationship between entity instances.

Cardinality Meaning Example
One-to-One One A instance relates to exactly one B instance User has one Profile
One-to-Many One A instance relates to many B instances Customer places many Orders
Many-to-One Many A instances relate to one B instance Many Orders belong to one Customer
Many-to-Many Many A instances relate to many B instances Students enroll in many Courses

Many-to-many relationships cannot be implemented directly in a relational database. They require a junction table (also called a join table or associative entity):

Student ──< Enrollment >── Course

The Enrollment table holds the foreign keys to both Student and Course, plus any attributes of the enrollment itself (like enrollmentDate or grade).

Notation styles

Chen notation

Chen notation is the original academic notation. It uses geometric shapes:

  • Rectangle: Entity
  • Ellipse: Attribute
  • Diamond: Relationship
  • Line: Connection
  • Double ellipse: Multi-valued attribute
  • Dashed ellipse: Derived attribute
  • Double rectangle: Weak entity
  • Double diamond: Identifying relationship (for weak entities)

Chen notation ASCII example:

 (customerId)   (email)    (firstName)
      |             |           |
      └─────────────┴───────────┘
                    |
              ┌─────────┐
              │Customer │
              └────┬────┘
                   │
              <  places  >
                   │
              ┌─────────┐
              │  Order  │
              └─────────┘

Chen notation is useful for conceptual modeling because it makes relationships and attributes explicit. It becomes cluttered for large schemas.

Crow's Foot notation

Crow's Foot notation is the dominant style in modern database tools (MySQL Workbench, dbdiagram.io, ERDPlus). It encodes cardinality directly on the relationship line using symbols at each end.

Crow's Foot symbols:

Symbol at line end   Meaning
─────────────────    ──────────────────────────────
      ──|            Exactly one (mandatory)
      ──o            Zero or one (optional)
      ──<            Many (one or more)
      ──o<           Zero or more (optional many)
      ──|<           One or more (mandatory many)
      ──||           Exactly one (on both sides)

Crow's Foot example — e-commerce relationships:

Customer ||──o< Order ||──|< OrderLineItem >|──|| Product

Reading this: a Customer has zero or more Orders; each Order has one or more OrderLineItems; each OrderLineItem references exactly one Product.

Full entity with Crow's Foot:

┌─────────────┐       ┌─────────────┐       ┌──────────────────┐
│  Customer   │       │    Order    │       │  OrderLineItem   │
│─────────────│       │─────────────│       │──────────────────│
│ PK id       │||──o<─│ PK id       │||──|<─│ PK id            │
│    email    │       │ FK customerId│      │ FK orderId       │
│    name     │       │    status   │       │ FK productId     │
│    createdAt│       │    total    │       │    quantity      │
└─────────────┘       │    createdAt│       │    unitPrice     │
                      └─────────────┘       └──────────────────┘

Crow's Foot is preferred for logical and physical ER diagrams because it is more compact than Chen notation and directly reflects the SQL schema structure.

ER diagram in the database design process

ER diagrams fit into database design at multiple stages:

1. Requirements gathering Work with stakeholders to identify the things the system needs to track. Each noun in the requirements is a potential entity. Each fact about a noun is a potential attribute.

2. Conceptual design Draw entities and relationships without worrying about implementation details. Focus on what the data means, not how it is stored.

3. Logical design Add attributes, define primary keys, resolve many-to-many relationships into junction tables, specify cardinality.

4. Physical design Add data types, indexes, constraints, and NULL/NOT NULL specifications. The ER diagram at this stage should map directly to SQL DDL.

5. Review and refinement Validate the schema against actual queries: Can you retrieve the data your application needs efficiently? Does the schema support all required operations?

Step-by-step creation guide

Step 1: Identify entities

Read through the requirements or user stories and highlight nouns that represent distinct concepts the system tracks. Start with the obvious ones.

For an e-commerce system:

  • Customer, Product, Order, Category, Review, Address, Payment

Rules of thumb:

  • If it has multiple instances, it is probably an entity (not an attribute)
  • If it has its own attributes, it is probably an entity
  • If other entities refer to it independently, it is an entity

Step 2: Define attributes for each entity

For each entity, list its properties. Identify:

  • The primary key (unique identifier, system-generated or natural)
  • Required attributes (NOT NULL in SQL)
  • Optional attributes
  • Attributes that might be composite (address) or multi-valued (tags)

Avoid storing derived data unless performance demands it. age is derivable from birthDate; storing both creates consistency problems.

Step 3: Map relationships

For each pair of entities that interact, define the relationship:

  1. Name the relationship (verb phrase from one entity's perspective)
  2. Determine cardinality (one-to-one, one-to-many, many-to-many)
  3. Determine participation (is each side mandatory or optional?)

Write it as a sentence and verify it makes sense in both directions:

  • "A Customer places zero or more Orders"
  • "An Order is placed by exactly one Customer"

Step 4: Resolve many-to-many relationships

Every many-to-many relationship becomes a junction table. Identify attributes of the relationship itself (if any) — these become columns in the junction table.

Student ──< Enrollment >── Course

Enrollment attributes:
- enrollmentDate
- grade
- status (active, withdrawn, completed)

Step 5: Assign data types and constraints

For each attribute, specify:

  • Data type (INT, VARCHAR(n), DECIMAL(10,2), TIMESTAMP, BOOLEAN)
  • NULL / NOT NULL
  • Default values
  • Unique constraints
  • Check constraints (e.g., price > 0)

Step 6: Validate against queries

Write out the key queries your application will run and trace them through the schema. If a query requires multiple joins that could have been avoided with better design, revisit the schema.

Check:

  • Can you retrieve a customer's full order history?
  • Can you find all products in a category?
  • Can you calculate order totals from line items?
  • Can you generate revenue reports by date range?

Normalization considerations

Normalization is the process of structuring a schema to reduce data redundancy and improve integrity. The three most commonly applied normal forms:

First Normal Form (1NF): Each column contains atomic values. No repeating groups. Each row is uniquely identifiable.

Second Normal Form (2NF): Meets 1NF. Every non-key attribute is fully dependent on the entire primary key (relevant for composite keys).

Third Normal Form (3NF): Meets 2NF. No transitive dependencies — non-key attributes depend only on the primary key, not on other non-key attributes.

Example of a 3NF violation: storing customerCity in the Order table. City depends on the customer, not the order. If the customer moves, you must update every order record.

When to denormalize: Normalization optimizes for write consistency. Read-heavy analytical workloads sometimes benefit from denormalized structures (materialized views, reporting tables) that avoid expensive joins. Make this a deliberate decision, not an accident.

Common mistakes

Using natural keys as primary keys. Email addresses change. Social security numbers should not be stored. Names are not unique. Use surrogate keys (auto-increment integers or UUIDs) as primary keys and add unique constraints to natural identifiers.

Storing lists in a single column. A tags column containing comma-separated values violates 1NF. Use a junction table (ProductTag) instead.

Missing junction table attributes. When resolving a many-to-many relationship, teams often create a bare junction table with only two foreign keys. Think about attributes that belong to the relationship itself — enrollment date, order quantity, permission level.

Conflating entities and attributes. Address in a simple system might be attributes on Customer. In a system where multiple customers share an address, or where addresses have their own lifecycle, Address should be a separate entity.

No consideration for soft deletes. Many systems need to retain records after "deletion" for audit or regulatory reasons. Decide upfront whether to use a deletedAt timestamp or an isActive flag, and apply consistently.

Over-normalizing for small datasets. Extracting every repeated string into a lookup table adds complexity without benefit for datasets that will never grow beyond thousands of rows. Apply judgment, not mechanical rules.

Ignoring index design. The ER diagram shows structure, not performance. After finalizing the schema, identify the columns that will be used in WHERE clauses and JOIN conditions and add indexes to them.

Real-world example: e-commerce schema

The following schema covers the core entities of a typical e-commerce system.

┌──────────────┐     ┌──────────────┐     ┌───────────────────┐
│   Customer   │     │    Order     │     │   OrderLineItem   │
│──────────────│     │──────────────│     │───────────────────│
│ PK id        │     │ PK id        │     │ PK id             │
│    email     │||─o<│ FK customerId│||─|<│ FK orderId        │
│    firstName │     │ FK addressId │     │ FK productId      │
│    lastName  │     │    status    │     │    quantity       │
│    createdAt │     │    subtotal  │     │    unitPrice      │
│    status    │     │    tax       │     └────────┬──────────┘
└──────────────┘     │    shipping  │              │
                     │    total     │         exactly one
                     │    createdAt │              │
                     └──────────────┘     ┌────────┴──────────┐
                                          │      Product      │
┌──────────────┐                          │───────────────────│
│   Address    │                          │ PK id             │
│──────────────│                          │ FK categoryId     │
│ PK id        │                          │    name           │
│ FK customerId│                          │    description    │
│    line1     │                          │    sku            │
│    line2     │                          │    price          │
│    city      │                          │    stockQty       │
│    state     │                          │    isActive       │
│    zip       │                          └────────┬──────────┘
│    country   │                                   │
│    isDefault │                              many │ belongs to
└──────────────┘                                   │ one
                                          ┌────────┴──────────┐
┌──────────────┐                          │     Category      │
│   Payment    │                          │───────────────────│
│──────────────│                          │ PK id             │
│ PK id        │                          │ FK parentId       │
│ FK orderId   │                          │    name           │
│    method    │                          │    slug           │
│    amount    │                          └───────────────────┘
│    status    │
│    processedAt│
└──────────────┘

Key design decisions in this schema:

  • Address is a separate entity (customers can have multiple addresses; orders reference the address used at purchase time)
  • OrderLineItem.unitPrice is stored separately from Product.price (prices change; order history must reflect what was charged)
  • Category has a self-referencing parentId for hierarchical categories
  • Payment is separate from Order because an order might involve multiple payment attempts

Creating ER diagrams with Flowova

Designing a schema by hand or in a plain text editor is slow. Flowova's ER diagram generator lets you describe your data model in plain language and produces a structured diagram to refine. Start with your entity list, describe the relationships between them, and get a visual schema you can share with your team for review.

Conclusion

ER diagrams are the foundation of good database design. They force you to be explicit about what data the system stores, what makes each record unique, and how entities relate to one another — before those decisions become expensive to change in SQL. Start at the conceptual level with entities and relationships, move to the logical level by adding attributes and resolving many-to-many joins, then add data types and constraints for the physical design. Validate against real queries, apply normalization where it matters, and document the schema so the next engineer can understand it without reading the migration files.

相关文章