Designing Data-Intensive Applications - 2. Data Models and Query Languages

Open Table Of Contents

Driving Forces Behind NoSQL Databases
Relational vs. Document Databases
Representing Complex Data Structures
- Handling One-to-Many Relationships
- Example: LinkedIn Profile Representation
Schema Approaches
- Schema-on-Write (Relational)
- Schema-on-Read (Document)
When to Choose Relational Databases
When to Choose Document Databases
Query Languages
- SQL Advantages
- Alternative Query Approaches
Graph Data Model
Practical Considerations
- Performance Limitations
Future of Data Models

Driving Forces Behind NoSQL Databases

NoSQL databases emerged due to several key motivations:

Greater scalability beyond relational database capabilities, requiring high throughput
Preference for free and open-source software
Support for specialized query operations not well-supported by relational model
Desire for more dynamic and flexible data models than restrictive SQL schemas

Relational vs. Document Databases

Relational Model Characteristics

Hides implementation details behind a clean interface
Generalizes well across diverse computing purposes
Uses normalized tables with foreign key references
Provides robust support for joins and complex relationships

Some relational databases

Oracle
IBM DB2
Microsoft SQL Server
PostgreSQL
MySQL

Document Model Characteristics

Offers schema flexibility
Provides better performance through data locality
Closer alignment with application data structures
Native support for one-to-many relationships

Some document databases

MongoDB
Cassandra
Apache CouchDB

The Object-Relational Impedance Mismatch

A key criticism of relational databases is the translation layer required between:

Object-oriented programming languages
Relational database structure (tables, rows, columns)

Representing Complex Data Structures

Handling One-to-Many Relationships

Relational databases manage these through:

Separate tables with foreign key references
Structured datatypes (XML, JSON)
Encoded documents within text columns

Example: LinkedIn Profile Representation

Relational Approach: Multiple normalized tables

Document Approach: Single JSON document with embedded data

{
    "user_id": 251,
    "first_name": "Bill",
    "last_name": "Gates",
    "positions": [
        {"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
        {"job_title": "Co-founder, Chairman", "organization": "Microsoft"}
    ],
    "education": [
        {"school_name": "Harvard University", "start": 1973, "end": 1975},
        {"school_name": "Lakeside School, Seattle"}
    ]
}

Schema Approaches

Schema-on-Write (Relational)

Explicit schema enforced by the database
Data must conform to predefined structure
Rigorous data integrity

Schema-on-Read (Document)

Implicit schema interpreted when data is read
More flexible and adaptable
Structure not enforced at the database level

When to Choose Relational Databases

Complex Relationships and Joins
Strong Data Integrity and Consistency
ACID Transactions
Many-to-one/Many-to-many relationships (If you have many many-to-many relationships, you may also consider graph databases)

When to Choose Document Databases

Schema changes frequently
Entire document needs to be accessed in a single read
Data has a document-like structure (tree of one-to-many relationships)

Query Languages

SQL Advantages

Declarative nature
Specifies result pattern, not querying mechanism
Easier to parallelize across multiple machines

Alternative Query Approaches

MapReduce: Low-level distributed execution model
Cypher: Declarative query language for graph databases

Graph Data Model

Particularly suitable for:

Data with many-to-many relationships
Complex interconnected data structures
Applications requiring advanced relationship tracking

Practical Considerations

Performance Limitations

Keep documents relatively small
Avoid writes that significantly increase document size
Be cautious of potential performance bottlenecks

Future of Data Models

A hybrid approach is emerging, with:

Relational and document models converging
Polyglot persistence (using multiple database types)
Increased flexibility and specialized data storage solutions (graph databases)

Contribute to this article here.