URL Shortener

When discussing the system design of a URL shortener like TinyURL or Bit.ly, ensure you cover the following key aspects. Structuring your notes around these will help you stay organized during the interview.

Requirements

Functional Requirements

Shorten a given URL and return a unique, shortened version
Redirect users to the original URL when they use the shortened URL.
Custom alias support
Optionally, allow expiration and analytics (clicks, user analytics, etc.).

Non-Functional Requirements

Focus on availability over consistency: The service should always be up.
Low Latency: Redirection should be fast.
Scalability: Support millions or billions of URLs.
Durability: Store URLs reliably to prevent data loss.

Estimation and Capacity Planning

QPS (Queries Per Second): Estimate reads and writes (e.g., 10:1 ratio).
Storage Requirements: Assume average URL length and calculate storage needs (e.g., 1 billion URLs).

Entities

Original URL
Shortened URL
User

API Design

POST /shorten -> returns a shortened URL.
- { originalUrl, customAlias?, expirationTime? }
GET {shortUrl}-> Redirects to the original URL.
- HTTP 302 (temporary redirect) instead of HTTP 301 (permanent redirect which browser caches)
Optional APIs for stats or management:
- GET /stats/{shortUrl}.

b) Database Schema

ShortUrl (short_key, original_url, created_at, expiration_date)
Index on short_key for fast lookups.

c) URL Shortening Logic

Short Key Generation:
- Use a Base62 encoding scheme (characters a-z, A-Z, 0-9) to keep keys short and human-readable.
  - A 6-character Base62 key can represent over 56 billion unique URLs (62⁶).
- Counter/Sequence: Increment a global counter and encode the value in Base62.
  - A single Redis instance on modern hardware can handle 80,000 to 100,000 operations per second (OPS) for simple commands like INCR, assuming:
    Redis is running on dedicated high-performance hardware.
    The network is low-latency and high-bandwidth.
    There are no significant memory or disk I/O bottlenecks.
Handle collisions in case of duplicate keys.

d) Data Storage

Choose a storage system based on scale:
- Relational Databases: MySQL/PostgreSQL for small-scale systems.
- NoSQL Databases: DynamoDB, Cassandra, or MongoDB for large-scale systems.
- In-memory caching (Redis or Memcached) for frequently accessed data.

e) Redirection

Use HTTP 301 (Moved Permanently) or 302 (Found) for redirection.

Scaling the System

a) Read/Write Patterns

Reads will dominate writes (many more redirections than shortenings).
Optimize for high-read throughput using caching.

b) Partitioning and Replication

Use database replication for durability, fault tolerance and to allow scaling reads horizontally.
Partition data by the short key to distribute load (consistent hashing).

c) Caching

Use Redis or Memcached to cache mappings of frequently accessed short_key -> original_url.

d) Rate Limiting

Implement rate limiting to prevent abuse (e.g., spamming the shorten API).

High-Level Architecture

Trade-offs and Challenges

Key Length: Shorter keys are user-friendly but increase the chance of collisions.
Consistency: Balancing consistency and availability (CAP theorem).
Redundancy: Ensure data is not lost due to server failures.
Performance: Optimize key lookup and redirection latency.

PreviousAdvanced Topics

Last updated 7 months ago

Was this helpful?