URL Shortener
When discussing the system design of a URL shortener like TinyURL or Bit.ly, ensure you cover the following key aspects. Structuring your notes around these will help you stay organized during the interview.
Requirements
Functional Requirements
Shorten a given URL and return a unique, shortened version
Redirect users to the original URL when they use the shortened URL.
Custom alias support
Optionally, allow expiration and analytics (clicks, user analytics, etc.).
Non-Functional Requirements
Focus on availability over consistency: The service should always be up.
Low Latency: Redirection should be fast.
Scalability: Support millions or billions of URLs.
Durability: Store URLs reliably to prevent data loss.
Estimation and Capacity Planning
QPS (Queries Per Second): Estimate reads and writes (e.g., 10:1 ratio).
Storage Requirements: Assume average URL length and calculate storage needs (e.g., 1 billion URLs).
Entities
Original URL
Shortened URL
User
API Design
POST
/shorten
-> returns a shortened URL.{ originalUrl, customAlias?, expirationTime? }
GET
{shortUrl}
-> Redirects to the original URL.HTTP 302 (temporary redirect) instead of HTTP 301 (permanent redirect which browser caches)
Optional APIs for stats or management:
GET
/stats/{shortUrl}
.
b) Database Schema
ShortUrl (short_key, original_url, created_at, expiration_date)
Index on
short_key
for fast lookups.
c) URL Shortening Logic
Short Key Generation:
Use a Base62 encoding scheme (characters
a-z
,A-Z
,0-9
) to keep keys short and human-readable.A 6-character Base62 key can represent over 56 billion unique URLs (62⁶).
Counter/Sequence: Increment a global counter and encode the value in Base62.
A single Redis instance on modern hardware can handle 80,000 to 100,000 operations per second (OPS) for simple commands like
INCR
, assuming:Redis is running on dedicated high-performance hardware.
The network is low-latency and high-bandwidth.
There are no significant memory or disk I/O bottlenecks.
Handle collisions in case of duplicate keys.
d) Data Storage
Choose a storage system based on scale:
Relational Databases: MySQL/PostgreSQL for small-scale systems.
NoSQL Databases: DynamoDB, Cassandra, or MongoDB for large-scale systems.
In-memory caching (Redis or Memcached) for frequently accessed data.
e) Redirection
Use HTTP 301 (Moved Permanently) or 302 (Found) for redirection.
Scaling the System
a) Read/Write Patterns
Reads will dominate writes (many more redirections than shortenings).
Optimize for high-read throughput using caching.
b) Partitioning and Replication
Use database replication for durability, fault tolerance and to allow scaling reads horizontally.
Partition data by the short key to distribute load (consistent hashing).
c) Caching
Use Redis or Memcached to cache mappings of frequently accessed
short_key -> original_url
.
d) Rate Limiting
Implement rate limiting to prevent abuse (e.g., spamming the shorten API).
High-Level Architecture
Trade-offs and Challenges
Key Length: Shorter keys are user-friendly but increase the chance of collisions.
Consistency: Balancing consistency and availability (CAP theorem).
Redundancy: Ensure data is not lost due to server failures.
Performance: Optimize key lookup and redirection latency.
Last updated