URL Shortener

When discussing the system design of a URL shortener like TinyURL or Bit.ly, ensure you cover the following key aspects. Structuring your notes around these will help you stay organized during the interview.


Requirements

Functional Requirements

  • Shorten a given URL and return a unique, shortened version

  • Redirect users to the original URL when they use the shortened URL.

  • Custom alias support

  • Optionally, allow expiration and analytics (clicks, user analytics, etc.).

Non-Functional Requirements

  • Focus on availability over consistency: The service should always be up.

  • Low Latency: Redirection should be fast.

  • Scalability: Support millions or billions of URLs.

  • Durability: Store URLs reliably to prevent data loss.


Estimation and Capacity Planning

  • QPS (Queries Per Second): Estimate reads and writes (e.g., 10:1 ratio).

  • Storage Requirements: Assume average URL length and calculate storage needs (e.g., 1 billion URLs).


Entities

  • Original URL

  • Shortened URL

  • User


API Design

  • POST /shorten -> returns a shortened URL.

    • { originalUrl, customAlias?, expirationTime? }

  • GET {shortUrl}-> Redirects to the original URL.

    • HTTP 302 (temporary redirect) instead of HTTP 301 (permanent redirect which browser caches)

  • Optional APIs for stats or management:

    • GET /stats/{shortUrl}.

b) Database Schema

  • ShortUrl (short_key, original_url, created_at, expiration_date)

  • Index on short_key for fast lookups.

c) URL Shortening Logic

  • Short Key Generation:

    • Use a Base62 encoding scheme (characters a-z, A-Z, 0-9) to keep keys short and human-readable.

      • A 6-character Base62 key can represent over 56 billion unique URLs (62⁶).

    • Counter/Sequence: Increment a global counter and encode the value in Base62.

      • A single Redis instance on modern hardware can handle 80,000 to 100,000 operations per second (OPS) for simple commands like INCR, assuming:

        • Redis is running on dedicated high-performance hardware.

        • The network is low-latency and high-bandwidth.

        • There are no significant memory or disk I/O bottlenecks.

  • Handle collisions in case of duplicate keys.

d) Data Storage

  • Choose a storage system based on scale:

    • Relational Databases: MySQL/PostgreSQL for small-scale systems.

    • NoSQL Databases: DynamoDB, Cassandra, or MongoDB for large-scale systems.

    • In-memory caching (Redis or Memcached) for frequently accessed data.

e) Redirection

  • Use HTTP 301 (Moved Permanently) or 302 (Found) for redirection.


Scaling the System

a) Read/Write Patterns

  • Reads will dominate writes (many more redirections than shortenings).

  • Optimize for high-read throughput using caching.

b) Partitioning and Replication

  • Use database replication for durability, fault tolerance and to allow scaling reads horizontally.

  • Partition data by the short key to distribute load (consistent hashing).

c) Caching

  • Use Redis or Memcached to cache mappings of frequently accessed short_key -> original_url.

d) Rate Limiting

  • Implement rate limiting to prevent abuse (e.g., spamming the shorten API).


High-Level Architecture


Trade-offs and Challenges

  • Key Length: Shorter keys are user-friendly but increase the chance of collisions.

  • Consistency: Balancing consistency and availability (CAP theorem).

  • Redundancy: Ensure data is not lost due to server failures.

  • Performance: Optimize key lookup and redirection latency.

Last updated