# URL Shortener

When discussing the system design of a URL shortener like TinyURL or Bit.ly, ensure you cover the following key aspects. Structuring your notes around these will help you stay organized during the interview.

***

## Requirements

### **Functional Requirements**

* Shorten a given URL and return a unique, shortened version
* Redirect users to the original URL when they use the shortened URL.
* Custom alias support
* Optionally, allow expiration and analytics (clicks, user analytics, etc.).

### **Non-Functional Requirements**

* **Focus on availability over consistency**: The service should always be up.
* **Low Latency**: Redirection should be fast.
* **Scalability**: Support millions or billions of URLs.
* **Durability**: Store URLs reliably to prevent data loss.

***

## **Estimation and Capacity Planning**

* **QPS (Queries Per Second)**: Estimate reads and writes (e.g., 10:1 ratio).
* **Storage Requirements**: Assume average URL length and calculate storage needs (e.g., 1 billion URLs).

***

## Entities

* Original URL
* Shortened URL
* User

***

## **API Design**

* **POST `/shorten`** -> returns a shortened URL.
  * { originalUrl, customAlias?, expirationTime? }
* **GET `{shortUrl}`**->  Redirects to the original URL.
  * **HTTP 302 (temporary redirect)** instead of HTTP 301 (permanent redirect which browser caches)
* Optional APIs for stats or management:
  * **GET `/stats/{shortUrl}`**.

**b) Database Schema**

* `ShortUrl (short_key, original_url, created_at, expiration_date)`
* Index on `short_key` for fast lookups.

**c) URL Shortening Logic**

* **Short Key Generation**:
  * Use a **Base62** encoding scheme (characters `a-z`, `A-Z`, `0-9`) to keep keys short and human-readable.
    * A 6-character Base62 key can represent over 56 billion unique URLs (62⁶).
  * **Counter/Sequence**: Increment a global counter and encode the value in Base62.
    * A single Redis instance on modern hardware can handle **80,000 to 100,000 operations per second (OPS)** for simple commands like `INCR`, assuming:
      * Redis is running on dedicated high-performance hardware.
      * The network is low-latency and high-bandwidth.
      * There are no significant memory or disk I/O bottlenecks.
* Handle **collisions** in case of duplicate keys.

**d) Data Storage**

* Choose a storage system based on scale:
  * **Relational Databases**: MySQL/PostgreSQL for small-scale systems.
  * **NoSQL Databases**: DynamoDB, Cassandra, or MongoDB for large-scale systems.
  * In-memory caching (Redis or Memcached) for frequently accessed data.

**e) Redirection**

* Use **HTTP 301 (Moved Permanently)** or **302 (Found)** for redirection.

***

## **Scaling the System**

**a) Read/Write Patterns**

* Reads will dominate writes (many more redirections than shortenings).
* Optimize for high-read throughput using caching.

**b) Partitioning and Replication**

* Use database replication for durability, fault tolerance and to allow scaling reads horizontally.
* Partition data by the short key to distribute load (consistent hashing).

**c) Caching**

* Use Redis or Memcached to cache mappings of frequently accessed `short_key -> original_url`.

**d) Rate Limiting**

* Implement rate limiting to prevent abuse (e.g., spamming the shorten API).

***

## **High-Level Architecture**

<figure><img src="https://1588585907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MTwgToRvLjYdjfpAVgP%2Fuploads%2FEIWBBvCwn9nUMo23xNmD%2Fimage.png?alt=media&#x26;token=11c091f5-9749-4715-9cb6-931a8492c49b" alt=""><figcaption></figcaption></figure>

***

## **Trade-offs and Challenges**

* **Key Length**: Shorter keys are user-friendly but increase the chance of collisions.
* **Consistency**: Balancing consistency and availability (CAP theorem).
* **Redundancy**: Ensure data is not lost due to server failures.
* **Performance**: Optimize key lookup and redirection latency.
