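The key to system design is broad knowledge plus thorough consideration. Both normally come from experience, but studying many case studies and common patterns can make up for a lack of it.
1. Answering techniques & strategy
2. Key numbers and metrics
3. Technical topics
4. Question set & classic cases
5. Learning resources
All the courses and materials recommended here can be found on Bilibili, YouTube, Telegram, BitTorrent, or cloud drives; look them up yourself 🈶️ 💯 🆓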
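1. Answering techniques & strategy
Compared with algorithm interviews, system design questions and answers are open-ended and follow no fixed path. The process is more like continuous discovery and discussion than purely finding the answer.
Design Gurus' seven-step approach:
1. Clarify requirements
2. Define the system interface
3. Back-of-the-envelope estimation
4. Define the data model
5. High-level design
6. Detailed design
7. Bottlenecks, open questions & summary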
The 4S analysis method summarized by 九章 (Jiuzhang):
- Scenario: which features need to be designed, and to what extent?
  - who will use
  - how many will use
  - usage pattern
  - use case not covered
  - estimated throughput
  - estimated latency
- Service: split the large system into small services
  - api for read/write scenarios
  - schema
- Storage: how data is stored and accessed
  - data size
  - read/write ratio
  - read/write traffic
  - rdbms/nosql
- Scale: fix shortcomings and handle problems that may come up
  - scaling the algorithm
  - scaling the individual component
  - memory cache
  - DNS, CDN, reverse proxy, load balancer, ...
  - async: message queue, back pressure, time & order, ...
  - communication: tcp, udp, rest, rpc, ...
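2. Key numbers and metrics
There is no need to memorize each individual figure; the numbers below are meant for estimation:
Throughput
Web server QPS: 1,000
RDB QPS per machine: 1,000
NoSQL DB (disk-based) QPS per machine: 10K
In-memory access QPS per machine: 1M
1 million reqs/month --> 0.4 reqs/sec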
2.6 million reqs/month --> 1 reqs/sec
5 million reqs/month --> 2 reqs/sec
10 million reqs/month --> 4 reqs/sec
100 million reqs/month --> 40 reqs/sec
1 billion reqs/month --> 400 reqs/sec
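The conversion table above can be reproduced with a few lines of Python. This is a minimal sketch that assumes a 30-day month; the function name `reqs_per_sec` is made up for illustration.

```python
# Assumption: a month is treated as 30 days (2,592,000 seconds).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def reqs_per_sec(reqs_per_month: float) -> float:
    """Convert a monthly request count into an average per-second rate."""
    return reqs_per_month / SECONDS_PER_MONTH


if __name__ == "__main__":
    for monthly in (1e6, 2.6e6, 5e6, 10e6, 100e6, 1e9):
        print(f"{monthly:>15,.0f} reqs/month ~ {reqs_per_sec(monthly):.1f} reqs/sec")
```

The results are averages; the table rounds them to one significant figure, and real traffic will have peaks well above the average.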
Three factors determine system throughput:
- QPS/TPS
- Concurrency
- Response Time
QPS/TPS = concurrency/average response time
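The formula above can be used in both directions: to estimate throughput from concurrency, or to estimate how much concurrency a target throughput needs. A minimal sketch follows; the function names and the sample numbers are illustrative assumptions, not measurements.

```python
def throughput_qps(concurrency: float, avg_response_time_s: float) -> float:
    """QPS = concurrency / average response time (in seconds)."""
    if avg_response_time_s <= 0:
        raise ValueError("average response time must be positive")
    return concurrency / avg_response_time_s


def required_concurrency(target_qps: float, avg_response_time_s: float) -> float:
    """The same formula rearranged: concurrency = QPS * average response time."""
    return target_qps * avg_response_time_s


if __name__ == "__main__":
    # 50 requests in flight, each taking 0.25 s on average
    print(throughput_qps(50, 0.25))          # 200.0
    # Sustaining 1,000 QPS at 0.25 s per request
    print(required_concurrency(1000, 0.25))  # 250.0
```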
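Latency
CPU access (including CPU cache): 10ns
L1 cache: 0.5ns
L2 cache: 5ns
Main memory access: 100ns
Network latency within the same data center: 1ms
Cross-region network latency: 10ms
International network latency: 100ms
HDD seek: 10ms
HDD access: 10ms; an SSD is roughly 100x faster
HDD throughput: 100 MB/s; an SSD is several times higher
Sequential read of 1MB from memory: 250us
Sequential read of 1MB over the network: 10,000us = 10ms
Sequential read of 1MB from disk: 30,000us = 30ms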
Round-trips within a data center at 2,000 trips/sec.
World-wide round trips at 6-7 trips/sec.
Common estimation metrics include: QPS, peak QPS, storage, cache, and number of servers.
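A back-of-the-envelope estimate of these metrics might look like the sketch below. All inputs (daily active users, requests per user, peak factor, write size, cache ratio) are hypothetical parameters chosen for illustration; only the default of 1,000 QPS per web server comes from the throughput figures above.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Estimate:
    avg_qps: float
    peak_qps: float
    storage_bytes_per_year: float
    cache_bytes: float
    servers: int


def estimate(dau: int, reqs_per_user_per_day: float, peak_factor: float,
             writes_per_user_per_day: float, bytes_per_write: int,
             cache_ratio: float, qps_per_server: int = 1000) -> Estimate:
    """Rough capacity estimate; every input is an assumption to state aloud."""
    avg_qps = dau * reqs_per_user_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    daily_write_bytes = dau * writes_per_user_per_day * bytes_per_write
    return Estimate(
        avg_qps=avg_qps,
        peak_qps=peak_qps,
        storage_bytes_per_year=daily_write_bytes * 365,
        # Assumed rule of thumb: cache a fixed fraction of one day's new data.
        cache_bytes=daily_write_bytes * cache_ratio,
        servers=math.ceil(peak_qps / qps_per_server),
    )


if __name__ == "__main__":
    # Hypothetical service: 10M DAU, 20 requests and 2 writes of 1 KB per user per day
    print(estimate(10_000_000, 20, 3, 2, 1000, 0.2))
```

In an interview the exact numbers matter less than stating each assumption and showing how the result follows from it.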
For example, if a page contains 30 thumbnails of 256 KB each and they are read sequentially, the time spent on disk is:
30 seeks * 10 ms/seek + 30 * 256 KB / 30 MB/s = 560 ms
If they are read in parallel:
10 ms/seek + 256 KB / 30 MB/s = 18 ms
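The thumbnail arithmetic above can be checked with a short script. This sketch assumes 1 MB = 1000 KB and, for the parallel case, that every image can be read at the same time at the full 30 MB/s (for example, each from a different disk); the function names are illustrative.

```python
def transfer_ms(file_kb: float, disk_mb_per_s: float) -> float:
    """Time to stream one file, assuming 1 MB = 1000 KB."""
    return file_kb / 1000 / disk_mb_per_s * 1000


def sequential_read_ms(n_files: int, file_kb: float,
                       seek_ms: float = 10.0, disk_mb_per_s: float = 30.0) -> float:
    """Every file pays one seek plus its own transfer, one after another."""
    return n_files * (seek_ms + transfer_ms(file_kb, disk_mb_per_s))


def parallel_read_ms(file_kb: float,
                     seek_ms: float = 10.0, disk_mb_per_s: float = 30.0) -> float:
    """All files are read at once, so total time is that of a single file."""
    return seek_ms + transfer_ms(file_kb, disk_mb_per_s)


if __name__ == "__main__":
    print(f"sequential: {sequential_read_ms(30, 256):.0f} ms")
    print(f"parallel:   {parallel_read_ms(256):.1f} ms")
```

The script gives about 556 ms and 18.5 ms, which the text rounds to 560 ms and 18 ms.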
End-to-end latency = client processing latency + Network latency + server processing latency
High availability
System Availability = Uptime ÷ (Uptime + Downtime)
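To make the availability formula concrete, the sketch below converts between uptime/downtime and an availability ratio, and shows how much downtime a given number of "nines" allows. It assumes a 365-day year; the function names are illustrative.

```python
def availability(uptime: float, downtime: float) -> float:
    """System Availability = Uptime / (Uptime + Downtime), same unit for both."""
    return uptime / (uptime + downtime)


def allowed_downtime_minutes(target: float, period_days: int = 365) -> float:
    """Downtime budget, in minutes, for a target availability over a period."""
    return (1 - target) * period_days * 24 * 60


if __name__ == "__main__":
    print(availability(999, 1))  # 0.999
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} -> {allowed_downtime_minutes(target):.1f} min/year")
```

For instance, 99.9% availability leaves roughly 526 minutes (under 9 hours) of downtime per year, and 99.99% leaves roughly 53 minutes.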
SLA, SLI, SLO
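An SLA is usually a fairly business-oriented metric. The relationship among the three can be explained with an analogy: the SLA is your credit card limit, the SLO is your budget (the SLA determines the SLO), and the SLIs are your actual expenses, which you monitor to check whether you are staying within budget. Common SLIs include:
- uptime of service
- number of transactions
- latency
- error rate
- throughput
- response time
- durability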
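As a rough illustration of SLIs being measured against an SLO, the sketch below computes two SLIs from the list above (error rate and latency) from a batch of request records and compares them with target values. The `Request` type, the nearest-rank percentile method, and the thresholds are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    latency_ms: float
    ok: bool


def error_rate(requests: List[Request]) -> float:
    """SLI: fraction of requests that failed."""
    if not requests:
        raise ValueError("no requests to measure")
    return sum(not r.ok for r in requests) / len(requests)


def latency_percentile(requests: List[Request], pct: float) -> float:
    """SLI: latency at the given percentile, using the nearest-rank method."""
    if not requests:
        raise ValueError("no requests to measure")
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(requests: List[Request], max_error_rate: float, max_p99_ms: float) -> bool:
    """Compare the measured SLIs against the SLO targets (the 'budget')."""
    return (error_rate(requests) <= max_error_rate
            and latency_percentile(requests, 99) <= max_p99_ms)


if __name__ == "__main__":
    # 100 sample requests with latencies 1..100 ms; only the slowest one fails.
    sample = [Request(latency_ms=i, ok=(i != 100)) for i in range(1, 101)]
    print(error_rate(sample), latency_percentile(sample, 99))
    print(meets_slo(sample, max_error_rate=0.01, max_p99_ms=100))
```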
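3. Technical topics
CDN, Cache, Proxy, Load Balancer, API Gateway, Service Mesh, REST, GraphQL, RPC, Message Queue, Pub/Sub, Pull/Push, Vertical Scaling, Horizontal Scaling, Sharding, Partitioning, NoSQL, Indexing, Key-Value, multi-datacenter replication, Consistent Hashing, Distributed ID/UUID, ACID, CAP, Paxos, Raft. Fully expanded, the scope is very broad; the only way through is to build up knowledge over time and tackle the topics one by one:
The typical architecture of an Internet application: architectures share a lot in common, so understanding one at least gives you the general direction. See 👉 淘宝服务端高并发分布式架构演进之路 (The Evolution of Taobao's High-Concurrency Distributed Server-Side Architecture)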
4. Question set & classic cases
If you are cramming, or already have architecture experience, I think carefully studying the cases is enough:
Problem / Case | GitHub | YouTube | My solution ⏳ |
---|---|---|---|
Airbnb (sharing service) | | | |
Amazon, eBay, Shopify (e-commerce) | Solution | codeKarle | |
Dropbox (network file storage) | Dropbox Google Doc S3 | | |
Facebook Newsfeed | | | |
Facebook Social | | | |
Instagram (photo sharing) | Solution | Gaurav Sen | |
Parking Lot | | | |
Search Engine | | | |
TikTok (short video sharing) | | | |
Tinder (dating site) | | | |
TinyUrl, Bit.ly (short links) | Solution | Tech Dummies Narendra L | |
Solution | 花花酱 Tech Dummies Narendra L | | |
Uber, Yelp (location-based services) | Tech Dummies Narendra L | | |
Web Crawler | Solution | Tech Dummies Narendra L | |
WhatsApp, Facebook Messenger, WeChat | Gaurav Sen | | |
YouTube, Netflix, Bilibili | 花花酱 Gaurav Sen Tech Dummies Narendra L | | |
Zoom | codeKarle | | |
API Rate Limiter | Solution | Gaurav Sen | |
Limited Order Book | Solution | | |
5. Learning resources
System design courses (English)
Design Gurus:We are a team of senior engineers and managers from Facebook, Google, Microsoft, Lyft, and Amazon
Educative: Scalability & System Design for Developers by Design Gurus
As you progress in your career as a developer, you’ll be increasingly expected to think about software architecture. Can you design systems and make trade-offs at scale? Developing that muscle is a great way to set yourself apart from the pack. In this learning path, you’ll cover everything you need to know to design scalable systems for enterprise-level software. Buckle in.
Educative: Grokking the System Design Interview by Design Gurus
System design questions have become a standard part of the software engineering interview process. Performance in these interviews reflects upon your ability to work with complex systems and translates into the position and salary the interviewing company offers you. Unfortunately, most engineers struggle with SDI, partly because of their lack of experience in developing large-scale systems and partly because of the unstructured nature of SDIs. Even engineers who’ve some experience building such systems aren’t comfortable with these interviews, mainly due to the open-ended nature of design problems that don’t have a standard answer.
This course is a complete guide to master SDIs. It is created by hiring managers who’ve been working at Google, Facebook, Microsoft, and Amazon. We’ve carefully chosen a set of questions that have been repeatedly asked at top companies.
Educative: Grokking the Advanced System Design Interview by Design Gurus
System design questions have increasingly become an integral part of software engineering interviews. For senior engineers, the discussion around system design is considered even more important than solving a coding question. In a system design interview, you can show your real design skills and show how they will work with designing complex systems. It is a given that a good performance in system design interviews will get you a senior position and result in higher salaries.
This course presents the architectural review of famous distributed systems. The main goal is to extract out important design details that are relevant to system design interviews. The course also presents a list of system design patterns that constitute the common design problems and their solutions that different distributed systems have developed over time.
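System design courses (Chinese)
Drawing on big-data research from Lagou recruitment, this column covers, one by one, six areas that top-company interviews test: architecture principles, distributed technology, middleware, databases, caching, and business system architecture. Using concrete interview scenarios, it shares practical experience step by step, from case background and case analysis to principle breakdown and answering methods.
九章 (Jiuzhang) System Architecture Design: System Design, 2021 edition
YouTube video
A must-take course for becoming a million-salary architect: 34 lessons to quickly master 16 major system architecture design topics and interview focus points
X Code: System Design
YouTube video
古城算法 (Gucheng Algorithm): System Design
YouTube video
GitHub resources
Resources on GitHub. Besides interview preparation, they are also a treasure trove for learning architecture 💎: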