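Written jointly by Turing Award laureates Patterson and Hennessy, this book is a work for the new golden age of computer architecture. In response to reader requests, this edition switches from RV64 to RV32, which removes 10 instructions and makes the material easier to learn. It adds a discussion of domain-specific architectures (DSAs), using Google's TPUv1 as the example, along with a comparison of the TPUv3 DSA supercomputer and an NVIDIA Volta GPU cluster. Every chapter gains a "Going Faster" section. These sections apply data-level parallelism, instruction-level parallelism, thread-level parallelism, and other techniques in turn, speeding up a matrix multiply program by nearly 50,000 times with only 21 added lines of code, a direct demonstration of how much hardware matters for improving efficiency.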
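David A. Patterson
He has taught computer architecture at the University of California, Berkeley, since joining its faculty in 1977, and held the Pardee Chair of Computer Science there. His teaching has been recognized with the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von Neumann Medal and the C&C Prize with Hennessy. Like Hennessy, Patterson is a member of the National Academy of Engineering, the National Academy of Sciences, and the American Academy of Arts and Sciences, a Fellow of the Computer History Museum, ACM, and IEEE, and an inductee of the Silicon Valley Engineering Hall of Fame. He served as chair of the Computer Science Division in the Department of Electrical Engineering and Computer Science (EECS) at Berkeley, as chair of the Computing Research Association (CRA), and as President of ACM. This record led to Distinguished Service Awards from ACM, CRA, and SIGARCH. He received the Tapia Achievement Award for contributions to science outreach and diversifying computing, and he shared the 2017 ACM Turing Award with Hennessy.
At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer and the foundation of the commercial SPARC architecture. He was also a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led many companies to develop dependable storage systems. He took part in the Network of Workstations (NOW) project as well, which gave rise to the cluster technology widely used by Internet companies and later to cloud computing. These projects earned four dissertation awards from ACM. In 2016, he became Professor Emeritus at Berkeley and a Distinguished Engineer at Google, where he works on domain-specific architectures for machine learning. He is also Vice Chair of RISC-V International and Director of the RISC-V International Open Source Laboratory.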
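John L. Hennessy
The tenth president of Stanford University, he has been a member of its faculty since 1977 in the departments of electrical engineering and computer science. Hennessy is a Fellow of IEEE and ACM and a member of the National Academy of Engineering, the National Academy of Sciences, the American Philosophical Society, and the American Academy of Arts and Sciences. His many awards include the 2001 ACM Eckert-Mauchly Award for contributions to RISC, the 2001 Seymour Cray Computer Engineering Award, the 2000 IEEE John von Neumann Medal shared with Patterson, and the 2017 ACM Turing Award, also shared with Patterson. He has also received seven honorary doctorates.
In 1981, Hennessy started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems (now MIPS Technologies), which developed one of the first commercial RISC microprocessors. By 2006, more than 2 billion MIPS microprocessors were in use in devices ranging from video games and palmtop computers to laser printers and network switches. Hennessy later led the DASH (Directory Architecture for Shared Memory) project, which prototyped the first scalable cache-coherent multiprocessor; many of its key ideas have been adopted in modern multiprocessors. Beyond his research and university duties, he has served as an early-stage advisor and investor for numerous start-ups, contributing notably to the commercialization of academic research in the field.
He is currently Director of Knight-Hennessy Scholars and serves as non-executive chairman of Alphabet.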
Table of contents:
Contents
CHAPTERS
1 Computer Abstractions and Technology
1.1 Introduction
1.2 Seven Great Ideas in Computer Architecture
1.3 Below Your Program
1.4 Under the Covers
1.5 Technologies for Building Processors and Memory
1.6 Performance
1.7 The Power Wall
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.9 Real Stuff: Benchmarking the Intel Core i7
1.10 Going Faster: Matrix Multiply in Python
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspective and Further Reading
1.14 Self-Study
1.15 Exercises
2 Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
2.4 Signed and Unsigned Numbers
2.5 Representing Instructions in the Computer
2.6 Logical Operations
2.7 Instructions for Making Decisions
2.8 Supporting Procedures in Computer Hardware
2.9 Communicating with People
2.10 RISC-V Addressing for Wide Immediates and Addresses
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
2.13 A C Sort Example to Put it All Together
2.14 Arrays versus Pointers
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: MIPS Instructions
2.17 Real Stuff: ARMv7 (32-bit) Instructions
2.18 Real Stuff: ARMv8 (64-bit) Instructions
2.19 Real Stuff: x86 Instructions
2.20 Real Stuff: The Rest of the RISC-V Instruction Set
2.21 Going Faster: Matrix Multiply in C
2.22 Fallacies and Pitfalls
2.23 Concluding Remarks
2.24 Historical Perspective and Further Reading
2.25 Self-Study
2.26 Exercises
3 Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
3.3 Multiplication
3.4 Division
3.5 Floating Point
3.6 Parallelism and Computer Arithmetic: Subword Parallelism
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86
3.8 Going Faster: Subword Parallelism and Matrix Multiply
3.9 Fallacies and Pitfalls
3.10 Concluding Remarks
3.11 Historical Perspective and Further Reading
3.12 Self-Study
3.13 Exercises
4 The Processor
4.1 Introduction
4.2 Logic Design Conventions
4.3 Building a Datapath
4.4 A Simple Implementation Scheme
4.5 Multicycle Implementation
4.6 An Overview of Pipelining
4.7 Pipelined Datapath and Control
4.8 Data Hazards: Forwarding versus Stalling
4.9 Control Hazards
4.10 Exceptions
4.11 Parallelism via Instructions
4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53
4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply
4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
4.15 Fallacies and Pitfalls
4.16 Concluding Remarks
4.17 Historical Perspective and Further Reading
4.18 Self-Study
4.19 Exercises
5 Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 Memory Technologies
5.3 The Basics of Caches
5.4 Measuring and Improving Cache Performance
5.5 Dependable Memory Hierarchy
5.6 Virtual Machines
5.7 Virtual Memory
5.8 A Common Framework for Memory Hierarchy
5.9 Using a Finite-State Machine to Control a Simple Cache
5.
Excerpt:
Preface
The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.
Albert Einstein, What I Believe, 1930
About This Book
We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shaping computing. We also feel that readers in every specialty of computing need to appreciate the organizational paradigms that determine the capabilities, performance, energy, and, ultimately, the success of computer systems.
Modern computer technology requires professionals of every computing specialty to understand both hardware and software. The interaction between hardware and software at a variety of levels also offers a framework for understanding the fundamentals of computing. Whether your primary interest is hardware or software, computer science or electrical engineering, the central ideas in computer organization and design are the same. Thus, our emphasis in this book is to show the relationship between hardware and software and to focus on the concepts that are the basis for current computers.
The recent switch from uniprocessor to multicore microprocessors confirmed the soundness of this perspective, given since the first edition. While programmers could ignore the advice and rely on computer architects, compiler writers, and silicon engineers to make their programs run faster or be more energy-efficient without change, that era is over. For programs to run faster, they must become parallel. While the goal of many researchers is to make it possible for programmers to be unaware of the underlying parallel nature of the hardware they are programming, it will take many years to realize this vision. Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers.
The audience for this book includes those with little experience in assembly language or logic design who need to understand basic computer organization as well as readers with backgrounds in assembly language and/or logic design who want to learn how to design a computer or understand how a system works and why it performs as it does.
About the Other Book
Some readers may be familiar with Computer Architecture: A Quantitative Approach, popularly known as Hennessy and Patterson. (This book in turn is often called Patterson and Hennessy.) Our motivation in writing the earlier book was to describe the principles of computer architecture using solid engineering fundamentals and quantitative cost/performance tradeoffs. We used an approach that combined examples and measurements, based on commercial systems, to create realistic design experiences. Our goal was to demonstrate that computer architecture could be learned using quantitative methodologies instead of a descriptive approach. It was intended for the serious computing professional who wanted a detailed understanding of computers.
A majority of the readers for this book do not plan to become computer architects. The performance and energy efficiency of future software systems will be dramatically affected, however, by how well software designers understand the basic hardware techniques at work in a system. Thus, compiler writers, operating system designers, database programmers, and most other software engineers need a firm grounding in the principles presented in this book. Similarly, hardware designers must understand clearly the effects of their work on software applications.
Thus, we knew that this book had to be much more than a subset of the material in Computer Architecture, and the material was extensively revised to match the different audience. We were so happy with the result that the subsequent editions of Computer Architecture were revised to remove most of the introductory material; hence, there is much less overlap today than with the first editions of both books.