Foreword
Introduction
1 Probability basics
1.1 Events, event space, and probabilities
1.2 Combinatorics
1.3 Combined, joint, and conditional probabilities
1.4 Exercises
2 Probability distributions
2.1 Mean and variance
2.2 Exponential, Poisson, and binomial distributions
2.3 Continuous distributions
2.4 Uniform, exponential, and Gaussian (normal) distributions
2.5 Central-limit theorem
2.6 Exercises
3 Measuring information
3.1 Making sense of information
3.2 Measuring information
3.3 Information bits
3.4 Rényi's fake coin
3.5 Exercises
4 Entropy
4.1 From Boltzmann to Shannon
4.2 Entropy in dice
4.3 Language entropy
4.4 Maximum entropy (discrete source)
4.5 Exercises
5 Mutual information and more entropies
5.1 Joint and conditional entropies
5.2 Mutual information
5.3 Relative entropy
5.4 Exercises
6 Differential entropy
6.1 Entropy of continuous sources
6.2 Maximum entropy (continuous source)
6.3 Exercises
7 Algorithmic entropy and Kolmogorov complexity
7.1 Defining algorithmic entropy
7.2 The Turing machine
7.3 Universal Turing machine
7.4 Kolmogorov complexity
7.5 Kolmogorov complexity vs. Shannon's entropy
7.6 Exercises
8 Information coding
8.1 Coding numbers
8.2 Coding language
8.3 The Morse code
8.4 Mean code length and coding efficiency
8.5 Optimizing coding efficiency
8.6 Shannon's source-coding theorem
8.7 Exercises
9 Optimal coding and compression
9.1 Huffman codes
9.2 Data compression
9.3 Block codes
9.4 Exercises
10 Integer, arithmetic, and adaptive coding
10.1 Integer coding
10.2 Arithmetic coding
10.3 Adaptive Huffman coding
10.4 Lempel-Ziv coding
10.5 Exercises
11 Error correction
11.1 Communication channel
11.2 Linear block codes
11.3 Cyclic codes
11.4 Error-correction code types
11.5 Corrected bit-error-rate
11.6 Exercises
12 Channel entropy
12.1 Binary symmetric channel
12.2 Nonbinary and asymmetric discrete channels
12.3 Channel entropy and mutual information
12.4 Symbol error rate
12.5 Exercises
13 Channel capacity and coding theorem
13.1 Channel capacity
13.2 Typical sequences and the typical set
13.3 Shannon's channel coding theorem
13.4 Exercises
14 Gaussian channel and Shannon-Hartley theorem
14.1 Gaussian channel
14.2 Nonlinear channel
14.3 Exercises
15 Reversible computation
15.1 Maxwell's demon and Landauer's principle
15.2 From computer architecture to logic gates
15.3 Reversible logic gates and computation
15.4 Exercises
16 Quantum bits and quantum gates
16.1 Quantum bits
16.2 Basic computations with 1-qubit quantum gates
16.3 Quantum gates with multiple qubit inputs and outputs
16.4 Quantum circuits
16.5 Tensor products
16.6 Noncloning theorem
16.7 Exercises
17 Quantum measurements
17.1 Dirac notation
17.2 Quantum measurements and types
17.3 Quantum measurements on joint states
17.4 Exercises
18 Qubit measurements, superdense coding, and quantum teleportation
18.1 Measuring single qubits
18.2 Measuring n-qubits
18.3 Bell state measurement
18.4 Superdense coding
18.5 Quantum teleportation
18.6 Distributed quantum computing
18.7 Exercises
19 Deutsch-Jozsa, quantum Fourier transform, and Grover quantum database search algorithms
19.1 Deutsch algorithm
19.2 Deutsch-Jozsa algorithm
19.3 Quantum Fourier transform algorithm
19.4 Grover quantum database search algorithm
19.5 Exercises
20 Shor's factorization algorithm
20.1 Phase estimation
20.2 Order finding
20.3 Continued fraction expansion
20.4 From order finding to factorization
20.5 Shor's factorization algorithm
20.6 Factorizing N=15 and other nontrivial composites
20.7 Public-key cryptography
20.8 Exercises
21 Quantum information theory
21.1 Von Neumann entropy
21.2 Relative, joint, and conditional entropy, and mutual information
21.3 Quantum communication channel and Holevo bound
21.4 Exercises
22 Quantum data compression
22.1 Quantum data compression and fidelity
22.2 Schumacher's quantum coding theorem
22.3 A graphical and numerical illustration of Schumacher's quantum coding theorem
22.4 Exercises
23 Quantum channel noise and channel capacity
23.1 Noisy quantum channels
23.2 The Holevo-Schumacher-Westmoreland capacity theorem
23.3 Capacity of some quantum channels
23.4 Exercises
24 Quantum error correction
24.1 Quantum repetition code
24.2 Shor code
24.3 Calderbank-Shor-Steane (CSS) codes
24.4 Hadamard-Steane code
24.5 Exercises
25 Classical and quantum cryptography
25.1 Message encryption, decryption, and code breaking
25.2 Encryption and decryption with binary numbers
25.3 Double-key encryption
25.4 Cryptography without key exchange
25.5 Public-key cryptography and RSA
25.6 Data encryption standard (DES) and advanced encryption standard (AES)
25.7 Quantum cryptography
25.8 Electromagnetic waves, polarization states, photons, and quantum measurements
25.9 A secure photon communication channel
25.10 The BB84 protocol for QKD
25.11 The B92 protocol
25.12 The EPR protocol
25.13 Is quantum cryptography "invulnerable"?
Appendix A (Chapter 4) Boltzmann's entropy
Appendix B (Chapter 4) Shannon's entropy
Appendix C (Chapter 4) Maximum entropy of discrete sources
Appendix D (Chapter 5) Markov chains and the second law of thermodynamics
Appendix E (Chapter 6) From discrete to continuous entropy
Appendix F (Chapter 8) Kraft-McMillan inequality
Appendix G (Chapter 9) Overview of data compression standards
Appendix H (Chapter 10) Arithmetic coding algorithm
Appendix I (Chapter 10) Lempel-Ziv distinct parsing
Appendix J (Chapter 11) Error-correction capability of linear block codes
Appendix K (Chapter 13) Capacity of binary communication channels
Appendix L (Chapter 13) Converse proof of the channel coding theorem
Appendix M (Chapter 16) Bloch sphere representation of the qubit
Appendix N (Chapter 16) Pauli matrices, rotations, and unitary operators
Appendix O (Chapter 17) Heisenberg uncertainty principle
Appendix P (Chapter 18) Two-qubit teleportation
Appendix Q (Chapter 19) Quantum Fourier transform circuit
Appendix R (Chapter 20) Properties of continued fraction expansion
Appendix S (Chapter 20) Computation of inverse Fourier transform in the factorization of N = 21 through Shor's algorithm
Appendix T (Chapter 20) Modular arithmetic and Euler's theorem
Appendix U (Chapter 21) Klein's inequality
Appendix V (Chapter 21) Schmidt decomposition of joint pure states
Appendix W (Chapter 21) State purification
Appendix X (Chapter 21) Holevo bound
Appendix Y (Chapter 25) Polynomial byte representation and modular multiplication
Index
Sample content:
1 Probability basics
Because of the reader’s interest in information theory, it is
assumed that, to some extent,
he or she is relatively familiar with probability theory, its main
concepts, theorems, and
practical tools. Whether a graduate student or a confirmed
professional, it is possible,
however, that a good fraction, if not all of this background
knowledge has been somewhat
forgotten over time, or has become a bit rusty, or even worse,
completely obliterated by
one’s academic or professional specialization!
This is why this book includes a couple of chapters on probability
basics. Should
such basics be crystal clear in the reader’s mind, however, then
these two chapters could
be skipped at once. They can always be revisited later for backup,
should some of the
associated concepts and tools present any hurdles in the following
chapters. This being
stated, some expert readers may yet dare testing their knowledge by
considering some
of this chapter’s easy problems, for starters. Finally, any
parent or teacher might find
the first chapter useful to introduce children and teens to
probability.
I have sought to make this review of probability basics as
simple, informal, and
practical as it could be. Just like the rest of this book, it is
definitely not intended to be a
math course, according to the canonic theorem–proof–lemma–example
suite. There exist
scores of rigorous books on probability theory at all levels, as
well as many Internet sites
providing elementary tutorials on the subject. But one will find
there either too much or
too little material to approach Information Theory, leading to
potential discouragement.
Here, I shall be content with only those elements and tools that
are needed or are used in
this book. I present them in an original and straightforward way,
using fun examples. I
have no concern to be rigorous and complete in the academic sense,
but only to remain
accurate and clear in all possible simplifications. With this
approach, even a reader who
has had little or no exposure to probability theory should also be
able to enjoy the rest
of this book.
1.1 Events, event space, and probabilities
As we experience it, reality can be viewed as made of different
environments or situations
in time and space, where a variety of possible events may take
place. Consider dull
and boring life events. Excluding future possibilities, basic
events can be anything
like:
It is raining,
I miss the train,
Mom calls,
The check is in the mail,
The flight has been delayed,
The light bulb is burnt out,
The client signed the contract,
The team won the game.
Here, the events are defined in the present or past tense, meaning
that they are known
facts. These known facts represent something that is either true or
false, experienced
or not, verified or not. If I say, “Tomorrow will be raining,” this
is only an assumption
concerning the future, which may or may not turn out to be true (for that matter, weather forecasts do not enjoy universal trust). Then tomorrow will tell,
with rain being a more
likely possibility among other ones. Thus, future events, as we may
expect them to come
out, are well defined facts associated with some degree of
likelihood. If we are amidst
the Sahara desert or in Paris on a day in November, then rain as an
event is associated
with a very low or a very high likelihood, respectively. Yet, that
day precisely it may
rain in the desert or it may shine in Paris, against all
preconceived certainties. To make
things even more complex (and for that matter, to make life exciting), a few other events
may occur, which weren’t included in any of our predictions.
Within a given environment of causes and effects, one can make a
list of all possible
events. The set of events is referred to as an event space (also called sample space).
The event space includes anything that can possibly happen.1 In the
case of a sports
match between two opposing teams, A and B, for instance, the basic
event space is the
four-element set:
S = {team A wins, team B wins, the game is a draw, the game is canceled}, (1.1)
with it being implicit that if team A wins, then team B loses, and
the reverse. We can
then say that the events “team A wins” and “team B loses” are
strictly equivalent, and
need not be listed twice in the event space. People may take bets
as to which team is
likely to win (not without some local or affective bias). There may
be a draw, or the
game may be canceled because of a storm or an earthquake, in that
order of likelihood.
This pretty much closes the event space.
When considering a trial or an experiment, events are referred to
as outcomes. An
experiment may consist of picking up a card from a 32-card deck.
One out of the 32
possible outcomes is the card being the Queen of Hearts. The event
space associated with this experiment is the list of all 32 cards.
1 In any environment, the list of possible events is generally infinite. One may then conceive of the event space as a limited set of well defined events which encompass all known possibilities at the time of the inventory. If other unknown possibilities exist, then an event category called "other" can be introduced to close the event space.
Another
experiment may consist in
picking up two cards successively, which defines a different event
space, as illustrated in
Section 1.3, which concerns combined and joint events.
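To get a feel for how quickly such event spaces grow, here is a small Python sketch of my own (it is not part of the book) that lists the single-card event space and then the event space for two successive draws, assuming the first card is not put back and the order of the two draws is kept:

```python
from itertools import product, permutations

# Build a 32-card deck (8 ranks x 4 suits), as used in the example.
ranks = ["7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [f"{rank} of {suit}" for rank, suit in product(ranks, suits)]
print(len(deck))                      # -> 32 single-card outcomes

# Drawing two cards successively (without replacement, order kept)
# defines a much larger event space of ordered pairs.
two_card_space = list(permutations(deck, 2))
print(len(two_card_space))            # -> 32 * 31 = 992 outcomes
print(("Q of hearts", "A of spades") in two_card_space)  # -> True
```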
The probability is the mathematical measure of the likelihood associated with a given event. This measure is called p(event). By definition, the measure ranges on a zero-to-one scale. Consistently with this definition, p(event) = 0 means that the event is absolutely unlikely or "impossible," and p(event) = 1 means that it is absolutely certain.
Let us not discuss here what “absolutely” or “impossible” might
really mean in
our physical world. As we know, such extreme notions are only
relative ones! Simply
defined, without purchasing a ticket, it is impossible to win the
lottery! And driving
50 mph above the speed limit while passing in front of a police
patrol leads to absolute
certainty of getting a ticket. Let’s leave alone the weak
possibilities of finding by chance
the winning lottery ticket on the curb, or that the police officer
turns out to be an old
schoolmate. That’s part of the event space, too, but let’s not
stretch reality too far. Let us
then be satisfied here with the intuitive notions that
impossibility and absolute certainty
do actually exist.
Let us now formalize what has just been described. A set of different events in a family called x may be labeled according to a series x1, x2, . . . , xN , where N is the number of events in the event space S = {x1, x2, . . . , xN }. The probability p(event = xi), namely, the probability that the outcome turns out to be the event xi, will be noted p(xi) for short.
In the general case, and as we well know, events are neither
“absolutely certain” nor
“impossible.” Therefore, their associated probabilities can be any
real number between
0 and 1. Formally, for all events xi belonging to the space S =
{x1, x2, . . . , xN }, we
have:
0 ≤ p(xi) ≤ 1. (1.2)
Probabilities are also commonly defined as percentages. The event
is said to have
anything between a 0% chance impossible and a 100% chance
absolutely certain of
happening, which means strictly the same as using a 0–1 scale. For
instance, an election
poll will give a 55% chance of a candidate winning. It is
equivalent to saying that the
odds for this candidate are 55:45, or that p(candidate wins) = 0.55.
As a fundamental rule, the sum of all probabilities associated with
an event space S
is equal to unity. Formally,
p(x1) + p(x2) + ··· + p(xN) = Σ_{i=1}^{i=N} p(xi) = 1. (1.3)
In the above, the symbol Σ (in Greek, capital S, or sigma) implies the summation of the argument p(xi), with index i being varied from i = 1 to i = N, as
specified under and
above the sigma sign. This concise math notation is to be well
assimilated, as it will
be used extensively throughout this book. We can interpret the
above summation rule
according to:
It is absolutely certain that one event in the space will
occur.
This is another way of stating that the space includes all
possibilities, as for the game
space defined in Eq. (1.1). I will come back to this notion in Section 1.3, when considering
combined probabilities.
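To make the summation rule concrete, here is a minimal Python sketch (the language and the probability values are my own illustrative choices, not the book's) that checks Eqs. (1.2) and (1.3) for a small event space such as the two-team match of Eq. (1.1):

```python
# A toy event space for the match of Eq. (1.1), with made-up probabilities.
match_space = {
    "team A wins": 0.46,
    "team B wins": 0.46,
    "draw": 0.07,
    "game canceled": 0.01,
}

# Every probability must lie between 0 and 1, as required by Eq. (1.2) ...
assert all(0.0 <= p <= 1.0 for p in match_space.values())

# ... and the probabilities over the whole event space must sum to unity, Eq. (1.3).
total = sum(match_space.values())
print(f"sum of probabilities = {total}")   # -> 1.0
assert abs(total - 1.0) < 1e-12
```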
But how are the probabilities calculated or estimated? The answer
depends on whether
or not the event space is well or completely defined. Assume first
for simplicity the first
case: we know for sure all the possible events and the space is
complete. Consider
then two familiar games: coin tossing and playing dice, which I am
going to use as
examples.
Coin tossing
The coin has two sides, heads and tails. The experiment of tossing
the coin has two
possible outcomes (heads or tails), if we discard any possibility
that the coin rolls on the
floor and stops on its edge, as a third physical outcome! To be
sure, the coin’s mass is
also assumed to be uniformly distributed into both sides, and the
coin randomly flipped,
in such a way that no side is more likely to show up than the
other. The two outcomes
are said to be equiprobable. The event space is S = {heads, tails},
and, according to the
previous assumptions, p(heads) = p(tails). Since the space includes all possibilities, we apply the rule in Eq. (1.3) to get p(heads) = p(tails) = 1/2 = 0.5. The odds of getting heads or tails are 50%. In contrast, a realistic coin mass distribution and coin flip may not be so perfect, so that, for instance, p(heads) = 0.55 and p(tails) = 0.45.
Rolling dice game 1
Play first with a single die. The die has six faces numbered one to
six after their number
of spots. As for the coin, the die is supposed to land on one face, excluding the possibility (however well observed in real life!) that it may stop on one of its eight corners after stopping against an obstacle. Thus the event space is S = {1, 2, 3, 4, 5, 6}, and with the equiprobability assumption, we have p(1) = p(2) = ··· = p(6) = 1/6 ≈ 0.166 666 6.
Rolling dice game 2
Now play with two dice. The game consists in adding the spots
showing on the faces.
Taking successive turns between different players, the winner is
the one who gets the
highest count. The sum of points varies from 1 + 1 = 2 to 6 + 6 = 12, as illustrated
in Fig. 1.1. The event space is thus S = {2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12}, corresponding
to 36 possible outcomes. Here, the key difference from the two
previous examples is
that the events (sum of spots) are not equiprobable. It is, indeed,
seen from the figure
that there exist six possibilities of obtaining the number x = 7,
while there is only one
possibility of obtaining either the number x = 2 or the number x =
12. The count of
possibilities is shown in the graph in Fig. 1.2(a).
Such a graph is referred to as a histogram. If one divides the
number of counts by
the total number of possibilities (here 36), one obtains the
corresponding probabilities.
For instance, p(x = 2) = p(x = 12) = 1/36 ≈ 0.028, and p(x = 7) = 6/36 ≈ 0.167.
The different probabilities are plotted in Fig. 1.2(b). To complete the plot, we have included the two count events x = 1 and x = 13, which both have zero probability.
Figure 1.1 The 36 possible outcomes of counting points from casting two dice.
Figure 1.2 (a) Number of possibilities associated with each possible outcome of casting two dice; (b) corresponding probability distribution.
Such a plot is referred to as the probability distribution; it is also called the probability distribution function (PDF). See more in Chapter 2 on PDFs and examples. Consistently with the rule in Eq. (1.3), the sum of all probabilities is equal to unity. It is equivalent to say that the surface between the PDF curve (linking the different points (x, p(x))) and the horizontal axis is unity. Indeed, this surface is given by s = (13 − 1) × p(x = 7)/2 = 12 × (6/36)/2 ≡ 1.
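The counting that underlies Figs. 1.1 and 1.2 is easy to reproduce programmatically. The short Python sketch below (my own illustration, not taken from the book) enumerates the 36 equally likely outcomes of casting two dice, builds the histogram of the sums, and converts the counts into probabilities, recovering p(x = 2) = 1/36, p(x = 7) = 6/36, and a total probability of unity:

```python
from collections import Counter
from fractions import Fraction

# Enumerate the 36 equally likely outcomes (die1, die2) and count each sum.
histogram = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# Convert counts into probabilities by dividing by the 36 possibilities.
pdf = {x: Fraction(count, 36) for x, count in sorted(histogram.items())}

for x, p in pdf.items():
    print(f"p(x = {x:2d}) = {p}  ≈ {float(p):.3f}")

# The distribution peaks at x = 7 and sums to unity, as required by Eq. (1.3).
assert pdf[2] == Fraction(1, 36) and pdf[7] == Fraction(6, 36)
assert sum(pdf.values()) == 1
```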
The last example allows us to introduce a fundamental definition of
the probability
p(xi) in the general case where the events xi in the space S = {x1, x2, . . . , xN } do not have equal likelihood:
p(xi) = (number of possibilities for the event xi to occur) / (total number of possibilities in the event space). (1.4)
This general definition has been used in the three previous
examples. The single coin
tossing or single die casting are characterized by equiprobable
events, in which case the
PDF is said to be uniform. In the case of the two-dice roll, the
PDF is nonuniform with
a triangular shape, and peaks about the event x = 7, as we have
just seen.
Here we are reaching a subtle point in the notion of probability,
which is often
mistaken or misunderstood. The known fact that, in principle, a
flipped coin has equal
chances to fall on heads or tails provides no clue as to what the
outcome will be. We may
just observe the coin falling on tails several times in a row,
before it finally chooses to
fall on heads, as the reader can easily check (try doing the experiment!). Therefore, the
meaning of a probability is not the prediction of the outcome
(event x being verified) but
the measure of how likely such an event is. Therefore, it actually
takes quite a number
of trials to measure such likelihood: one trial is surely not
enough, and worse, several
trials could lead to the wrong measure. To sense the difference
between probability and
outcome better, and to get a notion of how many trials could be
required to approach a
good measure, let’s go through a realistic coin-tossing
experiment.
First, it is important to practice a little bit in order to know
how to flip the coin with a
good feeling of randomness (the reader will find that such a feeling is far from obvious!). The experiment may proceed as follows: flip the coin, then record the result on a piece of paper (heads = H, tails = T), and make a pause once in a while to enter the data in a computer spreadsheet (it being important for concentration and expediency not to try performing the two tasks altogether). The interest of the computer
spreadsheet is the
possibility of seeing the statistics plotted as the experiment
unfolds. This creates a real
sense of fun. Actually, the computer should plot the cumulative
count of heads and tails,
as well as the experimental PDF calculated at each step from Eq. (1.4), which for clarity I reformulate as follows:
p(heads) = (cumulative count of heads) / (total number of trials so far), and likewise for p(tails).
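Readers who prefer to let a computer do the flipping can reproduce the spirit of this experiment with the short Python simulation below. It is only a sketch of the procedure just described (the use of a pseudo-random generator instead of a physical coin, and the fair-coin assumption, are my own substitutions): it tallies the cumulative counts and prints the running estimates p(heads) and p(tails) as the number of trials grows.

```python
import random

def running_estimates(num_trials: int = 700, seed: int = 1):
    """Simulate coin flips and report running estimates of p(heads), p(tails)."""
    rng = random.Random(seed)
    n_heads = n_tails = 0
    for trial in range(1, num_trials + 1):
        if rng.random() < 0.5:          # fair-coin assumption: each side equally likely
            n_heads += 1
        else:
            n_tails += 1
        if trial in (10, 100, 700):     # print a few snapshots of the running PDF
            p_heads = n_heads / (n_heads + n_tails)
            print(f"after {trial:3d} trials: p(heads) ≈ {p_heads:.3f}, "
                  f"p(tails) ≈ {1 - p_heads:.3f}")

running_estimates()
```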
The plots of the author’s own experiment, by means of 700
successive trials, are shown
in Fig. 1.3. The first figure shows the cumulative counts for heads
and tails, while the
second figure shows plots of the corresponding experimental
probabilities p(heads) and p(tails) as the number of trials increases. As expected, the counts for heads and tails are seemingly equal, at least when considering large numbers. However,
the detail shows that
time and again, the counts significantly depart from each other,
meaning that there are
more heads than tails or the reverse. But eventually these
discrepancies seem to correct themselves as the game progresses, as if the coin would "know" how to come back to the 50:50 odds rule. Strange, isn't it? The discrepancies between counts are reflected by the wide oscillations of the PDF (Fig. 1.3(b)). But as the experiment progresses, the oscillations are damped to the point where p(heads) ≈ p(tails) ≈ 0.5, following an asymptotic behavior.2
Figure 1.3 Experimental determination of the probability distribution of coin flipping, by means of 700 successive trials: (a) cumulative count of heads and tails outcomes, with inset showing detail for the first 100 trials; (b) corresponding probabilities.
2 Performing this experiment and obtaining such results is not
straightforward. Different types of coins must
be tested first, some being easier to flip than others, because of
size or mass. Some coins seem not to lead