Preface........................................iv
Acknowledgments........................................vi
About the Author........................................viii
Notation........................................ix
CHAPTER 1 Introduction........................................1
1.1 The Historical Context........................................1
1.2 Artificial Intelligence and Machine Learning..........................2
1.3 Algorithms Can Learn What Is Hidden in the Data......................4
1.4 Typical Applications of Machine Learning............................6
Speech Recognition......................................6
Computer Vision........................................6
Multimodal Data........................................6
Natural Language Processing...............................7
Robotics........................................7
Autonomous Cars.......................................7
Challenges for the Future..................................8
1.5 Machine Learning: Major Directions................................8
1.5.1 Supervised Learning.....................................8
1.6 Unsupervised and Semisupervised Learning...........................11
1.7 Structure and a Road Map of the Book...............................12
References........................................16
CHAPTER 2 Probability and Stochastic Processes.............................19
2.1 Introduction........................................20
2.2 Probability and Random Variables..................................20
2.2.1 Probability........................................20
2.2.2 Discrete Random Variables................................22
2.2.3 Continuous Random Variables..............................24
2.2.4 Mean and Variance.......................................25
2.2.5 Transformation of Random Variables.........................28
2.3 Examples of Distributions........................................29
2.3.1 Discrete Variables.......................................29
2.3.2 Continuous Variables.....................................32
2.4 Stochastic Processes........................................41
2.4.1 First- and Second-Order Statistics...........................42
2.4.2 Stationarity and Ergodicity.................................43
2.4.3 Power Spectral Density...................................46
2.4.4 Autoregressive Models....................................51
2.5 Information Theory........................................54
2.5.1 Discrete Random Variables................................56
2.5.2 Continuous Random Variables..............................59
2.6 Stochastic Convergence........................................61
Convergence Everywhere..................................62
Convergence Almost Everywhere............................62
Convergence in the Mean-Square Sense.......................62
Convergence in Probability................................63
Convergence in Distribution................................63
Problems........................................63
References........................................65
CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions.........67
3.1 Introduction........................................67
3.2 Parameter Estimation: the Deterministic Point of View...................68
3.3 Linear Regression........................................71
3.4 Classification........................................75
Generative Versus Discriminative Learning....................78
3.5 Biased Versus Unbiased Estimation.................................80
3.5.1 Biased or Unbiased Estimation.............................81
3.6 The Cramér-Rao Lower Bound....................................83
3.7 Sufficient Statistic........................................87
3.8 Regularization........................................89
Inverse Problems: Ill-Conditioning and Overfitting...............91
3.9 The Bias-Variance Dilemma......................................93
3.9.1 Mean-Square Error Estimation..............................94
3.9.2 Bias-Variance Tradeoff...................................95
3.10 Maximum Likelihood Method.....................................98
3.10.1 Linear Regression: the Nonwhite Gaussian Noise Case............101
3.11 Bayesian Inference........................................102
3.11.1 The Maximum a Posteriori Probability Estimation Method.........107
3.12 Curse of Dimensionality........................................108
3.13 Validation........................................109
Cross-Validation........................................111
3.14 Expected Loss and Empirical Risk Functions..........................112
Learnability........................................113
3.15 Nonparametric Modeling and Estimation.............................114
Problems........................................114
MATLAB Exercises....................................119
References........................................119
CHAPTER 4 Mean-Square Error Linear Estimation.............................121
4.1 Introduction........................................121
4.2 Mean-Square Error Linear Estimation: the Normal Equations..............122
4.2.1 The Cost Function Surface.................................123
4.3 A Geometric Viewpoint: Orthogonality Condition......................124
4.4 Extension to Complex-Valued Variables..............................127
4.4.1 Widely Linear Complex-Valued Estimation....................129
4.4.2 Optimizing With Respect to Complex-Valued Variables: Wirtinger Calculus...........................132
4.5 Linear Filtering........................................134
4.6 MSE Linear Filtering: a Frequency Domain Point of View..........