Contents
Preface
Acknowledgments
Notation
CHAPTER 1 Introduction
1.1 What Machine Learning is About
1.1.1 Classification
1.1.2 Regression
1.2 Structure and a Road Map of the Book
References
CHAPTER 2 Probability and Stochastic Processes
2.1 Introduction
2.2 Probability and Random Variables
2.2.1 Probability
2.2.2 Discrete Random Variables
2.2.3 Continuous Random Variables
2.2.4 Mean and Variance
2.2.5 Transformation of Random Variables
2.3 Examples of Distributions
2.3.1 Discrete Variables
2.3.2 Continuous Variables
2.4 Stochastic Processes
2.4.1 First and Second Order Statistics
2.4.2 Stationarity and Ergodicity
2.4.3 Power Spectral Density
2.4.4 Autoregressive Models
2.5 Information Theory
2.5.1 Discrete Random Variables
2.5.2 Continuous Random Variables
2.6 Stochastic Convergence
Problems
References
CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions
3.1 Introduction
3.2 Parameter Estimation: The Deterministic Point of View
3.3 Linear Regression
3.4 Classification
3.5 Biased Versus Unbiased Estimation
3.5.1 Biased or Unbiased Estimation?
3.6 The Cramér-Rao Lower Bound
3.7 Sufficient Statistic
3.8 Regularization
3.9 The Bias-Variance Dilemma
3.9.1 Mean-Square Error Estimation
3.9.2 Bias-Variance Tradeoff
3.10 Maximum Likelihood Method
3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case
3.11 Bayesian Inference
3.11.1 The Maximum a Posteriori Probability Estimation Method
3.12 Curse of Dimensionality
3.13 Validation
3.14 Expected and Empirical Loss Functions
3.15 Nonparametric Modeling and Estimation
Problems
References
CHAPTER 4 Mean-Square Error Linear Estimation
4.1 Introduction
4.2 Mean-Square Error Linear Estimation: The Normal Equations
4.2.1 The Cost Function Surface
4.3 A Geometric Viewpoint: Orthogonality Condition
4.4 Extension to Complex-Valued Variables
4.4.1 Widely Linear Complex-Valued Estimation
4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus
4.5 Linear Filtering
4.6 MSE Linear Filtering: A Frequency Domain Point of View
4.7 Some Typical Applications
4.7.1 Interference Cancellation
4.7.2 System Identification
4.7.3 Deconvolution: Channel Equalization
4.8 Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms
4.8.1 The Lattice-Ladder Scheme
4.9 Mean-Square Error Estimation of Linear Models
4.9.1 The Gauss-Markov Theorem
4.9.2 Constrained Linear Estimation: The Beamforming Case
4.10 Time-Varying Statistics: Kalman Filtering
Problems
References
CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family
5.1 Introduction
5.2 The Steepest Descent Method
5.3 Application to the Mean-Square Error Cost Function
5.3.1 The Complex-Valued Case
5.4 Stochastic Approximation
5.5 The Least-Mean-Squares Adaptive Algorithm
5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments
5.5.2 Cumulative Loss Bounds
5.6 The Affine Projection Algorithm
5.6.1 The Normalized LMS
5.7 The Complex-Valued Case
5.8 Relatives of the LMS
5.9 Simulation Examples
5.10 Adaptive Decision Feedback Equalization
5.11 The Linearly Constrained LMS
5.12 Tracking Performance of the LMS in Nonstationary Environments
5.13 Distributed Learning: The Distributed LMS
5.13.1 Cooperation Strategies
5.13.2 The Diffusion LMS
5.13.3 Convergence and Steady-State Performance: Some Highlights
5.13.4 Consensus-Based Distributed Schemes
5.14 A Case Study: Target Localization
5.15 Some Concluding Remarks: Consensus Matrix
Problems
References
CHAPTER 6 The Least-Squares Family
6.1 Introduction
6.2 Least-Squares Linear Regression: A Geometric Perspective
6.3 Statistical Properties of the LS Estimator
6.4
Excerpt:
Preface
Machine Learning is a name that is gaining popularity as an umbrella for methods that have been studied and developed for many decades in different scientific communities and under different names, such as Statistical Learning, Statistical Signal Processing, Pattern Recognition, Adaptive Signal Processing, Image Processing and Analysis, System Identification and Control, Data Mining and Information Retrieval, Computer Vision, and Computational Learning. The name “Machine Learning” indicates what all these disciplines have in common, that is, to learn from data, and then make predictions. What one tries to learn from data is their underlying structure and regularities, via the development of a model, which can then be used to provide predictions.
To this end, a number of diverse approaches have been developed, ranging from optimization of cost functions, whose goal is to optimize the deviation between what one observes from data and what the model predicts, to probabilistic models that attempt to model the statistical properties of the observed data.
The goal of this book is to approach the machine learning discipline in a unifying context, by presenting the major paths and approaches that have been followed over the years, without giving preference to a specific one. It is the author’s belief that all of them are valuable to the newcomer who wants to learn the secrets of this topic, from the applications as well as from the pedagogic point of view. As the title of the book indicates, the emphasis is on the processing and analysis front of machine learning and not on topics concerning the theory of learning itself and related performance bounds. In other words, the focus is on methods and algorithms closer to the application level.
The book is the outgrowth of more than three decades of the author’s experience in research and in teaching various related courses. The book is written in such a way that individual chapters, or pairs of chapters, are as self-contained as possible.
So, one can select and combine chapters according to the focus he/she wants to give to the course he/she teaches, or to the topics he/she wants to grasp in a first reading. Some guidelines on how one can use the book for different courses are provided in the introductory chapter.
Each chapter grows by starting from the basics and evolving to embrace the more recent advances. Some of the topics had to be split into two chapters, such as sparsity-aware learning, Bayesian learning, probabilistic graphical models, and Monte Carlo methods.
The book addresses the needs of advanced graduate, postgraduate, and research students as well as of practicing scientists and engineers whose interests lie beyond black-box solutions. Also, the book can serve the needs of short courses on specific topics, e.g., sparse modeling, Bayesian learning, probabilistic graphical models, neural networks and deep learning.
Most of the chapters include Matlab exercises, and the related code is available from the book’s website. The solutions manual as well as PowerPoint lectures are also available from the book’s website.
Acknowledgments
Writing a book is an effort on top of everything else that must keep running in parallel. Thus, writing is basically an early morning, after five, and over the weekends and holidays activity. It is a big effort that requires dedication and persistence. This would not be possible without the support of a number of people: people who helped in the simulations, in the making of the figures, in reading chapters, and in discussing various issues concerning all aspects, from proofs to the structure and the layout of the book.
First, I would like to express my gratitude to my mentor, friend, and colleague Nicholas Kalouptsidis, for this long-lasting and fruitful collaboration.
The cooperation with Kostas Slavakis over the last six years has been a major source of inspiration and learning and has played a decisive role for me in writing this book.
I am indebted to the members of my group, and in particular to Yannis Kopsinis, Pantelis Bouboulis, Simos Chouvardas, Kostas Themelis, George Papageorgiou, and Charis Georgiou. They were beside me the whole time, especially during the difficult