3. Probability and Information Theory

말뱅이 2022. 9. 3. 20:16

- Sources of uncertainty

1. Inherent stochasticity in the system being modeled

2. Incomplete observability

3. Incomplete modeling (the model must discard some of the information we have observed)


- Frequentist probability : related directly to the rates at which events occur


- Bayesian probability : related to qualitative levels of certainty (degree of belief)


- Probability mass function (PMF) : probability distribution over discrete variables, denoted P

Probability of the variable x : x ~ P(x) (x follows the distribution P), P(x=x) (probability that x takes the value x)


- Joint probability distribution : a distribution over many variables simultaneously, P(x=x, y=y)


- Probability density function (PDF) : probability distribution over continuous variables, denoted p(x)


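- Marginal Probability : probability distribution over a subset of the variables, obtained by summing (or integrating) the joint distribution over the others, P(x=x) = Σ_y P(x=x, y=y)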


- Conditional Probability : P(y=y | x=x) = P(y=y, x=x) / P(x=x), defined only when P(x=x) > 0

.Chain rule of conditional probabilities : P(x1, ..., xn) = P(x1) ∏_{i=2..n} P(xi | x1, ..., x(i-1))
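The relationship between joint, marginal, and conditional probabilities can be checked on a small table. The sketch below is only an illustration: the joint table `joint` and the helper names `marginal_x` and `conditional_y_given_x` are made up for this example.

```python
# Hypothetical joint distribution P(x, y) over two binary variables.
joint = {
    (0, 0): 0.3,
    (0, 1): 0.2,
    (1, 0): 0.1,
    (1, 1): 0.4,
}

def marginal_x(x):
    """P(x = x): sum the joint over every value of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(y, x):
    """P(y = y | x = x) = P(x = x, y = y) / P(x = x)."""
    return joint[(x, y)] / marginal_x(x)

# Chain rule for two variables: P(x, y) = P(x) * P(y | x).
for (x, y), p in joint.items():
    rebuilt = marginal_x(x) * conditional_y_given_x(y, x)
    assert abs(rebuilt - p) < 1e-12

print(marginal_x(0))
print(conditional_y_given_x(1, 0))
```

With more than two variables the same idea applies repeatedly: each factor conditions on all variables that came before it.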


- Independence : notation, x ⊥ y

- Conditional independence : notation, x ⊥ y | z  (z : given random variable)


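- Expectation, Expected value : E_{x~P}[f(x)] = Σ_x P(x) f(x) for discrete x (an integral of p(x) f(x) for continuous x), the average value of f when x is drawn from P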

- Variance, standard deviation (square root of the variance) : Var(f(x)) = E[(f(x) − E[f(x)])²], how much f(x) varies as x is sampled

- Covariance : how much two values are linearly related to each other, as well as their scale, Cov(f(x), g(y)) = E[(f(x) − E[f(x)]) (g(y) − E[g(y)])]

.Two independent variables have zero covariance

* Independence also excludes nonlinear relationships, so two variables can be dependent and still have zero covariance

  Correlation : normalizes the contribution of each variable, so it measures only how strongly the variables are related, not their scale
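A small example of "dependent but zero covariance", as a sketch: x is assumed uniform on {-1, 0, 1} and y = x², a setup chosen only for illustration. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Assumed toy setup: x is uniform on {-1, 0, 1} and y = x**2.
xs = [-1, 0, 1]
p = Fraction(1, 3)

def expect(f):
    """Expectation of f(x) under the uniform distribution on xs."""
    return sum(p * f(x) for x in xs)

mean_x = expect(lambda x: x)
mean_y = expect(lambda x: x ** 2)
cov_xy = expect(lambda x: (x - mean_x) * (x ** 2 - mean_y))

# Dependence check: compare P(y = 1) with P(y = 1 | x = 1).
p_y1 = sum(p for x in xs if x ** 2 == 1)
p_x1 = sum(p for x in xs if x == 1)
p_y1_given_x1 = sum(p for x in xs if x == 1 and x ** 2 == 1) / p_x1

print(cov_xy)                 # covariance of x and y
print(p_y1, p_y1_given_x1)    # these differ, so x and y are dependent
```

Here y is completely determined by x, yet the relationship is purely nonlinear, so the covariance vanishes.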


- Bernoulli Distribution: single binary random variable, parameterized by φ ∈ [0,1], the probability that the variable equals 1


- Multinoulli (Categorical) Distribution: single discrete variable with k different states (k is finite),

parametrized by a vector p ∈ [0,1]^(k−1); the probability of the k-th state is 1 minus the sum of the entries of p
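Since only k−1 probabilities are free, the last one is implied. A minimal sketch of that bookkeeping, with sampling through the standard library; the function names `full_probabilities` and `sample_states` are made up for this example.

```python
import random

def full_probabilities(p):
    """Expand the k-1 free parameters into all k state probabilities."""
    last = 1.0 - sum(p)
    if any(pi < 0 for pi in p) or last < -1e-12:
        raise ValueError("parameters must be non-negative and sum to at most 1")
    return list(p) + [max(last, 0.0)]

def sample_states(p, n, seed=0):
    """Draw n states (0 .. k-1) from the multinoulli defined by p."""
    rng = random.Random(seed)
    probs = full_probabilities(p)
    return rng.choices(range(len(probs)), weights=probs, k=n)

probs = full_probabilities([0.2, 0.5])   # k = 3 states, 2 free parameters
samples = sample_states([0.2, 0.5], n=1000)
print(probs)
print([samples.count(state) / len(samples) for state in range(3)])
```

A Bernoulli distribution is the k = 2 case: one free parameter φ, with the other probability equal to 1 − φ.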


- Gaussian Distribution (normal distribution): N(x; μ, σ²) = sqrt(1 / (2πσ²)) exp(−(x − μ)² / (2σ²)), with mean μ and variance σ²

*precision β : the inverse of the variance, β = 1/σ²

.Reasons why the normal distribution is a good default choice

1. The sum of many independent random variables is approximately normally distributed (central limit theorem)

2. Out of all possible probability distributions with the same variance,

the normal distribution encodes the maximum amount of uncertainty over the real numbers
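The density can be written with either the variance σ² or the precision β = 1/σ². The sketch below implements both forms for the univariate case and checks that they agree; the function names are chosen only for this example.

```python
import math

def normal_pdf(x, mu, sigma2):
    """N(x; mu, sigma^2) parameterized by the variance sigma2."""
    return math.sqrt(1.0 / (2.0 * math.pi * sigma2)) * math.exp(
        -((x - mu) ** 2) / (2.0 * sigma2)
    )

def normal_pdf_precision(x, mu, beta):
    """Same density parameterized by the precision beta = 1 / sigma^2."""
    return math.sqrt(beta / (2.0 * math.pi)) * math.exp(
        -0.5 * beta * (x - mu) ** 2
    )

# Both parameterizations describe the same distribution.
a = normal_pdf(1.5, mu=1.0, sigma2=4.0)
b = normal_pdf_precision(1.5, mu=1.0, beta=0.25)
print(a, b)
```

The precision form is convenient when the density is evaluated many times with different parameters, because it avoids dividing by the variance inside the exponent.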


- Exponential and Laplace distribution : the exponential distribution has a sharp point at x = 0; the Laplace distribution places a sharp peak of probability mass at an arbitrary point μ


- Dirac distribution and empirical distribution: the Dirac delta function δ(x) is zero-valued everywhere except 0, yet integrates to 1

 The Dirac delta distribution is used as a component of the empirical distribution, which puts probability mass 1/m on each of the m training points

- Mixtures of distributions: A mixture distribution is made up of several component distributions

P(c) : multinoulli distribution over component identities

Gaussian mixture model :

.p(x|c=i) are Gaussians

.each component has a separately parametrized mean and covariance

.universal approximator : any smooth density can be approximated with any specific nonzero amount of error by a Gaussian mixture model with enough components

.besides the means and covariances, the parameters of a Gaussian mixture also specify the prior probability α_i = P(c=i) of each component

prior probability: P(c), the model's beliefs about c before it has observed x

posterior probability : P(c|x), computed after observation of x
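How the prior P(c) turns into the posterior P(c|x) can be sketched with a one-dimensional, two-component mixture. All numbers below (`priors`, `means`, `variances`) are invented for illustration, and the posterior follows from Bayes' rule: P(c=i|x) = P(c=i) p(x|c=i) / Σ_j P(c=j) p(x|c=j).

```python
import math

def normal_pdf(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(
        2.0 * math.pi * sigma2
    )

# Hypothetical 1-D mixture with two components.
priors = [0.7, 0.3]       # P(c = i)
means = [0.0, 4.0]
variances = [1.0, 1.0]

def mixture_pdf(x):
    """p(x) = sum_i P(c = i) p(x | c = i)."""
    return sum(a * normal_pdf(x, m, v)
               for a, m, v in zip(priors, means, variances))

def posterior(x):
    """P(c = i | x) for every component i, via Bayes' rule."""
    joint = [a * normal_pdf(x, m, v)
             for a, m, v in zip(priors, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

print(posterior(0.0))   # x near the first mean favors component 0
print(posterior(2.0))   # both likelihoods are equal here, so the prior decides
print(posterior(4.0))   # x near the second mean favors component 1
```

At x = 2.0 the two components explain the observation equally well, so the posterior falls back to the prior; elsewhere the likelihood shifts the belief toward the closer component.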





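- logistic sigmoid : σ(x) = 1 / (1 + exp(−x)), maps any real number into (0, 1), so it is commonly used to produce the φ parameter of a Bernoulli distribution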
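Evaluating 1 / (1 + exp(−x)) directly overflows for large negative x, because exp(−x) becomes huge. The sketch below is one common way to avoid that, not necessarily how any particular library does it: it branches on the sign of x so that exp is only ever called on a non-positive argument.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)          # x < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)

print(sigmoid(0.0))
print(sigmoid(-1000.0), sigmoid(1000.0))   # saturates instead of raising
```

The function saturates toward 0 and 1 at the extremes, which is why it becomes insensitive to small changes in x when |x| is large.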

- Bayes' rule: when we want P(x|y), we can compute it from P(y|x) and P(x): P(x|y) = P(x) P(y|x) / P(y), where P(y) = Σ_x P(y|x) P(x)


- Information Theory : 

Basic Intuition

1. Events that occur frequently carry almost no information. (ex: the sun will rise tomorrow)

2. Events that rarely occur carry a lot of information. (ex: there will be a solar wind when the sun rises tomorrow)

3. Independent events carry additive information.


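- Shannon entropy: the expected amount of information (uncertainty) in a distribution, H(x) = −E_{x~P}[log P(x)]. It gives a lower bound on the average number of bits needed to encode symbols drawn from the distribution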

.The closer a distribution is to deterministic, the lower its entropy; the closer it is to uniform, the higher its entropy
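The claim about deterministic versus uniform distributions is easy to check numerically. A minimal sketch for discrete distributions, measuring entropy in bits (log base 2); the function name `entropy_bits` is chosen for this example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

deterministic = [1.0, 0.0, 0.0, 0.0]
skewed = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]

for name, dist in [("deterministic", deterministic),
                   ("skewed", skewed),
                   ("uniform", uniform)]:
    print(name, entropy_bits(dist))
```

For four states the entropy ranges from 0 bits (outcome known in advance) up to 2 bits (all outcomes equally likely), with the skewed distribution in between.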

- Kullback-Leibler (KL) divergence: measures how different two distributions P(x) and Q(x) over the same random variable x are, D_KL(P||Q) = E_{x~P}[log P(x) − log Q(x)]

.Asymmetric : in general D_KL(P||Q) ≠ D_KL(Q||P)

.Cross entropy : H(P, Q) = H(P) + D_KL(P||Q), so minimizing the cross entropy with respect to Q is equivalent to minimizing the KL divergence
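Both properties, the asymmetry and the link to cross entropy, can be verified on a small discrete example. The distributions `P` and `Q` below are arbitrary illustrative values, and the sketch assumes Q is nonzero wherever P is nonzero.

```python
import math

def entropy(p):
    """H(P) = -sum_x P(x) log P(x), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x); assumes q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Arbitrary example distributions over three states.
P = [0.5, 0.4, 0.1]
Q = [0.8, 0.1, 0.1]

print(kl_divergence(P, Q), kl_divergence(Q, P))          # not equal
print(cross_entropy(P, Q), entropy(P) + kl_divergence(P, Q))  # equal
```

Because H(P) does not depend on Q, the cross entropy and the KL divergence differ only by a constant when Q is the quantity being optimized.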


- Structured probabilistic models (graphical models) : represent the factorization of a probability distribution with a graph


*factorizing a probability distribution : greatly reduces the number of parameters needed to describe the distribution

*graph : a set of vertices that may be connected to each other by edges
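To see how much a factorization can save, the sketch below counts free parameters for n discrete variables with k states each. It compares a full joint table against a chain-structured factorization p(x1) p(x2|x1) ... p(xn|x(n-1)); the chain structure is a hypothetical example chosen here, not the only possible graph.

```python
def full_table_params(n, k):
    """Free parameters of an unrestricted joint table over n k-state variables."""
    return k ** n - 1

def chain_params(n, k):
    """Free parameters of p(x1) p(x2|x1) ... p(xn|x(n-1)).

    p(x1) needs k - 1 numbers; each conditional table needs
    k - 1 numbers for each of the k values of its parent.
    """
    return (k - 1) + (n - 1) * k * (k - 1)

for n in (3, 10, 20):
    print(n, full_table_params(n, 2), chain_params(n, 2))
```

The full table grows exponentially with the number of variables, while the chain grows linearly, which is the saving that the graph structure encodes.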