Advanced topics in machine learning (CK0255)
Probability and random variables (TIP8421)

The course reviews the basic facts of probability and distribution theory, the mechanics of basic inference, large-sample theory (convergence in probability, convergence in distribution and the central limit theorem), and complete inference based on maximum likelihood theory.
  1. Introductory refresher: Probability theory and distribution theory;
  2. Multivariate distributions: Distributions, expectations, transformations, correlation and dependence;
  3. Common distributions: The binomial distribution and friends, the Poisson distribution, the gamma distribution and friends, the normal distribution, $t$ and $F$ distributions, mixtures;
  4. Elementary inference: Sampling and statistics, confidence intervals, hypothesis testing;
  5. Consistency and limiting distributions: Generalities about convergence in probability and distribution, the central limit theorem;
  6. Maximum likelihood methods: Estimation, the Rao-Cramér lower bound, tests, expectation-maximization.
If time allows, selected topics related to sufficiency, completeness, uniqueness and independence will also be discussed.

Instructor : Francesco Corona (FC), francesco döt corona ät ufc döt br

Physical location : Wednesdays and Fridays 14:00-16:00, Bloco 951, Sala 10.
Internet location : Here! Or, here (CK0255/TIP8421) for mumbo jumbo related to administration.

Evaluation : Approximately half a dozen theoretical and practical problem sets will be assigned as homework. Home assignments are for training and are not mandatory: they can be handed in, but they will not be evaluated. The actual evaluation will be based on three or four partial evaluations (APs) in class (weight 70%) and a final project (weight 30%). If needed, a final evaluation (AF) will be arranged.

Grading :
- AP01 (SEP 29/OCT 06): Exam sheet (SEP 29)/Exam sheet (OCT 06) and grades (updated, OCT 11)
- AP02 (NOV 24): Exam sheet and grades



>>>>> Final project ++ By DEC 12, 2017 <<<<< Read me! >>>>>> Institutional Evaluation (Avaliação Institucional) 2017.2 <<<<<<

>>>>> TCC Positions - 1 or 2 positions @DC/CC <<<<<

Lectures and schedule

  1. About this course

    A About this course (FC)
    • About this course

  2. Probability theory and distribution theory

    A Probability theory (FC)
    • Slides (AUG 30, SEP 01 and SEP 06 | Last update SEP 06)
    • Exercises (Last update SEP 04, fixed errors in A.4 and A.5)
    • Set theory
    • Probability set functions
    • Conditional probability and independence
    B Random variables (FC)
    • Probability mass functions, probability density functions, distribution functions
    • Discrete random variables (PMF and transformations)
    • Continuous random variables (PDF and transformations)
    C Expectations (FC)
    • Expectations, moment generating functions
    D Inequalities (FC)
    • Slides (SEP 20 and SEP 22, postponed)
    • Markov's inequality
    • Chebyshev's inequality
    • Jensen's inequality

  3. Multivariate distributions

    A Two random variables (FC)
    • Joint CDF, joint PMF, joint PDF
    • Expectation, MGF
    B Transformations (FC)
    • Expectations, moment generating functions
    C Conditional distributions and expectations (FC)
    • Conditional PMF, conditional PDF, conditional expectations
    D Correlation coefficient (FC)
    E Independent random variables (FC)
    F Several random variables (FC)
    • Joint conditional PMF/PDF
    • Mutual independence
    • Variance-covariance matrices
    • Transformations
    • Linear combinations

  4. Some named distributions

    A The binomial distribution and the Poisson distribution (FC)
    • Slides (Last update NOV 01, fixed some errors)
    • The binomial distribution
    • The Poisson distribution
    B The gamma distribution and friends (FC)
    • The gamma distribution
    • The exponential distribution
    • The chi-square distribution
    C The normal distribution (FC)
    • Slides (NOV 3, NOV 8, NOV 10 (cancelled, Encontros Universitários) and NOV 17 | Last update NOV 17)
    • The normal distribution
    • The multivariate normal distribution
    D $t$ and $F$ distributions (FC)
    • The $t$-distribution
    • The $F$-distribution
    • Student's theorem
    E Mixture distributions (FC)

  5. Elementary inference

    A Sampling and statistics (FC)
    • Slides (NOV 29, DEC 01, DEC 06 and DEC 08)
    • Random samples, statistics, estimators
    • Histogram estimates of PMF and PDF
    B Confidence intervals and order statistics (FC)
    • Definitions, difference in means, difference in proportions
    • CIs for the parameters of discrete distributions
    • Quantiles, confidence intervals for quantiles
    C Hypothesis testing (FC)
    • Hypothesis, tests, error types, significance levels
    • Critics

  6. Consistency and limiting distributions

    A Convergence in probability and convergence in distribution (FC)
    • Bounds in probability
    • The $\Delta$-method
    • The MGF technique

    • The central limit theorem

  7. Maximum likelihood methods

    A Maximum likelihood estimation (FC)
    • The Rao-Cramér inequality
    • Fisher information
    • Score function
    • Efficiency

    • Rao-Cramér lower bound
    • Maximum likelihood tests
    B Expectation maximization (FC)


Problem sets

As we use problem set questions covered by books, papers and webpages, we expect you not to copy, refer to, or look at the solutions in preparing your answers. We expect you to want to learn and not google for answers: If you do happen to use other material, it must be acknowledged clearly with a citation on the submitted solution.

The purpose of problem sets is to help you think about the material, not just give us the right answers.

Homework must be done individually: Each of you must hand in his/her own answers and, when code is requested, write his/her own code. It is acceptable, however, for you to collaborate in figuring out answers. We assume that you take responsibility for personally understanding the solution to any work arising from collaboration, and you must indicate on each homework with whom you collaborated.

To typeset assignments, students are encouraged to use this LaTeX template: Source (PDF).

Assignments must be returned via SIGAA.



Course slides will suffice. Slides are mostly based on the following textbook: The material can be complemented using material from the following textbooks (list not exhaustive):
  1. Theory of Point Estimation (2nd edition), by Erich L. Lehmann and George Casella;
  2. Testing Statistical Hypotheses (3rd edition), by Erich L. Lehmann and Joseph P. Romano;
  3. Elements of Large-Sample Theory, by Erich L. Lehmann.
Copies of these books are floating around.

>>>>>> Course material is prone to a typo or two - Please inbox FC to report <<<<<<


Read me or watch me