Frequently Asked Questions
This page will be updated regularly with clarifications for the problem set and other commonly asked questions.
Programming Assignment 3
- The CONDITIONAL_SENSORS structure: Our observation model returns a value D in the range 1-10. In theory it could/should be continuous, but for the sake of this assignment we have quantized the value to 10 bins. CONDITIONAL_SENSORS{d} is a factor over the robot location X and landmark location L that represents P(D = d | X, L). It is merely there for quick access. This should be the same factor as you would get from selecting D = d from the full SENSORS factor P(D | X, L).
- In our forward inference implementation, we represent our state belief as marginal probabilities over robot location and mine locations. Due to DBN entanglement, these two sets of variables are actually not independent. Clearly, this representation is not exact, but it helps us represent the state more efficiently. Your job is first to compute the joint probability of the robot and landmarks locations given the new evidence and then compute the marginals for the individual variables. In particular, when accounting for observations (eq. 18.4 in the reader),
will obey:
. - In UpdateUnrolledRoxedDBN.m, you dont' need to set MESSAGES to anything (just return them as they are.)
- In UnrollRoxerDBN.m, you need to properly handle the initial robot location and landmark beliefs (BELIEF_0 and LANDMARK_BELIEFS_0). For mine
in LANDMARK_BELIEFS_0, you have a choice over which cluster to associate the initial belief with. In order for your test to pass, you need to associate it with the cluster over
. - In written question 4, you should assume that we run exact Forward Inference and exact unrolled DBN inference.
- It is ok to submit either a README plain text or a README.pdf file. Please do not submit raw latex code.
- For written question 5, we are looking for a very brief justification of which algorithm you think is applicable to each scenario. The goal is to think about the speed/accuracy tradeoffs of the different algorithms.
Problem Set 4
- Question 1(a): In this question the path in the unrolled DBN is also directed. That is, the question should read "… for every t, there exists a directed path…"
- Question 1(c): In addition to the definition of a 2-TBN in the reader (definition 18.1.4), we also require that the unrolled DBN be a single connected component — there must be at least one (undirected) path between each pair of nodes.
- Question 2(b): For part (i) we are looking for a compact representation for
. You do not need to show that the computation is tractable until part (ii). - Question 2(c): When we ask for the conditions for convergence, the question should read: "What are the necessary conditions for this algorithm to converge to the correct map?"
- Question 4. The description of the data set D is not stated fully correctly. It should read "… and a data set D where each training sample is a full assignment to the M copies of the variables in X, the N copies of the variables in Y, and the M*N copies of the variables in Z.
- Question 4. All parents of variables in
must lie in Plate 1 and all parents of variables in
must lie in Plate 2. Each variable in
must either have parents in in both
and
, have parents in
, or both these must be true. For example, in Figure 19.4a, there might be a variable Regret for whether the student regrets having taken the class; this variable would be in
, and its only parent might be Grade. Or, Grade might have multiple parents in
and
(such as WorkEthic of the student and/or Mercy of the course's profess). - Question 4. Hint: When providing estimates of
, you should use terms of the form
, where
is the indicator function, as in the book. - Question 4(a). It is not enough to simply use the term "counts" or
when defining sufficient statistics, because the definition in the book does not precisely correspond to the this particular setting. You may need to redefine
to deal with plate models. - Question 4(c). All copies of all instances of H are hidden.
- Question 5. The formula for the derivative of the log-likelihood in a standard Markov Network should be
and should not include the summation.
Problem Set 3
- Problem 2. There is a theorem required for the solution of this problem that is *stronger* than the one stated in the book (Thm 15.3.4). First, the definition of a "covered edge": an edge between X and Y is covered if X and Y have the same parents (or both have no parents), except that one is the parent of the other. The stronger theorem that is required for Problem 2 is the same as Thm 15.3.4, with "single edge reversal" replaced by "single covered edge reversal". So the theorem says that there is a sequence from G to G' of I-equivalent networks that differ only by the reversal of a covered edge.
Programming Assignment 2
- LearnTreeNetwork.m. The function BuildMaxSpanningTree will return a correctly directed (single parent) network; this is an artifact of Prim's algorithm. Feel free to look through the code if you're curious.
- When ExpandDataForNaiveBayes finishes, the cell array returned, EDF, has a set of matrices, each of which represents the probabilities of the assignment to a specific variable each data instance has. More specifically, consider EDF{i}. The
row of this matrix represents the probability of the assignment to the
variable that
instance has, given each of the assignments to
. That is, supposing that
takes on value
in the
data instance EDF{i}(j, k) is
. The entry for the class variable, supposing it is at index 11, is EDF{11}(j, k) = 
Problem Set 2
- Problem 1. A "clique separator" is the same thing as a sepset, defined in Def. 9.1.1.
- Problem 1b. For the subsets of X and Y that are in S, separation trivially holds by the definition of separation and active trails. So if this aspect is confusing you, you can simply state this fact and focus on proving sep(X-S;Y-S|S).
- Problem 2. In the figure, the clique should be "C2", not "C1"
- Problem 2. The factored messages are approximate, not exact, in the sense that the factored message
is an approximation of the exact message
that results from standard message-passing. (And the same is true for all factored message.) However, you should not introduce further approximations beyond this approximation that we have provided you with. - Problem 2a. You must show that the new method's computation is asymptotically less than the computation of standard message-passing. That is, if you write the total computation in big-O notation, you should show that the new method's is less than the old method's.
- Problem 2a. Clarification: In your analysis, you should show that the new method's computation is less than the computation for standard message passing even if you augment standard message-passing by pushing in summations. So the complexity (total computation in big-O notation) of your solution must be less than the complexity of standard message-passing that is augmented by pushing in summations. In other words, you should not simply show that pushing in summations gives you less computation than not pushing in summations; you should show that by using factored messages, you are able to push in summations in such a way that is asymptotically more efficient than you could achieve without using factored message.
- Problem 2a. Hint: You should show how to compute the factored message
by showing how to compute
and
separately. You should show that this total computation is more efficient than computing
as in standard message passing. - Problem 2c. Your answer should be a simple characterization that encompasses all situations (in which using factored message does not lead to better efficiency).
- Problem 3. "Figure 9.11" refers to "Figure 1" in the additional handout.
- Problem 5. Here's an explanation of the changes made to book's section on Distributional Gibbs, as posted on the website: "The Bayes rule is not incorrect by itself, but it should have conditioned on e and not e_d. Moreover, it doesn't actually add value, because you wouldn't necessarily solve the RHS of 10.24 using Bayes rule. You would probably just run inference to get the marginal over X_i given u_i and e. So, that's why I took it out."
Problem Set 1
- Problem 1. You are allowed (and supposed) to use the other properties and definitions from Chapter 2, such as Decomposition.
- Problem 2b. Correction: The resulting graph BN' must be a minimal I-map, as in part (a).
- Problem 2b. The following algorithm, or any variation thereof, will not be accepted: (1) Make a fully connected graph. (2) Remove edges until it is a minimal I-map for the distribution. Your algorithm must be more efficient.
- Problem 2b. Hint: Your algorithm should be local, in the sense that it only touches nodes "near" the variable that is being removed.
- Problem 2b. You do not need to prove the correctness of your algorithm.
- Problem 3. For this problem you need only state the number of I-equivalent networks and briefly explain how you arrived at this number. No formal proof is required.
Programming Assignment 1
- Update: Two small updates have been made to the files in /afs/ir.stanford.edu/class/cs228/ProgrammingAssignments/PA1/. Neither update is necessary in order to correctly solve and get full points on the assignment; however, if you've already copied everything over, you may want to get the following updated files: ViewGraph.m, insurance.mat, tests1.mat. The updates were:
- ViewGraph.m had an incorrect pointer to the application (called 'dot') that creates the visualizations of the clique tree and cluster graph we provide you with. This isn't related to the actual code you need to write for the assignment, and if you want to simply view the visualizations that it creates, they've been posted at http://www.stanford.edu/class/cs228/assignments.html.
- The Clique Tree structure (insurance_clique_tree in insurance.mat) that we provide you with for performing inference on the INSURANCE network has been updated. There were some cliques in 'insurance_clique_tree' that weren't maximal (the scope of one clique was a subset of the scope of another clique) so the clique tree was larger and more inefficient than necessary. These cliques have now been removed. Note that this does not affect any of the code you need to write, nor does it affect the results of the tests — all tests in TestPA1.m should pass with the correct implementation regardless of whether you get the updated insurance.mat and tests1.mat (but don't get one without the other).
- In case you're curious, the method I used to construct the clique tree from the original network was variable elimination (as discussed in 9.4) using the Min-Neighbors criteria (8.3.3.2). But I forgot to apply the pruning method described in Theorem 9.4.1, which I've now done. Incidentally, the output from ExactInference.m should be the same using either the old or new clique tree. Why? Because the final marginals of each variable from exact inference do not depend on the structure of the clique tree as long as it's valid.
- The "for" loop in lines 44-48 of CliqueTreeCalibrate are unnecessary and potentially confusing, although they have no effect on the eventual results. You may comment them out or remove them if you wish.
- Matlab note: It seems that if you are working with Matlab version 6.5 or lower, then you may encounter a problem loading the .mat files. We're seeing if we can correct this, but if you encounter this problem, try working from a machine that has version 7 (for example, the myth.stanford.edu machines).
- Updated: We've saved versions of the .mat files that should be readable in Matlab 6.*. They are named insurance_v6.mat and tests1_v6.mat in the directory /afs/…/assignment/.
- Your task for ExactInference.m and ApproximateInference.m is correctly described in the handout, but unfortunately not exactly clear in the comments embedded in the code next to "YOUR CODE HERE". Both functions should in fact return the normalized marginal probabilities over each variable. The comments have been updates in the code in the assignment directory, but of course there's no need to copy this over again.
- We recently realized that there may be some specific (correct) implementations for which ApproxInference.m returns results that are not quite within TOLERANCE of the "correct" results that we've provided in tests1.mat, potentially causing some of the tests to fail incorrectly. Therefore, you may set TOLERANCE = 1e-2 immediately before the tests of ApproxInference.m in the file TestPA1.m if you are experiencing these failures, and we will do the same when grading your submissions. (Incidentally, this should give you some sense of how Approximate Inference may not always converge to the same solution.) Also, this isn't quite necessary, but you may also set TOLERANCE = 1e-5 before the tests of ExactInference.m. We've updated the code in the assignments folder in case you want copy it over again, but you can simply insert these assignments to TOLERANCE instead.
- In the cluster graph we provided you (insurance_cluster_graph), we didn't explicitly specify the sepsets. Since we used the Bethe Approximation (11.3.4.2 in the book) to create the cluster graph, the sepsets are simply the intersection of the scopes of the two cliques that it separates (which is always a single variable in this case).
- Question (f): You should provide the maximum difference between the marginal probability of a single variable that comes out of exact inference and the same for approximate inference. So the values you compare should simply be the results you get out of ExactInference (on the provided clique tree) and ApproxInference (on the provided cluster graph).
Grading
- A clarification about how we determine the final grading curve. We set the median of the class to be about a B+. We then look for natural breakpoints in the final score distribution, so that small differences in the final grade do not cause a change in letter grade. Although we cannot commit right now to specific grade cutoffs, the A grades are usually at final scores of around 80+, the B grades around 60-80, and the C grades at around 50-60. However, note that this is not a commitment, and we will not accept complaints if the curve ends up different this year. For those people taking the course P/NC, university regulations define a P as corresponding to a final grade of C- or above.
Reader
- For students who purchased the reader last year: There are some differences between last year's reader and this year's. So far, the differences have been fairly local, and haven't come up in the material covered in class. As changes become important, we will put the relevant sections up on the webpage, in the restricted directory. Later on in the course, entire chapters may be involved. If that happens, we may need to print out the chapters, and students can purchase the handouts at cost.
- Additionally, for a better mapping to past years' course readers, here is a list of the chapter titles for THIS year's reader:
- 1. Introduction
- 2. Foundations
- 3. The Bayesian Network Representation
- 4. Local Probabilistic Models
- 5. Undirected Graphical Models
- 7. Inference with Graphical Models: Overview
- 8. Exact Inference: Variable Elimination
- 9. Exact Inference: Clique Trees
- 10. Particle-Based Approximate Inference
- 11. Global Approximate Inference: Inference as Optimization
- 13. Learning Graphical Models: Overview
- 14. Parameter Estimation
- 15. Structure Learning in Bayesian Networks
- 16. Partially Observed Data
- 18. Temporal Processes
- 20. Causality
- 21. Utilities and Decisions
- 22. Structured Decision Problems
page_revision: 62, last_edited: 1196153099|%e %b %Y, %H:%M %Z (%O ago)