Note: all the programming tasks must be finished with Jupyter notebook

Tensorflow can compute the gradient automatically. The following code snippet shows an example of computing the gradient for mean square loss.

```
import tensorflow as tf
label = tf.constant(1.0, dtype=tf.float32)
x = tf.placeholder(tf.float32)
loss_mse = tf.losses.mean_squared_error(label, x)
gradient_mse = tf.gradients(loss_mse, x)
with tf.Session() as sess:
ci, gi = sess.run((loss_mse, gradient_mse), feed_dict={x: 1.0})
print 'mse, loss, grad = ', ci, gi
```

Please extend the above code to compute:

- The gradient of hinge loss when label = 1.0, x = 1.001
- The gradient of hinge loss when label = 1.0, x = 0.009
- Plot the curves of gradients and losses for x in [-2, 2] (hint: use
`%matplotlib inline`

in Jypiter)

Logistic regression and multi-layer perceptrons (MLPs) are two basic models for classification tasks. Try to use these two models to learn from the following xor dataset:

```
import numpy
xs = np.array([[-1.1, 1.0], [-1.0, 1.1], [-1.1, 1.1], [1.0, -1.1],[1.1, -1.0],[1.0, -1.0],
[1.1, 1.1],[1.0, 0.9],[1.1, 1.0], [-1.1, -1.0], [-1.1, -1.1], [-1.0, -1.1]],
dtype=np.float32)
ys = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=np.float32)
ys = ys[:, None]
```

Train and Evaluate your models using `xs`

(samples) and `ys`

(training labels). You are allowed to use
Keras. Compare the performance of your two models.

The MNIST dataset of handwritten digits is a very popular dataset to test the algorithms and ideas of machine learning. To train MNIST data, the following procedures are adopted:

- Reshape the digit pictures ( each with 28x28 pixels) to vectors of 784
- Change the type of xs to float32
- Every pixel is from 0 to 255. Renormalize it to 0 and 1
- Reshape the label vectors ys if necessary

The original MNIST data has 10 categories. Our new task is to take only two categories: digit 4 and digit 8 and train a classifier. You are suggested to compare two models:

- One hidden layer MLP with cross entropy loss
- One hidden layer MLP with hinge loss
- (bonus) MLP with two and three hidden layers

*hint: you may refer to Keras example*.