I'm following the guide as outlined at this link: http://neuralnetworksanddeeplearning.com/chap2.html
For the purposes of this question, I've written a basic network with 2 layers: one with 2 neurons and one with a single neuron. For a very basic task, the network will learn to compute an OR logic gate, so the training data is:
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 1]
For this example, the weights and biases are:
w = [[0.3, 0.4], [0.1]]
b = [[1, 1], [1]]
The feedforward part was pretty easy to implement, so I don't think I need to post that here. The tutorial I've been following summarises the error calculation and the gradient descent algorithm with the following equations:
For each training example $x$, compute the output error $\delta^{x, L}$ where $L =$ Final layer (Layer 1 in this case). $\delta^{x, L} = \nabla_aC_x \circ \sigma'(z^{x, L})$ where $\nabla_aC_x$ is the differential of the cost function (basic MSE) with respect to the Layer 1 activation output, and $\sigma'(z^{x, L})$ is the derivative of the sigmoid function of the Layer 1 output i.e. $\sigma(z^{x, L})(1-\sigma(z^{x, L}))$.
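For a single training example $x$, I'm computing this step roughly like the following (the names and numbers below are just placeholders for the final-layer weighted input, activation and target, not my actual values):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    return sigmoid(z) * (1.0 - sigmoid(z))

# Placeholder values for one sample x:
z_L = np.array([[0.8]])   # weighted input to the single Layer 1 neuron
a_L = sigmoid(z_L)        # Layer 1 activation
y = np.array([[1.0]])     # target output for this sample

# delta^{x,L} = (a_L - y) * sigma'(z_L), with nabla_a C_x = (a_L - y) for MSE
delta_L = (a_L - y) * sigmoid_prime(z_L)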
That's all good so far and I can calculate that quite straightforwardly. Now for $l = L-1, L-2, ...$, the error for each previous layer can be calculated as
$\delta^{x, l} = ((w^{l+1})^T \delta^{x, l+1}) \circ \sigma'(z^{x, l})$
Again, this is pretty straightforward to implement.
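Roughly, in the same numpy style (again with placeholder names and values):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

# Placeholder values for one sample x:
w_next = np.array([[0.1, 0.2]])   # weights of layer l+1, shape (1, 2)
delta_next = np.array([[0.05]])   # delta^{x, l+1}, shape (1, 1)
z_l = np.array([[0.7], [1.3]])    # weighted inputs of layer l, shape (2, 1)

# delta^{x,l} = ((w^{l+1})^T delta^{x,l+1}) * sigma'(z^{x,l})
delta_l = np.dot(w_next.T, delta_next) * sigmoid_prime(z_l)   # shape (2, 1)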
Finally, to update the weights (and biases) for $l = L, L-1, ...$, the equations are:
$w^l \rightarrow w^l - \frac{\eta}{m}\sum_x\delta^{x,l}(a^{x, l-1})^T$
$b^l \rightarrow b^l - \frac{\eta}{m}\sum_x\delta^{x,l}$
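Taken at face value, I read those two updates as something like this for a single layer $l$ (eta, m, the current parameters and the accumulated sums below are all placeholders):

import numpy as np

eta = 3.0   # learning rate (placeholder)
m = 4       # number of training examples in the batch

w_l = np.array([[0.3, 0.4]])     # current weights of layer l (placeholder)
b_l = np.array([[1.0]])          # current bias of layer l (placeholder)
sum_delta_aT = np.zeros((1, 2))  # placeholder for the sum over x of delta^{x,l} (a^{x,l-1})^T
sum_delta = np.zeros((1, 1))     # placeholder for the sum over x of delta^{x,l}

# Gradient descent step for this layer:
w_l = w_l - (eta / m) * sum_delta_aT
b_l = b_l - (eta / m) * sum_delta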
What I don't understand is how this works with vectors of different numbers of elements (I think the lack of vector notation here confuses me).
For example, Layer 1 has one neuron, so $\delta^{x, 1}$ will be a scalar value since it only outputs one value. However, $a^{x, 0}$ is a vector with two elements since layer 0 has two neurons. Which means that $\delta^{x, l}(a^{x, l-1})^T$ will be a vector even if I sum over all training samples $x$. What am I supposed to do here? Am I just supposed to sum the components of the vector as well?
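To make the shapes concrete for my network: $\delta^{x,1}$ is a single number and $a^{x,0}$ has two components, so $\delta^{x,1}(a^{x,0})^T = (\delta^{x,1}a^{x,0}_1,\ \delta^{x,1}a^{x,0}_2)$, which still has two components after summing over $x$.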
Hopefully my question makes sense; I feel I'm very close to implementing this entirely and I'm just stuck here.
Thank you
[edit] Okay, so I realised that I've been misrepresenting the weights of the neurons and have corrected for that.
sizes = [2, 2, 1]  # 2 inputs, 2 neurons in layer 0, 1 neuron in layer 1
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
which gives the output:
[array([[0.27660583, 1.00106314],
        [0.34017727, 0.74990392]]),
 array([[ 1.095244  , -0.22719165]])]
This means that layer 0 has a weight matrix with shape 2x2, representing the 2 weights on neuron01 and the 2 weights on neuron02.
My understanding then is that $\delta^{x,l}$ has the same shape as the weights array because each weight gets updated independently. That's also fine.
But the biases (according to the link I sourced) have one term per neuron, which means layer 0 will have two bias terms (b00 and b01) and layer 1 has one bias term (b10).
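Following the same list-comprehension style as the weights, I'd store the biases as something like this (a sketch based on the conventions in the linked guide):

import numpy as np

sizes = [2, 2, 1]   # 2 inputs, 2 neurons in layer 0, 1 neuron in layer 1

# One bias per neuron, stored as a column vector per layer:
biases = [np.random.randn(y, 1) for y in sizes[1:]]
# biases[0] has shape (2, 1) -> b00 and b01 for layer 0
# biases[1] has shape (1, 1) -> b10 for layer 1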
However, to calculate the update for the bias terms, you sum the deltas over $x$, i.e. $\sum_x \delta^{x, l}$; if delta has the same shape as the weight matrix, then there are too many terms to update the bias terms. What have I missed here?
Many thanks