Commit a6b9bde1 authored by Stenli Karanxha

Completed the linear regression and fixed the links.

parent e5c3adec
@@ -3,10 +3,11 @@
<p>The target of the project was to solve a classification problem,
identifying which of 87 species of birds and amphibians are present in a
list of continuous wild song recordings.
The problem is presented as a Kaggle competition. For more information see:
<a href="https://www.kaggle.com/c/multilabel-bird-species-classification-nips2013">bird classification</a>.
</p>
<p>The main characteristics of the problem are:</p>
<ol>
<li> The training is simplified by the fact that each training sample contains a single call. </li>
@@ -74,7 +75,82 @@ as valid only if it has a better performance than that. </p>
</ol>
<h1>3. Linear regression</h1>
<h2>1. Basics</h2>
<p>Linear regression is a commonly used predictive model in machine learning.
In our case the algorithm is used for multi-class classification.
For this purpose I created the class 'LinearRegression', containing the functions explained below,
as well as some required utility functions.</p>
<p>The following modules are needed in the code:</p>
<ul>
<li>numpy - for diverse operations on arrays</li>
<li>scipy.optimize - to calculate the optimal values of the parameters</li>
<li>mathutils - for the sigmoid function</li>
<li>Sample from sample - to get the samples to train the algorithm</li>
</ul>
<br>
<h2>2. Implementation</h2>
<p>The 'LinearRegression' class contains the following functions:</p>
<dl>
<dt><strong>the train function</strong></dt>
<dd><p>This function trains the algorithm by analysing the list of samples from Sample.</p>
<p>It uses the '_get_flat_biased_data' function to assign all the training samples to a matrix X,
in which every row contains a sample and the number of columns is given by the number of features.
Thus, the matrix X contains all the training samples.</p>
<p>Furthermore, it assigns a matrix with the expected outputs to y by calling the
'get_classification()' function from Sample. Here again, each row of the matrix y contains an
output and the number of columns is given by the number of classes.</p>
<p>After minimizing the cost function, the train function assigns the optimal
values of the parameters to self.parameters. These values are obtained through the BFGS optimization
routine from the
<a href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html">scipy.optimize</a> module.
</p>
<p>Last, the train function calls the cost and the gradient functions.</p>
</dd>
<dt><strong>the evaluate function</strong></dt>
<dd><p>Using the already trained algorithm, the evaluate function evaluates the test samples.
In other words, it calculates the 87 class-belonging probabilities for a given sample;
a short sketch of this flow follows.</p></dd>
</dl>
<br>
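<p>The following is a minimal, self-contained sketch of how train and evaluate could fit together,
assuming the data layout described above; in the project the matrices X and y would come from
'_get_flat_biased_data' and 'get_classification()', here replaced by random stand-ins:</p>
<pre><code>import numpy as np
from scipy.optimize import fmin_bfgs

def sigmoid(z):
    # Logistic function (provided by mathutils in the project).
    return 1.0 / (1.0 + np.exp(-z))

class LinearRegression(object):
    def __init__(self):
        self.parameters = None  # Theta, shape (n_features, n_classes)

    def train(self, X, y):
        # X: (m, n) biased training samples, one per row.
        # y: (m, k) expected outputs, one column per class.
        m, n = X.shape
        k = y.shape[1]

        def cost(flat_theta):
            # Unregularized logistic cost; the regularized cost and its
            # gradient are sketched in the utility-function section below.
            h = np.clip(sigmoid(X.dot(flat_theta.reshape(n, k))), 1e-10, 1 - 1e-10)
            return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

        self.parameters = fmin_bfgs(cost, np.zeros(n * k),
                                    disp=False).reshape(n, k)

    def evaluate(self, x):
        # Class-belonging probabilities for one biased sample x.
        return sigmoid(x.dot(self.parameters))

# Hypothetical usage with random data (bias column plus 4 features, 3 classes):
X = np.hstack([np.ones((20, 1)), np.random.rand(20, 4)])
y = (np.random.rand(20, 3) > 0.5).astype(float)
model = LinearRegression()
model.train(X, y)
print(model.evaluate(X[0]))
</code></pre>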
<p>The required utility functions are the following:</p>
<dl>
<dt><strong>the cost function</strong></dt>
<dd>The regression hypothesis is defined as:
<p><img src="http://4.bp.blogspot.com/-c-C6IGeS_eo/TraZSGODaCI/AAAAAAAAAog/lspXBBwj7FE/s1600/Screen+shot+2011-11-06+at+11.26.53+AM.png" border="0"></p>
with
<p><img src="http://2.bp.blogspot.com/-PqGvtE_NEmE/TraZYTHEfLI/AAAAAAAAAoo/swDZg20vd4Q/s1600/Screen+shot+2011-11-06+at+11.27.03+AM.png" border="0"></p>
the sigmoid function, which is found in the module mathutils.
<p>In our case &Theta; is the set of parameters, and with m = len(y) the cost function is given by the following formula:</p>
<p><img style="width: 537px; height: 78px;" src="http://4.bp.blogspot.com/-0vWgkEmE-u4/TraaI_rd-bI/AAAAAAAAAow/Ya5rp0rQS48/s1600/Screen+shot+2011-11-06+at+11.30.37+AM.png" class="CSS_LIGHTBOX_SCALED_IMAGE_IMG"></p>
<p>We use the regularized version of the cost function, so a regularization term needs to be added; the regularized cost function follows:</p>
<p><img src="http://3.bp.blogspot.com/-qNym-oCdMIg/Trd03YeslWI/AAAAAAAAApQ/GUfXiJ3vpUE/s400/Screen+shot+2011-11-07+at+3.03.55+AM.png" border="0" height="48" width="400"></p>
</dd>
<dt><strong>the gradient function</strong></dt>
<dd>Regarding the gradient:
<blockquote>"In mathematics, the <b>gradient</b> is a generalization of the usual concept of derivative to functions of several variables. If <span class="texhtml"><i>f</i>(<i>x</i><sub>1</sub>, ..., <i>x</i><sub><i>n</i></sub>)</span> is a differentiable function of several variables, also called a "scalar field", its <b>gradient</b> is the vector of the <i>n</i> partial derivatives of <i>f</i>. It is thus a vector-valued function, also called a vector field." (Wikipedia)</blockquote>
<p>Thus the gradient of the regularized cost function is a vector whose elements are defined as follows:</p>
<p><img src="http://31.media.tumblr.com/49d21814f22a1d5482d64c947ea8b035/tumblr_mx1jitadG71rpbhdso1_400.png" border="0"></p>
</dd>
</dl>
<br>
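<p>A sketch of the regularized cost and gradient corresponding to the formulas above, under the
same assumptions as the earlier snippet (&Theta; is an n&times;k matrix; the bias row is excluded
from regularization):</p>
<pre><code>import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, reg_lambda):
    # J(Theta) = -1/m * sum(y*log(h) + (1-y)*log(1-h))
    #            + lambda/(2m) * sum(Theta^2)   (bias row excluded)
    m = len(y)
    h = np.clip(sigmoid(X.dot(theta)), 1e-10, 1 - 1e-10)
    penalty = reg_lambda / (2.0 * m) * np.sum(theta[1:] ** 2)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m + penalty

def gradient(theta, X, y, reg_lambda):
    # dJ/dTheta = 1/m * X^T (h - y), plus lambda/m * Theta for non-bias rows.
    m = len(y)
    h = sigmoid(X.dot(theta))
    grad = X.T.dot(h - y) / m
    grad[1:] += reg_lambda / m * theta[1:]
    return grad
</code></pre>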
<h1>4. K-nearest neighbours</h1>
@@ -89,8 +165,8 @@ based on the categories of the k nearest neighbors in the training data set.</p>
<h2> Preparation </h2>
<p>The first idea was to use
<a href="http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdf">Fast ICA</a>.
That approach was rather complex, and in the end I opted for k-nearest neighbours, which
is a simpler and more efficient approach.</p>
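<p>The classification step itself is straightforward; below is a minimal sketch of
k-nearest-neighbour voting over feature vectors (the Euclidean metric and the value of k are
illustrative assumptions, not necessarily the project's exact choices):</p>
<pre><code>import numpy as np

def knn_predict(train_X, train_y, x, k=5):
    # Euclidean distance from x to every training sample.
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest samples
    votes = np.bincount(train_y[nearest])  # count the class labels among them
    return np.argmax(votes)                # majority class wins

# Hypothetical usage: 100 training samples with 17 features, 87 classes.
train_X = np.random.rand(100, 17)
train_y = np.random.randint(0, 87, size=100)
print(knn_predict(train_X, train_y, np.random.rand(17)))
</code></pre>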
@@ -120,22 +196,30 @@ which should do the bird classification.</p>
<h2> Material and Preparation </h2>
<p>As my knowledge about neural networks was very limited (to abstract basic principles),
it was necessary to become familiar with the basic concepts.
Stenli found the following Coursera course about
<a href="https://www.coursera.org/course/ml">machine learning</a>.
There the material for weeks 4 and 5 is concerned with neural networks, and I used this to get the gist of the topic.
Then I found another Coursera class about
<a href="https://www.coursera.org/course/neuralnets">neural networks in machine learning</a> by Geoffrey Hinton.
Some of the lectures can also be found on YouTube.</p>
<p>I watched the first 5 lectures which provided a more profound theoretical background and introduced me to backpropagation.</p>
<p>I limited the scope of my algorithm to this material, as more complex neural networks would
have meant studying even more material.
Python provides many modules for the implementation of neural networks (
<a href="http://pybrain.org/docs/">pybrain</a>,
<a href="http://pythonhosted.org/neurolab/intro.html#support-neural-networks-types">neurolab</a>, ...).
</p>
<p>I decided not to use these modules, as I wanted to make every step of the algorithm explicit.
Implementing a neural network and a learning algorithm with
these modules can basically be reduced to two function calls, but I was interested in what is going on
behind the scenes. I am well aware of the fact
that my algorithm's performance lies far below the performance of the algorithms used in these modules.
</p>
@@ -143,9 +227,11 @@ that my algorithms performance lies far below the performance of the algorithms
<h2> Implementation </h2>
<p>My implementation is a basic feed-forward neural network consisting of 1 input, 1 hidden and 1 output layer.
To adjust the weights during training I use the <a href="http://en.wikipedia.org/wiki/Backpropagation">backpropagation algorithm</a>.
The main functions of the algorithm are the initialisation of the neural network, the training of the network with
the training data, and the evaluation of the testing data.
Several other functions were necessary as helpers to realise these 3 functions. More detailed information can
be found in the code; a condensed sketch of the forward and backward pass follows below.
</p>
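<p>A condensed sketch of such a network, assuming a single hidden layer, sigmoid activations and
plain per-sample gradient descent; the layer sizes and learning rate are illustrative, not the
values used in the project:</p>
<pre><code>import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FeedForwardNet(object):
    def __init__(self, n_in, n_hidden, n_out, lr=0.1):
        # Small random initial weights; the bias is folded into each weight matrix.
        self.w1 = np.random.randn(n_in + 1, n_hidden) * 0.01
        self.w2 = np.random.randn(n_hidden + 1, n_out) * 0.01
        self.lr = lr

    def _forward(self, x):
        a1 = np.append(1.0, x)                         # input layer plus bias node
        a2 = np.append(1.0, sigmoid(a1.dot(self.w1)))  # hidden layer plus bias node
        a3 = sigmoid(a2.dot(self.w2))                  # output layer
        return a1, a2, a3

    def train_sample(self, x, target):
        # One backpropagation step for a single (input, target) pair.
        a1, a2, a3 = self._forward(x)
        d3 = (a3 - target) * a3 * (1 - a3)                # output-layer error
        d2 = self.w2[1:].dot(d3) * a2[1:] * (1 - a2[1:])  # hidden-layer error
        self.w2 -= self.lr * np.outer(a2, d3)
        self.w1 -= self.lr * np.outer(a1, d2)

    def evaluate(self, x):
        return self._forward(x)[2]  # class-belonging probabilities

# Hypothetical sizes: 17*100 flattened inputs, 30 hidden nodes, 87 classes.
net = FeedForwardNet(1700, 30, 87)
</code></pre>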
@@ -155,7 +241,8 @@ Several other functions were necessary as helpers to realise these 3 functions.
where n depends on the length of the input wav file. This means that the input varies in length.
This results in 17*n input nodes of the network (it does not matter that 2-dimensional data is fed in
as 1-dimensional, as the network will find its own way to extract the information).
The problem, however, is that the input length is variable, while in feed-forward networks the input must
always be of the same length.
For variable input lengths other networks, such as recurrent networks (which are beyond my knowledge), are more suitable.
</p>
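<p>One common workaround (an illustration only, not necessarily what was done here) is to force
every 17*n spectrogram to a fixed number of frames by truncating or zero-padding along the time
axis before flattening it into the input vector:</p>
<pre><code>import numpy as np

def to_fixed_length(spectrogram, n_frames=100):
    # spectrogram: array of shape (17, n); returns a flat vector of length
    # 17 * n_frames, truncating or zero-padding along the time axis.
    fixed = np.zeros((spectrogram.shape[0], n_frames))
    n = min(spectrogram.shape[1], n_frames)
    fixed[:, :n] = spectrogram[:, :n]
    return fixed.ravel()
</code></pre>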
@@ -172,9 +259,9 @@ the results by using the verification process integrated in the base script. </p
<p>We ran some tests of all the algorithms, using the full training data, with 75% of
it for learning and 25% for verification. The neural network algorithm performed quite well,
reaching on a limited set of training samples a correspondence index of 77% (against the
benchmark of around 50% for the random algorithm). The linear regression, on the other hand,
was not a viable choice, as Python could not cope with the matrix of parameters
coming out of it.</p>
<h1> 7. Further development </h1>
@@ -184,10 +271,10 @@ to be able to re-use the algorithm running and verification parts. Another inter
can be done on the verification process, which could better replicate the one used in the Kaggle
competition, and on making the reading and writing of the files operating-system independent.</p>
<p>The k-nearest neighbours algorithm of course needs to be fixed, and generally all the algorithms
could be implemented more efficiently, as complexity and efficiency
are of concern for such amounts of data.</p>
<p>A better tuning of the parameters of the algorithms could also be done,
to improve the performance.</p>