Feature Selection using Genetic Algorithm in Python


Over the years, the use of Machine Learning in real life has increased manifold. Be it the diagnosis of cancerous cells or the prediction of stock market prices, machine learning models are being used extensively. With this rapid growth, our focus has turned to methods that improve and optimize model performance.

Feature selection is one such method.

In feature selection, we search for the optimal feature subset, the one that contributes most to the predicted variable. The performance of a machine learning model depends heavily on its feature set: retaining only the significant features vastly reduces training time and can also improve accuracy.

Why do we need feature selection?

  1. Remove unnecessary/redundant data.
  2. Curtail the curse of dimensionality.
  3. Reduce training time.

The dataset used is Breast Cancer Wisconsin (Diagnostic), which consists of 30 real-valued features.

Genetic Algorithm

Underlying Idea


Figure 1: Steps involved in Genetic Algorithm


Figure 2: Representation of population, chromosome, and gene

Population is the group of individuals that form a generation; it is the subset of candidate solutions that undergoes reproduction. A chromosome represents a single individual in the population. In computational terms, a chromosome is represented by a binary string whose length equals the number of features in the dataset: a 1 or 0 at the ith position indicates the presence or absence of the ith feature in the solution.

A large population size slows the algorithm down, while a small population lacks diversity, so population size must be chosen wisely. Here I have used a population size of 40 (roughly 1.3x the number of features in the dataset).

Here, we are initializing our population to find the top N features.
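The original code snippet is not embedded here, so below is a minimal sketch of the initialization step. The function name, the use of NumPy, and the choice of N = 15 selected features are illustrative assumptions; only the population size of 40 and the 30-feature chromosome length come from the text.

```python
import numpy as np

def init_population(pop_size, n_features, n_select, rng=None):
    """Create pop_size binary chromosomes, each with exactly n_select bits set to 1."""
    if rng is None:
        rng = np.random.default_rng()
    population = np.zeros((pop_size, n_features), dtype=int)
    for chromosome in population:
        # Pick n_select distinct feature indices and mark them as present
        selected = rng.choice(n_features, size=n_select, replace=False)
        chromosome[selected] = 1
    return population

# 40 individuals over the 30 features of the dataset, each selecting N = 15 of them
population = init_population(40, 30, 15, rng=np.random.default_rng(42))
```

Starting every chromosome with exactly N ones keeps the initial generation centered on the target subset size.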


Usually, the fitness function is the same as the optimization function; in a maximization problem, for example, the fitness function is the function to be maximized. Targeting improved model performance, I used the cross-validation f1 score of an MLPClassifier() trained on the individual's feature subset as the fitness value.

F1 score is used as the accuracy metric because:

  1. The dataset is quite imbalanced, so accuracy_score would be biased towards the majority class.
  2. A higher f1 score, which balances precision and recall, indicates better classification performance.

Here, get_fitness() finds the population's fitness values by decoding each chromosome into a feature subset, and calculate_fitness() finds the corresponding f1 score.
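Since the original gist is missing, here is a sketch of the two helpers under stated assumptions: a small synthetic make_classification dataset stands in for the Breast Cancer Wisconsin data, and the MLP's hidden-layer size and iteration cap are illustrative choices rather than the author's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Small, slightly imbalanced synthetic stand-in for the real dataset
X, y = make_classification(n_samples=150, n_features=10, n_informative=4,
                           weights=[0.7, 0.3], random_state=0)

def calculate_fitness(chromosome):
    """Mean cross-validated f1 score of an MLP trained on the selected features."""
    mask = np.asarray(chromosome, dtype=bool)
    if not mask.any():                      # an empty feature subset cannot be scored
        return 0.0
    model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=3, scoring="f1").mean()

def get_fitness(population):
    """Decode each chromosome into a feature subset and score it."""
    return np.array([calculate_fitness(c) for c in population])

fitness = get_fitness([[1] * 10, [1, 0] * 5])
```

Retraining a classifier per chromosome is the dominant cost of the algorithm, which is one reason population size matters.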


Fitness Proportionate Selection is the most widely used criterion for parent selection. It ensures that every individual has a chance of being selected as a parent, with a probability proportional to its fitness value, which preserves the selective pressure that drives a genetic algorithm.

There are various methods for parent selection like Tournament Selection, Roulette Wheel Selection, Stochastic Universal Sampling, Rank Selection, Random Selection, etc. I have used Roulette Wheel Selection here.

Figure 3: Roulette Wheel Selection

In Roulette Wheel Selection, a fixed point is chosen on a pie chart built from the fitness values. On every rotation, whichever individual lands in front of the point is selected for reproduction. An individual with a greater area on the pie chart (i.e., a greater fitness value) therefore has a higher probability of being selected.


  1. Find the sum S of all fitness values in the population.
  2. Compute normalised fitness values (fitness value / S).
  3. Compute cumulative fitness values (see Figure 3).
  4. Generate a random number p in [0, 1].
  5. Select the first individual whose cumulative fitness value exceeds p. For example, in the Normalised Fitness Values column of Figure 3: if p < 0.232, Individual 1 is selected; if 0.232 < p < 0.426, Individual 2 is selected; and so on.

Shown below is the implementation of Roulette Wheel Selection.
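The original snippet is not included here; the five steps above can be sketched as follows. The function name and the NumPy-based cumulative-sum approach are assumptions, not the author's exact code.

```python
import numpy as np

def roulette_wheel_select(fitness, rng=None):
    """Return the index of one parent, chosen with probability proportional to fitness."""
    if rng is None:
        rng = np.random.default_rng()
    normalised = fitness / fitness.sum()                 # step 2: normalise by the sum S
    cumulative = np.cumsum(normalised)                   # step 3: cumulative fitness values
    p = rng.random()                                     # step 4: random point in [0, 1)
    idx = np.searchsorted(cumulative, p, side="right")   # step 5: first cumulative value > p
    return min(idx, len(fitness) - 1)                    # guard against float round-off
```

Calling this twice per offspring pair yields two parents; fitter individuals occupy wider slices of the cumulative range and so are picked more often.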



Crossover produces offspring by exchanging segments of genetic material between two parents. Usually, a probability is assigned to this process, indicating the chance of crossover for a given pair. Crossover is a high-probability event, typically assigned a probability between 0.65 and 0.80; here, I have used 0.78.

Figure 4: Two Point Crossover
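The two-point crossover of Figure 4 can be sketched as below. The 0.78 probability comes from the text; the function name and RNG handling are illustrative assumptions.

```python
import numpy as np

def two_point_crossover(parent1, parent2, p_crossover=0.78, rng=None):
    """With probability p_crossover, swap the segment between two random cut points."""
    if rng is None:
        rng = np.random.default_rng()
    child1, child2 = parent1.copy(), parent2.copy()
    if rng.random() < p_crossover:
        # Two distinct cut points define the segment to exchange
        a, b = sorted(rng.choice(len(parent1), size=2, replace=False))
        child1[a:b], child2[a:b] = parent2[a:b].copy(), parent1[a:b].copy()
    return child1, child2
```

When the crossover event does not fire, the parents pass through unchanged, which is why the probability is kept high.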


In general, mutation is done by randomly flipping a bit of a random individual in the population. Following this conventional process, I observed that after many generations the number of selected features deviated a lot from N. To reduce the deviation, I instead exchanged a randomly chosen '1' bit with a randomly chosen '0' bit, which keeps the number of selected features constant within the mutation step. In this way, the deviation was reduced by a good extent.
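A minimal sketch of this count-preserving mutation follows. The mutation probability of 0.1 is an assumed value (the text does not give one), as are the function and variable names.

```python
import numpy as np

def mutate(chromosome, p_mutation=0.1, rng=None):
    """With probability p_mutation, exchange one '1' bit with one '0' bit.

    Unlike a plain bit flip, this keeps the number of selected features fixed.
    """
    if rng is None:
        rng = np.random.default_rng()
    mutant = chromosome.copy()
    if rng.random() < p_mutation:
        ones = np.flatnonzero(mutant == 1)
        zeros = np.flatnonzero(mutant == 0)
        if len(ones) and len(zeros):
            mutant[rng.choice(ones)] = 0   # drop one selected feature...
            mutant[rng.choice(zeros)] = 1  # ...and add one unselected feature
    return mutant
```

Because one feature leaves and one enters, every mutation explores a neighbouring subset of the same size.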


A few common stopping criteria are listed below.

  1. Fixing the number of generations: This is not a very good method as we may miss out on the best generation because of an upper limit on the number of generations.
  2. Lack of progress in the fitness of the best individual of the population: Lack of progress does not necessarily imply convergence as evolution proceeds with punctuated equilibria that can later lead to improvement.
  3. Keeping an upper limit on the variance of fitness values in a population: When individuals are so similar to each other that we may not get a better individual in future generations, the algorithm would indicate convergence.

Here I have used the third stopping criterion.
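The variance-based check can be sketched as below; the threshold value of 1e-4 is an illustrative assumption, not the author's setting.

```python
import numpy as np

def has_converged(fitness, var_threshold=1e-4):
    """Flag convergence when the population's fitness variance drops below a threshold."""
    return np.var(fitness) < var_threshold

# Nearly identical fitness values signal that further generations
# are unlikely to produce a better individual
converged = has_converged(np.array([0.91, 0.91, 0.92, 0.91]))
```

In the main loop, this check would run once per generation, after the fitness values of the new population are computed.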

Conclusion and final remarks

You can find the whole source code here. Feel free to go through it and suggest any improvements or optimizations.


Undergrad @ IIT Roorkee
