How to Implement a Genetic Algorithm Using Python

A genetic algorithm (GA) is a classical evolutionary algorithm that generates new solutions by applying random changes to the current ones.

It is based on Darwin’s theory of evolution. It makes slow, slight changes to its solutions in order to find the best one.

A GA works on a population of solutions. The population size is the number of solutions, and each solution is an individual. Each individual is represented by a chromosome, a set of parameters that defines the individual. A chromosome has a set of genes, and each gene can be represented, for example, as a string of 0s and 1s.

Implementation :

In this tutorial we will use a GA to maximize the following equation.

Y = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6


x1, x2, x3, x4, x5, x6 are the input values.

w1, w2, w3, w4, w5, w6 are the weights.

We want to find the weights that maximize the equation. Just by looking at the equation we can tell that positive inputs should be multiplied by the largest possible positive weights and negative inputs by the most negative weights possible. But we need the GA to work all of this out on its own.
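To make the objective concrete, here is a minimal sketch that evaluates the equation for two hand-picked candidates. The input values are the ones read from this tutorial's source listing. The weight values and the helper name evaluate are assumptions made only for illustration.

```python
import numpy

# Input values x1..x6, as read from the tutorial's source listing.
equation_inputs = numpy.array([3, -5, 1.2, 3, -13, -2])


def evaluate(weights):
    """Return Y = w1*x1 + ... + w6*x6 for one candidate (hypothetical helper)."""
    return float(numpy.sum(numpy.asarray(weights) * equation_inputs))


# Illustrative candidate: every weight has the same sign as its input.
good = [4, -4, 4, 4, -4, -4]
# Illustrative candidate: every weight has the opposite sign to its input.
bad = [-4, 4, -4, -4, 4, 4]

print("matching signs: ", evaluate(good))
print("opposite signs: ", evaluate(bad))
```

The first candidate scores far higher than the second, which is the pattern the GA is expected to discover by itself.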

Now we will define the initial population. The number of genes in each chromosome depends on the number of weights to be found. The number of solutions per population is not fixed; we simply choose a value that fits our problem well. The number of solutions per population, the population size and the initial population are held in the variables sol_per_pop, pop_size and new_population.

When we print new_population we see the initial population. Since it is generated randomly, the result will be different every time we run the code.


The next step is to find the best individuals to act as parents for mating. We then apply the GA variants (crossover and mutation), which produce the offspring of the next generation. This leads to the creation of a new population. The steps are repeated for several generations.

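The GA works as follows: calculate the fitness of the population, select the best parents, apply crossover and mutation to produce the offspring, and build the new population from the parents and the offspring.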


The next step is to call the GA functions to find the new population. We first use cal_pop_fitness, which calculates the fitness value of each solution in the current population.

The select_mating_pool function selects the best individuals from the current generation as parents for producing the offspring. The crossover function combines the genes of two parents at a crossover point, and finally the mutation function changes a single gene in each offspring by a random amount.
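As a small illustration of the crossover and mutation steps described above, the following sketch mates two hand-made parents at the midpoint and then perturbs one gene. The parent values, the helper names and the choice of gene index 4 are assumptions for illustration, not part of the tutorial's code.

```python
import numpy


def single_point_crossover(parent1, parent2):
    """Take the first half of the genes from parent1 and the rest from parent2."""
    point = len(parent1) // 2
    return numpy.concatenate([parent1[:point], parent2[point:]])


def mutate_one_gene(chromosome, gene_index, rng):
    """Return a copy with a random value from [-1, 1) added to one gene."""
    mutated = chromosome.copy()
    mutated[gene_index] += rng.uniform(-1.0, 1.0)
    return mutated


# Two easily recognisable parents (illustrative values).
parent1 = numpy.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
parent2 = numpy.array([2.0, 2.0, 2.0, 2.0, 2.0, 2.0])

child = single_point_crossover(parent1, parent2)
print(child)  # genes 0-2 come from parent1, genes 3-5 from parent2

rng = numpy.random.default_rng(0)  # seeded only so that the run is repeatable
mutated = mutate_one_gene(child, gene_index=4, rng=rng)
print(mutated)  # only gene 4 differs from the child
```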

The full source code : 


import numpy

# Inputs of the equation.
equation_inputs = [3, -5, 1.2, 3, -13, -2]

# Number of the weights we are looking to optimize.
num_weights = 6

# Number of solutions (chromosomes) per population.
sol_per_pop = 8

# Defining the population size.
# The population will have sol_per_pop chromosomes, where each chromosome has num_weights genes.
pop_size = (sol_per_pop, num_weights)

# Creating the initial population.
new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
print(new_population)


def cal_pop_fitness(equation_inputs, pop):
    # Calculating the fitness value of each solution in the current population.
    # The fitness function calculates the sum of products between each input and its corresponding weight.
    fitness = numpy.sum(pop * equation_inputs, axis=1)
    return fitness


def select_mating_pool(pop, fitness, num_parents):
    # Selecting the best individuals in the current generation as parents for producing the offspring of the next generation.
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))
        max_fitness_idx = max_fitness_idx[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        # Give the selected solution a very low fitness so it is not picked again.
        fitness[max_fitness_idx] = -99999999999
    return parents


def crossover(parents, offspring_size):
    offspring = numpy.empty(offspring_size)
    # The point at which crossover takes place between two parents. Usually it is at the center.
    crossover_point = offspring_size[1] // 2
    for k in range(offspring_size[0]):
        # Index of the first parent to mate.
        parent1_idx = k % parents.shape[0]
        # Index of the second parent to mate.
        parent2_idx = (k + 1) % parents.shape[0]
        # The new offspring will have the first half of its genes taken from the first parent.
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        # The new offspring will have the second half of its genes taken from the second parent.
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring


def mutation(offspring_crossover):
    # Mutation changes a single gene in each offspring randomly.
    for idx in range(offspring_crossover.shape[0]):
        # The random value to be added to the gene.
        random_value = numpy.random.uniform(-1.0, 1.0)
        offspring_crossover[idx, 4] = offspring_crossover[idx, 4] + random_value
    return offspring_crossover


num_generations = 5
num_parents_mating = 4

for generation in range(num_generations):
    # Measuring the fitness of each chromosome in the population.
    fitness = cal_pop_fitness(equation_inputs, new_population)

    # Selecting the best parents in the population for mating.
    parents = select_mating_pool(new_population, fitness, num_parents_mating)

    # Generating next generation using crossover.
    offspring_crossover = crossover(parents, offspring_size=(pop_size[0] - parents.shape[0], num_weights))

    # Adding some variations to the offspring using mutation.
    offspring_mutation = mutation(offspring_crossover)

    # Creating the new population based on the parents and offspring.
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = offspring_mutation

print("New population")
print(new_population)


Written by Masud Imran

Particle Swarm Optimization with Python

What is  PSO ? 

There are things in nature that we humans seldom understand. We can either guess what is going to happen in the future, or we can use algorithms to take the guesswork out of the equation. Particle swarm optimization (PSO) is one of those techniques. It is a routine designed to mimic birds flocking or fish schooling. Sounds interesting, right? It is one of the wonders of science. It was developed in 1995 by Eberhart and Kennedy.

Below are the only two equations that make up a bare-bones PSO algorithm. As a heads up, “k” refers to the current iteration, so “k+1” means the next iteration.
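For reference, the two updates are commonly written as follows. This is the standard textbook form, written with the symbols this article describes (W, C1, C2, rand1, rand2, Pi and Pg). The symbol x_i(k) for the current position of particle i is notation assumed here.

```latex
v_i(k+1) = w \, v_i(k)
         + c_1 \, \mathrm{rand}_1 \, \bigl( p_i - x_i(k) \bigr)
         + c_2 \, \mathrm{rand}_2 \, \bigl( p_g - x_i(k) \bigr)

x_i(k+1) = x_i(k) + v_i(k+1)
```

The first equation updates the velocity and the second one moves the particle with that new velocity.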

The Algorithm :

[Figure: the PSO algorithm, uploaded by Ganesh K. Venayagamoorthy]

In the two for loops, the algorithm initializes the positions of the particles with a random uniform distribution, within a permissible range, in all their dimensions.

After that, it calculates the fitness value of each particle and compares it with that particle’s own best position (pbest is the best position that particular particle has ever had). It then stores the best position of all particles in gbest.

The Equation :

Let’s take a closer look at the equation that defines the velocity of a particle dimension in the next iteration. Vi(k+1) is the velocity for the next iteration. W is an inertial parameter; it affects how much of the movement given by the last value of the velocity is propagated. C1 and C2 are acceleration coefficients: C1 weights the personal best value and C2 weights the social best value. Pi is the best individual position of the particle and Pg is the best position of all particles. The equation measures the distance from each of these positions to the current position of the particle. rand1 and rand2 are random numbers between 0 and 1, and they control the influence of each value, social and individual.

After that, the new position of the particle is calculated, and the process repeats until a specified number of iterations or an error criterion is reached.


Implementation : 

We have to find the minimum point of a function, which in this case is f(x,y) = x^2 + y^2 + 1.

So where is the minimum of f(x,y)?

Ans : [0,0], where f(0,0) = 1.

Implementing in Python : 

Library used : Numpy 

The Particle class

When a particle is initialized, we draw a random position whose coordinates lie between -50 and +50, and pbest_position is initialized with that position. pbest_position is the best individual position of the particle. Because we are looking for the minimum, pbest_value is initialized with a very large value (for example, infinity). __str__() is defined to print the current position and the best individual position. The move() method adds the velocity calculated during the search to the position vector.
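A minimal sketch of such a Particle class could look like this. The attribute and method names follow the description above. Passing in a NumPy random generator, starting with zero velocity and using infinity as the initial pbest_value are assumptions made for this sketch.

```python
import numpy as np


class Particle:
    def __init__(self, rng):
        # Random starting position with x and y between -50 and +50.
        self.position = rng.uniform(-50.0, 50.0, size=2)
        # The best position seen so far is the starting position.
        self.pbest_position = self.position.copy()
        # We minimize, so any real fitness value will beat infinity (assumed start value).
        self.pbest_value = float("inf")
        # Assumption: the particle starts at rest.
        self.velocity = np.zeros(2)

    def __str__(self):
        return f"I am at {self.position}, my pbest is {self.pbest_position}"

    def move(self):
        # Add the velocity computed by the search to the position vector.
        self.position = self.position + self.velocity


particle = Particle(np.random.default_rng(1))
print(particle)

start = particle.position.copy()
particle.velocity = np.array([1.0, -1.0])
particle.move()
print(particle)
```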

Search space : 

The search space controls the algorithm routine. It is responsible for keeping all the particles, identifying and setting the individual best position of each particle, managing the target error criterion, calculating the best global value and setting the best global position. In summary, it encapsulates all the principal steps.

The methods set_pbest and set_gbest go through all the particles and compare each particle’s fitness value with its best individual value and with the best global value. The method move_particles calculates the new velocity vector of each particle in each dimension, as explained before.

Main Loop : 

The search space is initialized with a target of 1. This is the target fitness value, in other words f(x,y) = 1. The algorithm therefore looks for the values of x and y that give 1 as the result, because that is the minimum of the function. The user can set the target error and the number of particles. A list comprehension then creates all the particles before the iterations start.

Inside the iteration, the individual best and global best positions are found first, and then the target error criterion is checked. As long as the minimum error has not been reached, the velocity of each particle is calculated and the particle moves to a new position.

Be cautious when choosing c1, w and c2: these parameters control how the swarm moves, and badly chosen values can keep it from converging.
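Putting the pieces together, the whole routine can be sketched as below. This is a self-contained sketch and not the original code of this article. The method names set_pbest, set_gbest and move_particles follow the text. The class name Space, the function run, the values W = 0.5, C1 = 0.8 and C2 = 0.9, the 30 particles, the iteration limit and the seed are all assumptions chosen for illustration.

```python
import numpy as np

W, C1, C2 = 0.5, 0.8, 0.9  # assumed inertia and acceleration coefficients


def fitness(position):
    """f(x, y) = x^2 + y^2 + 1, the function to minimize."""
    return position[0] ** 2 + position[1] ** 2 + 1


class Particle:
    def __init__(self, rng):
        self.position = rng.uniform(-50.0, 50.0, size=2)
        self.pbest_position = self.position.copy()
        self.pbest_value = float("inf")
        self.velocity = np.zeros(2)

    def move(self):
        self.position = self.position + self.velocity


class Space:
    def __init__(self, target, target_error, n_particles, rng):
        self.target = target
        self.target_error = target_error
        self.rng = rng
        # A list comprehension creates all the particles.
        self.particles = [Particle(rng) for _ in range(n_particles)]
        self.gbest_value = float("inf")
        self.gbest_position = np.zeros(2)

    def set_pbest(self):
        # Update each particle's individual best.
        for p in self.particles:
            value = fitness(p.position)
            if value < p.pbest_value:
                p.pbest_value = value
                p.pbest_position = p.position.copy()

    def set_gbest(self):
        # Update the best value and position found by the whole swarm.
        for p in self.particles:
            if p.pbest_value < self.gbest_value:
                self.gbest_value = p.pbest_value
                self.gbest_position = p.pbest_position.copy()

    def move_particles(self):
        # Inertia term plus the pull towards pbest and gbest, then move.
        for p in self.particles:
            rand1, rand2 = self.rng.random(), self.rng.random()
            p.velocity = (W * p.velocity
                          + C1 * rand1 * (p.pbest_position - p.position)
                          + C2 * rand2 * (self.gbest_position - p.position))
            p.move()


def run(n_particles=30, n_iterations=200, target=1.0, target_error=1e-6, seed=0):
    space = Space(target, target_error, n_particles, np.random.default_rng(seed))
    for _ in range(n_iterations):
        space.set_pbest()
        space.set_gbest()
        # Stop as soon as the best value is close enough to the target.
        if abs(space.gbest_value - space.target) <= space.target_error:
            break
        space.move_particles()
    return space.gbest_position, space.gbest_value


best_position, best_value = run()
print("Best position:", best_position)
print("Best value:", best_value)
```

Changing W, C1 and C2 in this sketch is a quick way to see how sensitive the swarm is to these parameters.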

Written by Masud Imran

Getting Started With Orange

Orange is a GUI-based tool that is great for visualizing patterns and understanding data. It allows users to do these things without the need to code.

Why Orange?

  1. It’s easy to use. Any professional can use it, because no coding is required.
  2. Basic visualization, data manipulation, transformation and mining can be done in a single workflow.
  3. It has some wonderful visuals, which make presentations appealing.

Getting Started with orange 

Download the Orange distribution package from here and run the installation file on your local computer.

Download Link :

After the installation is done you should be able to run Orange. Just locate the Orange icon and click it.

When you run Orange for the first time you should be greeted by something similar to this.

You can click Tutorials to watch tutorials on YouTube, and click Examples for preloaded workflows.

After selecting Examples

  • You can choose any of the preloaded data mining workflows.
  • For this module we will choose hierarchical clustering.
  • The selected tutorial will open in the Orange canvas. In Orange, data mining workflows consist of computational components called widgets. Widgets do all the work and exchange information through channels. In the workflow below, the File widget sends its data to the Data Table widget and the Distance widget, which, in turn, communicates the computed distances to two other widgets in the workflow.
  • The File widget in the top left-hand corner reads the data from your computer and sends it to the other widgets.
  • If you double-click the File widget, it will open a window from where you can browse through the documentation data sets. For this one, select the following from the preinstalled data files :

  • From this file we will predict the probability of survival of a passenger, based on the information we get from the file itself.
  • Click the little curve next to the File widget to select other widgets to send data to. For this one we will send data from the File widget to Data Table and Sieve Diagram.
  • Now double-click Sieve Diagram to visualize the survival probabilities against the expected ones. You can then play with combinations of attributes to answer the following questions.
  1. What is the lowest probability of survival based on class, sex and age?
  2. Who had the higher probability of survival, the crew or the first-class passengers?

Now play around with other preloaded files to learn more . 

Using External Files in Orange

First we will pick a File widget, and from that File widget we will open the file we downloaded from Kaggle.

Note : Depending on the file we downloaded, it might take some time to load and to work with other widgets. The larger the file, the more time it takes.

For the sake of this tutorial we will be using these two Excel files. Each file needs its own File widget, so repeat the method for both files.

From the figure we can see exactly how many attributes are available in each file. Now that the files are loaded we can start visualising.

By using the Distributions widget we can visualise the distribution of latitude among the states in the USA.

Or, by using the Scatter Plot widget, we can plot the longitude among the states.

Or else, just by using Data Table, we can create one table from each file.

Let’s work with another data set, this time flight data in a bigger file. Depending on your computer, it will take a lot longer to load the data and to work with the widgets.

Let’s load the adsb.csv 

Connect the File widget to a Data Table to see the table.

And then load up some charts 

Written by Masud Imran

Crawling Twitter Data using Rapid Miner for Analytics.

I have been using Rapid Miner for a few university projects, so I thought I would share some things I learned during the process. This one is just about getting data from specific Twitter accounts.

Installation :

Use the following link to download Rapid Miner : . You can probably do it by yourself; it’s a very simple installation.

The Interface :

When you are done setting up Rapid Miner you will be greeted with this start page.

As you can see, there are a bunch of different options here. For the sake of this tutorial we will use the Blank option.

Our goal is to get Twitter data from the Twitter accounts of specific news publications.

As you can see from the screenshot, there are three Twitter operators available from the get-go in Rapid Miner. Since we want to crawl user tweets, we will select Get Twitter User Statuses for our case.

We will take three of them, because we want to crawl three accounts, and append their results together using the Append operator.

On the right-hand side you will see a connection option. Click the Twitter logo beside it to set up a Twitter account to log in to. That’s not the account we will take tweets from, but it is important to be logged into an account in order to crawl data.

Now we need to set up the Get Twitter User Statuses operators. To do that, let’s start by finding the news outlets we want to pick for our work. For this we will pick The New York Times, The Daily Star and The Star.

We will use the user name of each of these Twitter accounts. For The New York Times it’s nytimes, for The Star it is staronline and for The Daily Star it is dailystar.

If we select one of the operators, the parameters of that operator will show up on the right side of the Rapid Miner dashboard. For the connection we will select the one we added before. For the query type we will select name, because we will crawl using the user name of the account. We will then set the user to our desired account, in this case nytimes, dailystar and staronline (one per operator), and we will set the limit of the query to 1000, so each operator will work with 1000 tweets.

Now we will use the Filter operator to pick out the type of news we want to find. For now we are trying to find the negative news posted recently by these accounts (out of 1000 tweets from each account).

So we are almost done, and we are getting close to our results. If we want to save our result in an Excel file we can use the Write to Excel operator, which will write the result to Excel.

So finally it will look something like this. Now we will try running the process using the blue play button, and it should produce something like this.

From-User holds the user accounts and the Text field contains the statuses. From the number of rows we can now find out how many of the 1000 news items from each account contain the negative words we used as our filter.

Easy right ?

Exercise : Compare the number of bad and good news items generated in the United States of America, using the Twitter accounts of four news outlets.

Visualization using Rapid Miner :

Now let’s say we want to visualize our result as a pie chart that displays the Retweet count for each account.

Let’s select our plot type, Pie Chart, and then tick the Aggregate data column.

Since we will plot the chart using the number of tweets generated by the three accounts, we are gonna group by From-User. The Aggregation Function will be set to count, as we are counting the number of tweets.

And we have our Pie Chart .

Exercise : From the Twitter pages of KFC and Pizza Hut, plot a pie chart based on the retweets for each account.

Written by Masud Imran

A Brief Introduction to Python.

I will try to make it as easy as possible; let’s see what we end up with. Note that this tutorial is aimed at readers who already know the C/C++ programming languages.


Installation :

Many recent operating systems come with Python preinstalled, but it’s okay if your PC doesn’t have Python installed; you can easily install it yourself.

From this link : , you can download and install Python on your computer. Follow the instructions provided for your device.

Getting started with IDE :

There are a bunch of IDEs to pick from. If you are a beginner you should look for IDEs that are tailored to make Python coding easier for you.

In this tutorial we will use PyCharm. You can download it from this link : .

Once you have completed the installation and finished setting up the IDE, you are good to get started with Python.

The Properties of Python:

Python is strongly typed (i.e. types are enforced), dynamically, implicitly typed (i.e. you don’t have to declare variables), case sensitive (i.e. var and VAR are two different variables) and object-oriented (i.e. everything is an object).
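The properties above can be seen in a few lines. This is a small sketch with made-up variable names, meant only as an illustration.

```python
# Dynamically and implicitly typed: no declaration is needed,
# and the same name can later refer to a value of another type.
value = 10
print(type(value).__name__)
value = "ten"
print(type(value).__name__)

# Strongly typed: mixing incompatible types raises an error
# instead of converting one of them silently.
try:
    result = "10" + 5
except TypeError as error:
    result = None
    print("TypeError:", error)

# Case sensitive: var and VAR are two different variables.
var = 1
VAR = 2
print(var, VAR)

# Object-oriented: even integers and functions are objects.
print(isinstance(3, object), isinstance(print, object))
```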

Python Syntax :

If you have experience with C++, which you should have by now, you must be in the habit of using “;” as your termination character. You will have to change that habit in Python, because Python does not have any mandatory termination character. Indentation specifies a block: indent to begin a block, dedent to end it. Comments start with the pound (#) sign and are single-line; multi-line strings are used for multi-line comments. Just like in C++, values are assigned with an equals sign “=” and equality is tested with a double equals sign “==”. Also, “+=” and “-=” work just like in C++, and they work on many data types, strings included. You can also assign multiple variables on one line.
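The syntax rules above can be sketched in a short script. All names and values here are made up for illustration.

```python
# No terminating semicolon is needed, and indentation defines the block.
count = 0
for number in [1, 2, 3]:
    count += number      # indented: part of the loop body
print(count)             # dedented: runs once, after the loop

"""A string that is not assigned to anything
can serve as a multi-line comment."""

greeting = "Hello"
greeting += ", world"    # += also works on strings
print(greeting)

# Several variables can be assigned on one line,
# which also makes swapping two values easy.
a, b = 1, 2
a, b = b, a
print(a, b)

# = assigns a value, == tests for equality.
print(a == 2)
```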

Data Types in Python :

Lists, tuples and dictionaries are available as data structures in Python, and sets are built in as well. Lists are like one-dimensional arrays, dictionaries are associative arrays and tuples are immutable one-dimensional arrays.

In Python, arrays can hold any type, so you can mix integers, strings, lists and so on in one array. The first index of every array is 0. Negative numbers count from the end towards the beginning, so -1 is the last item. Variables can point to functions.

To access array ranges you can use a colon “:”. For example, print(List[0::2]) prints every second item starting from the first, so for a four-item array it prints the 1st and 3rd items. If you leave the left part empty, the interpreter assumes the first item; if you leave the right part empty, it assumes the last. Ranges are inclusive-exclusive, so print(List[0:2]) prints the 0th and 1st items but not the 2nd.
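Here is a short sketch of these data structures and of indexing and slicing. The variable names and values are made up for illustration.

```python
# A list can mix types, a tuple is immutable, a dictionary maps keys to values.
sample_list = [1, ["another", "list"], ("a", "tuple")]
sample_tuple = (1, 2, 3)
sample_dict = {"key_1": "value_1", 2: 3, "pi": 3.14}
sample_set = {1, 2, 2, 3}   # duplicates are dropped

items = ["first", "second", "third", "fourth"]
print(items[0])      # first item (index 0)
print(items[-1])     # last item
print(items[0:2])    # items 0 and 1; index 2 is excluded
print(items[:2])     # same range, the start defaults to 0
print(items[0::2])   # every second item, starting at index 0

# Tuples cannot be modified after they are created.
try:
    sample_tuple[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)

# A variable can point to a function.
measure = len
print(measure(items))
```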

Exercise : Try to access 3rd and 4th item using negative index.

Strings in Python :

You should also know what a string is by now. If you were into C++, you are probably used to writing strings in double quotation marks. In Python you can use both single and double quotation marks, and you can use one kind of quotation mark inside the other as well. For multiline strings, triple quotes (three single or three double quotation marks) are used. Strings in Python are Unicode, and byte strings are written with a b prefix. The modulo operator (%) and a tuple are used to fill strings with values. It is important to note that every %s is replaced by an item from the tuple, from left to right. Dictionary substitutions can also be used.
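The string features above can be sketched as follows. The example values are made up.

```python
single = 'She said "hi"'     # double quotes inside single quotes
double = "It's easy"         # single quote inside double quotes
multi = """first line
second line"""               # triple quotes span several lines
data = b"raw bytes"          # the b prefix creates a byte string

# Each placeholder is replaced by an item from the tuple, left to right.
by_position = "%s is %d years old" % ("Ada", 36)

# Dictionary substitution uses named placeholders.
by_name = "%(language)s uses %(symbol)s for formatting" % {
    "language": "Python",
    "symbol": "%",
}

print(single)
print(double)
print(multi)
print(type(data).__name__)
print(by_position)
print(by_name)
```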

Flow Control in Python:

For flow control Python has the if, for and while statements. They work much like in C++, except that for iterates over the items of a sequence. Python has no switch statement, so for selection we have to use if (Python 3.10 and later also offer match). Lists of numbers can be obtained with range(<number>).
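A minimal sketch of the three statements, with made-up names and values:

```python
def describe(number):
    # An if / elif / else chain plays the role of a switch statement.
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"


# range(5) yields the numbers 0, 1, 2, 3, 4.
squares = []
for value in range(5):
    squares.append(value * value)

# A while loop runs as long as its condition holds.
countdown = 3
steps = 0
while countdown > 0:
    countdown -= 1
    steps += 1

print(describe(-2), describe(0), describe(7))
print(squares)
print(steps)
```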


Functions in Python :

We can declare functions in Python using the def keyword. Functions can take arguments, which can be mandatory or optional. Optional arguments are set by giving them default values. Parameters are passed by reference, but immutable types (tuples, ints, strings, etc.) cannot be changed in the caller by the callee.
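The following sketch shows a mandatory and an optional argument, and the difference between passing a mutable and an immutable value. The function names and values are made up for illustration.

```python
def greet(name, greeting="Hello"):
    # name is mandatory; greeting is optional because it has a default value.
    return f"{greeting}, {name}!"


def add_item(items):
    # A list is mutable, so the caller sees this change.
    items.append("added")


def increment(number):
    # An int is immutable: this only rebinds the local name.
    number += 1
    return number


print(greet("Ada"))
print(greet("Ada", greeting="Welcome"))

shopping = ["milk"]
add_item(shopping)
print(shopping)

counter = 5
result = increment(counter)
print(counter, result)
```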

Class in Python

Python is an object-oriented programming language, and almost everything in Python is an object. A class is a blueprint for creating objects.

We create a class using the class keyword, and we create an object like objectName = className().

Now we know how to create an object in its simplest form, but that is not enough for any real-life usage. There is a built-in __init__() function which is fundamental to Python classes.

The __init__() function is used to assign values to the properties of an object when it is created.
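As a minimal sketch, here is a class with an __init__() function and one method. The class name Book and its properties are made up for illustration.

```python
class Book:
    def __init__(self, title, pages):
        # __init__() runs when the object is created and sets its properties.
        self.title = title
        self.pages = pages

    def describe(self):
        return f"{self.title} has {self.pages} pages"


# Creating an object calls __init__() with the given arguments.
manual = Book("Example Title", 120)
print(manual.title)
print(manual.describe())
```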

Exercise : Make a class student with a function that will greet the student .

Importing :

You can import a library by using the import keyword followed by the library name. Python has a ton of libraries with tons of different functionality, which makes working with Python more interesting.
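A few common forms of the import statement, sketched with modules from the standard library (the example path is made up):

```python
import math                # import a whole library
from os import path        # import one name from a library
import random as rnd       # import a library under a shorter name

root = math.sqrt(16)
file_name = path.basename("/tmp/example.txt")
dice = rnd.randint(1, 6)

print(root)
print(file_name)
print(dice)
```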


File Handling in Python :

Python has several functions for creating, reading, updating and deleting files. The open() function takes two parameters, the file name and the mode. For reading, just specifying the name is enough, because reading is the default mode.

Files are written using file.write().

A file can be deleted using the os module, for example with os.remove().
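The steps above can be sketched as one short script. The file name and its contents are made up, and the file is placed in the system's temporary directory so that nothing else is touched.

```python
import os
import tempfile

# A throwaway file in the temporary directory (hypothetical name).
file_path = os.path.join(tempfile.gettempdir(), "python_intro_example.txt")

# "w" creates the file, or overwrites it if it already exists.
with open(file_path, "w") as file:
    file.write("first line\n")

# "a" appends to the end of the file.
with open(file_path, "a") as file:
    file.write("second line\n")

# With no mode given, the file is opened for reading.
with open(file_path) as file:
    content = file.read()
print(content)

# Delete the file with the os module.
os.remove(file_path)
print(os.path.exists(file_path))
```

Using with closes the file automatically, even if an error occurs inside the block.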

I hope you learned something about Python from this. Thank you.

Prepared by Masud Imran