Uncommon Tips for R Programming

I have been working in R for almost three years, having taught myself from tutorials, documentation and online courses. Recently I came across two interesting problems which I feel are worth sharing. There could be many ways to work around each of them, but the simplest solution is usually the best one.

Conversion of Attribute

So I was working with a dataset that had a time attribute in seconds, but the values were distorted. By distorted, I mean they were not all in the same format: some of the values looked like 0000015 and some looked like 24, yet both represented times in seconds.


[1] 00000115.123 0000053.159  0000068.468  15.989       50.604       0000029.135

It was important to normalize them. The variable had been read as a factor, so the first thing that comes to mind is to change it to numeric. I tried that, but it didn't work: R populated the column with garbage values.

Let’s check that too.

 time$TSnumeric <- as.numeric(time$TimeSpent)


[1]   65  689  818 1147 1509  411

Does the head look anywhere near the original values? (It doesn't: as.numeric() on a factor returns the internal level codes, not the printed labels.) There could be many ways to work around that, but most would be much longer. One would be to write a loop that strips all the leading zeros before the first non-zero digit.

However, there is a much simpler solution which works like a charm: convert the variable to character first, and then to numeric.

time$TSnumeric <- as.numeric(as.character(time$TimeSpent))


[1] 115.123  53.159  68.468  15.989  50.604  29.135

You see, we got rid of the problem and the whole variable is normalized.
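The whole tip can be reproduced end to end with toy data (the values below are made up for illustration):

```r
# A factor stores integer level codes internally, so as.numeric() on a
# factor returns those codes, not the values you see printed.
time_spent <- factor(c("00000115.123", "0000053.159", "15.989"))

wrong <- as.numeric(time_spent)               # level codes: garbage
right <- as.numeric(as.character(time_spent)) # 115.123 53.159 15.989
```

Converting to character first recovers the printed labels, which as.numeric() can then parse correctly.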

Customizing a Function of a Package

I think it is quite common to want to customize a function of a package to make it work the way we want. I was recently working with a package called tuber, which scrapes data from YouTube. I felt an urge to make changes to one of its functions, but when I ran my modified copy, it failed for odd reasons.

First, let's learn how to build your customized version.

Simply print the function of the package you want to change. In my case, I want to make changes to get_comments from the tuber package.



function (part = "snippet", video_id = NULL, text_format = "html",
    simplify = TRUE, max_results = 100, page_token = NULL, ...)
{
    if (is.null(video_id))
        stop("Must specify a video ID")
    if (max_results < 20 | max_results > 100)
        stop("max_results only takes a value between 20 and 100")
    if (text_format != "html" & text_format != "plainText")
        stop("Provide a legitimate value of textFormat.")
    querylist <- list(part = part, videoId = video_id, maxResults = max_results,
        textFormat = text_format)
    res <- tuber_GET("commentThreads", querylist, ...)
    if (simplify == TRUE & part == "snippet") {
        simple_res <- lapply(res$items, function(x) x$snippet$topLevelComment$snippet)
        simpler_res <- as.data.frame(do.call(rbind, simple_res))
        # ... (rest of the function body truncated)
    }
}
<environment: namespace:tuber>

Simply copy and paste it into another function and give it a non-conflicting name such as get_comments2.

get_comments2 <- function (part = "snippet", video_id = NULL, text_format = "html",
    simplify = TRUE, max_results = 200, page_token = NULL, ...)
{
    if (is.null(video_id))
        stop("Must specify a video ID")
    if (max_results < 20 | max_results > 300)
        stop("max_results only takes a value between 20 and 300")
    if (text_format != "html" & text_format != "plainText")
        stop("Provide a legitimate value of textFormat.")
    querylist <- list(part = part, videoId = video_id, maxResults = max_results,
        textFormat = text_format)
    res <- tuber_GET("commentThreads", querylist, ...)
    if (simplify == TRUE & part == "snippet") {
        simple_res <- lapply(res$items, function(x) x$snippet$topLevelComment$snippet)
        simpler_res <- as.data.frame(do.call(rbind, simple_res))
        # ... (rest of the function body truncated)
    }
}

Now when I tried to run it, it gave an error about not finding a function which is called within this function:

could not find function "tuber_GET"

It is because this new, changed function has not been added to the environment of the package, so it cannot see the package's internal functions.

Simply do this to add the function to the environment of the package:

environment(function_name) <- asNamespace("package_name")

# In the above case, it would be
environment(get_comments2) <- asNamespace("tuber")
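Here is a minimal, self-contained sketch of why this works. The tiny environment below stands in for a real package namespace, and helper plays the role of an internal function like tuber_GET (both names are made up for illustration):

```r
# An environment standing in for a package namespace, holding an
# internal helper that is not visible from the global environment.
ns <- new.env()
ns$helper <- function(x) x * 2

f <- function(x) helper(x)  # a "copied" function: helper() is not in scope
environment(f) <- ns        # point f's enclosing environment at ns
f(21)                       # now finds helper() through ns
```

Without the environment() assignment, calling f(21) fails with "could not find function \"helper\"", the same class of error as above.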


Using Built-in Themes for ggplot2

One of the most common problems we face is adding a manual theme to the graphics we build in ggplot2. We have to add many lines of code, or sometimes we just leave the plot as it is. I have recently come across a package called ggthemes which is very helpful in this regard. You have about 10 themes to choose from, including WSJ and Economist themes. You simply add them to the syntax of your plot and you are good to go.

Let's take the example of the mtcars data.

p2 <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(gear))) +
  geom_point()

This is the original plot; now let's add the Economist theme to it.

p2 + theme_economist()


Let's also use the Economist colors for the legend.

p2 + theme_economist() + scale_color_economist()


Let's use the theme used by the WSJ.

p2 + theme_wsj() + scale_color_wsj()


So you see how easy it is.

Thank you for reading.

Usman is an aspiring data scientist who works to automate the oil industry. He tweets @rana_usman and can be reached at usmanashrafrana@gmail.com.


Drone Attacks in Pakistan – A Quantitative Analysis

A sustained series of drone attacks started in 2005. The strikes were initially meant to target terrorists hiding in the tribal areas of Pakistan, but beyond killing terrorists, the drone attacks have had consequences we will be facing for decades. They have killed our self-confidence and destroyed our esteem as a sovereign nation.

Let's analyze the drone attacks and see what harm they have done to Pakistan.

The first ever drone attack on the sovereign land of Pakistan happened back in 2004, in the Wana area of South Waziristan. The attack killed 6 people, of whom 2 were civilians and 2 were children. The series started then.

Let's look into the data and find which areas had the highest number of drone attacks.


Area count
North Waziristan 303
South Waziristan 95
Kurram Agency 8
Khyber Agency 6
Bajaur Agency 4
Bannu Frontier Region 3
Orakzai Agency 2
Khyber Pakhtunkhwa province 1



  • Out of a total of 422 attacks on Pakistani soil, 72% targeted North Waziristan.


Let's compare the total number of people killed with the number of civilian deaths.


  • We have a startling finding.
  • Out of 3,994 total killed, 965 were civilians, which means about 24% of the casualties in drone attacks were civilians.
  • North Waziristan had the highest number of civilian casualties.
  • Bannu Frontier Region had the lowest.


How Many Children Were Killed?

It is equally surprising that a huge number of children were also killed in drone attacks.


  • It is reported that a total of 207 children were victims of drone attacks.
  • The highest number of child casualties happened in 2006, when 76 children fell to drone attacks.
  • Out of the total casualties, 5% were children.

Let's see which American administration killed the most civilians and children. First, we will check which American president ordered the highest number of drone strikes.


  • Barack Obama takes the lead with 371 strikes, while George Bush is responsible for only 51.


  • Surprisingly, the highest number of child casualties happened in the George Bush era. The total count is 129.


  • From 2005-2008, 56% of the casualties were civilians.
  • In the year 2006, 95% of the casualties were civilians.
  • In the year 2007, 82% of the casualties were civilians.


Top 5 locations for Drone Attacks

Location count
Datta Khel 37
Shawal 25
Mir Ali 15
Miram Shah 13
Danda Darpakhel 11
  • Historically, Fridays have seen the highest number of attacks, followed by Wednesdays.

Musharraf vs. Zardari vs. Nawaz Sharif Government


  • In the regime of Musharraf, a total of 16 drone attacks happened.
  • While Asif Ali Zardari was president, an enormous 356 drone attacks happened.
  • So far in the government of Nawaz Sharif, a total of 54 drone attacks have happened.


Total Number of attacks by Year


  • The trend line shows that the attacks increased drastically in 2008.
  • They were highest in 2010.
  • The number later started to drop, falling to a low of 13 in the last five years.

Interactive Plots by Microsoft Power BI

Please follow this link to see the dashboard.

There might be a purpose served by a drone strike. There might be potential threats eliminated by a few strikes, but the social and psychological damage the strikes cause is far more dangerous than the purpose they serve. The epidemic of "collateral damage" will only cause retaliation: an uprising against the murderers who sit thousands of miles away, playing the game of death with a joystick. The Pashtuns have a history of vengeance; they take revenge on their enemies in any way, through any means. So if Zubair, the 13-year-old brother of Nabila Rehman, picks up arms against America, will it then be justified? A drone might kill one potential threat, but it leaves many definite threats behind.

The United States has used drones since 2004, but the program escalated after President Obama's inauguration. The Obama administration carried out the highest number of drone attacks, 371. The president who was eager to end the war overlooked an important aspect of drone warfare, its high number of civilian casualties, and thereby caused havoc.


Usman is an aspiring data scientist who likes to tweet @rana_usman and can be reached at usmanashrafrana@gmail.com

The database used for the analysis is publicly available from The Bureau of Investigative Journalism.



Hitchhiker's Guide to Data Manipulation with R on the American Community Survey

In today's tutorial I am going to teach basic data manipulation on a sample dataset. I will use data from the US Census. The US Census is one of the richest data sources on the internet, and you can get great insights about the United States of America from the American Community Survey.

So let’s get started.

  • Go to http://factfinder.census.gov/
  • Click on Advanced Search > Show Me All
  • Click on Selected Housing Characteristics
  • Add Geography at the County Level
  • Download the CSV; it will come with two files. One will be the data file, the other the metadata file.

What we will do in this tutorial:

  1. We will get the ACS data
  2. We will find the states with the highest number of houses using solar energy

What we will learn in this tutorial:

  1. How to manipulate data
  2. How cool deep house mixes are


First, set your working directory in R to your desired folder on your hard drive. Setting the directory gives you access to all your files in the workspace at a single click.

setwd("H:/Tech Blogs/R Tutorial/ACS Manipulation")

Usually people like to import data through the GUI, but for this tutorial let's stick to R, so I will import via read.csv and assign the result to acs_data.

acs_data <- read.csv("ACS_14_5YR_DP04_with_ann.csv")

Now that the data is imported, let's have a look at it.

You can look at the data both through the GUI (for example with View(acs_data)) and the console. Having a good look at the data can itself give you great insights and help you understand what you want to do with it.

Let's see how many rows and columns our data frame contains.

ncol(acs_data)
## [1] 567
nrow(acs_data)
## [1] 3143

The data has 567 columns, which means we have lots of variables. Dealing with such large files requires a lot of work. The other interesting thing about the data is that the columns don't have human-understandable names. Fortunately, the metadata CSV file that comes along with it describes all the column names.


You can clearly see the column names, which are not humanly understandable. So let's give them a human face.

Let's now load the metadata file, which contains the column name descriptions.

acs_desc <- read.csv("ACS_14_5YR_DP04_metadata.csv", header=F)

I have specifically included header = FALSE while loading this file into my R environment because I don't want GEO.id and Id to become the column headers. Make sure to study the significance of headers when working in R.

Let's see what our acs_desc data frame looks like.

##                  V1
## 1            GEO.id
## 2           GEO.id2
## 3 GEO.display-label
## 4         HC01_VC03
## 5         HC02_VC03
## 6         HC03_VC03
##                                                         V2
## 1                                                       Id
## 2                                                      Id2
## 3                                                Geography
## 4        Estimate; HOUSING OCCUPANCY - Total housing units
## 5 Margin of Error; HOUSING OCCUPANCY - Total housing units
## 6         Percent; HOUSING OCCUPANCY - Total housing units

You can now see that there's a description for each variable we have in the acs_data file.

R is an amazing tool for data analysis, statistical inference and data transformation. You wouldn't believe the power of R: we can give a human face to our original acs_data file with just one command.

Let’s do it.

colnames(acs_data) <- acs_desc$V2
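The renaming step can be illustrated on a toy pair of data frames (the column names and values below are made up to mimic the ACS layout). Note that this relies on the metadata rows being in the same order as the data columns:

```r
# Toy data file with coded column names, and a toy metadata frame whose
# second column (V2) holds the human-readable descriptions.
acs_toy  <- data.frame(HC01_VC03 = 1:2, GEO.id = c("x", "y"))
desc_toy <- data.frame(V1 = c("HC01_VC03", "GEO.id"),
                       V2 = c("Total housing units", "Id"))

# Overwrite the coded names with the descriptions, row for row.
colnames(acs_toy) <- desc_toy$V2
```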

Now if you view your file, it will look like this.


We have a proper description for each variable now. There are words instead of funny codes, and the best thing about words is that they actually mean something :).

If you look closely at the data, you'll find a lot of things hidden in it. We have estimates, percentages and margins of error, but for now we are only interested in looking at the estimates, and that too with deep house mixes playing in our ears.

You can read about Regular expressions here.

Let's first find how many data columns actually have "Estimate" in their headings. We will be using regular expressions for that.

grep('Estimate;', colnames(acs_data))
##   [1]   4   8  12  16  20  24  28  32  36  40  44  48  52  56  60  64  68
##  [18]  72  76  80  84  88  92  96 100 104 108 112 116 120 124 128 132 136
##  [35] 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204
##  [52] 208 212 216 220 224 228 232 236 240 244 248 252 256 260 264 268 272
##  [69] 276 280 284 288 292 296 300 304 308 312 316 320 324 328 332 336 340
##  [86] 344 348 352 356 360 364 368 372 376 380 384 388 392 396 400 404 408
## [103] 412 416 420 424 428 432 436 440 444 448 452 456 460 464 468 472 476
## [120] 480 484 488 492 496 500 504 508 512 516 520 524 528 532 536 540 544
## [137] 548 552 556 560 564

We can see all these columns contain estimates, but to play with them, we need them in another data frame. So we will subset the data: we will keep only the columns with "Estimate" in their names. Deep house mixes are cool 🙂

Est <- acs_data[, grep('Estimate;', colnames(acs_data))]

Now if you view the dataset, we only have the columns whose names contained "Estimate". The world is not perfect; there is no such thing as ideality, so we all encounter problems like the one we face now. If you look at the original acs_data file, you'll see that there's a column named Geography which was not copied into the estimates data frame. That happened because we asked R to copy only the columns with "Estimate" in the header.
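On a toy data frame (with made-up column names), the same pattern keeps only the "Estimate" columns and silently drops everything else, including Geography:

```r
# Toy data frame mimicking the ACS layout (made-up column names).
df <- data.frame("Estimate; Total units"        = c(10, 20),
                 "Margin of Error; Total units" = c(1, 2),
                 "Geography"                    = c("A", "B"),
                 check.names = FALSE)

# Keep only the columns whose name contains "Estimate;".
est <- df[, grep("Estimate;", colnames(df)), drop = FALSE]
colnames(est)  # Geography is gone
```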

I think geography is a very important factor for any sort of analysis, so let's copy the Geography column from acs_data to Est.

Est$Geography <- acs_data$Geography

Now that we have geography joined in, I have this weird urge to check how many houses run on solar energy in the USA. We will repeat a similar regular expression to the one we used for subsetting the estimates.

solar <- Est[, grep('Solar', colnames(Est))]

This gives us a vector; let's turn it into a data frame along with geography.

solar_data <- data.frame(acs_data$Geography, solar)

We will remove the first row here because it contains the count for the whole USA.

solar_data = solar_data[-1,]

Now we have a file which tells us, for each county, how many houses run on solar energy.

Let's apply a delimiter and split the geography into county and state. That will let us do a better analysis of which states have more houses running on solar energy.

solar_data$county <- lapply(strsplit(as.character(solar_data$acs_data.Geography), "\\,"), "[", 1)
solar_data$state <- lapply(strsplit(as.character(solar_data$acs_data.Geography), "\\,"), "[", 2)

Now you can see we have two new columns, split on the ",".


We want to sum the counts, which in our case live in the solar column, by state. The first thing we will do is check the class of the state column.

class(solar_data$state)
## [1] "list"

It's a list; we have to change it to character.

solar_data$state <- as.character(solar_data$state)

Now that it's changed to character, we will use a base R function to aggregate the sum.

states_with_solar <- as.data.frame(xtabs(solar ~ state, solar_data))
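The split-and-aggregate pattern above can be reproduced end to end on toy data (the geography strings and counts below are made up):

```r
# Made-up geography strings in the ACS "county, state" format.
geo   <- c("Autauga County, Alabama",
           "Kern County, California",
           "Fresno County, California")
solar <- c(10, 200, 150)

# Split on the comma and take the second piece (the state).
state <- trimws(sapply(strsplit(geo, ","), "[", 2))

# Aggregate the counts by state with base R's xtabs().
agg <- as.data.frame(xtabs(solar ~ state))
```

Note that sapply() returns a character vector directly here, which avoids the list-column issue we just had to fix with as.character().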


Now we know exactly which state has how many houses running on solar energy, but that's not enough: we should visualize it for better comparison. We are not interested in all the states, only in the top 5 states with the highest number of houses running on solar energy.

You could use many packages here, but I like to work with the base functions. So let's sort it to find the top 5 states that use solar energy.

top_5 <- states_with_solar[order(states_with_solar$Freq, decreasing = TRUE),]

We have created a new data frame sorted with the highest frequency on top. Now we will extract the top 5 simply by using the head command.

top_5 <- head(top_5,5)

Here we have the top 5 states where houses rely on solar energy. It is pretty evident why California is the top state.
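The sort-then-head pattern is easy to check on toy numbers (made up for illustration):

```r
# Toy frequency table standing in for states_with_solar.
df <- data.frame(state = c("CA", "TX", "NY", "AZ", "FL", "WA"),
                 Freq  = c(350, 120, 90, 200, 150, 80))

# Sort descending by Freq, then keep the first five rows.
top_5 <- head(df[order(df$Freq, decreasing = TRUE), ], 5)
```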

The only time I like to use a package is for visualization, because ggplot2 is immensely powerful. First install it if you don't have it installed already.

library(ggplot2)

ggplot(top_5, aes(x = reorder(state, +Freq), y = Freq)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "grey50", position = "dodge")

So folks, that's all for now. My deep house mix playlist is also over. We shall meet again with something interesting. Please leave me your feedback.

Usman is an aspiring data scientist who likes deep house mixes and hates chocolates. He likes to tweet @rana_usman and can be reached at usmanashrafrana@gmail.com.

Practical Guide to Azure Machine Learning Studio

A lot of people are unfamiliar with one of Microsoft's best predictive modeling tools. It's called Azure Machine Learning Studio.

I am going to give a basic hands-on introduction to Azure ML Studio.

What I am going to build is a census income predictor. We will use a classification technique to predict income on top of a sample dataset that ships with Azure Machine Learning Studio.

We will learn the following in this tutorial

  • How to transform data in Azure ML
  • How to make a predictive model
  • How to make a web service
  • How to consume the web service

Let’s go step by step.

  • Make a free account at https://studio.azureml.net. The free trial gives you enough space for experimentation.
  • Go to New Experiment and you'll see your workspace as follows.


This is how your workspace will look. On the left are the modules you will use to transform the data and build models.

I won't go into technical details; I will keep the tutorial as simple as possible.

At the top right, you can see Saved Datasets. Click on it and you will see two drop-downs.

  1. Saved Datasets
  2. Samples

Saved Datasets are the datasets you upload to your workspace; they are available to use at any time with a click. Samples are the datasets that come with Azure Machine Learning Studio by default.

Let's drag and drop the Adult Income Data onto the workspace.


The first thing we will do is visualize the dataset. It is the most important step before doing any transformation, because unless we know what's in the data, we would be shooting in the dark.


Right-click the module and click Visualize, and you should see a result like this.


You can click on each variable/column to see its histogram on the right-hand side.

You will also come across descriptive statistics, as follows.


These statistics can give you a quick peek into your data.

Let’s now move on to next step.

As you are well aware, the world is full of noise, dirt and mess. No data comes with flowers; most data comes with tangible thorns. Missing values are a cancer to the data, and we should always deal with them.

Azure ML provides a Missing Values Scrubber module for that.


Drag the module onto the workspace and connect the dataset node to it.


Click on it, and on the right-hand side you will see its properties.

Select Custom Substitution Value and replace all the missing values with 0.

Continuing with data transformation: one of the most important modules you will work with all your life is Project Columns. This module is used to project, include, and exclude columns.

For income prediction, we have to decide which columns are necessary for us and which are not.

First we search for Project Columns, add it to the workspace, and attach its node to the Missing Values Scrubber.


We click Launch Column Selector in the properties and a window pops up.

We exclude these columns because we believe they may not be good features for income. To learn more about feature engineering, I would recommend taking a detailed edX course on it.

Now if you right-click on the Project Columns node and visualize the data, you will see that the above-mentioned columns are excluded from the dataset.

Now that we are done with data transformation, it's time to move on to modelling. Go again to the search pane and search for Split Data. Drag the module into the workspace and connect its node to Project Columns.

Splitting data into train and test sets is an important concept in machine learning. Whenever we make a predictive model, we typically split the data into 60% for training the model and 40% for testing. The ratio can differ for different problems and data, but the basic idea stays the same. For more details, have a quick read about machine learning concepts on the internet.


I have split the data 50-50.

It's time to train the model. Click on Machine Learning > Train and drag the Train Model module into the workspace.


Remember, the Split Data module has two output nodes; one goes into Train Model, and I will explain the other one later in the post.

Click on Train Model, go to the properties on the right-hand side, and you'll see a Launch Column Selector. Click it and a window will pop up. You have to specify your target variable here.

For our case, it's income.


Now that you have specified the target variable, it's time to pick an algorithm for training. Since ours is a classification problem (we want to see whether the income of an individual is greater or less than 50K), we will use a classification algorithm. I will use Boosted Decision Trees.


We connect the algorithm to the other node of Train Model.

The next step is to score the model's results. The Score Model module gives us an output column with the scored prediction results.

What are Decision Trees?

In simple words, a decision tree estimates the probability of an event by building a tree of splits. More details about the algorithm can be read here.


Now is the time to tell you about the other node of Split Data: the test data goes into the right-hand node of Score Model, and the left-hand node connects to Train Model.



Run the model and wait until you see green ticks on the modules. Once it is done, right-click on Score Model and visualize it. You will find two more columns at the end, one with Scored Labels and the other with Scored Probabilities.


From the results, and a comparison with the original income column, we can see that our model has done pretty well. But is there a way to evaluate it?

Let’s move on.


Drag the Evaluate Model module in and connect it to Score Model. Once again run the model, and when the tick is green, right-click on Evaluate Model and visualize the results.


Here our model's accuracy is about 91%, which means it is doing pretty well. We can also see the precision, F-measure, recall and other metrics here.


Web Service

One of the best features of Azure Machine Learning Studio is that it lets you build a web service which can be accessed from .NET, Python or an R Shiny app.

We are now going to build a web service on top of our predictive model. It is a piece of cake.


Click on Set Up Web Service and then Deploy Web Service.

You should then see a page like this.


Above is your API key, which you will use to deploy the web service on a third party. We will explain that later in the tutorial. You can test your web service by clicking Test.


Click on Request/Response and you will see the parameters for using it in a third-party environment. In the middle of the page, you'll see all the parameters your data uses; it also includes the JSON string format.

At the end of the page, you'll see the code generated by Azure ML Studio.


To make a real-time web application, we will use Python for this tutorial. You can use .NET or R Shiny as per your skills.

You can download the Python-based web app, made with the Django framework, via this link.

To view the application running live, please click here.

Most hosting websites don't support Python, so I have hosted the application on my personal AWS instance.

If you have found this tutorial useful, please feel free to comment.


Usman is an aspiring data scientist. He tweets @rana_usman and can be reached at usmanashrafrana@gmail.com

Will Pakistan See More School Attacks? A Quantitative Analysis

A year back, it was my second day as a data science intern when the APS attack happened. We, a people who live for our children, saw 144 children brutally murdered at the hands of terrorists. The news shook Pakistan. These kids were too young to fight, but they won.


A father laminated the decade-old homework of his kid who was killed in the APS attack.
An honorable teacher, Abu Bakr, took 3 bullets to save four children. Three of them survived. He now walks with a cane.


Terror attacks on schools and colleges around the world have risen to higher levels than at any point in more than 40 years. It's a global phenomenon that is increasing at an unprecedented rate. The long-term consequences of these attacks have been extremely frightening for children throughout the world.

These assaults have left deeper scars than any others, because their target was not just to slaughter individuals but to subjugate minds to irreversible oppression. By attacking educational institutes around the world, the terrorists advanced in murdering hopes, dreams and aspirations. They succeeded in debilitating nations.

Such an incident happened on 16th December 2014, which left Pakistan amidst the cries of mothers and the tears of fathers, when the Army Public School in Peshawar was attacked and 132 children were brutally murdered.

Driven by emotion after the APS attack, I went on to find the patterns of terrorism related to school incidents, and the revelations were startling.


I took a peek into a well-maintained database of terrorism attacks worldwide. The database is a research project by the University of Maryland, and they have open-sourced it. I filtered all terrorist attacks globally down to attacks on educational institutes only. The major challenge was aggregating all the people killed and wounded in the attacks; a major portion of R scripting, a statistical computing language, was involved in cleaning the database into the desired form for analysis.

Characteristics of Global Terrorism Database

  • Contains information on over 140,000 terrorist attacks
  • Currently the most comprehensive unclassified database on terrorist events in the world
  • Includes information on more than 58,000 bombings, 15,000 assassinations, and 6,000 kidnappings since 1970.
  • Includes information on at least 45 variables for each case, with more recent incidents including information on more than 120 variables.
  • Supervised by an advisory panel of 12 terrorism research experts.
  • Over 4,000,000 news articles and 25,000 news sources were reviewed to collect incident data.



The pattern dates back to the 1970s, but the frequency has increased dramatically since the war on terror.

When I analyzed the data, I came across a startling number of individuals who fell victim to attacks by suicide bombers. Armed assault takes second place, a technique used more widely before suicide bombing; in the recent past, armed assault events have increased at a great pace.

  • 74% of the attacks on Educational Institutes all around the world were conducted using Suicide Bombers.


The South Asian region has been very unlucky in this regard: it has had the highest number of attacks on educational institutes. The Middle East takes second place and South America stands third. As I dug deeper, I found more startling facts.


In the figure above, it is evident how the pace of attacks in the South Asian region increased after 2000.

  • The correlation is very interesting, as the American invasion of Afghanistan happened in 2001. Post-invasion, the frequency of attacks in South Asia increased; hence terrorism increased.

I tried to dig further to see what else I could find in the data. I divided the attacks into 5-year intervals to see the frequency over the years.



The graph supports our earlier inference that the frequency increased dramatically after the American war on terrorism began.

  • There have been more attacks in the last decade than in the previous 30 years.
  • There were a total of 884 attacks worldwide in the 5 years from 2005-2010, that is roughly one attack every two days.

Let's break down the attack types by region.

The data again shows that South Asia had the highest number of bombings, followed by facility attacks and armed assaults. The Middle East stands second.

We also compared the frequency of attacks in each region with the others. The result is plotted below.


  • It appears that there's a slight correlation between Sub-Saharan Africa and South Asia. The probable reason could be mercenaries of the same terrorist organizations.


We tried to find which weapon types are used in each region. The results were very interesting: in most countries, only bombs and firearms were used.



When we checked the overall result of which weapon types were used in attacks, the result was no different from our finding above.

  • It is apparent that most terrorist organizations prefer bombs and explosives for attacks, followed by firearms.
  • The most dangerous day for an attack is Tuesday, as historically the highest number of attacks have happened on Tuesdays.


  • On any given day the frequency of suicide bombing is high, but it goes dangerously high on Tuesdays.
  • I couldn't establish why Tuesday is the preferred day for an attack; I assume it could be because by Tuesday the staff has settled back into the weekly routine after the weekend.



The breakdown of attacks by month for each country shows that Pakistan had the highest number of attacks in every month. Afghanistan clearly takes second place, the United States and Thailand share third, and Iraq stands fourth. Sadly, Pakistan is the only country with a high frequency of attacks in every single month.
We have also generated a table of the cities with the highest number of attacks.

The table below breaks the number of attacks down by city. Guatemala City stands first; the country has a long history of civil war. Iraq, as we all know, saw great resistance after the American invasion, but the most surprising fact is that Peshawar stands third. Peshawar is the provincial capital of KPK. The city has always had beefed-up security because the province borders Afghanistan, and even then there have been 58 attacks on its educational institutes alone. Eight of the cities in the table are in Pakistan, more than from any other country in the world.


Country City Total Attacks
Guatemala Guatemala City 66
Iraq Baghdad 64
Pakistan Peshawar 58
Peru Lima 51
Pakistan Mohmand district 43
El Salvador San Salvador 38
Chile Santiago 37
Colombia Bogota 37
Iraq Mosul 30
Colombia Medellin 28
Lebanon Beirut 28
Turkey Istanbul 28
Bangladesh Dhaka 27
Nigeria Maiduguri 27
Pakistan Bajaur district 27
Pakistan Safi 27
Pakistan Bara 26
Pakistan Karachi 26
Pakistan Quetta 25
Pakistan Landi Kotal 24

Another table shows the most-attacked countries in each region.

I was flabbergasted by the fact that Pakistan has the highest number of attacks region-wise. Pakistan leads in attacks on educational institutes not only in South Asia but in the whole world, with a staggering 743 attacks since 1970.


Region Country # Attacks
South Asia Pakistan 743
South Asia India 205
South Asia Afghanistan 204
South Asia Bangladesh 54
South Asia Nepal 42
Middle East Iraq 180
Middle East Turkey 116
Middle East Algeria 66
Middle East Lebanon 45
Middle East Israel 31
South America Peru 151
South America Colombia 107
South America Chile 50
South America Argentina 16
South America Bolivia 10
Southeast Asia Thailand 194
Southeast Asia Philippines 101
Southeast Asia Indonesia 19
Southeast Asia Vietnam 4
Southeast Asia Myanmar 2
Sub-Saharan Africa Nigeria 92
Sub-Saharan Africa Burundi 47
Sub-Saharan Africa South Africa 27
Sub-Saharan Africa Somalia 10
Sub-Saharan Africa Uganda 10
Central America Guatemala 98
Central America El Salvador 72
Central America Nicaragua 14
Central America Honduras 8
Central America Puerto Rico 7
Western Europe Italy 44
Western Europe Spain 31
Western Europe Northern Ireland 22
Western Europe France 18
Western Europe Greece 15
North America United States 156
North America Mexico 6
North America Canada 2
USSR Russia 24
USSR Ukraine 3
USSR Georgia 2
USSR Tajikistan 2
USSR Estonia 1



There have been a total of 1,248 attacks on educational institutes in South Asia, of which about 60% happened in Pakistan. We are the country that has had 264% more attacks than Afghanistan and about 82% more than Afghanistan and India combined. The safety of education, progress and development stands at the cliff’s edge.
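These percentages follow directly from the regional table above; a quick check in R, using the table’s own counts:

```r
# Counts taken from the region table above
pakistan    <- 743
india       <- 205
afghanistan <- 204
south_asia  <- 743 + 205 + 204 + 54 + 42   # 1248 attacks in South Asia

# Pakistan's share of South Asian attacks, in percent
round(100 * pakistan / south_asia)                    # ~60
# Attacks relative to Afghanistan alone, in percent more
round(100 * (pakistan / afghanistan - 1))             # ~264
# Attacks relative to Afghanistan and India combined
round(100 * (pakistan / (afghanistan + india) - 1))   # ~82
```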

Terrorism is a plague my country has suffered for years. The deadliest school attack in Pakistan’s history happened last year, when the Army Public School in Peshawar was attacked, leaving 132 children dead.

With the death of every child died the dreams of parents and the future of a nation: a doctor, an engineer, an architect, an artist, and everyone those children could have been. The Army Public School in Peshawar has relit the candle of education, but the screams of children still haunt its corridors, and the walls, though repainted, still bear the bloodstains of innocents.


Author: Rana Muhammad Usman is an aspiring data scientist who likes to write about social issues. He tweets at @rana_usman and can be reached at usmanashrafrana@gmail.com.

Database: The Global Terrorism Database was used for this analysis.

This article appeared in Dawn. The Dawn version can be read here.

This article is cited in a BBC article here.

This article is also cited in La Stampa, Italy.


Pakistani Ruling Cockroaches.


The story begins with a cockroach who wanted to be emperor. He wanted to reign like none before. He wanted his subjects to bow down to his every whim, fancy and desire. To fulfill his nefarious scheme, he developed a system, the system which would inject into the minds of his subjects the information and commands he desired to be obeyed in order to subjugate and harness them. Nobody retaliated, nobody would – nobody could. The selective feed destroyed the sheer intelligence that once danced on the corridors of the swarm. Thought process of individuals died a slow death. The challengers faded and the concept of questioning the authority perished. A pseudo society came into being, the society embroidered with extravagant beauty but hollow as ice.

The king roach was flattered every day. The artificial gratifications gave him delusions, the delusions underneath which he laughed haughtily. Everybody bowed to him; such was his splendour and magnificence.

His subjects were not insects like him; they were humans, humans without eyes, ears and brains. They knew deep inside their hypnotized minds that they were being ruled by a roach, but their slumbering conscience created such an impenetrable blockade that it stupefied them to unfathomable limits. The roach, taking advantage of the ultimate stupor of his subjects, tyrannized them every day, and in turn they endured the oppression. The roach would flick his antennae every day in sheer arrogance, but in the deepest, darkest core of his heart, the roach knew his secret. He knew his own reality: nothing more than a filthy, slimy little cockroach.

There was a ballroom with twinkling chandeliers that illuminated the majestic grand hall where the roach threw parties to show off his grandeur and splendour. Intoxicated subjects danced in pairs as the evening fell. Amidst the bright lights of the ballroom, faces with dead hearts smiled and laughed. The roach, inebriated with power, overlooked these parties with pride and conceit through a magic crystal ball, a gift to him from the god of hope and lust. This crystal ball also illuminated his portrait as an emperor, a commander, a supreme being. Over the course of time, the subjects paid their homage by kissing his feet, and in exchange he deepened their miseries, but his diabolical plan went unchallenged.

This charade continued until the lust for power and authority, coupled with unconditional submission, gave him illusions: the illusion of being a god. His power was ever increasing, and to top it all, one night amidst the sparkling lights of the ballroom, amnesia struck him like a bolt of lightning and he forgot who he was. He even forgot the old saying: nothing lasts forever.

Seasons changed but time halted in the kingdom, as if it were gripped by the roach and his machinery forever. A status quo prevailed, to the utter satisfaction and amusement of the roach emperor. Little did the roach know that time is a nasty adversary, one that always takes its toll at a snail’s pace. The roach was slowly dying his own death, and on one wintry night, when the gloomiest wind was blowing, this realization dawned upon him. This was the darkest day to befall any roach; it was a day of unmatched catastrophe in history.

To add to the impending doom, a wizard suddenly materialized out of thin smoke, and in his hand he held a gold wand, a wand drenched with the blood of the oppressed, the held-back tears, the broken dreams, miseries, anxieties, anguish, agonies and shattered hopes of the masses. With one sharp blow of the wand, the majestic wizard shattered all that the roach possessed: the mystic crystal ball, the dreams of highness, and the god of hope. Everything the roach once possessed was destroyed in the blink of an eye, broken into a million pieces. This was the moment when the roach’s secret, which nobody knew but himself, was a secret no more. This was the moment when the roach finally saw his true self. In each glittering piece of his shattered dream there was nothing but a filthy cockroach: so he was, and so he is.

True is the saying that history repeats itself. Now, at present, this is the story of Pakistan, where the roach has been struck and Pakistan has to rebuild itself; it will not be rebuilt by those who are destroying it but by those who still have even the slightest degree of enlightened conscience left.

The nation has been in jeopardy since Imran Khan and Tahir ul Qadri brought their demands, along with thousands of people, to Islamabad. Tugs of war have started in every square of Pakistan, if not at ‘Azadi Square’ or ‘Revolution Avenue’. The profane political culture of Pakistan has shown no mercy to anyone; it has not even left Quaid-e-Azam alone. This is the time when we have to decide: to fight the roach that, if it does not rule our country, rules our minds, because in real life the wizard does not come to the rescue.