Uncommon Tips for R Programming


I have been working in R for almost three years. I taught myself from tutorials, documentation and online courses. Recently I came across two interesting problems that I think are worth sharing. There are many ways to work around them, but the solutions below are the simplest I have found.

Converting a Factor Attribute to Numeric

I was working on a dataset with a time attribute in seconds, but the values were inconsistently formatted. Some had leading zeros, like 0000015, and others did not, like 24. Both represented times in seconds.

head(time$TimeSpent)

[1] 00000115.123 0000053.159  0000068.468  15.989       50.604       0000029.135

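It was important to normalize them. The variable had been read in as a factor, so the first thing that came to mind was to convert it to numeric. I tried, but it didn't work: R returned values that had nothing to do with the originals. That is because as.numeric() on a factor returns the factor's internal integer level codes, not the numbers its labels represent.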

Let's check.

time$TSnumeric <- as.numeric(time$TimeSpent)

head(time$TSnumeric)

[1]   65  689  818 1147 1509  411

Does the head look anywhere near the original values? Not at all. There are many ways to work around this, but most take much more code. One would be to write a loop that strips the leading 0s from each value before converting it.

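However, there was a much simpler solution that worked like a charm: convert the variable to character first, and then to numeric. as.character() returns the factor's labels, such as "00000115.123", and as.numeric() parses those strings as ordinary numbers, ignoring the leading zeros.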

time$TSnumeric <- as.numeric(as.character(time$TimeSpent))

head(time$TSnumeric)

[1] 115.123  53.159  68.468  15.989  50.604  29.135

The leading zeros are gone and the whole variable is normalized.
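To see both behaviors side by side without the original dataset, here is a small self-contained sketch. The vector time_spent is a made-up stand-in for the TimeSpent column. It also shows as.numeric(levels(f))[f], the idiom the R FAQ recommends for this conversion: it parses each distinct level only once, then indexes by the factor's codes, which can be slightly faster on large factors.

```r
# Hypothetical sample mimicking the TimeSpent column: numbers stored as text
# with inconsistent leading zeros, read in as a factor.
time_spent <- factor(c("00000115.123", "0000053.159", "0000068.468",
                       "15.989", "50.604", "0000029.135"))

# Wrong: as.numeric() on a factor returns the integer level codes.
codes <- as.numeric(time_spent)

# Right: go through character first, so the labels themselves are parsed.
values <- as.numeric(as.character(time_spent))

# R FAQ idiom: convert the levels once, then index them by the factor,
# which uses the factor's integer codes as positions.
values_faq <- as.numeric(levels(time_spent))[time_spent]

print(codes)
print(values)
print(values_faq)
```

Both correct versions give the same numbers; the first is easier to read, the second avoids building a full character vector.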

Customizing a Function from a Package

It is quite common to want to customize a function from a package so that it works the way we need. Recently I was working with tuber, a package that pulls data from YouTube. I wanted to change one of its functions, but when I ran my modified version, it failed with a confusing error.

First, let's see how to build your customized version.

Type the name of the function you want to change, without parentheses, and R prints its source. In my case, I wanted to change get_comments from the tuber package.

library(tuber)

get_comments

function (part = "snippet", video_id = NULL, text_format = "html",
    simplify = TRUE, max_results = 100, page_token = NULL, ...)
{
    if (is.null(video_id))
        stop("Must specify a video ID")
    if (max_results < 20 | max_results > 100)
        stop("max_results only takes a value between 20 and 100")
    if (text_format != "html" & text_format != "plainText")
        stop("Provide a legitimate value of textFormat.")
    querylist <- list(part = part, videoId = video_id, maxResults = max_results,
        textFormat = text_format)
    res <- tuber_GET("commentThreads", querylist, ...)
    if (simplify == TRUE & part == "snippet") {
        simple_res <- lapply(res$items, function(x) x$snippet$topLevelComment$snippet)
        simpler_res <- as.data.frame(do.call(rbind, simple_res))
        return(simpler_res)
    }
    res
}
<environment: namespace:tuber>
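The last line of that printout matters for what follows: it names the function's enclosing environment. You can query the pieces of any function directly. The sketch below uses stats::sd as a stand-in for get_comments, since it ships with R and does not need tuber installed; the same calls work on any package function.

```r
# Inspect a package function's parts, using stats::sd as a stand-in
# for tuber::get_comments.
f <- stats::sd

print(formals(f))       # the argument list with its defaults
print(body(f))          # the code between the braces
print(environment(f))   # the enclosing environment: <environment: namespace:stats>

# The enclosing environment is the package namespace itself, which is how
# the function can see the package's internal, non-exported helpers.
print(identical(environment(f), asNamespace("stats")))
```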

Copy and paste it into a new function with a non-conflicting name, such as get_comments2, and make your changes. Here I raised the default max_results to 200 and the upper bound of the check to 300.

get_comments2 <- function(part = "snippet", video_id = NULL, text_format = "html",
                          simplify = TRUE, max_results = 200,  # changed: default raised from 100
                          page_token = NULL, ...)
{
    if (is.null(video_id))
        stop("Must specify a video ID")
    # changed: upper limit raised from 100 to 300
    if (max_results < 20 | max_results > 300)
        stop("max_results only takes a value between 20 and 300")
    if (text_format != "html" & text_format != "plainText")
        stop("Provide a legitimate value of textFormat.")
    querylist <- list(part = part, videoId = video_id, maxResults = max_results,
                      textFormat = text_format)
    # tuber_GET is internal to tuber, so this call fails unless the
    # function's environment is the tuber namespace (see below)
    res <- tuber_GET("commentThreads", querylist, ...)
    if (simplify == TRUE & part == "snippet") {
        # keep only the snippet of each top-level comment, one row per comment
        simple_res <- lapply(res$items, function(x) x$snippet$topLevelComment$snippet)
        simpler_res <- as.data.frame(do.call(rbind, simple_res))
        return(simpler_res)
    }
    res
}

When I tried to run it, R complained that it could not find a function called inside it:

could not find function "tuber_GET"

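This happens because tuber_GET is an internal helper that tuber does not export. The original get_comments can call it because its enclosing environment is the tuber namespace (the <environment: namespace:tuber> line in the printout). The copy I defined lives in the global environment, where tuber_GET is not visible.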

The fix is to set the new function's enclosing environment to the package's namespace, so that names used in its body are looked up there:

environment(function_name) <- asNamespace("package_name")

# In the above case, it would be:
environment(get_comments2) <- asNamespace("tuber")
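Why this works can be seen without installing tuber. In the sketch below, fake_ns is a plain environment standing in for a package namespace (a hypothetical setup, for illustration only), and internal_helper plays the role of tuber_GET. Setting environment() on the copied function changes where R looks up the free names in its body.

```r
# Hypothetical stand-in for a package namespace: an environment holding an
# "internal" helper that is not visible from the global environment.
fake_ns <- new.env()
assign("internal_helper", function(x) x * 2, envir = fake_ns)

# A copied-and-modified function defined at top level. Its enclosing
# environment is the global environment, so internal_helper is not found.
my_fun <- function(x) internal_helper(x) + 1

first_try <- tryCatch(my_fun(10), error = function(e) conditionMessage(e))
print(first_try)

# Point the function's enclosing environment at the "namespace".
# internal_helper is now found there; `+` is still found further up the chain.
environment(my_fun) <- fake_ns
print(my_fun(10))
```

If you only need one or two internal helpers, an alternative is to call them explicitly with the triple-colon operator inside your copy, for example tuber:::tuber_GET(...), and leave the function's environment alone.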

Using Pre-built Themes for ggplot2

One of the most common problems we face is styling the graphics we build in ggplot2 by hand. It takes many lines of code, so sometimes we just leave the plot as it is. I recently came across a package called ggthemes, which is very helpful here. It offers a number of themes to choose from, including ones modeled on The Wall Street Journal and The Economist. You simply add a theme to your plot and you are good to go.

Let's take the mtcars data as an example.

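library(ggplot2)
library(ggthemes)  # provides theme_economist(), theme_wsj() and matching color scales

# Scatter plot of weight vs. mileage, colored by number of gears
p2 <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(gear))) +
  geom_point() +
  ggtitle("Cars")

p2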

[Figure: the original plot]

This is the original plot. Now let's add the Economist theme to it.

p2 + theme_economist()

[Figure: the plot with theme_economist()]

Let's also use The Economist's colors for the points and the legend.

p2 + theme_economist() + scale_color_economist()

[Figure: the plot with theme_economist() and scale_color_economist()]

Now let's use the WSJ theme, with its matching colors.

p2 + theme_wsj() + scale_color_wsj()

[Figure: the plot with theme_wsj() and scale_color_wsj()]

As you can see, it's that easy.

Thank you for reading.

Usman is an aspiring data scientist who works on automation in the oil industry. He tweets @rana_usman and can be reached at usmanashrafrana@gmail.com

 
