R Programming Language (Analysis Software for Statistics & Data Science)
R is a programming language and software environment that is becoming increasingly popular in statistics and data science.
The R software is completely free and is developed collaboratively by its community (open source software) – every R user can publish new add-on packages.
The open source philosophy of R contrasts sharply with most traditional statistical software (e.g. SAS, SPSS, Stata), where development is in the hands of a paid development team.
All R Tutorials on statistical-programming.com
Below you can find a list of R tutorials on statistical-programming.com. In the tutorials, I explain statistical concepts and provide reproducible example code in R.
The Increasing Popularity of R Programming
Because the R programming language provides features for almost all statistical tasks at no cost to the user, its use has grown rapidly since its release. Let’s check some numbers…
Graphic 1: Google Scholar Search Results for R Programming Filtered by Year
Reasons to Learn R
+ R is free
+ R’s popularity is growing – More and more people will use it
+ Almost all statistical methods are available in R
+ New methods are implemented in add-on packages quickly
+ Algorithms for packages and functions are publicly available (transparency and reproducibility)
+ R provides a huge variety of graphical outputs
+ R is very flexible – Essentially everything can be modified for your personal needs
+ R is compatible with all major operating systems (e.g. Windows, macOS, or Linux)
+ R has a huge community that helps each other in forums (e.g. Stack Overflow)
+ R is fun 🙂
– Relatively steep learning curve at the beginning (even though it’s worth it)
– No systematic validation of new packages and functions
– No company in the background that takes responsibility for errors in the code (this is especially important for public institutions)
– R is almost exclusively based on programming (no extensive drop-down menus such as in SPSS)
– R can be slow or run short of memory on computationally intensive tasks, because it keeps data in memory by default (only important for advanced users)
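To make a few of the points above more concrete (built-in statistical methods, graphical output, and the programming-based workflow), here is a minimal sketch that uses only base R. The `mtcars` dataset and the variables `mpg` and `wt` are chosen purely for illustration and are not part of the original list.

```r
# Illustrative sketch: fit and plot a simple linear regression in base R.
# The dataset (mtcars) and the chosen variables are assumptions made for this example.
data(mtcars)

# Fit a linear model: fuel economy (mpg) explained by car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)                                  # Coefficients, standard errors, R-squared

# Scatterplot of the data with the fitted regression line on top
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     pch = 19, col = "#1b98e0")
abline(fit, col = "#353436", lwd = 2)
```

Everything here is typed as code rather than selected from a menu, which is the workflow described in the list. Colors, point symbols, and axis labels can all be changed through the function arguments.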
Not sure yet whether you should learn the R programming language? In that case, I can recommend the following video from the YouTube channel RenegadeThinking. The speaker gives many reasons why it is advisable to learn R.
Appendix 1: R code for the creation of Graphic 1
library("ggplot2")                                          # Provides ggplot()
library("scales")                                           # Provides comma labels

year <- 2018:2000                                           # Years
r_gs <- c(21600 * 2, 43300, 43100, 38100, 33200, 29800,     # Google Scholar searches
          28500, 25500, 22400, 19100, 15900, 12000,
          8270, 5930, 3740, 2600, 1980, 1600, 1360)
data <- data.frame(software = rep("R", 19),                 # Combine data
                   year = year,
                   searches = r_gs)

ggplot(data) +                                              # Create plot
  geom_point(aes(x = year, y = searches, color = software, shape = software)) +
  geom_line(aes(x = year, y = searches, color = software)) +
  theme(legend.title = element_blank(),
        legend.position = "none") +
  ggtitle("Google Scholar Search Results") +
  labs(x = "Year", y = "Search Results") +
  scale_y_continuous(labels = comma)
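As a rough check on the growth shown in Graphic 1, the same numbers can be summarised directly. This is only a sketch: it reuses the values from the code above and leaves out 2018, because that value is entered as `21600 * 2`, which appears to be an extrapolation from a partial year (an assumption, not stated in the text). The names `growth_factor`, `counts`, and `yoy` are introduced here for illustration.

```r
# Google Scholar result counts for 2017 down to 2000 (values taken from Appendix 1)
year <- 2017:2000
r_gs <- c(43300, 43100, 38100, 33200, 29800, 28500, 25500, 22400, 19100,
          15900, 12000, 8270, 5930, 3740, 2600, 1980, 1600, 1360)

# Overall growth factor between 2000 and 2017
growth_factor <- r_gs[year == 2017] / r_gs[year == 2000]
growth_factor

# Year-over-year change in percent, ordered from 2001 to 2017
counts <- rev(r_gs)                           # Oldest year first
yoy <- 100 * diff(counts) / head(counts, -1)  # Change relative to the previous year
names(yoy) <- 2001:2017
round(yoy, 1)
```

With these values, the count for 2017 is roughly 32 times the count for 2000, and every year in the series is higher than the one before it.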
Appendix 2: How to create the header graphic of this page
par(mar = c(0, 0, 0, 0))                                    # Remove space around plot
par(bg = "#1b98e0")                                         # Set background color

set.seed(10293847)                                          # Seed
N <- 100000                                                 # Sample size
x <- rnorm(N)                                               # X variable
y <- rnorm(N) + x                                           # Correlated Y variable

plot(x, y, col = "#353436", pch = 19, cex = 0.1,            # Create plot
     xlim = c(- 4, 4), ylim = c(- 7, 7))
text(0, 0, "R", col = "#1b98e0", cex = 12)                  # Write R
points(0, 0, col = "#1b98e0", cex = 30, lwd = 5)            # Create circles
points(0, 0, col = "#1b98e0", cex = 50, lwd = 5)
points(0, 0, col = "#1b98e0", cex = 70, lwd = 5)
points(0, 0, col = "#1b98e0", cex = 90, lwd = 5)
points(0, 0, col = "#1b98e0", cex = 110, lwd = 5)
box(col = "#1b98e0")                                        # Color of box