# R NA – What are <Not Available> Values?

Your data contains NA, <NA>, or NaN values? That’s not the end of the world — but your alarm bells should start ringing!

In R (or RStudio), NA stands for **Not Available**. Each cell of your data that displays NA is a missing value.

Not available values are sometimes enclosed by < and >, i.e. **<NA>**. That happens when the vector or column that contains the NA is a factor.

In R, NA needs to be distinguished from NaN. NaN stands for **Not a Number** and represents an undefined or unrepresentable numeric result. It appears, for instance, when you divide zero by zero (0 / 0); dividing a non-zero number by zero returns Inf or -Inf instead.
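
The difference between NA, NaN, and Inf can be checked directly in the console. The following is a minimal sketch; the vector name `v` is only used for illustration.

```r
# NaN results from undefined numeric operations, Inf from division by zero
0 / 0          # NaN
1 / 0          # Inf

# Illustrative vector containing all three special values
v <- c(1, NA, NaN, Inf)

is.na(v)       # FALSE  TRUE  TRUE FALSE  (is.na is also TRUE for NaN)
is.nan(v)      # FALSE FALSE  TRUE FALSE  (is.nan is TRUE only for NaN)
is.finite(v)   #  TRUE FALSE FALSE FALSE
```

Note that is.na treats NaN as missing, whereas is.nan does not treat NA as NaN.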

Consider the following **example in R**:

```r
# Create some example variables
x1 <- c(7, 9, NA, 2, 5)
x2 <- as.factor(c(NA, 2, NA, 1, 1))
x3 <- c(4, NaN, 0, 9, 8)
x4 <- c(6, 1, 5, 5, 7)

# Create data.frame
data <- data.frame(x1, x2, x3, x4)
```

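**Table 1: R Example Data with NA, <NA> & NaN**

| x1 | x2 | x3 | x4 |
|----|----|----|----|
| 7 | `<NA>` | 4 | 6 |
| 9 | 2 | NaN | 1 |
| NA | `<NA>` | 0 | 5 |
| 2 | 1 | 9 | 5 |
| 5 | 1 | 8 | 7 |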

The column x1 of our R example data has one missing value in the third row. The missing value is displayed as NA, since **the column is numeric**.

Column x2 has two missing values in the first and third rows. The missing values are represented by <NA>, since **the second column is a factor**.

The third column x3 is of class numeric (the same as x1). The second entry of the column is **not a number** and is therefore displayed as NaN.

The fourth column x4 is **complete** and therefore does not contain any NAs or NaNs.

## Important Functions for Dealing with NAs

In the following, I’ll show you some of the most important approaches and functions of the R programming language for the **handling of missing data**. I’ll use the example data frame that we created above.

### na.omit

The na.omit function is used to exclude rows of a data set with one or more missing values.

```r
na.omit(data)
# x1 x2 x3 x4
#  2  1  9  5
#  5  1  8  7
```

na.omit can also be used to delete NAs in a vector…

```r
na.omit(data$x1)
# [1] 7 9 2 5
```

…or in a list.

```r
# Create some data frames and matrices
data_1 <- data[ , 1:2]
data_2 <- data[1:3, 3:4]
data_3 <- matrix(ncol = 2, c(0, NA, -4, 3, 2, 1))

# Store data frames and matrix in list
data_list <- list(data_1, data_2, data_3)

# Create empty list
data_list_na.omit <- list()

# For loop for removal of rows with NAs in whole list
for(i in seq_along(data_list)) {
  data_list_na.omit[[i]] <- na.omit(data_list[[i]])
}
```

Note: With such a for loop, any function (not only na.omit) can be applied to each element of a list.
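
The same result can usually be obtained more compactly with lapply, which applies a function to each list element and returns a list. The following is a minimal sketch with a small illustrative list; the object names are assumptions made for this example.

```r
# Illustrative list containing a data frame and a matrix with NAs
data_list <- list(
  data.frame(x1 = c(7, 9, NA), x2 = c(1, NA, 3)),
  matrix(c(0, NA, -4, 3, 2, 1), ncol = 2)
)

# Apply na.omit to each element of the list
data_list_clean <- lapply(data_list, na.omit)

# Number of rows remaining in each element
sapply(data_list_clean, nrow)
# [1] 1 2
```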

### na.rm

na.rm is an argument of many summary functions. Setting na.rm = TRUE removes the NAs from the input before the result is computed. For instance, na.rm can be used in combination with the functions mean…

```r
mean(data$x1, na.rm = TRUE)
# [1] 5.75
```

…and max.

```r
max(data$x1, na.rm = TRUE)
# [1] 9
```
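
To see why na.rm matters, it helps to compare the results with and without the argument. A minimal sketch, using the same values as the column x1:

```r
x1 <- c(7, 9, NA, 2, 5)

# Without na.rm, a single NA makes the whole result NA
mean(x1)
# [1] NA

# With na.rm = TRUE, the NA is dropped before the calculation
mean(x1, na.rm = TRUE)
# [1] 5.75
sum(x1, na.rm = TRUE)
# [1] 23
```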

### use

Often confusing: The function cor uses the argument use instead of na.rm.

```r
cor(data$x1, data$x3, use = "complete.obs")
# [1] -0.9011271
```
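
The use argument also accepts other values. The following sketch compares the default behaviour with "complete.obs" and "pairwise.complete.obs" on the numeric columns of the example data; the object names `num_data`, `cor_complete`, and `cor_pairwise` are assumptions made for this illustration.

```r
x1 <- c(7, 9, NA, 2, 5)
x3 <- c(4, NaN, 0, 9, 8)
x4 <- c(6, 1, 5, 5, 7)

# Default (use = "everything"): missing values propagate to the result
cor(x1, x3)
# [1] NA

num_data <- data.frame(x1, x3, x4)

# Only rows that are complete in all columns are used
cor_complete <- cor(num_data, use = "complete.obs")

# Each pair of columns uses all rows that are complete for that pair
cor_pairwise <- cor(num_data, use = "pairwise.complete.obs")
```

With "complete.obs", every correlation is based on the same rows, while "pairwise.complete.obs" may use a different set of rows for each pair of columns.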

### complete.cases

The complete.cases function creates a logical vector that indicates the complete rows of our data by TRUE.

```r
complete.cases(data)
# [1] FALSE FALSE FALSE  TRUE  TRUE
```

The function can also be used for casewise deletion (same as na.omit).

```r
data[complete.cases(data), ]
# x1 x2 x3 x4
#  2  1  9  5
#  5  1  8  7
```
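
complete.cases can also be restricted to selected columns, which keeps rows that are only incomplete in columns you do not need. A minimal sketch, assuming that only x1 and x4 are relevant for the analysis:

```r
data <- data.frame(x1 = c(7, 9, NA, 2, 5),
                   x3 = c(4, NaN, 0, 9, 8),
                   x4 = c(6, 1, 5, 5, 7))

# Check completeness only for the columns x1 and x4
keep <- complete.cases(data[ , c("x1", "x4")])
keep
# [1]  TRUE  TRUE FALSE  TRUE  TRUE

# Row 2 is kept, even though x3 is NaN in that row
data[keep, ]
```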

### is.na

is.na also identifies missing values via TRUE and FALSE (TRUE indicates NA). In contrast to complete.cases, which returns one value per row, is.na retains the dimensions of our data and returns one value per cell.

```r
is.na(data)
#    x1    x2    x3    x4
# FALSE  TRUE FALSE FALSE
# FALSE FALSE  TRUE FALSE
#  TRUE  TRUE FALSE FALSE
# FALSE FALSE FALSE FALSE
# FALSE FALSE FALSE FALSE
```

### !is.na

!is.na (with a ! in front) does the opposite of is.na.

```r
!is.na(data)
#    x1    x2    x3    x4
#  TRUE FALSE  TRUE  TRUE
#  TRUE  TRUE FALSE  TRUE
# FALSE FALSE  TRUE  TRUE
#  TRUE  TRUE  TRUE  TRUE
#  TRUE  TRUE  TRUE  TRUE
```

### which

Combined with the function which, logical vectors can be used to find the positions of missing values.

```r
which(is.na(data$x1))
# [1] 3
```
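
For a whole data frame, which can return the row and column index of every missing cell via the argument arr.ind = TRUE. A minimal sketch on the numeric columns of the example data; the object name `na_pos` is an assumption made for this illustration.

```r
data <- data.frame(x1 = c(7, 9, NA, 2, 5),
                   x3 = c(4, NaN, 0, 9, 8),
                   x4 = c(6, 1, 5, 5, 7))

# Matrix with one line per missing cell (columns "row" and "col")
na_pos <- which(is.na(data), arr.ind = TRUE)
na_pos
```

Here, the result points to row 3 of the first column and row 2 of the second column.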

### sum

Another benefit of logical vectors is the possibility to count the number of missing values. The function sum can be used together with is.na to count NA values in R.

```r
sum(is.na(data$x1))
# [1] 1
```
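
The same idea extends to whole data frames: colSums and rowSums count the missing values per column and per row, and mean returns the share of missing values. A minimal sketch using the example data:

```r
data <- data.frame(x1 = c(7, 9, NA, 2, 5),
                   x2 = as.factor(c(NA, 2, NA, 1, 1)),
                   x3 = c(4, NaN, 0, 9, 8),
                   x4 = c(6, 1, 5, 5, 7))

# Number of missing values per column
colSums(is.na(data))
# x1 x2 x3 x4
#  1  2  1  0

# Number of missing values per row
rowSums(is.na(data))

# Proportion of missing values in x1
mean(is.na(data$x1))
# [1] 0.2
```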

### summary

The summary function provides another way to count NA values in a data table, column, array, or vector.

```r
summary(data)
```

**Table 2: Summary Function in R Counts NAs in Each Column**

For each column that contains missing values, the bottom cell of Table 2 displays the number of NAs.

### Merge Complete Data via rbind and na.omit

The functions rbind and na.omit can be combined in order to merge (i.e. row bind) only complete rows.

```r
# Create 2 data sets; NA in data_merge_2
data_merge_1 <- data.frame(x1 = c(5, 9, 8),
                           x2 = c(1, 2, 3))
data_merge_2 <- data.frame(x1 = c(2, NA, 8),
                           x2 = c(6, 9, 3))

# Merge data sets and keep only complete rows
data_merge <- na.omit(rbind(data_merge_1, data_merge_2))
data_merge    # Display merged data
```

### R Remove NA, NaN, and Inf

It is also possible to exclude all rows with NA, NaN, and/or Inf values.

```r
# Create data with NA, NaN, and Inf
data_inf <- data
data_inf[5, 4] <- Inf

# Remove NA, NaN, and Inf
data_no_na_nan_inf <- data_inf[complete.cases(data_inf) &
                                 apply(data_inf, 1, max) != "Inf", ]
data_no_na_nan_inf    # Display complete subset
```
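
An alternative is the function is.finite, which is FALSE for NA, NaN, Inf, and -Inf alike. The following is a minimal sketch, assuming that all columns are numeric (the factor column x2 is therefore left out); the object name `finite_rows` is an assumption made for this illustration.

```r
data_inf <- data.frame(x1 = c(7, 9, NA, 2, 5),
                       x3 = c(4, NaN, 0, 9, 8),
                       x4 = c(6, 1, 5, 5, Inf))

# TRUE for rows in which every value is finite
finite_rows <- rowSums(!sapply(data_inf, is.finite)) == 0
finite_rows
# [1]  TRUE FALSE FALSE  TRUE FALSE

# Keep only rows without NA, NaN, Inf, or -Inf
data_inf[finite_rows, ]
```

In contrast to a comparison with the maximum of each row, this approach also removes rows that contain -Inf.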

### Recode Values to NA

Sometimes existing values have to be recoded to NA. If you want to replace a certain value with NA, you can do it as follows.

```r
data_NA <- data                # Replicate data
data_NA[data_NA == 1] <- NA    # Recode the value 1 to NA
```

If you want to recode a specific cell of your data matrix to NA, you can do it as follows.

```r
data_NA2 <- data          # Replicate data
data_NA2[1, 3] <- NA      # Recode row 1, column 3 to NA
```
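
Recoding can also be limited to a single column and to several values at once. A minimal sketch using %in%, which never returns NA itself; the object name `data_NA3` and the choice of the values 1 and 5 are assumptions made for this illustration.

```r
data <- data.frame(x1 = c(7, 9, NA, 2, 5),
                   x4 = c(6, 1, 5, 5, 7))

data_NA3 <- data                                 # Replicate data
data_NA3$x4[data_NA3$x4 %in% c(1, 5)] <- NA      # Recode 1 and 5 in x4 to NA
data_NA3$x4
# [1]  6 NA NA NA  7
```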

### Replace NAs

Logical vectors can also be used to replace NA with other values, e.g. 0.

```r
vect_example <- data$x1
vect_example[is.na(vect_example)] <- 0
vect_example
# [1] 7 9 0 2 5
```
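
The same replacement works for all cells of a data frame at once. The following minimal sketch assumes numeric columns only; a factor column such as x2 would need a different approach, because 0 is not one of its levels. The object name `data_0` is an assumption made for this illustration.

```r
data <- data.frame(x1 = c(7, 9, NA, 2, 5),
                   x3 = c(4, NaN, 0, 9, 8))

data_0 <- data                    # Replicate data
data_0[is.na(data_0)] <- 0        # Replace all NA and NaN cells with 0
data_0$x1
# [1] 7 9 0 2 5
data_0$x3
# [1] 4 0 0 9 8
```

Note that NaN is replaced as well, since is.na returns TRUE for NaN.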

### Missing Value Imputation

Missing data imputation replaces missing values by new values. Data imputation has many advantages compared to the deletion of rows/columns with NAs.

In the following example, we use the predictive mean matching imputation method. However, there are many other imputation methods such as regression imputation or mean imputation available.

```r
install.packages("mice")      # Install mice package in R
library("mice")               # Load mice package

imp <- mice(data,             # Impute data
            m = 1,
            seed = 123)
data_imp <- complete(imp)     # Store imputed data set
data_imp                      # Display imputed data
```
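
Mean imputation, one of the simpler methods mentioned above, can be sketched in base R without additional packages. This is only a minimal illustration for a single numeric vector; the object name `x1_imp` is an assumption made for this example.

```r
x1 <- c(7, 9, NA, 2, 5)

x1_imp <- x1                                           # Replicate vector
x1_imp[is.na(x1_imp)] <- mean(x1_imp, na.rm = TRUE)    # Insert mean of observed values
x1_imp
# [1] 7.00 9.00 5.75 2.00 5.00
```

Keep in mind that mean imputation reduces the variance of the imputed variable, which is one reason why methods such as predictive mean matching are often preferred.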

## Video Example – How to Handle NA Values

Need more help with your NA values in R? Then you should definitely have a look at the following video of my Statistical Programming YouTube channel.

In this video, I’m explaining how to deal with incomplete data. I show easy-to-understand live examples and explain how to apply different functions such as is.na, na.omit, and na.rm.

## I Would Like to Hear From You

I’ve shown you my favourite ways to handle NA values in R.

Now, I would like to hear about **your experiences**.

Which of these methods is your favourite? Do you use any other methods that I missed above?

Let me know in the comments!

## Appendix

The header graphic of this page shows a correlation plot of two variables. Missing cases are illustrated via NA.

With the following code, the plot is created in R.

```r
N <- 50000                      # Sample size
x <- rnorm(N)                   # X variable
y <- rnorm(N)                   # Y variable

par(bg = "#353436")             # Set background color
par(mar = c(0, 0, 0, 0))        # Remove space around plot
plot(x, y,                      # Plot observed values
     col = "#1b98e0")
points(x[1:15], y[1:15],        # Plot missing values
       pch = 16,
       cex = 5,
       col = "#353436")
text(x[1:15], y[1:15],          # Write NA into each missing value
     "NA",
     col = "red")
```


## 2 Comments

Hi Joachim,

My data has a specific column named “treatment” where the contents are 1) empty cells 2) drug 3) diet 4) unknown and 5) None.

I want to create a parallel column named “treatment_n” with drug replaced as 1 and all other content as 0.

Can you please help with this.

Thank you

Nara

Hey Nara,

that’s a great question. I have created an example, which simulates your problem. You can copy/paste the following code to your RStudio and run it yourself:
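
A minimal sketch of one way to do this is shown below. The example values and the object name `data_trt` are hypothetical and only simulate the column described in the question.

```r
# Hypothetical example data with empty cells, NA, and different treatments
data_trt <- data.frame(treatment = c("", "drug", "diet", "unknown", "None", NA))

# 1 if treatment is "drug", 0 otherwise (empty cells and NA also become 0)
data_trt$treatment_n <- as.integer(data_trt$treatment %in% "drug")
data_trt$treatment_n
# [1] 0 1 0 0 0 0
```

The operator %in% is used instead of ==, because == would return NA for missing treatments.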

I hope that helps!

Regards,

Joachim