# R Find Missing Values (6 Examples for Data Frame, Column & Vector)

Let’s face it:

Missing values are an issue in almost every raw data set!

If we don’t handle our missing data in an appropriate way, our estimates are likely to be biased.

However, before we can deal with missingness, we need to identify in which rows and columns the missing values occur.

In the following, I will show you several examples of how to find missing values in R.

**Example 1: One of the most common ways in R to find missing values in a vector**

    expl_vec1 <- c(4, 8, 12, NA, 99, -20, NA)    # Create your own example vector with NA's

    is.na(expl_vec1)                             # The is.na() function returns a logical vector. The vector is TRUE in case
                                                 # of a missing value and FALSE in case of an observed value

    which(is.na(expl_vec1))                      # The which() function returns the positions with missing values in your vector.
                                                 # In our case there are NA's at positions 4 & 7
    ### [1] 4 7
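A common stumbling block is trying to find missing values with `== NA`. Any comparison with `NA` is itself `NA`, so this does not work. The short sketch below re-creates the example vector and contrasts the two approaches. The object name `na_positions` is just an illustrative choice, and `anyNA()` is a base R shortcut for a quick yes/no check.

```r
expl_vec1 <- c(4, 8, 12, NA, 99, -20, NA)   # Same example vector as above

expl_vec1 == NA                             # Returns only NA's: a comparison with NA is never TRUE or FALSE
is.na(expl_vec1)                            # Logical vector that flags the missing entries

anyNA(expl_vec1)                            # TRUE if the vector contains at least one NA

na_positions <- which(is.na(expl_vec1))     # Store the positions of the missing values
na_positions
```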

**Example 2: Find missing values in a column of a data frame**

    expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),      # Numeric variable with one missing value
                             x2 = c(4, 1, NA, NA, 4),     # Numeric variable with two missing values
                             x3 = c(1, 4, 2, 9, 6),       # Numeric variable without any missing values
                             x4 = c("Hello", "I am not NA", NA, "I love R", NA)) # Character variable with two missing values
                                                          # (stored as a factor in R versions before 4.0.0)
    expl_data1                                            # This is what our data with missing values looks like

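**Table 1: Example Data Frame with Missing Values**

| | x1 | x2 | x3 | x4 |
|---|---|---|---|---|
| 1 | NA | 4 | 1 | Hello |
| 2 | 7 | 1 | 4 | I am not NA |
| 3 | 8 | NA | 2 | NA |
| 4 | 9 | NA | 9 | I love R |
| 5 | 3 | 4 | 6 | NA |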

    which(is.na(expl_data1$x1))    # Same procedure as in Example 1, but this time with the column of a data frame;
                                   # Missing value in x1 at position 1
    which(is.na(expl_data1$x2))    # Variable x2 has missing values at positions 3 and 4
    which(is.na(expl_data1$x3))    # The variable x3 in column 3 has no missing values
    which(is.na(expl_data1$x4))    # Our variable x4 in column 4 has missing values at positions 3 and 5;
                                   # The same procedure can be applied to character vectors and factors
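Once you know where the NA's of a column are, you will often want to look at the affected rows. As a minimal sketch, the logical vector returned by `is.na()` can be used directly as a row index. The object names `rows_x2_missing` and `rows_x2_observed` are my own choices for illustration.

```r
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),          # Same example data as above
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

rows_x2_missing <- expl_data1[is.na(expl_data1$x2), ]     # Rows in which x2 is missing
rows_x2_missing

rows_x2_observed <- expl_data1[!is.na(expl_data1$x2), ]   # Rows in which x2 is observed
rows_x2_observed
```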

**Example 3: Identify missing values in an R data frame**

    is.na(expl_data1)                      # As in Example 1, is.na() returns logical TRUE and FALSE values, this time as a
                                           # matrix that indicates missing (TRUE) and observed (FALSE) values

    apply(is.na(expl_data1), 2, which)     # In order to get the NA positions of each column in your data set,
                                           # you can use the apply() function
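If you prefer one compact overview instead of a list per column, `which()` with `arr.ind = TRUE` returns the row and column index of every NA. This is just a sketch of an alternative to the `apply()` call. The object names `na_pos` and `na_columns` are hypothetical.

```r
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),          # Same example data as above
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

na_pos <- which(is.na(expl_data1), arr.ind = TRUE)        # Matrix with one line per NA (columns "row" and "col")
na_pos

na_columns <- colnames(expl_data1)[na_pos[ , "col"]]      # Translate the column indices into variable names
na_columns
```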

**Example 4: Detect missing values in a column of an R matrix**

    # Create matrix on the basis of the first three columns of our example data of Example 2
    expl_matrix1 <- as.matrix(expl_data1[ , 1:3])
    expl_matrix1

    which(is.na(expl_matrix1[ , 1]))   # The $ operator is invalid for columns of matrices.
                                       # Therefore we have to select our matrix columns with square brackets
    which(is.na(expl_matrix1[ , 2]))   # Besides the change from the $ operator to square brackets,
                                       # we can apply the same functions as in the other examples
    which(is.na(expl_matrix1[ , 3]))   # Again, no missing values in x3

**Example 5: Identify NA values in a matrix**

    # We can check the missing values of the whole matrix with the same procedure as in Example 3
    apply(is.na(expl_matrix1), 2, which)

**Example 6: Find missing values in R with the complete.cases() function**

    # An alternative to the is.na() function is the function complete.cases(),
    # which searches for observed values instead of missing values
    which(complete.cases(expl_vec1))             # Identify observed values (opposite result as in Example 1)
    which(complete.cases(expl_vec1) == FALSE)    # Reproduce result of Example 1 by adding == FALSE

    complete.cases(expl_data1)                   # If a data frame or matrix is checked by complete.cases(),
                                                 # the function returns a logical vector indicating whether a row is complete
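Because `complete.cases()` works row by row, it is handy for finding the incomplete rows of a data frame, or for checking completeness on a subset of columns only. The following sketch assumes the example data from Example 2. The object names `incomplete_rows` and `complete_x1_x3` are made up for illustration.

```r
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),                  # Same example data as above
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

incomplete_rows <- which(!complete.cases(expl_data1))             # Rows with at least one NA
incomplete_rows

complete_x1_x3 <- complete.cases(expl_data1[ , c("x1", "x3")])    # Completeness judged on x1 and x3 only
complete_x1_x3
```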

## Video Example – Detect Missing Values in a Real Data Set

The following video on my YouTube channel shows in a live example how to find NA, how to count NA, how to omit NA, and how to remove missing values.

Have a look at minute 1:05.

There I show the same approach that I have explained in Example 1.

## R – Count Missing Values per Row and Column

Besides the positions of your missing data, the question might arise how to count missing values per row, by column, or in a single vector. Let’s check how to do this based on our example data above:

    # With the sum() and the is.na() functions you can find the number of missing values in your data
    sum(is.na(expl_vec1))       # Two missings in our vector
    sum(is.na(expl_data1))      # The same method works for the whole data frame; Five missings overall
    sum(is.na(expl_matrix1))    # The procedure also works for matrices; The NA count is three in our case
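For counts per row and per column, one possible approach is to combine `is.na()` with the base R functions `rowSums()` and `colSums()`. `colMeans()` gives the share of missing values per variable. The object names below are illustrative only.

```r
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),      # Same example data as above
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

na_per_column <- colSums(is.na(expl_data1))           # Number of NA's in each column
na_per_column

na_per_row <- rowSums(is.na(expl_data1))              # Number of NA's in each row
na_per_row

na_share_per_column <- colMeans(is.na(expl_data1))    # Proportion of NA's in each column
na_share_per_column
```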

## How to Handle Missing Data in R?

Once we have found missing values in our data, the question arises how we should treat these not available values. Complete case data is needed for most data analyses in R!

The default method of many R modeling functions (e.g. lm(), via the default setting na.action = na.omit) is listwise deletion, which deletes all rows with missing values in one or more columns.

Basic data manipulations can be done with the na.omit command or with the is.na R function.
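As a small sketch of these basic manipulations: `na.omit()` applies listwise deletion to a whole data frame, while many summary functions such as `mean()` offer an `na.rm` argument to ignore NA's in a single calculation. The object name `complete_data` is my own choice.

```r
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),      # Same example data as above
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

complete_data <- na.omit(expl_data1)                  # Listwise deletion: keep only rows without any NA
complete_data                                         # Only row 2 is left in our example

mean(expl_data1$x2)                                   # NA, because the missing values propagate
mean(expl_data1$x2, na.rm = TRUE)                     # Mean of the observed values of x2
```

Note how much data listwise deletion can cost: four of our five example rows are removed.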

A more sophisticated approach – which is usually preferable to a complete case analysis – is the imputation of missing values.

Very simple imputation approaches would be mean imputation (mode imputation in case of categorical variables) or the replacement of NA’s with 0.
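To make these simple approaches concrete, here is a minimal sketch of mean imputation and of replacing NA's with 0, based on two columns of the example data. The object names `x1_mean_imp` and `x2_zero` are hypothetical, and the sketch only illustrates the mechanics. It is not a recommendation to impute this way.

```r
x1 <- c(NA, 7, 8, 9, 3)                                              # Column x1 of the example data
x2 <- c(4, 1, NA, NA, 4)                                             # Column x2 of the example data

x1_mean_imp <- x1                                                    # Copy, so that the original stays untouched
x1_mean_imp[is.na(x1_mean_imp)] <- mean(x1_mean_imp, na.rm = TRUE)   # Replace NA's by the mean of the observed values
x1_mean_imp

x2_zero <- x2
x2_zero[is.na(x2_zero)] <- 0                                         # Replace NA's by 0
x2_zero
```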

However, in order to create a more reasonable complete data set, missing data imputation usually replaces missing values with estimates that are based on statistical models (e.g. via regression imputation or predictive mean matching).
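The idea behind regression imputation can be sketched in a few lines of base R. The simulated data, the seed, and all object names below are assumptions made purely for illustration. Note that this deterministic version ignores the uncertainty of the predictions, which is one reason why dedicated imputation packages (e.g. mice, which offers predictive mean matching) are usually preferred in practice.

```r
set.seed(123)                                   # Hypothetical seed for reproducibility
n <- 200
x <- rnorm(n, 10, 3)                            # Fully observed predictor
y <- 2 + 0.5 * x + rnorm(n)                     # Target variable that depends on x
y[sample(n, 40)] <- NA                          # Set 40 randomly chosen values of y to NA

fit <- lm(y ~ x)                                # lm() drops the incomplete rows by default

y_imputed <- y                                  # Copy, so that the original stays untouched
y_imputed[is.na(y)] <- predict(fit, newdata = data.frame(x = x[is.na(y)]))   # Fill NA's with model predictions

sum(is.na(y))                                   # Number of NA's before the imputation
sum(is.na(y_imputed))                           # Number of NA's after the imputation
```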

## Now It’s Your Turn

So that is how I’m checking for missing values in my data sets.

Now I’d like to hear about your thoughts: What’s your favorite approach?

Are you going to use the is.na function of Example 1? Or will you find NA’s by searching for complete cases?

Let me know by leaving a comment below. I will respond to every question!

## Appendix

**How to create the graphic of the header of this page**

The header graphic shows a simple dotplot created with the R package ggplot2.

The dark blue values indicate observed values; The light blue values indicate missingness.

Since the missing values appear more often in the upper right part of the plot, they cannot be considered Missing Completely At Random anymore.

    library(ggplot2)                                           # Load the ggplot2 package

    set.seed(8765)                                             # Reproducibility
    var1 <- rnorm(2000, 10, 3)                                 # Normal distribution
    var2 <- var1 + rnorm(2000)                                 # Correlated normal distribution
    range01 <- function(x){(x - min(x)) / (max(x) - min(x))}   # Rescale values to probabilities of missingness between 0 and 1
    var2_miss <- rbinom(2000, 1, range01(var1^3)) == 1         # Draw the missingness indicator for var2 in dependence of var1
    data_ggplot_missings <- data.frame(var1, var2)             # Store var1 and var2 in a data frame
    colours <- rep(1, 2000)                                    # Set colours (1 = observed, 2 = missing)
    colours[var2_miss] <- 2
    ggplot_missings <- ggplot(data_ggplot_missings, aes(x = var1, y = var2)) +   # Create ggplot
      geom_point(aes(col = colours, size = 1.1)) +
      theme(legend.position = "none")
    ggplot_missings                                            # Draw the plot
