# Complete Cases in R (3 Programming Examples)

A **complete data set** (i.e. data without any missing values) is essential for many types of data analysis in the programming language R.

In order to deal with missing data, it is crucial to find the missing values and to identify the observations in your data that do not contain any of them.

In this article, I’ll show you how to **handle missing values in R with the complete.cases function**.

## Example 1: Find Complete Rows of a Data Frame

The complete.cases function is often used to identify complete rows of a data frame.

Consider the following example data:

```r
data <- data.frame(x1 = c(7, 2, 1, NA, 9),    # Some example data
                   x2 = c(1, 3, 1, 9, NA),
                   x3 = c(NA, 8, 8, NA, 5))
data                                          # This is what our example data looks like
```

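**Table 1: Incomplete Example Data**

|   | x1 | x2 | x3 |
|---|----|----|----|
| 1 | 7  | 1  | NA |
| 2 | 2  | 3  | 8  |
| 3 | 1  | 1  | 8  |
| 4 | NA | 9  | NA |
| 5 | 9  | NA | 5  |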

We can use complete.cases() to print a logical vector that indicates complete and incomplete rows (i.e. rows without and with NA values).

Rows 2 and 3 are complete; Rows 1, 4, and 5 have one or more missing values.

```r
complete.cases(data)
# [1] FALSE  TRUE  TRUE FALSE FALSE
```
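Because the result is an ordinary logical vector, it can be summarized with base R functions. The following minimal sketch rebuilds the example data so that it runs on its own, counts the complete rows, and lists the positions of the incomplete ones. The object names n_complete and incomplete_rows are chosen for illustration only:

```r
data <- data.frame(x1 = c(7, 2, 1, NA, 9),        # Example data from above
                   x2 = c(1, 3, 1, 9, NA),
                   x3 = c(NA, 8, 8, NA, 5))

n_complete <- sum(complete.cases(data))           # Number of complete rows
n_complete
# [1] 2

incomplete_rows <- which(!complete.cases(data))   # Positions of incomplete rows
incomplete_rows
# [1] 1 4 5
```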

We can also create a **complete subset** of our example data by using the complete.cases function.

This process is sometimes called listwise deletion:

```r
data[complete.cases(data), ]                   # Keep only the complete rows
data_complete <- data[complete.cases(data), ]  # Store the complete cases subset in a new data frame
```
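Listwise deletion does not have to consider every column. As a sketch, assume that only x1 and x2 are needed for a particular analysis (an assumption made purely for illustration). In that case complete.cases can be applied to those two columns only. The second part shows that na.omit removes the same rows as complete.cases applied to the whole data frame:

```r
data <- data.frame(x1 = c(7, 2, 1, NA, 9),        # Example data from above
                   x2 = c(1, 3, 1, 9, NA),
                   x3 = c(NA, 8, 8, NA, 5))

# Keep rows that are complete with respect to x1 and x2 only
data_x1x2 <- data[complete.cases(data[ , c("x1", "x2")]), ]
data_x1x2
#   x1 x2 x3
# 1  7  1 NA
# 2  2  3  8
# 3  1  1  8

# na.omit() drops the same rows as complete.cases() applied to all columns
data_omit <- na.omit(data)
rownames(data_omit)
# [1] "2" "3"
```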

Note that such a complete case data set might have a much **smaller sample size** than our original incomplete data.

In our example, data_complete consists of only 2 rows. Rows 1, 4, and 5 were deleted.

For that reason, it might be worth applying more sophisticated missing data techniques such as missing value imputation, or simply replacing missing data by zero or by a variable’s mean.
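The two simple replacements mentioned above can be sketched in base R as follows. This is only a minimal illustration on the example data (the object names data_zero and data_mean are made up here). Both approaches are crude and can distort the distribution of a variable, so they are not a substitute for a proper imputation method:

```r
data <- data.frame(x1 = c(7, 2, 1, NA, 9),        # Example data from above
                   x2 = c(1, 3, 1, 9, NA),
                   x3 = c(NA, 8, 8, NA, 5))

data_zero <- data
data_zero[is.na(data_zero)] <- 0                  # Replace every NA by zero

data_mean <- data
for(col in names(data_mean)) {                    # Replace NA by the column mean
  col_mean <- mean(data_mean[[col]], na.rm = TRUE)
  data_mean[[col]][is.na(data_mean[[col]])] <- col_mean
}

data_mean$x1                                      # The NA in row 4 became 4.75
# [1] 7.00 2.00 1.00 4.75 9.00
```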


## Example 2: Apply the complete.cases Function to a Vector

The complete.cases function can also be applied to vectors or single columns (even though the is.na function is more popular for this task).

Similar to Example 1, the function returns a logical vector (TRUE = observed; FALSE = missing value).

```r
set.seed(10101)                               # Set seed in order to create a reproducible example
vec <- round(runif(20, 0, 10))                # Create example vector
vec[rbinom(20, 1, 0.2) == 1] <- NA            # Insert some NA values into the vector

complete.cases(vec)                           # Vectors are handled in the same way as data frames
#  [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [11]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE

# Delete missing values and store the complete vector in the new object vec_complete
vec_complete <- vec[complete.cases(vec)]
```
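For a single vector, complete.cases returns the same result as the negated is.na function. The short sketch below uses a small hypothetical vector (not the random one from above) to show that both approaches agree:

```r
vec <- c(4, NA, 7, 1, NA, 3)                      # Hypothetical example vector

identical(complete.cases(vec), !is.na(vec))       # Both approaches agree
# [1] TRUE

vec[!is.na(vec)]                                  # Same result as vec[complete.cases(vec)]
# [1] 4 7 1 3
```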

## Example 3: How to Use the Complete Cases Function for Real Data

The 2 examples above illustrate the usage of the complete cases function on the basis of synthetic data.

However, real data is usually messier, so I’m now going to show you how to find observed and missing values in a real data set.

```r
# Load and inspect data
data(airquality)                              # Load the data set airquality
dim(airquality)                               # The data has 153 rows and 6 columns
head(airquality)                              # Head of data; missing values are, for instance, in columns 1 & 2 of row 5

# Check the whole data frame for missing values
complete.cases(airquality)                    # TRUE indicates a complete row; FALSE indicates a row with at least
                                              # one incomplete column
sum(complete.cases(airquality))               # We have 111 complete rows
airquality_complete <- airquality[complete.cases(airquality), ] # Create new data without missing values

# Find incomplete cases in a column
complete.cases(airquality$Ozone)              # By adding $Ozone behind airquality,
                                              # we identify observed values in the column Ozone
sum(complete.cases(airquality$Ozone))         # We have 116 observations
Ozone_complete <- airquality$Ozone[complete.cases(airquality$Ozone)] # Exclude missing data
                                              # from our column vector
```
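To see which columns are responsible for the incomplete rows, it can help to count the NA values per column. The following sketch combines is.na with colSums for that purpose and also confirms that na.omit leaves the same number of rows as the complete.cases subset (na_per_column is an illustrative object name):

```r
data(airquality)                                  # Load the data set airquality

na_per_column <- colSums(is.na(airquality))       # Number of NA values in each column
na_per_column
#   Ozone Solar.R    Wind    Temp   Month     Day 
#      37       7       0       0       0       0 

nrow(na.omit(airquality))                         # na.omit() also keeps 111 rows
# [1] 111
```

Only Ozone and Solar.R contain missing values, so restricting an analysis to the remaining columns would not lose any rows.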

## Example Video: Find Complete Cases in R

You need more examples? So be it!

In a YouTube video, the speaker *Dragonfly Statistics* explains how to check a real data set for complete cases (he also uses the airquality data set which I used in Example 3).

He shows several examples in the R programming language.

## Now it’s On You

I showed you how I’m applying the complete cases function in RStudio.

What are your thoughts? Will you identify your complete data like me or do you know a better approach?

Did you have any problems with the complete cases function that I didn’t cover in this article?

I’d love to hear about your experiences in the comments!

## Further Reading

- The na.omit Function in R
- The is.na Function in R
- NA Values in R
- Handling of Missing Data
- How to Find Missing Values in R
- Listwise Deletion
- The R Programming Language

## Appendix

The graphic in the header of this site shows a data frame with observed and missing values (indicated by TRUE and FALSE, respectively).

The graphic was created with the R programming language as follows.

```r
set.seed(34756)                                           # Set seed
data_header <- data.frame(Household_ID = runif(100),      # Some dummy data
                          Sex = runif(100),
                          Age = runif(100),
                          Nationality = runif(100),
                          Year = runif(100),
                          Health = runif(100),
                          CoB = runif(100),
                          Income = runif(100),
                          Household_Size = runif(100),
                          Holidays = runif(100),
                          Marital_Status = runif(100),
                          Expenditure = runif(100))

data_header$Sex[rbinom(100, 1, 0.1) == 1] <- NA           # Insert NA's
data_header$Age[rbinom(100, 1, 0.15) == 1] <- NA
data_header$Nationality[rbinom(100, 1, 0.1) == 1] <- NA
data_header$Year[1:7] <- NA
data_header$Health[rbinom(100, 1, 0.25) == 1] <- NA
data_header$Income[rbinom(100, 1, 0.4) == 1] <- NA
data_header$Marital_Status[rbinom(100, 1, 0.05) == 1] <- NA
data_header$Expenditure[rbinom(100, 1, 0.25) == 1] <- NA

data_logical <- as.data.frame(is.na(data_header) == FALSE) # Check for missing data
data_logical
```


## 11 Comments

Very comprehensive treatment Indeed. Thanks alot

Thank you Shrinivas, nice to hear.

Hello Joachim

After using na.omit, I am still getting the following result

```
> dm1 table(dm1)
dm1
     no yes
330  60  13
```

Why this NA reappears again?

Hi Shrinivas,

if you apply the following code, your NAs should be removed:

dm1_updated <- dm1[complete.cases(dm1), ]

Hello Jo

I am losing my confidence. Am I so dumb? See, I tried earlier what you told me and got stuck as follows

```
dm1 dm dm1 length(dm1)
[1] 403
> table(dm1)
dm1
     no yes
330  60  13
> sum(complete.cases(dm1))
[1] 403
> dm1 table(dm1)
dm1
     no yes
330  60  13
```

Not getting it at all. See as under:

```
dm1 <- dm[complete.cases(dm), ]
Error in `[.default`(dm, complete.cases(dm), ) :
  incorrect number of dimensions
```

Is dm1 a vector or a data.frame? You can check that with class(dm1). If it is a vector, you can try: dm1_updated <- dm1[complete.cases(dm1)] (without the comma at the end)

This is what I got

```
> class(dm1)
[1] "factor"
> dm1_updated table(dm1_updated)
dm1_updated
     no yes
330  60  13
```

Your data seems to be a one-dimensional vector and not a two-dimensional table/data.frame. Furthermore, it seems like your missing values are stored as "" instead of NA. Try the following:

```r
dm1_updated <- dm1
dm1_updated[dm1_updated == ""] <- NA                      # Recode empty strings as NA
dm1_updated <- dm1_updated[complete.cases(dm1_updated)]   # Remove the NA values
sum(is.na(dm1_updated))                                   # The result should be 0
```
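The situation in the comment above can be reproduced with a small made-up factor. The sketch below assumes that the missing entries are stored as empty strings, as suspected in the reply. The values are hypothetical and only mimic the structure of the commenter's data. Note that a factor keeps its "" level even after the corresponding entries are removed, so droplevels is used to make it disappear from the table output:

```r
dm1 <- factor(c("no", "yes", "", "no", "", "yes", "no"))  # Hypothetical factor with "" entries

sum(is.na(dm1))                                   # "" is a regular level, not NA
# [1] 0

dm1_updated <- dm1
dm1_updated[dm1_updated == ""] <- NA              # Recode empty strings as NA
dm1_updated <- dm1_updated[complete.cases(dm1_updated)]  # Remove the NA values
dm1_updated <- droplevels(dm1_updated)            # Drop the now unused "" level

levels(dm1_updated)
# [1] "no"  "yes"
```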

Jo

Sorry forgot to mention. dm is a column vector in a data frame. Any change in your advise?

You can either remove all rows of your data frame in which dm1 contains NAs:

```r
your_data_updated <- your_data[!is.na(your_data$dm1), ]
```

Or you can impute the missing values of dm1. Find more instructions on missing data imputation here: https://statistical-programming.com/missing-data-imputation-statistics/