R is.na Function Example (remove, replace, count, if else, is not NA)

Well, I guess it goes without saying that NA values decrease the quality of our data.

Fortunately, the R programming language provides us with a function that helps us to deal with such missing data: the is.na function.

In the following article, I’m going to explain what the function does and how the function can be applied in practice.

Let’s dive in…

The is.na Function in R (Basics)

Before we can start, let’s create some example data in R (or RStudio).

```
set.seed(951)                                # Set seed
N <- 1000                                    # Sample size

x_num <- round(rnorm(N, 0, 5))               # Numeric variable
x_fac <- as.factor(round(runif(N, 0, 3)))    # Factor variable
x_cha <- sample(letters, N, replace = TRUE)  # Character variable

data <- data.frame(x_num, x_fac, x_cha)      # Create data.frame
```

Our data consists of three columns, each of them with a different class: numeric, factor, and character. This is what the first six rows of our data look like:

Table 1: Example Data for the is.na R Function (First 6 Rows)
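Note that the snippet above, as shown, does not itself introduce any missing values. The following is a minimal sketch of how NAs might be injected, assuming independent random draws with the shares mentioned later in this article (roughly 20%, 30%, and 5%). The rbinom-based approach and the name data_na are assumptions for illustration, so the exact counts and positions will not necessarily match the outputs printed in this article.

```r
set.seed(951)                                 # Seed reused from above
N <- 1000                                     # Sample size

x_num <- round(rnorm(N, 0, 5))                # Numeric variable
x_fac <- as.factor(round(runif(N, 0, 3)))     # Factor variable
x_cha <- sample(letters, N, replace = TRUE)   # Character variable

# Assumption: each value is set to NA independently with a fixed probability
x_num[rbinom(N, 1, 0.20) == 1] <- NA          # About 20% missings
x_fac[rbinom(N, 1, 0.30) == 1] <- NA          # About 30% missings
x_cha[rbinom(N, 1, 0.05) == 1] <- NA          # About 5% missings

data_na <- data.frame(x_num, x_fac, x_cha)    # Hypothetical name for this sketch
head(data_na)                                 # Inspect the first six rows
```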

Let’s apply the is.na function to our whole data set:

```
is.na(data)
#      x_num x_fac x_cha
# [1,] FALSE FALSE FALSE
# [2,] FALSE FALSE  TRUE
# [3,] FALSE FALSE FALSE
# [4,]  TRUE  TRUE FALSE
# [5,]  TRUE  TRUE FALSE
# [6,] FALSE FALSE FALSE
# ...
```

The function produces a matrix of logical values (i.e. TRUE or FALSE), whereby TRUE indicates a missing value. Compare the output with the data table above: the TRUE values are at the same positions as the NA elements.

An important feature of is.na is that the function can be reversed by simply putting a ! (exclamation mark) in front. In this case, TRUE indicates a value that is not NA in R:

```
!is.na(data)
#      x_num x_fac x_cha
# [1,]  TRUE  TRUE  TRUE
# [2,]  TRUE  TRUE FALSE
# [3,]  TRUE  TRUE  TRUE
# [4,] FALSE FALSE  TRUE
# [5,] FALSE FALSE  TRUE
# [6,]  TRUE  TRUE  TRUE
# ...
```

Exactly the opposite output as before!

We are also able to check whether there is or is not an NA value in a column or vector:

```
is.na(data$x_num)   # Works for numeric ...
is.na(data$x_fac)   # ... factor ...
is.na(data$x_cha)   # ... and character

!is.na(data$x_num)  # The exclamation mark still works
!is.na(data$x_fac)
!is.na(data$x_cha)
```

As you have seen, is.na provides us with logical values that show us whether a value is NA or not. We can apply the function to a whole data frame or to a single column (no matter which class the vector has).

That’s nice, but the real power of is.na becomes visible in combination with other functions — And that’s exactly what I’m going to show you now.

On a side note:
R provides several other is.xxx functions that are very similar to is.na (e.g. is.nan, is.null, or is.finite). Stay tuned — All you learn here can be applied to many different programming scenarios!
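To make the difference between these functions concrete, here is a small sketch on a hypothetical vector x (not part of the example data above):

```r
x <- c(1, NA, NaN, Inf)          # Hypothetical vector for illustration

res_na     <- is.na(x)           # TRUE for NA and for NaN
res_nan    <- is.nan(x)          # TRUE for NaN only
res_finite <- is.finite(x)       # TRUE for ordinary numbers only

res_na
res_nan
res_finite

is.null(NULL)                    # TRUE, NULL is an empty object
is.null(NA)                      # FALSE, NA is a value (a missing one)
```

Note that is.na also flags NaN, whereas is.nan does not flag NA.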

is.na in Combination with Other R Functions

In the following, I have prepared examples for the most important R functions that can be combined with is.na.

Remove NAs of Vector or Column

In a vector or column, NA values can be removed as follows:

`is.na_remove <- data$x_num[!is.na(data$x_num)]`

Note: Our new vector is.na_remove is shorter than the original column data$x_num, since we use a filter that deletes all missing values.

If you want to drop rows with missing values of a data frame (i.e. of multiple columns), the complete.cases function is preferable.
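A minimal sketch of the complete.cases approach, using a small hypothetical data frame df instead of the example data:

```r
df <- data.frame(a = c(1, NA, 3, 4),          # Hypothetical data frame
                 b = c("x", "y", NA, "z"))

keep <- complete.cases(df)                    # TRUE for rows without any NA
keep

df_complete <- df[keep, ]                     # Keep only the complete rows
df_complete

nrow(na.omit(df))                             # na.omit drops the same rows
```

Rows 2 and 3 each contain one NA, so only rows 1 and 4 remain.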

Replace NAs with Other Values

Based on is.na, it is possible to replace NAs with other values such as zero

```
is.na_replace_0 <- data$x_num                   # Duplicate first column
is.na_replace_0[is.na(is.na_replace_0)] <- 0    # Replace by 0
```

…or the mean.

```
is.na_replace_mean <- data$x_num                                 # Duplicate first column
x_num_mean <- mean(is.na_replace_mean, na.rm = TRUE)             # Calculate mean
is.na_replace_mean[is.na(is.na_replace_mean)] <- x_num_mean      # Replace by mean
```

In case of characters or factors, it is also possible in R to set NA to blank:

```
is.na_blank_cha <- data$x_cha                        # Duplicate character column
is.na_blank_cha[is.na(is.na_blank_cha)] <- ""        # Class character to blank

is.na_blank_fac <- data$x_fac                        # Duplicate factor column
is.na_blank_fac <- as.character(is.na_blank_fac)     # Convert temporarily to character
is.na_blank_fac[is.na(is.na_blank_fac)] <- ""        # Class character to blank
is.na_blank_fac <- as.factor(is.na_blank_fac)        # Recode back to factor
```

Count NAs via sum & colSums

Combined with the R function sum, we can count the number of NAs in our columns. According to our data generation, it should be approximately 20% in x_num, 30% in x_fac, and 5% in x_cha.

```
sum(is.na(data$x_num))  # 213 missings in the first column
sum(is.na(data$x_fac))  # 322 missings in the second column
sum(is.na(data$x_cha))  # 47 missings in the third column
```

If we want to count NAs in multiple columns at the same time, we can use the function colSums:

```
colSums(is.na(data))
# x_num x_fac x_cha
#   213   322    47
```
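If you are interested in the share of NAs rather than the absolute count, colMeans can be used in the same way, because TRUE counts as 1 and FALSE as 0. A small sketch with a hypothetical data frame df:

```r
df <- data.frame(a = c(1, NA, 3, NA),         # Hypothetical data frame
                 b = c("x", "y", NA, "z"))

na_share <- colMeans(is.na(df))               # Proportion of NAs per column
na_share                                      # a: 0.5, b: 0.25
```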

Detect if there are any NAs

We can also test whether there is at least one missing value in a column of our data. As we already know, our columns do contain NAs, so the result is TRUE.

```
any(is.na(data$x_num))
# [1] TRUE
```
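Base R also offers anyNA as a shortcut for any(is.na(...)). The sketch below applies it to a single column and, via sapply, to every column of a small hypothetical data frame df:

```r
df <- data.frame(a = c(1, 2, 3),              # Hypothetical data frame
                 b = c("x", NA, "z"))

anyNA(df$b)                                   # Same result as any(is.na(df$b))

has_na <- sapply(df, anyNA)                   # One logical value per column
has_na                                        # a: FALSE, b: TRUE
```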

Locate NAs via which

In combination with the which function, is.na can be used to identify the positions of NAs:

```
which(is.na(data$x_num))
# [1] 4 5 14 17 22 23...
```

Our first column has missing values at the positions 4, 5, 14, 17, 22, 23 and so forth.
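The same idea extends to a whole data frame: with the argument arr.ind = TRUE, which returns the row and column index of every NA. A minimal sketch on a hypothetical data frame df:

```r
df <- data.frame(a = c(1, NA, 3),             # Hypothetical data frame
                 b = c(NA, "y", NA))

pos <- which(is.na(df), arr.ind = TRUE)       # Matrix with columns "row" and "col"
pos
```

Each row of pos describes one missing cell. Here the NAs are in row 2 of column 1 and in rows 1 and 3 of column 2.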

if & ifelse

Missing values have to be considered in our programming routines, e.g. within the if statement or within for loops.

In the following example, I’m printing “Damn, it’s NA” to the RStudio console whenever a missing value occurs; and “Wow, that’s awesome” in case of an observed value.

```
for(i in seq_along(data$x_num)) {     # Loop over all positions of the column
  if(is.na(data$x_num[i])) {          # Missing value
    print("Damn, it's NA")
  } else {                            # Observed value
    print("Wow, that's awesome")
  }
}
# [1] "Wow, that's awesome"
# [1] "Wow, that's awesome"
# [1] "Wow, that's awesome"
# [1] "Damn, it's NA"
# [1] "Damn, it's NA"
# [1] "Wow, that's awesome"
# ...
```

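Note: Within the if statement we use is.na instead of the equal-to operator ==, the approach we would usually use for observed values (e.g. if(x[i] == 5)). A comparison with NA via == returns NA rather than TRUE or FALSE, so it cannot be used as an if condition.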
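A short sketch of why == does not work here, using a hypothetical vector x. The tryCatch wrapper is only there to capture the error that if raises for an NA condition:

```r
x <- c(5, NA)                                 # Hypothetical vector

cmp <- x[2] == 5                              # Comparison with NA yields NA
cmp

is.na(x[2])                                   # is.na yields a proper TRUE

# An NA condition makes if() stop with an error, which we catch here
res <- tryCatch(
  if (x[2] == 5) "five" else "other",
  error = function(e) "error"
)
res                                           # "error"
```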

Even easier to apply: the ifelse function.

```
ifelse(is.na(data$x_num), "Damn, it's NA", "Wow, that's awesome")
# [1] "Wow, that's awesome" "Wow, that's awesome" "Wow, that's awesome" "Damn, it's NA"
# [5] "Damn, it's NA"       "Wow, that's awesome" ...
```

Video Examples for the Handling of NAs in R

Do you want to learn even more ways to deal with NAs in R? Then definitely check out the following video on my YouTube channel.

In the video, I provide further examples for is.na. I also speak about other functions for the handling of missing data in R data frames.

Now It’s Your Turn!

I’ve shown you the most important ways to use the is.na R function.

However, there are hundreds of different possibilities to apply is.na in a useful way.

Do you know any other helpful applications? Or do you have a question about the usage of is.na in a specific scenario?

Don’t hesitate to let me know in the comments!

Appendix

The header graphic of this page illustrates NA values in our data. The graphic can be produced with the following R code:

```
N <- 2000                                   # Sample size

x <- runif(N)                               # Uniformly distributed variables
y <- runif(N)

x_NA <- runif(50)                           # Random NAs
y_NA <- runif(50)

par(bg = "#1b98e0")                         # Set background color
par(mar = c(0, 0, 0, 0))                    # Remove space around plot

pch_numb <- as.character(                   # Specify plotted numbers
  round(runif(N, 0, 9)))

plot(x, y,                                  # Plot
     cex = 2,
     pch = pch_numb,
     col = "#353436")

text(x_NA, y_NA, cex = 2,                   # Add NA values to plot
     "NA", col = "red")

points(x[1:500], y[1:500],                  # Overlay NA values with numbers
       cex = 2,
       pch = pch_numb,
       col = "#353436")
```