8 R Code Style

Adapted by UCD-SeRG team from original by Kunal Mishra, Jade Benjamin-Chung, and Stephanie Djajadi

Follow these code style guidelines for all R code:

8.1 General Principles

Follow tidyverse style guide: https://style.tidyverse.org
Use native pipe: |> not %>% (available in R >= 4.1.0)
Naming: Use snake_case for functions and variables; acronyms may be uppercase (e.g., prep_IDs_data)
Write tidy code: Keep code clean, readable, and well-organized
Avoid redundant logical comparisons: Use logical variables directly in conditional statements (e.g., if (x) instead of if (x == TRUE) or if (x == 1))
Use pipes to emphasize primary inputs: When writing functions and code, use the pipe operator to clearly show transformations on a primary object. The primary input should flow as the first argument to each function in the chain. Design functions so the most important argument (usually data) comes first, enabling natural pipeline composition. See the tidyverse design principles for more details.
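Here is a minimal sketch of the data-first principle. The helper add_bmi() and the columns weight_kg and height_m are hypothetical. Because the data frame is the first argument, the helper slots into a native-pipe chain alongside {dplyr} verbs, and the primary object flows visibly from step to step:

```r
# Hypothetical helper: `data` is the first argument, so it composes with |>
add_bmi <- function(data) {
  dplyr::mutate(data, bmi = weight_kg / height_m^2)
}

# Made-up example data
patients <- tibble::tibble(
  id = 1:3,
  weight_kg = c(60, 72, 90),
  height_m = c(1.6, 1.8, 1.5)
)

# The primary object (patients) is the first argument at every step
high_bmi <- patients |>
  add_bmi() |>
  dplyr::filter(bmi >= 25) |>
  dplyr::select(id, bmi)
```

If add_bmi() took the data as its second argument instead, every call in the chain would need a placeholder or a wrapper, which hides the transformation the pipe is meant to emphasize.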

8.2 Function Structure and Documentation

Every function should follow this pattern:

#' Short Title (One Line)
#'
#' Longer description providing details about what the function does,
#' when to use it, and important considerations.
#'
#' @param param1 Description of first parameter, including type and constraints
#' @param param2 Description of second parameter
#'
#' @returns Description of return value, including type and structure
#'
#' @examples
#' # Example usage
#' result <- my_function(param1 = "value", param2 = 10)
#'
#' @export
my_function <- function(param1, param2) {
  # Implementation
}

See also Section 6.12 for general code documentation practices.
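As an illustration of the template filled in, the sketch below documents a hypothetical summarize_missingness() function. The function name and its behavior are assumptions for the example, not part of any existing package:

```r
#' Summarize missingness by column
#'
#' Computes the proportion of missing values in each column of a data
#' frame, which helps decide which variables need imputation or exclusion.
#'
#' @param data A data frame or tibble.
#'
#' @returns A tibble with one row per column of `data` and two columns:
#'   `variable` (character) and `prop_missing` (numeric, between 0 and 1).
#'
#' @examples
#' summarize_missingness(tibble::tibble(x = c(1, NA), y = c("a", "b")))
#'
#' @export
summarize_missingness <- function(data) {
  tibble::tibble(
    variable = names(data),
    # mean() of a logical vector gives the share of TRUE (missing) values
    prop_missing = unname(vapply(data, function(col) mean(is.na(col)),
                                 numeric(1)))
  )
}
```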

8.3 Comments

Use comments to explain why, not what:

# Good: Explains reasoning
# Use log scale because distribution is highly skewed
ggplot(data, aes(x = log10(income))) + geom_histogram()

# Bad: States the obvious
# Create a histogram
ggplot(data, aes(x = income)) + geom_histogram()

File headers (for scripts in data-raw/ or inst/analyses/):

################################################################################
# @Organization - Example Organization
# @Project - Example Project
# @Description - This file is responsible for [...]
################################################################################

File Structure - Just as data “flows” through your project, it should flow naturally through a script. Very generally, you want to:

source your config => load all your data => do all your analysis/computation => save your data.

Group each of these sections together and label it with a comment. See this file for a good example of a script that follows this “flow” and keeps code that does different things in functionally separate sections.
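A minimal skeleton of this config => load => analyze => save flow might look like the sketch below. It assumes datasets::mtcars as a stand-in for raw data and writes to a temporary directory so it runs anywhere; the config file mentioned in the comments is hypothetical. RStudio treats comment lines ending in four or more dashes as foldable code sections, which makes the chunks easy to navigate:

```r
# Config -----------------------------------------------------------------
# In a real project you might source a shared config file here, e.g. the
# hypothetical source(here::here("R", "config.R")). This sketch writes to a
# temporary directory instead so it runs anywhere.
output_path <- file.path(tempdir(), "mpg_by_cyl.rds")

# Load data --------------------------------------------------------------
# datasets::mtcars stands in for your raw data
cars <- tibble::as_tibble(datasets::mtcars)

# Analysis ---------------------------------------------------------------
mpg_by_cyl <- cars |>
  dplyr::group_by(cyl) |>
  dplyr::summarize(mean_mpg = mean(mpg), n = dplyr::n())

# Save -------------------------------------------------------------------
readr::write_rds(mpg_by_cyl, output_path)
```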

Note

If your computer can't handle this workflow because of memory (RAM) or other computational limits, reordering your code to work around them won't help in the long run: the code will be fragile, less readable, and messy. In that case, look into high-performance computing (HPC) resources instead.

Single-Line Comments - Commenting your code is an important part of reproducibility and documents your code for the future. When things change or break, you'll be thankful for comments. There's no need to comment excessively, but a comment describing what a large or complex chunk of code does is always helpful. See this file for an example of well-commented code, and notice that comments always take this form:

# This is a comment -- first letter is capitalized and spaced away from the pound sign

Multi-Line Comments - Occasionally, multi-line comments are necessary. Manually insert line breaks to “hard-wrap” code and comments whenever lines would exceed 80 characters; {lintr}'s line-length check flags longer lines, comments included. Try to break lines at semantic boundaries, such as the ends of sentences or phrases. Long lines make diffs in pull requests harder to read and comment on.

In prose text chunks, Quarto treats a single line break as a space, so you can (and should) hard-wrap prose in .qmd files to keep lines under 80 characters without changing the rendered output.

You can configure RStudio to display an 80-character margin (Tools > Global Options > Code > Display > Show margin).

8.4 Line Breaks and Formatting

Blank Lines Before Lists

Always include a blank line before starting a bullet list or numbered list in markdown/Quarto documents. This ensures proper rendering and readability.

Correct:

Here are the requirements:

- First item
- Second item

Incorrect:

Here are the requirements:
- First item
- Second item

Here’s what happens if you don’t add the blank line:

Here are the requirements: - First item - Second item

Line Breaks in Code

For ggplot calls and dplyr pipelines, do not crowd single lines. Here are some nontrivial examples of “beautiful” pipelines, where beauty is defined by coherence:

# Example 1
school_names = list(
  OUSD_school_names = absentee_all |>
    filter(dist.n == 1) |>
    pull(school) |>
    unique() |>
    sort(),

  WCCSD_school_names = absentee_all |>
    filter(dist.n == 0) |>
    pull(school) |>
    unique() |>
    sort()
)

# Example 2
absentee_all = fread(file = raw_data_path) |>
  mutate(program = case_when(schoolyr %in% pre_program_schoolyrs ~ 0,
                             schoolyr %in% program_schoolyrs ~ 1)) |>
  mutate(period = case_when(schoolyr %in% pre_program_schoolyrs ~ 0,
                            schoolyr %in% LAIV_schoolyrs ~ 1,
                            schoolyr %in% IIV_schoolyrs ~ 2)) |>
  filter(schoolyr != "2017-18")

And of a complex ggplot call:

# Example 3
ggplot(data = data) +

  aes(x = .data[["year"]], y = .data[["rd"]], group = .data[[group]]) +

  geom_point(mapping = aes(col = .data[[group]], shape = .data[[group]]),
             position = position_dodge(width = 0.2),
             size = 2.5) +

  geom_errorbar(mapping = aes(ymin = .data[["lb"]], ymax = .data[["ub"]],
                              col = .data[[group]]),
                position = position_dodge(width = 0.2),
                width = 0.2) +

  geom_point(position = position_dodge(width = 0.2),
             size = 2.5) +

  geom_errorbar(mapping = aes(ymin = lb, ymax = ub),
                position = position_dodge(width = 0.2),
                width = 0.1) +

  scale_y_continuous(limits = limits,
                     breaks = breaks,
                     labels = breaks) +

  scale_color_manual(std_legend_title, values = cols, labels = legend_label) +
  scale_shape_manual(std_legend_title, values = shapes, labels = legend_label) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Program year") +
  ylab(yaxis_lab) +
  theme_complete_bw() +
  theme(strip.text.x = element_text(size = 14),
        axis.text.x = element_text(size = 12)) +
  ggtitle(title)

Imagine (or perhaps mournfully recall) the mess that can occur when you don’t strictly style a complicated ggplot call. Trying to fix bugs and ensure your code is working can be a nightmare. Now imagine trying to do it with the same code 6 months after you’ve written it. Invest the time now and reap the rewards as the code practically explains itself, line by line.

8.5 Markdown and Quarto Formatting

8.5.1 Writing about code in Quarto documents

When writing about code in prose sections of Quarto documents, use backticks to apply code style: for example, `dplyr::mutate()`. When talking about packages, wrap the package name in backticks and curly braces and hyperlink it to the package website. For example, [`{dplyr}`](https://dplyr.tidyverse.org) renders as a linked, code-styled {dplyr}.

Important: Do not use raw HTML (<a href="...">) in .qmd files. Always use Markdown link syntax, [link text](https://example.com), instead.

8.6 Messaging and User Communication

Use cli package functions for all user-facing messages in package functions:

# Good
cli::cli_inform("Analysis complete")
cli::cli_warn("Missing data detected")
cli::cli_abort("Invalid input: {x}")

# Bad - don't use these in package code
message("Analysis complete")
warning("Missing data detected")
stop("Invalid input")
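A slightly fuller sketch of {cli} messaging is below. The functions check_positive() and report_files() are hypothetical. They use cli's inline markup ({.arg}, {.val}), named bullets ("x" for a problem), and pluralization (file{?s}), all of which are interpolated from the calling function's environment:

```r
# Hypothetical input check for a package function
check_positive <- function(n) {
  if (!is.numeric(n) || length(n) != 1 || is.na(n) || n <= 0) {
    cli::cli_abort(c(
      "{.arg n} must be a single positive number.",
      "x" = "You supplied {.val {n}}."
    ))
  }
  invisible(n)
}

# Hypothetical wrapper showing an informative message with pluralization
report_files <- function(files) {
  cli::cli_inform("Processing {length(files)} file{?s}.")
  invisible(files)
}
```

Because cli_abort() signals an {rlang} error, callers and tests can catch it like any other error condition.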

8.7 Package Code Practices

No library() in package code: Use :: notation or declare in DESCRIPTION Imports
Document all exports: Use roxygen2 (@title, @description, @param, @returns, @examples)
Avoid code duplication: Extract repeated logic into helper functions
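For example, a package function might look like the sketch below (count_by_group() is hypothetical). The dependency is declared once in DESCRIPTION, here via usethis::use_package(), which adds the package to Imports. The function body then calls dplyr::count() with :: instead of relying on library(dplyr):

```r
# Register the dependency once, from the console (not inside R/ files):
# usethis::use_package("dplyr")   # adds dplyr to Imports in DESCRIPTION

#' Count rows per group
#'
#' @param data A data frame.
#' @param group_col Name of the grouping column, as a string.
#' @returns A data frame with one row per group and a count column `n`.
#' @importFrom rlang .data
#' @export
count_by_group <- function(data, group_col) {
  # Call dplyr via :: instead of attaching it with library()
  dplyr::count(data, .data[[group_col]])
}
```

The @importFrom rlang .data tag lets the package use the .data pronoun without R CMD check notes about undefined global variables.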

8.8 Tidyverse Replacements

Prefer modern tidyverse (and related) alternatives to these base R functions:

# Data structures
tibble::tibble()           # instead of data.frame()
tibble::tribble()          # instead of manual data.frame creation

# I/O
readr::read_csv()          # instead of read.csv()
readr::write_csv()         # instead of write.csv()
readr::read_rds()          # instead of readRDS()
readr::write_rds()         # instead of saveRDS()

# Data manipulation
dplyr::bind_rows()         # instead of rbind()
dplyr::bind_cols()         # instead of cbind()

# String operations
stringr::str_which()       # instead of grep()
stringr::str_replace()     # instead of sub()
stringr::str_replace_all() # instead of gsub()

# Date/time operations
lubridate::NA_Date_        # instead of as.Date(NA)

# Session info
sessioninfo::session_info() # instead of sessionInfo()
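Two of these replacements behave differently enough from base R to be worth a quick check. The strings and tibbles below are made-up examples. str_replace() changes only the first match, like sub(), while str_replace_all() matches gsub(). bind_rows() also matches columns by name and fills gaps with NA, whereas rbind() on data frames errors when the columns differ:

```r
x <- "2017-18_school-year"

# str_replace() changes only the first match (like sub())
stringr::str_replace(x, "-", "_")      # "2017_18_school-year"

# str_replace_all() changes every match (like gsub())
stringr::str_replace_all(x, "-", "_")  # "2017_18_school_year"

# bind_rows() matches columns by name and fills missing ones with NA
combined <- dplyr::bind_rows(
  tibble::tibble(school = "A", year = "2017-18"),
  tibble::tibble(school = "B", year = "2018-19", n_absent = 12)
)
```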

8.9 The here Package

The here package helps manage file paths in projects by automatically finding the project root and building paths relative to it:

library(here)

# Automatically finds project root and builds paths
data <- readr::read_csv(here("data-raw", "survey.csv"))
readr::write_rds(results, here("inst", "analyses", "results.rds"))

This solves the problem of collaborators keeping the project in different locations. For example, one person might have the project at /home/oski/Some-R-Project while another has it at /home/bear/R-Code/Some-R-Project; because here() builds every path from the project root, the same code works for both, regardless of where each collaborator cloned the repository. For more details, see the here package vignette.

See also Section 6.15 for a detailed explanation of the here package.

8.10 Object Naming

Use descriptive names that are both expressive and explicit. Being verbose is useful and easy in the age of autocompletion:

# Good
vaccination_coverage_2017_18
absentee_flu_residuals

# Less good
vaxcov_1718
flu_res

Prefer nouns for objects and verbs for functions, because functions perform actions while objects hold the results:

# Good
clean_data <- prep_study_data(raw_data)  # verb for function, noun for object

# Less clear
data <- process(input)

Use snake_case for all variable and function names. Base R allows . in variable and function names (such as read.csv()), but this goes against naming conventions in modern R and in many other languages (Python's style guide also uses snake_case for functions and variables, for example), and modern packages and functions typically use snake_case (e.g., readr::read_csv()). As a very general rule of thumb, if a package you're using doesn't use snake_case, there may be an updated version or a more modern package that does, often bringing performance improvements and bug fixes along with it.

Note

You may also see camelCase in R code you come across. This is acceptable but not ideal; try to stay consistent with snake_case across all your code.

Note

There's nothing inherently wrong with using . in variable names; it simply goes against style conventions that are becoming standard in data science, so it's worth dropping the habit now.

For more help, check out Be Expressive: How to Give Your Variables Better Names

8.11 Automated Tools for Style and Project Workflow

8.11.1 Styling

8.11.1.1 RStudio shortcuts

Code Autoformatting - RStudio's built-in Reformat Code command (Cmd-Shift-A on Mac, Ctrl-Shift-A on Windows/Linux) autoformats highlighted code to follow many of the best practices listed here. It generally makes code more readable and fixes many small issues you may not feel like fixing yourself. Try it as a “first pass” on some of your code that doesn't yet follow these practices!
Assignment Aligner - This R package (an RStudio add-in) aligns the assignment operators in large blocks of assignment code, making them much cleaner and easier to read. Follow the linked instructions and bind it to a keyboard shortcut of your choosing (we suggest CMD-Shift-Z, but check that it doesn't override a shortcut you rely on). Here is an example of how aligning assignments can improve readability:

# Before
OUSD_not_found_aliases = list(
  "Brookfield Village Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Brookfield"),
  "Carl Munck Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Munck"),
  "Community United Elementary School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Community United"),
  "East Oakland PRIDE Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "East Oakland Pride"),
  "EnCompass Academy" = str_subset(string = OUSD_school_shapes$schnam, pattern = "EnCompass"),
  "Global Family School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Global"),
  "International Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "International Community"),
  "Madison Park Lower Campus" = "Madison Park Academy TK-5",
  "Manzanita Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Manzanita Community"),
  "Martin Luther King Jr Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "King"),
  "PLACE @ Prescott" = "Preparatory Literary Academy of Cultural Excellence",
  "RISE Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Rise Community")
)

# After
OUSD_not_found_aliases = list(
  "Brookfield Village Elementary"      = str_subset(string = OUSD_school_shapes$schnam, pattern = "Brookfield"),
  "Carl Munck Elementary"              = str_subset(string = OUSD_school_shapes$schnam, pattern = "Munck"),
  "Community United Elementary School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Community United"),
  "East Oakland PRIDE Elementary"      = str_subset(string = OUSD_school_shapes$schnam, pattern = "East Oakland Pride"),
  "EnCompass Academy"                  = str_subset(string = OUSD_school_shapes$schnam, pattern = "EnCompass"),
  "Global Family School"               = str_subset(string = OUSD_school_shapes$schnam, pattern = "Global"),
  "International Community School"     = str_subset(string = OUSD_school_shapes$schnam, pattern = "International Community"),
  "Madison Park Lower Campus"          = "Madison Park Academy TK-5",
  "Manzanita Community School"         = str_subset(string = OUSD_school_shapes$schnam, pattern = "Manzanita Community"),
  "Martin Luther King Jr Elementary"   = str_subset(string = OUSD_school_shapes$schnam, pattern = "King"),
  "PLACE @ Prescott"                   = "Preparatory Literary Academy of Cultural Excellence",
  "RISE Community School"              = str_subset(string = OUSD_school_shapes$schnam, pattern = "Rise Community")
)

8.11.1.2 `{styler}`

{styler} is an R package that formats code according to the tidyverse style guide, and it works well as a first pass on entire projects that need refactoring. Its most useful function is style_dir(), which styles all files within a given directory. See the function's documentation and the vignette linked above for more details.

Note

The default tidyverse style applied by {styler} differs subtly from some of the practices we've advocated in this document. Most notably, we differ on the assignment operator (<- vs =) and on the number of spaces around “tokens” (e.g., Assignment Aligner adds spaces before = signs to align them). For this reason, we recommend running style_dir(path = ..., scope = "line_breaks", strict = FALSE), which restyles spacing, indentation, and line breaks but leaves tokens such as = untouched. You can customize {styler} even further if you're really hardcore.

Note

As mentioned in the package vignette linked above, {styler} modifies files in place: it overwrites your existing code with the restyled version. This makes it a good fit for projects under version control, but if you don't have backups or an easy way to revert to the original code, we wouldn't recommend it.

styler Package

For automated styling of entire projects:

# Install styler
install.packages("styler")

# Style all files in R/ directory
styler::style_dir("R/")

# Style entire package
styler::style_pkg()

# Note: styler modifies files in-place
# Always use with version control so you can review changes
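To preview what {styler} would change without touching any files, styler::style_text() restyles a string and returns the result. The sketch below uses a made-up one-line input to contrast the default tidyverse style with the scope recommended in the note above:

```r
# style_text() returns restyled code without modifying any file
messy <- "x=1"

# Default tidyverse style (scope = "tokens") rewrites `=` assignment as `<-`
default_style <- styler::style_text(messy)   # x <- 1

# The recommended scope stops short of token changes, so `=` is kept and
# only spacing is adjusted
line_break_style <- styler::style_text(messy, scope = "line_breaks",
                                       strict = FALSE)
```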

8.11.1.3 `{lintr}`

Linters are programming tools that check code for adherence to a given style, syntax errors, and possible semantic issues. The standard linter for R is provided by the {lintr} package. It helps keep files consistent across different authors and even different organizations. For example, it notifies you about unused variables, global variables with no visible binding, missing or superfluous whitespace, and improper use of parentheses or brackets. A full list of its checks can be found in this link; most of its default rules are based on the tidyverse style guide.

Note

You can customize your settings to set defaults or to exclude files. More details can be found here.

Note

{lintr} goes hand in hand with {styler}: {styler} can automatically fix many of the style problems that {lintr} flags (though not semantic issues such as unused variables).

lintr package

For checking code style without modifying files:

# Install lintr
install.packages("lintr")

# Lint the entire package
lintr::lint_package()

# Lint a specific file
lintr::lint("R/my_function.R")

The linter checks for:

Unused variables
Improper whitespace
Line length issues
Style guide violations

You can customize linting rules by creating a .lintr or lintr.R file in your project root.
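For quick experiments, lintr::lint() also accepts a code string through its text argument, and individual linters can be run on their own. The sketch below uses made-up inputs to show the assignment and line-length checks. It also shows one way a project that prefers = (see the {styler} note above) could start from the default linters with the assignment linter switched off:

```r
# Lint a code string instead of a file
assignment_lints <- lintr::lint(text = "x = 1",
                                linters = lintr::assignment_linter())

# A 100-character line trips an 80-character line-length check
long_line <- strrep("x", 100)
length_lints <- lintr::lint(text = long_line,
                            linters = lintr::line_length_linter(80))

# Start from the default linters but drop the assignment linter
my_linters <- lintr::linters_with_defaults(assignment_linter = NULL)
custom_lints <- lintr::lint(text = "x = 1", linters = my_linters)
```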

8.12 Additional Resources

Tidyverse style guide (Wickham 2023): Detailed coding style conventions for writing clear, consistent R code. Covers naming, syntax, pipes, functions, and more.
