Copilot Instructions File
This appendix contains the complete .github/copilot-instructions.md file used in this repository to guide GitHub Copilot coding agents.
.github/copilot-instructions.md
Copilot Instructions for UCD-SeRG Lab Manual
This file contains guidelines for GitHub Copilot and other AI assistants when working with the lab manual.
Markdown and Quarto Formatting
Talking about code
When talking about code in prose sections, use backticks to apply code formatting: for example, `dplyr::mutate()`.
When talking about packages in prose, use backticks and curly-braces with a hyperlink to the package website. For example: [`{dplyr}`](https://dplyr.tidyverse.org/)
Do not use raw HTML (<a href="...">) in .qmd files. Always use Quarto/markdown link syntax instead.
Common package URLs:
Blank Lines Before Lists
ALWAYS include a blank line before bullet lists and numbered lists in markdown and Quarto (.qmd) files.
Correct:

Here are the key points:

- First item
- Second item
- Third item

Incorrect:

Here are the key points:
- First item
- Second item
- Third item

This applies to:

- Bullet lists (starting with `-` or `*`)
- Numbered lists (starting with `1.`, `2.`, etc.)
- Lists in all .qmd files throughout the repository
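
The rule above can also be checked mechanically. The following is a minimal sketch in base R, not part of the repository's tooling: `find_missing_blank_lines()` is a hypothetical helper that reports lines where a list starts directly after a non-blank line. It assumes list items begin with `-`, `*`, `+`, or a number followed by a period, and it skips fenced code blocks.

```r
# Hypothetical helper (an illustration, not repository code):
# return the line numbers where a list starts without a preceding blank line.
find_missing_blank_lines <- function(lines) {
  is_item <- grepl("^\\s*([-*+]|[0-9]+\\.)\\s+", lines)
  is_blank <- !nzchar(trimws(lines))
  # Build the fence pattern without typing three literal backticks.
  is_fence <- grepl(paste0("^\\s*", strrep("`", 3)), lines)

  flagged <- integer(0)
  in_fence <- FALSE
  in_list <- FALSE
  prev_blank <- TRUE # treat the start of the file as a blank line

  for (i in seq_along(lines)) {
    if (is_fence[i]) {
      # A fence marker toggles code-block state and ends any open list.
      in_fence <- !in_fence
      in_list <- FALSE
      prev_blank <- FALSE
      next
    }
    if (in_fence) {
      next
    }
    if (is_blank[i]) {
      in_list <- FALSE
    } else if (is_item[i]) {
      if (!in_list && !prev_blank) {
        flagged <- c(flagged, i)
      }
      in_list <- TRUE
    }
    prev_blank <- is_blank[i]
  }
  flagged
}

example <- c(
  "Here are the key points:",
  "- First item",
  "- Second item"
)
find_missing_blank_lines(example)
```

For this example the helper flags line 2, the first list item. Wrapped continuation lines inside a list item are treated as part of the list, so they do not trigger a report.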
Line Breaks in Plain Text
ALWAYS line-break at the ends of sentences and long phrases in plain-text paragraphs in .qmd files to avoid long lines.
Correct:

When talking about code in prose sections,
use backticks to apply code formatting.
This helps maintain readability in source files
and makes diffs easier to review.

Incorrect:

When talking about code in prose sections, use backticks to apply code formatting. This helps maintain readability in source files and makes diffs easier to review.

Benefits:
- Improves readability of source .qmd files
- Makes git diffs clearer and easier to review
- Helps identify specific changes in version control
- Prevents horizontal scrolling when editing
- Follows semantic line breaks best practice
Guidelines:
- Break after complete sentences (at periods)
- Break after long phrases or clauses (at commas or conjunctions)
- Break after approximately 60-80 characters when appropriate
- Keep related short phrases together on one line
- Don’t break in the middle of inline code, links, or formatting
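
As a rough aid for applying these guidelines, the sketch below lists prose lines that exceed a width limit. `find_long_prose_lines()` and its `max_width` default of 80 are assumptions made for illustration (the guideline itself only says "approximately 60-80 characters"). Lines inside fenced code blocks are ignored, but long table rows or URLs would still be reported and need human judgment.

```r
# Hypothetical helper: report prose lines longer than `max_width` characters.
# Fence markers and the lines between them are skipped.
find_long_prose_lines <- function(lines, max_width = 80) {
  fence_pattern <- paste0("^\\s*", strrep("`", 3))
  is_fence <- grepl(fence_pattern, lines)
  # An odd running count of fence markers means the line is inside a block.
  in_code <- (cumsum(is_fence) %% 2 == 1) | is_fence
  which(!in_code & nchar(lines) > max_width)
}

example <- c(
  "When talking about code in prose sections,",
  "use backticks to apply code formatting.",
  "When talking about code in prose sections, use backticks to apply code formatting. This helps."
)
find_long_prose_lines(example)
```

Only the third line of the example is reported, because it packs two sentences into one line.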
Why This Matters
- Ensures consistent markdown rendering across different platforms
- Improves readability in both source and rendered forms
- Prevents rendering issues in Quarto books
- Follows markdown best practices
Cross-References for Figures and Tables
ALWAYS use Quarto’s cross-reference system for figures, tables, and other captioned content. See Quarto Cross-References documentation for complete details.
Required label prefixes:
- Figures: `#fig-` (e.g., `#fig-data-masking`, `#fig-workflow-diagram`)
- Tables: `#tbl-` (e.g., `#tbl-git-commands`, `#tbl-summary-stats`)
- Equations: `#eq-` (e.g., `#eq-regression-model`)
- Sections: `#sec-` (e.g., `#sec-introduction`) - already in use throughout manual
- Theorems: `#thm-` (e.g., `#thm-central-limit`)
- Lemmas: `#lem-` (e.g., `#lem-auxiliary-result`)
- Corollaries: `#cor-` (e.g., `#cor-special-case`)
- Propositions: `#prp-` (e.g., `#prp-main-result`)
- Examples: `#exm-` (e.g., `#exm-simple-case`)
- Exercises: `#exr-` (e.g., `#exr-practice-problem`)
For figures (images):
![Caption text](path/to/image.png){#fig-label}

Important: Store images locally in the repository
DO NOT link to external image URLs (especially https://github.com/user-attachments/assets/). Always save images locally in the assets/images/ directory and reference them using relative paths.
External image links can break over time, are not included in repository archives, and may fail to render in PDF or other output formats.
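
One way to catch external image links before review is a small scan of the source lines. The sketch below is an assumption-laden illustration: `find_external_images()` is a hypothetical helper, and it only recognizes markdown image syntax whose target starts with `http://` or `https://`.

```r
# Hypothetical helper: return line numbers of markdown images
# that point at an external URL instead of a local file.
find_external_images <- function(lines) {
  pattern <- "!\\[[^\\]]*\\]\\(\\s*https?://"
  grep(pattern, lines, perl = TRUE)
}

example <- c(
  "![Workflow diagram](assets/images/workflow-diagram.png){#fig-workflow-diagram}",
  "![Workflow diagram](https://github.com/user-attachments/assets/...){#fig-workflow-diagram}"
)
find_external_images(example)
```

The second line is reported because its target is a URL; the first uses a relative path under `assets/images/` and passes.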
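Correct:
![Caption text](assets/images/figure-name.png){#fig-label}
Incorrect:
![Caption text](https://github.com/user-attachments/assets/...){#fig-label}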
For tables (markdown tables):
| Column 1 | Column 2 |
|----------|----------|
| Data | Data |

: Caption text {#tbl-label}

For code-generated figures:
```{r}
#| label: fig-plot-name
#| fig-cap: "Caption text"
# R code to generate plot
```

For code-generated tables:
```{r}
#| label: tbl-table-name
#| tbl-cap: "Caption text"
# R code to generate table
```

Referencing in text:

- Figures: `@fig-label` produces “Figure X”
- Tables: `@tbl-label` produces “Table X”
- Equations: `@eq-label` produces “Equation X”
- Sections: `@sec-label` produces “Section X”
Important: Always use cross-references for sections
When referring to other sections within the manual, always use the Quarto cross-reference system (@sec-label) instead of plain text references like “the section above” or “see the X section”.
Correct:
See @sec-r-ci for setting up GitHub Actions workflows.
See @sec-ai-best-practices for security considerations.

Incorrect:
See the "Continuous Integration" section above.
See the "Best Practices" section for more details.

Benefits of using cross-references:
- Automatically generates proper section titles and numbers
- Creates clickable links in HTML output
- Updates automatically if section titles change
- Works correctly across all output formats (HTML, PDF, DOCX, EPUB)
- Quarto will warn you if a reference is broken
Benefits:
- Automatic numbering of figures, tables, and equations
- Automatic updates when content is reordered
- Clickable cross-references in HTML and PDF output
- Consistent formatting across all output formats
- Better accessibility for screen readers
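
Quarto reports broken references at render time, but a quick pre-check can be sketched in base R. `find_undefined_refs()` below is a hypothetical helper, not a Quarto feature: it collects `@prefix-label` references and compares them with labels defined as `{#prefix-label}` attributes or `#| label:` chunk options. It assumes all relevant source lines are passed in together, since labels and references may live in different files.

```r
# Label prefixes taken from the list of required prefixes above.
prefixes <- "fig|tbl|eq|sec|thm|lem|cor|prp|exm|exr"

# Return every match of `pattern` across all lines as one character vector.
extract_matches <- function(pattern, lines) {
  unlist(regmatches(lines, gregexpr(pattern, lines)))
}

# Hypothetical helper: references that have no matching label definition.
find_undefined_refs <- function(lines) {
  label_chars <- "[A-Za-z0-9_-]+"
  ref_pattern <- paste0("@(", prefixes, ")-", label_chars)
  attr_pattern <- paste0("\\{#(", prefixes, ")-", label_chars)
  chunk_pattern <- paste0("^#\\|\\s*label:\\s*(", prefixes, ")-", label_chars)

  refs <- sub("^@", "", extract_matches(ref_pattern, lines))
  attr_labels <- sub("^\\{#", "", extract_matches(attr_pattern, lines))
  chunk_labels <- sub(
    "^#\\|\\s*label:\\s*", "",
    extract_matches(chunk_pattern, lines)
  )

  setdiff(unique(refs), c(attr_labels, chunk_labels))
}

example <- c(
  "## Setup {#sec-setup}",
  "See @sec-setup and @fig-missing.",
  "#| label: fig-plot-name",
  "@fig-plot-name shows the result."
)
find_undefined_refs(example)
```

In this example only `fig-missing` is returned, because it is referenced but never defined. The check is a heuristic and does not replace a full render.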
R Code Style
- Follow the tidyverse style guide: https://style.tidyverse.org
- Use native pipe `|>` instead of `%>%`
- Use `snake_case` for variable and function names
- Use `.qmd` files exclusively (not `.Rmd`)
- All R projects should use R package structure
- Avoid redundant logical comparisons: Use logical variables directly in conditional statements (e.g., `if (x)` instead of `if (x == TRUE)` or `if (x == 1)`)
- Use `lubridate::NA_Date_` instead of `as.Date(NA)` for missing date values
- Use pipes to emphasize primary inputs: When writing functions and code, use the pipe operator to clearly show transformations on a primary object. The primary input should flow as the first argument to each function in the chain. Design functions so the most important argument (usually data) comes first, enabling natural pipeline composition.
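
Several of these conventions can be seen together in a short base R sketch. The functions `flag_adults()` and `keep_adults()` and the `participants` data are hypothetical, invented only to illustrate data-first function design, the native pipe, `snake_case` names, and direct use of a logical value. The native pipe requires R 4.1 or later.

```r
# Hypothetical helpers: the data frame is the first argument,
# so each function composes naturally with the native pipe.
flag_adults <- function(data, min_age = 18) {
  data$is_adult <- data$age >= min_age
  data
}

keep_adults <- function(data) {
  data[data$is_adult, , drop = FALSE]
}

# Invented example data.
participants <- data.frame(
  id = 1:3,
  age = c(15, 32, 47)
)

# The primary object flows through each step of the pipeline.
adults <- participants |>
  flag_adults() |>
  keep_adults()

has_adults <- nrow(adults) > 0

# Use the logical value directly rather than `has_adults == TRUE`.
if (has_adults) {
  message("Found ", nrow(adults), " adult participants")
}
```

Because both helpers take the data frame first and return a data frame, further steps can be appended to the pipeline without restructuring the code.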
Quarto Code Chunk Options
Use code-fold: true for chunks where the output is important to the narrative, not the code used to produce it.
This option allows interested readers to expand and view the code while keeping the document focused on results.
Example:
```{r}
#| code-fold: true
#| fig-cap: "Distribution of variable X"
ggplot(data, aes(x = variable)) +
  geom_histogram()
```

This is particularly useful for:
- Complex data manipulation code that produces important summary tables
- Plot generation code where the visualization is the key message
- Lengthy setup or configuration code that supports the narrative but isn’t central to it
Do not use code-fold: true when the code itself is being taught or demonstrated.
File Organization
Using Quarto Includes for Modular Content
All chapters should use Quarto includes to decompose content into separate files. This modular approach provides significant benefits for version control, collaboration, and content management.
Why Use Includes?
Better Git History: When sections are reordered, only the main chapter file changes (moving include statements), making it immediately clear that content was reorganized rather than edited. When content is edited, only the specific include file changes. This makes reviews focused and precise.
Easier Code Review: Reviewers can see exactly what changed—either the organization (main file) or the content (include file)—without having to parse through large diffs.
Modular Maintenance: Each section lives in its own file, making it easier to:
- Find and edit specific content
- Reuse sections across chapters if needed
- Work on different sections simultaneously without merge conflicts
- Test and preview individual sections
Clear Structure: The main chapter file becomes a table of contents showing the organization at a glance.
Structure Pattern
Main chapter file (e.g., 05-coding-practices.qmd):
- Contains the chapter title and introduction
- Contains section headings (##, ###, etc.)
- Uses the `include` shortcode to pull in content (see https://quarto.org/docs/authoring/includes.html for details)
- Shows the organization/outline of the chapter
Include files (e.g., 05-coding-practices/lab-protocols-for-code-and-data.qmd):
- Stored in a subdirectory matching the chapter name
- Contains only the content for that section (no heading)
- The heading stays in the main chapter file
- Named descriptively using kebab-case
Required Pattern
Always follow this pattern:
## Section Heading

{{< include folder/section-name.qmd >}}

Correct example:

## Section heading

{{< include folder/section-name.qmd >}}
Incorrect (don’t do this):
{{< include folder/section-name.qmd >}}
The heading must be in the main file, followed by a blank line, then the include statement.
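
The heading, blank line, include sequence can be verified with a short scan of a main chapter file. The sketch below is illustrative only: `find_bad_includes()` is a hypothetical helper, it treats `##` through `######` as section headings, and it does not skip includes that appear inside code fences. The example shortcode is assembled with `paste0()` so that Quarto would not process it if this text were rendered.

```r
# Hypothetical helper: return line numbers of include shortcodes that are
# not preceded by a blank line and, before that, a section heading.
find_bad_includes <- function(lines) {
  is_include <- grepl("^\\{\\{<\\s*include\\s+.+>\\}\\}\\s*$", lines)
  idx <- which(is_include)
  follows_pattern <- vapply(
    idx,
    function(i) {
      i >= 3 &&
        !nzchar(trimws(lines[i - 1])) &&
        grepl("^#{2,6} ", lines[i - 2])
    },
    logical(1)
  )
  idx[!follows_pattern]
}

# Assemble the shortcode from pieces to avoid a literal shortcode here.
shortcode <- paste0("{{", "< include folder/section-name.qmd >", "}}")

good <- c("## Section heading", "", shortcode)
bad <- c("## Section heading", shortcode)

find_bad_includes(good)
find_bad_includes(bad)
```

The `good` layout produces no findings, while `bad` reports line 2 because the blank line between the heading and the include is missing.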
File Naming Conventions
- Main chapter files: `##-chapter-name.qmd` (e.g., `05-coding-practices.qmd`)
- Subdirectory: `##-chapter-name/` (matches the main file name)
- Include files: `descriptive-section-name.qmd` using kebab-case
- Use descriptive names that clearly indicate the content
- Prefix with underscore `_` for partial/helper files not directly included (e.g., `_lintr-summary.qmd`)
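
These naming conventions can be expressed as regular expressions. The sketch below is one possible reading, not a rule stated by the manual: it assumes `##` stands for exactly two digits (as in `05-coding-practices.qmd`) and that kebab-case means lowercase letters and digits separated by single hyphens. Both function names are hypothetical.

```r
# Kebab-case: lowercase letters/digits separated by single hyphens.
kebab <- "[a-z0-9]+(-[a-z0-9]+)*"

# Hypothetical check for main chapter files, e.g. "05-coding-practices.qmd".
is_chapter_file <- function(path) {
  grepl(paste0("^[0-9]{2}-", kebab, "\\.qmd$"), basename(path))
}

# Hypothetical check for include files; a leading underscore is allowed
# for partial/helper files such as "_lintr-summary.qmd".
is_include_file <- function(path) {
  grepl(paste0("^_?", kebab, "\\.qmd$"), basename(path))
}

is_chapter_file("05-coding-practices.qmd")
is_include_file("05-coding-practices/function-calls.qmd")
is_include_file("_lintr-summary.qmd")
is_include_file("Function Calls.qmd")
```

The first three calls return `TRUE` and the last returns `FALSE`, since the file name contains uppercase letters and a space.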
Git History Benefits Example
When reordering sections:
-## Object naming
+## Function calls
-{{< include demo-folder/section-name.qmd >}}
+{{< include demo-folder/section-2.qmd >}}
-## Function calls
+## Object naming
-{{< include demo-folder/section-2.qmd >}}
+{{< include demo-folder/section-name.qmd >}}

This diff clearly shows a reordering (swapping two sections) with no content changes; only the main chapter file changes.
When editing content: Only the specific include file (e.g., 05-coding-practices/function-calls.qmd) appears in the git diff, making it easy to review the actual content changes without distraction.
When to Create a New Include File
Create a new include file when:
- Adding a new section to a chapter
- A section becomes long enough to benefit from being in its own file (>20-30 lines)
- Content might be reused elsewhere
- You want to work on a section independently
Important: New subsections should usually use includes
When adding new subsections (### headings) to existing chapters, usually create a separate include file for the content. Consider these factors when deciding:
- Subsections with substantial content (>50 lines)
- Subsections that are “big and distinctive enough” to stand on their own
- Content that forms a cohesive, self-contained topic
- Likelihood of future growth or expansion
- Current size of the parent file (keep source files under 100 lines when practical)
For shorter subsections (<30 lines) in files that are well under 100 lines, inline content may be appropriate if the section is unlikely to grow significantly.
This practice ensures better git history, easier code review, and clearer organization from the start.
Migration Strategy
When working with chapters that don’t yet use includes:
- Create a subdirectory matching the chapter name
- Extract each section into its own include file
- Update the main chapter file to use includes
- Keep headings in the main file
- Ensure blank lines before include statements
- Test that rendering still works correctly
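
The migration steps above can be sketched as a base R function that works on lines in memory. Everything here is hypothetical and simplified: `split_chapter()`, `to_kebab()`, and `trim_blank()` are invented names, only `## ` headings are split, and the sketch assumes no line starting with `## ` appears inside a code chunk. The shortcode is assembled from pieces so that Quarto would not process it if this text were rendered.

```r
# Convert a "## Heading {#sec-id}" line to a kebab-case file stem.
to_kebab <- function(heading) {
  text <- sub("^##\\s+", "", heading)
  text <- sub("\\s*\\{.*\\}\\s*$", "", text) # drop trailing attributes
  text <- gsub("[^a-z0-9]+", "-", tolower(text))
  gsub("^-+|-+$", "", text)
}

# Remove leading and trailing blank lines from a character vector.
trim_blank <- function(x) {
  keep <- which(nzchar(trimws(x)))
  if (length(keep) == 0) {
    return(character(0))
  }
  x[min(keep):max(keep)]
}

# Split chapter lines into main-file lines (headings plus includes)
# and a named list of include-file bodies (without headings).
split_chapter <- function(lines, chapter_dir) {
  section_id <- cumsum(grepl("^## ", lines))
  main <- lines[section_id == 0] # title and introduction stay in place
  includes <- list()
  for (id in seq_len(max(section_id))) {
    section <- lines[section_id == id]
    path <- file.path(chapter_dir, paste0(to_kebab(section[1]), ".qmd"))
    includes[[path]] <- trim_blank(section[-1])
    shortcode <- paste0("{{", "< include ", path, " >", "}}")
    # Heading, blank line, include, blank line.
    main <- c(main, section[1], "", shortcode, "")
  }
  list(main = main, includes = includes)
}

chapter <- c(
  "# Coding practices", "", "Intro text.", "",
  "## Object naming", "", "Use snake_case.", "",
  "## Function calls", "", "Name your arguments."
)
result <- split_chapter(chapter, "05-coding-practices")
names(result$includes)
```

The sketch deliberately writes nothing to disk. After reviewing `result`, the pieces could be saved with `dir.create()` and `writeLines()`, followed by a full render to confirm the output is unchanged.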
Using Includes for Code Examples and Reusable Content
Prefer using Quarto’s include shortcode over copy-pasting content whenever feasible. This applies to code examples, configuration files, and any content that exists elsewhere in the repository.
Benefits:
- Single source of truth: Changes to the original file automatically propagate
- Reduces maintenance burden and sync issues
- Ensures examples stay current and accurate
- Better git history (changes appear in one place)
For including code files:
Use the include shortcode inside a code fence with the appropriate language. For example, to include a YAML workflow file:
```{.yaml filename="demo-folder/yml.yml"}
{{< include demo-folder/yml.yml >}}
```

When you need to show the include shortcode syntax itself in documentation (without it being processed), add an extra pair of curly braces: {{{< include path/to/file >}}}. This prevents Quarto from processing it as a shortcode, so the literal syntax appears in the rendered output.
When to copy-paste instead:
Only copy-paste when:
- The content is a simplified example that doesn’t exist elsewhere
- You need to show a partial excerpt with modifications
- The source file contains content that shouldn’t be fully shown
- You need to demonstrate different variations of similar code
File naming for included code:
- Prefix standalone code files with `_` so Quarto doesn’t try to render them (e.g., `_helper-functions.R`)
- Use descriptive names that indicate the purpose
- Keep included files in appropriate subdirectories
Working with DOCX Files
GitHub Copilot can read and process Microsoft Word (.docx) files, which is useful for translating edits made in Word back to Quarto format.
When working with DOCX files:
- Check git metadata first: DOCX files generated from this repository include a “Document Generation Metadata” section at the end with the branch name, commit hash, and commit date. Use this information to:
  - Identify which commit generated the original DOCX
  - Set up the resulting PR correctly with the appropriate base branch
  - Account for any commits that have been added since the DOCX was generated
  - Understand the state of the repository when the DOCX was created
- Always examine tracked changes: Use the `view` tool to read DOCX files and pay special attention to any tracked changes (insertions, deletions, formatting changes)
- Review comments: Look for and address any comments in the DOCX file that may provide context or instructions for edits
- Translate edits to Quarto: When edits have been made in a DOCX file, apply the equivalent changes to the corresponding `.qmd` files
- Preserve formatting: Ensure that formatting, citations, and cross-references are properly converted to Quarto/markdown syntax
- Verify completeness: Check that all edits, including those in tracked changes and comments, have been addressed
This workflow enables a hybrid editing process where collaborators can make edits in familiar Word format, and Copilot can translate those edits back to the Quarto source files.
Additional Guidelines
- Maintain consistency with existing code style
- Preserve all existing content when refactoring
- Add blank lines before all lists
- Follow the lab’s R package development workflow (as described throughout this repo)
- When discussing current world conditions or technology capabilities: Always mention the date or time period to provide temporal context and prevent content from becoming misleading as time passes
- Determining the current date: Do not assume you know what the current date is. Instead, use the Unix command line to determine the actual date (e.g., `date +"%Y-%m-%d"` or `date +"%B %Y"`), and use that when discussing current conditions, recent events, or the state of technology “as of” a particular time period
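
When the date is needed inside an R chunk rather than at the command line, base R offers an equivalent. The sketch below is only an illustration of attaching temporal context to a statement: `as_of_statement()` is a hypothetical helper, and the ISO format mirrors the `date +"%Y-%m-%d"` example above.

```r
# Hypothetical helper: prefix a statement with the date it was written.
# `date` defaults to the system date at the time the code runs.
as_of_statement <- function(statement, date = Sys.Date()) {
  paste0("As of ", format(date, "%Y-%m-%d"), ", ", statement)
}

# The date comes from the system clock rather than from an assumption.
as_of_statement("the manual is rendered to HTML, PDF, DOCX, and EPUB.")
```

Passing an explicit `date` argument makes the helper reproducible, which is useful when the statement describes conditions at a known earlier time.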
Citations and Evidence for Claims
All factual claims should be backed by either citations or direct evidence.
When writing documentation:
- Cite sources for factual statements about how tools, systems, or processes work
- Provide direct evidence by demonstrating behavior yourself (e.g., showing command output, testing functionality)
- Remove unverified explanations rather than including speculative or unsubstantiated claims
- Link to authoritative sources like official documentation, GitHub issues, or peer-reviewed materials
- For comparative or popularity claims: Provide specific metrics (e.g., GitHub stars, download counts, usage statistics) with dates rather than subjective terms like “most popular” or “widely used” without evidence
- For all factual claims: Provide supporting evidence, either directly or by explicitly citing credible sources.
- Do not phrase claims as facts when they are merely assumptions or common opinions that may not be universally agreed on.
When adding links to external resources:
- Always verify the content of linked pages before adding them to the manual
- Read the repository README, DESCRIPTION file, or website content to understand what the resource actually contains
- Use accurate descriptions based on the actual content, not assumptions based on the URL or name
- For GitHub repositories, check key files like README.md, DESCRIPTION, index.qmd, or _quarto.yml to understand the project’s purpose
Example of what NOT to do:
In PR #151, the initial approach failed to verify the actual content of the linked repository:

- Assumed “PSW” meant “Propensity Score Weighting” based on the acronym
- Created a mischaracterized description: “R package for propensity score weighting and related methods for causal inference in observational studies”
- Placed the link in an incorrect section (“Useful R Packages”)
Example of what TO do:
After reviewing the actual repository files (DESCRIPTION, _quarto.yml, index.qmd):

- Verified that PSW stands for “Principles of Scientific Writing”
- Determined it’s a Quarto book (later revised to “handbook”) about scientific writing principles
- Placed the link in the appropriate “Writing” section
- Used an accurate description based on the actual content: “a handbook covering scientific writing principles including citations and evidence, word choice, and conciseness”
This practice ensures accuracy, builds trust, and helps readers verify information independently.
Testing and Validation
ALWAYS render the full Quarto book before requesting code review or finalizing your work.
Run quarto render to ensure the book builds successfully in all output formats (HTML, PDF, DOCX, EPUB). This validates that:
- All cross-references are valid
- All images can be properly converted for PDF output (use PNG format for images, not SVG)
- All code chunks execute without errors
- The book structure is correct
If the render fails, fix the issues before committing or requesting review. Common issues include:
- SVG images that cannot be converted to PDF (use PNG instead)
- Invalid cross-references
- Missing or incorrect file paths
- Syntax errors in code chunks
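
From an R session, the render step can be wrapped so that a failed build stops the workflow. The sketch below assumes the `quarto` command-line tool is on the `PATH`; `render_book()` and its `run` argument are hypothetical, with `run` existing only so the command runner can be swapped out when trying the function without Quarto installed.

```r
# Hypothetical wrapper: run `quarto render` and stop on a non-zero status.
# `run` defaults to base R's system2(), which returns the exit status.
render_book <- function(run = system2) {
  status <- run("quarto", "render")
  if (!identical(as.integer(status), 0L)) {
    stop("`quarto render` failed with exit status ", status, call. = FALSE)
  }
  invisible(TRUE)
}

# Dry run with a stand-in runner that always reports success.
render_book(run = function(command, args) 0L)
```

Calling `render_book()` with no arguments performs the real render; the error it raises on failure is a reminder to fix the issues listed above before committing or requesting review.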