| Variable | Label |
|---|---|
| us_born | U.S.-born vs. Foreign-born |
| genx_mil | Birth cohort (Generation X / Millennial) |
| age_cat | Age group |
| male | Sex |
| race_eth15 | Race/ethnicity (6 categories) |
| edu4 | Educational attainment (4 levels) |
| famincome | Family income category |
| marital_stat2 | Marital status |
| region2 | U.S. Census region |
| smoker | Current smoker |
| alcohol | Heavy alcohol use |
| obese | Obesity status |
| hyperten | Hypertension diagnosis |
| healthstatus | Self-rated health (Healthy / Poor) |
| poor_health | Poor self-rated health (0 = Healthy, 1 = Poor) |
| wgt | Mortality analysis weight (mortwtsa) |
| strata | Survey stratum |
| psu | Primary sampling unit |
Publication-Ready Tables in R
A Practical Guide Using gtsummary and flextable
Overview
Producing clean, publication-ready tables is one of the most important — and most under-taught — skills in applied social science and public health research. This tutorial walks through a complete workflow using two complementary R packages:
| Package | Strength | Best for |
|---|---|---|
| gtsummary | Automatic table building with survey support | Table 1, regression tables |
| flextable | Fine-grained formatting control | Word / .docx export |
All examples use a 20,000-observation random sample of the National Health Interview Survey (NHIS), linked to the National Death Index through the NHIS Linked Mortality Files (LMF). The analysis illustrates a common demography and public health workflow: describing a sample, then modeling a binary health outcome with survey-weighted logistic regression.
- How to build a complete Table 1 (descriptive statistics), with and without survey weights
- How to control variable type summaries (continuous vs. categorical)
- How to produce and format regression tables from svyglm() output
- Advanced gtsummary techniques: removing reference rows, significance stars, managing footnotes, tbl_merge(), and tbl_stack()
- How to export polished tables to Word via flextable
1 Data and Variables
The analytic sample contains 20,000 respondents drawn from NHIS waves collected between 1997 and 2018, with variables spanning demographics, socioeconomic status, health behaviors, and health outcomes.
Survey design
NHIS uses a complex multistage probability sample. Correct inference requires declaring the survey design before any analysis:
```r
library(survey)

options(survey.lonely.psu = "adjust")

des <- svydesign(
  id      = ~psu,     # primary sampling unit
  strata  = ~strata,  # stratification variable
  weights = ~wgt,     # mortality analysis weight (mortwtsa)
  data    = nhis,
  nest    = TRUE      # PSUs nested within strata
)
```

Always use nest = TRUE when PSU IDs are not unique across strata (which is the case with NHIS). Setting options(survey.lonely.psu = "adjust") prevents errors in strata with only one PSU.
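To see what nest = TRUE does, here is a minimal sketch on a hypothetical toy data frame. The values and the names toy, no_nest, and des_toy are made up for illustration and are not taken from NHIS. PSU ids 1 and 2 are reused in both strata, which mirrors the situation described above.

```r
library(survey)

# Hypothetical toy data: PSU ids 1 and 2 repeat in every stratum
toy <- data.frame(
  strata = rep(c("A", "B"), each = 4),
  psu    = rep(c(1, 1, 2, 2), times = 2),
  wgt    = c(10, 12, 8, 9, 20, 18, 15, 17),
  poor   = c(0, 1, 0, 0, 1, 0, 0, 1)
)

# Without nest = TRUE, svydesign() stops because the PSU ids
# are not unique across strata
no_nest <- try(
  svydesign(ids = ~psu, strata = ~strata, weights = ~wgt, data = toy),
  silent = TRUE
)
inherits(no_nest, "try-error")

# With nest = TRUE, PSU ids are treated as nested within strata
des_toy <- svydesign(
  ids = ~psu, strata = ~strata, weights = ~wgt,
  data = toy, nest = TRUE
)

# Weighted prevalence with a design-based standard error
svymean(~poor, des_toy)
```

The point estimate is just the weighted mean of poor; the design declaration only changes how the standard error is computed.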
Global gtsummary options
Two global options should be set once per session, before any table is built. They apply automatically to every subsequent tbl_summary(), tbl_svysummary(), and tbl_regression() call.
```r
options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))
```

options(gtsummary.use_ftExtra = TRUE)
Activates the ftExtra backend for flextable rendering (the ftExtra package must be installed). By default, gtsummary uses a basic text renderer when converting tables to flextable via as_flex_table(). With ftExtra enabled, markdown in cell content (bold labels, italic levels, superscripts, footnote symbols) is preserved faithfully in the Word export. Without it, formatted text can appear as raw markdown syntax in the .docx file.

set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))
Applies the compact theme globally. Compared to the default theme, compact reduces row padding and font size, producing tables that fit comfortably on a journal page. The theme persists across all tables in the session until it is cleared with reset_gtsummary_theme(). Calling theme_gtsummary_compact(set_theme = TRUE) already registers the theme on its own, so the outer set_gtsummary_theme() wrapper is harmless but not required.

Both lines are already active in this tutorial’s hidden setup chunk.
2 Part 1: Descriptive Tables (Table 1)
Table 1 in a manuscript reports sample characteristics. The tabs below progress from the simplest approach to a fully stratified, weighted table ready for publication.
tbl_summary() works directly on a data frame — no survey design needed. Appropriate for convenience samples, or as a first pass before adding weights.
```r
nhis |>
  select(us_born, genx_mil, age_cat, male, race_eth15,
         edu4, famincome, healthstatus) |>
  tbl_summary(
    label = list(
      us_born      ~ "Nativity",
      genx_mil     ~ "Birth cohort",
      age_cat      ~ "Age group",
      male         ~ "Sex",
      race_eth15   ~ "Race/ethnicity",
      edu4         ~ "Education",
      famincome    ~ "Family income",
      healthstatus ~ "Self-rated health"
    ),
    missing = "no"
  ) |>
  bold_labels() |>
  modify_caption("**Table 1. Sample Characteristics (Unweighted)**") |>
  as_gt()
```

| Characteristic | N = 20,000¹ |
|---|---|
| Nativity | |
| U.S. born | 15,664 (78%) |
| Foreign born | 4,336 (22%) |
| Birth cohort | |
| Generation X | 13,117 (66%) |
| Millennial | 6,883 (34%) |
| Age group | |
| 18-24 | 4,745 (24%) |
| 25-34 | 8,430 (42%) |
| 35-44 | 5,304 (27%) |
| 45-55 | 1,521 (7.6%) |
| Sex | |
| Female | 10,956 (55%) |
| Male | 9,044 (45%) |
| Race/ethnicity | |
| Non-Hispanic white | 11,119 (56%) |
| Hispanic | 4,519 (23%) |
| Non-Hispanic Black | 2,906 (15%) |
| Non-Hispanic ANAI | 172 (0.9%) |
| Non-Hispanic Asian | 790 (4.0%) |
| Non-Hispanic other | 494 (2.5%) |
| Education | |
| Less than HS | 2,482 (12%) |
| High school | 5,340 (27%) |
| Some college | 6,635 (33%) |
| BA or higher | 5,497 (28%) |
| Family income | |
| Less than $35,000 | 8,338 (45%) |
| $35,000 - $75,000 | 6,666 (36%) |
| $75,000 - $99,999 | 1,350 (7.3%) |
| $100,000 and over | 2,184 (12%) |
| Self-rated health | |
| Healthy | 18,580 (93%) |
| Poor | 1,420 (7.1%) |
| ¹ n (%) | |
tbl_svysummary() accepts the svydesign object and weights all statistics automatically. Proportions and means can differ substantially from the unweighted version when survey weights correct for unequal selection probabilities.
```r
des |>
  tbl_svysummary(
    include = c(us_born, genx_mil, age_cat, male, race_eth15,
                edu4, famincome, healthstatus),
    label = list(
      us_born      ~ "Nativity",
      genx_mil     ~ "Birth cohort",
      age_cat      ~ "Age group",
      male         ~ "Sex",
      race_eth15   ~ "Race/ethnicity",
      edu4         ~ "Education",
      famincome    ~ "Family income",
      healthstatus ~ "Self-rated health"
    ),
    missing = "no"
  ) |>
  bold_labels() |>
  modify_caption("**Table 1. Sample Characteristics (Survey-Weighted)**") |>
  as_gt()
```

| Characteristic | N = 20,726¹ |
|---|---|
| Nativity | |
| U.S. born | 16,683 (80%) |
| Foreign born | 4,042 (20%) |
| Birth cohort | |
| Generation X | 12,794 (62%) |
| Millennial | 7,931 (38%) |
| Age group | |
| 18-24 | 5,647 (27%) |
| 25-34 | 7,962 (38%) |
| 35-44 | 5,418 (26%) |
| 45-55 | 1,698 (8.2%) |
| Sex | |
| Female | 10,499 (51%) |
| Male | 10,226 (49%) |
| Race/ethnicity | |
| Non-Hispanic white | 12,701 (61%) |
| Hispanic | 3,896 (19%) |
| Non-Hispanic Black | 2,682 (13%) |
| Non-Hispanic ANAI | 185 (0.9%) |
| Non-Hispanic Asian | 787 (3.8%) |
| Non-Hispanic other | 473 (2.3%) |
| Education | |
| Less than HS | 2,350 (11%) |
| High school | 5,702 (28%) |
| Some college | 6,894 (33%) |
| BA or higher | 5,725 (28%) |
| Family income | |
| Less than $35,000 | 6,654 (35%) |
| $35,000 - $75,000 | 7,447 (39%) |
| $75,000 - $99,999 | 1,693 (8.9%) |
| $100,000 and over | 3,164 (17%) |
| Self-rated health | |
| Healthy | 19,386 (94%) |
| Poor | 1,339 (6.5%) |
| ¹ n (%) | |
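The N in the header of the weighted table is a sum of weights rather than a count of respondents, and each percentage is a share of total weight. A minimal base-R sketch of that arithmetic, on made-up values (the data frame toy and its weights are hypothetical):

```r
# Hypothetical toy data: five respondents with unequal weights
toy <- data.frame(
  us_born = c("U.S. born", "U.S. born", "Foreign born",
              "Foreign born", "U.S. born"),
  wgt     = c(1.5, 0.5, 2.0, 1.0, 1.0)
)

# Unweighted share: every respondent counts once
unweighted <- prop.table(table(toy$us_born))

# Weighted share: every respondent counts wgt times
weighted_n <- tapply(toy$wgt, toy$us_born, sum)
weighted   <- weighted_n / sum(weighted_n)

round(100 * unweighted)
round(100 * weighted)
```

In this toy case the foreign-born share moves from 40% unweighted to 50% weighted, purely because those two respondents carry more weight.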
Add by = to compare groups side by side. Chain add_overall() for a total column and add_p() for group-comparison p-values. Use modify_header() with {n_unweighted} (the unweighted count) to show interpretable sample sizes in column headers.
```r
tbl1 <- des |>
  tbl_svysummary(
    by = us_born,
    include = c(genx_mil, age_cat, male, race_eth15,
                edu4, famincome, smoker, alcohol, obese,
                hyperten, healthstatus),
    label = list(
      genx_mil     ~ "Birth cohort",
      age_cat      ~ "Age group",
      male         ~ "Sex",
      race_eth15   ~ "Race/ethnicity",
      edu4         ~ "Education",
      famincome    ~ "Family income",
      smoker       ~ "Current smoker",
      alcohol      ~ "Heavy alcohol use",
      obese        ~ "Obese",
      hyperten     ~ "Hypertension",
      healthstatus ~ "Self-rated health"
    ),
    missing = "no"
  ) |>
  add_overall(last = FALSE) |>
  # Every variable here is categorical, so add_p() falls back on its
  # default Rao-Scott adjusted chi-squared test (see the table footnote)
  add_p() |>
  bold_labels() |>
  italicize_levels() |>
  bold_p(t = 0.05) |>
  modify_header(
    stat_0 ~ "**Overall**\nn = {n_unweighted}",
    stat_1 ~ "**U.S.-Born**\nn = {n_unweighted}",
    stat_2 ~ "**Foreign-Born**\nn = {n_unweighted}"
  ) |>
  modify_caption("**Table 1. Sample Characteristics by Nativity**")

tbl1 |> as_gt()
```

| Characteristic | Overall n = 20000¹ | U.S.-Born n = 15664¹ | Foreign-Born n = 4336¹ | p-value² |
|---|---|---|---|---|
| Birth cohort | | | | <0.001 |
| Generation X | 12,794 (62%) | 10,058 (60%) | 2,736 (68%) | |
| Millennial | 7,931 (38%) | 6,626 (40%) | 1,306 (32%) | |
| Age group | | | | <0.001 |
| 18-24 | 5,647 (27%) | 4,921 (29%) | 726 (18%) | |
| 25-34 | 7,962 (38%) | 6,304 (38%) | 1,659 (41%) | |
| 35-44 | 5,418 (26%) | 4,136 (25%) | 1,282 (32%) | |
| 45-55 | 1,698 (8.2%) | 1,323 (7.9%) | 375 (9.3%) | |
| Sex | | | | 0.10 |
| Female | 10,499 (51%) | 8,510 (51%) | 1,990 (49%) | |
| Male | 10,226 (49%) | 8,174 (49%) | 2,052 (51%) | |
| Race/ethnicity | | | | <0.001 |
| Non-Hispanic white | 12,701 (61%) | 12,080 (72%) | 622 (15%) | |
| Hispanic | 3,896 (19%) | 1,717 (10%) | 2,179 (54%) | |
| Non-Hispanic Black | 2,682 (13%) | 2,332 (14%) | 350 (8.7%) | |
| Non-Hispanic ANAI | 185 (0.9%) | 179 (1.1%) | 6 (0.2%) | |
| Non-Hispanic Asian | 787 (3.8%) | 189 (1.1%) | 599 (15%) | |
| Non-Hispanic other | 473 (2.3%) | 187 (1.1%) | 286 (7.1%) | |
| Education | | | | <0.001 |
| Less than HS | 2,350 (11%) | 1,335 (8.0%) | 1,015 (25%) | |
| High school | 5,702 (28%) | 4,725 (28%) | 977 (24%) | |
| Some college | 6,894 (33%) | 6,005 (36%) | 889 (22%) | |
| BA or higher | 5,725 (28%) | 4,599 (28%) | 1,126 (28%) | |
| Family income | | | | <0.001 |
| Less than $35,000 | 6,654 (35%) | 5,098 (33%) | 1,556 (43%) | |
| $35,000 - $75,000 | 7,447 (39%) | 6,093 (40%) | 1,354 (37%) | |
| $75,000 - $99,999 | 1,693 (8.9%) | 1,451 (9.5%) | 242 (6.6%) | |
| $100,000 and over | 3,164 (17%) | 2,668 (17%) | 496 (14%) | |
| Current smoker | | | | <0.001 |
| Never smoked | 13,761 (66%) | 10,518 (63%) | 3,243 (80%) | |
| Former smoker | 2,553 (12%) | 2,190 (13%) | 363 (9.0%) | |
| Current smoker | 4,383 (21%) | 3,950 (24%) | 433 (11%) | |
| Heavy alcohol use | | | | <0.001 |
| Current drinker | 14,162 (68%) | 11,972 (72%) | 2,190 (54%) | |
| Lifetime abstainer | 4,659 (22%) | 3,132 (19%) | 1,527 (38%) | |
| Infrequent drinker | 1,054 (5.1%) | 866 (5.2%) | 188 (4.6%) | |
| Former drinker | 851 (4.1%) | 713 (4.3%) | 137 (3.4%) | |
| Obese | | | | <0.001 |
| Not obese | 14,391 (76%) | 11,374 (75%) | 3,016 (82%) | |
| Obese | 4,527 (24%) | 3,865 (25%) | 662 (18%) | |
| Hypertension | 2,437 (12%) | 2,105 (13%) | 332 (8.2%) | <0.001 |
| Self-rated health | | | | 0.3 |
| Healthy | 19,386 (94%) | 15,585 (93%) | 3,801 (94%) | |
| Poor | 1,339 (6.5%) | 1,099 (6.6%) | 241 (6.0%) | |
| ¹ n (%) | | | | |
| ² Pearson’s X^2: Rao & Scott adjustment | | | | |
tbl_svysummary
| Placeholder | Returns | Use when |
|---|---|---|
| {n_unweighted} | Unweighted count per group | Column headers (most readable) |
| {N_unweighted} | Total unweighted N | Overall column header |
| {n} | Sum of weights per group | Reporting weighted N |
| {N} | Total sum of weights | Rarely useful in headers |
Both {n} and {N} return decimals in weighted surveys — use {n_unweighted} for clean integer counts in column headers.
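As a small illustration of these placeholders, the sketch below applies one header pattern to every statistic column at once. It uses the apiclus1 example data shipped with the survey package rather than the NHIS sample, and the object names des_api, tbl_api, and hdr are hypothetical.

```r
library(survey)
library(gtsummary)

# Example data from the survey package (not the NHIS sample)
data(api, package = "survey")
des_api <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

tbl_api <- des_api |>
  tbl_svysummary(by = stype, include = awards) |>
  # {level} is the by-group label, {n_unweighted} the respondent count
  modify_header(all_stat_cols() ~ "**{level}**\nn = {n_unweighted}")

# Inspect the resolved header text for the statistic columns
hdr <- tbl_api$table_styling$header
hdr$label[grepl("^stat_", hdr$column)]
```

Using all_stat_cols() avoids repeating the same pattern for stat_1, stat_2, and so on, at the cost of taking the group labels as they appear in the data.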
2.1 Controlling Variable Type Summaries
By default, gtsummary guesses whether each variable is categorical or continuous. You can override this with the type = argument, and customize the displayed statistics with statistic =.
```r
nhis |>
  select(age_cat, edu4, poor_health) |>
  tbl_summary(missing = "no") |>
  bold_labels() |>
  as_gt()
```

| Characteristic | N = 20,000¹ |
|---|---|
| Age group | |
| 18-24 | 4,745 (24%) |
| 25-34 | 8,430 (42%) |
| 35-44 | 5,304 (27%) |
| 45-55 | 1,521 (7.6%) |
| Education | |
| Less than HS | 2,482 (12%) |
| High school | 5,340 (27%) |
| Some college | 6,635 (33%) |
| BA or higher | 5,497 (28%) |
| Poor health | 1,420 (7.1%) |
| ¹ n (%) | |
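Use type = list(var ~ "continuous") to display mean (SD) instead of counts. This is useful for numeric variables with only a few distinct values, such as a 0/1 indicator, which gtsummary would otherwise summarize as categorical or dichotomous. For a 0/1 variable the mean is simply the proportion coded 1.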
```r
nhis |>
  select(age_cat, edu4, poor_health) |>
  tbl_summary(
    type = list(poor_health ~ "continuous"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    missing = "no"
  ) |>
  bold_labels() |>
  as_gt()
```

| Characteristic | N = 20,000¹ |
|---|---|
| Age group | |
| 18-24 | 4,745 (24%) |
| 25-34 | 8,430 (42%) |
| 35-44 | 5,304 (27%) |
| 45-55 | 1,521 (7.6%) |
| Education | |
| Less than HS | 2,482 (12%) |
| High school | 5,340 (27%) |
| Some college | 6,635 (33%) |
| BA or higher | 5,497 (28%) |
| Poor health | 0 (0) |
| ¹ n (%); Mean (SD) |  |
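The 0 (0) shown for Poor health is the mean and SD of a 0/1 variable rounded to whole numbers, which hides the information. One way to address this, sketched here on a hypothetical ten-row data frame (toy is not the NHIS sample), is the digits argument of tbl_summary():

```r
library(gtsummary)

# Hypothetical toy data: two of ten respondents report poor health
toy <- data.frame(poor_health = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0))

tbl_digits <- toy |>
  tbl_summary(
    type      = list(poor_health ~ "continuous"),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    digits    = list(poor_health ~ 2)  # two decimals for every statistic
  )

tbl_digits
```

With two decimals the mean reads as a proportion (0.20 here), which is usually what a reader wants from a binary indicator.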
Use type = list(var ~ "categorical") to show counts and percentages for a numeric variable (e.g., an integer 0/1 outcome where you want both values displayed explicitly).
```r
nhis |>
  select(age_cat, edu4, poor_health) |>
  tbl_summary(
    type = list(poor_health ~ "categorical"),
    missing = "no"
  ) |>
  bold_labels() |>
  as_gt()
```

| Characteristic | N = 20,000¹ |
|---|---|
| Age group | |
| 18-24 | 4,745 (24%) |
| 25-34 | 8,430 (42%) |
| 35-44 | 5,304 (27%) |
| 45-55 | 1,521 (7.6%) |
| Education | |
| Less than HS | 2,482 (12%) |
| High school | 5,340 (27%) |
| Some college | 6,635 (33%) |
| BA or higher | 5,497 (28%) |
| Poor health | |
| 0 | 18,580 (93%) |
| 1 | 1,420 (7.1%) |
| ¹ n (%) | |
| Function | Purpose |
|---|---|
| tbl_svysummary(by = ...) | Stratify columns by a grouping variable |
| add_overall() | Append an overall (unstratified) column |
| add_p() | Add p-values for group comparisons |
| type = list(var ~ "continuous") | Force a variable to display mean/SD |
| type = list(var ~ "categorical") | Force a variable to display counts/% |
| statistic = list(...) | Customize the displayed summary statistic |
| bold_labels() | Bold the variable name rows |
| italicize_levels() | Italicize the category rows |
| bold_p(t = 0.05) | Bold significant p-values |
| modify_header() | Rewrite any column header (use {n_unweighted} for counts in survey tables) |
| modify_caption() | Add or change table caption |
3 Part 2: Regression Tables
We model poor self-rated health (binary: 1 = Poor, 0 = Healthy) using survey-weighted logistic regression via svyglm() with a quasibinomial() family.
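The models use a design object named des_cc (and later a data frame nhis_cc) whose construction is not shown in this section; the names suggest complete cases. As an assumption about how such an object could be built, and not the tutorial's actual setup code, one common approach is to subset the design object rather than the data frame. A minimal sketch on hypothetical toy data:

```r
library(survey)

# Hypothetical toy data with one missing outcome value
toy <- data.frame(
  strata      = rep(c("A", "B"), each = 4),
  psu         = rep(c(1, 1, 2, 2), times = 2),
  wgt         = c(10, 12, 8, 9, 20, 18, 15, 17),
  poor_health = c(0, 1, 0, 0, 1, 0, NA, 1)
)

des_full <- svydesign(
  ids = ~psu, strata = ~strata, weights = ~wgt,
  data = toy, nest = TRUE
)

# Subset the design, not the raw data, so the design structure
# is carried along with the retained rows
des_cc_toy <- subset(des_full, !is.na(poor_health))

nrow(des_cc_toy$variables)
```

Filtering the data frame first and declaring the design afterwards can drop PSUs or strata from view, which is why subsetting the design is usually the safer habit.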
```r
# Model A: Demographic only
m_A <- svyglm(
  poor_health ~ us_born + genx_mil + male + race_eth15,
  design = des_cc,
  family = quasibinomial()
)

# Model B: + Socioeconomic status
m_B <- svyglm(
  poor_health ~ us_born + genx_mil + male + race_eth15 +
    edu4 + famincome,
  design = des_cc,
  family = quasibinomial()
)

# Model C: Full model
m_C <- svyglm(
  poor_health ~ us_born + genx_mil + male + race_eth15 +
    edu4 + famincome + marital_stat2 +
    smoker + alcohol + obese + hyperten + region2,
  design = des_cc,
  family = quasibinomial()
)
```

The four tabs below demonstrate progressively more refined formatting, starting from the default output and building toward a publication-ready table.
tbl_regression() converts any model object into a formatted table. Set exponentiate = TRUE to display odds ratios instead of log-odds.
```r
tbl_regression(m_A, exponentiate = TRUE) |>
  bold_labels() |>
  as_gt()
```

| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Nativity | |||
| U.S. born | — | — | |
| Foreign born | 0.73 | 0.57, 0.93 | 0.012 |
| Birth cohort | |||
| Generation X | — | — | |
| Millennial | 0.60 | 0.50, 0.73 | <0.001 |
| Sex | |||
| Female | — | — | |
| Male | 0.84 | 0.71, 0.98 | 0.027 |
| Race/ethnicity | |||
| Non-Hispanic white | — | — | |
| Hispanic | 1.80 | 1.45, 2.23 | <0.001 |
| Non-Hispanic Black | 1.49 | 1.21, 1.84 | <0.001 |
| Non-Hispanic ANAI | 2.69 | 1.53, 4.71 | <0.001 |
| Non-Hispanic Asian | 0.72 | 0.44, 1.19 | 0.2 |
| Non-Hispanic other | 0.97 | 0.56, 1.67 | >0.9 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
By default, gtsummary shows a reference row for each categorical predictor. These rows add visual clutter. A single call to remove_row_type(type = "reference") cleans them up.
```r
tbl_regression(m_A, exponentiate = TRUE) |>
  bold_labels() |>
  italicize_levels() |>
  as_gt()
```

| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Nativity | |||
| U.S. born | — | — | |
| Foreign born | 0.73 | 0.57, 0.93 | 0.012 |
| Birth cohort | |||
| Generation X | — | — | |
| Millennial | 0.60 | 0.50, 0.73 | <0.001 |
| Sex | |||
| Female | — | — | |
| Male | 0.84 | 0.71, 0.98 | 0.027 |
| Race/ethnicity | |||
| Non-Hispanic white | — | — | |
| Hispanic | 1.80 | 1.45, 2.23 | <0.001 |
| Non-Hispanic Black | 1.49 | 1.21, 1.84 | <0.001 |
| Non-Hispanic ANAI | 2.69 | 1.53, 4.71 | <0.001 |
| Non-Hispanic Asian | 0.72 | 0.44, 1.19 | 0.2 |
| Non-Hispanic other | 0.97 | 0.56, 1.67 | >0.9 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
```r
tbl_regression(m_A, exponentiate = TRUE) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels() |>
  as_gt()
```

| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Nativity | |||
| Foreign born | 0.73 | 0.57, 0.93 | 0.012 |
| Birth cohort | |||
| Millennial | 0.60 | 0.50, 0.73 | <0.001 |
| Sex | |||
| Male | 0.84 | 0.71, 0.98 | 0.027 |
| Race/ethnicity | |||
| Hispanic | 1.80 | 1.45, 2.23 | <0.001 |
| Non-Hispanic Black | 1.49 | 1.21, 1.84 | <0.001 |
| Non-Hispanic ANAI | 2.69 | 1.53, 4.71 | <0.001 |
| Non-Hispanic Asian | 0.72 | 0.44, 1.19 | 0.2 |
| Non-Hispanic other | 0.97 | 0.56, 1.67 | >0.9 |
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
Some journals prefer the European style: coefficient (SE) with significance stars rather than a confidence interval column. Use add_significance_stars() with conf.int = FALSE.
```r
tbl_regression(
  m_A,
  exponentiate = TRUE,
  conf.int = FALSE
) |>
  add_significance_stars(
    hide_ci = TRUE,
    hide_se = FALSE,
    pattern = "{estimate}{stars}"
  ) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels() |>
  modify_header(estimate ~ "**OR**", std.error ~ "**SE**") |>
  modify_caption("**Table 2. Model A: Demographic Predictors of Poor Health**") |>
  as_gt()
```

| Characteristic | OR¹ | SE |
|---|---|---|
| Nativity | ||
| Foreign born | 0.73* | 0.126 |
| Birth cohort | ||
| Millennial | 0.60*** | 0.095 |
| Sex | ||
| Male | 0.84* | 0.081 |
| Race/ethnicity | ||
| Hispanic | 1.80*** | 0.111 |
| Non-Hispanic Black | 1.49*** | 0.107 |
| Non-Hispanic ANAI | 2.69*** | 0.287 |
| Non-Hispanic Asian | 0.72 | 0.255 |
| Non-Hispanic other | 0.97 | 0.277 |
| ¹ *p<0.05; **p<0.01; ***p<0.001 | ||
| Abbreviations: OR = Odds Ratio, SE = Standard Error | ||
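pattern = "{estimate}{stars}" places the stars immediately after the estimate. Use "{estimate} {stars}" to add a space. hide_ci = TRUE drops the CI columns entirely. Note that the SE column stays on the log-odds scale even with exponentiate = TRUE: only the estimate is exponentiated, so the SE describes log(OR), not the OR itself.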
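A quick way to check how the OR and SE columns relate is to rebuild a confidence interval by hand. The sketch below takes the Foreign born row of Table 2 (OR 0.73, SE 0.126) and assumes a normal approximation; svyglm() intervals use design-based degrees of freedom, so small differences from the package output are possible.

```r
# Values copied from the "Foreign born" row of Table 2
or     <- 0.73
se_log <- 0.126   # standard error on the log-odds scale

# Wald-type 95% interval: build it on the log scale, then exponentiate
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log)
round(ci, 2)
```

The result, roughly 0.57 to 0.93, matches the 95% CI reported for the same coefficient in the default Model A table, which is consistent with the SE being on the log-odds scale.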
When you drop the CI columns, the default footnote defining “CI” becomes orphaned. Remove it by filtering the internal table_styling$abbreviation tibble directly.
```r
tbl_no_ci <- tbl_regression(
  m_B,
  exponentiate = TRUE,
  conf.int = FALSE
) |>
  add_significance_stars(
    hide_ci = TRUE,
    hide_se = FALSE,
    pattern = "{estimate}{stars}"
  ) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels() |>
  modify_header(estimate ~ "**OR**", std.error ~ "**SE**")

# Surgically remove the orphaned CI footnote
tbl_no_ci$table_styling$abbreviation <-
  tbl_no_ci$table_styling$abbreviation |>
  dplyr::filter(column != "conf.low")

tbl_no_ci |>
  modify_caption("**Table 2. Model B: Demographic + SES Predictors**") |>
  as_gt()
```

| Characteristic | OR¹ | SE |
|---|---|---|
| Nativity | ||
| Foreign born | 0.60*** | 0.136 |
| Birth cohort | ||
| Millennial | 0.54*** | 0.096 |
| Sex | ||
| Male | 0.84* | 0.082 |
| Race/ethnicity | ||
| Hispanic | 1.31* | 0.129 |
| Non-Hispanic Black | 1.07 | 0.108 |
| Non-Hispanic ANAI | 1.79 | 0.303 |
| Non-Hispanic Asian | 1.24 | 0.258 |
| Non-Hispanic other | 1.15 | 0.281 |
| Education | ||
| High school | 0.71** | 0.119 |
| Some college | 0.58*** | 0.125 |
| BA or higher | 0.28*** | 0.166 |
| Family income | ||
| $35,000 - $75,000 | 0.43*** | 0.103 |
| $75,000 - $99,999 | 0.44*** | 0.204 |
| $100,000 and over | 0.26*** | 0.235 |
| ¹ *p<0.05; **p<0.01; ***p<0.001 | ||
| Abbreviations: OR = Odds Ratio, SE = Standard Error | ||
Before customizing column headers with modify_header(), use show_header_names() to print the exact internal column names gtsummary assigns. This eliminates guesswork.
```r
tbl_regression(m_A, exponentiate = TRUE) |>
  show_header_names()
#> Column Name   Header                 N*             N_event*
#> label         "**Characteristic**"   17,295 <dbl>   1,040 <dbl>
#> estimate      "**OR**"               17,295 <dbl>   1,040 <dbl>
#> conf.low      "**95% CI**"           17,295 <dbl>   1,040 <dbl>
#> p.value       "**p-value**"          17,295 <dbl>   1,040 <dbl>
```

"estimate" = OR column, "conf.low" / "conf.high" = CI bounds, "std.error" = SE. Pass these strings to modify_header().
4 Part 3: Combining Tables
tbl_merge() places multiple model tables side by side under optional spanning headers — the standard multi-model format for journal manuscripts.
```r
# Helper: builds a consistently formatted model table
make_tbl <- function(model) {
  tbl_regression(
    model,
    exponentiate = TRUE,
    conf.int = FALSE
  ) |>
    add_significance_stars(
      hide_ci = TRUE,
      hide_se = FALSE,
      pattern = "{estimate}{stars}"
    ) |>
    remove_row_type(type = "reference") |>
    bold_labels() |>
    italicize_levels() |>
    modify_header(estimate ~ "**OR**", std.error ~ "**SE**")
}

t_A <- make_tbl(m_A)
t_B <- make_tbl(m_B)
t_C <- make_tbl(m_C)

# Remove orphaned CI footnotes. The result is assigned back to each
# object; a for loop over list(t_A, t_B, t_C) would only change copies.
drop_ci_abbrev <- function(tbl) {
  tbl$table_styling$abbreviation <-
    tbl$table_styling$abbreviation |>
    dplyr::filter(column != "conf.low")
  tbl
}
t_A <- drop_ci_abbrev(t_A)
t_B <- drop_ci_abbrev(t_B)
t_C <- drop_ci_abbrev(t_C)

tbl_merge(
  tbls = list(t_A, t_B, t_C),
  tab_spanner = c(
    "**Model A: Demographic**",
    "**Model B: + SES**",
    "**Model C: Full**"
  )
) |>
  bold_labels() |>
  modify_caption("**Table 3. Logistic Regression: Predictors of Poor Self-Rated Health**") |>
  as_gt() |>
  tab_footnote(
    footnote = "OR = odds ratio; SE = standard error. Survey-weighted quasibinomial logistic regression. ***p < 0.001; **p < 0.01; *p < 0.05.",
    locations = cells_title()
  )
```

| Characteristic | Model A: Demographic | | Model B: + SES | | Model C: Full | |
|---|---|---|---|---|---|---|
| | OR¹ | SE | OR¹ | SE | OR¹ | SE |
| Nativity | | | | | | |
| Foreign born | 0.73* | 0.126 | 0.60*** | 0.136 | 0.78 | 0.140 |
| Birth cohort | | | | | | |
| Millennial | 0.60*** | 0.095 | 0.54*** | 0.096 | 0.73** | 0.104 |
| Sex | | | | | | |
| Male | 0.84* | 0.081 | 0.84* | 0.082 | 0.86 | 0.083 |
| Race/ethnicity | | | | | | |
| Hispanic | 1.80*** | 0.111 | 1.31* | 0.129 | 1.41** | 0.131 |
| Non-Hispanic Black | 1.49*** | 0.107 | 1.07 | 0.108 | 0.98 | 0.124 |
| Non-Hispanic ANAI | 2.69*** | 0.287 | 1.79 | 0.303 | 1.73 | 0.287 |
| Non-Hispanic Asian | 0.72 | 0.255 | 1.24 | 0.258 | 1.21 | 0.267 |
| Non-Hispanic other | 0.97 | 0.277 | 1.15 | 0.281 | 1.27 | 0.288 |
| Education | | | | | | |
| High school | | | 0.71** | 0.119 | 0.74* | 0.123 |
| Some college | | | 0.58*** | 0.125 | 0.71** | 0.133 |
| BA or higher | | | 0.28*** | 0.166 | 0.42*** | 0.180 |
| Family income | | | | | | |
| $35,000 - $75,000 | | | 0.43*** | 0.103 | 0.47*** | 0.105 |
| $75,000 - $99,999 | | | 0.44*** | 0.204 | 0.46*** | 0.210 |
| $100,000 and over | | | 0.26*** | 0.235 | 0.28*** | 0.247 |
| Marital status | | | | | | |
| Marital dissolution | | | | | 1.39** | 0.119 |
| Never married | | | | | 1.03 | 0.111 |
| Smoking status | | | | | | |
| Former smoker | | | | | 1.22 | 0.147 |
| Current smoker | | | | | 1.97*** | 0.100 |
| Alcohol use | | | | | | |
| Lifetime abstainer | | | | | 1.44** | 0.125 |
| Infrequent drinker | | | | | 2.10*** | 0.151 |
| Former drinker | | | | | 2.23*** | 0.157 |
| Obesity | | | | | | |
| Obese | | | | | 1.82*** | 0.096 |
| Hypertension | | | | | | |
| Yes | | | | | 3.19*** | 0.100 |
| Census region | | | | | | |
| Midwest | | | | | 1.07 | 0.151 |
| West | | | | | 1.03 | 0.153 |
| South | | | | | 1.01 | 0.141 |
| ¹ *p<0.05; **p<0.01; ***p<0.001 | | | | | | |
| Abbreviations: OR = Odds Ratio, SE = Standard Error | | | | | | |
Spanning headers in tab_spanner support markdown (**bold**). All subsequent calls such as bold_labels() apply to the merged object.
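One R detail is worth keeping in mind when editing several table objects before merging them: a for loop over list(...) works on copies, so any change has to be returned and assigned back. A minimal base-R sketch with hypothetical stand-in objects (a, b, and set_zero are illustrative names, not gtsummary objects):

```r
a <- list(x = 1)
b <- list(x = 2)

# The loop variable is a copy: a and b are left untouched
for (obj in list(a, b)) {
  obj$x <- 0
}
a_after_loop <- a$x
a_after_loop

# Returning the modified object and assigning it back does stick
set_zero <- function(obj) {
  obj$x <- 0
  obj
}
a <- set_zero(a)
a$x
```

The same pattern applies to gtsummary tables: a helper that returns the edited table, followed by reassignment, changes the object that is later passed to tbl_merge().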
tbl_stack() stacks tables vertically under group headers — ideal for comparing an unweighted model to a survey-weighted model. The difference in confidence interval widths illustrates the design effect directly.
```r
# Unweighted model (plain glm)
m_unwt <- glm(
  poor_health ~ us_born + genx_mil + male + race_eth15 + edu4 + famincome,
  data = nhis_cc,
  family = binomial()
)

tbl_unwt <- tbl_regression(m_unwt, exponentiate = TRUE) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels()

# Survey-weighted model (already fitted as m_B)
tbl_wt <- tbl_regression(m_B, exponentiate = TRUE) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels()

tbl_stack(
  tbls = list(tbl_unwt, tbl_wt),
  group_header = c(
    "Unweighted (plain glm)",
    "Survey-Weighted (svyglm + quasibinomial)"
  )
) |>
  modify_caption("**Table 4. Unweighted vs. Survey-Weighted Logistic Regression**") |>
  as_gt() |>
  tab_footnote(
    footnote = "Models include: nativity, birth cohort, sex, race/ethnicity, education, and family income. OR = odds ratio; 95% CI in brackets.",
    locations = cells_title()
  )
```

| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| Unweighted (plain glm) | |||
| Nativity | |||
| Foreign born | 0.52 | 0.42, 0.63 | <0.001 |
| Birth cohort | |||
| Millennial | 0.56 | 0.49, 0.65 | <0.001 |
| Sex | |||
| Male | 0.84 | 0.74, 0.96 | 0.008 |
| Race/ethnicity | |||
| Hispanic | 1.20 | 0.99, 1.43 | 0.057 |
| Non-Hispanic Black | 1.18 | 0.99, 1.40 | 0.056 |
| Non-Hispanic ANAI | 2.39 | 1.50, 3.67 | <0.001 |
| Non-Hispanic Asian | 1.65 | 1.05, 2.48 | 0.022 |
| Non-Hispanic other | 1.38 | 0.84, 2.15 | 0.2 |
| Education | |||
| High school | 0.65 | 0.54, 0.78 | <0.001 |
| Some college | 0.48 | 0.40, 0.58 | <0.001 |
| BA or higher | 0.25 | 0.20, 0.32 | <0.001 |
| Family income | |||
| $35,000 - $75,000 | 0.42 | 0.36, 0.49 | <0.001 |
| $75,000 - $99,999 | 0.42 | 0.30, 0.57 | <0.001 |
| $100,000 and over | 0.27 | 0.20, 0.37 | <0.001 |
| Survey-Weighted (svyglm + quasibinomial) | |||
| Nativity | |||
| Foreign born | 0.60 | 0.46, 0.79 | <0.001 |
| Birth cohort | |||
| Millennial | 0.54 | 0.44, 0.65 | <0.001 |
| Sex | |||
| Male | 0.84 | 0.71, 0.98 | 0.030 |
| Race/ethnicity | |||
| Hispanic | 1.31 | 1.01, 1.69 | 0.038 |
| Non-Hispanic Black | 1.07 | 0.86, 1.32 | 0.5 |
| Non-Hispanic ANAI | 1.79 | 0.99, 3.25 | 0.054 |
| Non-Hispanic Asian | 1.24 | 0.75, 2.06 | 0.4 |
| Non-Hispanic other | 1.15 | 0.66, 2.00 | 0.6 |
| Education | |||
| High school | 0.71 | 0.56, 0.90 | 0.004 |
| Some college | 0.58 | 0.45, 0.74 | <0.001 |
| BA or higher | 0.28 | 0.20, 0.38 | <0.001 |
| Family income | |||
| $35,000 - $75,000 | 0.43 | 0.35, 0.52 | <0.001 |
| $75,000 - $99,999 | 0.44 | 0.30, 0.66 | <0.001 |
| $100,000 and over | 0.26 | 0.16, 0.41 | <0.001 |
| ¹ Models include: nativity, birth cohort, sex, race/ethnicity, education, and family income. OR = odds ratio; 95% CI in brackets. | |||
| ² Models include: nativity, birth cohort, sex, race/ethnicity, education, and family income. OR = odds ratio; 95% CI in brackets. | |||
| Abbreviations: CI = Confidence Interval, OR = Odds Ratio | |||
Notice how confidence intervals widen in the survey-weighted model. This reflects the design effect: clustering and unequal weights reduce the effective sample size and increase standard errors, and a plain glm() ignores both.
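The widening can be put into a single rough number by recovering the log-scale standard errors from the two intervals and taking the ratio of their squares. The sketch below does this for the Foreign born row of Table 4. It is only an approximation: the inputs are rounded, a normal critical value is assumed for both models, and the two point estimates differ, so the ratio is best read as an informal variance inflation rather than a formal design effect.

```r
# Rounded 95% CIs for "Foreign born", copied from Table 4
ci_unweighted <- c(0.42, 0.63)
ci_weighted   <- c(0.46, 0.79)

# Recover the standard error of log(OR) from a Wald-type interval
se_from_ci <- function(ci) diff(log(ci)) / (2 * qnorm(0.975))

se_unw <- se_from_ci(ci_unweighted)
se_wt  <- se_from_ci(ci_weighted)

# Ratio of variances: how far the design-based variance exceeds the naive one
variance_ratio <- (se_wt / se_unw)^2
round(variance_ratio, 2)
```

For this coefficient the ratio comes out at about 1.8, meaning the design-based variance is close to double what the unweighted model reports.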
5 Part 4: Exporting to Word with flextable
For manuscript submission, most journals require .docx files. as_flex_table() converts any gtsummary object to a flextable, which is then exported with save_as_docx().
```r
ft <- tbl_merge(
  tbls = list(t_A, t_B, t_C),
  tab_spanner = c(
    "**Model A: Demographic**",
    "**Model B: + SES**",
    "**Model C: Full**"
  )
) |>
  bold_labels() |>
  modify_caption("Table 3. Logistic Regression: Predictors of Poor Self-Rated Health") |>
  as_flex_table() |>
  set_table_properties(width = 1, layout = "autofit") |>
  fontsize(size = 10, part = "all") |>
  font(fontname = "Times New Roman", part = "all") |>
  add_footer_lines(
    "Note. OR = odds ratio; SE = standard error. Survey-weighted quasibinomial logistic regression. ***p < 0.001; **p < 0.01; *p < 0.05."
  )

ft
```
| | Model A: Demographic | | Model B: + SES | | Model C: Full | |
|---|---|---|---|---|---|---|
| Characteristic | OR¹ | SE | OR¹ | SE | OR¹ | SE |
| Nativity | | | | | | |
| Foreign born | 0.73* | 0.126 | 0.60*** | 0.136 | 0.78 | 0.140 |
| Birth cohort | | | | | | |
| Millennial | 0.60*** | 0.095 | 0.54*** | 0.096 | 0.73** | 0.104 |
| Sex | | | | | | |
| Male | 0.84* | 0.081 | 0.84* | 0.082 | 0.86 | 0.083 |
| Race/ethnicity | | | | | | |
| Hispanic | 1.80*** | 0.111 | 1.31* | 0.129 | 1.41** | 0.131 |
| Non-Hispanic Black | 1.49*** | 0.107 | 1.07 | 0.108 | 0.98 | 0.124 |
| Non-Hispanic ANAI | 2.69*** | 0.287 | 1.79 | 0.303 | 1.73 | 0.287 |
| Non-Hispanic Asian | 0.72 | 0.255 | 1.24 | 0.258 | 1.21 | 0.267 |
| Non-Hispanic other | 0.97 | 0.277 | 1.15 | 0.281 | 1.27 | 0.288 |
| Education | | | | | | |
| High school | | | 0.71** | 0.119 | 0.74* | 0.123 |
| Some college | | | 0.58*** | 0.125 | 0.71** | 0.133 |
| BA or higher | | | 0.28*** | 0.166 | 0.42*** | 0.180 |
| Family income | | | | | | |
| $35,000 - $75,000 | | | 0.43*** | 0.103 | 0.47*** | 0.105 |
| $75,000 - $99,999 | | | 0.44*** | 0.204 | 0.46*** | 0.210 |
| $100,000 and over | | | 0.26*** | 0.235 | 0.28*** | 0.247 |
| Marital status | | | | | | |
| Marital dissolution | | | | | 1.39** | 0.119 |
| Never married | | | | | 1.03 | 0.111 |
| Smoking status | | | | | | |
| Former smoker | | | | | 1.22 | 0.147 |
| Current smoker | | | | | 1.97*** | 0.100 |
| Alcohol use | | | | | | |
| Lifetime abstainer | | | | | 1.44** | 0.125 |
| Infrequent drinker | | | | | 2.10*** | 0.151 |
| Former drinker | | | | | 2.23*** | 0.157 |
| Obesity | | | | | | |
| Obese | | | | | 1.82*** | 0.096 |
| Hypertension | | | | | | |
| Yes | | | | | 3.19*** | 0.100 |
| Census region | | | | | | |
| Midwest | | | | | 1.07 | 0.151 |
| West | | | | | 1.03 | 0.153 |
| South | | | | | 1.01 | 0.141 |
| ¹ *p<0.05; **p<0.01; ***p<0.001 | | | | | | |
| Abbreviations: OR = Odds Ratio, SE = Standard Error | | | | | | |
| Note. OR = odds ratio; SE = standard error. Survey-weighted quasibinomial logistic regression. ***p < 0.001; **p < 0.01; *p < 0.05. | | | | | | |
```r
# Always save in its own separate chunk
save_as_docx(ft, path = "output/Table3_regression.docx")
```

save_as_docx() rule
Always call save_as_docx() in its own separate chunk. Sharing a chunk with table-building code causes Quarto to render the table inline and write the file simultaneously, producing duplicate output.
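Because save_as_docx() writes to whatever path it is given, it is safest to make sure the target folder exists before saving. A minimal sketch, assuming a temporary folder and a small stand-in table (ft_demo, out_dir, and out_file are hypothetical names; in the tutorial the table would be ft and the folder output/):

```r
library(flextable)

# Hypothetical stand-in table; in the tutorial this would be `ft`
ft_demo <- flextable(head(mtcars[, 1:3]))

# Assumed location for this sketch: a folder inside the session temp dir
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

out_file <- file.path(out_dir, "Table_demo.docx")
save_as_docx(ft_demo, path = out_file)

# Confirm that the file was written
file.exists(out_file)
```

Creating the folder in code keeps the export step reproducible on a fresh clone of the project, where output/ may not exist yet.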
You can also build a flextable directly from any data frame — without going through gtsummary — when you need full control over layout.
```r
summary_tbl <- nhis_cc |>
  group_by(us_born) |>
  summarise(
    N        = n(),
    Pct_poor = round(mean(poor_health) * 100, 1),
    .groups  = "drop"
  )

flextable(summary_tbl) |>
  set_header_labels(
    us_born  = "Nativity",
    N        = "N (unweighted)",
    Pct_poor = "% Poor Health"
  ) |>
  bold(part = "header") |>
  bg(bg = "#f0f0f0", part = "header") |>
  add_footer_lines("Source: NHIS teaching sample, n = 16,880 complete cases.") |>
  autofit() |>
  set_caption("Summary of Poor Health by Nativity")
```

| Nativity | N (unweighted) | % Poor Health |
|---|---|---|
| U.S. born | 13,292 | 6.9 |
| Foreign born | 3,588 | 5.8 |
| Source: NHIS teaching sample, n = 16,880 complete cases. | | |
6 Quick Reference
| Function | Purpose |
|---|---|
| tbl_summary() | Unweighted descriptive statistics |
| tbl_svysummary() | Survey-weighted descriptive statistics |
| tbl_regression() | Regression table from model object |
| tbl_merge() | Place tables side by side |
| tbl_stack() | Stack tables vertically |
| add_overall() | Add total/overall column |
| add_p() | Append p-values for group comparisons |
| type = list(var ~ 'continuous') | Display variable as mean (SD) |
| type = list(var ~ 'categorical') | Display variable as counts (%) |
| statistic = list(...) | Customize the displayed statistic |
| bold_labels() | Bold variable name rows |
| italicize_levels() | Italicize category rows |
| bold_p(t = 0.05) | Bold significant p-values |
| remove_row_type('reference') | Drop reference category rows |
| add_significance_stars() | Add \* \*\* \*\*\* to estimates |
| modify_header() | Rewrite column headers (use {n_unweighted} for counts in survey tables) |
| modify_caption() | Add or change table caption |
| show_header_names() | Print internal column name reference |
| as_gt() | Convert to gt for HTML rendering |
| as_flex_table() | Convert to flextable (for Word export) |
| options(gtsummary.use_ftExtra = TRUE) | Preserve markdown formatting in Word export |
| set_gtsummary_theme(...) | Apply a theme globally for the session |
| Task | Package | Function |
|---|---|---|
| Descriptive Table 1 | gtsummary | tbl_summary() / tbl_svysummary() |
| Single regression table | gtsummary | tbl_regression() |
| Multiple models side by side | gtsummary | tbl_merge() |
| Before/after comparison (stacked) | gtsummary | tbl_stack() |
| Word / .docx export | flextable | as_flex_table() + save_as_docx() |
| Custom standalone table | flextable | flextable() |