Overview

Producing clean, publication-ready tables is one of the most important — and most under-taught — skills in applied social science and public health research. This tutorial walks through a complete workflow using two complementary R packages:

Package	Strength	Best for
gtsummary	Automatic table building with survey support	Table 1, regression tables
flextable	Fine-grained formatting control	Word / .docx export

All examples use a 20,000-observation random sample of the National Health Interview Survey (NHIS), linked to the National Death Index through the NHIS Linked Mortality Files (LMF). The analysis illustrates a common demography and public health workflow: describing a sample, then modeling a binary health outcome with survey-weighted logistic regression.

What you will learn

How to build a complete Table 1 (descriptive statistics), with and without survey weights
How to control variable type summaries (continuous vs. categorical)
How to produce and format regression tables from svyglm() output
Advanced gtsummary techniques: removing reference rows, significance stars, managing footnotes, tbl_merge(), and tbl_stack()
How to export polished tables to Word via flextable

1 Data and Variables

The analytic sample contains 20,000 respondents drawn from NHIS waves collected between 1997 and 2018, with variables spanning demographics, socioeconomic status, health behaviors, and health outcomes.

Table 1: Variables in the Teaching Dataset

Variable	Label
us_born	U.S.-born vs. Foreign-born
genx_mil	Birth cohort (Generation X / Millennial)
age_cat	Age group
male	Sex
race_eth15	Race/ethnicity (6 categories)
edu4	Educational attainment (4 levels)
famincome	Family income category
marital_stat2	Marital status
region2	U.S. Census region
smoker	Current smoker
alcohol	Heavy alcohol use
obese	Obesity status
hyperten	Hypertension diagnosis
healthstatus	Self-rated health (Healthy / Poor)
poor_health	Poor self-rated health (0 = Healthy, 1 = Poor)
wgt	Mortality analysis weight (mortwtsa)
strata	Survey stratum
psu	Primary sampling unit

Survey design

NHIS uses a complex multistage probability sample. Correct inference requires declaring the survey design before any analysis:

Code

library(survey)
options(survey.lonely.psu = "adjust")

des <- svydesign(
  id      = ~psu,       # primary sampling unit
  strata  = ~strata,    # stratification variable
  weights = ~wgt,       # mortality analysis weight (mortwtsa)
  data    = nhis,
  nest    = TRUE        # PSUs nested within strata
)

Important

Always use nest = TRUE when PSU IDs are not unique across strata (which is the case with NHIS). Setting options(survey.lonely.psu = "adjust") prevents errors in strata with only one PSU.
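The effect of nest = TRUE can be seen on a toy data set. The sketch below is illustrative only: the data frame toy and all of its values are invented here and are not part of the NHIS sample.

```r
library(survey)

# Hypothetical toy data: PSU ids 1 and 2 are reused in both strata
toy <- data.frame(
  strata = rep(c("A", "B"), each = 4),
  psu    = rep(c(1, 2), times = 4),
  wgt    = rep(100, 8),
  y      = c(0, 1, 0, 0, 1, 1, 0, 1)
)

# Without nest = TRUE, svydesign() stops because the same PSU id
# appears in more than one stratum
no_nest <- try(
  svydesign(id = ~psu, strata = ~strata, weights = ~wgt, data = toy),
  silent = TRUE
)
inherits(no_nest, "try-error")

# With nest = TRUE, PSU ids are treated as unique within each stratum
toy_des <- svydesign(
  id = ~psu, strata = ~strata, weights = ~wgt,
  data = toy, nest = TRUE
)
svymean(~y, toy_des)
```

With nest = TRUE, PSU 1 in stratum A and PSU 1 in stratum B are handled as two different clusters, which is what the NHIS design variables require.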

Global gtsummary options

Two global options should be set once per session, before any table is built. They apply automatically to every subsequent tbl_summary(), tbl_svysummary(), and tbl_regression() call.

Code

options(gtsummary.use_ftExtra = TRUE)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

Why these two options matter

options(gtsummary.use_ftExtra = TRUE)

Activates the ftExtra backend for flextable rendering. By default, gtsummary uses a basic text renderer when converting tables to flextable via as_flex_table(). With ftExtra enabled, markdown in cell content (bold labels, italic levels, superscripts, footnote symbols) is preserved faithfully in the Word export. Without it, formatted text can appear as raw markdown syntax in the .docx file.

set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

Applies the compact theme globally. Compared to the default theme, compact reduces row padding and font size, producing tables that fit comfortably on a journal page. Once set, the theme persists across all tables built later in the session. Because theme_gtsummary_compact() already registers the theme when set_theme = TRUE (its default), the outer set_gtsummary_theme() call is redundant here but harmless; calling theme_gtsummary_compact() on its own has the same effect.

Both lines are already active in this tutorial’s hidden setup chunk.
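A theme set this way stays active until it is reset. The following is a minimal sketch, assuming a recent gtsummary release in which get_gtsummary_theme() and reset_gtsummary_theme() are available, of how to inspect and clear the active theme.

```r
library(gtsummary)

# Activate the compact theme (set_theme = TRUE is the default)
theme_gtsummary_compact()

# The active theme is stored as a named list of theme elements
active <- get_gtsummary_theme()
length(active) > 0

# Return to gtsummary's defaults, e.g. before building an unrelated table
reset_gtsummary_theme()
length(get_gtsummary_theme())
```

Resetting is mainly useful in long interactive sessions, where a theme set for one project would otherwise carry over to the next.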

2 Part 1: Descriptive Tables (Table 1)

Table 1 in a manuscript reports sample characteristics. The tabs below progress from the simplest approach to a fully stratified, weighted table ready for publication.

tbl_summary() works directly on a data frame — no survey design needed. Appropriate for convenience samples, or as a first pass before adding weights.

Code

nhis |>
  select(us_born, genx_mil, age_cat, male, race_eth15,
         edu4, famincome, healthstatus) |>
  tbl_summary(
    label = list(
      us_born      ~ "Nativity",
      genx_mil     ~ "Birth cohort",
      age_cat      ~ "Age group",
      male         ~ "Sex",
      race_eth15   ~ "Race/ethnicity",
      edu4         ~ "Education",
      famincome    ~ "Family income",
      healthstatus ~ "Self-rated health"
    ),
    missing = "no"
  ) |>
  bold_labels() |>
  modify_caption("**Table 1. Sample Characteristics (Unweighted)**") |>
  as_gt()

Table 1. Sample Characteristics (Unweighted)
Characteristic	N = 20,000¹
Nativity
U.S. born	15,664 (78%)
Foreign born	4,336 (22%)
Birth cohort
Generation X	13,117 (66%)
Millennial	6,883 (34%)
Age group
18-24	4,745 (24%)
25-34	8,430 (42%)
35-44	5,304 (27%)
45-55	1,521 (7.6%)
Sex
Female	10,956 (55%)
Male	9,044 (45%)
Race/ethnicity
Non-Hispanic white	11,119 (56%)
Hispanic	4,519 (23%)
Non-Hispanic Black	2,906 (15%)
Non-Hispanic ANAI	172 (0.9%)
Non-Hispanic Asian	790 (4.0%)
Non-Hispanic other	494 (2.5%)
Education
Less than HS	2,482 (12%)
High school	5,340 (27%)
Some college	6,635 (33%)
BA or higher	5,497 (28%)
Family income
Less than $35,000	8,338 (45%)
$35,000 - $75,000	6,666 (36%)
$75,000 - $99,999	1,350 (7.3%)
$100,000 and over	2,184 (12%)
Self-rated health
Healthy	18,580 (93%)
Poor	1,420 (7.1%)
¹ n (%)

tbl_svysummary() accepts the svydesign object and weights all statistics automatically. Proportions and means can differ substantially from the unweighted version when survey weights correct for unequal selection probabilities.

Code

des |>
  tbl_svysummary(
    include = c(us_born, genx_mil, age_cat, male, race_eth15,
                edu4, famincome, healthstatus),
    label = list(
      us_born      ~ "Nativity",
      genx_mil     ~ "Birth cohort",
      age_cat      ~ "Age group",
      male         ~ "Sex",
      race_eth15   ~ "Race/ethnicity",
      edu4         ~ "Education",
      famincome    ~ "Family income",
      healthstatus ~ "Self-rated health"
    ),
    missing = "no"
  ) |>
  bold_labels() |>
  modify_caption("**Table 1. Sample Characteristics (Survey-Weighted)**") |>
  as_gt()

Table 1. Sample Characteristics (Survey-Weighted)
Characteristic	N = 20,726¹
Nativity
U.S. born	16,683 (80%)
Foreign born	4,042 (20%)
Birth cohort
Generation X	12,794 (62%)
Millennial	7,931 (38%)
Age group
18-24	5,647 (27%)
25-34	7,962 (38%)
35-44	5,418 (26%)
45-55	1,698 (8.2%)
Sex
Female	10,499 (51%)
Male	10,226 (49%)
Race/ethnicity
Non-Hispanic white	12,701 (61%)
Hispanic	3,896 (19%)
Non-Hispanic Black	2,682 (13%)
Non-Hispanic ANAI	185 (0.9%)
Non-Hispanic Asian	787 (3.8%)
Non-Hispanic other	473 (2.3%)
Education
Less than HS	2,350 (11%)
High school	5,702 (28%)
Some college	6,894 (33%)
BA or higher	5,725 (28%)
Family income
Less than $35,000	6,654 (35%)
$35,000 - $75,000	7,447 (39%)
$75,000 - $99,999	1,693 (8.9%)
$100,000 and over	3,164 (17%)
Self-rated health
Healthy	19,386 (94%)
Poor	1,339 (6.5%)
¹ n (%)
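To see why the weighted and unweighted columns can disagree, consider a toy example. The data and weights below are invented for illustration and are not drawn from NHIS: respondents in poor health are oversampled, so each carries a smaller weight.

```r
library(survey)

# Hypothetical toy data: the three respondents in poor health were
# oversampled and therefore have smaller weights
toy <- data.frame(
  id   = 1:6,
  poor = c(1, 1, 1, 0, 0, 0),
  wgt  = c(1, 1, 1, 3, 3, 3)
)
toy_des <- svydesign(id = ~id, weights = ~wgt, data = toy)

unweighted <- mean(toy$poor)                         # 3 / 6
weighted   <- unname(coef(svymean(~poor, toy_des)))  # 3 / (3 + 9)

c(unweighted = unweighted, weighted = weighted)
```

The unweighted share is 0.50, while the weighted share is 0.25, because the weights restore the oversampled group to its population proportion.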

Add by = to compare groups side by side. Chain add_overall() for a total column and add_p() for group-comparison p-values. Use modify_header() with {n_unweighted} (the unweighted count) to show interpretable sample sizes in column headers.

Code

tbl1 <- des |>
  tbl_svysummary(
    by = us_born,
    include = c(genx_mil, age_cat, male, race_eth15,
                edu4, famincome, smoker, alcohol, obese,
                hyperten, healthstatus),
    label = list(
      genx_mil     ~ "Birth cohort",
      age_cat      ~ "Age group",
      male         ~ "Sex",
      race_eth15   ~ "Race/ethnicity",
      edu4         ~ "Education",
      famincome    ~ "Family income",
      smoker       ~ "Current smoker",
      alcohol      ~ "Heavy alcohol use",
      obese        ~ "Obese",
      hyperten     ~ "Hypertension",
      healthstatus ~ "Self-rated health"
    ),
    missing = "no"
  ) |>
  add_overall(last = FALSE) |>
  add_p() |>  # categorical variables are compared with the Rao-Scott chi-squared test by default
  bold_labels() |>
  italicize_levels() |>
  bold_p(t = 0.05) |>
  modify_header(
    stat_0 ~ "**Overall**\nn = {n_unweighted}",
    stat_1 ~ "**U.S.-Born**\nn = {n_unweighted}",
    stat_2 ~ "**Foreign-Born**\nn = {n_unweighted}"
  ) |>
  modify_caption("**Table 1. Sample Characteristics by Nativity**")

tbl1 |> as_gt()

Table 1. Sample Characteristics by Nativity
Characteristic	Overall n = 20000¹	U.S.-Born n = 15664¹	Foreign-Born n = 4336¹	p-value²
Birth cohort				<0.001
Generation X	12,794 (62%)	10,058 (60%)	2,736 (68%)
Millennial	7,931 (38%)	6,626 (40%)	1,306 (32%)
Age group				<0.001
18-24	5,647 (27%)	4,921 (29%)	726 (18%)
25-34	7,962 (38%)	6,304 (38%)	1,659 (41%)
35-44	5,418 (26%)	4,136 (25%)	1,282 (32%)
45-55	1,698 (8.2%)	1,323 (7.9%)	375 (9.3%)
Sex				0.10
Female	10,499 (51%)	8,510 (51%)	1,990 (49%)
Male	10,226 (49%)	8,174 (49%)	2,052 (51%)
Race/ethnicity				<0.001
Non-Hispanic white	12,701 (61%)	12,080 (72%)	622 (15%)
Hispanic	3,896 (19%)	1,717 (10%)	2,179 (54%)
Non-Hispanic Black	2,682 (13%)	2,332 (14%)	350 (8.7%)
Non-Hispanic ANAI	185 (0.9%)	179 (1.1%)	6 (0.2%)
Non-Hispanic Asian	787 (3.8%)	189 (1.1%)	599 (15%)
Non-Hispanic other	473 (2.3%)	187 (1.1%)	286 (7.1%)
Education				<0.001
Less than HS	2,350 (11%)	1,335 (8.0%)	1,015 (25%)
High school	5,702 (28%)	4,725 (28%)	977 (24%)
Some college	6,894 (33%)	6,005 (36%)	889 (22%)
BA or higher	5,725 (28%)	4,599 (28%)	1,126 (28%)
Family income				<0.001
Less than $35,000	6,654 (35%)	5,098 (33%)	1,556 (43%)
$35,000 - $75,000	7,447 (39%)	6,093 (40%)	1,354 (37%)
$75,000 - $99,999	1,693 (8.9%)	1,451 (9.5%)	242 (6.6%)
$100,000 and over	3,164 (17%)	2,668 (17%)	496 (14%)
Current smoker				<0.001
Never smoked	13,761 (66%)	10,518 (63%)	3,243 (80%)
Former smoker	2,553 (12%)	2,190 (13%)	363 (9.0%)
Current smoker	4,383 (21%)	3,950 (24%)	433 (11%)
Heavy alcohol use				<0.001
Current drinker	14,162 (68%)	11,972 (72%)	2,190 (54%)
Lifetime abstainer	4,659 (22%)	3,132 (19%)	1,527 (38%)
Infrequent drinker	1,054 (5.1%)	866 (5.2%)	188 (4.6%)
Former drinker	851 (4.1%)	713 (4.3%)	137 (3.4%)
Obese				<0.001
Not obese	14,391 (76%)	11,374 (75%)	3,016 (82%)
Obese	4,527 (24%)	3,865 (25%)	662 (18%)
Hypertension	2,437 (12%)	2,105 (13%)	332 (8.2%)	<0.001
Self-rated health				0.3
Healthy	19,386 (94%)	15,585 (93%)	3,801 (94%)
Poor	1,339 (6.5%)	1,099 (6.6%)	241 (6.0%)
¹ n (%)
² Pearson’s X^2: Rao & Scott adjustment

Header placeholder guide for tbl_svysummary

Placeholder	Returns	Use when
`{n_unweighted}`	Unweighted count per group	Column headers (most readable)
`{N_unweighted}`	Total unweighted N	Overall column header
`{n}`	Sum of weights per group	Reporting weighted N
`{N}`	Total sum of weights	Rarely useful in headers

Both {n} and {N} return decimals in weighted surveys — use {n_unweighted} for clean integer counts in column headers.
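The difference between the two kinds of counts can be checked directly with the survey package. The sketch below uses a small invented data frame (toy, with made-up weights) rather than the NHIS sample.

```r
library(survey)

# Hypothetical toy data with non-integer weights
toy <- data.frame(
  id      = 1:5,
  us_born = c("U.S. born", "U.S. born", "U.S. born",
              "Foreign born", "Foreign born"),
  wgt     = c(1.5, 2.25, 0.75, 1.2, 0.9)
)
toy_des <- svydesign(id = ~id, weights = ~wgt, data = toy)

# Unweighted counts per group: the quantity {n_unweighted} reports
n_unw <- table(toy$us_born)

# Sum of weights per group: the quantity {n} reports
n_wtd <- svytable(~us_born, toy_des)

n_unw
n_wtd
```

The unweighted counts are whole numbers (2 and 3), whereas the weighted counts are sums of weights (2.1 and 4.5), which is why they look odd in a column header.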

2.1 Controlling Variable Type Summaries

By default, gtsummary guesses whether each variable is categorical or continuous. You can override this with the type = argument, and customize the displayed statistics with statistic =.

Code

nhis |>
  select(age_cat, edu4, poor_health) |>
  tbl_summary(missing = "no") |>
  bold_labels() |>
  as_gt()

Table 2

Characteristic	N = 20,000¹
Age group
18-24	4,745 (24%)
25-34	8,430 (42%)
35-44	5,304 (27%)
45-55	1,521 (7.6%)
Education
Less than HS	2,482 (12%)
High school	5,340 (27%)
Some college	6,635 (33%)
BA or higher	5,497 (28%)
Poor health	1,420 (7.1%)
¹ n (%)

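Use type = list(var ~ "continuous") to summarize a variable as continuous instead of showing counts; the statistic = argument in the code below then requests mean (SD). This is useful for numeric variables with only a few distinct values (e.g., a 0/1 indicator or a short integer index), which gtsummary would otherwise treat as categorical.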

Code

nhis |>
  select(age_cat, edu4, poor_health) |>
  tbl_summary(
    type    = list(poor_health ~ "continuous"),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    missing = "no"
  ) |>
  bold_labels() |>
  as_gt()

Table 3

Characteristic	N = 20,000¹
Age group
18-24	4,745 (24%)
25-34	8,430 (42%)
35-44	5,304 (27%)
45-55	1,521 (7.6%)
Education
Less than HS	2,482 (12%)
High school	5,340 (27%)
Some college	6,635 (33%)
BA or higher	5,497 (28%)
Poor health	0 (0)
¹ n (%); Mean (SD)
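The "0 (0)" in the Poor health row is a rounding effect: the mean of a 0/1 indicator is a proportion below 1, and it is being displayed without decimals. A hedged sketch of one way to address this with the digits = argument follows; the data frame toy is invented for illustration.

```r
library(gtsummary)

# Hypothetical 0/1 indicator standing in for poor_health (25% ones)
toy <- data.frame(poor_health = rep(c(1, 0, 0, 0), times = 5))

tbl_digits <- toy |>
  tbl_summary(
    type      = list(poor_health ~ "continuous"),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    # two decimals for the mean and two for the SD
    digits    = list(poor_health ~ c(2, 2))
  )

# The formatted statistic is stored in the stat_0 column
tbl_digits$table_body$stat_0
```

With two decimals the mean is shown as a proportion rather than being rounded to a whole number.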

Use type = list(var ~ "categorical") to show counts and percentages for a numeric variable (e.g., an integer 0/1 outcome where you want both values displayed explicitly).

Code

nhis |>
  select(age_cat, edu4, poor_health) |>
  tbl_summary(
    type    = list(poor_health ~ "categorical"),
    missing = "no"
  ) |>
  bold_labels() |>
  as_gt()

Table 4

Characteristic	N = 20,000¹
Age group
18-24	4,745 (24%)
25-34	8,430 (42%)
35-44	5,304 (27%)
45-55	1,521 (7.6%)
Education
Less than HS	2,482 (12%)
High school	5,340 (27%)
Some college	6,635 (33%)
BA or higher	5,497 (28%)
Poor health
0	18,580 (93%)
1	1,420 (7.1%)
¹ n (%)

Key gtsummary verbs for Table 1

Function	Purpose
`tbl_svysummary(by = ...)`	Stratify columns by a grouping variable
`add_overall()`	Append an overall (unstratified) column
`add_p()`	Add p-values for group comparisons
`type = list(var ~ "continuous")`	Summarize a variable as continuous (pair with `statistic =` for mean (SD))
`type = list(var ~ "categorical")`	Force a variable to display counts/%
`statistic = list(...)`	Customize the displayed summary statistic
`bold_labels()`	Bold the variable name rows
`italicize_levels()`	Italicize the category rows
`bold_p(t = 0.05)`	Bold significant p-values
`modify_header()`	Rewrite any column header (use `{n_unweighted}` for unweighted counts in survey tables)
`modify_caption()`	Add or change table caption

3 Part 2: Regression Tables

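We model poor self-rated health (binary: 1 = Poor, 0 = Healthy) using survey-weighted logistic regression via svyglm(). The quasibinomial() family yields the same point estimates as binomial() while avoiding the warnings about non-integer counts that survey weights would otherwise trigger.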

Code

# Model A: Demographic only
m_A <- svyglm(
  poor_health ~ us_born + genx_mil + male + race_eth15,
  design = des_cc,
  family  = quasibinomial()
)

# Model B: + Socioeconomic status
m_B <- svyglm(
  poor_health ~ us_born + genx_mil + male + race_eth15 +
                edu4 + famincome,
  design = des_cc,
  family  = quasibinomial()
)

# Model C: Full model
m_C <- svyglm(
  poor_health ~ us_born + genx_mil + male + race_eth15 +
                edu4 + famincome + marital_stat2 +
                smoker + alcohol + obese + hyperten + region2,
  design = des_cc,
  family  = quasibinomial()
)
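The models above use des_cc, and later sections use nhis_cc; neither object is defined in the code shown in this tutorial. They appear to be complete-case versions of des and nhis. The sketch below shows one way such objects could be built, using simulated stand-in data; every object name ending in _toy is hypothetical.

```r
library(survey)

set.seed(1)
# Hypothetical stand-in for the tutorial's nhis data frame
nhis_toy <- data.frame(
  strata      = rep(1:5, each = 40),
  psu         = rep(rep(1:2, each = 20), times = 5),
  wgt         = runif(200, 500, 1500),
  poor_health = rbinom(200, 1, 0.1),
  famincome   = sample(c("low", "high", NA), 200,
                       replace = TRUE, prob = c(0.45, 0.45, 0.10))
)

des_toy <- svydesign(
  id = ~psu, strata = ~strata, weights = ~wgt,
  data = nhis_toy, nest = TRUE
)

# Flag rows with no missing values on the model variables
model_vars <- c("poor_health", "famincome")
keep <- complete.cases(nhis_toy[, model_vars])

# Complete-case data frame (for plain glm) and design (for svyglm)
nhis_cc_toy <- nhis_toy[keep, ]
des_cc_toy  <- subset(des_toy, keep)
```

Subsetting the design object, rather than filtering the data before calling svydesign(), keeps the original stratum and PSU structure attached to the retained rows.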

The four tabs below demonstrate progressively more refined formatting, starting from the default output and building toward a publication-ready table.

tbl_regression() converts any model object into a formatted table. Set exponentiate = TRUE to display odds ratios instead of log-odds.

Code

tbl_regression(m_A, exponentiate = TRUE) |>
  bold_labels() |>
  as_gt()

Table 5

Characteristic	OR	95% CI	p-value
Nativity
U.S. born	—	—
Foreign born	0.73	0.57, 0.93	0.012
Birth cohort
Generation X	—	—
Millennial	0.60	0.50, 0.73	<0.001
Sex
Female	—	—
Male	0.84	0.71, 0.98	0.027
Race/ethnicity
Non-Hispanic white	—	—
Hispanic	1.80	1.45, 2.23	<0.001
Non-Hispanic Black	1.49	1.21, 1.84	<0.001
Non-Hispanic ANAI	2.69	1.53, 4.71	<0.001
Non-Hispanic Asian	0.72	0.44, 1.19	0.2
Non-Hispanic other	0.97	0.56, 1.67	>0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
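The OR column produced by exponentiate = TRUE is the exponentiated model coefficient. The sketch below reproduces that calculation by hand on simulated data with a plain glm(); it uses Wald intervals from confint.default(), which is an assumption made for simplicity, since the interval method used by tbl_regression() can differ by model class.

```r
set.seed(42)

# Simulated data: a binary predictor x and a binary outcome y
d <- data.frame(x = rbinom(500, 1, 0.5))
d$y <- rbinom(500, 1, plogis(-1 + 0.7 * d$x))

fit <- glm(y ~ x, data = d, family = binomial())

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
or <- exp(coef(fit))

# Wald confidence limits, exponentiated to the odds-ratio scale
ci <- exp(confint.default(fit))

cbind(OR = or, ci)
```

An odds ratio below 1 corresponds to a negative log-odds coefficient, and a confidence interval that excludes 1 corresponds to a coefficient interval that excludes 0.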

By default, gtsummary shows a reference row for each categorical predictor. These rows add visual clutter. The first table below keeps them for comparison; the second adds a single call to remove_row_type(type = "reference") to drop them.

Code

tbl_regression(m_A, exponentiate = TRUE) |>
  bold_labels() |>
  italicize_levels() |>
  as_gt()

Table 6

Characteristic	OR	95% CI	p-value
Nativity
U.S. born	—	—
Foreign born	0.73	0.57, 0.93	0.012
Birth cohort
Generation X	—	—
Millennial	0.60	0.50, 0.73	<0.001
Sex
Female	—	—
Male	0.84	0.71, 0.98	0.027
Race/ethnicity
Non-Hispanic white	—	—
Hispanic	1.80	1.45, 2.23	<0.001
Non-Hispanic Black	1.49	1.21, 1.84	<0.001
Non-Hispanic ANAI	2.69	1.53, 4.71	<0.001
Non-Hispanic Asian	0.72	0.44, 1.19	0.2
Non-Hispanic other	0.97	0.56, 1.67	>0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Code

tbl_regression(m_A, exponentiate = TRUE) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels() |>
  as_gt()

Table 7

Characteristic	OR	95% CI	p-value
Nativity
Foreign born	0.73	0.57, 0.93	0.012
Birth cohort
Millennial	0.60	0.50, 0.73	<0.001
Sex
Male	0.84	0.71, 0.98	0.027
Race/ethnicity
Hispanic	1.80	1.45, 2.23	<0.001
Non-Hispanic Black	1.49	1.21, 1.84	<0.001
Non-Hispanic ANAI	2.69	1.53, 4.71	<0.001
Non-Hispanic Asian	0.72	0.44, 1.19	0.2
Non-Hispanic other	0.97	0.56, 1.67	>0.9
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Some journals prefer estimates flagged with significance stars and reported alongside standard errors, rather than a confidence interval column. Set conf.int = FALSE in tbl_regression(), then call add_significance_stars().

Code

tbl_regression(
  m_A,
  exponentiate = TRUE,
  conf.int     = FALSE
) |>
  add_significance_stars(
    hide_ci  = TRUE,
    hide_se  = FALSE,
    pattern  = "{estimate}{stars}"
  ) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels() |>
  modify_header(estimate ~ "**OR**", std.error ~ "**SE**") |>
  modify_caption("**Table 2. Model A: Demographic Predictors of Poor Health**") |>
  as_gt()

Table 8: Table 2. Model A: Demographic Predictors of Poor Health

Characteristic	OR¹	SE
Nativity
Foreign born	0.73*	0.126
Birth cohort
Millennial	0.60***	0.095
Sex
Male	0.84*	0.081
Race/ethnicity
Hispanic	1.80***	0.111
Non-Hispanic Black	1.49***	0.107
Non-Hispanic ANAI	2.69***	0.287
Non-Hispanic Asian	0.72	0.255
Non-Hispanic other	0.97	0.277
¹ *p<0.05; **p<0.01; ***p<0.001
Abbreviations: OR = Odds Ratio, SE = Standard Error

Note

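pattern = "{estimate}{stars}" places the stars immediately after the estimate. Use "{estimate} {stars}" to add a space. hide_ci = TRUE drops the CI columns entirely. Note that exponentiate = TRUE transforms the estimate but not the standard error, so the SE column remains on the log-odds scale.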
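The stars follow the thresholds listed in the table footnote (0.05, 0.01, 0.001). As a rough illustration of that mapping, the helper below, p_to_stars(), is a hypothetical function written for this example and is not part of gtsummary.

```r
# Hypothetical helper: map p-values to the conventional star labels
p_to_stars <- function(p) {
  ifelse(p < 0.001, "***",
  ifelse(p < 0.01,  "**",
  ifelse(p < 0.05,  "*", "")))
}

# p-values resembling those in the Model A table
# (0.0004 stands in for a value reported as "<0.001")
p_to_stars(c(0.012, 0.0004, 0.027, 0.2))
# → "*"   "***" "*"   ""
```

This matches the pattern in the table: one star for nativity and sex, three for birth cohort, and none for the Non-Hispanic Asian category.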

When you drop the CI columns, the default footnote defining “CI” becomes orphaned. Remove it by filtering the internal table_styling$abbreviation tibble directly.

Code

tbl_no_ci <- tbl_regression(
  m_B,
  exponentiate = TRUE,
  conf.int     = FALSE
) |>
  add_significance_stars(
    hide_ci  = TRUE,
    hide_se  = FALSE,
    pattern  = "{estimate}{stars}"
  ) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels() |>
  modify_header(estimate ~ "**OR**", std.error ~ "**SE**")

# Surgically remove the orphaned CI footnote
tbl_no_ci$table_styling$abbreviation <-
  tbl_no_ci$table_styling$abbreviation |>
  dplyr::filter(column != "conf.low")

tbl_no_ci |>
  modify_caption("**Table 2. Model B: Demographic + SES Predictors**") |>
  as_gt()

Table 9: Table 2. Model B: Demographic + SES Predictors

Characteristic	OR¹	SE
Nativity
Foreign born	0.60***	0.136
Birth cohort
Millennial	0.54***	0.096
Sex
Male	0.84*	0.082
Race/ethnicity
Hispanic	1.31*	0.129
Non-Hispanic Black	1.07	0.108
Non-Hispanic ANAI	1.79	0.303
Non-Hispanic Asian	1.24	0.258
Non-Hispanic other	1.15	0.281
Education
High school	0.71**	0.119
Some college	0.58***	0.125
BA or higher	0.28***	0.166
Family income
$35,000 - $75,000	0.43***	0.103
$75,000 - $99,999	0.44***	0.204
$100,000 and over	0.26***	0.235
¹ *p<0.05; **p<0.01; ***p<0.001
Abbreviations: OR = Odds Ratio, SE = Standard Error

Before customizing column headers with modify_header(), use show_header_names() to print the exact internal column names gtsummary assigns. This eliminates guesswork.

Code

tbl_regression(m_A, exponentiate = TRUE) |>
  show_header_names()

#> Column Name   Header                 N*             N_event*       
#> label         "**Characteristic**"   17,295 <dbl>   1,040 <dbl>    
#> estimate      "**OR**"               17,295 <dbl>   1,040 <dbl>    
#> conf.low      "**95% CI**"           17,295 <dbl>   1,040 <dbl>    
#> p.value       "**p-value**"          17,295 <dbl>   1,040 <dbl>

"estimate" = OR column, "conf.low" / "conf.high" = CI bounds, "std.error" = SE. Pass these strings to modify_header().

4 Part 3: Combining Tables

tbl_merge() places multiple model tables side by side under optional spanning headers — the standard multi-model format for journal manuscripts.

Code

# Helper: builds a consistently formatted model table
make_tbl <- function(model) {
  tbl_regression(
    model,
    exponentiate = TRUE,
    conf.int     = FALSE
  ) |>
    add_significance_stars(
      hide_ci  = TRUE,
      hide_se  = FALSE,
      pattern  = "{estimate}{stars}"
    ) |>
    remove_row_type(type = "reference") |>
    bold_labels() |>
    italicize_levels() |>
    modify_header(estimate ~ "**OR**", std.error ~ "**SE**")
}

t_A <- make_tbl(m_A)
t_B <- make_tbl(m_B)
t_C <- make_tbl(m_C)

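# Remove orphaned CI abbreviations.
# A for loop over list(t_A, t_B, t_C) would only modify copies,
# so apply a helper and reassign each table.
drop_ci_abbrev <- function(tbl_obj) {
  tbl_obj$table_styling$abbreviation <-
    tbl_obj$table_styling$abbreviation |>
    dplyr::filter(column != "conf.low")
  tbl_obj
}

t_A <- drop_ci_abbrev(t_A)
t_B <- drop_ci_abbrev(t_B)
t_C <- drop_ci_abbrev(t_C)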

tbl_merge(
  tbls      = list(t_A, t_B, t_C),
  tab_spanner = c(
    "**Model A: Demographic**",
    "**Model B: + SES**",
    "**Model C: Full**"
  )
) |>
  bold_labels() |>
  modify_caption("**Table 3. Logistic Regression: Predictors of Poor Self-Rated Health**") |>
  as_gt() |>
  tab_footnote(
    footnote = "OR = odds ratio; SE = standard error. Survey-weighted quasibinomial logistic regression. ***p < 0.001; **p < 0.01; *p < 0.05.",
    locations = cells_title(groups = "title")  # the default targets title and subtitle
  )

Table 10: Table 3. Logistic Regression: Predictors of Poor Self-Rated Health

Characteristic	Model A: Demographic		Model B: + SES		Model C: Full
Characteristic	OR¹	SE	OR¹	SE	OR¹	SE
Nativity
Foreign born	0.73*	0.126	0.60***	0.136	0.78	0.140
Birth cohort
Millennial	0.60***	0.095	0.54***	0.096	0.73**	0.104
Sex
Male	0.84*	0.081	0.84*	0.082	0.86	0.083
Race/ethnicity
Hispanic	1.80***	0.111	1.31*	0.129	1.41**	0.131
Non-Hispanic Black	1.49***	0.107	1.07	0.108	0.98	0.124
Non-Hispanic ANAI	2.69***	0.287	1.79	0.303	1.73	0.287
Non-Hispanic Asian	0.72	0.255	1.24	0.258	1.21	0.267
Non-Hispanic other	0.97	0.277	1.15	0.281	1.27	0.288
Education
High school			0.71**	0.119	0.74*	0.123
Some college			0.58***	0.125	0.71**	0.133
BA or higher			0.28***	0.166	0.42***	0.180
Family income
$35,000 - $75,000			0.43***	0.103	0.47***	0.105
$75,000 - $99,999			0.44***	0.204	0.46***	0.210
$100,000 and over			0.26***	0.235	0.28***	0.247
Marital status
Marital dissolution					1.39**	0.119
Never married					1.03	0.111
Smoking status
Former smoker					1.22	0.147
Current smoker					1.97***	0.100
Alcohol use
Lifetime abstainer					1.44**	0.125
Infrequent drinker					2.10***	0.151
Former drinker					2.23***	0.157
Obesity
Obese					1.82***	0.096
Hypertension
Yes					3.19***	0.100
Census region
Midwest					1.07	0.151
West					1.03	0.153
South					1.01	0.141
¹ *p<0.05; **p<0.01; ***p<0.001
Abbreviations: OR = Odds Ratio, SE = Standard Error

Tip

Spanning headers in tab_spanner support markdown (**bold**). All subsequent calls such as bold_labels() apply to the merged object.

tbl_stack() stacks tables vertically under group headers — ideal for comparing an unweighted model to a survey-weighted model. The difference in confidence interval widths illustrates the design effect directly.

Code

# Unweighted model (plain glm)
m_unwt <- glm(
  poor_health ~ us_born + genx_mil + male + race_eth15 + edu4 + famincome,
  data   = nhis_cc,
  family = binomial()
)

tbl_unwt <- tbl_regression(m_unwt, exponentiate = TRUE) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels()

# Survey-weighted model (already fitted as m_B)
tbl_wt <- tbl_regression(m_B, exponentiate = TRUE) |>
  remove_row_type(type = "reference") |>
  bold_labels() |>
  italicize_levels()

tbl_stack(
  tbls         = list(tbl_unwt, tbl_wt),
  group_header = c(
    "Unweighted (plain glm)",
    "Survey-Weighted (svyglm + quasibinomial)"
  )
) |>
  modify_caption("**Table 4. Unweighted vs. Survey-Weighted Logistic Regression**") |>
  as_gt() |>
  tab_footnote(
    footnote = "Models include: nativity, birth cohort, sex, race/ethnicity, education, and family income. OR = odds ratio; 95% CI in brackets.",
    locations = cells_title(groups = "title")  # the default targets title and subtitle, duplicating the note
  )

Table 11: Table 4. Unweighted vs. Survey-Weighted Logistic Regression

Characteristic	OR	95% CI	p-value
Unweighted (plain glm)
Nativity
Foreign born	0.52	0.42, 0.63	<0.001
Birth cohort
Millennial	0.56	0.49, 0.65	<0.001
Sex
Male	0.84	0.74, 0.96	0.008
Race/ethnicity
Hispanic	1.20	0.99, 1.43	0.057
Non-Hispanic Black	1.18	0.99, 1.40	0.056
Non-Hispanic ANAI	2.39	1.50, 3.67	<0.001
Non-Hispanic Asian	1.65	1.05, 2.48	0.022
Non-Hispanic other	1.38	0.84, 2.15	0.2
Education
High school	0.65	0.54, 0.78	<0.001
Some college	0.48	0.40, 0.58	<0.001
BA or higher	0.25	0.20, 0.32	<0.001
Family income
$35,000 - $75,000	0.42	0.36, 0.49	<0.001
$75,000 - $99,999	0.42	0.30, 0.57	<0.001
$100,000 and over	0.27	0.20, 0.37	<0.001
Survey-Weighted (svyglm + quasibinomial)
Nativity
Foreign born	0.60	0.46, 0.79	<0.001
Birth cohort
Millennial	0.54	0.44, 0.65	<0.001
Sex
Male	0.84	0.71, 0.98	0.030
Race/ethnicity
Hispanic	1.31	1.01, 1.69	0.038
Non-Hispanic Black	1.07	0.86, 1.32	0.5
Non-Hispanic ANAI	1.79	0.99, 3.25	0.054
Non-Hispanic Asian	1.24	0.75, 2.06	0.4
Non-Hispanic other	1.15	0.66, 2.00	0.6
Education
High school	0.71	0.56, 0.90	0.004
Some college	0.58	0.45, 0.74	<0.001
BA or higher	0.28	0.20, 0.38	<0.001
Family income
$35,000 - $75,000	0.43	0.35, 0.52	<0.001
$75,000 - $99,999	0.44	0.30, 0.66	<0.001
$100,000 and over	0.26	0.16, 0.41	<0.001
¹ Models include: nativity, birth cohort, sex, race/ethnicity, education, and family income. OR = odds ratio; 95% CI in brackets.
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Note

Notice how confidence intervals widen in the survey-weighted model. This reflects the design effect: clustering and unequal weights reduce the effective sample size, increasing standard errors (stratification on its own usually improves precision).
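The design effect can also be estimated directly. The sketch below simulates a small clustered sample (all values invented, not NHIS data) and asks svymean() for the design effect of a proportion.

```r
library(survey)

set.seed(2024)
n_psu <- 40   # number of clusters
m     <- 25   # respondents per cluster

# Hypothetical clustered sample: 20 strata with 2 PSUs each
toy <- data.frame(
  strata = rep(1:20, each = 2 * m),
  psu    = rep(1:n_psu, each = m),
  wgt    = rep(runif(n_psu, 500, 2000), each = m)
)

# A shared cluster-level effect makes respondents in a PSU resemble each other
cluster_effect <- rep(rnorm(n_psu, 0, 0.8), each = m)
toy$poor <- rbinom(nrow(toy), 1, plogis(-2 + cluster_effect))

toy_des <- svydesign(id = ~psu, strata = ~strata, weights = ~wgt, data = toy)

# deff = TRUE reports the ratio of the design-based variance to the
# variance expected under simple random sampling
est <- svymean(~poor, toy_des, deff = TRUE)
est
deff(est)
```

A design effect above 1 means the complex design yields less precision than a simple random sample of the same size, which is the pattern visible in the wider survey-weighted intervals.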

5 Part 4: Exporting to Word with flextable

For manuscript submission, most journals require .docx files. as_flex_table() converts any gtsummary object to a flextable, which is then exported with save_as_docx().

Code

ft <- tbl_merge(
  tbls      = list(t_A, t_B, t_C),
  tab_spanner = c(
    "**Model A: Demographic**",
    "**Model B: + SES**",
    "**Model C: Full**"
  )
) |>
  bold_labels() |>
  modify_caption("Table 3. Logistic Regression: Predictors of Poor Self-Rated Health") |>
  as_flex_table() |>
  set_table_properties(width = 1, layout = "autofit") |>
  fontsize(size = 10, part = "all") |>
  font(fontname = "Times New Roman", part = "all") |>
  add_footer_lines(
    "Note. OR = odds ratio; SE = standard error. Survey-weighted quasibinomial logistic regression. ***p < 0.001; **p < 0.01; *p < 0.05."
  )

ft

	Model A: Demographic		Model B: + SES		Model C: Full
Characteristic	OR¹	SE	OR¹	SE	OR¹	SE
Nativity
Foreign born	0.73*	0.126	0.60***	0.136	0.78	0.140
Birth cohort
Millennial	0.60***	0.095	0.54***	0.096	0.73**	0.104
Sex
Male	0.84*	0.081	0.84*	0.082	0.86	0.083
Race/ethnicity
Hispanic	1.80***	0.111	1.31*	0.129	1.41**	0.131
Non-Hispanic Black	1.49***	0.107	1.07	0.108	0.98	0.124
Non-Hispanic ANAI	2.69***	0.287	1.79	0.303	1.73	0.287
Non-Hispanic Asian	0.72	0.255	1.24	0.258	1.21	0.267
Non-Hispanic other	0.97	0.277	1.15	0.281	1.27	0.288
Education
High school			0.71**	0.119	0.74*	0.123
Some college			0.58***	0.125	0.71**	0.133
BA or higher			0.28***	0.166	0.42***	0.180
Family income
$35,000 - $75,000			0.43***	0.103	0.47***	0.105
$75,000 - $99,999			0.44***	0.204	0.46***	0.210
$100,000 and over			0.26***	0.235	0.28***	0.247
Marital status
Marital dissolution					1.39**	0.119
Never married					1.03	0.111
Smoking status
Former smoker					1.22	0.147
Current smoker					1.97***	0.100
Alcohol use
Lifetime abstainer					1.44**	0.125
Infrequent drinker					2.10***	0.151
Former drinker					2.23***	0.157
Obesity
Obese					1.82***	0.096
Hypertension
Yes					3.19***	0.100
Census region
Midwest					1.07	0.151
West					1.03	0.153
South					1.01	0.141
¹ *p<0.05; **p<0.01; ***p<0.001
Abbreviations: OR = Odds Ratio, SE = Standard Error
Note. OR = odds ratio; SE = standard error. Survey-weighted quasibinomial logistic regression. ***p < 0.001; **p < 0.01; *p < 0.05.

Code

# Always save in its own separate chunk
save_as_docx(ft, path = "output/Table3_regression.docx")

save_as_docx() rule

Always call save_as_docx() in its own separate chunk. Sharing a chunk with table-building code causes Quarto to render the table inline and write the file simultaneously, producing duplicate output.
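The export call writes to whatever path it is given, so the target folder should exist beforehand. The following sketch is illustrative: it builds a small flextable from the built-in mtcars data and writes it to a temporary directory, with the folder and file names chosen here as placeholders.

```r
library(flextable)

# Small demonstration table from a built-in data set
ft_demo <- flextable(head(mtcars[, 1:3]))

# Create the output folder first; a missing directory may cause the
# export to fail
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

out_path <- file.path(out_dir, "Table_demo.docx")
save_as_docx(ft_demo, path = out_path)

file.exists(out_path)
```

In a real project, replace tempdir() with the project folder so the file lands next to the manuscript.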

You can also build a flextable directly from any data frame — without going through gtsummary — when you need full control over layout.

Code

summary_tbl <- nhis_cc |>
  group_by(us_born) |>
  summarise(
    N        = n(),
    Pct_poor = round(mean(poor_health) * 100, 1),
    .groups  = "drop"
  )

flextable(summary_tbl) |>
  set_header_labels(
    us_born  = "Nativity",
    N        = "N (unweighted)",
    Pct_poor = "% Poor Health"
  ) |>
  bold(part = "header") |>
  bg(bg = "#f0f0f0", part = "header") |>
  add_footer_lines("Source: NHIS teaching sample, n = 16,880 complete cases.") |>
  autofit() |>
  set_caption("Summary of Poor Health by Nativity")

Nativity	N (unweighted)	% Poor Health
U.S. born	13,292	6.9
Foreign born	3,588	5.8
Source: NHIS teaching sample, n = 16,880 complete cases.

6 Quick Reference

Function	Purpose
`tbl_summary()`	Unweighted descriptive statistics
`tbl_svysummary()`	Survey-weighted descriptive statistics
`tbl_regression()`	Regression table from model object
`tbl_merge()`	Place tables side by side
`tbl_stack()`	Stack tables vertically
`add_overall()`	Add total/overall column
`add_p()`	Append p-values for group comparisons
`type = list(var ~ 'continuous')`	Summarize variable as continuous (pair with `statistic =` for mean (SD))
`type = list(var ~ 'categorical')`	Display variable as counts (%)
`statistic = list(...)`	Customize the displayed statistic
`bold_labels()`	Bold variable name rows
`italicize_levels()`	Italicize category rows
`bold_p(t = 0.05)`	Bold significant p-values
`remove_row_type('reference')`	Drop reference category rows
`add_significance_stars()`	Add significance stars (`*`, `**`, `***`) to estimates
`modify_header()`	Rewrite column headers (use `{n_unweighted}` for unweighted counts in survey tables)
`modify_caption()`	Add or change table caption
`show_header_names()`	Print internal column name reference
`as_gt()`	Convert to gt for HTML rendering
`as_flex_table()`	Convert to flextable (for Word export)
`options(gtsummary.use_ftExtra = TRUE)`	Preserve markdown formatting in Word export
`set_gtsummary_theme(...)`	Apply a theme globally for the session

Task	Package	Function
Descriptive Table 1	gtsummary	`tbl_summary()` / `tbl_svysummary()`
Single regression table	gtsummary	`tbl_regression()`
Multiple models side by side	gtsummary	`tbl_merge()`
Before/after comparison (stacked)	gtsummary	`tbl_stack()`
Word / .docx export	flextable	`as_flex_table()` + `save_as_docx()`
Custom standalone table	flextable	`flextable()`