Promoting data harmonization to evaluate vaccine hesitancy in … – BMC Medical Research Methodology

Testing vaccine hesitancy hypotheses using multiple datasets presents challenges. Datasets associated with household-level and contextual factors have different geographic support, defined as the area, shape, size, and orientation of spatial measurement. Data on vaccine acceptance tend to come from household-level surveys; data on political violence are typically point-level event coordinates; data on elections tend to be measured by electoral constituencies (e.g. legislative districts); key development indicators, like road infrastructure, may be available as polyline features. These data come in different formats and structures (delimited text, vectors of location attributes, raster images); areal units are not nested and have misaligned borders; some of the data (e.g. surveys) may not be georeferenced at all. Different data integration choices may yield different results, raising concerns over generalizability [5]. Differences in sampling, question wording and sequence, primary sources, operational definitions, digital image processing algorithms, and other factors ensure that no two datasets are perfect substitutes for one another, making it difficult to distinguish case-specific idiosyncrasies from general patterns, and to ask, "what does country A tell us about country B?" Finally, survey data pose a separate challenge of distinguishing "snapshots" of public attitudes from stable long-term trends. We illustrate how the SUNGEO system can mitigate some of these common challenges.

SUNGEO allows users to combine data across otherwise incompatible geographic units into a common format, and facilitates the analysis and visualization of processed geospatial data (Fig. 1). It includes a user-friendly web interface and API, where researchers can select among many existing variables, choose levels and methods of spatiotemporal (dis)aggregation, interpolation and integration, and decide on the boundaries of their subnational datasets. Its large collection of pre-processed data enables users to replicate their research designs across different scales, data sources, countries, and integration procedures. SUNGEO also includes an open-source software package in the R statistical programming language to process user-supplied data, merge them with pre-loaded geo-referenced data, and produce a more customizable output based on user needs and specifications. Finally, it includes an archiving tool, which allows users to contribute original data to the repository.

Fig. 1. Overview of the SUNGEO system

This demonstration uses vaccination hesitancy data from the World Bank Group's High Frequency Phone Surveys (HFPS). The HFPS was a longitudinal cohort (panel) study on the socio-economic impacts of COVID-19 conducted in 53 countries and contexts between 2020 and 2022, with a subset of surveys including questions on vaccination hesitancy. We analyzed surveys from Indonesia, Kenya, and Malawi, as they: 1) were larger surveys with rigorous sampling methods representative of the general population; 2) included granular geographic information; and 3) were from three distinct regions (East Africa, Southern Africa, and South-East Asia). The survey datasets include sampling weights, based on the inclusion probabilities of the cell phones and landlines through which respondents were reached, along with first-time and attrition non-response weighting adjustments, and calibration with auxiliary information on regional population size, respondent sex, age group, and educational attainment. More information on each dataset is available from the World Bank Group [6].

Contextual variables were provided from SUNGEO's preprocessed spatial data archive. Sub-national data on political violence are available for 195 countries through SUNGEO's partnership with the xSub data repository, which hosts leading event databases, including the Armed Conflict Location and Event Data Project (ACLED), the National Violence Monitoring System (NVMS), the Social Conflict Analysis Database (SCAD), and the Uppsala Conflict Data Program's Georeferenced Event Dataset (UCDP-GED). We chose among these by re-estimating our empirical models with each dataset on violence, and selecting the data source that yielded the strongest model fit (NVMS for Indonesia, UCDP-GED for Kenya, SCAD for Malawi; see Additional file 2: Appendix B3). Data on legislative elections in 168 countries are available through the Constituency-Level Elections Archive (CLEA). As a proxy measure for economic development, we used local road density, which can be calculated using the Global Roads Open Access Data Set (gRoads). More information on these datasets can be found from their respective sources [7,8,9,10,11]. We also used SUNGEO to extract data on other geographic variables that may affect attitudes toward, or the availability of, vaccines. These include ethno-linguistic fractionalization, average night light intensity, and terrain (see Additional file 2: Appendix B2 for details and estimation results) [12,13,14].

a. Vaccine surveys

HFPS data are available through the Inter-university Consortium for Political and Social Research (ICPSR). ICPSR secured the World Bank's permission to access HFPS data, then carried out a disclosure risk review to prevent direct or inferential re-identification of individuals or organizations. The curation process included generating question text employing the social science variables database to compare across studies, reviewing data to ensure all translations were correct and to create the variable and values list, conducting quality control, and hosting the data on the ICPSR website in a fully searchable format. Further detail can be found in Additional file 1: Appendix A. In Additional file 2: Appendix B1, we examine sample attrition patterns across rounds, and find that respondents who dropped out of these samples were statistically similar on observables to those who remained.

b. Contextual data

Disaggregated data on violence, elections and economic development are available through SUNGEO. In aggregate form, the violence data are event counts, representing the number of incidents of political violence observed in each spatial unit over the two decades prior to the first survey. The election data are weighted averages of local "Top-1" competitiveness from the most recent legislative election, measured as one minus the winning vote margin, where values of 1 indicate that the most recent parliamentary election was very close, and 0 indicates that it was not competitive because the winner received almost all of the votes. We also considered alternative measures of electoral competitiveness, but the "Top-1" measure yielded a generally stronger model fit (Additional file 2: Appendix B3). The road density data are local sums of primary and secondary road lengths in each administrative unit, divided by that unit's area in square kilometers.
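The two derived measures described above can be sketched in a few lines of Python. This is a minimal illustration, not SUNGEO's implementation: it assumes that the "winning vote margin" is the difference between the vote shares of the winner and the runner-up, and that the weights used for the local average (for example, area overlap or population) are supplied by the analyst. The function names `competitiveness`, `weighted_average`, and `road_density` are hypothetical.

```python
def competitiveness(vote_counts):
    """One minus the winning vote margin for a single constituency.

    Assumption: margin = winner's vote share minus runner-up's vote share.
    """
    total = sum(vote_counts)
    if total <= 0:
        raise ValueError("vote_counts must contain at least one positive count")
    ranked = sorted(vote_counts, reverse=True)
    winner_share = ranked[0] / total
    runner_up_share = ranked[1] / total if len(ranked) > 1 else 0.0
    return 1.0 - (winner_share - runner_up_share)


def weighted_average(values, weights):
    """Weighted mean of constituency-level values within one spatial unit."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def road_density(road_lengths_km, area_km2):
    """Total road length (km) divided by the unit's area (square km)."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return sum(road_lengths_km) / area_km2


# Hypothetical administrative unit overlapping two constituencies.
close_race = competitiveness([510, 490])   # near 1: very close election
landslide = competitiveness([900, 100])    # near 0: uncompetitive election
unit_score = weighted_average([close_race, landslide], weights=[0.25, 0.75])
unit_roads = road_density([12.5, 7.5], area_km2=400.0)
print(round(unit_score, 3), unit_roads)
```

Under these assumptions, a unit that mostly overlaps the uncompetitive constituency receives a score closer to 0 than to 1.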

For each country, we used SUNGEO to extract data on political violence, legislative election data, and road infrastructure data, along with other contextual datasets (Additional file 2: Appendix B1). For Indonesia and Malawi, our spatial units were level-2 administrative divisions. For Kenya, we used level-1 administrative divisions.

To link data to household-level vaccine surveys, we used SUNGEO's R package to geocode survey sampling units, assigning a pair of geographic coordinates to each unique location. This allowed us to match each surveyed household to its corresponding level-2 (or level-1, in Kenya) spatial unit, and merge the datasets geographically (see Additional file 2: Appendix B1).
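The matching step can be illustrated with a simple point-in-polygon test. The sketch below is written in plain Python rather than with SUNGEO's R package, whose functions are not reproduced here. It assumes planar coordinates and simple polygons without holes, and uses a standard ray-casting rule. The names `point_in_polygon` and `assign_unit`, and the toy unit boundaries, are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point inside a simple polygon?

    polygon is a list of (lon, lat) vertices; the last vertex is joined
    back to the first. Holes and multi-part polygons are not handled.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_unit(lon, lat, units):
    """Return the id of the first unit containing the point, else None."""
    for unit_id, polygon in units.items():
        if point_in_polygon(lon, lat, polygon):
            return unit_id
    return None


# Two adjacent toy administrative units (unit squares side by side).
units = {
    "unit_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "unit_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

# Geocoded sampling locations, keyed by a hypothetical household id.
households = {"hh1": (0.5, 0.5), "hh2": (1.5, 0.5), "hh3": (3.0, 3.0)}
matched = {hh: assign_unit(lon, lat, units) for hh, (lon, lat) in households.items()}
print(matched)
```

Households that fall outside every unit are returned as `None`, which makes unmatched records easy to inspect before merging.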

We examined why some households express stable, pro-vaccine preferences, while others remain vaccine hesitant, or change their minds. Vaccine hesitancy varies spatially (across households) and temporally, with households changing their position. In the Indonesian survey, 73% of households gave the same answer to the vaccine intent question in two consecutive rounds (e.g. "yes" in rounds 4 and 5, or "no" in rounds 4 and 5). In Kenya, 68% gave the same answer across two rounds. In Malawi, 63% gave the same answer. Because the same households may give different responses on different occasions, we needed an empirical strategy that explicitly accounts for this shifting dynamic.

We modeled the survey responses as a two-state stochastic process (Markov chain). When asked the question, "If the vaccination was available for you at no cost, would you take the vaccination?", a household may either:

Express an intent to receive the Covid-19 vaccine ("yes"), or

Not express such an intent ("no").
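Before any covariates are introduced, the two states listed above can be tabulated directly from panel responses. The sketch below is a minimal, unweighted illustration on made-up data: it ignores the survey sampling weights, skips any pair of rounds with a missing answer, and uses hypothetical function names (`count_transitions`, `transition_matrix`, `share_stable`).

```python
from collections import Counter


def count_transitions(responses):
    """Count (previous, current) answer pairs across consecutive rounds.

    responses maps a household id to a list of answers by round:
    1 = "yes", 0 = "no", None = missing.
    """
    counts = Counter()
    for answers in responses.values():
        for prev, curr in zip(answers, answers[1:]):
            if prev is None or curr is None:
                continue
            counts[(prev, curr)] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into [[P(no->no), P(no->yes)], [P(yes->no), P(yes->yes)]]."""
    matrix = []
    for prev in (0, 1):
        row_total = counts[(prev, 0)] + counts[(prev, 1)]
        if row_total == 0:
            matrix.append([float("nan"), float("nan")])
        else:
            matrix.append([counts[(prev, 0)] / row_total, counts[(prev, 1)] / row_total])
    return matrix


def share_stable(counts):
    """Share of consecutive-round pairs in which the answer did not change."""
    total = sum(counts.values())
    same = counts[(0, 0)] + counts[(1, 1)]
    return same / total if total else float("nan")


# Made-up panel: four households, three rounds.
panel = {
    "hh1": [1, 1, 1],
    "hh2": [0, 1, 1],
    "hh3": [0, 0, None],
    "hh4": [1, 0, 0],
}
counts = count_transitions(panel)
P = transition_matrix(counts)
print(P, share_stable(counts))
```

The diagonal of `P` gives the probability of repeating the previous answer, and `share_stable` is the pooled analogue of the "same answer in two consecutive rounds" percentages reported above.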

From one round to another, a household will have some probability of staying with their previous response, and some probability of transitioning to another response. We model these transition probabilities as conditional on a series of household-level and contextual covariates:

$$\text{Pr}(y_{i,t} = 1) = \text{logit}^{-1}\left[x_{i}\theta_{0} + y_{i,t-1} \cdot x_{i}\gamma + \alpha_{k(i)} + \tau_{t} + \varepsilon_{i,t}\right] \tag{1}$$

where $y_{i,t}$ is 1 if household $i$ says "yes" in round $t$, and 0 if the household says "no", $y_{i,t-1}$ is a first-order temporal lag, $\alpha_{k(i)}$ is a fixed effect for the administrative unit $k$ in which $i$ is located, $\tau_{t}$ is a fixed effect for each survey round, and $\varepsilon_{i,t}$ are errors, with robust standard errors clustered by administrative unit and survey round. The vector of covariates $x_{i}$ includes household-level measures like respondent's age and gender, and an indicator for whether the household is located in an urban area, as well as contextual information on violence, electoral competitiveness, road density, night light intensity, ethno-linguistic fractionalization (ELF), and terrain.

$\theta_{0}$ are regression coefficients for households that said "no" to the vaccine at $t-1$, and $\theta_{1} = \theta_{0} + \gamma$ are coefficients for households that said "yes" at $t-1$. We will use these coefficient estimates to generate predicted probabilities of vaccine intent, and to construct transition probability matrices.
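To make the link between the coefficients and the transition matrix concrete, the sketch below evaluates Eq. (1) for one household at both values of the lagged response. It is an illustration under stated assumptions, not the authors' estimation code: the coefficient values are invented, the error term is omitted from the prediction, and the function names (`inv_logit`, `transition_probabilities`) are hypothetical.

```python
import math


def inv_logit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def transition_probabilities(x, theta0, gamma, alpha=0.0, tau=0.0):
    """2x2 matrix [[P(no->no), P(no->yes)], [P(yes->no), P(yes->yes)]].

    x, theta0 and gamma are equal-length lists; alpha and tau are the
    administrative-unit and survey-round fixed effects for this household.
    """
    # Coefficients for households that said "yes" at t-1: theta1 = theta0 + gamma.
    theta1 = [t + g for t, g in zip(theta0, gamma)]
    eta_no = sum(xj * cj for xj, cj in zip(x, theta0)) + alpha + tau
    eta_yes = sum(xj * cj for xj, cj in zip(x, theta1)) + alpha + tau
    p_yes_given_no = inv_logit(eta_no)
    p_yes_given_yes = inv_logit(eta_yes)
    return [
        [1.0 - p_yes_given_no, p_yes_given_no],
        [1.0 - p_yes_given_yes, p_yes_given_yes],
    ]


# Invented values: x = (intercept, one covariate).
x = [1.0, 0.5]
theta0 = [0.0, 0.0]
gamma = [1.0, 0.0]
P = transition_probabilities(x, theta0, gamma)
print(P)
```

Each row of `P` sums to one. With these invented values, a previous "yes" raises the predicted probability of a current "yes" from 0.5 to roughly 0.73.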

We estimated the model in Eq.(1) separately on integrated survey datasets from Indonesia, Kenya and Malawi.
