Proteome profiling of home-sampled dried blood spots reveals proteins of SARS-CoV-2 infections | Communications … – Nature.com

Using a volumetric microfluidic-based DBS device that collects precisely 10 µL of whole blood, a protocol was tailored to analyze 276 proteins by proximity extension assays (PEA) (Fig. 1). After benchmarking the procedure in a pilot study against paired EDTA plasma samples, DBS collected in Stockholm during the spring of 2020 and in Stockholm and Gothenburg during May of 2021 were analyzed for proteins associated with SARS-CoV-2 seropositivity. The studies revealed proteins relevant to COVID-19 pathogenesis and immune response.

Home-sampling devices were mailed to randomly selected individuals in metropolitan Stockholm and Gothenburg. Dried blood spots (DBS) were collected by finger pricking and mailed back to our laboratory for analysis. We eluted proteins from the DBS discs to first determine antibodies against SARS-CoV-2. Three studies were designed with donors stratified by serostatus and matched on self-reported information: Study 1 from 2020 compared antibody-negative (IgM−IgG−) with antibody-positive subjects (IgM+IgG+); Study 2 from 2020 compared IgM-positive (IgM+IgG−) with IgG-positive donors (IgM−IgG+); Study 3 from 2021 investigated vaccination-naïve donors who were either antibody-negative (IgG−) or antibody-positive (IgG+). Proximity extension assays (PEA) were applied to measure the levels of 276 proteins and evaluate their association with the different immune response groups.

To assess the suitability of the DBS preparation for proteomics analyses, protein profiles of 92 circulating proteins related to cardiovascular diseases were investigated (Fig. 2a-c). The levels, correlations, and interquartile ranges (IQR) of the proteins were compared between EDTA plasma collected by venous blood draw and corresponding DBS samples collected at the same visit by finger-pricking from 12 donors (Supplementary Data 1). We found that 91 of the 92 proteins were detected in >90% of the samples of each type, so the investigated proteins could be measured in both DBS and paired plasma samples.

a The volcano plot displays the difference in relative protein levels between dried blood spot (DBS) and EDTA plasma obtained from 12 donors. The differences in the abundance of 92 proteins, reported as normalized protein expression (NPX), are categorized by FDR < 0.01 (horizontal dotted line) and a ΔNPX of ±1 (vertical dotted lines). Blue dots represent proteins with the most significant differences, orange dots show those with noticeable differences, and green dots represent proteins for which no differences were observed. b Frequency of Spearman correlation coefficients for the 92 proteins. The vertical dotted line indicates rs = 0. c Differences in protein IQR between DBS and plasma. The vertical dotted lines have been added for orientation at ΔIQR = 0 and ±0.5.

As shown in Fig. 2a, paired t-tests showed that the proteins with elevated NPX levels in DBS (FDR P < 0.01) were platelet glycoprotein VI (GP6, expressed in skin or macrophages), bleomycin hydrolase (BLMH, expressed in skin keratinocytes), azurocidin 1 (AZU1, expressed in neutrophils), and caspase 3 (CASP3, expressed in granulocytes). Conversely, collagen type I alpha 1 chain (COL1A1, expressed in fibroblasts) was more abundant in the plasma samples (Supplementary Data 1). These examples suggest that finger-prick DBS samples can offer improved detectability of skin and blood cell-related proteins for PEA and possibly other assays.
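The FDR threshold used above can be illustrated with a short sketch. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment below is an assumption, and the p-values are hypothetical placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as a common FDR procedure; the
    source does not name the method behind its "FDR P < 0.01" cut-off.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four paired t-tests (not study data).
raw_p = [0.0001, 0.004, 0.03, 0.2]
fdr = benjamini_hochberg(raw_p)
flagged = [q < 0.01 for q in fdr]
print(fdr, flagged)
```

In a volcano plot such as Fig. 2a, the adjusted values would determine which proteins fall above the horizontal significance line.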

We also correlated the protein profiles to compare the ranking of the paired samples (rs=0.67 [0.61, 0.99]); see Fig. 2b. In general, 62% (57/92) of the protein profiles correlated between plasma and DBS (rs > 0.7). Profiles of cellular proteins such as the previously mentioned CASP3, proteinase 3 (PRTN3, expressed in neutrophils), F11 receptor (JAMA, expressed in epithelial cells), and selectin P (SELP, expressed on fibroblasts) were the most discordant (rs < 0). On the other hand, secreted proteins such as NPPB, IGFBP1, CD163, and CPB1, and proteins known to leak into blood, such as EPCAM, LDLR, and SELE, were highly concordant (rs > 0.95). Profiles of proteins elevated in plasma agreed with DBS profiles (rs=0.81 [0.45, 0.99]). The observed discordance between the two specimens was primarily found for proteins with higher levels in DBS samples. In addition, we examined the IQRs of the 92 proteins in the paired DBS and plasma. The IQR of the endothelial coagulation protein VWF was noticeably higher in plasma (alongside AZU1 and CASP3). The proteins MCP1 and RETN, both secreted by hematopoietic blood cells, revealed higher IQRs in DBS (Fig. 2c). Considering all targets, the protein IQRs were not significantly different between DBS and plasma (P = 0.44). Testing how the ranges of detected proteins varied within a given sample type showed that the sample IQRs for DBS were significantly larger than those for plasma (P < 1.8 × 10⁻⁸).
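To make the per-protein comparison concrete, the following is a minimal sketch of a Spearman rank correlation between paired plasma and DBS measurements of one protein. The helper names and the five paired values are hypothetical; ties receive average ranks, which is the usual convention but is not specified in the text.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rs computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical NPX values for one protein in five paired donors.
plasma = [1.0, 2.0, 3.0, 4.0, 5.0]
dbs_concordant = [2.1, 2.9, 4.2, 5.0, 6.3]
dbs_discordant = [6.3, 5.0, 4.2, 2.9, 2.1]
print(spearman(plasma, dbs_concordant), spearman(plasma, dbs_discordant))
```

A protein whose donor ranking is preserved between the two sample types yields rs close to 1, whereas a reversed ranking yields a negative rs, as described for the most discordant cellular proteins.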

Finally, we investigated the sample types with respect to blood cell expression and protein secretion, using differences in NPX (ΔNPX) and correlation (rs) values. With data from the Human Protein Atlas, we annotated the 92 proteins for their RNA expression in tissue [29] and blood cells [30] and the locations of protein secretion [31]; see Supplementary Data 1. We found that 30% of the proteins were not expressed in blood cells. The levels of these proteins were similar between DBS and plasma (ΔNPX = 0.0) and correlated well (rs = 0.80); see Supplementary Table 1. The remaining 70% contained proteins expressed by different blood cell types. The NPX levels of these proteins were generally higher in DBS than in plasma (ΔNPX = 1.1 [0.4, 2.0]), and the correlation was lower (rs = 0.59 [0.45, 0.72]). As shown in Supplementary Table 2, the proteins secreted primarily into blood were more similar between DBS and plasma (ΔNPX = 0.3; rs = 0.71; N = 20) than proteins secreted to other locations (ΔNPX = 1.3; rs = 0.61; N = 20) or the cellular proteins (ΔNPX = 0.8; rs = 0.61; N = 20). This analysis suggests that protein leakage from blood cells contributed to the differences between the two sample types. Proteins secreted into the circulation by organs other than blood were more similar between the sample types.

The comparative analysis of paired DBS and plasma samples, exemplified here by 92 proteins, revealed differences and commonalities between the sample types. This points to the opportunity to uncover novel associations with DBS and calls for caution when validating findings in the other sample type.

In April 2020, we sent 2000 home-sampling kits to the Stockholm population to measure antibodies against SARS-CoV-2 in dried blood [22]. The levels of IgM or IgG were determined using multiplexed bead-based assays that included multiple proteins representing the viral antigens. A population-based density cut-off of the antibody levels detected for the coronavirus spike and nucleocapsid proteins was used to classify the serostatus of each sample. Since not all individuals were diagnosed by PCR or experienced symptoms from the infection, we had only self-reported information about a diagnosed infection in one of the studies. For the other studies, we used only IgM and IgG to group participants into phases post-infection, as suggested by others [32]. In May 2021, a few months after vaccines against COVID-19 became available, we repeated the sample collection by sending a second set of 2000 home-sampling kits to populations in Stockholm and Gothenburg to determine the serostatus during the second year of the pandemic. Using their serostatuses, we selected representative subsets from our collections (N = 228) to perform protein profiling by PEA.

The first study (study 1) from April 2020 was collected during the pandemic's first wave. It consisted of 83 DBS donors, among which 44 participants were selected based on their serological immune response (IgM+IgG+). These seropositive participants represented the peak of the immune response, which we determined by detecting IgG and IgM against multiple SARS-CoV-2 antigens. The group was matched with 37 seronegative individuals (IgM−IgG−) based on demographic traits and reported symptoms. There were no significant differences in self-reported symptoms, and only three subjects in the seropositive group reported severe symptoms (Table 1). The seropositive subjects of study 1 had only been exposed to the wild-type variant.

The second study (study 2), also collected in April 2020, included 66 participants representing the different phases of the serological immune response against the viral infection. The stratification was based on antibodies detected against the S proteins of SARS-CoV-2. We selected 26 individuals with signs of an acute immune response against the virus, being IgM seropositive only (IgM+IgG−). This group was compared with 40 individuals without detectable IgM levels but seropositive for IgG (IgM−IgG+). The IgG+ group, annotated as having already passed the acute phase, was slightly older, but otherwise there were no significant differences in demographics or reported symptoms (Table 2). The subjects in study 2 had only been exposed to the wild-type variant.

The third study (study 3) was conducted in late spring 2021 and included 80 unvaccinated participants who donated DBS samples more than a year into the pandemic. We stratified these as seropositive (IgG+) or seronegative (IgG−) based on antibodies detected against the S and N proteins of SARS-CoV-2. On average, the 37 seropositive individuals reported being infected five months before DBS sampling. Compared to the previous studies conducted during the dominance of the wild-type variant, study 3 represents a set of individuals with much longer possible exposure to different viral strains before the Omicron wave. The groups were matched for sex and age. There was a slight difference in age distribution between the groups, with the seropositive donors being slightly older. The frequency of self-reported symptoms differed, with about a third of the seropositive donors being asymptomatic (Table 3). The infections of seropositive participants in study 3 could have been caused by different SARS-CoV-2 variants.

In the following, we provide a general overview of the data and then discuss the details and analyses conducted for the three studies. We first evaluated the data globally, searched for possible outliers, and studied the variance of the circulating protein levels in each set to judge the quality and similarity between the data sets. We then determined the common associations of the DBS proteomes with the self-reported traits of age and sex. Lastly, we applied multivariate analysis to identify combinations of proteins to differentiate the serostatus groups, and univariate analysis for associations with symptoms, serostatus, and antibody levels. In each study set, we profiled 276 proteins associated with cardiovascular and metabolic processes such as angiogenesis, blood vessel morphogenesis, inflammation, and cell adhesion.

To begin with, we investigated the general properties of the proteomics data without considering the serostatus categories. Our analysis of the DBS eluates revealed that 260 proteins (94.2%) could be detected in >90% of the samples from all three study sets. For the downstream analysis, we included 264 proteins (95.6%) above the detection limit for at least 50% of the samples in all three study sets. Replicated analysis of five unique DBS eluates revealed a high reproducibility of the protein measurements, with >90% of the proteins reporting a coefficient of variation (CV) < 10% (Supplementary Data 2). Global and unsupervised data analyses were performed to determine the integrity of the data and identify any patterns or biases due to seropositivity. The median NPX and IQR values were used to systematically identify possible outliers by setting the threshold to 3 SDs from the mean for each variable. We considered it unlikely that age, sex, symptoms, or serostatus would alter the protein content of samples for the analyzed targets to the degree that identifying a sample as an outlier would have a physiological reason. To account for non-biological differences between DBS samples provided by untrained individuals, we applied the antibody-specific probabilistic quotient normalization (AbsPQN), which we previously developed for affinity proteomic studies of plasma samples [27]. Applying AbsPQN to the three panels used in the three study sets decreased the percent variance explained by the first principal component (PC1) from 40.8% ± 15.8% to 15.0% ± 1.2%. AbsPQN also reduced the differences in the average and distribution of NPX levels. Consequently, AbsPQN-processed data were used to reidentify outliers and for all the downstream analyses. We found eight samples that deviated (Supplementary Fig. 1), resulting in their exclusion from the summary tables (Tables 1-3). Out of 236 donors, the proteomics data from 228 samples (97%) qualified for the investigations.
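The outlier rule described above (median NPX and IQR per sample, with a threshold of 3 SDs from the mean) can be sketched as follows. This is an illustration under stated assumptions: the function names and the toy data are hypothetical, the quartile convention ("inclusive") is a choice made here, and the AbsPQN normalization step is not reproduced.

```python
from statistics import mean, median, quantiles, stdev


def sample_summary(npx_values):
    """Return (median, IQR) of one sample's NPX values.

    Assumption: quartiles use the "inclusive" method; the source does
    not state which quantile definition was used.
    """
    q1, _, q3 = quantiles(npx_values, n=4, method="inclusive")
    return median(npx_values), q3 - q1


def flag_outliers(samples, n_sd=3.0):
    """Flag samples whose median or IQR lies more than n_sd SDs from the mean.

    samples: dict mapping a sample id to its list of NPX values.
    """
    ids = list(samples)
    summaries = [sample_summary(samples[s]) for s in ids]
    flagged = set()
    for column in range(2):  # 0 = per-sample median, 1 = per-sample IQR
        values = [summary[column] for summary in summaries]
        mu, sd = mean(values), stdev(values)
        for sample_id, value in zip(ids, values):
            if sd > 0 and abs(value - mu) > n_sd * sd:
                flagged.add(sample_id)
    return flagged


# Hypothetical toy data: 20 similar samples and one with shifted levels.
samples = {f"s{i}": [v + (i % 3) for v in (1, 2, 3, 4, 5)] for i in range(20)}
samples["shifted"] = [21, 22, 23, 24, 25]
print(flag_outliers(samples))
```

Note that a 3 SD rule can only flag a sample when enough samples are present, because a single extreme value also inflates the SD it is compared against.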

Next, we evaluated the general variation in protein levels to identify stable and highly variable ones. As illustrated in Fig. 3, all data sets presented a similar distribution of IQR values. There was very good agreement of the IQR values between the three sets (rs > 0.86, CV = 15%; see Supplementary Data 2). To highlight a few, the most dispersed levels (IQR > 1.5) were found for the primarily secreted proteins IGFBP1, MBL2, MEP1B, and SSC4D. Interestingly, MBL2, a protein involved in complement activation, has been previously associated with COVID-19 severity and mortality in intensive care patients [25, 33, 34]. Among the least variable proteins (IQR < 0.15) were the intracellular proteins CRKL, SOD1, and BLMH, all expressed by various organs. BLMH is a protein highly expressed by the skin tissue [29] and was one of the proteins most differentially abundant when comparing DBS with plasma (see above). The observed concordance in IQR values of independent sample sets supported the quality and utility of the data for further detailed analyses of the COVID-19-related phenotypes.

Distribution of protein level variance across dried blood spot (DBS) samples from the population (a) study 1 (N=81), (b) study 2 (N=63) and (c) study 3 (N=77). Each dot represents the interquartile ranges (IQR) of one protein, ranked by the dispersion of normalized protein expression (NPX) values. Proteins with narrow distributions are ranked to the left, and proteins with varying levels are ranked on the right.

To learn more about the general structure of the data, we conducted unsupervised correlation analyses of 264 protein levels within each of the four serostatus groups. As depicted in the heatmaps presented in Fig. 4a-c, the overall relationships between the protein correlations differed between the serostatus groups. The distributions of the correlation values centered around zero (Supplementary Fig. 2). A stability analysis of the clusters was performed to prioritize the most stable clusters and choose representative protein correlations across all groups. Cluster #4 of the IgG+ group in study 3 was deemed the most stable cluster, with a mean Jaccard index (MJI) of 0.54. The cluster contained 20 proteins originating from different PEA panels. Twelve of the 20 proteins (60%) also clustered in the other five sample sets; these included ANXA1, PGLYRP1, ITGAM, PLAUR, RETN, TNFRSF10C, NADK, CHI3L1, LCN2, S100P, DEFA1, and PAG1. Interestingly, these twelve proteins originated from the hematopoietic system, including the bone marrow, neutrophils, eosinophils, monocytes, or lymphoid tissue. Despite belonging to different PEA panels, proteins such as LCN2, S100P, PAG1, and PLAUR were shown to correlate highly (rs > 0.8) in all six sample sets. Further details about cluster assignment and protein-protein correlations across study sets can be found in Supplementary Data 3. The cluster analysis suggests that proteome profiling of DBS samples can provide insights into coordinated cellular regulations of the humoral and inflammatory immune response.
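The mean Jaccard index used to rank cluster stability can be illustrated with a small sketch. The resampling scheme behind the reported MJI is not described in this text, so the approach below (comparing a reference cluster with its best-matching cluster in each re-clustered resample) is an assumption, and the cluster memberships are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_jaccard(reference_cluster, resampled_clusterings):
    """Average, over resamples, of the best Jaccard match to the reference.

    Assumption: each resample is a list of clusters (sets of protein names)
    obtained by re-running the clustering on resampled data.
    """
    best = [
        max(jaccard(reference_cluster, cluster) for cluster in clustering)
        for clustering in resampled_clusterings
    ]
    return sum(best) / len(best)


# Hypothetical memberships using protein names mentioned in the text.
reference = {"LCN2", "S100P", "PAG1", "PLAUR"}
resamples = [
    [{"LCN2", "S100P", "PAG1", "PLAUR"}, {"MBL2"}],
    [{"LCN2", "S100P"}, {"PAG1", "PLAUR", "MBL2"}],
]
print(mean_jaccard(reference, resamples))
```

A value near 1 means the cluster is recovered almost unchanged in every resample, while lower values indicate that its members are frequently split across clusters.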

The heat maps reveal the inter-protein correlations obtained from hierarchical clustering for the four serostatus groups from (a) study 1, (b) study 2, and (c) study 3. The green circles indicate the clusters containing twelve proteins that grouped together in all sample sets. The number of branches was selected based on Gap statistics.

The current knowledge about DBS-derived protein-trait associations is still sparse. Before investigating the relationships between protein levels and SARS-CoV-2 infections, we studied their association with age and sex. These two basic demographic parameters are tested for in nearly all biomedical studies, are known to influence the circulating protein levels in serum or plasma samples, and were collected from all participants in our studies. Consequently, replicating the protein-age and protein-sex associations in three study sets would indicate the data's utility. As we have shown when comparing plasma and DBS, however, the differences between the sample types could influence the outcome of the association comparison. We also note that the age distribution of the three sample sets was slightly different; see Tables 1-3. Using a linear model, we determined the protein-trait associations and performed a meta-analysis to rank the proteins by the combined p-values (Supplementary Data 4). Several proteins were associated with age or sex in all three studies with concordant directions of association (Supplementary Fig. 3). This included the well-known sex-specific protein MMP3 (combined P = 3.2 × 10⁻¹¹), a protease involved in collagen degeneration. MMP3 has been associated with coronary heart disease and acute respiratory distress syndrome [35] and was studied in COVID-19 [25]. In addition (combined P < 10⁻⁵), sex was associated with the proteins ALCAM and SSC4D, expressed by the parathyroid glands; CNTN1, found in the brain and sex-specific organs; and IGFBP6, a protein highly expressed in the female sex organs. RNA expression studies support the observed associations with sex [29]. For age, we found strong associations with GDF-15 (combined P = 1 × 10⁻¹⁷), a frequently discussed biomarker for aging [36], across all three studies.
In addition (combined P < 10⁻¹⁰), meta-analysis identified age-associated proteins in all datasets, namely the secreted neuronal protein MEPE, the lymphoid protein SELL, the endothelial protein t-PA, and the B-cell receptor CR2. The consistency of age and sex associations across all study sets confirms the data quality and supports its utility in analyzing these in the context of COVID-19.
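As an illustration of how p-values from three study sets can be merged into one combined p-value, the sketch below uses Fisher's method. The text does not state which combination method was used, so Fisher's method is an assumption, and the input p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null; for even degrees of
    freedom the upper tail has the closed form used below.
    All p-values must be greater than 0.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one protein-trait association in three studies.
study_p = [0.01, 0.004, 0.02]
print(fisher_combined_p(study_p))
```

Fisher's method ignores the direction of each association, which is one reason to check separately that directions are concordant across studies, as described above.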

In the following, we highlight the outcomes of investigating changes in protein levels related to SARS-CoV-2 infections. A LASSO regression analysis was used to identify a combination of proteins that differ between the serostatus groups in each of the three studies. Summary statistics and the group-specific protein values (z-scores) can be found in Supplementary Data4.
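To show how LASSO shortlists a subset of proteins, the sketch below implements the L1-penalized least-squares problem with coordinate descent and soft-thresholding. It is a simplified stand-in: the text does not specify the software, the loss function, or how the penalty was tuned, and a classification setting would typically use a logistic loss. The data and the penalty value `lam` are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X: list of rows (features assumed centered and scaled), y: centered outcome.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Hypothetical data: the outcome depends on feature 0 only.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]
coefficients = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(coefficients) if b != 0.0]
print(coefficients, selected)
```

Features whose coefficients are shrunk exactly to zero are dropped, which is how a small panel of proteins can be selected from several hundred candidates.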

For study 1 (Fig. 5a), 19 proteins were selected, of which 17 (90%) had higher levels in the seropositive group. Ranked by their importance score (Fig. 5b), annexin A11 (ANXA11), found in muscle cells and granulocytes, and the low-affinity immunoglobulin gamma Fc region receptor II-a (FCGR2A), also known as CD32A or FcγRII, were most informative. Both proteins had a reduced abundance in the COVID-19 seropositive group. Interestingly, FCGR2A has been described to trigger a cellular response against pathogens and is involved in phagocytosis, and a recent report suggested that these receptors can mediate the infection of monocytes with the virus [37]. Detecting lower levels of FCGR2A could either indicate an increased SARS-CoV-2-induced clearing of immune cells or reflect reduced access to the receptor's epitopes while internalizing antibody-bound pathogens. In addition, significant differences were observed for the previously introduced MBL2 and MMP3 and for proteins related to different physiological mechanisms. These included a protein secreted by the liver during the stress response and angiogenesis (ANG), a brain- and B-cell-derived neurogenic protein (CHL1), a protease secreted by the pancreas (CPB1), a platelet-derived glycoprotein involved in coagulation (GP1BA), and a cytokine receptor related to T-cell immunity (IL2RA). These processes have also been described in studies using venous blood draws [23, 38]. SDC4, a cell adhesion protein found in the extracellular matrix of the liver, lung, kidney, and T-cells, has been suggested to act like ACE2, has been linked to the cellular uptake of the SARS-CoV-2 virus [39], and has shown anti-inflammatory functions in patients with acute pneumonia [40]. As shown in the network in Fig. 5c, physiological relationships between some of the proteins have been suggested for acute phase processes and innate immunity, platelet activation, coagulation, and cellular adhesion.
Correlation analysis of protein and IgG or IgM levels revealed only moderate relationships (rs < 0.5, P < 0.001; see Supplementary Fig. 4). We observed the strongest correlation between circulating CHL1 and IgM levels reported for anti-RBD (rs = 0.46; P = 0.00002) and anti-S (rs = 0.38; P = 0.001). Noteworthy were the negative correlations of FCGR2A with anti-RBD (rs = −0.38; P = 0.0005) and anti-S (rs = −0.32; P = 0.004). This is supported by studies suggesting that FCGRs mediate the uptake of the antibody-coated virus into monocytes, causing the cells to undergo lytic programmed cell death and reducing levels of circulating FCGR2A [37]. MMP3 and IgG levels correlated with anti-S (rs = 0.37; P = 0.0006) and anti-RBD (rs = 0.35; P = 0.003) in the opposite direction. Similar trends and relationships were determined for MBL2, VWF, GP1BA, and ANG. Univariate logistic regression for serostatus ranked MBL2, ANG, and FCGR2A on top (P < 0.01). Finally, we compared the variances of protein levels between the two groups and found that the distribution of SELL levels (P = 0.009) was unequal.

a Least absolute shrinkage and selection operator (LASSO) analysis shortlisted proteins differentiating seropositive (N=37) and seronegative (N=44) subjects in study 1. The y-axis represents the centered and scaled data provided as normalized protein expression (NPX) values. b Ranked importance score of selected proteins. c Using the STRING database, we identified interactions between the selected features and obtained a network centered around acute phase processes and innate immunity, host-virus interactions, coagulation, and cellular adhesion. d LASSO selected proteins from study 2 comparing donors representing the early (N=22) and late infection phases (N=41), and (e) corresponding importance scores. f LASSO selected proteins from study 3, comparing seropositive (N=40) and seronegative donors (N=37). The boxplots show the 25% and 75% quantiles (lower and upper hinges) with the median in the center, whiskers extending from the hinges by up to 1.5 × the interquartile range (IQR), and visible data points outside these ranges.

For study 2, LASSO selected five proteins, of which LILRB1 and FAM3C were elevated in the group representing the early phase of the infection (Fig. 5d, e). STRING analysis revealed no known interactions between the proteins; however, syndecan 4 (SDC4) overlapped with the proteins selected in study 1. Elevated levels of SDC4 were found for the later phase group and for the seropositive group in study 1. With LILRB1, an immunoglobulin-like receptor found on monocytes, the metabolism-regulating protein FAM3C, the coagulation factor 11 (F11), and the lung protein cathepsin H (CTSH), a variety of biological processes were represented. Interestingly, SDC4, LILRB1, and CTSH share expression in lung tissues. Correlation analysis revealed negative coefficients between IgM levels detected for the S antigen and the levels of the proteins CTSH and SDC4 (rs>0.37; P<0.003). Using univariate logistic regression, the five proteins were weakly but significantly associated with serostatus (P < 0.03). When comparing the variance of protein levels in each group, the levels of CCL5 were most unequally distributed (P = 0.001).

For study 3 (Fig. 5f), only one protein was selected by LASSO: the complement C3d receptor 2 (CR2), also known as CD21. Found primarily in the lymphatic system and on B-cells, elevated levels of CR2 were associated with prior infection with SARS-CoV-2. Interestingly, CR2 has been described as a human receptor for the Epstein-Barr virus (EBV), representing an additional element of innate immunity and host-virus interactions [41]. There was a positive correlation between levels of CR2 and anti-S antibodies (rs = 0.38; P = 0.0006), and univariate logistic regression showed a more significant association with serostatus than for the markers shortlisted above (P = 0.0004). It is worth noting that, compared to study 1, infections of the seropositive participants in study 3 were not limited to the few months at the start of the pandemic. When comparing the protein level variances, the macrophage protein CCL24 was most unequally distributed (P = 0.009).

Finally, we used common health-related information to perform a meta-analysis of the self-reported symptoms. As shown in Tables 1-3, we asked the participants in all three studies for COVID-related symptoms such as fever, breathing difficulties, or loss of taste and smell. Top-ranked (combined P < 0.01) were two previously described age-associated proteins (GDF15, SELL) and COL18A1, an extracellular adhesion collagen expressed primarily by the liver and involved in endothelial cell migration, as well as C1QTNF1, a secreted multifunctional adipokine found in smooth muscles and adipose tissue. These associations were less significant and overlapped with those observed for age or sex. No interactions have yet been reported in STRING to suggest a direct connection between their physiological function.

In subjects representing two waves of the pandemic and pre- and post-infection phases, profiling proteins related to cardiometabolic processes in DBS samples revealed insights observed in studies performed in serum or plasma collected in the clinic. Our investigation confirmed coordinated co-regulations of protein levels in immune response, cell adhesion, and cellular virus entry processes.
