A Deep Peek Into DeepSeek AI's Talent and Implications For US Innovation by Amy Zegart and Emerson Johnston
A collaboration with
The Stanford Institute for Human-Centered AI
We find striking evidence that China has developed a robust pipeline of homegrown
talent. Nearly all of the researchers behind DeepSeek’s five papers were educated or
trained in China. More than half of them never left China for schooling or work,
demonstrating the country’s growing capacity to develop world-class AI talent through
an entirely domestic pipeline. And while nearly a quarter of DeepSeek researchers
gained some experience at US institutions during their careers, most returned to China,
creating a one-way knowledge transfer that benefits China’s AI ecosystem.
Methodology
A total of 223 people were listed as contributors to at least one of DeepSeek’s five papers (see
fig. 1). Our analysis finds that 31 researchers (or just under 14 percent of the total
author pool) contributed to all five papers—what we refer to as the “Key Team.”5
Another 50 authors worked on four papers, 64 contributed to three papers, 55 were
listed on two papers, and 22 researchers contributed to just one paper.
As table 1 illustrates (see appendix B), it appears that DeepSeek used a shifting
categorization of talent across the five papers. In Paper 1 (DeepSeek LLM), the
reported contributor labels were organizational rather than role-based, with 53
individuals categorized into Business Team (8), Compliance Team (7), Data Annotation
Team (36), and Design Team (2). Notably, none of these labeled contributors were
credited as authors on the paper itself, which officially listed 86 authors. However, 40 of
those contributors were later credited as authors in at least one subsequent DeepSeek
paper—which is why they are captured here. This discrepancy may suggest that Paper
1’s contributor list reflected a broader pool of internal collaborators—many of whom
were not formally recognized at the time but went on to receive authorship credit as
the project evolved.
and Zhihong Shao—were also credited as Core Contributors in Paper 5, and all four
contributed to every one of the five DeepSeek papers, likely signaling their central,
long-term influence on the DeepSeek project.
The prevailing narrative has been that DeepSeek succeeded with younger, less
experienced researchers. Citation metrics, however, suggest that DeepSeek’s talent
was not so green after all.
Notably, the 31 Key Team researchers who contributed to all five papers stand out
sharply. The group averages 1,554 citations per author, with a median of 501, and a
mean h-index of 13.5 and i10-index of 25.5. Median values—an h-index of 10 and an
i10-index of 11—further indicate consistent impact across the Key Team, not just a few
outliers. These metrics provide additional evidence that the DeepSeek Key Team
consists of researchers with already credible academic track records.
This academic strength becomes even more apparent when compared to a peer group
from one of the world’s leading AI labs. According to data from the OpenAI o1 system
card (arXiv:2412.16720), the team of 265 authors listed on that release had an average
citation count of 4,403 but a median of just 338, indicating a steep drop-off beyond a
few highly cited individuals. Further, the group’s median h-index was only 6 and the
i10-index 4, reflecting more limited consistency of impact across the full group.
In contrast, both DeepSeek’s full author pool and its Key Team exhibit greater balance
between average and median performance—suggesting not only strength at the top,
but also less variation across contributors compared to the OpenAI team. These
patterns may indicate a more evenly distributed base of academic experience, rather
than one overly reliant on a handful of standout figures. DeepSeek’s research engine
appears not only deep but wide—an organizational trait that may prove especially
important as competition in foundation model development intensifies.
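The mean-versus-median gap driving this comparison can be illustrated with a toy calculation (the citation counts below are illustrative, not the actual author data):

```python
from statistics import mean, median

# A top-heavy citation profile: a few stars, many lightly cited authors
skewed = [30000, 9000, 500, 300, 250, 200, 150, 100, 50, 10]
# A more evenly distributed profile
balanced = [2500, 2000, 1500, 900, 600, 450, 350, 250, 150, 100]

# In the skewed group, the mean far exceeds the median, because a handful
# of outliers dominate the average; in the balanced group the two converge.
print(mean(skewed), median(skewed))
print(mean(balanced), median(balanced))
```

A large mean/median ratio, as in the OpenAI cohort (4,403 vs. 338), signals reliance on a few standout figures; closer values, as in the DeepSeek Key Team, signal even depth.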
Taken together, these comparisons challenge the media narrative that DeepSeek’s
rapid ascent was driven by “untested” or inexperienced researchers. While OpenAI
continues to receive global recognition, many of DeepSeek’s central contributors—at
least by traditional bibliometric standards—were better published, more consistently
cited, and arguably more academically established at the time of their breakthrough.7
Looking longitudinally at the 201 DeepSeek authors with known affiliation data, we find
that more than half (n=111) have been trained and affiliated exclusively at Chinese
institutions throughout their careers—evidence of China’s growing capacity to develop
world-class AI talent domestically without relying on Western expertise. And the vast
majority of DeepSeek authors—98 percent (n=197)—have held at least one past or
current affiliation with a Chinese institution.
Four authors appear never to have studied, trained, or worked in China. Their
academic and professional roots spanned a range of global institutions: Erhang Li was
trained in the United Kingdom and the United States and is affiliated with Intel UK; Y.
Q. Wang studied in Germany at Johannes Gutenberg University Mainz; Yuduan Wang
received education in Singapore at the National University of Singapore; and Panpan
Huang studied in the United States at Purdue University. While these individuals
represent exceptions within the broader DeepSeek ecosystem, they highlight the
international reach of the global AI research community. Still, their small number
underscores how uncommon this path is among DeepSeek contributors—further
reinforcing the observation that China’s domestic pipeline is now capable of producing
world-class AI researchers largely on its own.
We found that only a quarter of DeepSeek researchers (24.3 percent, n=49) have ever
held an academic or professional affiliation with a US institution—further illustrating the
limited role American institutions have played in shaping this cohort.
As figure 2 shows, 171 of the 201 DeepSeek authors with known affiliation data were
affiliated with Chinese institutions in 2025 (the most current year available).8
Just 7 percent (n=15) of researchers currently hold US-based affiliations. These include
positions at prominent research universities (such as Stony Brook University, University
of North Texas, and the University of California, San Francisco), medical institutions
(such as Boston Children’s Hospital), and tech or biotech companies including Google,
Otsuka, and Health First. The remaining researchers are spread across a small set of
other countries, including Australia, Canada, the United Kingdom, and Singapore, with
single cases in Germany, Ireland, Panama, Poland, and Taiwan. This geographic
consolidation around China further reinforces the central role of its domestic
institutions—not just as training grounds, but as long-term professional destinations for
AI talent.
Figure 3 – Top 10 Institutions by Researcher Count
Within this institutional landscape, the Chinese Academy of Sciences (CAS) emerges as
the strategic center of gravity. While directly hosting only 18 authors, CAS
encompasses a total of 53 researchers when accounting for its network of 153 affiliated
institutions. This reach operates through what OpenAlex calls “child institutions”: organizations with a subsidiary relationship to CAS as their parent, including research institutes, laboratories, and specialized centers. Combined with remarkable research impact metrics (over 840,000 works and 23.7 million citations), this network positions CAS as the dominant player in the ecosystem.9 Peking
University comes in second with 21 total affiliations, but it leads in direct affiliations
with 20 researchers. Tsinghua University follows with 16 authors, then Sun Yat-sen
University and Nanjing University with 10 authors each. This distribution reveals how
China has leveraged its institutional infrastructure to support AI development, through a
network centered on CAS but distributed across multiple prestigious universities.
The concentration of talent within this network of Chinese institutions has created a
fertile environment for AI innovation that challenges the US advantage in institutional
resources.
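The direct-versus-lineage counting behind the CAS figures (18 direct authors vs. 53 including child institutions) can be sketched as a parent-chain rollup; the records and parent links below are illustrative placeholders, not actual OpenAlex data:

```python
from collections import defaultdict

# Hypothetical parent links, mimicking OpenAlex's parent/child relationships
parent_of = {
    "Institute of Automation": "Chinese Academy of Sciences",
    "Institute of Computing Technology": "Chinese Academy of Sciences",
}

# Hypothetical author-to-institution affiliation records
affiliations = {
    "author_1": "Chinese Academy of Sciences",       # direct affiliation
    "author_2": "Institute of Automation",           # child institution
    "author_3": "Institute of Computing Technology", # child institution
    "author_4": "Peking University",
}

def rollup(affiliations, parent_of):
    """Count authors per institution: directly, and including children."""
    direct = defaultdict(set)
    lineage = defaultdict(set)
    for author, inst in affiliations.items():
        direct[inst].add(author)
        lineage[inst].add(author)
        # Walk up the parent chain so child affiliations roll up to parents
        while inst in parent_of:
            inst = parent_of[inst]
            lineage[inst].add(author)
    return direct, lineage

direct, lineage = rollup(affiliations, parent_of)
print(len(direct["Chinese Academy of Sciences"]))   # direct only
print(len(lineage["Chinese Academy of Sciences"]))  # including children
```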
Of the 49 DeepSeek researchers who had US affiliations at some point during their
careers, 63.3 percent (n=31) spent just one year in the United States—long enough to
gain exposure to top-tier research environments, but not long enough to establish
enduring ties. Another 18.4 percent (n=9) remained for two to four years, and 18.4
percent (n=9) stayed five years or longer, often across multiple institutions. This latter
group includes some of the most influential researchers in the cohort, such as Minghua
Zhang, who accumulated affiliations at State University of New York and Stony Brook
University spanning over a decade; Zhenda Xie, who spent eight years across UCLA
and Optica; and B. Zhang, whose recurring ties with the University of Southern
California from 2007 to 2022 preceded his return to Peking University. These nine long-stay researchers are no peripheral figures: they averaged 4,541 citations, held a
median h-index of 25, and had a median i10-index of 40. Despite this deep academic
integration, only 3 of the 9 currently remain affiliated with US institutions, further
underscoring how the US research ecosystem served as a powerful incubator of talent
that ultimately advanced China’s AI leadership (see fig. 4).
Figure 5 – Geographic Distribution of US Institutions Affiliated with DeepSeek Researchers
A second, more complex group includes researchers such as Wenfeng Liang, Minghua
Zhang, and Zhiyu Wu, whose careers span multiple transits between China and the
United States (e.g., China → USA → China → USA → China). These researchers don’t
simply return—they circulate, developing global networks and embedding themselves
in both ecosystems. This pattern of bidirectional exchange accounts for 12.2 percent
(n=6) of the US-affiliated cohort. Of these, 4 currently list US institutions as their most recent affiliation, while 2 are
affiliated with Chinese institutions. While it is difficult to determine intent or long-term
plans from affiliation data alone, these cases illustrate how cross-border mobility can
strengthen China’s AI ecosystem without necessarily requiring permanent US retention.
Other researchers such as Daya Guo, Guanting Chen, and Yicheng Wu take even more
global paths—passing through institutions in the United Kingdom, Singapore, Saudi
Arabia, Taiwan, or Australia. These trajectories (e.g., Taiwan → China → Australia →
USA) illustrate the rising influence of multinational knowledge acquisition, with the
United States serving as just one of many strategic destinations in a broader global
loop.
Notably, only 14.2 percent (n=7) of the DeepSeek cohort (e.g., Ruiqi Ge, Peiyi Wang,
Bingxuan Wang) followed a China → USA → Stayed path—remaining in the United
States after initial training (see fig. 6a). While still significant, this group is no longer the
default or dominant outcome. Even among researchers who began in the United States
(e.g., B. Zhang, Ruoyu Zhang), many ultimately relocated to China, with 22.4 percent
(n=11) falling into the “Started in the USA, Ended Up in China” or “Started in the USA,
Traveled, Ended Up in China” categories.
Finally, a small but illustrative set of researchers defies simple classification. Figures
such as Kuai Yu (USA → Netherlands → Singapore → China) or Zhen Zhang (USA →
China → Hong Kong → USA) reflect the complexity of today’s scientific mobility. These
researchers—counted within “Started in USA, Traveled, Ended Up in China” (6.1
percent, n=3) or “Started in USA, Traveled, Ended Up in USA” (4.1 percent, n=2)—
reveal how transnational scientific careers are increasingly nonlinear and dynamic.
Taken together, these patterns reveal important features of global AI talent flows. The
United States remains a vital node in international research training—but it is not the
fulcrum or the end point. Most of DeepSeek’s researchers are not being trained in the
United States, and those who are trained here are not retained. Instead, they are
passing through. These findings suggest that American institutions are serving as
steppingstones, equipping elite researchers with high-impact skills, connections, and
credentials that are ultimately reinvested into China’s AI ecosystem. Importantly, the 49
DeepSeek researchers with US affiliations at some point in their careers were among
the most academically accomplished in the entire research cohort, averaging 2,168
citations (median 565), with a mean h-index of 17 and i10-index of 34—figures
significantly higher than those for the broader DeepSeek author pool. These are not
peripheral actors, but central contributors to one of China’s most advanced AI efforts.
For US policymakers, our DeepSeek talent analysis suggests it is high time to reassess
long-standing assumptions that the world’s best and brightest naturally want to study
and stay in the United States. Attracting and permanently retaining the world’s best
minds—once a cornerstone of American technological dominance—appears
increasingly misaligned with twenty-first-century educational realities. DeepSeek is, at
its core, a story of homegrown capacity: More than half of its researchers have never left China,
the overwhelming majority have deep institutional ties to China, and even many who
trained in the United States ultimately returned to China—potentially advancing
China’s position in the global AI race.
Geopolitical Implications
The talent patterns revealed in our analysis have significant geopolitical implications.
For centuries, the sources of national power have stemmed from tangible assets—such
as territory that could be conquered, populations that could be taxed or conscripted,
goods that could be embargoed, militaries that could be deployed. Those tangible
sources of national power still matter, but in the technology age, power also derives
from intangible assets such as data, technology, and knowledge inside people’s heads.
Knowledge power has never been more important for economic and geopolitical
competition; it is the ultimate portable weapon.
These findings challenge a long-held belief that the United States will always attract the
world’s best talent. In reality, however, top global talent has options. DeepSeek’s talent
story suggests that the United States cannot assume a permanent talent lead. Instead,
the nation needs to compete much more aggressively to attract, welcome, and retain
the world’s best and brightest while urgently growing domestic capabilities by
improving K‒12 STEM (science, technology, engineering, mathematics) education at
home.
Ultimately, DeepSeek AI represents more than just another advance in language model
technology. It reveals talent patterns that challenge long-held US assumptions about
innovation advantage. Our analysis of DeepSeek’s research network suggests that
conventional wisdom about US dominance in talent development and retention may
no longer hold true, with significant implications for future technological competition.
Amy Zegart is the Morris Arnold and Nona Jean Cox Senior Fellow at the Hoover
Institution, director of Hoover’s Technology Policy Accelerator, and a senior fellow and
associate director at Stanford University’s Institute for Human-Centered AI.
The following five papers released by DeepSeek AI between January 2024 and
February 2025 formed the basis for our institutional and author-trajectory analysis:
into a latent vector, while DeepSeekMoE enables training strong models at an
economical cost through sparse computation. Compared with DeepSeek 67B,
DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves
42.5 percent of training costs, reduces the KV cache by 93.3 percent, and
boosts the maximum generation throughput to 5.76 times. We pretrain
DeepSeek-V2 on a high-quality and multisource corpus consisting of 8.1T
tokens, and further perform supervised fine-tuning and reinforcement learning
(RL) to fully unlock its potential. Evaluation results show that, even with only 21B
activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier
performance among open-source models.
Appendix B: Tables
Table 1: Author Count and Contributor Roles for DeepSeek Publications (2024–25)
Paper 1: DeepSeek LLM | January 2024 | 86 authors | Business Team (8), Compliance Team (7), Data Annotation Team (36), Design Team (2)**
Paper 2: DeepSeek V2 | May 2024 | 156 authors | Research & Engineering (105), Data Annotation (31), Business & Compliance (18), Mixed Roles (2: Data Annotation + R&E; Business & Compliance + R&E)
Paper 3: DeepSeek Coder V2 | June 2024 | 39 authors | Core Contributor (4), Contributor (35)
Paper 4: DeepSeek V3 | December 2024 | 197 authors | Research & Engineering (148), Data Annotation (30), Business & Compliance (17), Mixed Roles (2: Data Annotation + R&E; Business & Compliance + R&E)
Paper 5: DeepSeek R1 | January 2025 | 200† authors | Core Contributor (18), Contributor (176); this role total (194) reflects the number of authors in the PDF version
* This number reflects unique authors listed for each paper, consolidating names across both the ArXiv
and PDF versions where applicable. Discrepancies between sources are noted:
● Paper 2 and Paper 4 PDFs each contain two duplicate names (Shengfeng Ye and Yanhong Xu),
due to those individuals being listed in multiple contributor categories.
● Paper 5:
o The PDF includes one duplicate name (Shengfeng Ye).
o The ArXiv includes two duplicate names (Shengfeng Ye and Yanhong Xu).
o Nine authors appear on only one version (either ArXiv or PDF), including:
▪ On ArXiv but not on PDF: Chenyu Zhang, Han Bao, Haocheng Wang, Huajian Xin, Jiawei
Wang
▪ On PDF but not on ArXiv: Jinhao Tu, Kaichao You, Mingxu Zhou, Wanjia Zhao
** The contributor categories listed for Paper 1 reflect a separate contributor pool that was not credited
as authors on that paper. The numbers shown in the chart represent the total number of individuals in
each category at that time.
† Paper 5 showed discrepancies in authorship counts: The PDF version originally listed 195 authors, but
one author (Shengfeng Ye) was listed twice, resulting in 194 unique names. The ArXiv entry listed 197
authors. When combining both lists and removing duplicates, the total came to 201 unique authors.
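The de-duplication logic described in this note (drop within-list duplicates, then take the union across the PDF and arXiv versions) can be sketched with sets; the names below are placeholders, not actual DeepSeek authors:

```python
# Illustrative author lists; "B. Li" is a within-PDF duplicate, and some
# names appear in only one version, mirroring the Paper 5 discrepancies.
pdf_authors = ["A. Chen", "B. Li", "B. Li", "C. Wang"]
arxiv_authors = ["A. Chen", "C. Wang", "D. Zhao", "E. Sun"]

pdf_unique = set(pdf_authors)       # removes the within-list duplicate
arxiv_unique = set(arxiv_authors)
combined = pdf_unique | arxiv_unique  # union removes cross-list overlap

pdf_only = pdf_unique - arxiv_unique    # names only in the PDF
arxiv_only = arxiv_unique - pdf_unique  # names only on arXiv

print(len(pdf_unique))  # unique PDF names
print(len(combined))    # unique names overall
```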
The list below includes the 31 individuals who are credited as authors on all five DeepSeek AI
papers. An asterisk (*) indicates those identified as core contributors in the fifth paper.
Table 3: Scholarly Output and Citation Metrics of DeepSeek and OpenAI Research
Teams
The following list includes US-based academic, research, medical, and industry
institutions where DeepSeek authors have held prior or current affiliations. This
includes both educational and professional roles. Asterisk (*) indicates a current
affiliation based on the most recent OpenAlex data.
1 | University of Notre Dame | Notre Dame | United States
1 | Rutgers, The State University of New Jersey | New Brunswick | United States
1 | The Ohio State University Wexner Medical Center | Columbus | United States
1 | Health First* | Rockledge | United States
1. DeepSeek’s announcement roiled US markets, leading to a 3 percent decline in the NASDAQ composite and a 17 percent drop in NVIDIA shares, erasing $600 billion in value. It was the largest single-day loss of market value by any company in US history, a figure equivalent to 65 percent of the annual US defense budget. For more information: https://www.cnn.com/2025/01/27/tech/deepseek-stocks-ai-china/index.html.
2. See Appendix A for details on each of the five DeepSeek papers.
3. Note: For the first four papers, author lists were consistent between the PDF and arXiv metadata. However, for the fifth paper (arXiv:2501.12948), we found discrepancies between the authors listed in the PDF and the arXiv metadata, so we included all unique authors from both sources to ensure comprehensive coverage. Additionally, 11 authors across all papers could not be matched to OpenAlex profiles and were excluded from the analysis.
4. OpenAlex is a tool hosted by OurResearch, a nonprofit focused on open science tool development.
5. See Appendix B for a full list of names in the Key Team.
6. The h-index captures the number of publications with at least h citations (i.e., an h-index of 13 implies 13 papers each cited at least 13 times), while the i10-index counts how many works have at least 10 citations, a useful gauge of consistency across a body of work.
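The h-index and i10-index definitions above can be computed directly from a list of per-paper citation counts, as in this short sketch:

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank  # the rank-th most-cited paper still has >= rank cites
        else:
            break
    return h

def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)

# Example: five papers cited 10, 8, 5, 4, and 3 times
papers = [10, 8, 5, 4, 3]
print(h_index(papers))    # 4: four papers each have at least 4 citations
print(i10_index(papers))  # 1: one paper has at least 10 citations
```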
7. See Appendix B for the full dataset.
8. While 211 authors were included in the full bibliometric analysis, the affiliation-based breakdowns in this chart total 201 because 10 individuals had no available institutional data in OpenAlex. These 10 authors also had very limited bibliometric profiles, with an average of just 4.4 publications, 8.6 citations, and near-zero recent citation activity, suggesting that they are likely junior or early-career researchers. Their omission from the affiliation analysis does not significantly affect the aggregate findings but is noted here for transparency.
9. For more information about institutional relationships in OpenAlex, see https://docs.openalex.org/api-entities/institutions/institution-object.