Cite this report

Gender Equality in Research & Innovation – 2024 Review

Published: 20 June 2024 | Version 2 | DOI: 10.17632/bb5jb7t2zv.1

Contributors: Nicolien van der Linden, Guillaume Roberge, Dmitry Malkov

Analysis of bibliometric data

Data sources

Scopus data as of August 1, 2023, covering 1998–2022, were used to prepare the bibliometric indicators related to publication outputs. Analyses were limited to peer-reviewed content of three publication types: articles, reviews and conference papers. The bibliometric indicators presented in this report and detailed below, such as the average number of publications and the average field-weighted citation impact (FWCI), were based on these publication types.

Scopus is a comprehensive, source-neutral abstract and citation database curated by independent subject matter experts who are recognized leaders in their fields. Its 91+ million items include data from 7,000+ publishers, 94,000+ affiliation profiles and 17+ million authors. Scopus puts powerful discovery and analytics tools in the hands of researchers, librarians, research managers and funders to promote ideas, people and institutions. Delivering a comprehensive overview of the world’s research output in the fields of science, technology, medicine, social sciences and arts and humanities, its state-of-the-art search tools and filters help uncover relevant information, monitor research trends, track newly published research and identify subject experts. Worldwide, Scopus is used by more than 3,000 academic, government and corporate institutions and is the main data source that supports the Elsevier Research Intelligence portfolio. 

Author definition and disambiguation (Scopus author profiles)

Data analyses were based on authors as listed in the author byline of publications. Scopus uses an author-matching algorithm to identify publications by the same author. The Scopus Author Identifier assigns each author a unique ID (called the author ID) and groups together all documents published by that author into a Scopus Author Profile, matching alternate spellings and variations of the author’s last name and distinguishing between authors with the same last name by differentiating on data elements associated with the publication (such as affiliation, disciplines and co-authors). The profile is enriched with manual, author-supplied feedback, both directly through Scopus and via Scopus’s direct links with ORCID (Open Researcher and Contributor ID; https://orcid.org/). Gender is not captured in Scopus Author Profiles (see “Author gender inference” for details on how we inferred gender).

Authors included in the analysis

Authors included in the analysis were limited to those authors for whom a first name could be determined (based on name data from the August 1, 2023 snapshot of Scopus, as described in “First name determination”) and a gender could be predicted based on the latest version of the NamSor API (i.e., surpassing a calibrated probability of 0.85; described in the section “Gender probability score”).

Identification of active authors/researchers

To report on populations of authors of scientific publications, we relied on Scopus author identifiers (AUID) to identify individuals who we deemed relevant to the analysis. Multiple approaches to identifying the relevant population could have been implemented, each with pros and cons. For this study, we focused on active authors (or active researchers), defined as individuals actively publishing scientific peer-reviewed content. To avoid including individuals who only publish very sporadically, a threshold of two publications over a five-year period was set for inclusion. Therefore, any Scopus author linked to at least two publications for a specific five-year period was included as an active author/researcher for that period (and that period only).

Because of the use of five-year periods to define most groupings, data presented on an annual basis in the report are based on five-year moving windows. Data are reported as the last year of the moving window. For instance, data reported as 2002 are based on the 1998–2002 period.
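The active-author rule and the five-year moving windows can be illustrated with a minimal sketch. The function name `active_authors_by_window` and the `(author_id, year)` input layout are assumptions made for illustration; only the two-publication threshold, the five-year window and the labelling by last year come from the text.

```python
from collections import Counter

WINDOW_YEARS = 5   # length of each moving window (from the text)
MIN_PUBS = 2       # minimum publications per window (from the text)

def active_authors_by_window(authorships, first_year, last_year):
    """Return {window_end_year: set of active author IDs}.

    authorships: list of (author_id, publication_year) pairs, one pair per
    author per publication (hypothetical input layout).
    """
    active = {}
    for end in range(first_year + WINDOW_YEARS - 1, last_year + 1):
        start = end - WINDOW_YEARS + 1
        counts = Counter(a for a, year in authorships if start <= year <= end)
        # An author is active for this window only; each window is re-evaluated.
        active[end] = {a for a, n in counts.items() if n >= MIN_PUBS}
    return active

authorships = [("A", 1998), ("A", 2002), ("B", 2001),
               ("A", 2004), ("B", 2005), ("B", 2006)]
windows = active_authors_by_window(authorships, 1998, 2006)
# windows[2002] covers 1998-2002 and is reported as 2002
```

In this toy input, author B has only one publication in 1998–2002 and is therefore not counted as active for the window reported as 2002, but becomes active in later windows.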

Author country and assignation to ASJC categories and disciplines (subject areas)

We assigned authors to countries and disciplines based on their publication output during the period of interest. Full counting was used (see “Full counting of publications” for an explanation of this method). Authors were assigned to a country if more than 30% of their publications during the five-year period of interest were from that country. Authors assigned to any of the EU-27 countries were also assigned to the category EU‑27. Similarly, an author was assigned to a subject category or discipline if more than 30% of their publications during the five-year period were in that discipline. If an author did not meet these thresholds for any country or any discipline, they were not assigned to a country or subject area but were accounted for in the aggregated analysis across all disciplines and/or at the world level.
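The 30% assignment rule can be sketched as follows. The function `assign_labels` and its input layout are hypothetical; the sketch assumes that, under full counting, a publication linked to several countries (or disciplines) counts once for each of them.

```python
from collections import Counter
from fractions import Fraction

THRESHOLD = Fraction(30, 100)  # "more than 30%" (from the text)

def assign_labels(pub_labels):
    """Return the countries (or disciplines) an author is assigned to.

    pub_labels: one set per publication of the author in the five-year
    period, holding the countries (or disciplines) linked to that
    publication (hypothetical input layout).
    """
    n = len(pub_labels)
    if n == 0:
        return set()
    # Full counting: a publication counts once for every label it carries.
    counts = Counter(label for labels in pub_labels for label in set(labels))
    # Exact fractions avoid floating-point surprises at the 30% boundary.
    return {label for label, c in counts.items() if Fraction(c, n) > THRESHOLD}

pubs = [{"NL"}] * 4 + [{"NL", "DE"}] * 2 + [{"DE"}] + [{"FR"}] * 3
assigned = assign_labels(pubs)  # NL: 6/10, DE: 3/10, FR: 3/10
```

Here only NL exceeds the threshold; DE and FR sit at exactly 30% and are not assigned. An author with no label above 30% gets an empty set and would only count in the aggregated totals.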

Selection of countries included in the analysis

The selection of countries for presentation in this report was made to ensure representation on a global scale as well as minimum levels of data to ensure robust analyses. Countries selected have at least 4,000 active authors per period (enumerated based on the description in the section “Authors included in the analysis”).  We included an additional threshold to ensure that the prediction of gender based on name was similar across countries included in the report: we limited countries included in the report to those for which a gender could be inferred for at least 85% of author IDs (for more details, see “Author gender inference” below). However, when referring to global figures, we included all authors for whom we could infer gender with 85% confidence (not excluding any country). In particular, because of difficulty inferring gender for Chinese authors, China was not selected for inclusion in the analysis, though any Chinese authors who met our criteria and for whom we could infer gender at our threshold level are included in the world level.

Please note that numbers shown in this report may differ from those in other sources, both because the data that meet our criteria form a subset of the full Scopus database, and because we use full counting (see “Full counting of publications” for details).

Subject areas included in the analysis

Journal titles in Scopus are classified under four broad All Science Journal Classification (ASJC) categories (Life Sciences, Physical Sciences, Health Sciences, and Social Sciences and the Humanities), which are further divided into 27 major disciplines. Titles may belong to more than one discipline. The analyses in this report were based on the four broad ASJC categories and 26 of the 27 ASJC disciplines, with titles classified as “Multidisciplinary” by ASJC reclassified to other ASJC subject areas for this report.

Scopus ASJC Major Subject Areas | Broad Cluster | STEM
Agriculture and Biological Sciences | Life Sciences | X
Arts and Humanities | Social Sciences | -
Biochemistry, Genetics and Molecular Biology | Life Sciences | X
Business, Management and Accounting | Social Sciences | -
Chemical Engineering | Physical Sciences | X
Chemistry | Physical Sciences | X
Computer Science | Physical Sciences | X
Decision Sciences | Social Sciences | -
Dentistry | Health Sciences | -
Earth and Planetary Sciences | Physical Sciences | X
Economics, Econometrics and Finance | Social Sciences | -
Energy | Physical Sciences | X
Engineering | Physical Sciences | X
Environmental Science | Physical Sciences | X
Health Professions | Health Sciences | -
Immunology and Microbiology | Life Sciences | X
Materials Sciences | Physical Sciences | X
Mathematics | Physical Sciences | X
Medicine | Health Sciences | -
Neuroscience | Life Sciences | X
Nursing | Health Sciences | -
Pharmacology, Toxicology and Pharmaceutics | Life Sciences | X
Physics and Astronomy | Physical Sciences | X
Psychology | Social Sciences | -
Social Sciences | Social Sciences | -
Veterinary | Health Sciences | -

Findings for science, technology, engineering and mathematics (STEM) research are also reported. STEM was defined by grouping the ASJC disciplines marked with an X in the table above.

To define subdisciplines of research within medicine, we leveraged a system in which the major subdisciplines of medicine are further classified into subcategories. Because there is considerable overlap in some subcategories, we defined medicine subdisciplines by grouping the appropriate subcategories based on the frequency that publications were categorized in overlapping subcategories. For example, we created the subdiscipline “Fertility & Birth” because a high percentage of publications in the subcategory “Obstetrics and Gynecology” were also classified in the subcategory “Reproductive Medicine.” The final selection of research subcategories in medicine is shown here.

Health Science Discipline Name | Subcategories Included
Cancer | Cancer Research; Oncology
Cardiology & Pulmonology | Cardiology and Cardiovascular Medicine; Pulmonary and Respiratory Medicine
Diabetes & Endocrinology | Endocrinology; Endocrinology, Diabetes and Metabolism
Emergency Medicine | Critical Care and Intensive Care Medicine; Emergency Medicine
Fertility & Birth | Obstetrics and Gynecology; Reproductive Medicine
General Clinical Medicine | General Medicine; Family Practice; Internal Medicine
Infectious Diseases & Allergy | Immunology and Allergy; Infectious Diseases; Microbiology (medical)
Pediatrics | Pediatrics, Perinatology and Child Health
Public Health | Epidemiology; Health Policy; Public Health, Environmental and Occupational Health
Radiology & Imaging | Radiology, Nuclear Medicine and Imaging
Surgery | Surgery

Sustainable Development Goals

In addition, Scopus content is classified under the United Nations Sustainable Development Goals (SDGs). The SDGs challenge the global community to build a world where no one is left behind. Since 2018, Elsevier has generated SDG search queries to help researchers and institutions track and demonstrate progress toward the SDG targets. The analysis on SDG research in this report was based on the Elsevier 2023 SDGs Mapping, which can be found at https://elsevier.digitalcommonsdata.com/datasets/y2zyy9vwzy/1

Author gender inference

We used the NamSor API (August 2023 release) to infer a binary gender for author IDs. NamSor treats gender as a binary variable and is only able to infer gender as “woman” or “man.” We acknowledge that this poses a limitation to gender inclusiveness. The API provides a Gender Calibrated Probability and gender classification based on three data points: country of origin, first name and last name. We generated these three data points for authors based on information related to each author ID. See “Determination of author country of origin” and “First name determination” for details on this process.

Determination of author country of origin

We determined each author’s country of origin based on the country of affiliation listed on the publications from their first year of publication in Scopus (i.e., articles, reviews and conference papers). In some cases, authors had published in more than one country in their first year of publication. In these cases, we designated the country with the largest number of publications as the author’s country of origin. Authors with equal numbers of publications in two or more countries were excluded from the gender disambiguation analysis. The process used to determine the author’s country of origin is summarized here.

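For each author ID: identify the first year of publication in Scopus; count that year’s publications by country of affiliation; designate the country with the largest count as the country of origin; and exclude the author ID from gender inference if two or more countries tie for the largest count.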
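The country-of-origin rule can be written as a short sketch. The function name `country_of_origin` and the `(year, country)` input layout are assumptions for illustration, not the pipeline actually used for the report.

```python
from collections import Counter

def country_of_origin(pubs):
    """Return the author's country of origin, or None if it cannot be set.

    pubs: list of (publication_year, affiliation_country) pairs for one
    author ID, one pair per publication and country (hypothetical layout).
    """
    if not pubs:
        return None
    first_year = min(year for year, _ in pubs)
    counts = Counter(country for year, country in pubs if year == first_year)
    ranked = counts.most_common()  # sorted by count, largest first
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between countries: excluded from gender inference
    return ranked[0][0]

origin = country_of_origin([(2005, "FR"), (2005, "FR"), (2005, "DE"), (2010, "US")])
# origin == "FR": only 2005 publications are considered, and FR leads 2 to 1
```

Publications from later years (the 2010 US paper in the example) play no role, which keeps the country of origin fixed over the author's career.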

First name determination

First and last names are required as input data for NamSor. Therefore, only author IDs with a first and last name were passed through the NamSor API to retrieve a Gender Probability Score. Author IDs for which no first name data were available were excluded from the analysis.

Different variants of an author’s name are commonly observed across their publications. To identify the best first name to pass through NamSor for each author, we assessed all the name variants associated with each author ID. For each author ID in the Scopus snapshot, we examined all publications on which the author ID appears in the author field and generated a list of all distinct first names associated with the author ID. Based on this list, we generated a table with a revised first name for each author ID. The process used to determine the best first name to pass through NamSor is described here.

In cases where only a single first name was associated with an author ID:

These author IDs were excluded from the analysis.

In cases where multiple first names were associated with an author ID, we selected the longest available name following removal of nonsensical characters, provided this name was not composed of a string of initials, according to the following steps for each author ID:
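The selection of the longest usable first name might look like the sketch below. The exact cleaning rules are not given in the text, so the definitions of "nonsensical characters" (anything other than letters, spaces, periods, apostrophes and hyphens) and of "a string of initials" (every token is a single letter) are assumptions, as are the function names.

```python
import re

def clean(name):
    # Assumption: drop digits, underscores and any symbol that is not a
    # letter, whitespace, period, apostrophe or hyphen.
    return re.sub(r"[^\w\s.'\-]|[\d_]", "", name).strip()

def is_initials(name):
    # Assumption: a name is "a string of initials" when every token is a
    # single letter, e.g. "J." or "J. M."; unseparated initials such as
    # "JM" are not detected by this simple rule.
    tokens = [t for t in re.split(r"[\s.\-]+", name) if t]
    return all(len(t) == 1 for t in tokens)

def best_first_name(variants):
    """Return the longest cleaned variant that is not a string of initials."""
    candidates = [c for c in (clean(v) for v in variants) if not is_initials(c)]
    return max(candidates, key=len) if candidates else None

best = best_first_name(["J.", "Jane", "Jane M.", "J. M."])
# best == "Jane M."
```

When every variant reduces to initials, the function returns `None`, which corresponds to an author ID that cannot be passed to the gender inference step.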

Gender Probability Score

The NamSor Gender Calibrated Probability was used to predict the gender of each author. The NamSor Probability is a value returned by NamSor along with the predicted gender for each name passed to the NamSor API. The Gender Probability is an assigned probability constructed by NamSor using several gold sets. The values for Gender and Gender Probability are the output we retrieve after passing the best first name, last name and country of origin data to their GenderGeoBatch API and the best first and last name data to their GenderBatch API.

Gender Probability ranges from 0.5 to 1.0. A score of 0.5 indicates high uncertainty that the inferred gender is correct and a score of 1.0 indicates the highest level of certainty that the inferred gender is correct. Guided by years of reporting with the NamSor inference, we use a cutoff Gender Probability of 0.85. All names that fall below that threshold are considered as “gender unknown”.

The gender inferred for each name-country combination returned by the GenderGeoBatch API was then matched to author IDs based on the best first name, last name and country of origin. The gender inferred for each name returned by the GenderBatch API was then matched to author IDs based on first and last name.

Our preference was to use the returned value from GenderGeoBatch as it considers three data points. However, should the Probability returned from GenderGeoBatch not meet the threshold of 0.85, we assessed the gender and Probability returned by GenderBatch to see if the threshold of 0.85 was met. This may happen if a particular name is extremely rare in the country with which it was submitted to GenderGeoBatch.
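The preference order and the 0.85 cutoff can be summarized in a minimal sketch. It does not call NamSor: the two results are assumed to have been retrieved already and are passed in as `(gender, probability)` tuples, a hypothetical layout. Treating a probability of exactly 0.85 as meeting the threshold is also an assumption.

```python
THRESHOLD = 0.85  # cutoff Gender Probability (from the text)

def infer_gender(geo_result, name_result):
    """Pick a gender from the two NamSor outputs, or "unknown".

    geo_result:  (gender, probability) from GenderGeoBatch, or None
    name_result: (gender, probability) from GenderBatch, or None
    """
    # GenderGeoBatch is preferred because it uses three data points;
    # GenderBatch is consulted only when the first result is below threshold.
    for result in (geo_result, name_result):
        if result is not None and result[1] >= THRESHOLD:
            return result[0]
    return "unknown"

label = infer_gender(("man", 0.60), ("woman", 0.90))
# label == "woman": the name-only result is used as the fallback
```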

Author publication history and career stages

Author publication history was determined based on the year in which the author’s first publication appears in Scopus. Then, publication age of each publication was computed per author based on the year of the first publication and the year of each publication. We then binned publications of authors into career stage groups as follows:

These categories enabled analysis based on career stages, accounting for changes as authors progress through their careers. To be assigned to a career stage, an author needed at least two publications during the corresponding five-year career stage.

Full counting of publications

In most analyses in the report, publications are counted using the full counting method, rather than fractional counting.

In fractional counting, each publication is divided among its authors.

Fractional counting is used only in the authorship analysis, in Section 3.1 of the report (see “Authorship analysis” below for details).

In full counting, each publication is counted once for each of its authors.

In full counting, publication totals may seem inflated because each publication is counted multiple times (for example, a publication with four authors is counted four times).

Full counting is used in the analyses of multi- and interdisciplinarity, open access, scholarly impact and societal metrics.
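The difference between the two counting methods can be shown for a single publication with four authors. This is an illustrative sketch; the function names are hypothetical.

```python
from fractions import Fraction

def full_count(authors):
    """Full counting: the publication counts once for each author."""
    return {author: 1 for author in authors}

def fractional_count(authors):
    """Fractional counting: the publication is divided among its authors."""
    share = Fraction(1, len(authors))
    return {author: share for author in authors}

authors = ["a1", "a2", "a3", "a4"]
full = full_count(authors)        # each author is credited with 1
frac = fractional_count(authors)  # each author is credited with 1/4

total_full = sum(full.values())   # 4: the publication is counted four times
total_frac = sum(frac.values())   # 1: credits add up to one publication
```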

Authorship analysis

Most of the analyses in this report are based on the arithmetic mean of average scores of authors. However, one analysis comparing shares of active authors with shares of authorship (Section 3.1 Research Authorship) instead relies on fractional shares of publications. In that case, each author in a publication is given a fraction of the publication, based on the number of authors (a short explanation of fractional counting is included in “Full counting of publications” above). For example, each author on a publication with a total of 5 authors will be credited with 0.2 publications. That fraction is further divided if an author is linked to more than one address on the publication, to ensure that summing across countries on a publication adds up to 1 publication.

The share of authorship among men and women in a country and subject area was then calculated by taking the sum of fractionalized publications for authors in the group, and dividing by the total fractional count across both genders to obtain shares of authorship.
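A minimal sketch of the fractional authorship calculation follows. The input layout and function names are hypothetical, the split across an author's addresses is assumed to be equal, and the subject-area dimension is left out for brevity.

```python
from collections import defaultdict
from fractions import Fraction

def fractional_credit(publications):
    """Return {(country, gender): fractional publication credit}.

    publications: list of publications, each a list of
    (gender, [countries of the author's addresses]) tuples, one per author.
    """
    credit = defaultdict(Fraction)
    for authors in publications:
        per_author = Fraction(1, len(authors))
        for gender, countries in authors:
            # Assumption: an author's fraction is split equally by address.
            per_address = per_author / len(countries)
            for country in countries:
                credit[(country, gender)] += per_address
    return credit

def authorship_share(credit, country, gender):
    """Share of a gender in a country's fractional count (both genders)."""
    total = credit[(country, "woman")] + credit[(country, "man")]
    return credit[(country, gender)] / total if total else None

# One publication with 5 authors: each author receives 1/5.
pub = [("woman", ["NL"]), ("woman", ["NL"]), ("man", ["NL"]),
       ("man", ["NL"]), ("man", ["NL", "DE"])]
credit = fractional_credit([pub])
share_women_nl = authorship_share(credit, "NL", "woman")  # (2/5) / (9/10) = 4/9
```

The credits across all countries and genders add up to exactly one publication, which is the property the address-level split is meant to guarantee.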

Average field-weighted citation impact (FWCI)

Field-weighted citation impact (FWCI) is an indicator of the academic impact or reach of a publication. It is calculated by comparing the number of citations actually received by a publication with the number of citations expected for a publication of the same type, publication year and subject area. An FWCI of more than 1.0 indicates that the publication has been cited more than would be expected based on the global average for similar publications. For example, an FWCI of 2.1 means that the publication has been cited 110% more than the world average for similar publications. An FWCI of less than 1.0 indicates that the publication has been cited less than would be expected based on the global average for similar publications. For example, an FWCI of 0.9 means that the publication has been cited 10% less than the world average for similar publications.

In general, the FWCI is defined as:

$\mathrm{FWCI}_i = C_i / E_i$

where

$C_i$ = citations received by publication $i$

$E_i$ = expected number of citations received by all similar publications in the publication year plus the following three years

When a publication was allocated to more than one subject area, the harmonic mean was used to calculate FWCI.
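The FWCI ratio can be sketched in a few lines. The text does not spell out how the harmonic mean is applied, so this sketch assumes one reading: for a publication in several subject areas, the expected citation count is the harmonic mean of the expected counts of those areas. The function names are hypothetical.

```python
def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)

def fwci(citations, expected_by_area):
    """Citations received divided by expected citations.

    expected_by_area: expected citations for similar publications (same
    type and year), one value per subject area of the publication.
    Assumption: the values are combined with a harmonic mean.
    """
    return citations / harmonic_mean(expected_by_area)

def percent_vs_world(score):
    """Express an FWCI as a percentage above or below the world average."""
    return round((score - 1) * 100, 1)

single_area = fwci(21, [10])   # 21 citations against 10 expected: about 2.1
two_areas = fwci(6, [2, 4])    # harmonic mean of 2 and 4 is 8/3: about 2.25
above = percent_vs_world(2.1)  # 110.0, i.e. cited 110% more than expected
below = percent_vs_world(0.9)  # -10.0, i.e. cited 10% less than expected
```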

How we calculated average FWCI

To measure citation impact by gender, we group all publications authored by women (or men) during the period of interest and calculate the average field-weighted citation impact (FWCI) score for those publications. This was done using fractional counting.

Interdisciplinarity

The disciplinary diversity of references (DDR) of a publication is computed based on the material cited by the publication and reflects the diversity of knowledge that is being integrated into the publication. The indicator considers (a) the number of different subfields that are being cited, (b) the distribution of those citations across the cited subfields, and (c) the intellectual proximity of those subfields to one another. For example, a paper that draws on knowledge from four different subfields would have a higher DDR score than a paper that draws on only three. Similarly, a paper that cites one subfield 90% of the time and the other subfields only 10% of the time would have a lower score than a paper that cites its various subfields in roughly equal measure. Finally, a paper that integrates knowledge from biology and from chemistry would have a lower score than a paper that integrates knowledge from biology and the performing arts, because the former pair is more intellectually proximate than the latter pair. Each paper’s DDR score is adjusted to the average of all papers worldwide published in the same subfield and same year.

In this report, the share of each author’s papers among the top 10% of papers with the highest DDR in the world was computed.

To calculate the interdisciplinarity score, the share of publications among the top 10% with the highest DDR in the world was first measured for each author, based on their publications during the period of interest. This was done using full counting, which has the effect of inflating total publication counts (see “Full counting of publications” for details).

The average share among men and women in a country and subject area was then calculated by taking the arithmetic mean of the share for authors in the group.
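The two-step aggregation (a share per author, then an arithmetic mean across authors) can be sketched as follows. The input layout and function names are hypothetical; the same pattern applies to the other author-level share indicators in this report.

```python
from statistics import mean

def author_share(flags):
    """Share of an author's publications that are in the world top 10%.

    flags: one boolean per publication of the author in the period,
    True when the publication is among the top 10% by DDR.
    """
    return sum(flags) / len(flags)

def group_mean_share(authors):
    """Arithmetic mean of the author-level shares within a group."""
    return mean(author_share(flags) for flags in authors.values())

women = {"w1": [True, False],                 # share 0.50
         "w2": [False, False, False, True]}   # share 0.25
score = group_mean_share(women)               # (0.50 + 0.25) / 2 = 0.375

# For comparison only: pooling all publications gives 2/6, not 0.375.
pooled = sum(sum(f) for f in women.values()) / sum(len(f) for f in women.values())
```

Averaging author-level shares gives every author the same weight regardless of output volume, which is why the result differs from the pooled share of publications.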

Multidisciplinarity

The disciplinary diversity of authors (DDA) reflects the diversity of the prior disciplinary backgrounds of a paper’s co-authors. This indicator was developed to account for the number of distinct disciplines, the cognitive distance that separates them, and the balance between them. A paper co-authored by authors whose previous papers were distributed across subfields of science in a similar pattern (i.e., having similar relative frequency across subfields) would score lower than papers bringing together authors with different backgrounds (as measured by the subfields from their prior publications), even if those authors, individually, have published in a less diverse set of subfields. In other words, it is having differences between the backgrounds of each co-author that increases multi-disciplinary integration and not having individual authors with more diverse backgrounds. Nevertheless, authors having diverse backgrounds may be more likely to increase the multi-disciplinary integration of one paper, but only if this diversity is sufficiently different from the subfields of the remaining authors. As a result of this approach, a single-author publication, no matter the diversity of its author’s background, will always receive the minimum score, because the indicator is intended to capture diversity across different authors. Similar to the DDR, the share of an entity’s papers with a DDA score in the top 10% will be measured and normalized to the average of all papers worldwide, published in the same subfield and same year.

To calculate the multidisciplinarity score, the share of publications among the top 10% with the highest DDA in the world was first measured for each author, based on their publications during the period of interest. This was done using full counting, which has the effect of inflating total publication counts (see “Full counting of publications” for details).

The average share among men and women in a country and subject area was then calculated by taking the arithmetic mean of the share for authors in the group.

A note regarding the interpretation of multidisciplinarity scores

Readers may wonder why multidisciplinarity scores for both women and men are quite high and well above the expected 10%. This is due to the nature of the indicator itself and how it is built. Publications that are highly multidisciplinary (in the top 10% of papers) tend to have a higher number of authors. Indeed, there is a strong correlation between the number of authors and the odds of being highly multidisciplinary. For instance, highly multidisciplinary publications from 2021 have, on average, 6.9 authors, compared with only 4.9 authors for all remaining publications, a difference of 2 authors per publication (about 40% more). For publications within the bottom 10%, the average number drops to 1.6 per publication. Multidisciplinarity thus increases as the number of authors rises, peaking and stabilizing at about 10 authors. Because full counting is used to calculate this indicator (see “Full counting of publications”), a single publication in the top 10% counts towards the score of a high number of individuals, while those not in the top 10% lower the score of fewer individuals.

The outcome of this is that the share for both groups can end up above the 10% threshold. The reason why the same phenomenon is not observed for highly interdisciplinary publications is that this indicator is based on references, not authors, and thus this relationship between a high number of authors and high score is absent.
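A toy example makes the effect concrete. The numbers below are invented for illustration and are not report data: ten publications, of which exactly one (10%) is highly multidisciplinary and has eight authors, while the other nine have two authors each. For simplicity, every author is assumed to appear on exactly one publication, so each author-level share is either 1 or 0.

```python
from fractions import Fraction

def mean_author_share(publications, group):
    """Mean of author-level shares for a group in the toy setting.

    publications: list of (is_top, {"women": n, "men": n}) tuples. Because
    each author has a single publication, the mean of the shares equals
    the fraction of the group's authors who are on a top publication.
    """
    on_top = sum(counts[group] for is_top, counts in publications if is_top)
    total = sum(counts[group] for _, counts in publications)
    return Fraction(on_top, total)

publications = [(True, {"women": 4, "men": 4})] + [(False, {"women": 1, "men": 1})] * 9
women_score = mean_author_share(publications, "women")  # 4/13, about 31%
men_score = mean_author_share(publications, "men")      # 4/13, about 31%
```

Although only one publication in ten is in the top 10%, both groups score about 31%, because the large author team on that publication pulls many individual scores up at once.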

Open access

Open access (OA) analyses were prepared using data from Unpaywall, which assigns a preferred open access status to publications based on the best open access location of documents available in their data. A single OA status was assigned to a publication:

More information about Unpaywall open access statuses can be consulted online at https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean-.

To determine the open access status of Scopus publications, Unpaywall data are linked to Scopus based on Digital Object Identifiers (DOI). Publications in Scopus without DOIs are excluded from the open access analysis.

To calculate the average share in open access among men and women, we first calculated the share of open access (per OA status) for each author based on publications during the period of interest. This was done using full counting, which has the effect of inflating total publication counts (see “Full counting of publications” for details). The average share among men and women in a country and subject area was calculated by taking the arithmetic mean of the share for authors in the group.

Patent citations

Patent citations were measured by linking the non-patent literature references cited in patents to Scopus records. A mapping to Scopus of non-patent literature cited in LexisNexis content is maintained by Elsevier’s Analytical Services and returns close to 95% of all valid citations to Scopus, with a precision above 99%. For this analysis, patent applications filed at the United States Patent and Trademark Office (USPTO), the European Patent Office (EPO), the Intellectual Property Office (IPO) of the United Kingdom and the Japan Patent Office (JPO), and through the World Intellectual Property Organization (WIPO), were considered.

To calculate the patent citation score, the share of publications cited at least once in patents was measured first for each author based on their publications during the analyzed period. This was done using full counting, which has the effect of inflating total publication counts (see “Full counting of publications” for details).

The average share among men and women in a country and subject area was then calculated by taking the arithmetic mean of the share for authors in the group. Because patent citations take time to accrue, the share of publications cited in patents decreases as we get closer to the present. Thus, the shares were normalized annually against the average share at the world level for men and women combined, bringing the world level to 1.00. A score above 1.00 indicates more patent citations than expected in a given year, while a score below 1.00 indicates fewer patent citations than expected.
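The annual normalization can be sketched as a simple ratio. The values below are invented for illustration and the function name is hypothetical.

```python
def normalize_to_world(group_share_by_year, world_share_by_year):
    """Divide each year's group share by the world share for that year.

    The world share is the average for men and women combined, so the
    world level becomes 1.00 in every year.
    """
    return {year: share / world_share_by_year[year]
            for year, share in group_share_by_year.items()}

# Invented shares: the raw values fall over time as citations accrue slowly.
world = {2018: 0.040, 2022: 0.010}
group = {2018: 0.050, 2022: 0.008}
scores = normalize_to_world(group, world)
# 2018: about 1.25 (above the world level); 2022: about 0.80 (below it)
```

After normalization the recent year is directly comparable with the earlier one, even though its raw share is several times smaller.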

Citations in policy documents

Citations in policy documents were measured using data from the Overton database.[1] Overton is the world’s largest searchable index of policy documents, guidelines, think-tank publications and working papers. Its database consists of more than 1.65 million policy documents, with data collected from 182 countries and over a thousand sources worldwide. These policy documents include white papers from international multilateral organizations, as well as guidelines from city councils, parliamentary transcripts and other classes of the so-called “gray literature.” Around half of these documents make citations to academic or scholarly publications. More than 2 million distinct journal-based publications are cited by at least one policy document in the database.

Elsevier uses Overton for tracking research uptake in policy. A qualitative assessment of these citations by Elsevier’s Science-Metrix team revealed that while they “should not be interpreted as indicative of advanced policy outcomes of research directly reaching the legislative or executive processes, they can be seen as achievements in contributing to the first stages of these processes, at the intersection between governance and academia.”[2]

One known limitation of the Overton database is that it displays a bias towards English-language documents originating in Anglo-Saxon countries. While the impact on the analysis for this report is expected to be limited as analyses mostly focus on comparison between women and men, often within the context of the same country, caution is advised when attempting cross-country comparisons for non-English speaking nations.

To determine the policy citation status of Scopus publications, Overton data are linked to Scopus based on Digital Object Identifiers (DOI). Publications in Scopus without DOIs are excluded from the policy citation analysis.

To calculate the policy citation score, the share of publications cited at least once in policy documents was measured among men and women first for each author based on their publications during the period of interest. This was done using full counting, which has the effect of inflating total publication counts (see “Full counting of publications” for details).

The average share among men and women in a country and subject area was then calculated by taking the arithmetic mean of the share for authors in the group. Because policy citations take time to accrue, the share of publications cited in policy documents decreases as we get closer to the present. To ease interpretation of the data, the shares were thus normalized annually against the average share at the world level for men and women combined, bringing the world level to 1.00. A score above 1.00 indicates more policy citations than expected in a given year, while a score below 1.00 indicates fewer policy citations than expected.

Alternative metrics

Alternative metrics assess the uptake of scientific literature by popular culture, such as news and online content. Alternative metrics are measured by linking records in Scopus to PlumX, the data source of journalistic and trade news documents, Wikipedia mentions, and blog mentions. PlumX maintains a database recording the uptake of scientific outputs beyond the scientific literature in, for example, social media, blogs, news, and educational resources.

For this report, mentions in three media were retained: mentions in news, blogs and Wikipedia.

To determine the status of mentions in these three media of Scopus publications, PlumX data are linked to Scopus based on Digital Object Identifiers (DOI). Publications in Scopus without DOIs are excluded from these analyses.

To calculate each of these alternative metrics, a paper-level score is first measured. Because mentions on these platforms are relatively rare, a binary status is used as the score instead of the count of mentions, in an attempt to diminish the effect of outliers. Thus, each publication is given a score of 1 if mentioned on a given platform and 0 if not. Then, the world average per subfield, year and document type is computed, and the difference between the observed value and the world average is taken, resulting in a difference score.
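The paper-level difference score can be sketched as follows for a single platform. The record layout and the function name are hypothetical.

```python
from collections import defaultdict

def difference_scores(pubs):
    """Return {publication id: binary score minus world average}.

    pubs: list of dicts with keys "id", "subfield", "year", "doctype" and
    "mentioned" (True if the publication is mentioned on the platform).
    """
    groups = defaultdict(list)
    for p in pubs:
        key = (p["subfield"], p["year"], p["doctype"])
        groups[key].append(1 if p["mentioned"] else 0)
    # World average of the binary score per subfield, year and document type.
    world_avg = {key: sum(v) / len(v) for key, v in groups.items()}
    return {p["id"]: (1 if p["mentioned"] else 0)
                     - world_avg[(p["subfield"], p["year"], p["doctype"])]
            for p in pubs}

pubs = [{"id": i, "subfield": "Oncology", "year": 2021,
         "doctype": "article", "mentioned": i == 0} for i in range(4)]
scores = difference_scores(pubs)
# One of four similar papers is mentioned, so the world average is 0.25:
# the mentioned paper scores 0.75 and the other three score -0.25.
```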

The alternative metric score for each platform was then measured among men and women for each author based on their publications during the period of interest, taking the average of the difference scores. This was done using full counting, which has the effect of inflating total publication counts (see “Full counting of publications” for details).

The alternative metric scores for news, blogs and Wikipedia among men and women in a country and subject area were then calculated by taking the arithmetic mean of the scores for authors in the group.

Publications by mixed-gender research teams

It is important to note that publications attributed to men and women authors substantially overlap. This is because a significant portion of research output is authored by mixed-gender research teams. To put this in perspective, as of 2022, the share of highly multidisciplinary publications across all women authors (14.3%) amounted to approximately 275,000 publications. Out of these, 250,000 were co-authored with men. In turn, the share of highly multidisciplinary publications across all men authors (14%) amounted to 335,000, with the same 250,000 being co-authored with women.

This overlap is important to keep in mind when interpreting the analysis, as we are not talking about research authored exclusively by women or men. Nevertheless, when calculated across all women and men authors and vast amounts of data, the indicators reflect the characteristics of each gender group, even though the underlying publications overlap.
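The size of the overlap in the example above can be made explicit with simple arithmetic. The sketch below uses only the rounded 2022 figures quoted in this section, so the derived counts are approximate; the variable names are ours.

```python
# Rounded counts of highly multidisciplinary publications (2022).
women_total = 275_000   # publications with at least one woman author
men_total = 335_000     # publications with at least one man author
shared = 250_000        # publications co-authored by women and men

# Publications attributed to one gender group only.
women_without_men = women_total - shared
men_without_women = men_total - shared

# Distinct publications once the overlap is counted a single time.
distinct_total = women_without_men + men_without_women + shared

# Share of each group's publications that are mixed-gender.
mixed_share_women = shared / women_total
mixed_share_men = shared / men_total
```

On these figures, roughly nine in ten of the publications attributed to women authors, and roughly three in four of those attributed to men authors, come from mixed-gender teams.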

Analysis of funding data

Data source

The funding data used for the analyses in this report were based on a snapshot of Elsevier’s funding database, taken on January 26, 2024.

Awards included in the analysis

Awards were attributed to the measured period according to their start year (for instance, an award with a start date in 2020 is included under the periods 2016–2020, 2017–2021 and 2018–2022). All awards available in the data were included in the analysis.
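The attribution of an award to overlapping periods can be sketched as below. The five-year window length and the 2022 end year follow the example in the text; the 2009 first year is an assumption taken from the time span mentioned for funding agencies, and the function name is hypothetical.

```python
def periods_containing(start_year, first_year=2009, last_year=2022, length=5):
    """List the rolling (begin, end) periods that include an award's start year.

    Windows are `length` years long and must end no later than `last_year`.
    """
    periods = []
    for begin in range(first_year, last_year - length + 2):
        end = begin + length - 1
        if begin <= start_year <= end:
            periods.append((begin, end))
    return periods


award_periods = periods_containing(2020)
```

For a start year of 2020 this yields the three periods named in the text, because later windows would end after 2022.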

Awardees included in the analysis

Awardees included in the report were limited to those for whom a Scopus author ID was available, who were deemed active (as per the “Identification of active authors” section), and for whom the associated Scopus author ID data (first name, last name and country of origin) could support gender inference (as described in the section “Analysis of bibliometric data”). Awardee names in Elsevier’s funding database were matched to Scopus author IDs based on awardee name and institution details.

Funding agencies included in the analysis

All agencies indexed in the Elsevier funding database that granted a research award with a start date during the time period 2009–2022 to an individual with an author ID were included in the analysis and contributed to the country-level statistics.

Awardee country assignation

We assigned awardees to countries based on the country where the funding agency awarding the grant is located. Awardees who received grants from more than one funding agency based in more than one country thus counted towards more than one country or region.

Selection of countries included in the analysis

Awardees were aggregated at the country level based on the location of the awarding funding agency. To ensure we were working with a robust data set, we limited the analyses to those countries selected for the bibliometric analysis that had at least 1,000 awardees (enumerated based on the description in the section “Awardees included in the analysis”) during the period 2018–2022.
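A minimal sketch of the country assignment and the selection threshold follows. It assumes that an awardee counts for 2018–2022 when one of their awards starts in that period, and it uses made-up agency names and a hypothetical tuple layout for award records.

```python
def awardees_by_country(awards, agency_country, first_year=2018, last_year=2022):
    """Map each country to the set of awardees funded by its agencies.

    awards is an iterable of (author_id, agency_id, start_year) tuples.
    An awardee funded by agencies in several countries counts in each one.
    """
    by_country = {}
    for author_id, agency_id, start_year in awards:
        if first_year <= start_year <= last_year:
            country = agency_country[agency_id]
            by_country.setdefault(country, set()).add(author_id)
    return by_country


def eligible_countries(by_country, bibliometric_countries, min_awardees=1000):
    """Keep bibliometric-analysis countries that reach the awardee threshold."""
    return sorted(
        country
        for country, awardees in by_country.items()
        if country in bibliometric_countries and len(awardees) >= min_awardees
    )


agency_country = {"agency_x": "NLD", "agency_y": "CAN"}  # hypothetical agencies
awards = [
    ("a1", "agency_x", 2019),
    ("a1", "agency_y", 2020),  # a1 counts towards both NLD and CAN
    ("a2", "agency_x", 2021),
    ("a3", "agency_y", 2015),  # outside 2018-2022, ignored
]
by_country = awardees_by_country(awards, agency_country)
# The toy threshold of 2 stands in for the report's threshold of 1,000.
selected = eligible_countries(by_country, {"NLD", "CAN"}, min_awardees=2)
```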

Awardee gender inference

Awardees were matched to their Scopus author ID and gender was inferred as described in the section “Author gender inference.”

Analysis of patent data

Data source

Patent data were sourced from LexisNexis, covering three patent authorities: the United States Patent and Trademark Office (USPTO), the European Patent Office (EPO) and the World Intellectual Property Organization (WIPO). Published patent applications were selected as the main unit of measurement for the analyses.

While LexisNexis provides data for more than 100 patent authorities, known limitations in the country information of inventors make it difficult to properly report findings at the national level for most authorities. Among the IP5 offices (the five offices covering the largest markets), only data from the USPTO and EPO are sufficiently complete for use in this report; data from the China National Intellectual Property Administration (CNIPA), the Japan Patent Office (JPO) and the Korean Intellectual Property Office (KIPO) were therefore left out of the analysis. To broaden the reach of the analysis, data from WIPO have been included for this edition, providing a more international view of inventorship.

Inventor definition and disambiguation

Data analyses were based on team composition on patent applications, as defined by names of inventors on the patent (those who contribute to the claims of a patentable invention).

The decision to use team composition instead of disambiguated inventors came from the absence of a robust unique identifier for inventors in the data source. To circumvent the issue, each patent was instead tagged as one of the six following categories:

These categories are mutually exclusive, and thus country-level metrics for these categories add up to 100%. The unknown status is used in cases where there is at least one inventor for whom the gender cannot be inferred and where the team composition status cannot be determined because of this. One unknown case excludes “women-only” or “men-only” as a valid choice, but depending on the number of inventors, the remaining three categories can still be inferred in some cases where the 60% threshold is attained regardless of the unknown cases. For cases in which it is not possible to assign one of the remaining three categories given the balance of men and women, the unknown status applies.
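The handling of unknown cases can be illustrated with a short Python sketch. Several points are assumptions: the labels “majority women”, “majority men” and “balanced” are hypothetical stand-ins for the three mixed categories, the threshold is read as “at least 60%”, and a team with unknown cases is tagged only when every possible gender assignment of those cases leads to the same category (a conservative reading of the rule).

```python
def classify_known(women, men, threshold_pct=60):
    """Classify a team in which every inventor's gender was inferred."""
    total = women + men
    if total == 0:
        raise ValueError("a patent needs at least one inventor")
    if men == 0:
        return "women-only"
    if women == 0:
        return "men-only"
    # Integer arithmetic avoids floating-point issues at exactly 60%.
    if women * 100 >= threshold_pct * total:
        return "majority women"   # hypothetical label
    if men * 100 >= threshold_pct * total:
        return "majority men"     # hypothetical label
    return "balanced"             # hypothetical label


def classify_team(women, men, unknown=0, threshold_pct=60):
    """Classify a team, allowing inventors whose gender is unknown."""
    if unknown == 0:
        return classify_known(women, men, threshold_pct)
    # The two extreme scenarios: all unknown inventors are men, or all women.
    low = classify_known(women, men + unknown, threshold_pct)
    high = classify_known(women + unknown, men, threshold_pct)
    # The category changes monotonically with the number of women, so if the
    # extremes agree, every scenario in between agrees too. Single-gender
    # categories can never agree at both extremes when unknown > 0.
    return low if low == high else "unknown"
```

For example, a team of seven women, two men and one unknown inventor is at least 70% women in every scenario and so keeps a definite category, whereas a team of two women, two men and one unknown inventor could tip either way and is tagged as unknown.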

The share of patent applications per team composition status in a country and subject area was calculated by taking the count of patents for each status, divided by the total number of patents excluding the counts for the unknown category.
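A minimal sketch of this share calculation, assuming each patent application in a given country and subject area has already been tagged with one status label (the function name and input format are hypothetical):

```python
from collections import Counter


def status_shares(statuses):
    """Share of patents per status, with unknown cases left out of the total."""
    counts = Counter(s for s in statuses if s != "unknown")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items()}


# Toy data: 8 men-only, 2 women-only and 5 unknown applications.
tags = ["men-only"] * 8 + ["women-only"] * 2 + ["unknown"] * 5
shares = status_shares(tags)
```

Because the unknown category is removed from the denominator, the shares of the remaining categories sum to 100%.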

Country assignation

Inventors were assigned to the country indicated on the address linked to the inventors in the patent.

Selection of countries included in the analysis

Countries included in the patent analysis are the same as those selected for the bibliometric analysis.

Inventor gender inference

As described in the section “Author gender inference,” a binary gender was inferred for inventors on each of the patent applications using the NamSor API.

Subject areas and subfields included in the analysis

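Patents are classified into multiple International Patent Classification (IPC) codes, which provide a system of symbols to classify patents according to technologies. The patent analysis was performed for the eight main sections of the IPC system, including under these any patent with at least one IPC code starting with the letter associated with each section. The sections are: A (Human Necessities), B (Performing Operations; Transporting), C (Chemistry; Metallurgy), D (Textiles; Paper), E (Fixed Constructions), F (Mechanical Engineering; Lighting; Heating; Weapons; Blasting), G (Physics) and H (Electricity).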

This classification enabled the investigation of patterns in team composition on patents across technology areas.
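The assignment of a patent to IPC sections by the first letter of its codes can be sketched as follows. The section titles are those of the published IPC scheme; the function name and the code strings in the example are illustrative.

```python
IPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}


def sections_for_patent(ipc_codes):
    """Return the sorted IPC section letters covered by a patent's codes.

    A patent counts under every section for which it has at least one code
    starting with that section's letter, so it can appear in several sections.
    """
    letters = set()
    for code in ipc_codes:
        code = code.strip()
        if code and code[0].upper() in IPC_SECTIONS:
            letters.add(code[0].upper())
    return sorted(letters)


sections = sections_for_patent(["A61K 31/00", "C07D 401/04", "A61P 35/00"])
```

In this example the patent is counted once under section A and once under section C, even though two of its three codes fall in section A.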

 

[1] https://www.overton.io/

[2] Pinheiro, H., Vignola-Gagné, E. and Campbell, D. (2021). A large-scale validation of the relationship between cross-disciplinary research and its uptake in policy-related documents, using the novel Overton altmetrics database. Quantitative Science Studies, 2(2): 616–642.