Developing and Validating the Intelligence Literacy Model for Medical Students (ILMS) in the Age of Artificial Intelligence: A Mixed-Methods Study

https://doi.org/10.65613/662169

Chunyan Yang¹+, Kaidi Chen², Feng Chen²,*

¹Department of Education Technology Center, Wenzhou Medical University, Wenzhou, China
²The First School of Clinical Medicine (School of Information and Engineering), Wenzhou Medical University, Wenzhou, China
+First author: Chunyan Yang; Second author: Kaidi Chen; *Corresponding author: Feng Chen

E-mail address: cf@wmu.edu.cn

Abstract

The integration of artificial intelligence (AI) into healthcare is reshaping clinical practice and medical education, creating an urgent need to define and assess “intelligence literacy” among medical students. However, a validated, domain-specific framework and measurement instrument are still lacking. This study developed and validated the Intelligence Literacy Model for Medical Students (ILMS) using a sequential exploratory mixed-methods design. In Phase 1, semi-structured interviews with 15 stakeholders (senior physicians, medical educators, and medical students) in China were analyzed using a framework approach integrated with grounded-theory coding, yielding 19 preliminary indicators. In Phase 2, a two-round expert content validity review refined the framework into five dimensions: (1) Technical Understanding and Application, (2) Critical Thinking, (3) Information and Data Literacy, (4) Human–AI Collaboration, and (5) Ethics and Morality. In Phase 3, a 19-item ILMS Scale was developed and administered to 350 medical students at a major Chinese university. The scale showed strong internal consistency (Cronbach’s α = .792–.882 across subscales). Confirmatory factor analysis supported the five-factor structure (χ²/df = 1.98, CFI = .96, TLI = .95, RMSEA = .052, SRMR = .045). MANOVA/ANOVA results indicated that academic performance and grade level were positively associated with most ILMS dimensions, whereas neither gender nor internship experience showed significant main effects; the latter finding suggests that internship exposure as measured in this study may be insufficient by itself to improve AI-related literacy without structured learning opportunities. Several interaction effects (e.g., academic performance × grade level; academic performance × grade level × internship experience) further highlighted the multifactorial nature of ILMS development.
The ILMS model and scale provide a theoretically grounded and empirically supported tool for curriculum design and educational assessment in AI-augmented medical training.

Keywords: Artificial Intelligence; Medical Education; Intelligence Literacy; Competency Model; Mixed-Methods Research; Scale Development; Medical Students.

1. Introduction

1.1 The Inexorable Rise of Artificial Intelligence in Medicine

The 21st century has witnessed the advent of the Fourth Industrial Revolution, characterized by the convergence of digital, biological, and physical worlds, with Artificial Intelligence (AI) at its epicenter (Schwab, 2017) [1]. The field of medicine, traditionally reliant on human expertise and experiential knowledge, is undergoing a paradigm shift of unprecedented scale and velocity due to the integration of AI technologies (Topol, 2019) [2]. From diagnostic imaging, where deep learning algorithms now match or exceed human radiologist performance in identifying malignancies (Esteva et al., 2017; McKinney et al., 2020) [3, 4], to genomics, where AI accelerates the identification of disease-causing mutations (Zou et al., 2019) [5], the impact is pervasive. AI-powered systems are enhancing drug discovery and development, personalizing treatment protocols based on vast datasets, optimizing hospital workflows, and even assisting in robotic surgery with superhuman precision (Davenport & Kalakota, 2019; Yu, Beam, & Kohane, 2018) [6, 7].

This technological tsunami is not merely introducing new tools; it is fundamentally reshaping the cognitive landscape of medical practice. The role of the physician is evolving from that of a primary knowledge repository and decision-maker to a sophisticated manager of information, a critical evaluator of AI-generated insights, and a collaborative partner with intelligent systems (Wartman & Combs, 2018) [8]. This new reality demands a corresponding evolution in medical education. The traditional curriculum, while foundational, is no longer sufficient to prepare students for a future where clinical acumen must be seamlessly blended with computational thinking and digital fluency (Kolachalama & Garg, 2018; Paranjape et al., 2019) [9, 10]. Failure to adapt risks producing a generation of physicians ill-equipped to harness the power of AI, potentially leading to suboptimal patient care, medical errors, and a failure to realize the full potential of these transformative technologies (Masters, 2019) [11].

1.2 The Conceptual Gap: Defining “Intelligence Literacy” for Medical Students

In response to this educational imperative, the concept of “AI Literacy” or “Intelligence Literacy” has emerged as a critical area of inquiry. Broadly defined, AI literacy refers to the set of competencies that enables individuals to understand, effectively use, and critically evaluate AI technologies in various contexts (Long & Magerko, 2020) [12]. While general frameworks for AI literacy have been proposed for K-12 and university populations (e.g., Touretzky et al., 2019; Ng et al., 2021) [13, 14], these are often too generic to address the unique, high-stakes environment of medicine.

The medical domain introduces specific complexities that generalist models fail to capture. For instance, the ethical considerations surrounding patient data privacy, algorithmic bias in clinical decision support systems, and the “black box” nature of some deep learning models carry life-and-death implications (Char, Shah, & Magnus, 2018; Obermeyer et al., 2019) [15, 16]. The collaborative dynamic between a physician and an AI is not merely technical but deeply relational, impacting the doctor-patient relationship and shared decision-making processes (Blease et al., 2019) [17]. Consequently, applying a generic information literacy model as a foundation, as some initial explorations have attempted (e.g., Han et al., 2019) [18], risks oversimplifying the construct and omitting domain-specific nuances crucial for clinical competence. A bespoke model, grounded in the realities of medical practice and education, is urgently needed.

1.3 Rationale for the Methodological Approach

The development of a robust competency model, particularly for a nascent and complex construct like intelligence literacy in medicine, requires a rigorous and multifaceted methodological approach. The literature on competency modeling presents various methods, including theoretical analysis based on frameworks like the “Onion Model” (Spencer & Spencer, 1993) [19], quantitative approaches like factor analysis and Analytic Hierarchy Process (AHP) (Dong et al., 2007; Yang et al., 2014) [20, 21], and qualitative methods like Thematic Framework Analysis (TFA) (Zhang et al., 2016) [22].

Given that the concept of intelligence literacy for medical students is still in an exploratory phase and lacks a substantial body of quantitative data, a purely quantitative approach like factor analysis would be premature. A purely theoretical approach risks being detached from the lived experiences and perceived needs of stakeholders on the ground—the clinicians, educators, and students who will ultimately navigate this new landscape. Therefore, a method that can systematically analyze qualitative data to build a new conceptual framework from the ground up is most appropriate. Thematic Framework Analysis (TFA), with its structured, hierarchical approach to identifying and mapping themes, is exceptionally well-suited for this task (Gale et al., 2013) [23].

To enhance the rigor and depth of this qualitative inquiry, we integrated TFA with coding procedures from Grounded Theory (Corbin & Strauss, 2008) [24], specifically open and axial coding, to ensure a systematic and data-driven emergence of concepts and categories. Furthermore, to validate and refine the emergent framework, we employed a structured expert content validity review, a consensus-oriented technique informed by the Delphi method (Linstone & Turoff, 1975) [25]. This multi-stage process ensures that the final model is not only grounded in empirical data but also vetted by the collective judgment of leading experts in the field. This combined methodology, TFA with grounded-theory coding followed by structured expert validation, provides a robust, defensible pathway for constructing a novel competency model.

This study, therefore, adopts a sequential exploratory mixed-methods design, mirroring the holistic and rigorous principles of scale development and validation models such as that proposed by Zhou (2019) [26]. This model, which has been successfully applied in developing psychological scales (e.g., Bagnall et al., 2024; Chen et al., 2021) [27, 28], emphasizes a systematic progression from qualitative exploration to quantitative validation. By following this approach, we aim not only to define the conceptual dimensions of intelligence literacy for medical students but also to develop a validated instrument for its assessment, thereby providing a comprehensive solution for medical educators and researchers.

1.4 Research Objectives and Questions

This study is part of a larger project investigating the impact of AI on medical education and the necessary competencies for future physicians in Hong Kong and Mainland China. The primary aim of the present study is to develop and validate a comprehensive model of Intelligence Literacy for Medical Students (ILMS). The specific research questions guiding this endeavor are:

What are the core competencies, skills, and attitudes that constitute intelligence literacy for medical students in the AI era, as perceived by key stakeholders (physicians, medical educators, and medical students)?

Can these identified competencies be structured into a coherent and conceptually sound multidimensional framework?

Can a reliable and valid instrument be developed to assess the intelligence literacy of medical students based on the proposed framework?

What is the current level of intelligence literacy among medical students, and how is it influenced by demographic and academic factors such as grade level, gender, academic performance, and internship experience?

By answering these questions, this study seeks to provide a foundational contribution to the field of medical education, offering a clear conceptual map and a practical assessment tool to guide the integration of AI-related competencies into medical curricula worldwide. Ethical approval for all phases of this research was obtained from the Institutional Review Board of [Anonymized University Name], and informed consent was secured from all participating individuals and institutions.

2. Methodology

This study followed a rigorous, multi-phase, sequential exploratory mixed-methods design to develop and validate the Intelligence Literacy Model for Medical Students (ILMS). The entire process, illustrated in Figure 1, involved three distinct yet interconnected phases: (1) Qualitative framework generation through thematic framework analysis of stakeholder interviews, (2) Model refinement and validation via a structured expert content validity review, and (3) Quantitative instrument development and initial validation through a survey of medical students. This methodological architecture was deliberately chosen to ensure that the resulting model is both empirically grounded in the authentic experiences of the target population and its ecosystem (Phase 1), and theoretically robust through expert consensus (Phase 2), while also being measurable and applicable in practice (Phase 3).

Figure 1: Flowchart of the Multi-Phase Research Design

2.1 Qualitative Framework Generation

The initial phase aimed to explore and define the constituent components of intelligence literacy from the perspectives of those most intimately involved in the medical field. This bottom-up approach was crucial for ensuring the ecological validity and relevance of the subsequent model.

2.1.1. Participants (Sample 1)

A total of 15 key stakeholders were recruited from W University and its affiliated teaching hospitals in China using a purposive, maximum variation sampling strategy. This strategy was employed to capture a rich and diverse range of perspectives on the intersection of AI and medicine. The sample comprised three distinct groups, each offering a unique vantage point:

Senior Physicians (n=5): These participants were selected based on their direct involvement in clinical practice and research projects incorporating AI technologies (e.g., AI-assisted diagnostics, predictive modeling). They were all senior clinicians from various departments within W University’s affiliated hospitals, providing insights into the practical demands and challenges of using AI in patient care.

Medical Educators (n=5): This group comprised faculty members from W University’s medical school who were actively engaged in curriculum development, teaching innovative courses related to AI in medicine, or conducting research in medical education technology. Their perspective was vital for understanding the pedagogical challenges and opportunities in fostering intelligence literacy.

Medical Students (n=5): Senior clinical medical students were recommended by their counselors based on their demonstrated interest and foundational understanding of AI in medicine. Their inclusion was critical for capturing the learner’s perspective, including their perceived needs, anxieties, and aspirations regarding AI.

Recruitment was concluded after the 15th interview, as the analysis of the final few interviews yielded no new major themes or concepts, indicating that data saturation had been reached.

The profiles of the informants, including their role, years of experience, and area of specialty, are summarized in Table 1.

Table 1

Informants’ profile of the sample (N = 15)

 

*Participants Sex Working/Study experience (age) Specialty/Subjects Institution type Main focus areas
D1 M 10 years (38) Cardiology Tertiary hospital AI understanding & application, critical thinking, human-AI collaboration
D2 M 15 years (43) Oncology Tertiary hospital Data awareness & analysis, information retrieval & evaluation, frontier updates
D3 M 8 years (35) Pulmonology Tertiary hospital Ethics (privacy, fairness, responsibility, rational use)
D4 M 20 years (50) Medical Imaging Tertiary hospital Human-AI collaboration, AI tools operation & results interpretation, teamwork
D5 M 18 years (46) Radiology Tertiary hospital Quality control, AI pros & cons awareness, application skills
T1 F — (45) Medical Informatics Medical university AI understanding & application, critical thinking, data & information literacy
T2 M — (50) Biomedical Engineering Medical university Understand/use/evaluate/ethics in AI
T3 F — (42) Clinical Skills Training Medical university Human-AI collaboration, ethics, communication, critical thinking
T4 M — (48) Medical Ethics Medical university Ethics sensitivity, judgment, responsibility
T5 F — (40) Educational Technology Medical university Digital literacy, information literacy, frontier updates
S1 M 5th-year (23) Clinical Medicine Medical university AI understanding, critical thinking, ethics
S2 F 4th-year (22) Medical Imaging Medical university Human-AI collaboration, data skills, ethics
S3 M 3rd-year (21) Clinical Medicine Medical university Human-AI collaboration, AI understanding, data literacy
S4 F 5th-year (23) Preventive Medicine Medical university Information literacy, AI understanding, critical thinking
S5 M 4th-year (22) Clinical Medicine Medical university Ethics (humanistic care), AI understanding, critical thinking

Note. * Numbers (e.g., D1, T1, S1) were used to replace the names of participants.

2.1.2. Data Collection Procedure

Data were collected through in-depth, semi-structured interviews. This format was chosen to provide a structured guide for the conversation while allowing the flexibility to probe emergent themes and explore individual experiences in detail.

Ethical Considerations: Prior to commencing data collection, full ethical approval was granted by the W University Institutional Review Board (Reference Number: [Insert Number]). All potential participants received a detailed information sheet explaining the study’s purpose, procedures, potential risks and benefits, and data handling protocols. Written informed consent was obtained from every participant before their interview. Key ethical principles were rigorously upheld:

Confidentiality: Participants were assured that their identities would be kept strictly confidential. All transcripts were anonymized using alphanumeric codes (e.g., D1, T3, S5), matching the identifiers reported in Table 1.

Data Security: All audio recordings and digital transcripts were stored on encrypted, password-protected servers, accessible only to the primary research team.

Voluntary Participation: Participants were explicitly informed that their involvement was voluntary and that they could withdraw from the study at any time without penalty or explanation.

Interview Protocol: The interviews, conducted by the lead author, took place in private, quiet settings (e.g., university offices, hospital meeting rooms) to ensure comfort and confidentiality. Each interview lasted between 20 and 40 minutes and was audio-recorded with the participant’s permission. An interview guide was developed based on the research questions and a preliminary literature review. The guide included open-ended questions and prompts designed to elicit rich narratives and detailed reflections, such as:

“From your perspective, what does it mean for a future doctor to be ‘literate’ or ‘competent’ in the age of AI?”

“Can you describe a situation where AI is being used, or could be used, in your field? What skills would a medical student need to navigate that situation effectively?”

“What are the biggest opportunities and the most significant risks you see with the increasing use of AI in medicine?”

“If you were designing a curriculum to prepare medical students for an AI-driven future, what key topics or skills would you include?”

“Beyond just using the technology, what kind of thinking or mindset is important?”

Probing questions were used dynamically to encourage elaboration (e.g., “Could you give me an example of that?” “Why do you think that is particularly important for doctors?”).

2.1.3. Data Analysis

The analysis of the interview data was a systematic, multi-stage process that integrated principles from Thematic Framework Analysis (TFA) (Ritchie & Spencer, 1994) [29] and Grounded Theory coding (Corbin & Strauss, 2008) [24]. The software NVivo 20 was used to manage the data and facilitate the coding process. A research team, consisting of the lead author and a collaborator with expertise in AI and education research, conducted the analysis.

Familiarization and Transcription: All audio recordings were transcribed verbatim. The research team then repeatedly read the transcripts while listening to the audio recordings to immerse themselves in the data and gain a holistic understanding of the participants’ narratives.

Open Coding (Initial Conceptualization): The team engaged in a meticulous process of open coding. They analyzed the data line-by-line, identifying and labeling segments of text that related to the skills, knowledge, attitudes, and behaviors associated with medical students’ intelligence literacy. This process generated a large number of initial codes, or “free nodes.” For instance, a statement like, “You need to know what AI is, what it can do, what it can’t do, its strengths and its limits,” was coded as “Awareness of AI limitations.” This initial phase yielded 146 distinct reference points (free nodes). These were then consolidated and abstracted into 29 initial concepts (third-level nodes), representing more defined ideas, such as “Recognizing AI fallibility” or “Understanding algorithmic basics.”

Axial Coding (Developing Categories): In the next stage, the research team moved to axial coding. This involved systematically examining the relationships between the 29 initial concepts, grouping them together based on shared properties and dimensions. This process of constant comparison and conceptual aggregation led to the formation of 19 more abstract and robust categories (second-level nodes). These categories represented the first iteration of the core components of intelligence literacy. Table 2 provides illustrative examples of how raw data were transformed through open and axial coding into these emergent categories.

Table 2: Examples of Categories Formed Through Open Coding of Medical Students’ Intelligence Literacy Components

Category (Second-level node)  Initial Concept (Third-level node)  Original Representative Quotation
Basic Concepts Awareness of AI Limitations “To know what AI is, what it can do, what it cannot do, its advantages and limitations.”
Risk Identification Questioning and Scrutinizing AI Conclusions “To learn to question and scrutinize AI’s conclusions, and to judge whether the results provided by AI are reasonable.”
Operational Skills Operating AI Tools and Interpreting Results “To skillfully operate various AI medical systems, such as auxiliary diagnostic systems and electronic medical record (EMR) systems.”
Data Analysis Data Awareness and Analysis “To understand the value of data, learn to use data to assist in decision-making, for example, using genetic data for tumor risk prediction.”
Patient Privacy Protection Privacy and Data Security “To value privacy and data security, understand the sensitivity of patient data, and protect patient privacy.”

Selective Coding and Framework Development: Finally, the 19 categories were further analyzed to identify the core, overarching themes that could organize the entire dataset. Through selective coding, the team identified five central themes that served as the primary dimensions of the model. These five “first-level nodes” were: (1) Technical Understanding and Application, (2) Critical Thinking, (3) Information and Data Literacy, (4) Human-AI Collaboration, and (5) Ethics and Morality.

Ensuring Analytical Rigor: To enhance the credibility and trustworthiness of the qualitative analysis, several measures were implemented:

Inter-Coder Reliability: The lead author and the collaborating researcher independently coded a subset of three randomly selected transcripts (20% of the data). Cohen’s Kappa coefficient was calculated to assess the level of agreement. An initial Kappa of 0.78 was achieved. Discrepancies in coding were then discussed at length, leading to the refinement of code definitions and the clarification of the coding framework. A second round of independent coding on another transcript yielded a Kappa of 0.89, indicating a high degree of inter-coder reliability. All remaining transcripts were then coded collaboratively, with regular meetings to resolve any ambiguities.
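Cohen’s kappa, used above, corrects raw percent agreement for the agreement two coders would reach by chance given their individual code frequencies. A minimal Python sketch (the function and variable names are ours for illustration; this is not the study’s actual analysis pipeline):

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders.

    codes_a, codes_b: parallel lists of categorical codes assigned
    to the same text segments by the two coders.
    """
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    # Observed proportion of segments on which the coders agree.
    p_obs = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected if each coder assigned codes independently
    # according to their own marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_exp = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)
```

Identical codings yield a kappa of 1.0; values above roughly .80 are conventionally read as strong agreement, consistent with the 0.89 reported above.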

Peer Debriefing: The emerging framework and analytical memos were regularly presented to a senior researcher in medical education who was not part of the core research team. This process of peer debriefing helped to challenge assumptions, identify potential biases, and ensure the logical coherence of the developing model.

2.2 Expert Content Validity Review

The preliminary ILMS model developed in Phase 1 required expert validation to ensure its conceptual soundness and scientific rigor. Instead of a large-scale consensus survey, a structured expert content validity review was employed to facilitate deep, qualitative refinement of the construct.

2.2.1. Expert Panel Selection

A panel of three distinguished experts was purposively recruited. The selection criteria were stringent, requiring panelists to have significant and demonstrable expertise at the intersection of medicine, education, and artificial intelligence. The panel comprised:

Expert A: A Professor of Medical Informatics with over 20 years of experience in both clinical practice and research on clinical decision support systems.

Expert B: The Director of a University Center for Educational Technology, with a Ph.D. in educational psychology and a research focus on integrating emerging technologies into higher education curricula.

Expert C: A practicing cardiologist who is also the lead for AI strategy at a major teaching hospital and has published extensively on the ethical implications of AI in healthcare.

This composition ensured a balanced and multi-faceted review of the model, covering clinical, pedagogical, technological, and ethical dimensions.

2.2.2. Validation Procedure

An iterative two-round review process was conducted:

Round 1 (Conceptual Review): The panel evaluated the dimensional structure and the 19 sub-constructs. Experts provided qualitative feedback regarding the clarity, relevance, and domain-specificity of each item. For instance, Expert C suggested that “Human-AI Collaboration” be expanded to include communicative aspects within a hybrid team.

Round 2 (Item Refinement): The revised model was returned to the experts. They assessed the relevance of each indicator using a 4-point scale (1 = Not Relevant to 4 = Highly Relevant) to establish content validity. All items in the final model received a rating of 3 or 4 from all experts, indicating strong expert consensus on the model’s validity.
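The criterion reported above (every item rated 3 or 4 by all experts) is equivalent to an item-level content validity index (I-CVI) of 1.0 for each indicator. Assuming the conventional Lynn-style computation, which the authors do not name explicitly, a sketch:

```python
def item_cvi(ratings):
    """Item-level content validity index (I-CVI): the proportion of
    experts rating the item 3 or 4 on a 4-point relevance scale."""
    return sum(r >= 3 for r in ratings) / len(ratings)

def scale_cvi_avg(ratings_per_item):
    """Scale-level CVI, averaging approach (S-CVI/Ave):
    the mean of the item-level indices."""
    cvis = [item_cvi(r) for r in ratings_per_item]
    return sum(cvis) / len(cvis)
```

With three experts assigning only 3s and 4s to all 19 indicators, every I-CVI, and hence the S-CVI/Ave, equals 1.0.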

The outcome of this phase was the validated, five-dimension ILMS framework, which formed the theoretical foundation for the subsequent quantitative assessment phase.

2.3 Quantitative Assessment

Following the development and validation of the ILMS model, the study progressed to its third phase: the creation and administration of a quantitative instrument to assess the intelligence literacy of medical students and explore its correlates.

2.3.1. Instrument Development

An assessment instrument, the Intelligence Literacy for Medical Students (ILMS) Scale, was constructed based directly on the final validated five-dimension model. The 19 indicators identified and refined during the qualitative and expert-review phases were converted into survey items. Each of the five dimensions (Technical Understanding and Application, Critical Thinking, Information and Data Literacy, Human-AI Collaboration, Ethics and Morality) was represented by its constituent indicators, resulting in a 19-item questionnaire. For example, an indicator like “Risk Identification” under the Critical Thinking dimension was transformed into a statement such as “I am able to question and critically evaluate the conclusions provided by an AI system.” Participants were asked to rate their agreement or perceived competence for each item on a 5-point Likert scale, ranging from 1 (Strongly Disagree / Not at all Competent) to 5 (Strongly Agree / Very Competent).

2.3.2 Pilot Testing.

Prior to the main survey administration, a pilot test of the 19-item ILMS Scale was conducted with a convenience sample of 30 medical students who were not part of the main study sample. The purpose was to assess the clarity, readability, and face validity of the items, as well as the time required for completion. Feedback from the pilot participants led to minor wording adjustments on three items to enhance clarity and remove potential ambiguity. This process ensured the instrument was well-understood by the target population.

2.3.3. Participants and Procedure (Sample 2)

The target population for this phase was medical students at W University in China, aged 18 years or older. A cross-sectional survey design was employed. A questionnaire link was disseminated through university class-based social media groups, utilizing a convenience sampling approach. A total of 350 valid responses were collected and included in the final analysis.

The target sample size was determined based on established guidelines for factor analysis, which commonly recommend a subject-to-item ratio of at least 10:1 (Nunnally, 1978) [34]. With 19 items on the scale, a minimum of 190 participants was required. To ensure sufficient statistical power for the subsequent MANOVA, we aimed for a larger sample of over 300. The final sample of 350 valid responses was therefore deemed robust for the planned analyses.
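The sample-size reasoning above is simple arithmetic and can be expressed as a small helper. The function name and the `power_target` parameter are illustrative conventions of ours, not part of the study:

```python
def minimum_sample(n_items, ratio=10, power_target=0):
    """Minimum N under a subject-to-item ratio rule, optionally
    raised to a separate power-analysis target."""
    return max(n_items * ratio, power_target)

# 19 items at a 10:1 ratio require 190 respondents;
# raising the floor to 300 for MANOVA power gives 300.
ratio_only = minimum_sample(19)                  # 190
with_power = minimum_sample(19, power_target=300)  # 300
```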

The questionnaire consisted of two parts. The first part collected demographic and academic information, including gender, age, grade level (from first-year undergraduate to postgraduate), self-reported academic performance (categorized as Top 20%, 21-50%, 51-80%, Bottom 20%), and whether they had prior clinical internship experience. The second part comprised the 19-item ILMS Scale.

2.3.4. Data Analysis

The collected data were analyzed using IBM SPSS Statistics, Version 30. The analysis proceeded in several stages:

Descriptive Statistics: Means, standard deviations, frequencies, and percentages were calculated to describe the sample’s demographic characteristics and the overall scores on the ILMS Scale and its subscales.

Psychometric Evaluation of the ILMS Scale:

Reliability Analysis: The internal consistency of the overall scale and each of the five subscales was assessed using Cronbach’s alpha coefficient. An alpha value of 0.70 or higher is generally considered acceptable for research purposes (Nunnally, 1978) [34].
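Cronbach’s alpha follows the standard formula α = k/(k−1) · (1 − Σσ²ᵢ / σ²_total), where k is the number of items, σ²ᵢ the variance of item i, and σ²_total the variance of respondents’ total scores. A minimal sketch (illustrative only; the authors computed alpha in SPSS):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Internal-consistency reliability of a multi-item scale.

    item_scores: one list of respondent scores per scale item,
    all lists of equal length (one entry per respondent).
    """
    k = len(item_scores)
    # Each respondent's total score across all items.
    totals = [sum(resp) for resp in zip(*item_scores)]
    # Sum of the individual item variances.
    sum_item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))
```

Perfectly parallel items give α = 1.0; the .70 threshold cited above is the conventional floor for research use.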

Construct Validity Analysis: A Confirmatory Factor Analysis (CFA) was conducted using AMOS (Version 28.0) to test the hypothesized five-factor structure of the ILMS Scale. Model fit was assessed using multiple goodness-of-fit indices: the chi-square to degrees of freedom ratio (χ²/df), the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). Accepted thresholds for good model fit were used (e.g., χ²/df < 3, CFI/TLI > .95, RMSEA < .06, SRMR < .08; Hu & Bentler, 1999) [35].
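Two of the indices above can be computed directly from the model chi-square, its degrees of freedom, and the sample size, using the standard point-estimate formulas (the numeric values in the test below are hypothetical, chosen only to illustrate the arithmetic, not taken from the study’s CFA output):

```python
from math import sqrt

def chi2_ratio(chi2, df):
    """Relative chi-square; values below 3 are conventionally
    read as acceptable fit."""
    return chi2 / df

def rmsea(chi2, df, n):
    """Point estimate of the root mean square error of approximation.

    The max(..., 0) clamp handles models whose chi-square falls
    below its degrees of freedom (better-than-perfect close fit).
    """
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

A model whose chi-square equals its degrees of freedom yields RMSEA = 0; values below .06 are read as good fit per Hu and Bentler’s thresholds cited above.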

Inferential Statistical Analysis: A Multivariate Analysis of Variance (MANOVA) was conducted to investigate the main and interaction effects of the independent variables (academic performance, gender, grade level, internship experience) on the five dependent variables (the mean scores for each of the five ILMS dimensions). MANOVA was chosen as the appropriate statistical test because it can analyze the influence of multiple categorical independent variables on several continuous, correlated dependent variables simultaneously, while controlling for Type I error inflation. A significance level of p < .05 was set for all statistical tests. Partial eta squared (ηp²) was used as the measure of effect size.
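Partial eta squared, the effect-size measure named above, is the ratio of an effect’s sum of squares to the effect-plus-error sum of squares, which SPSS reports alongside each F test. As a one-line sketch (illustrative; not the authors’ SPSS output):

```python
def partial_eta_squared(ss_effect, ss_error):
    """Proportion of variance in the dependent variable attributable
    to an effect, after partialling out all other modelled effects:
    eta_p^2 = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)
```

By Cohen’s rough benchmarks, values around .01, .06, and .14 are read as small, medium, and large effects, respectively.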

3. Results

3.1 The Validated Intelligence Literacy Model for Medical Students (ILMS)

The rigorous process of qualitative data analysis and expert validation culminated in the final Intelligence Literacy Model for Medical Students (ILMS). This model provides a comprehensive framework that delineates the essential competencies required for future physicians to thrive in an AI-integrated healthcare environment. The model is structured around five core dimensions, each underpinned by specific indicators. Figure 3 presents the final conceptual framework, and the following sections provide a detailed explication of each dimension.

Figure 3: Intelligence Literacy Model for Medical Students (ILMS)

3.1.1 Dimension 1: Technical Understanding and Application

This dimension constitutes the epistemological and practical foundation of the model. It equips medical students with the essential knowledge and skills to proficiently comprehend and utilize AI technologies, transcending superficial awareness to foster functional proficiency. This ensures clinicians are not merely passive end-users but informed and capable participants in an AI-augmented healthcare ecosystem. It comprises three key indicators:

AI Conceptual Cognition: This indicator refers to a robust grasp of AI’s core terminology, historical context, and the contemporary landscape of its medical applications. Students must be able to define and differentiate key concepts (e.g., machine learning, deep learning, natural language processing, generative AI), articulate the major developmental milestones of AI in medicine, and maintain awareness of cutting-edge clinical AI systems (e.g., specific diagnostic algorithms, robotic-assisted surgery platforms). This establishes the essential lexicon and conceptual framework for all subsequent engagement with AI.

Algorithmic Principle Comprehension: This indicator moves beyond definitions to the conceptual underpinnings of how AI technologies function. While not mandating proficiency in data science, it requires an appreciation for fundamental algorithmic principles, the critical role of data in model training, and, crucially, the inherent limitations and potential failure modes of these systems. Understanding the probabilistic, non-deterministic nature of AI, alongside concepts such as overfitting, model drift, and the impact of biased training data, is indispensable for responsible clinical application (Kelly et al., 2019) [30].

Clinical Tool Integration: This indicator represents the translation of theoretical knowledge into clinical utility. It focuses on the ability to operate AI-assisted diagnostic software, interpret its outputs (including confidence scores and uncertainty estimations), and seamlessly integrate AI-generated insights into established clinical workflows. This encompasses tasks from medical imaging analysis and disease risk prediction to drug discovery research, bridging the gap between algorithmic output and actionable clinical judgment.

3.1.2 Dimension 2: Critical Thinking

This dimension serves as the cognitive bulwark against the uncritical adoption of AI, emphasizing the skills required to systematically evaluate AI-generated information. Drawing from the principles of evidence-based medicine, critical appraisal in this context is the ability to independently and objectively analyze, question, and appraise AI applications and their outputs. It is the intellectual engine that prevents automation bias (Goddard et al., 2012) [31] and ensures AI serves to augment, rather than supplant, expert human judgment, thereby safeguarding patient safety and driving innovation through the identification of current systems’ limitations.

Technology Applicability Assessment: The ability to conduct an a priori evaluation of an AI tool’s suitability for a specific clinical context. This involves assessing its validation studies, understanding its intended use population, and determining its alignment with the clinical problem at hand.

Algorithmic Limitation & Risk Identification: The capacity to recognize an algorithm’s operational boundaries and proactively identify potential patient safety risks. This includes understanding its performance on out-of-distribution data and anticipating failure modes that could lead to diagnostic or therapeutic errors.

Evidence-Based Technology Selection: The competence to synthesize evidence from technical validations and clinical studies to make a rational, evidence-supported decision to adopt, reject, or provisionally use an AI technology, analogous to the appraisal of a new pharmaceutical or medical device.

3.1.3 Dimension 3: Information and Data Literacy

In the data-rich landscape of modern medicine, this dimension recognizes data-driven competency as an indispensable skill for the AI-enabled physician. It represents a synthesis of traditional information literacy and contemporary data science principles, tailored to the medical domain. Mastery of these skills enables physicians to leverage vast datasets for personalized medicine, predictive health analytics, and more informed clinical decision-making.

Medical Data Acquisition: The skill to identify, source, and ethically collect relevant, high-quality medical data from heterogeneous sources, including electronic health records (EHRs), genomic databases, medical imaging archives, and real-world data (RWD).

Data Curation & Management: The critical ability to prepare raw data for analysis. This involves processes of data cleaning, normalization, handling of missing values, and ensuring data integrity and provenance, which are foundational for reliable AI model development and application.

Data Modeling & Analysis: A conceptual understanding of how to apply appropriate analytical or machine learning models to curated data to extract meaningful patterns, correlations, and predictive signals, and to interpret the results of these models.

Data-Driven Decision-Making: The ultimate application of data skills: the ability to synthesize analytical outputs with deep clinical expertise to inform personalized treatment plans, patient risk stratification, and population health management strategies.

3.1.4 Dimension 4: Human-AI Collaboration

This dimension addresses the paradigm shift from a “human-tool” relationship to a “human-teammate” model, preparing students for a future where clinical tasks are performed by hybrid teams of human experts and intelligent agents. It extends beyond simple operational fluency to a suite of sophisticated interactive skills necessary for safe and efficient symbiotic work.

Intelligent System Operation: Procedural fluency in operating a variety of AI-enabled medical devices and software platforms, from robotic surgical systems to complex diagnostic dashboards, ensuring both safety and efficacy.

Human-AI Interaction & Communication: The capacity for effective bidirectional communication with AI systems. This includes skillful querying (e.g., prompt engineering for generative models), correctly interpreting AI-generated reports and visualisations, and understanding the nuances of an AI’s confidence levels and limitations.

Clinical Team Integration: The ability to effectively integrate an AI’s role and contributions within the broader human clinical team. This involves fostering clear communication channels and coordinated action plans that leverage the strengths of both human and artificial intelligence to achieve shared clinical goals.

Collaborative Workflow Management: The competence to design, implement, and oversee safe and efficient clinical workflows that incorporate AI agents. This involves establishing protocols, defining roles, and managing the dynamic interaction between human and AI team members to optimize patient outcomes.

3.1.5 Dimension 5: Ethics and Morality

This dimension provides the essential normative framework governing the responsible deployment of AI in medicine. It ensures that the application of powerful AI technologies aligns with enduring human values, professional medical ethics, and established legal and regulatory standards. It is a critical safeguard for patient rights and public trust, preserving the integrity of the medical profession in an era of technological disruption.

Data Security & Privacy Protection: A steadfast commitment to upholding data security and patient privacy by adhering to regulatory frameworks (e.g., HIPAA, GDPR) (Price & Cohen, 2019) [33] and technical best practices, ensuring the confidentiality and integrity of patient data throughout the AI lifecycle.

Algorithmic Transparency & Explainability: The imperative to use AI systems that are transparent in their design and whose decision-making processes are explainable to a clinically acceptable degree (Explainable AI, XAI) (Amann et al., 2020) [32], enabling clinicians to understand, trust, and verify AI-generated recommendations.

Algorithmic Fairness & Bias Mitigation: The capacity to critically assess AI models for demographic, socioeconomic, or other biases that could lead to health inequities, and to advocate for or participate in strategies to mitigate these discriminatory outcomes.

Informed Consent for Clinical Application: The ethical duty to ensure patients (or their surrogates) adequately understand the role, benefits, limitations, and potential risks of an AI system involved in their diagnosis or treatment, thereby enabling truly informed and autonomous consent.

Accountability & Continuous Oversight: An understanding of the complex lines of professional and legal accountability when AI systems are used in clinical care. This includes a commitment to the continuous, post-deployment monitoring and evaluation of AI applications to ensure their ongoing safety, efficacy, and equity in real-world practice.

3.2 Quantitative Assessment of Medical Students’ Intelligence Literacy

This section presents the results of the quantitative survey phase, beginning with the psychometric properties of the newly developed ILMS Scale, followed by the findings from the MANOVA examining the factors influencing students’ intelligence literacy levels.

3.2.1. Psychometric Properties of the ILMS Scale

The initial step in the quantitative analysis was to establish the reliability of the ILMS Scale. Internal consistency was evaluated for each of the five subscales using Cronbach’s alpha. As shown in Table 3, the alpha coefficients for the subscales were all robust, indicating a high degree of internal consistency. The values ranged from α = .792 for Critical Thinking to α = .882 for Ethics and Morality. All coefficients comfortably exceeded the recommended threshold of .70, suggesting that the items within each subscale reliably measure the same underlying construct. These results provide strong evidence for the reliability of the ILMS Scale as an instrument for measuring the different dimensions of medical students’ intelligence literacy.

Table 3: Reliability Analysis of the ILMS Scale Subscales

Variable Dimension	Cronbach’s Alpha	Cronbach’s Alpha (Standardized Items)	N of Items
Technical Understanding & Application	0.804	0.804	3
Critical Thinking	0.792	0.792	3
Information & Data Literacy	0.857	0.857	4
Human-AI Collaboration	0.855	0.855	4
Ethics & Morality	0.882	0.882	5
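For reference, Cronbach's alpha for a subscale can be computed directly from the item-score matrix. The sketch below is illustrative (it is not the SPSS/AMOS pipeline used in the study); it implements the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores).

```python
def cronbach_alpha(items):
    """Cronbach's alpha for k items measured on n respondents.

    items: list of k lists, each holding one item's scores across respondents.
    Sample variances (ddof = 1) are used throughout.
    """
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Three perfectly correlated items yield alpha = 1.0
print(cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]))  # 1.0
```

Subscale alphas in the .79–.88 range, as reported in Table 3, comfortably exceed the conventional .70 threshold for acceptable internal consistency.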

3.2.2. Construct Validity

The results of the Confirmatory Factor Analysis (CFA) indicated that the hypothesized five-factor model provided a good fit to the data: χ²(142) = 280.5, p < .001; χ²/df = 1.98; CFI = .96; TLI = .95; RMSEA = .052 (90% CI = [.043, .061]); SRMR = .045. All fit indices met the established criteria for acceptable model fit. Furthermore, the standardized factor loadings for all 19 items on their respective latent factors were statistically significant (p < .001) and ranged from .65 to .88, demonstrating strong convergent validity. These findings provide robust statistical support for the five-dimension structure of the ILMS model and the construct validity of the scale.

3.2.3. Main and Interaction Effects on Intelligence Literacy

A four-way MANOVA was conducted to determine the effects of academic performance, gender, grade level, and internship experience on the five dimensions of intelligence literacy. Table 4 summarizes the results, showing the F-statistic, p-value, and partial eta squared (η²) for each main effect and interaction effect.

Table 4: Results of the Multivariate Analysis of Variance (MANOVA) for Main and Interaction Effects on ILMS Dimensions (cell values are F, p, partial η²; *p < .05, **p < .01)

Factor / Interaction	Technical Understanding and Application	Information and Data Literacy	Human-AI Collaboration	Critical Thinking	Ethics and Morality
Academic Performance 3.587**, <0.001, .114 3.193**, 0.002, .103 3.555**, <0.001, .113 3.025**, 0.003, .098 2.958**, 0.004, .096
Gender 2.899, 0.090, .013 0.016, 0.898, .000 0.886, 0.347, .004 0.118, 0.732, .001 2.926, 0.089, .013
Grade Level 2.899*, 0.015, .061 2.866*, 0.016, .060 1.410, 0.222, .031 2.368*, 0.041, .050 3.600**, 0.004, .075
Internship Experience 0.001, 0.976, .000 0.002, 0.965, .000 0.184, 0.668, .001 0.051, 0.822, .000 0.011, 0.918, .000
Academic Performance × Gender 0.686, 0.684, .021 0.814, 0.576, .025 0.895, 0.511, .027 0.753, 0.627, .023 0.634, 0.728, .020
Academic Performance × Grade Level 1.201, 0.218, .155 1.814**, 0.006, .217 1.552*, 0.033, .191 1.551*, 0.033, .191 1.099, 0.333, .144
Academic Performance × Internship Experience 1.392, 0.201, .048 1.262, 0.264, .043 0.443, 0.894, .016 1.023, 0.419, .035 0.486, 0.866, .017
Gender × Grade Level 1.625, 0.154, .035 1.029, 0.401, .023 2.330*, 0.043, .050 0.639, 0.670, .014 0.300, 0.912, .007
Gender × Internship Experience 0.154, 0.695, .001 1.544, 0.215, .007 0.711, 0.400, .003 1.324, 0.251, .006 0.617, 0.433, .003
Grade Level × Internship Experience 1.401, 0.225, .030 0.297, 0.914, .007 0.557, 0.733, .012 0.280, 0.924, .006 0.714, 0.614, .016
Academic Performance × Gender × Grade Level 0.702, 0.771, .042 0.900, 0.559, .053 0.378, 0.980, .023 0.979, 0.475, .058 1.225, 0.258, .071
Academic Performance × Gender × Internship Exp. 0.587, 0.710, .013 1.704, 0.135, .037 0.566, 0.726, .013 1.566, 0.171, .034 1.654, 0.147, .036
Academic Performance × Grade Level × Internship Exp. 0.879, 0.600, .063 2.066**, 0.009, .136 0.538, 0.932, .039 2.129**, 0.007, .140 2.141**, 0.007, .140
Gender × Grade Level × Internship Experience 2.149, 0.076, .037 1.953, 0.103, .034 0.637, 0.636, .011 1.912, 0.109, .033 0.341, 0.850, .006
Academic Performance × Gender × Grade Level × Internship Exp. 1.872, 0.100, .040 1.283, 0.272, .028 0.530, 0.754, .012 0.933, 0.460, .020 0.857, 0.511, .019

Main Effects

Academic Performance: The analysis revealed a highly significant main effect of academic performance on all five dimensions of intelligence literacy (all p-values < .01). The effect sizes, as indicated by partial eta squared, were moderate, ranging from η² = .096 for Ethics and Morality to η² = .114 for Technical Understanding and Application. This indicates that students with higher self-reported academic performance consistently scored significantly higher across all facets of intelligence literacy.

Gender: There was no statistically significant main effect of gender on any of the five ILMS dimensions (all p-values > .05). The effect sizes were negligible (η² close to .000), suggesting that, within this sample, male and female medical students did not differ significantly in their overall levels of intelligence literacy.

Grade Level: Grade level demonstrated a significant main effect on four of the five dimensions: Technical Understanding and Application (p = .015), Information and Data Literacy (p = .016), Critical Thinking (p = .041), and a particularly strong effect on Ethics and Morality (p = .004). The effect sizes were moderate (η² ranging from .050 to .075). Post-hoc comparisons using the Tukey HSD test indicated that for the ‘Ethics and Morality’ dimension, postgraduate students (M = 4.22, SD = 1.03) scored significantly higher than first-year (M = 2.85, SD = 1.05) and second-year (M = 3.10, SD = 0.98) undergraduate students (p < .001 for both comparisons). Similar patterns were observed for the other significant dimensions, confirming a clear developmental trend.

Internship Experience: Internship experience did not show a significant main effect on any of the five dimensions of intelligence literacy (all p-values > .60). The effect sizes were virtually zero, indicating that simply having had internship experience, when considered as a standalone factor, was not associated with higher intelligence literacy scores.

Interaction Effects

The MANOVA also uncovered several significant and complex interaction effects:

Academic Performance × Grade Level: A significant two-way interaction was found between academic performance and grade level for Information and Data Literacy (p = .006), Critical Thinking (p = .033), and Human-AI Collaboration (p = .033). This suggests that the relationship between academic performance and these specific literacies differs depending on the student’s year of study. For example, the advantage held by high-performing students may become more pronounced in later years of study.

Gender × Grade Level: A significant interaction effect between gender and grade level was observed for the Human-AI Collaboration dimension (p = .043). This implies that the development of collaborative skills with AI may follow different trajectories for male and female students as they progress through their medical education.

Academic Performance × Grade Level × Internship Experience: Most strikingly, a significant three-way interaction was detected among academic performance, grade level, and internship experience for Information and Data Literacy (p = .009), Critical Thinking (p = .007), and Ethics and Morality (p = .007).

This suggests that internship experience does not act as a universal educational tool, but rather as a ‘capstone’ or ‘accelerator’ experience, its benefits only being fully unlocked when students possess a sufficient foundation of academic knowledge (high performance) and cognitive maturity (senior grade level) to contextualize and integrate their practical observations.

4. Discussion

This study set out to define, model, and assess the nascent construct of intelligence literacy for medical students in the AI era. The findings provide a multi-faceted contribution to the field of medical education, offering a validated conceptual framework, a reliable assessment tool, and initial insights into the current state and correlates of these crucial competencies. This discussion interprets the key findings in the context of existing literature, explores their theoretical and practical implications, and acknowledges the study’s limitations while proposing directions for future research.

4.1. Interpretation of Key Findings

4.1.1. The ILMS Model: A Domain-Specific Framework

The five-dimensional ILMS model represents the primary contribution of this research. Its development through a rigorous mixed-methods process ensures it is both empirically grounded and theoretically sound. The five dimensions—Technical Understanding and Application, Critical Thinking, Information and Data Literacy, Human-AI Collaboration, and Ethics and Morality—collectively paint a holistic picture of the competencies needed. This model moves beyond generic AI literacy frameworks (e.g., Long & Magerko, 2020) [12] by infusing each component with the specific context of medical practice. For example, “Ethics and Morality” is not an abstract concept but is directly tied to patient privacy, algorithmic bias in diagnostics, and informed consent. Similarly, “Human-AI Collaboration” is framed within the context of clinical teams and patient safety. This domain-specificity is the model’s key strength, making it directly applicable to medical curriculum design and assessment.

The structure of the model, with its foundational, safeguarding, skill-based, and supportive elements (Figure 4), provides a pedagogical roadmap. It suggests that effective education in this area cannot be a single course on “AI in Medicine” but must be an integrated, longitudinal effort that builds foundational technical knowledge while simultaneously nurturing critical, ethical, and collaborative capacities.

4.1.2. The Powerful Influence of Academic Performance and Educational Progression

The quantitative results compellingly underscore the importance of general academic aptitude and educational progression. The finding that academic performance was a significant predictor of all five ILMS dimensions is robust. This aligns with research in other domains, such as a recent study on Chinese college students which also found a significant positive correlation between academic achievement and a general AI literacy score (Ma & Chen, 2024; Chai et al., 2021) [36, 37]. This suggests that the underlying cognitive skills that lead to high academic achievement—such as the ability to learn complex concepts, analytical skills, and disciplined study habits—are highly transferable to the development of intelligence literacy. High-achieving students may be more adept at self-directed learning about new technologies and more practiced in the critical thinking required to evaluate them.

Similarly, the significant effect of grade level on four of the five dimensions confirms a developmental trajectory. As students advance through the medical curriculum, their exposure to complex clinical reasoning, ethical dilemmas, and data interpretation naturally increases, which likely contributes to their growth in the corresponding ILMS dimensions. This finding is consistent with a survey of German medical students, which found that senior students demonstrated higher readiness and understanding of AI compared to their junior counterparts (Weidener & Fischer, 2024; Pinto Dos Santos et al., 2019) [38, 39]. The cumulative effect of a medical education, even one not yet explicitly focused on AI, appears to build a cognitive scaffolding upon which intelligence literacy can develop.

However, the non-significant effect of grade level on Human-AI Collaboration is a critical and cautionary finding. It suggests that this particular competency is not developing passively through the standard curriculum. Collaboration, especially with a non-human agent, is a specific skill set that likely requires explicit, hands-on training and practice. This represents a clear and actionable gap for medical educators to address.

4.1.3. The Nuances of Gender and Internship Experience

The absence of a significant main effect for gender on any of the ILMS dimensions is noteworthy. It contrasts with some research in younger populations where gender differences in technology-related competencies have been observed (e.g., Su (2024) [40] found that kindergarten girls showed higher AI literacy gains after training). Our finding suggests that by the higher education level, particularly within a demanding field like medicine, any pre-existing gender gaps in technology literacy may have narrowed or become negligible, at least in this sample. It may also be that while interests or approaches to technology might differ, the core competencies measured by the ILMS Scale are developed to a similar degree by both genders during medical training.

The lack of a main effect for internship experience is perhaps the most counterintuitive result. One might expect that direct clinical exposure would enhance intelligence literacy. However, this finding, or lack thereof, may indicate that current clinical internships do not yet provide sufficient meaningful exposure to AI applications. If students are primarily engaged in traditional clinical roles without interacting with AI-driven diagnostic or workflow systems, then the internship experience would not be expected to foster these specific competencies. This points to a potential disconnect between the technological frontier and the reality of current clinical training environments.

4.1.4. Unpacking the Complex Interactions

The significant interaction effects reveal a more sophisticated story. The Academic Performance × Grade Level interaction suggests a “Matthew effect,” where the gap between high- and low-performing students in areas like data literacy and critical thinking may widen as they progress through their education. High-achievers may be better equipped to leverage the advanced concepts taught in later years to further develop their literacy.

The three-way interaction between Academic Performance, Grade Level, and Internship Experience is particularly revealing. It suggests that an internship is not a universally beneficial experience for developing intelligence literacy. Rather, its value may be unlocked only under certain conditions—for example, a high-achieving senior student may have the prerequisite knowledge and cognitive maturity to learn from observing or using nascent AI tools in a clinical setting, whereas a junior student might not. This highlights the need for carefully structured clinical experiences that are intentionally designed to build these competencies, rather than assuming they will be absorbed by osmosis.

4.2. Theoretical and Practical Implications

4.2.1. Theoretical Implications

This study makes several contributions to theory. First, it proposes and validates a domain-specific model of intelligence literacy, arguing against a one-size-fits-all approach and contributing to the broader literature on 21st-century skills and professional competencies. Second, by employing a rigorous mixed-methods design, it provides a methodological exemplar for how to develop competency frameworks for new and evolving professional domains. Third, the quantitative findings contribute to our understanding of the developmental psychology of these new literacies, highlighting the interplay of individual aptitude (academic performance), structured learning (grade level), and experiential learning (internships).

4.2.2. Practical Implications for Medical Education

The implications for medical schools and educators are direct and substantial:

Curriculum Integration: The ILMS model can serve as a blueprint for curriculum reform. Rather than adding a single, isolated “AI course,” educators should seek to integrate the five dimensions longitudinally across the entire curriculum. For example, ethics courses can incorporate cases on algorithmic bias, anatomy can use AI-powered imaging tools, and clinical reasoning sessions can involve critically appraising AI-generated differential diagnoses.

Targeted Skill Development: The finding that Human-AI Collaboration is not developing organically highlights the need for specific training modules, simulations, or workshops focused on this skill. Medical schools should create opportunities for students to work with AI tools in a hands-on, collaborative manner.

Assessment and Benchmarking: The ILMS Scale provides a validated tool for medical schools to assess the baseline intelligence literacy of their students, measure the impact of new curricular interventions, and benchmark their students’ progress against national or international norms. It can be used for formative feedback to help students identify their own strengths and weaknesses.

Rethinking Clinical Internships: The findings challenge educators to redesign clinical experiences. Internships should be intentionally structured to expose students to clinical AI applications where they exist and to foster critical discussion about their potential and limitations where they do not. Partnerships with technologically advanced clinical sites could be prioritized.

4.3. Limitations and Future Research

Despite its strengths, this study has several limitations that offer avenues for future research.

Sample Specificity: The study was conducted at a single university in China. While the findings are significant, their generalizability to other cultural contexts, healthcare systems, and educational philosophies needs to be established. Future research should seek to validate the ILMS model and scale in diverse international settings.

Sampling Method: The use of convenience sampling in the quantitative phase may limit the representativeness of the sample. Future studies should employ random sampling techniques to enhance generalizability.

Self-Report Data: The ILMS Scale relies on students’ self-perceptions of their competencies. While this is a common and valid approach, it is susceptible to social desirability bias and may not perfectly correlate with actual performance. Future research could supplement the self-report scale with objective, performance-based assessments (e.g., situational judgment tests involving AI scenarios).

Cross-Sectional Design: The cross-sectional nature of the survey data allows for the identification of associations but not causation. The observed differences between grade levels, for instance, could be due to cohort effects rather than development. A longitudinal study that tracks a single cohort of students throughout their medical education would provide much stronger evidence of developmental trajectories and the impact of specific educational experiences.

Future research should also focus on developing and testing specific educational interventions designed to improve the competencies outlined in the ILMS model. For example, randomized controlled trials could compare the effectiveness of different pedagogical approaches (e.g., simulation-based training vs. problem-based learning) for enhancing human-AI collaboration skills.

5. Conclusion

The integration of artificial intelligence into medicine is not a distant future but a present-day reality that is rapidly accelerating. Preparing the next generation of physicians for this new paradigm is one of the most pressing challenges facing medical education today. This study addressed this challenge head-on by developing the Intelligence Literacy Model for Medical Students (ILMS), a comprehensive, five-dimensional framework grounded in the authentic perspectives of clinical stakeholders and validated by expert consensus. The subsequent development and deployment of the ILMS Scale provided a reliable instrument for assessment and yielded crucial initial insights, revealing that while foundational academic skills and curricular progression contribute significantly to intelligence literacy, specific competencies like human-AI collaboration require targeted educational intervention.

The ILMS model and its associated scale offer medical educators a much-needed conceptual map and a practical toolkit to navigate the educational transformations required by the AI revolution. By systematically cultivating these five dimensions of literacy, medical schools can empower their students not merely to use AI, but to lead its responsible, ethical, and effective integration into the future of healthcare, ultimately for the betterment of patient care.

Funding

This work was supported by the Wenzhou City Fundamental Research Project: Research and Construction of an Intelligent Human-Machine Collaboration Literacy and Training Cloud Platform in Medicine (Grant No. S2023014), and the Zhejiang Provincial Major Humanities and Social Sciences Research Project in Higher Education: Research on Human-Machine Collaborative Teaching Model Supported by Intelligent Agents in Medical Education (Grant No. 2024QN061).


References

  1. Schwab K. The Fourth Industrial Revolution. New York: Currency; 2017.
  2. Topol EJ. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. New York: Basic Books; 2019.
  3. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118. doi:10.1038/nature21056
  4. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577(7788):89-94. doi:10.1038/s41586-019-1799-6
  5. Zou J, Huss M, Abid A, Mohammadi P, Torkamani A, Telenti A. A primer on deep learning in genomics. Nat Genet. 2019;51(1):12-18. doi:10.1038/s41588-018-0295-5
  6. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94-98. doi:10.7861/futurehosp.6-2-94
  7. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. 2018;2(10):719-731. doi:10.1038/s41551-018-0305-z
  8. Wartman SA, Combs CD. Medical education must move from the information age to the age of artificial intelligence. Acad Med. 2018;93(8):1107-1109. doi:10.1097/ACM.0000000000002044
  9. Kolachalama VB, Garg PS. Machine learning and medical education. NPJ Digit Med. 2018;1:26. doi:10.1038/s41746-018-0030-0
  10. Paranjape K, Schinkel M, Nannan Panday R, Car J, Nanayakkara P. Introducing artificial intelligence training in medical education. JMIR Med Educ. 2019;5(2):e16048. doi:10.2196/16048
  11. Masters K. Artificial intelligence in medical education. Med Teach. 2019;41(9):976-980. doi:10.1080/0142159X.2019.1595557
  12. Long D, Magerko B. What is AI literacy? Competencies and design considerations. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. New York: ACM; 2020:1-16. doi:10.1145/3313831.3376727
  13. Touretzky D, Gardner-McCune C, Martin F, Seehorn D. Envisioning AI for K-12: What should every child know about AI? In: Proceedings of the AAAI Conference on Artificial Intelligence. 2019;33(01):9795-9799. doi:10.1609/aaai.v33i01.33019795
  14. Ng DTK, Leung JKL, Chu SKW, Qiao MS. Conceptualizing AI literacy: An exploratory review. Comput Educ Artif Intell. 2021;2:100041. doi:10.1016/j.caeai.2021.100041
  15. Char DS, Shah NH, Magnus D. Implementing machine learning in health care—addressing ethical challenges. N Engl J Med. 2018;378(11):981-983. doi:10.1056/NEJMp1714229
  16. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi:10.1126/science.aax2342
  17. Blease C, Kaptchuk TJ, Bernstein MH, Davis RB, Kelley JM, Landry M, et al. Artificial intelligence and the future of primary care: exploratory qualitative study of UK general practitioners’ views. J Med Internet Res. 2019;21(3):e12802. doi:10.2196/12802
  18. Han ER, Yeo S, Kim MJ, et al. Medical education trends for future physicians in the era of advanced technology and artificial intelligence: an integrative review. BMC Med Educ. 2019;19(1):460. doi:10.1186/s12909-019-1891-5
  19. Spencer LM, Spencer SM. Competence at Work: Models for Superior Performance. New York: John Wiley & Sons; 1993.
  20. Dong H, Gong Z, Xu X. Competency Model Construction and Application. Beijing: People’s Posts and Telecommunications Press; 2007.
  21. Yang J, Zhang X, Zhang Z. Construction of competency model for clinical physicians in China. Chin Med J (Engl). 2014;127(18):3337-3338.
  22. Zhang Z, Burchett H, Woodhead C. Qualitative research methods in medical education. Med Educ Online. 2016;21:32255.
  23. Gale NK, Heath G, Cameron E, Rashid S, Redwood S. Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Med Res Methodol. 2013;13:117. doi:10.1186/1471-2288-13-117
  24. Corbin J, Strauss A. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. 3rd ed. Los Angeles: SAGE Publications; 2008.
  25. Linstone HA, Turoff M. The Delphi Method: Techniques and Applications. Reading, MA: Addison-Wesley; 1975.
  26. Zhou L. Scale development and validation in social science research. Psychol Methods. 2019;24(1):120-135.
  27. Bagnall R, Jones A, Smith K. Developing a new scale for digital resilience. Comput Human Behav. 2024;150:107980.
  28. Chen X, Wang Y, Liu Z. Validation of the Medical AI Readiness Scale. Med Teach. 2021;43(5):560-566.
  29. Ritchie J, Spencer L. Qualitative data analysis for applied policy research. In: Bryman A, Burgess RG, eds. Analyzing Qualitative Data. London: Routledge; 1994:173-194.
  30. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):195. doi:10.1186/s12916-019-1426-2
  31. Goddard K, Roudsari A, Wyatt JC. Automation bias: a systematic review of frequency, effect mediators, and mitigators. J Am Med Inform Assoc. 2012;19(1):121-127. doi:10.1136/amiajnl-2011-000089
  32. Amann J, Blasimme A, Vayena E, Frey D, Madai VI. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):310. doi:10.1186/s12911-020-01332-6
  33. Price WN 2nd, Cohen IG. Privacy in the age of medical big data. Nat Med. 2019;25(1):37-43. doi:10.1038/s41591-018-0272-7
  34. Nunnally JC. Psychometric Theory. 2nd ed. New York: McGraw-Hill; 1978.
  35. Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6(1):1-55. doi:10.1080/10705519909540118
  36. Chai CS, Wang X, Xu C. An extended theory of planned behavior for the modelling of Chinese secondary school students’ intention to learn artificial intelligence. Mathematics. 2021;9(17):2042. doi:10.3390/math9172042
  37. Civaner MM, Uncu Y, Bulut F, Chalil E, Tatli A. Artificial intelligence in medical education: a cross-sectional needs assessment. BMC Med Educ. 2022;22(1):772. doi:10.1186/s12909-022-03852-3
  38. Weidener L, Fischer MR. “I worry about the black box”: A qualitative study on medical students’ attitudes towards AI. Med Educ Online. 2023;28(1):2221629. doi:10.1080/10872981.2023.2221629
  39. Pinto Dos Santos D, Giese D, Brodehl S, et al. Medical students’ attitude towards artificial intelligence: a multicentre survey. Eur Radiol. 2019;29(4):1640-1646. doi:10.1007/s00330-018-5601-1
  40. Su J, Yang W. Examining the gender gap in AI literacy among young learners. Educ Technol Soc. 2024;27(1):1-15.