OBJECTIVE: This study aimed to describe the methodologies used to develop and evaluate models that use artificial intelligence (AI) to analyse lung images in order to detect, segment (outline borders of), or classify pulmonary nodules as benign or malignant. METHODS: In October 2019, we systematically searched the literature for original studies published between 2018 and 2019 that described prediction models using AI to evaluate human pulmonary nodules on diagnostic chest images. Two evaluators independently extracted information from studies, such as study aims, sample size, AI type, patient characteristics, and performance. We summarised data descriptively. RESULTS: The review included 153 studies: 136 (89%) development-only studies, 12 (8%) development and validation, and 5 (3%) validation-only. CT scans were the most common type of image type used (83%), often acquired from public databases (58%). Eight studies (5%) compared model outputs with biopsy results. 41 studies (26.8%) reported patient characteristics. The models were based on different units of analysis, such as patients, images, nodules, or image slices or patches. CONCLUSION: The methods used to develop and evaluate prediction models using AI to detect, segment, or classify pulmonary nodules in medical imaging vary, are poorly reported, and therefore difficult to evaluate. Transparent and complete reporting of methods, results and code would fill the gaps in information we observed in the study publications. ADVANCES IN KNOWLEDGE: We reviewed the methodology of AI models detecting nodules on lung images and found that the models were poorly reported and had no description of patient characteristics, with just a few comparing models' outputs with biopsies results. When lung biopsy is not available, lung-RADS could help standardise the comparisons between the human radiologist and the machine. The field of radiology should not give up principles from the diagnostic accuracy studies, such as the choice for the correct ground truth, just because AI is used. Clear and complete reporting of the reference standard used would help radiologists trust in the performance that AI models claim to have. This review presents clear recommendations about the essential methodological aspects of diagnostic models that should be incorporated in studies using AI to help detect or segmentate lung nodules. The manuscript also reinforces the need for more complete and transparent reporting, which can be helped using the recommended reporting guidelines.