arXiv 2503.14023
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
By Mihai Nadas, Laura Diosan, et al.
Published 2025-03-18
This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment or even substitute for real-world datasets, particularly in scenarios where labeled data is scarce, expensive, or sensitive. This paper surveys recent advances in leveraging LLMs t…