arXiv 2503.14023

Synthetic Data Generation Using Large Language Models: Advances in Text and Code

By Mihai Nadas, Laura Diosan, et al.

Published 2025-03-18

This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment or even substitute for real-world datasets, particularly in scenarios where labeled data is scarce, expensive, or sensitive. This paper surveys recent advances in leveraging LLMs t…
