arXiv 2503.14023

Synthetic Data Generation Using Large Language Models: Advances in Text and Code

By Mihai Nadas, Laura Diosan, et al.

Published 2025-03-18

Wiki summary

This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment or even substitute for real-world datasets, particularly in scenarios where labeled data is scarce, expensive, or sensitive. This paper surveys recent advances in leveraging LLMs t…
