Imagine you’re building an AI research assistant that extracts structured metadata from arXiv papers. A user enters a paper URL, and your LLM needs to extract the title, authors, abstract summary, key concepts, and methodology into a clean JSON format with predefined schemas—what we call structured outputs.
In the traditional approach, users see nothing but a loading spinner for 8-10 seconds while your LLM processes the entire paper. Then suddenly, all fields populate at once—title, authors, summary, everything appears simultaneously. There’s no progress indication, no early feedback, and no way for users to start reading the title while the summary is still being generated. Streaming LLM responses solves this problem and provides instant feedback to the user.
Streaming structured JSON requires a way to validate it progressively as it arrives. You could build your own partial JSON parser and validator—tracking incomplete strings, handling truncated objects, managing field-by-field validation logic. But this quickly becomes a complex, error-prone endeavor that distracts from your core application logic.
The Solution: Partial JSON Validation
Pydantic v2.10.0 introduced experimental partial JSON validation, turning this complex problem into a one-liner. Instead of building custom parsing logic, you can now validate and process streaming JSON with a simple flag: experimental_allow_partial=True.
With partial validation, those metadata fields in our research assistant could populate progressively—the title appears first, then authors, then the summary builds sentence by sentence. Users get immediate feedback and can start consuming information while the AI is still working.
Implementing Partial Validation: A Code Walkthrough
Structured outputs let you define the exact output structure of an LLM. One easy way to do this is by defining a Pydantic model. To extract metadata from arXiv papers, we can define the following Pydantic model:
from typing import List
from pydantic import BaseModel, Field
class ArxivPaperMetadata(BaseModel):
    title: str = Field(default="", description="Title of the paper")
    authors: List[str] = Field(
        default_factory=list, description="List of paper authors"
    )
    abstract_summary: str = Field(
        default="", description="2-3 sentence summary of the abstract"
    )
    key_concepts: List[str] = Field(
        default_factory=list, description="3-5 main concepts from the paper"
    )
    methodology: str = Field(default="", description="Research methodology used")
    potential_applications: List[str] = Field(
        default_factory=list, description="Potential real-world applications"
    )

With a Pydantic model defined, you can call the client's `parse` endpoint, passing the model as the `text_format` argument. This ensures that the LLM output is a JSON string matching exactly the structure defined by your Pydantic model. The response object also exposes an `output_parsed` attribute, which is an already populated instance of your Pydantic model.
import os
from openai import OpenAI
# Assume 'text' contains the full text of the arXiv paper
text = "..."
# Initialize OpenAI client
client = OpenAI(
    api_key=os.getenv("LITELLM_API_KEY"), base_url=os.getenv("LITELLM_API_URL")
)
# Call OpenAI API (non-streaming)
response = client.responses.parse(
    model="gpt-4.1-mini",
    input=[
        {"role": "system", "content": "You are an expert academic paper analyzer. Extract structured metadata from the provided paper text. Focus on accuracy and completeness."},
        {"role": "user", "content": f"Paper text:\n{text}"},
    ],
    text_format=ArxivPaperMetadata,
    temperature=0,  # not necessary for structured outputs but recommended for extraction tasks
)
# Get parsed Pydantic model
metadata = response.output_parsed

This works perfectly for a single, complete response. But to stream the output, each chunk of data must be parsed and validated as it arrives. This creates a tedious challenge, which Pydantic now solves elegantly. The two key components are:
- A `pydantic.TypeAdapter` for your model.
- The adapter's `.validate_json()` method, called with the `experimental_allow_partial` flag.
from pydantic import TypeAdapter
adapter = TypeAdapter(ArxivPaperMetadata)
partial_data = adapter.validate_json(
    partial_json_string, experimental_allow_partial="trailing-strings"
)

In this scenario, `partial_data` would be an instance of our Pydantic model, containing all the values that are already present in `partial_json_string`. The `experimental_allow_partial` argument can take one of three values, illustrated in the short sketch after this list:
- `False` or `'off'`: Disables partial validation (default).
- `True` or `'on'`: Enables partial validation for complete values. It will parse a JSON object as long as all its string values are complete (e.g., `{"key": "value"}`). It will not parse an incomplete string like `{"key": "val}`.
- `'trailing-strings'`: The most permissive mode. It enables partial validation and also parses incomplete strings at the end of the JSON object. This is the setting required to achieve the "live typing" effect seen in the video demo.
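To make the difference between `'on'` and `'trailing-strings'` concrete, here is a small illustrative sketch that reuses the adapter from above on a made-up truncated chunk; the expected results follow directly from the behavior described in the list:

# A hypothetical chunk, cut off in the middle of the title string
truncated = '{"title": "Partial JSON Validation in Pyd'

# 'on' keeps only complete values: the unfinished title should be dropped,
# so the field falls back to its default of ""
print(adapter.validate_json(truncated, experimental_allow_partial=True).title)

# 'trailing-strings' also keeps the unfinished string at the end,
# so the title should contain the partially generated text
print(adapter.validate_json(truncated, experimental_allow_partial="trailing-strings").title)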
Putting it all together, we can combine Pydantic's `TypeAdapter` with OpenAI's streaming API (which delivers server-sent events) to yield validated data objects on the fly:
def streaming_extraction(text: str):
    with client.responses.stream(
        model="gpt-4.1-mini",
        input=[
            {"role": "system", "content": "You are an expert academic paper analyzer. Extract structured metadata from the provided paper text. Focus on accuracy and completeness."},
            {"role": "user", "content": f"Paper text:\n{text}"},
        ],
        text_format=ArxivPaperMetadata,
        temperature=0,  # not necessary for structured outputs but recommended for extraction tasks
    ) as stream:
        # Create TypeAdapter for partial validation
        adapter = TypeAdapter(ArxivPaperMetadata)
        partial_json = ""
        for event in stream:
            if event.type == "response.output_text.delta":
                partial_json += event.delta
                # Validate the accumulated JSON string
                partial_data = adapter.validate_json(
                    partial_json, experimental_allow_partial="trailing-strings"
                )
                yield partial_data

This streams valid JSON objects immediately, which can be used for display or further processing.
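As a quick sketch of how the generator might be consumed, here each partial result is simply printed whenever the title changes; the print call is a stand-in for whatever UI or websocket update your application needs:

# Consume the stream and react whenever a field of interest changes
last_title = ""
for partial in streaming_extraction(text):
    if partial.title != last_title:
        last_title = partial.title
        print(f"Title so far: {last_title}")
    # ...push other fields (authors, summary, ...) to the frontend as needed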
Key Considerations and Trade-offs
Pydantic’s partial validation transforms static loading experiences into dynamic, progressive interfaces. It can significantly reduce perceived latency and enable real-time feedback during long AI generations.
However, this is an experimental feature and comes with important constraints.
Key Constraints
- Default Values Required: All fields in your Pydantic model must have a default value to prevent validation errors on incomplete data.
- TypeAdapter Only: Partial validation works exclusively through `TypeAdapter` instances, not directly on the model (e.g., `ArxivPaperMetadata.model_validate_json` does not support it yet).
- Validation Overhead: The continuous validation within a stream loop adds computational overhead; one way to mitigate this is sketched after this list.
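If that overhead matters in your application, a possible mitigation is to re-validate only every few deltas instead of on every chunk. The sketch below assumes the re-validation frequency is the bottleneck; the batch size of 5 is an arbitrary illustrative choice:

VALIDATE_EVERY = 5  # arbitrary batch size for illustration

def throttled_streaming_extraction(text: str):
    adapter = TypeAdapter(ArxivPaperMetadata)
    partial_json = ""
    pending = 0
    with client.responses.stream(
        model="gpt-4.1-mini",
        input=[
            {"role": "system", "content": "You are an expert academic paper analyzer. Extract structured metadata from the provided paper text."},
            {"role": "user", "content": f"Paper text:\n{text}"},
        ],
        text_format=ArxivPaperMetadata,
    ) as stream:
        for event in stream:
            if event.type == "response.output_text.delta":
                partial_json += event.delta
                pending += 1
                if pending >= VALIDATE_EVERY:
                    pending = 0
                    yield adapter.validate_json(
                        partial_json, experimental_allow_partial="trailing-strings"
                    )
        # Validate once more at the end so the final deltas are not lost
        yield adapter.validate_json(
            partial_json, experimental_allow_partial="trailing-strings"
        )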
When to Use Partial Validation
- Ideal for user-facing applications where progressive results add significant value (e.g., dashboards, chat interfaces).
- Best for LLM responses that take several seconds to complete, as it improves the user experience.
When to Avoid It
- For internal batch processing jobs where user feedback is not a factor.
- For very fast API calls (sub-second) where the complexity of setting up a stream outweighs the minimal benefits.
In my next article, I’ll show how this enables advanced streaming tool-calling patterns.