What You Get
Output from a schema analysing EU Parliament speeches on migration

The schema that was used to extract the data

Building a Schema
Start with your actual questions. What do you want to know about your documents?
- What positions are being taken?
- How is the issue framed?
- Who’s mentioned, and in what context?
- What’s the emotional temperature?
Good Instructions vs. Vague Instructions
The difference between useful output and noise is specificity. These field definitions work because each one states a type and spells out exactly what to extract:
security_stance
Type: Number (1-10)
Extract: Position on border security. 1 = open borders, 10 = strict enforcement
talking_point_topics
Type: List
Extract: Main topics discussed - migration policy, demographic change, EU solidarity, public services, etc.
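The two definitions above can also be written down as data, which makes it clear what "specific" means: every field has a name, a type, and an instruction that pins down the scale or the expected values. The sketch below is hypothetical. `FieldSpec` and `render` are illustrative names rather than part of the tool, and the rendered line only approximates how such an instruction might be passed to a model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FieldSpec:
    """Hypothetical field definition: name, type, and extraction instruction."""
    name: str
    type: str  # "String", "Number", or "List"
    extract: str
    scale: Optional[Tuple[int, int]] = None  # only meaningful for Number fields


def render(spec: FieldSpec) -> str:
    """Render a field as one instruction line, including its scale if it has one."""
    type_label = spec.type
    if spec.scale is not None:
        low, high = spec.scale
        type_label = f"{spec.type} ({low}-{high})"
    return f"{spec.name} [{type_label}]: {spec.extract}"


schema = [
    FieldSpec(
        name="security_stance",
        type="Number",
        extract="Position on border security. 1 = open borders, 10 = strict enforcement",
        scale=(1, 10),
    ),
    FieldSpec(
        name="talking_point_topics",
        type="List",
        extract="Main topics discussed - migration policy, demographic change, "
        "EU solidarity, public services, etc.",
    ),
]

for spec in schema:
    print(render(spec))
```

The specific instruction survives the round trip intact. A vague one such as "security stance" would reach the model with no scale endpoints and no example values.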
Field Types
| Type | Use for | Example |
|---|---|---|
| String | Freeform text, summaries, answers, categories | summary: "Speaker argues for..." |
| Number | Ratings, scales, counts, amounts | security_stance: 7 |
| List | Multiple values, tags, topics | topics: ["migration", "EU solidarity"] |
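If you post-process exported results, a small type check catches values that don't match the declared field type, such as a stance outside its scale. This is a minimal sketch under stated assumptions. `is_valid` is a hypothetical helper, and it assumes that List items are strings, as in the table's examples.

```python
from typing import Any, Optional, Tuple


def is_valid(field_type: str, value: Any, scale: Optional[Tuple[float, float]] = None) -> bool:
    """Check one extracted value against its declared field type (hypothetical helper)."""
    if field_type == "String":
        return isinstance(value, str)
    if field_type == "Number":
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if scale is not None:
            low, high = scale
            return low <= value <= high
        return True
    if field_type == "List":
        # Assumption: list items are strings, as in the examples above
        return isinstance(value, list) and all(isinstance(item, str) for item in value)
    raise ValueError(f"Unknown field type: {field_type}")


record = {
    "summary": "Speaker argues for...",
    "security_stance": 7,
    "topics": ["migration", "EU solidarity"],
}

print(is_valid("String", record["summary"]))                   # True
print(is_valid("Number", record["security_stance"], (1, 10)))  # True
print(is_valid("List", record["topics"]))                      # True
print(is_valid("Number", 12, (1, 10)))                         # False: outside the 1-10 scale
```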
Binary fields for filtering: It’s often useful to include 0/1 or yes/no fields for filtering results. Fields such as is_spam: 0 or is_relevant: 1 let you quickly filter out noise in dashboards.
Special Field Names
Certain field names unlock specific features when you view results:

| Field name | What it enables | How to prompt |
|---|---|---|
| location | Geographic maps: locations get geocoded and plotted | Ask for the single most relevant location or region |
| timestamp | Time series charts: combine with number fields to track changes over time | Ask for an ISO 8601 timestamp of the article’s main event or its publication date |
| summary | Highlighted in result views as the main description | Ask for a summary in your preferred style and length |
| tags | Counted and aggregated across results | Either freeform or restricted to a fixed set of tags you provide |
It is always good to give the model a fallback choice such as “None” or “Not applicable” for documents where a field doesn’t apply.
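When you work with exported results, the fallback answers and the ISO 8601 timestamps are easy to normalize before charting. This is a minimal sketch that assumes the fallback wording is exactly “None” or “Not applicable” (adjust `FALLBACKS` to whatever your schema asks for). `clean_text` and `parse_timestamp` are hypothetical helpers.

```python
from datetime import datetime
from typing import Optional

# Fallback answers the schema told the model to use (assumed wording), compared case-insensitively.
FALLBACKS = {"none", "not applicable", ""}


def clean_text(value: Optional[str]) -> Optional[str]:
    """Map fallback answers such as "None" to a real missing value."""
    if value is None or value.strip().lower() in FALLBACKS:
        return None
    return value.strip()


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, returning None for fallbacks or unparseable text."""
    text = clean_text(value)
    if text is None:
        return None
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


print(parse_timestamp("2024-03-14"))  # 2024-03-14 00:00:00
print(parse_timestamp("None"))        # None
print(clean_text("Not applicable"))   # None
print(clean_text(" Brussels "))       # Brussels
```

On Python versions before 3.11, `datetime.fromisoformat` rejects some valid ISO 8601 forms, such as a trailing `Z`. This sketch would silently treat those values as missing.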
If you define talking_point_topics as a list, you’ll see frequency distributions in dashboards.
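A frequency distribution over a list field is a count of every item across all results, usually after a binary field has filtered out noise. The sketch below reproduces that by hand on made-up sample rows. The data is hypothetical, and the dashboard's actual aggregation may differ.

```python
from collections import Counter

# Hypothetical extracted results, one dict per speech.
results = [
    {"is_relevant": 1, "talking_point_topics": ["migration policy", "EU solidarity"]},
    {"is_relevant": 1, "talking_point_topics": ["migration policy", "public services"]},
    {"is_relevant": 0, "talking_point_topics": ["agriculture"]},
    {"is_relevant": 1, "talking_point_topics": ["demographic change", "migration policy"]},
]

# Keep only rows flagged as relevant, which is what a 0/1 filter field is for.
relevant = [r for r in results if r["is_relevant"] == 1]

# Count how often each topic appears across the remaining results.
topic_counts = Counter(topic for r in relevant for topic in r["talking_point_topics"])

for topic, count in topic_counts.most_common():
    print(f"{topic}: {count}")
```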
Numbers with defined scales (1-10, 1-5) work well for comparative analysis - they enable meaningful aggregation and time series when combined with timestamps.
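Because every score sits on the same 1-10 scale, averaging by period is meaningful. The sketch below groups hypothetical results by calendar month using their timestamps and takes the mean security_stance per month. It assumes date-only ISO 8601 timestamps, and the sample data is invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical extracted results with an ISO 8601 timestamp and a 1-10 stance.
results = [
    {"timestamp": "2024-01-10", "security_stance": 6},
    {"timestamp": "2024-01-24", "security_stance": 8},
    {"timestamp": "2024-02-07", "security_stance": 4},
    {"timestamp": "2024-02-21", "security_stance": 5},
]

# Group stance scores by calendar month ("YYYY-MM").
by_month = defaultdict(list)
for r in results:
    month = datetime.fromisoformat(r["timestamp"]).strftime("%Y-%m")
    by_month[month].append(r["security_stance"])

# Average each month's scores; the shared scale keeps the means comparable.
monthly_mean = {month: mean(scores) for month, scores in sorted(by_month.items())}
print(monthly_mean)
```

With an unanchored scale, a 7 from one speech and a 7 from another might not mean the same thing. That is why the schema defines what 1 and 10 stand for.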

