What You Get
Output from a schema analysing EU Parliament speeches on migration

The schema that was used to extract the data

Building a Schema
Start with your actual questions. What do you want to know about your documents?
- What positions are being taken?
- How is the issue framed?
- Who’s mentioned, and in what context?
- What’s the emotional temperature?
Good Instructions vs. Vague Instructions
The difference between useful output and noise is specificity. Instructions like these work:
security_stance
Type: Number (1-10)
Extract: Position on border security. 1 = open borders, 10 = strict enforcement
talking_point_topics
Type: List
Extract: Main topics discussed - migration policy, demographic change, EU solidarity, public services, etc.
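As a rough illustration, the two fields above can be written down as plain data. This is a minimal sketch only: the `Field` class and the `render` helper are hypothetical names made up for this example, not the platform's actual schema format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical container for one schema field."""
    name: str
    type: str         # "String", "Number", or "List"
    instruction: str  # the "Extract:" text shown above


SCHEMA = [
    Field("security_stance", "Number",
          "Position on border security. 1 = open borders, 10 = strict enforcement"),
    Field("talking_point_topics", "List",
          "Main topics discussed - migration policy, demographic change, "
          "EU solidarity, public services, etc."),
]


def render(schema):
    """Render each field as one plain-text instruction line."""
    return "\n".join(f"{f.name} ({f.type}): {f.instruction}" for f in schema)


print(render(SCHEMA))
```

Writing the fields out like this makes it easy to review each instruction for specificity before running an analysis.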
Field Types
| Type | Use for | Example |
|---|---|---|
| String | Freeform text, summaries, answers, categories | summary: "Speaker argues for..." |
| Number | Ratings, scales, counts, amounts | security_stance: 7 |
| List | Multiple values, tags, topics | topics: ["migration", "EU solidarity"] |
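If you export results and want to confirm that each value matches its declared type, a check along these lines may help. It is a sketch under assumptions: `check_types` is a hypothetical helper, and the dictionary layout of `schema` and `record` is chosen for illustration rather than taken from the platform.

```python
def check_types(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    expected = {"String": str, "Number": (int, float), "List": list}
    problems = []
    for name, type_name in schema.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected[type_name]):
            problems.append(f"{name}: expected {type_name}")
    return problems


schema = {"summary": "String", "security_stance": "Number", "topics": "List"}
good = {"summary": "Speaker argues for...", "security_stance": 7,
        "topics": ["migration", "EU solidarity"]}
bad = {"summary": "Speaker argues for...", "security_stance": "seven"}

print(check_types(good, schema))
print(check_types(bad, schema))
```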
Binary fields for filtering: It’s often useful to include 0/1 or yes/no fields for filtering results. Something like is_spam: 0 or is_relevant: 1 lets you quickly filter out noise in dashboards.
Special Field Names
Certain field names unlock specific features when you view results:
| Field name | What it enables | How to prompt |
|---|---|---|
| location | Geographic maps - locations get geocoded and plotted | Ask for the single most relevant location or region |
| timestamp | Time series charts - combine with number fields to track changes over time | Ask for an ISO timestamp of the article’s main event or publication date |
| summary | Highlighted in result views as the main description | Ask for a summary with your preferred style and length |
| tags | Counted and aggregated across results | Ask for either freeform tags or a choice from a few given tags |
When you offer a fixed set of choices, always provide a fallback such as “None” or “Not applicable”.
If you extract talking_point_topics as a list, you’ll see frequency distributions in dashboards.
Numbers with defined scales (1-10, 1-5) work well for comparative analysis - they enable meaningful aggregation and time series when combined with timestamps.
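To make the aggregation idea concrete, here is a small sketch of what a dashboard might compute from exported results: filter on a binary field, count list values, and average a number field per month. The `results` records are invented sample data and the export layout is an assumption, not the platform's actual format.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Invented sample records, for illustration only.
results = [
    {"is_relevant": 1, "timestamp": "2023-03-14T10:00:00", "security_stance": 7,
     "talking_point_topics": ["migration policy", "EU solidarity"]},
    {"is_relevant": 1, "timestamp": "2023-03-20T09:30:00", "security_stance": 3,
     "talking_point_topics": ["migration policy", "public services"]},
    {"is_relevant": 0, "timestamp": "2023-04-02T12:00:00", "security_stance": 5,
     "talking_point_topics": []},
    {"is_relevant": 1, "timestamp": "2023-04-11T15:45:00", "security_stance": 8,
     "talking_point_topics": ["demographic change"]},
]

# Binary field: drop the noise first.
relevant = [r for r in results if r["is_relevant"] == 1]

# List field: frequency distribution across all relevant results.
topic_counts = Counter(t for r in relevant for t in r["talking_point_topics"])

# Number field plus timestamp: mean stance per calendar month.
by_month = defaultdict(list)
for r in relevant:
    month = datetime.fromisoformat(r["timestamp"]).strftime("%Y-%m")
    by_month[month].append(r["security_stance"])
monthly_mean = {m: mean(v) for m, v in sorted(by_month.items())}

print(topic_counts.most_common())
print(monthly_mean)
```

The monthly averages are only comparable because every record uses the same 1-10 scale, which is why the scale belongs in the instruction itself.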
Tips
- Include examples in your instructions. “Source type: government, activist, expert, journalist, or anonymous” gives the AI a bounded set to work with.
- Use scales for subjective measures. “Emotional intensity (1-10)” is more useful than “how emotional is it?” because you can aggregate and compare.
- Start simple, then add. Begin with basic extraction (who, what, where), confirm it works, then layer in analysis fields (sentiment, framing, stance).
- Look at failures. When extraction is wrong or inconsistent, the instructions usually need tightening. What did the AI misunderstand?

Sharing Schemas
Schemas can be uploaded to the library for others to use. They can see your methodology, critique it, replicate it, build on it. Transparency about how analysis is done matters - it’s what separates systematic research from vibes.

Related pages
- Running Analysis: Apply schemas to your documents
- Curating Fragments: Promote extractions to persistent metadata
- Assets & Bundles: Upload and organise documents for analysis
- Dashboards: Visualise results with charts and tables

