Intro
How we engineer data
Why Open Politics Exists
- Politics is hard to keep track of, whether it is news, conflicts or legislative procedures. Few people have the time to read all the documents and news articles needed for a broad, well-informed understanding of a political situation. Technology can make this process far more accessible.
Recently, the advent of Large Language Models has extended the capabilities of textual analysis and understanding. In particular, the ability to formulate tasks in natural language opens up new possibilities for analysing text data, potentially revolutionising the way qualitative and quantitative research can be combined.
This project specialises in assembling data, building infrastructure and embedding tools into meaningful user interfaces.
We generally categorise our work into three pillars
Update: SSARE Release
SSARE is Open Politics’ data aggregation system and vector storage endpoint. It aims to create up-to-date and relevant datasets for the LLMs to work with.
A microservice infrastructure continuously scrapes news sites and stores the articles in a vector store and a relational database (Postgres). Sources can be added with Python scripts that yield a dataframe with the columns URL | Headline | Paragraphs | Source. Just clone the service, add your scripts and bring your own data endpoint into production.
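To make the scraper contract more concrete, here is a minimal sketch of what a source script might look like. Only the four column names come from the description above. The names `ArticleParser`, `parse_article` and `to_dataframe` are hypothetical, and so are the choices to take the first `<h1>` as the headline and to join paragraphs with newlines. The actual interface SSARE expects may differ.

```python
from html.parser import HTMLParser

# Column names taken from the description above.
COLUMNS = ["URL", "Headline", "Paragraphs", "Source"]


class ArticleParser(HTMLParser):
    """Collects the first <h1> as headline and every non-empty <p> as a paragraph."""

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.paragraphs = []
        self._tag = None      # tag currently being captured ("h1" or "p")
        self._buffer = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "h1" and not self.headline:
                self.headline = text
            elif tag == "p" and text:
                self.paragraphs.append(text)
            self._tag = None


def parse_article(url, html, source):
    """Turn one article page into a row with the four expected columns."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        "URL": url,
        "Headline": parser.headline,
        # Assumption: paragraphs are stored as one newline-joined string.
        "Paragraphs": "\n".join(parser.paragraphs),
        "Source": source,
    }


def to_dataframe(rows):
    """Build the dataframe from parsed rows (requires the third-party pandas package)."""
    import pandas as pd
    return pd.DataFrame(rows, columns=COLUMNS)
```

A script would fetch each article page, pass the HTML to `parse_article`, and hand the collected rows to `to_dataframe`. Real news sites usually need site-specific selectors rather than a generic `<h1>`/`<p>` rule.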
Want to engage? Look into our Developer Jour Fixe!
- Interested in the project? Want to contribute? Share a thought?
- Every Wednesday 15:30 Berlin Time
- Discord Server
Join and talk about the project, ask questions, propose ideas, or just listen in.
Currently needed:
- Data Scraper Modules
- Interdisciplinary collaboration on the instruction sets for the LLMs
- Prompt Engineering suggestions
- Frontend/UX/UI work
Tasks
Generally researching:
- Issue/Area Identification
- Actor Identification
- Stance Triangulation
Including but not limited to tasks like:
- Information summarization
- Vector storage & retrieval
- Information clustering
- Entity Extraction (Named Entity Recognition)
- Q&A Chatbots (for interactive information)
- Providing historical context
- Statement & Intention decoding
- Visual representation of political data
- Monitoring and alerts
[..+ unmaintained and heavily overloaded list of features]
curl -X GET "https://cloud.open-politics.org/index.php/apps/tables/api/1/tables" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "OCS-APIRequest: true"
curl -X POST "https://cloud.open-politics.org/apps/oauth2/api/v1/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "client_id=YOUR_CLIENT_ID" \
  -d "client_secret=YOUR_CLIENT_SECRET" \
  -d "grant_type=refresh_token" \
  -d "refresh_token=YOUR_REFRESH_TOKEN"
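The same token refresh can be sketched in Python using only the standard library. The endpoint URL and form fields are those of the curl command above. The function name `build_refresh_request` is hypothetical, and the shape of the response is not documented here, so the sketch stops at building the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Token endpoint as used in the curl command above.
TOKEN_URL = "https://cloud.open-politics.org/apps/oauth2/api/v1/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (but do not send) the form-encoded POST that refreshes a token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and parse the body as JSON. Read the credentials from environment variables or a secrets store rather than writing them into scripts or documentation.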
curl -X GET "https://cloud.open-politics.org/apps/oauth2/authorize?response_type=code&client_id=YOUR_CLIENT_ID&redirect_uri=https://cloud.open-politics.org/callback"
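When building the authorization URL in code, it is safer to let a library percent-encode the query parameters, since the redirect URI itself contains `:` and `/`. The following is a small sketch under that assumption. The function name `build_authorize_url` is hypothetical, and the endpoint and parameter names are taken from the request above.

```python
from urllib.parse import urlencode

# Authorization endpoint as used in the request above.
AUTHORIZE_URL = "https://cloud.open-politics.org/apps/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri):
    """Return the authorization URL with percent-encoded query parameters."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    })
    return f"{AUTHORIZE_URL}?{query}"
```

The resulting URL is meant to be opened in a browser, where the user logs in and is redirected to the callback with a `code` parameter.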