Introduction

The CAP BABEL Project uses the major topics of the Comparative Agendas Project policy codebook to identify the policy areas of texts. A minor topic coding service will also be available in the near future.

The assigned codes are unequivocal and mutually exclusive, and together they cover all potential policy issues. The codebook distinguishes 21 major policy areas: macroeconomics, civil rights, health, agriculture, labour, education, environment, energy, immigration, transportation, law and crime, social welfare, housing, domestic commerce, defence, technology, foreign trade, international affairs, government operations, public lands, and culture.

You can upload your datasets here for automated CAP coding. If you wish to submit multiple datasets one after another, please wait 5-10 minutes between submissions. There are two upload options: pre-coded datasets and non-coded datasets. An explanation of the form and the dataset requirements is available here.

To upload a dataset, fill in the following form with metadata about the dataset. For a pre-coded dataset, please also attach the codebook that was used, if available.

Non-coded datasets must contain an id and a text column. The column names must be in row 1. You may add supplementary variables in the columns after the compulsory ones.
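Before uploading, the header requirement above can be checked locally. The following is a minimal sketch, assuming the dataset is stored as CSV; the function name `missing_columns` is hypothetical and is not part of the CAP Babel Machine. It only checks that the compulsory column names appear in row 1 and does not reproduce the service's own validation.

```python
import csv
import io

# Compulsory columns for a non-coded dataset, as named in the text above.
REQUIRED_NONCODED = ("id", "text")


def missing_columns(csv_text, required=REQUIRED_NONCODED):
    """Return the required column names that are absent from row 1."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # row 1 must hold the column names
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# A supplementary "source" column follows the compulsory ones.
sample = "id,text,source\n1,Budget debate on income tax,parliament\n"
print(missing_columns(sample))  # []
```

An empty result means both compulsory columns were found; any names returned would need to be added or renamed before submission.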

Pre-coded datasets must contain the following columns: id, year, major_topic, text. The column names must be in row 1. Uploading a pre-coded sample is optional, but it helps us calculate performance metrics and fine-tune the language model behind the CAP Babel Machine. The detailed validation rules are available here. The major_topic column must be numeric: all textual CAP categories must be converted to the appropriate numeric code before uploading, and records with no policy content should be coded as 999. You may add supplementary variables in the columns after the compulsory ones. Automatic processing is only possible if these rules are followed.
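Converting textual categories to numeric codes can be scripted before upload. The sketch below is illustrative only: it assumes a CSV input and uses the major topic numbering of the CAP master codebook (an assumption to verify against the codebook version you use, since the numbering is not stated on this page). The names `MAJOR_TOPIC_CODES`, `to_numeric_code`, and `normalise_precoded` are hypothetical and not part of the CAP Babel Machine.

```python
import csv
import io

# Numeric major topic codes as numbered in the CAP master codebook.
# Assumption: verify against the codebook version you actually use.
MAJOR_TOPIC_CODES = {
    "macroeconomics": 1, "civil rights": 2, "health": 3, "agriculture": 4,
    "labour": 5, "education": 6, "environment": 7, "energy": 8,
    "immigration": 9, "transportation": 10, "law and crime": 12,
    "social welfare": 13, "housing": 14, "domestic commerce": 15,
    "defence": 16, "technology": 17, "foreign trade": 18,
    "international affairs": 19, "government operations": 20,
    "public lands": 21, "culture": 23,
}
NO_POLICY_CONTENT = 999  # code for records with no policy content
REQUIRED_PRECODED = ("id", "year", "major_topic", "text")


def to_numeric_code(value):
    """Map a textual label or a numeric string to an integer code."""
    cleaned = value.strip().lower()
    if cleaned.isdigit():
        return int(cleaned)
    return MAJOR_TOPIC_CODES[cleaned]  # raises KeyError for unknown labels


def normalise_precoded(csv_text):
    """Check the required columns and return rows with numeric major_topic."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_PRECODED if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    valid = set(MAJOR_TOPIC_CODES.values()) | {NO_POLICY_CONTENT}
    rows = []
    for row in reader:
        code = to_numeric_code(row["major_topic"])
        if code not in valid:
            raise ValueError(f"unknown major_topic code: {code}")
        row["major_topic"] = code
        rows.append(row)
    return rows


sample = (
    "id,year,major_topic,text\n"
    "1,2020,Health,Hospital funding bill\n"
    "2,2021,999,Procedural note\n"
)
print([row["major_topic"] for row in normalise_precoded(sample)])  # [3, 999]
```

Unknown labels and out-of-range codes raise an error here instead of being silently mapped, so problematic records surface before the file is submitted.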

If the files you would like to upload are larger than 100 MB, please fill out the rest of the form and send the files to the following email address: poltextlab@poltextlab.com

Submit a dataset:


    The research was supported by the Ministry of Innovation and Technology NRDI Office and the European Union, in the framework of the RRF-2.3.1-21-2022-00004 Artificial Intelligence National Laboratory project.