InkVisitor is an open-source, browser-based application for the manual entry of complex structured data from textual resources in the humanities and the social sciences. The data are entered in the form of statements, which interconnect entities of various classes into semantic quadruples (subject, verb, object 1, object 2) following the syntactic structure of texts. InkVisitor serves as a data-entry front-end for RethinkDB research databases, which in turn support various kinds of quantitative and computational analysis of the data.
InkVisitor has been developed in the Dissident Networks Project (DISSINET), a historical and social scientific research project focusing on medieval religious dissidence, inquisition, and inquisitorial records. The development has received substantial funding from the Czech Science Foundation (EXPRO project No. GX19-26975X “Dissident Religious Cultures in Medieval Europe from the Perspective of Social Network Analysis and Geographic Information Systems”) and the European Research Council (ERC Consolidator Grant, project No. 101000442 “Networks of Dissent: Computational Modelling of Dissident and Inquisitorial Cultures in Medieval Europe”).
The lead developer of the application is Adam Mertel. Other contributors of code include Petr Hanák, Ján Mertel and others. The lead authors of the data model are David Zbíral and Robert L. J. Shaw. Other contributors to the data model and testers include Tomáš Hampejs, Jan Král, Katia Riccardo and others.
The data model upon which InkVisitor is built is designed to capture complex natural-language data with all their uncertainties, narrative perspective, and semantic nuance. It does so in the semantically rich form of quadruples, an extended form of semantic triples which capture, more naturally than triples, the structure of languages with two grammatical objects (e.g., “Mary gave a present to Peter”: subject, verb, object 1, object 2), as well as chains of statements (e.g., “Mary told Peter – that she would give him a present”) and multiple sentence constituents.
The data model captures not only the editorial assessment of certainty, but also the overall perspective in its two key aspects: epistemic level and modality. The epistemic level expresses whether a claim is stated directly in the text, is an interpretation of it, or is an inference by and large independent of the text. Modality captures whether a given statement in the text is a positive utterance about the state of affairs, a question, a condition, a wish, etc.
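To make this concrete, here is a minimal TypeScript sketch of a quadruple statement carrying the perspective attributes described above. All type and field names are illustrative assumptions for this README, not the actual definitions from InkVisitor's shared package:

```typescript
// Hypothetical shape of a quadruple statement; names are illustrative.
type EpistemicLevel = "textual" | "interpretive" | "inferential";
type Modality = "positive" | "question" | "condition" | "wish";
type Certainty = "certain" | "probable" | "possible" | "dubious";

interface Statement {
  id: string;             // unique ID; the original text order is preserved
  action: string;         // ID of the Action entity ("gave")
  subject: string;        // ID of the subject entity ("Mary")
  object1?: string;       // ID of the first object ("a present")
  object2?: string;       // ID of the second object ("Peter")
  epistemicLevel: EpistemicLevel;
  modality: Modality;
  certainty: Certainty;   // editorial assessment of certainty
}

// "Mary gave a present to Peter" as a quadruple:
const example: Statement = {
  id: "stmt-001",
  action: "action-give",
  subject: "person-mary",
  object1: "object-present",
  object2: "person-peter",
  epistemicLevel: "textual",
  modality: "positive",
  certainty: "certain",
};
```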
While the DISSINET data model was developed primarily to capture medieval inquisitorial records, it is a quite generalized model of rare complexity from which historians, humanists, and social scientists can benefit in bridging quantitative and qualitative research. It transcends the limitations of, on the one hand, Computer-Assisted Qualitative Data Analysis Software (CAQDAS), with its strongly hypothesis-driven and classificatory approach, and, on the other, fact-oriented tabular data collection, which tends to lose many of the salient discursive qualities of the original sources.
The DISSINET data model thus offers a way to represent textual sources precisely, in a deeply structured and interconnected manner.
A Statement is an Entity type whose purpose is to relate other Entities. All Statements (like all other Entities) have unique IDs, and their original order in the text is preserved, making it possible to follow the development of an action in the narrative, the interplay of questions and answers, etc.
The other Entity types covered in the data model are listed below (see the sketch after the list):
- Action type
- Concept
- Person
- Group
- Being (typically an animal)
- (Physical) Object
- Location
- Event
- Value
- Text / Text part
- Resource
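As an illustration, the entity classes listed above could be represented as a TypeScript enum. The names and single-letter codes below are hypothetical, not the actual values from the shared package:

```typescript
// Hypothetical enum of entity classes; the codes are illustrative only.
enum EntityClass {
  Statement = "S",
  ActionType = "A",
  Concept = "C",
  Person = "P",
  Group = "G",
  Being = "B",
  Object = "O",
  Location = "L",
  Event = "E",
  Value = "V",
  Text = "T",
  Resource = "R",
}
```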
The uses of properties include the following (see the sketch after this list):
- Modelling adjectives concerning actants.
- Instantiating Entities to parent types.
- Defining time and place of action (including in fuzzy terms).
- Recording other adverbials, for example those concerning manner of action, circumstances, causes or consequences of action.
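A sketch of how such properties might look as data follows; the shapes and IDs are invented for illustration and do not reflect InkVisitor's actual schema:

```typescript
// Hypothetical property shape: a typed key/value pair of entity IDs.
interface Property {
  type: string;  // ID of a Concept, e.g. "concept-manner"
  value: string; // ID of the value entity, e.g. a Concept or Location
}

// "Mary (a weaver) secretly gave a present to Peter in Toulouse":
const adjective: Property = { type: "concept-occupation", value: "concept-weaver" };
const manner: Property = { type: "concept-manner", value: "concept-secretly" };
const place: Property = { type: "concept-place", value: "location-toulouse" };
```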
The environment supports three user roles:
- admin,
- editor,
- viewer.
Further, admins may grant particular users (editors and viewers) access rights to specified territories. Users with the editor role may be granted "edit" rights; the viewer role carries "view" rights.
Entity Detail is accessible to all roles. A viewer is not allowed to change any value, while an editor may change the label, detail, notes, and language, and may add, remove, and edit property statements with the status "pending". Moreover, all metaprops in the detail that an editor creates receive the status "pending". An admin has full access to the internal attributes of the entity (status, class) and to metaprops; all metaprops an admin creates have the status "approved".
Only an admin, or an editor with edit rights in the parent Territory (T), may edit, add, or remove a child T. Editors and viewers do not see Territories they have no rights to in the T Tree. Only "edit" rights for a T grant the right to add a new Statement under that particular T or under any of its children. That means the admin first has to create a T and then grant edit rights to editors.
To administer user rights, admins may access the administration window, where they can assign territories to editors and viewers. They can also create new users, change roles, see passwords, or delete users. The admin role itself cannot be assigned or removed through this interface.
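The territory-based rights described above can be summarized in a small sketch. This is a simplified illustration under assumed names, not InkVisitor's actual implementation:

```typescript
// Simplified model of territory rights; names are illustrative.
type Role = "admin" | "editor" | "viewer";
type Right = "edit" | "view";

interface UserRights {
  role: Role;
  territoryRights: Record<string, Right>; // territory ID -> granted right
}

// Admins may do anything; others need an explicit "edit" right on the
// territory or one of its ancestors to add statements or child territories.
function canEditTerritory(user: UserRights, territoryPath: string[]): boolean {
  if (user.role === "admin") return true;
  return territoryPath.some((tId) => user.territoryRights[tId] === "edit");
}
```

Here `territoryPath` is assumed to be the chain of territory IDs from the root of the T Tree down to the territory in question.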
For more information, see the `client` package.
For more information, see the `server` package.
For more information, see the `database` package.
A package containing TypeScript definitions, types, and enums that should be available to both the client and the server.
Install `pnpm` (version >=8.6.0). You can switch pnpm versions by running `corepack prepare pnpm@<version> --activate`.
Go to all three folders in `packages` (`client`, `server`, `database`) and run `pnpm i` in each of them.
Before continuing, please ensure that you have a database instance set up and running (see deploying with Docker below).
To deploy an InkVisitor instance, you can use Docker (or Podman), host it on a Kubernetes cluster, or build and deploy the packages separately.
To use Docker to deploy the InkVisitor application:
- Install Docker and the docker-compose tool.
- Install `pnpm`.
- Clone, fork, or download the InkVisitor repository.
- For the server: prepare the `.env` files for the servers listed under the `env_file` sections in the `docker-compose.yml` file. Check the server's README.md and example.env files for more information. Install the server dependencies with `pnpm i`.
- For the client app: prepare the `.env.<ENV>` file that identifies the appropriate environment under the `build -> args` section in the `docker-compose.yml` file. See the client's README.md and example.env files to make sure you have included all the necessary configuration.
- Run the database: first, prepare the `.env` file according to the documentation. Then run the database either as a standalone service or containerized using `docker-compose up -d database`. Next, create a database named `inkvisitor`; one option is to navigate to `http://localhost:8080/#dataexplorer` and run the query `r.dbCreate("inkvisitor")`.
- The database will now be empty, so to set up the database structure and import some testing data, go to `packages/database` and run `pnpm start` (`pnpm i` might be needed as well). Follow the information in the prompt: first, choose the database `inkvisitor` by pressing the `L` key, then pick a dataset to import using the `D` key. We recommend using the `empty` dataset for the first run. Then press `X` to process the import. Navigate to `http://localhost:8080/#dataexplorer` and run the query `r.db('inkvisitor').table('entities')` to check that the import went fine (a script-based alternative using the RethinkDB driver follows this list).
- Build the app image (this will also be done in the next step if the image is not available) by running `docker-compose build inkvisitor` (or `docker-compose build inkvisitor-<env>`).
- Run the containerized application with the command `docker-compose up inkvisitor` (or `docker-compose up inkvisitor-<env>`).
See the `kube` directory for examples. Check your cluster's capabilities, as the setup may differ.
The InkVisitor codebase consists of three interconnected packages: the client application, the server, and the database (a set of tools and import datasets). You can deploy these packages individually if you do not want to use Docker. In each step, make sure the appropriate `.env.<env>` file is accessible; see the README.md file in each package for more information.
- Follow the tutorials on the official page to install RethinkDB on your machine, or simply use `docker compose up -d database`.
- Use the import script to create the database structure and (optionally) import some testing data by running `pnpm start` and following the information in the prompt. Use at least the `empty` dataset for a bare-minimum setup.
The client application is served as static files (HTML/CSS/JS and additional assets). These files need to be moved to your HTTP server:
- Build the frontend app with `pnpm run build-<env>` to create/update the `dist` folder. `<env>` dictates which `.env.<env>` file will be used.
- Copy the contents of the `dist` folder to the directory used by your HTTP server (a minimal example server follows this list).
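If you do not have an HTTP server at hand, a minimal Express server for the static files could look like the sketch below. This is a generic example, not part of the InkVisitor packages, and the port is arbitrary:

```typescript
// Minimal static file server for the built client (illustrative only).
import express from "express";
import path from "path";

const app = express();
app.use(express.static(path.join(__dirname, "dist"))); // the copied dist folder

// Single-page-app fallback: serve index.html for unknown routes.
app.get("*", (_req, res) => {
  res.sendFile(path.join(__dirname, "dist", "index.html"));
});

app.listen(8000, () => console.log("client served on http://localhost:8000"));
```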
The server is also written in JavaScript, mainly using Node.js with the Express library. You need to build the application first, move the build to your server, and run it from there.
- Run `pnpm run build` to transpile the code.
- Move the `dist` folder to a server that supports the Node.js environment.
- Run `ENV_FILE=<env> pnpm run start:dist` to start the built application with the corresponding `.env.<env>` file loaded.
For quicker use, you can also run the server directly without building, using `pnpm start` (this uses the `nodemon` tool and the `.env.development` environment file).
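For reference, the general pattern behind `ENV_FILE=<env>` is to load the matching `.env.<env>` file at startup, for example with `dotenv`. The sketch below is illustrative; see the server package for how InkVisitor actually loads its configuration:

```typescript
// Illustrative pattern: pick the .env.<env> file via the ENV_FILE variable.
import * as dotenv from "dotenv";

const env = process.env.ENV_FILE || "development";
dotenv.config({ path: `.env.${env}` });

// PORT is an assumed variable name, used here only for demonstration.
console.log(`loaded .env.${env}, server port: ${process.env.PORT}`);
```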
The app does not support SSL internally; it should be handled in an upper layer, e.g., nginx. Please adjust the `APIURL` variable in the client's `.env` file to use `https` instead of `http`.
Make sure the ports required by each application are not blocked. The required ports are listed in `docker-compose.yml`.
Setting up additional system-specific features (reverse proxies, etc.) is beyond the scope of this README. You should allow at least the port for the server API (the default is 3000) and the port for serving the client files (if you use the Docker image, they are served by the server API). You do not need to allow public access to the database; note, however, that RethinkDB serves its monitoring tool on port 8080.
A layer/middleware which controls access to the respective resources (i.e., endpoints) in this application. Access is defined in a model called `AclPermission`. A resource is identified by the controller and method name. The specific access check is defined by a permission entry matched against the user's group.
Exceptions: the admin role is by default allowed to use all endpoints. A permission with the `*` value is allowed for all roles.
`User { role: "xxx" }` + `Permission { controller: "actants", name: "get", roles: ["xxx", "yyy"] }` => allowed
`User { role: "bbb" }` + `Permission { controller: "actants", name: "get", roles: ["xxx", "yyy"] }` => not allowed
`User { role: "bbb" }` + `Permission { controller: "actants", name: "get", roles: ["*"] }` => allowed by wildcard
`User { role: "admin" }` + `Permission { controller: "actants", name: "get", roles: [] }` => allowed by default
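Expressed as code, the rules above amount to a check like the following sketch (the interfaces are simplified to mirror the examples):

```typescript
// Simplified access check mirroring the AclPermission examples above.
interface User { role: string }
interface Permission { controller: string; name: string; roles: string[] }

function isAllowed(user: User, permission: Permission): boolean {
  if (user.role === "admin") return true;          // admin is allowed by default
  if (permission.roles.includes("*")) return true; // wildcard allows all roles
  return permission.roles.includes(user.role);
}

// Matches the four examples above:
isAllowed({ role: "xxx" }, { controller: "actants", name: "get", roles: ["xxx", "yyy"] }); // allowed
isAllowed({ role: "bbb" }, { controller: "actants", name: "get", roles: ["xxx", "yyy"] }); // not allowed
isAllowed({ role: "bbb" }, { controller: "actants", name: "get", roles: ["*"] });          // allowed
isAllowed({ role: "admin" }, { controller: "actants", name: "get", roles: [] });           // allowed
```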