Overview

The HTTP-API user import allows applications to drive the UCS@school import framework remotely instead of calling a command line frontend. A graphical user interface (a UMC module) lets users start imports and read the artifacts of previous import jobs. A Python module (the Python-API) makes the HTTP-API easier to use, and the UMC module uses this Python-API to access the HTTP-API.

On an abstract level, the components participating in the HTTP-API import are:

  • clients
  • the REST (Web-API) engine
  • a task scheduling/queuing system
  • storage and database services
  • intermediary services

Components

To be more precise, let’s take a look at the topology picture and go from client (left) to backend (bottom/right):

Interaction of components.
  1. Clients are HTTP clients. UCS@school provides a Python-API that makes interacting with the HTTP-API convenient. When it is used, the Python code is the HTTP client.

  2. The HTTP-API can only be accessed through HTTPS on the DC master. The Apache web server runs a reverse proxy for the URL path /api/ (UCRV [1]) and forwards all requests for that path to Gunicorn. Apache configuration file: [2].

  3. The Gunicorn Python web application (WSGI) server listens for HTTP connections on localhost, port 8000 (UCRV [3]). Gunicorn's configuration file: [4].

    1. Gunicorn starts the WSGI application defined as ucsschool.http_api.app.wsgi:application.
    2. That is the entry point for the Django web application framework.
    3. Django parses and routes requests to the appropriate view/controller functions.
      1. Django's routing configuration is in [7] and its main configuration file is [5]. Django logs to [6].
      2. Django uses Django PAM to authenticate against the local PAM stack.
      3. The Django REST framework is used to create an HTTP-API with object-level authorization, model-to-resource mapping, pagination, request validation, etc.
      4. Django uses its own ORM to read objects from and store objects in a PostgreSQL database. The database and the required credentials are set up in the join script [8]. When the data models change, Django can generate database migration code that adapts the database to the new schema.
      5. When a client creates a new user_import resource through the HTTP-API, two things happen:
        1. A new Python UserImport object is created and stored by Django in the database.
        2. A new import job is scheduled for execution.
      6. The Python School and UserImport objects are what the Django REST framework maps to HTTP resources. However, not all data of those objects is stored in the database: the values of the log file, password file and summary file attributes are transparently read from the filesystem when they are accessed.
    4. Creating a new user_import resource starts a new import job. This happens in UserImportJobSerializer.create(): it executes dry_run() or import_users() from ucsschool.http_api.import_api.tasks, which creates a new Celery task.
  4. The task's data (specifically the database ID of the UserImport object) is sent through the RabbitMQ message queuing system to one of the two Celery master processes. Which queue a task is routed to is determined by settings.CELERY_ROUTES.

  5. A Celery master process schedules the task's execution in one of its worker processes. There are two process groups so that dry-runs and real imports can be scheduled differently: up to four dry-runs can run in parallel, but only one real import job may run at a time. Running pstree -a | grep celery shows this:

    [Figure: Celery processes]
  6. When it’s time for a task to run, it will fetch the UserImport object using its database ID, and pass a function to the import framework as settings.progress_notification_function. During the import, the function will be called to update the result.result attribute of its associated UserImport object. The Django ORM will store that in the database. Thus, if a client continually retrieves the user_import resource, it will see the progress of the import job. The UMC import module uses this to update the progress bar.
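To illustrate step 1 and 2, the request that an HTTP client such as the Python-API sends to the DC master can be built with the Python standard library alone. This is a sketch under assumptions: the host name, the resource path below /api/, the JSON payload keys and the use of HTTP Basic authentication are hypothetical and not taken from the Python-API; a real import would additionally upload the CSV file. The request is only constructed, not sent.

```python
import base64
import json
from urllib.request import Request


def build_import_request(host, username, password, payload):
    """Build (but do not send) a POST request creating a user_import resource.

    Only the /api/ prefix comes from the documentation; the rest of the
    path and the payload keys are hypothetical.
    """
    url = "https://{}/api/v1/imports/users/".format(host)  # hypothetical path
    # HTTP Basic authentication is an assumption; on the server, Django PAM
    # would check the credentials against the local PAM stack.
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_import_request(
    "master.example.com", "admin", "secret", {"dryrun": True, "school": "SchuleEins"}
)
print(req.get_method(), req.full_url)
```

Because the request goes through Apache's reverse proxy, the client only ever talks HTTPS to the DC master; Gunicorn on localhost:8000 is not reachable from outside.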
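Gunicorn (step 3) only needs a WSGI callable: a function that receives the request environment and a start_response callback and returns an iterable of byte strings. Django provides such an object for the HTTP-API. The hand-written callable below is not the UCS@school code; it is a minimal sketch of the contract Gunicorn relies on, called directly the way a WSGI server would call it.

```python
def application(environ, start_response):
    """Minimal WSGI application: echo the requested path as plain text."""
    path = environ.get("PATH_INFO", "/")
    body = "You requested {}\n".format(path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI application returns an iterable of byte strings.
    return [body]


# Invoke the application the way a WSGI server such as Gunicorn would.
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


chunks = application({"REQUEST_METHOD": "GET", "PATH_INFO": "/api/v1/"}, start_response)
print(captured["status"], b"".join(chunks))
```

Saved as myapp.py, this sketch could be served with gunicorn --bind 127.0.0.1:8000 myapp:application, which mirrors the localhost-only binding described above.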
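The log file, password file and summary file attributes are read from the filesystem on access rather than from the database. The following plain-Python sketch shows that pattern with a property; the class, attribute and file names are hypothetical and do not reproduce the actual UserImport model.

```python
import os
import tempfile


class ImportJobRecord:
    """Hypothetical stand-in for a database-backed import job object.

    Only the file *path* would be stored in the database; the file *content*
    is read from disk each time the attribute is accessed.
    """

    def __init__(self, pk, log_file_path):
        self.pk = pk
        self.log_file_path = log_file_path

    @property
    def log_file(self):
        # Read lazily so a client always sees the current state of the log,
        # even while the import job is still writing to it.
        try:
            with open(self.log_file_path, encoding="utf-8") as fp:
                return fp.read()
        except FileNotFoundError:
            return ""  # the job has not written a log yet


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ucs-school-import.log")
    job = ImportJobRecord(pk=1, log_file_path=path)
    before = job.log_file  # file does not exist yet
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("Starting import\n")
    after = job.log_file  # picks up the new content
print(repr(before), repr(after))
```

With the Django REST framework, such a property can be exposed as a read-only serializer field, so the HTTP resource looks the same whether a value comes from the database or from a file.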
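The split into two process groups (steps 4 and 5) can be simulated without Celery or RabbitMQ: a routing table in the style of settings.CELERY_ROUTES maps task names to queues, and each queue has its own worker pool, four workers for dry-runs and one for real imports. The queue names, the task names derived from ucsschool.http_api.import_api.tasks, and the use of thread pools instead of Celery worker processes are assumptions for illustration only.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Routing table in the style of a CELERY_ROUTES setting (queue names hypothetical).
ROUTES = {
    "ucsschool.http_api.import_api.tasks.dry_run": {"queue": "dryrun"},
    "ucsschool.http_api.import_api.tasks.import_users": {"queue": "import"},
}

# One worker pool per queue: up to 4 parallel dry-runs, 1 real import at a time.
POOLS = {
    "dryrun": ThreadPoolExecutor(max_workers=4),
    "import": ThreadPoolExecutor(max_workers=1),
}

_lock = threading.Lock()
running = {"dryrun": 0, "import": 0}
peak = {"dryrun": 0, "import": 0}


def fake_task(queue, job_id):
    """Pretend to run an import job and record how many run concurrently."""
    with _lock:
        running[queue] += 1
        peak[queue] = max(peak[queue], running[queue])
    time.sleep(0.05)  # simulated work
    with _lock:
        running[queue] -= 1
    return job_id


def submit(task_name, job_id):
    # Only the database ID of the job travels with the task.
    queue = ROUTES[task_name]["queue"]
    return POOLS[queue].submit(fake_task, queue, job_id)


futures = [submit("ucsschool.http_api.import_api.tasks.dry_run", i) for i in range(8)]
futures += [submit("ucsschool.http_api.import_api.tasks.import_users", i) for i in range(3)]
results = [f.result() for f in futures]
for pool in POOLS.values():
    pool.shutdown()
print(peak)
```

Separate pools keep a long-running real import from blocking dry-runs, while the single-worker pool guarantees that real imports never overlap.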
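The progress reporting in step 6 can be sketched with in-memory objects in place of the Django model. The callback signature (done, total), the layout of the result dictionary and all class names are assumptions; the documentation only states that the function passed as settings.progress_notification_function updates result.result of the associated UserImport object.

```python
class FakeResult:
    """Stands in for the object behind UserImport.result (hypothetical)."""

    def __init__(self):
        self.result = {}


class FakeUserImport:
    """Stands in for a UserImport model instance (hypothetical)."""

    def __init__(self, pk):
        self.pk = pk
        self.result = FakeResult()
        self.saved = 0

    def save(self):
        # The real object would be written to PostgreSQL by the Django ORM.
        self.saved += 1


def make_progress_notification(user_import):
    """Return a callback the import framework could call during the import."""

    def notify(done, total):
        user_import.result.result = {
            "done": done,
            "total": total,
            "percent": int(100 * done / total) if total else 100,
        }
        user_import.save()

    return notify


job = FakeUserImport(pk=42)
notify = make_progress_notification(job)

# Simulated import of 4 users. A polling client would read the stored
# result on every GET of the user_import resource and redraw a progress bar.
snapshots = []
for done in range(1, 5):
    notify(done, 4)
    snapshots.append(job.result.result["percent"])
print(snapshots)  # [25, 50, 75, 100]
```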

CSV data

The format of the CSV file can be configured in the same way as for the command line import (see the command line import manual, available only in German).

To create an example CSV file that works with user_import_http-api.json, the default configuration file for HTTP-API imports, run /usr/share/ucs-school-import/scripts/ucs-school-testuser-import with the --httpapi argument.

The contents of the file should look similar to this:

"Schule","Vorname","Nachname","Klassen","Beschreibung","Telefon","EMail"
"SchuleEins","Cia","Rothenbühler","1a","A student.","+46-728-963204","ciam.rothenbuehlerm@uni.dtr"
"SchuleEins","Sergia","Groppel","1b","A student.","+80-043-223750","sergiam.groppelm@uni.dtr"
[..]

Important

The column with the class names (Klassen) must not include the school name. The school name will automatically be prepended to it.
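Reading the example file and deriving full class names could look like the sketch below. It uses the column names from the example above. The hyphen between school and class name (e.g. SchuleEins-1a) and the comma as separator for multiple classes are assumptions for illustration; the actual mapping is defined by the import configuration, not by this code.

```python
import csv
import io

SAMPLE = '''"Schule","Vorname","Nachname","Klassen","Beschreibung","Telefon","EMail"
"SchuleEins","Cia","Rothenbühler","1a","A student.","+46-728-963204","ciam.rothenbuehlerm@uni.dtr"
"SchuleEins","Sergia","Groppel","1b","A student.","+80-043-223750","sergiam.groppelm@uni.dtr"
'''


def school_classes(row):
    """Prefix each class in the Klassen column with the school name.

    The 'school-class' format and the comma separator are assumptions.
    """
    school = row["Schule"]
    names = [c.strip() for c in row["Klassen"].split(",") if c.strip()]
    return ["{}-{}".format(school, name) for name in names]


for row in csv.DictReader(io.StringIO(SAMPLE)):
    print(row["Vorname"], row["Nachname"], school_classes(row))
```

Because the prefix is added during the import, a Klassen value that already contains the school name would end up with the school name twice.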

Footnotes

[1]The URL path /api/ is configurable through UCRV ucsschool/import/http_api/URL_path/api.
[2]Apache's configuration file is /etc/apache2/sites-available/ucs-school-import-http-api.conf.
[3]Gunicorn's listening port is configurable through UCRV
[4]Gunicorn's configuration file is /etc/gunicorn.d/ucs-school-import.
[5]Django's configuration file is /usr/share/pyshared/ucsschool/http_api/app/settings.py. To handle configuration files the Debian way, however, that file only contains a function that reads /etc/ucsschool-import/settings.py, where the real configuration is located.
[6]Django logs to /var/log/univention/ucs-school-import/http_api.log.
[7]Requests are routed according to /usr/share/pyshared/ucsschool/http_api/app/urls.py.
[8]The join script is 60ucs-school-import-http-api.uinst.