[Digital humanities] Wittgenstein Scholarship internals: testing and continuous integration

At the Center for Information and Language Processing (CIS) I am currently working on WiTTFind, the Wittgenstein finder application developed within the EU-funded Digitised Manuscripts to Europeana (DM2E) project.

To keep the web application and the search-engine backend reliable, we decided to build a proper testing and continuous-integration infrastructure. Since we already use GitLab, a great web-based Git repository management application, I set up continuous integration so that every commit to our repository is automatically built and tested in a kind of chroot jail. The setup is based on the GitLab Continuous Integration package and covers many different GNU/Linux distributions.

Virtualization setup

The whole continuous-integration setup runs on three virtual machines. The (hardware) virtualization is done with QEMU with kernel-based virtual machine (KVM) support. The host system runs the latest 13.2 series of openSUSE, while each virtual machine runs Arch Linux. Despite all the negative comments about the "new" init system systemd (I have been using it on servers and desktops for over two years now), I can only say that it works; reported bugs, e.g. in systemd-networkd when using static IPv6 addresses in systemd versions earlier than 213, were fixed quickly and in a very friendly manner. At the moment QEMU 2.2.90 is running. Because we hit a hard-disk problem with the 2.1/2.2 series of QEMU in combination with the latest 3.19 kernel inside the virtual machines, virtio support is enabled. A typical QEMU command for starting a virtual machine looks like:

  qemu-system-x86_64 \
    -daemonize \
    -enable-kvm \
    -drive file=vm1.img,cache=none,if=virtio,format=raw \
    -m 15G \
    -vnc :1 \
    -k de \
    -net nic,macaddr=52:54:00:00:00:51 \
    -net tap,ifname=tap1,script=no,downscript=no \
    -monitor telnet:localhost:7001,server,nowait,nodelay \
    -smp cores=2,threads=2
So we use tap-based networking together with a telnet-based monitor console.
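
For completeness, here is a sketch of how such a tap device can be pre-configured on the host before QEMU starts; the device, user and bridge names below are assumptions, not taken from our actual configuration:

```shell
# Create a persistent tap device owned by the user that later starts QEMU
# (device, user and bridge names are hypothetical)
ip tuntap add dev tap1 mode tap user qemu-user
ip link set tap1 up

# Attach the tap device to an existing bridge so the guest reaches the network
ip link set tap1 master br0
```

With script=no,downscript=no in the -net tap option, QEMU then picks up this pre-configured device instead of running its own ifup/ifdown scripts.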

Notice: We do not use virtualization for the sake of security (never do that!). We only have limited (and shared!) hardware/servers available, so this virtualized setup lets us run more, and individually configured, machines on a single server. Hardware donations are therefore very welcome ;)


Inside one of the virtual machines we additionally use container virtualization with the popular Docker. At the moment 13 containers are running and listening for new builds. Here's a short overview of the different GNU/Linux operating systems we use for testing our software:
# runner id | Distribution                | GCC            | Boost  | CMake  | clang
1           | Arch                        | 4.9.2          | 1.57.0 | 3.2.1  | 3.6.0
2           | Debian 7                    | 4.7.2          | 1.49.0 | 2.8.9  | 3.0
3           | Ubuntu 14.04                | 4.8.2          | 1.54.0 | 2.8.12 | 3.4
4           | Ubuntu 14.10                | 4.9.1          | 1.55.0 | -      | 3.5.0
5           | Ubuntu 15.04                | 4.9.2          | 1.55.0 | 3.0.2  | 3.6.0
6           | openSUSE 13.1               | 4.8.1          | 1.53.0 | -      | 3.3
7           | openSUSE 13.2               | 4.8.3          | 1.54.0 | 3.0.2  | 3.5.0
8           | Fedora 20                   | 4.8.3          | 1.54.0 | -      | 3.4.2
9           | Fedora 21                   | 4.9.2          | 1.55.0 | 3.0.2  | 3.5.0
10          | CentOS 7                    | 4.8.2          | 1.53.0 | 2.8.11 | 3.4.2
11          | Debian Testing              | 4.9.2          | 1.55.0 | 3.0.2  | 3.5.0
12          | Debian Testing, latest GCC  | 5.0.0 20150320 | 1.55.0 | 3.0.2  | 3.5.0
13          | Arch ARM                    | 4.9.2          | 1.57.0 | 3.2.1  | 3.6.0

The 13th runner is not a virtual or container-virtualized machine: it is a new Raspberry Pi 2. That means our software/search-engine backend is also tested on ARM.
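
Since the other runners live in containers on one machine, a quick docker ps shows whether they are alive and listening for builds; the grep pattern below assumes a naming scheme like runner-<id> and is purely illustrative:

```shell
# List all running containers; every CI runner shows up as one entry
docker ps

# Narrow the output down to runner containers (hypothetical naming scheme)
docker ps | grep runner-
```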

For all 12 container-virtualized machines I wrote so-called Dockerfiles in order to get the GitLab CI runner built and running. Here's a small example of what our Dockerfiles look like:

  from archlinux
  maintainer Stefan Schweter <stefan@schweter.it>

  run pacman --noconfirm -Syu
  run pacman-db-upgrade

  run echo "de_DE.UTF-8 UTF-8" > /etc/locale.gen
  run locale-gen
  env LANG de_DE.UTF-8

  workdir /root

  run curl --silent -L https://gitlab.com/gitlab-org/gitlab-ci-runner/repository/archive.tar.gz | tar xz

  # Install dependencies for building the ci-runner
  run pacman --noconfirm -Sy ruby

  # Default location for gems is per-user, not system-wide -> change this first
  # More information: https://wiki.archlinux.org/index.php/ruby#Bundler
  run echo "gem: --no-user-install" > /etc/gemrc

  # Now bundler can be installed (system-wide)
  run gem install bundler

  # Dependencies for single gems
  # charlock_holmes
  run pacman --noconfirm -Sy make icu gcc patch

  workdir /root/gitlab-ci-runner.git

  # Some stupid ruby 2.2 workarounds for compiling the json gem
  run sed -i "s/\('json'\), '~> 1.7.7'/\1, '~> 1.8.2'/" Gemfile
  run sed -i "s/\(.*json\) (1.7.7)/\1 (1.8.2)/" Gemfile.lock
  run sed -i "s/\(.*json\) (~> 1.7.7)/\1 (~> 1.8.2)/" Gemfile.lock

  # Now we can build the ci-runner
  run bundle install --deployment

  run mkdir -p /root/.ssh

  # Group the fallback branch with { }, otherwise the shell would also run
  # the config-file commands after a successful setup_and_run
  cmd test -z "$RUNNER_TOKEN" && bundle exec ./bin/setup_and_run || { \
      echo "---" > config.yml && \
      echo "url: $CI_SERVER_URL" >> config.yml && \
      echo "token: $RUNNER_TOKEN" >> config.yml && \
      bundle exec ./bin/runner; }

  # Include git to fetch/clone the repositories
  run pacman --noconfirm -Sy git

  # Project-related dependencies will be installed here
  run pacman --noconfirm -Sy clang cmake boost recode


The finder application backend - internally it is called wf - comes with a number of unit tests. In combination with CMake and the Boost Test Library, all test cases are executed automatically after the search backend has been built in one of the containers. This ensures that the search backend builds on different GNU/Linux distributions and catches distribution-specific behaviour, e.g. toolchain changes.
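
On each runner the build-and-test cycle boils down to a few commands; this is only a sketch, assuming an out-of-source CMake build and tests registered with CTest:

```shell
# Configure and build the wf backend out of source (directory name is hypothetical)
mkdir -p build && cd build
cmake ..
make -j4

# Run all registered test binaries and show the output of failing tests
ctest --output-on-failure
```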

Here's an example of a test case for the search engine backend:

  #define BOOST_TEST_MODULE Query_test

  #include <boost/test/unit_test.hpp>

  #include <algorithm>  // for std::min

  #include "automaton/SimpleQuery.hpp"
  #include "automaton/UserQuery.hpp"
  #include "automaton/SkopeQuery.hpp"

  using wf::Query;
  using wf::SimpleQuery;
  using wf::UserQuery;
  using wf::SkopeQuery;
  using WordVec = wf::Query::WordVec;

  static void
  testQuery(Query& query, const WordVec& gold)
  {
          auto res = query.token();
          BOOST_CHECK_EQUAL(res.size(), gold.size());
          for (auto i = 0U; i < std::min(res.size(), gold.size()); ++i)
                  BOOST_CHECK_EQUAL(res[i], gold[i]);
  }

  BOOST_AUTO_TEST_CASE(simple_query_tokenization)
  {
          SimpleQuery query;
          query.setQuery("((A B) *) | C");
          testQuery(query, {"(", "(", "A", "B", ")", "*", ")", "|", "C"});
  }

End-to-end testing

Another important task is to make sure that the application frontend works as expected. For that purpose we use the headless WebKit browser PhantomJS in combination with CasperJS. Here's a small example (written in CoffeeScript) of our frontend tests:

  testhost = casper.cli.get "testhost"
  timeout = casper.cli.get "timeout"

  casper.options.waitTimeout = timeout
  casper.test.begin "Wittfind search works as expected", 12, suite = (test) ->
    casper.start "http://#{testhost}/", ->
      @test.assertTitle "WiTTFind — CIS∕WAB 2015", "homepage title is the one expected"
      @test.assertTextExists "Regelbasiertes Finden", "'Regelbasiertes Finden' exists"
      @test.assertExists ".form-control", "main input field is found"

      # Leave this out and you'll spend hours finding out that these scripts must be loaded first!
      @waitForResource "include/main.js"
      @waitForResource "include/search.js"

      @fill ".form-group", pattern: "denken", false

    casper.then ->

      # Necessary, as we prevent the default form-submit behaviour.
      # It is also necessary to use a casper.then step before this one;
      # leave it out and you'll hit strange behaviour that *costs* hours
      # to debug...
      @click ".btn-default"

    casper.then ->
      @waitForSelector ".number_all_hits"

    casper.then ->

      @test.assertHttpStatus 200, "search response code good"
      @test.assertTextExists "Es wurden 207 Treffer gefunden", "Retrieval text for 「denken」 appears."
      @test.assertTextExists "Wenn Sie noch mehr Treffer sehen wollen", "Get CIS account message after search appears."
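
The testhost and timeout values above are read from the command line via casper.cli.get, so invoking the suite might look like this; the file name, host and timeout value are placeholders:

```shell
# Run the frontend tests against a test host (all values are examples)
casperjs test frontend_test.coffee --testhost=wittfind.example.org --timeout=10000
```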


Testing (and automatically executing tests) is very important to us, so we extended our test scenario to cover many different GNU/Linux distributions. We also wrote unit and end-to-end tests for the search-engine backend and for the JavaScript code on the frontend. This post gave an overview of the technologies and software we use. Feel free to leave comments or ask questions: stefan at schweter dot eu.


I want to thank Dr. Maximilian Hadersbeck and Alois Pichler for giving me the opportunity to work on this awesome and interesting project! Thanks to Daniel for his sublime help on things like Docker, Makefiles, web technologies (the list goes on and on), and to my colleagues also working hard on the Wittgenstein project: Florian, Matthias, Angela, Roman and Yuliya (and many more!). Thanks to our sysadmin (and IPv6 expert) Thomas for providing us with the necessary hardware and for his support with software installation, network configuration and (not to forget) IPv6 networking!