featured for How E-Olymp tests submissions

How E-Olymp tests submissions

Mar 30, 2021

Today we’ll take a look at how E-Olymp tests submissions: what happens after you submit a solution, how it is executed, how the results are verified and what outcomes there are. Hope this article will help you better understand how the system works and make it easier to work with it.

We’ll also take a look at how some of the new features on the website, namely test suites and interactive problems.

For starters, it’s worth mentioning that the website and the testing system are two separate applications. The website allows users to submit solutions and view the results, while the testing system runs and evaluates them. This separation makes it possible to simplify each of the applications and clearly defines responsibilities between them.

When you click the “Submit” button on the website, your solution is added to the testing queue, from where the testing system will pick it up and starts testing process.

Testing queue

The testing queue is kind of an additional buffer between the website and the testing system. Even if the testing system does not have capacity to process all of the submissions, the website will continue to operate and accept new ones. The queue will keep submissions until one of the testing systems is free.

At E-Olymp, several test systems are always running, they are constantly check the queue and receive new submissions.

Each testing system has a separate dedicated server and only verifies one submission at a time. Submissions are never verified in parallel on the same server, as this can affect the result.

Usually submissions do not stay in the queue for a long time, but sometimes they can remain there for a bit of time. When users are making a lot of submissions, they start to accumulate in the queue and you can see that your submission remains in the “pending” status for a certain time.

Preparation and compilation

When the testing system receives a submission from the queue, it starts preparation and compilation procedure. At this stage, the testing system prepares everything needed for execute your program and perform testing.

First of all, the testing system creates an empty working directory and a new sandbox environment for your submission. Then, it will create a file with the source code and execute the compilation command. For example, for C++, the system will create a file source.cpp and execute the command g++ source.cpp -o a.exe.

Interestingly, for the C++ programming language, the compilation command includes the constants EOLYMP and ONLINE_JUDGE. You can use the #ifdef EOLYMP preprocessor command to control how the same code will compile on your computer and on the testing system.

For other programming languages, you can use an environment variable called EOLYMP.

If the compilation command is executed with an error, the system will report the result “Compilation error”. In this case, the output of the compilation command is sent back to the website so that you can see the text of the error.

The compilation command will differ depending on the programming language. For programming languages that require compilation to machine code (or bytecode), the compile command will create an appropriate binary file. For programming languages that do not require compilation, the compilation command is not executed, or it just performs a simple syntax check, to make sure you haven’t forgotten “;” somewhere.

If the source file is not needed to run, it is removed.

At this stage, on the server of the testing system, there is a directory containing only the files necessary to run the program. For C++ it is just the binary a.exe, for PHP it is the file source.php, and for Java it will be the file Main.jar.

If the compilation is successful, the system proceeds to run your program.

Running and verifying

All problems at the website have one or more tests. Each test consists of an input file and an answer file. This step is run for each test separately in a loop. That is, the files input.txt and answer.txt mentioned below will be different for each test.

First, the input file input.txt is copied to the working directory, and then the run command is executed.

Like the compile command, the run command is different for each programming language. For example, for C++, the run command is simply the name of the binary file a.exe, and for PHP the execution command looks like this php source.php.

When launching the program, the testing system redirects the contents of the file input.txt to the program’s standard input (stdin), and the standard output (stdout) of the program is redirected to the file output.txt. This means that instead of getting input from the keyboard, your program will read data from the file input.txt, and whatever your program prints to the screen will be automatically saved to the file output.txt.

It works similar to when your run your program like this: a.exe < input.txt > output.txt in the command line.

The syntax a.exe < input.txt > output.txt is a command line syntax. The testing system uses low level system calls, such as the fork to create a new process, dup2, to redirect the input and output streams, and execvp to execute the command.

The testing system starts to monitor the program process, awaiting its completion. At the moment the program starts, the testing system starts the execution timer. When this timer exceeds the maximum allowed limit (the limit for the program execution time), the testing system stops the process with a special signal to the operating system.

At this stage, the files input.txt and output.txt appeared on the server of the testing system. The program has finished or was terminated by the operating system.

Upon completion, the testing system receives data on the resource usage of your program. This data includes exit code, memory size, and processor usage time.

Recently on the E-Olymp we have added CPU usage time separately from execution time. These values are shown on the solution page in the Duration and CPU columns.

Interestingly, the execution time will not always coincide with the CPU usage time. While your program is running, the operating system may pause it, for example, if your program is waiting for input or the operating system switches to another process. In this case, the execution time will be longer than the CPU usage time, because your program was idle without CPU usage.

On the other hand, if your program uses more than one processor core, the execution time may be less than the CPU usage time. For example, if your program used 2 processor cores for 1 second, 2 seconds of processor time will be counted.

The testing system limits CPU usage to a single core, but some programming languages ​​(eg. Java, C#) may still partially use the additional core.

Try sending the program with a sleep instruction and see the execution time and cpu usage.

The next step, the testing system starts the process of verifying the answer.

First of all, the program execution time is checked. The system compares the execution time with the limit and if it is exceeded, the testing system counts the result “Time limit exceeded”.

Then, system checks memory usage limit. If the amount of memory used during execution exceeds the allowed limit, the testing system counts the result “Memory Overflow”.

Then, the exit code is checked. According to the standard, the exit code of the program should be 0 in case of success, and different from 0 in case of an error. For example, if your program tries to access an invalid area of ​​memory, the operating system will terminate your program with a nonzero exit code.

Thus, if your program returns a nonzero exit code, the testing system will count the result “Runtime error” and proceed to run the next test.

The program itself may also return a nonzero exit code. For example, if you add return 1; in the main function in C++, you will see a runtime error.

Below is a short list of common runtime errors. If you see a Runtime Error result, check the points below:

  • The program returns a nonzero exit code: make sure that the main function contains the return 0; command.
  • The program uses the array index out of bound: make sure the arrays are the correct size and initialized.
  • Using an uninitialized pointer or null pointer.
  • In interpreted programming languages ​​(PHP, Python, JavaScript and others), runtime errors can occur due to errors in the code. For example, a call to a non-existent function, an attempt to use an uninitialized variable, the use of a non-existing library, and so on.

If the exit code is zero, the procedure for comparing the response will begin.

At this point, the system copies the file with the correct answer.txt to the working directory and compares it to the answer in output.txt.

Depending on the problem, the answer can be compared in different ways. A special program is assigned to each problem to perform comparison. This program is called a “checker”.

In many problems, the answer is compared byte by byte, that is, so each byte in the output.txt file must match the corresponding byte in answer.txt. In such problems, it is very important to format the answer correctly, add all the necessary spaces and newline characters. Otherwise the answer will not be accepted.

In some problems, values ​​are compared according to their type, that is, numbers are compared as numbers and strings are compared as strings. This allows you to compare numbers with a certain precision or compare strings regardless of the case of letters in the answer. In such problems, additional space and newline characters will not affect the result.

In problems in which there can be more than one correct answer, a custom checker is used. It’s written specifically for the problem. This type of checkers usually builds the necessary data structures in memory and analyzes the answer. For example, in the problem of finding the shortest path in a graph, the checker will actually build a graph and try to follow the path given in the answer.

Depending on the result of the comparison, the check may end with a result “Accepted” or “Wrong answer”. In some tasks, the checker may give partial points for the result “Wrong answer”.

At this point, the system cleans up the working directory (deletes all files except those required to execute the program) and starts checking the next test.

After the last test, the testing system deletes the working folder and environment and makes a request to get the next submission from the queue.

Test suites

Recently, the website got the ability to use test suites. You may have seen that instead of a simple list, some of the tests are now bundled together.

Test suites are groups of tests that are combined together and represent a sub-problem. Each of the sub-problems represents a special case for the problem. For example, sub-problems can have different constraints on the input values, the first would use values from 1 to 100, and the second would use values from 101 to 1000. Suite with number 0 usually represents tests from the statement.

Each test suite has scoring and feedback settings.

There are two options for scoring test suites: each test individually or a suite as a whole. In the first option, you receive points for each test, regardless of whether you succeeded in passing the other tests in the suite. In the second option, you need to pass all the tests of the suite in order to get points. If even one test fails, the whole suite will not give you a single point.

Another feature of test suites is providing limited feedback. Depending on the problem settings, suite tests can be visible or hidden. If feedback is limited, the results are displayed as follows:

  • If all of the tests in the suite have passed, you will see outcome “Accepted” and the maximum values ​​for time and memory use are displayed.
  • If the test suite contains tests that failed, the result of the first non-passing test will be shown along with its execution time and memory usage.

Additionally, the suites may have dependencies among themselves. The author of a problem can configure test suites so that you need to pass all of the tests in one suite for testing of another suite to start. Otherwise, the other suite status will be shown as “Blocked”.

So, suites can be scored by test or as a whole, can provide limited feedback, and can also have dependencies among themselves. All these settings are selected by the author of the problem in order to obtain the required level of difficulty.

Interactive problems

For the last stage of the All-Ukrainian Programming Olympiad, the so-called interactive problems were also implemented on the website. These are special problems in which a special interactive program is used instead of the static input.txt and output.txt files.

In general, the verification process is the same as one described above, with the difference that during the verification your program is launched in alongside with the interactive program created by the author of the problem. Programs are run in such a way that the ouput of the interactor program is passed to the input of your program, and the output data that your program generates is passed as input to the interactor program.

This allows to create dynamic tests in which your program can query the interactor program and receive dynamic responses.

Test results depend on the result returned by the interactor program. If you provided the correct answer, the interactor program will send a “Accepted” signal to the testing system or a “Wrong answer” signal if the answer is not correct.

If you are interested in trying to solve an interactive problem, you can try submitting the solution to problem #10435. This is one of the typical interactive problems. In this problem, the interactor (the judge) randomly generates a number, and your program must guess it.

Output only problems

Usually, you can fine one or more examples in the problem statement. It makes it easier to see the format of the input data and to better understand the problem. But the tests that are used to evaluate and validate the solution are not available. That is, you do not just need to calculate the answer, but rather create an algorithm that will work for any set of input data.

But there are also problems in which you are given all the input data and you just need to find the answer. In such problems, you have no limits on execution time or memory usage. You can run the program as you like and as many times as you like. In theory, you can even calculate the answer manually, without writing a program. The only thing that matters is the answer you can get.

An example of such a problem is the problem of the sponsorship round of the All-Ukrainian Programming Olympiad 2021. In problem #10457 you are given a file that describes the “stars in the sky”. You need to find constellations that meet the criteria specified in the problem statement and make up their coordinates in the appropriate format.

In order to send an answer to problems of this kind, you need to send a solution specifying the “Plain Text” programming language. In this case, during the check, the system will simply create a file with the answer output.txt.

I hope you enjoyed this post and I would love to share more information with you. If you are interested in a topic or have questions about this post, let me know vai feedback form.